## Projected Gradient Descent III — Strong convexity

Strong convexity provides an improved lower bound for a convex function; therefore are in our right to expect the improvement of the rate of convergence of PGD. In this post we will find out how much we actually gain.

First let us analyze the non-smooth (Lipschitz) case.

Theorem 1

Let  $f$  be  $\alpha$-strongly convex and  $L$-Lipschitz  on  $\mathcal{X}.$  PGD with  $\eta_s = \frac{2}{\alpha (s+1)}$ attains

$\displaystyle f\left(\sum_{s=1}^t \frac{2s}{t(t+1)} x_s\right) - f(x^*) \le \frac{2L^2}{\alpha(t+1)}$

Proof.  Beginning with the same analysis as in the non-strongly convex case, we get

$\displaystyle f(x_s) - f(x^*) \le \frac{\eta_s}{2}L^2 + \left(\frac{1}{2\eta_s} - \frac{\alpha}{2}\right) \|x_s - x^*\|_2^2 - \frac{1}{2\eta_s} \|x_{s+1} - x^*\|_2^2.$

Multiplying by $s$ gives

$\displaystyle s\left(f(x_s) - f(x^*)\right) \le \frac{L^2}{\alpha} + \frac{\alpha}{4} \Big(s(s-1) \|x_s - x^*\|_2^2 - s(s+1)\|x_{s+1} - x^*\|_2^2 \Big).$

Summing over  $s = \overline{1, t}$  and using convexity, we obtain the result.  ♦

We now turn ourselves to the smooth case, that is the function now is  $\beta$-smooth and  $\alpha$-strongly convex.  Remember that previously the “three point identity” proved to be useful. For the strongly convex case it has an improvement:

Lemma (Three point identity in the strongly convex case)

For  $\beta$-smooth and  $\alpha$-strongly convex function,

$\displaystyle f(x^+) - f(y) \le \left\langle g_{\mathcal{X}}(x), x-y \right\rangle - \frac{1}{2\beta} \left\| g_{\mathcal{X}}(x)\right\|_2^2 - \frac{\alpha}{2} \|x-y\|_2^2$

The proof of this lemma in fact repeats that for the usual three-point identity, and we omit it. We now give the rate of PGD for strongly convex smooth functions in terms of condition number  $\displaystyle Q = \beta/\alpha.$  Surprisingly, the analysis is simpler than for the non-strongly convex case.

Theorem

Let  $f$  be  $\beta$-smooth and  $\alpha$-strongly convex on  $\mathcal{X}.$  PGD with  $\eta = \frac{1}{\beta}$  attains the rate

$\displaystyle f(x_s) - f(x^*) \le \beta R^2 \exp \left(-\frac{t}{Q}\right)$

Proof.  First,  $f(x_s) - f(x^*) \le \beta \|x_s-x^*\|_2^2$  by  smoothness. Furthermore, it turns out that there is a linear rate of localization in the argument in the strongly convex case (note that the solution is unique):

$\displaystyle \|x_{t+1}-x^*\|_2^2 \\ \le \|x_{t}-x^*\|_2^2 - \frac{2}{\beta} \left\langle g_{\mathcal{X}}(x_t), x_t - x^*\right\rangle + \frac{1}{\beta} \left\|g_{\mathcal{X}}(x_t)\right\|_2^2 \\ \le \left(1 - \frac{\alpha}{\beta}\right) \|x_{t}-x^*\|_2^2 \le\left(1 - \frac{\alpha}{\beta}\right)^t \|x_{1}-x^*\|_2^2 \le \exp\left(-\frac{t}{Q}\right)\|x_{1}-x^*\|_2^2,$

and we obtain the claim.  ♦