Projected Gradient Descent III — Strong convexity

Strong convexity provides an improved (quadratic) lower bound for a convex function, so we have every right to expect an improved rate of convergence for PGD. In this post we will find out how much we actually gain.

First, let us analyze the non-smooth (Lipschitz) case.


Theorem 1

Let  f  be  \alpha-strongly convex and  L-Lipschitz  on  \mathcal{X}.  PGD with  \eta_s = \frac{2}{\alpha (s+1)} attains

\displaystyle f\left(\sum_{s=1}^t \frac{2s}{t(t+1)} x_s\right) - f(x^*) \le \frac{2L^2}{\alpha(t+1)}


Proof.  Beginning with the same analysis as in the non-strongly convex case, but now using the strong convexity lower bound  \langle g_s, x_s - x^* \rangle \ge f(x_s) - f(x^*) + \frac{\alpha}{2}\|x_s - x^*\|_2^2  (where  g_s \in \partial f(x_s)),  we get

\displaystyle f(x_s) - f(x^*) \le \frac{\eta_s}{2}L^2 + \left(\frac{1}{2\eta_s} - \frac{\alpha}{2}\right) \|x_s - x^*\|_2^2 - \frac{1}{2\eta_s} \|x_{s+1} - x^*\|_2^2.

Multiplying by  s  and plugging in  \eta_s = \frac{2}{\alpha(s+1)}  gives

\displaystyle s\left(f(x_s) - f(x^*)\right) \le \frac{L^2}{\alpha} + \frac{\alpha}{4} \Big(s(s-1) \|x_s - x^*\|_2^2 - s(s+1)\|x_{s+1} - x^*\|_2^2 \Big).

Summing over  s = \overline{1, t},  the last two terms telescope, which gives  \displaystyle \sum_{s=1}^t s\left(f(x_s) - f(x^*)\right) \le \frac{t L^2}{\alpha}.  Dividing by  \frac{t(t+1)}{2}  and applying Jensen's inequality (convexity of  f)  to the weighted average  \sum_{s=1}^t \frac{2s}{t(t+1)} x_s,  we obtain the result.  ♦
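
As a concrete illustration, here is a minimal Python sketch of this scheme. The helpers  subgradient  (returning an element of  \partial f(x))  and  project  (the Euclidean projection onto  \mathcal{X})  are hypothetical placeholders, not part of this post.

import numpy as np

def pgd_strongly_convex_lipschitz(x1, alpha, t, subgradient, project):
    # PGD with decreasing steps eta_s = 2 / (alpha * (s + 1)) and the
    # weighted average of the iterates, as in Theorem 1.
    x = np.asarray(x1, dtype=float)
    avg = np.zeros_like(x, dtype=float)
    for s in range(1, t + 1):
        avg += 2.0 * s / (t * (t + 1)) * x       # weight 2s / (t(t+1)) on x_s
        eta = 2.0 / (alpha * (s + 1))            # step size eta_s
        x = project(x - eta * subgradient(x))    # projected subgradient step
    return avg                                   # the point bounded in Theorem 1

Note that the returned point is the weighted average of the iterates rather than the last iterate, exactly as in the statement of Theorem 1.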

We now turn to the smooth case, that is, the function is now  \beta-smooth and  \alpha-strongly convex.  Recall that previously the “three-point identity” proved useful. In the strongly convex case it admits an improvement:


Lemma (Three-point identity in the strongly convex case)

For a  \beta-smooth and  \alpha-strongly convex function  f  and any  x, y \in \mathcal{X},

\displaystyle f(x^+) - f(y) \le \left\langle g_{\mathcal{X}}(x), x-y \right\rangle - \frac{1}{2\beta} \left\| g_{\mathcal{X}}(x)\right\|_2^2 - \frac{\alpha}{2} \|x-y\|_2^2
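
Here, recall,  x^+  denotes the projected gradient step from  x  and  g_{\mathcal{X}}(x)  the associated gradient mapping (assuming the notation of the previous posts):

\displaystyle x^+ = \Pi_{\mathcal{X}}\left(x - \frac{1}{\beta} \nabla f(x)\right), \qquad g_{\mathcal{X}}(x) = \beta\left(x - x^+\right),

so that in the unconstrained case  g_{\mathcal{X}}(x)  is simply  \nabla f(x).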


The proof of this lemma in fact repeats that of the usual three-point identity, and we omit it. We now give the rate of PGD for strongly convex and smooth functions in terms of the condition number  \displaystyle Q = \beta/\alpha.  Surprisingly, the analysis is simpler than in the non-strongly convex case.


Theorem

Let  f  be  \beta-smooth and  \alpha-strongly convex on  \mathcal{X}.  PGD with  \eta = \frac{1}{\beta}  attains the rate

\displaystyle f(x_{t+1}) - f(x^*) \le \beta R^2 \exp \left(-\frac{t}{Q}\right)


Proof.  First,  f(x_{t+1}) - f(x^*) \le \beta \|x_{t+1}-x^*\|_2^2  by smoothness. Furthermore, it turns out that in the strongly convex case the iterates localize around  x^*  at a linear rate (note that the minimizer is unique). Indeed, using the lemma above with  x = x_t,  y = x^*,  together with  f(x_{t+1}) \ge f(x^*),  we get

\displaystyle \begin{aligned} \|x_{t+1}-x^*\|_2^2 &= \|x_{t}-x^*\|_2^2 - \frac{2}{\beta} \left\langle g_{\mathcal{X}}(x_t), x_t - x^*\right\rangle + \frac{1}{\beta^2} \left\|g_{\mathcal{X}}(x_t)\right\|_2^2 \\ &\le \left(1 - \frac{\alpha}{\beta}\right) \|x_{t}-x^*\|_2^2 \le \left(1 - \frac{\alpha}{\beta}\right)^t \|x_{1}-x^*\|_2^2 \le \exp\left(-\frac{t}{Q}\right)\|x_{1}-x^*\|_2^2, \end{aligned}

and combining the two bounds, we obtain the claim.  ♦
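
For completeness, here is the corresponding sketch in Python;  gradient  and  project  are again hypothetical placeholders for  \nabla f  and the Euclidean projection onto  \mathcal{X}.

def pgd_smooth_strongly_convex(x1, beta, t, gradient, project):
    # PGD with the constant step size eta = 1 / beta; by the theorem,
    # the suboptimality of the returned point decays like exp(-t / Q).
    x = x1
    for _ in range(t):
        x = project(x - gradient(x) / beta)   # projected gradient step
    return x                                  # x_{t+1}

In contrast with Theorem 1, no averaging is needed here: the last iterate itself enjoys the geometric rate.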
