## Sandwiching smooth convex functions

If a function $f(x)$ has a Lipschitz gradient, i.e. for any $x$ and $y$

$\displaystyle \|\nabla f(y) - \nabla f(x)\|_2 \le \beta \|y-x\|_2,$

then

$\displaystyle |f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle | \le \frac{\beta}{2}\|y-x\|_2^2.$

If  $\nabla f$  is differentiable, this follows from the representation of the gradient via the Hessian and the fact that all eigenvalues of the Hessian are upper-bounded by  $\beta.$  Otherwise express  $f(x) - f(y)$  as the integral over  $t,$  $\xi = (1-t)x + ty,$  put the gradient under the integral sign, and use the Cauchy-Schwarz inequality.

Note that if additionally  $f$  is convex, then we have that the error of its first order approximation is bounded both from above and below:

$\displaystyle 0 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\|y-x\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (*)$

As shown below, these bounds might be taken as an equivalent definition of  $\beta$-smooth convex function, i.e. if a smooth function satisfies these inequalities, then it is convex and  $\beta$-smooth. In fact, the trivial lower bound in  $(*)$  may always be sharpened.

Lemma (Sandwiching a smooth convex  function)

Function  $f$  is convex and  $\beta$-smooth on its domain (i.e. differentiable with  $\beta$-Lipschitz gradient), if, and only if,  $(*)$  holds. Moreover,  $(*)$  is in fact equivalent to

$\displaystyle \frac{1}{2\beta} \left\|\nabla f(y) - \nabla f(x) \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (**)$

for any  $x,y \in \text{Dom}(f).$

Proof.  We must prove the following equivalence for a smooth function $f:$

$\displaystyle (**) \Leftrightarrow (*) \Leftrightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth}.$

We already have  $f\,\,\text{is convex and\,\,} \beta\text{-smooth} \Rightarrow (*)$  and  $(**) \Rightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth}$ (convexity follows from the lower bound of  $(*)$  which is weaker than that of  $(**)$).  We are now to show  $\displaystyle (*) \Rightarrow (**).$  Indeed, suppose  $(*)$  holds, and pick  $\displaystyle z = y - \frac{1}{\beta} \left(\nabla f(y) - \nabla f(x) \right),$  so that

$\displaystyle f(y) - f(x) \\ = f(z) - f(x) - \big[f(z) - f(y)\big] \\ \ge \left\langle \nabla f(x), z-x \right\rangle - \big[\left\langle \nabla f(y), z-y \right\rangle + \frac{\beta}{2} \left\|z-y\right\|_2^2\big] \\ = \left\langle \nabla f(x), y-x \right\rangle + \left\langle \nabla f(x) - \nabla f(y), z-y \right\rangle - \frac{\beta}{2} \left\|z-y\right\|_2^2 \\ =\left\langle \nabla f(x), y-x \right\rangle + \frac{1}{2\beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2,$

and hence  $\displaystyle (*) \Rightarrow (**).$  ♦

The lower bound from the lemma which we have just proved has a useful consequence for the gradient of a smooth function.

For a $\beta$-smooth convex function, one has

$\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{1}{\beta}\left\| \nabla f(x) - \nabla f(y)\right\|^2_2$

Proof.  Sum the lower bounds for the sandwiching lemma for the points  $x, y$  and $y,x.$  ♦

A [non necessarily smooth] function is called  $\alpha$strongly convex if the trivial convexity lower bound is improved by the quadratic term:

$\displaystyle f(y) - f(x) \ge \left\langle g(x), y-x \right\rangle + \frac{\alpha}{2}\|y-x\|_2^2,$

where $g(x)$  is any subgradient of  $f(x).$  It is easy to see that  $f(x)$  is  $\alpha$-strongly convex i.f.f.  $f(x) - \frac{\alpha}{2}\|x\|_2^2$  is convex. Another useful elementary property is that strongly convex functions admit unique minimizers.

As a result, if the function is  $\beta$-smooth and  $\alpha$-strongly convex, the function is sandwiched between 2 quadratics:

Sandwiching a smooth and strongly convex  function:

$\displaystyle \frac{\alpha}{2}\left\| y - x \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2$

Another consequence is the improved coercivity of the gradient, proved by considering $f(x) - \frac{\alpha}{2}\|x\|^2_2$  which is convex and  $(\beta-\alpha)$-smooth:

Lemma (Coercivity of the gradient in the strongly convex case):

$\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{\alpha \beta}{\alpha + \beta} \|x-y\|_2^2 + \frac{1}{\alpha + \beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2$

PhD student with interests in learning theory, statistics, and optimization, working with Anatoli Juditsky and Zaid Harchaoui.
This entry was posted in Convex Optimization, Course on Convex Optimization, Memos. Bookmark the permalink.