## Sandwiching smooth convex functions

If a function $f(x)$ has a Lipschitz gradient, i.e. for any $x$ and $y$

$\displaystyle \|\nabla f(y) - \nabla f(x)\|_2 \le \beta \|y-x\|_2,$

then

$\displaystyle |f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle | \le \frac{\beta}{2}\|y-x\|_2^2.$

If  $\nabla f$  is differentiable, this follows from the representation of the gradient via the Hessian and the fact that all eigenvalues of the Hessian are upper-bounded by  $\beta.$  Otherwise express  $f(x) - f(y)$  as the integral over  $t,$  $\xi = (1-t)x + ty,$  put the gradient under the integral sign, and use the Cauchy-Schwarz inequality.

Note that if additionally  $f$  is convex, then we have that the error of its first order approximation is bounded both from above and below:

$\displaystyle 0 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\|y-x\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (*)$

As shown below, these bounds might be taken as an equivalent definition of  $\beta$-smooth convex function, i.e. if a smooth function satisfies these inequalities, then it is convex and  $\beta$-smooth. In fact, the trivial lower bound in  $(*)$  may always be sharpened.

Lemma (Sandwiching a smooth convex  function)

Function  $f$  is convex and  $\beta$-smooth on its domain (i.e. differentiable with  $\beta$-Lipschitz gradient), if, and only if,  $(*)$  holds. Moreover,  $(*)$  is in fact equivalent to

$\displaystyle \frac{1}{2\beta} \left\|\nabla f(y) - \nabla f(x) \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (**)$

for any  $x,y \in \text{Dom}(f).$

Proof.  We must prove the following equivalence for a smooth function $f:$

$\displaystyle (**) \Leftrightarrow (*) \Leftrightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth}.$

We already have  $f\,\,\text{is convex and\,\,} \beta\text{-smooth} \Rightarrow (*)$  and  $(**) \Rightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth}$ (convexity follows from the lower bound of  $(*)$  which is weaker than that of  $(**)$).  We are now to show  $\displaystyle (*) \Rightarrow (**).$  Indeed, suppose  $(*)$  holds, and pick  $\displaystyle z = y - \frac{1}{\beta} \left(\nabla f(y) - \nabla f(x) \right),$  so that

$\displaystyle f(y) - f(x) \\ = f(z) - f(x) - \big[f(z) - f(y)\big] \\ \ge \left\langle \nabla f(x), z-x \right\rangle - \big[\left\langle \nabla f(y), z-y \right\rangle + \frac{\beta}{2} \left\|z-y\right\|_2^2\big] \\ = \left\langle \nabla f(x), y-x \right\rangle + \left\langle \nabla f(x) - \nabla f(y), z-y \right\rangle - \frac{\beta}{2} \left\|z-y\right\|_2^2 \\ =\left\langle \nabla f(x), y-x \right\rangle + \frac{1}{2\beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2,$

and hence  $\displaystyle (*) \Rightarrow (**).$  ♦

The lower bound from the lemma which we have just proved has a useful consequence for the gradient of a smooth function.

For a $\beta$-smooth convex function, one has

$\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{1}{\beta}\left\| \nabla f(x) - \nabla f(y)\right\|^2_2$

Proof.  Sum the lower bounds for the sandwiching lemma for the points  $x, y$  and $y,x.$  ♦

A [non necessarily smooth] function is called  $\alpha$strongly convex if the trivial convexity lower bound is improved by the quadratic term:

$\displaystyle f(y) - f(x) \ge \left\langle g(x), y-x \right\rangle + \frac{\alpha}{2}\|y-x\|_2^2,$

where $g(x)$  is any subgradient of  $f(x).$  It is easy to see that  $f(x)$  is  $\alpha$-strongly convex i.f.f.  $f(x) - \frac{\alpha}{2}\|x\|_2^2$  is convex. Another useful elementary property is that strongly convex functions admit unique minimizers.

As a result, if the function is  $\beta$-smooth and  $\alpha$-strongly convex, the function is sandwiched between 2 quadratics:

Sandwiching a smooth and strongly convex  function:

$\displaystyle \frac{\alpha}{2}\left\| y - x \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2$

Another consequence is the improved coercivity of the gradient, proved by considering $f(x) - \frac{\alpha}{2}\|x\|^2_2$  which is convex and  $(\beta-\alpha)$-smooth:

Lemma (Coercivity of the gradient in the strongly convex case):

$\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{\alpha \beta}{\alpha + \beta} \|x-y\|_2^2 + \frac{1}{\alpha + \beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2$