## Sandwiching smooth convex functions

If a function $f(x)$ has a Lipschitz gradient, i.e. for any $x$ and $y$ $\displaystyle \|\nabla f(y) - \nabla f(x)\|_2 \le \beta \|y-x\|_2,$

then $\displaystyle |f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle | \le \frac{\beta}{2}\|y-x\|_2^2.$

If $\nabla f$  is differentiable, this follows from the representation of the gradient via the Hessian and the fact that all eigenvalues of the Hessian are upper-bounded by $\beta.$  Otherwise express $f(x) - f(y)$  as the integral over $t,$ $\xi = (1-t)x + ty,$  put the gradient under the integral sign, and use the Cauchy-Schwarz inequality.

Note that if additionally $f$  is convex, then we have that the error of its first order approximation is bounded both from above and below: $\displaystyle 0 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\|y-x\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (*)$

As shown below, these bounds might be taken as an equivalent definition of $\beta$-smooth convex function, i.e. if a smooth function satisfies these inequalities, then it is convex and $\beta$-smooth. In fact, the trivial lower bound in $(*)$  may always be sharpened.

Lemma (Sandwiching a smooth convex  function)

Function $f$  is convex and $\beta$-smooth on its domain (i.e. differentiable with $\beta$-Lipschitz gradient), if, and only if, $(*)$  holds. Moreover, $(*)$  is in fact equivalent to $\displaystyle \frac{1}{2\beta} \left\|\nabla f(y) - \nabla f(x) \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (**)$

for any $x,y \in \text{Dom}(f).$

Proof.  We must prove the following equivalence for a smooth function $f:$ $\displaystyle (**) \Leftrightarrow (*) \Leftrightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth}.$

We already have $f\,\,\text{is convex and\,\,} \beta\text{-smooth} \Rightarrow (*)$  and $(**) \Rightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth}$ (convexity follows from the lower bound of $(*)$  which is weaker than that of $(**)$).  We are now to show $\displaystyle (*) \Rightarrow (**).$  Indeed, suppose $(*)$  holds, and pick $\displaystyle z = y - \frac{1}{\beta} \left(\nabla f(y) - \nabla f(x) \right),$  so that $\displaystyle f(y) - f(x) \\ = f(z) - f(x) - \big[f(z) - f(y)\big] \\ \ge \left\langle \nabla f(x), z-x \right\rangle - \big[\left\langle \nabla f(y), z-y \right\rangle + \frac{\beta}{2} \left\|z-y\right\|_2^2\big] \\ = \left\langle \nabla f(x), y-x \right\rangle + \left\langle \nabla f(x) - \nabla f(y), z-y \right\rangle - \frac{\beta}{2} \left\|z-y\right\|_2^2 \\ =\left\langle \nabla f(x), y-x \right\rangle + \frac{1}{2\beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2,$

and hence $\displaystyle (*) \Rightarrow (**).$  ♦

The lower bound from the lemma which we have just proved has a useful consequence for the gradient of a smooth function.

Lemma (Coercivity of the gradient)

For a $\beta$-smooth convex function, one has $\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{1}{\beta}\left\| \nabla f(x) - \nabla f(y)\right\|^2_2$

Proof.  Sum the lower bounds for the sandwiching lemma for the points $x, y$  and $y,x.$  ♦

A [non necessarily smooth] function is called $\alpha$strongly convex if the trivial convexity lower bound is improved by the quadratic term: $\displaystyle f(y) - f(x) \ge \left\langle g(x), y-x \right\rangle + \frac{\alpha}{2}\|y-x\|_2^2,$

where $g(x)$  is any subgradient of $f(x).$  It is easy to see that $f(x)$  is $\alpha$-strongly convex i.f.f. $f(x) - \frac{\alpha}{2}\|x\|_2^2$  is convex. Another useful elementary property is that strongly convex functions admit unique minimizers.

As a result, if the function is $\beta$-smooth and $\alpha$-strongly convex, the function is sandwiched between 2 quadratics:

Sandwiching a smooth and strongly convex  function: $\displaystyle \frac{\alpha}{2}\left\| y - x \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2$

Another consequence is the improved coercivity of the gradient, proved by considering $f(x) - \frac{\alpha}{2}\|x\|^2_2$  which is convex and $(\beta-\alpha)$-smooth:

Lemma (Coercivity of the gradient in the strongly convex case): $\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{\alpha \beta}{\alpha + \beta} \|x-y\|_2^2 + \frac{1}{\alpha + \beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2$ ## About Dmitry Ostrovsky

PhD student with interests in statistics, optimization, and machine learning.
This entry was posted in Convex Optimization, Course on Convex Optimization, Memos. Bookmark the permalink.