Sandwiching smooth convex functions

If a function f(x) has a Lipschitz gradient, i.e. for any x and y

\displaystyle \|\nabla f(y) - \nabla f(x)\|_2 \le \beta \|y-x\|_2,


\displaystyle |f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle | \le \frac{\beta}{2}\|y-x\|_2^2.

If  \nabla f  is differentiable, this follows from the representation of the gradient via the Hessian and the fact that all eigenvalues of the Hessian are upper-bounded by  \beta.  Otherwise express  f(x) - f(y)  as the integral over  t,  \xi = (1-t)x + ty,  put the gradient under the integral sign, and use the Cauchy-Schwarz inequality.

Note that if additionally  f  is convex, then we have that the error of its first order approximation is bounded both from above and below:

\displaystyle 0 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\|y-x\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (*)

As shown below, these bounds might be taken as an equivalent definition of  \beta-smooth convex function, i.e. if a smooth function satisfies these inequalities, then it is convex and  \beta-smooth. In fact, the trivial lower bound in  (*)  may always be sharpened.

Lemma (Sandwiching a smooth convex  function)

Function  f  is convex and  \beta-smooth on its domain (i.e. differentiable with  \beta-Lipschitz gradient), if, and only if,  (*)  holds. Moreover,  (*)  is in fact equivalent to

\displaystyle \frac{1}{2\beta} \left\|\nabla f(y) - \nabla f(x) \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2\,\,\,\,\,\,\,\,\,\,\,\, (**)

for any  x,y \in \text{Dom}(f).

 Proof.  We must prove the following equivalence for a smooth function f:

\displaystyle (**) \Leftrightarrow (*) \Leftrightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth}.

We already have  f\,\,\text{is convex and\,\,} \beta\text{-smooth} \Rightarrow (*)  and  (**) \Rightarrow f\,\,\text{is convex and\,\,} \beta\text{-smooth} (convexity follows from the lower bound of  (*)  which is weaker than that of  (**)).  We are now to show  \displaystyle (*) \Rightarrow (**).  Indeed, suppose  (*)  holds, and pick  \displaystyle z = y - \frac{1}{\beta} \left(\nabla f(y) - \nabla f(x) \right),  so that

\displaystyle f(y) - f(x) \\ = f(z) - f(x) - \big[f(z) - f(y)\big] \\ \ge \left\langle \nabla f(x), z-x \right\rangle - \big[\left\langle \nabla f(y), z-y \right\rangle + \frac{\beta}{2} \left\|z-y\right\|_2^2\big] \\ = \left\langle \nabla f(x), y-x \right\rangle + \left\langle \nabla f(x) - \nabla f(y), z-y \right\rangle - \frac{\beta}{2} \left\|z-y\right\|_2^2 \\ =\left\langle \nabla f(x), y-x \right\rangle + \frac{1}{2\beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2,

and hence  \displaystyle (*) \Rightarrow (**).  ♦

The lower bound from the lemma which we have just proved has a useful consequence for the gradient of a smooth function.

Lemma (Coercivity of the gradient)

For a \beta-smooth convex function, one has

\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{1}{\beta}\left\| \nabla f(x) - \nabla f(y)\right\|^2_2

Proof.  Sum the lower bounds for the sandwiching lemma for the points  x, y  and y,x.  ♦

A [non necessarily smooth] function is called  \alphastrongly convex if the trivial convexity lower bound is improved by the quadratic term:

\displaystyle f(y) - f(x) \ge \left\langle g(x), y-x \right\rangle + \frac{\alpha}{2}\|y-x\|_2^2,

where g(x)  is any subgradient of  f(x).  It is easy to see that  f(x)  is  \alpha-strongly convex i.f.f.  f(x) - \frac{\alpha}{2}\|x\|_2^2  is convex. Another useful elementary property is that strongly convex functions admit unique minimizers.

As a result, if the function is  \beta-smooth and  \alpha-strongly convex, the function is sandwiched between 2 quadratics:

Sandwiching a smooth and strongly convex  function:

\displaystyle \frac{\alpha}{2}\left\| y - x \right\|_2^2 \le f(y) - f(x) - \left\langle\nabla f(x), y-x \right\rangle \le \frac{\beta}{2}\left\| y - x \right\|_2^2

Another consequence is the improved coercivity of the gradient, proved by considering f(x) - \frac{\alpha}{2}\|x\|^2_2  which is convex and  (\beta-\alpha)-smooth:

Lemma (Coercivity of the gradient in the strongly convex case):

\displaystyle \left\langle \nabla f(x) - \nabla f(y), x-y \right\rangle \ge \frac{\alpha \beta}{\alpha + \beta} \|x-y\|_2^2 + \frac{1}{\alpha + \beta} \left\|\nabla f(x) - \nabla f(y)\right\|_2^2

About Dmitry Ostrovsky

PhD student with interests in statistics, optimization, and machine learning.
This entry was posted in Convex Optimization, Course on Convex Optimization, Memos. Bookmark the permalink.

2 Responses to Sandwiching smooth convex functions

  1. Pingback: Subgradient Projection method II — smooth objective | Look at the corners!

  2. Pingback: Subgradient projection method III — Strong convexity | Look at the corners!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s