Formulations of Mirror Descent

A very nice reference for this post is [this blog post].

Mirror Descent is a first-order algorithm for solving a convex optimization problem  \min_{x \in X} f(x).  In this post we will go over several equivalent formulations of this algorithm and discuss how they are related to each other.

Let  \omega(x)  be a distance-generating function (d.-g.f.), 1-strongly convex on the domain  X  with respect to a given norm  \|\cdot\|  and differentiable on its interior, and consider the Bregman divergence on  X \times \text{int}(X):

\displaystyle D(x,y) = \omega(x) - \omega(y) - \big\langle x-y,\nabla\omega(y)\big\rangle
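
A standard example: for the negative-entropy d.-g.f.  \omega(x) = \sum_i x_i \log x_i  on the probability simplex  X = \Delta_n,  one has  \nabla\omega(y)_i = 1 + \log y_i,  and the Bregman divergence reduces to the Kullback-Leibler divergence:

\displaystyle D(x,y) = \sum_i x_i \log \frac{x_i}{y_i}.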


Projection formulation

\displaystyle \nabla\omega(y) = \nabla\omega(x) - t \nabla f(x)

\displaystyle x^+ = \pi_{X}(y) := \arg\min_{\xi \in X} D(\xi,y)


Here  \nabla f(\cdot)  is any subgradient of the objective function, and  t  is the stepsize parameter. This formulation, used extensively in [Bubeck 2015], makes it possible to draw an analogy with Projected Subgradient Descent, for which  \omega(x) = \frac{1}{2}\|x\|_2^2,  \nabla\omega(x) = x,  and  D(x,y) = \frac{1}{2}\|x-y\|_2^2.  Nevertheless, this formulation is somewhat misleading, since it is not clear how to compute the point  y  itself, nor under what conditions such a  y  exists. In fact, the existence of  y  is guaranteed as soon as  \omega(x)  is strongly convex and proper on  X.
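
To see the analogy at work, here is a minimal sketch of one step of the projection formulation in the Euclidean case, where  \nabla\omega  is the identity and the Bregman projection is the ordinary Euclidean projection; the choice of  X  as the unit ball is an assumption made purely for illustration.

import numpy as np

def euclidean_md_step(x, grad, t):
    # nabla omega(y) = nabla omega(x) - t * nabla f(x); here nabla omega = identity
    y = x - t * grad
    # pi_X(y): Euclidean projection, here onto the unit ball (illustrative choice of X)
    norm = np.linalg.norm(y)
    return y if norm <= 1.0 else y / norm

This is exactly one step of Projected Subgradient Descent. Supposing existence of  y  is indeed guaranteed, let us obtain another formulation by some elementary convex analysis. Consider the Legendre conjugate of  \omega(\cdot)  with respect to  X: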

\displaystyle \omega_X^*(g) := \max_{x \in X}\, \langle g,x \rangle - \omega(x).
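
For example, in the negative-entropy setup on the simplex, this conjugate admits a closed form, the log-sum-exp function:

\displaystyle \omega_X^*(g) = \log \sum_i e^{g_i},

with the maximizer given by the softmax map,  \widehat x_i = e^{g_i} / \sum_j e^{g_j}.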

Since  \omega_X^*(g)  is the pointwise maximum of functions affine in  g,  it is convex; moreover, by Danskin's theorem it is differentiable, with

\displaystyle \widehat x = \nabla \omega_X^*(g),

where  \widehat x  is the corresponding maximizer, unique due to the strong convexity of  \omega. Combining that observation with the previous formulation,

\displaystyle x^+ = \arg\min_{\xi \in X} D(\xi,y) = \arg\min_{\xi \in X}\, \omega(\xi) - \omega(y) - \langle \nabla \omega(y),\xi-y \rangle

\displaystyle = \arg\min_{\xi \in X}\, \omega(\xi) - \langle \nabla \omega(y),\xi \rangle

\displaystyle = \arg\min_{\xi \in X}\, \omega(\xi) - \langle \nabla \omega(x)-t\nabla f(x),\xi \rangle,

we obtain the


Mirror formulation

\displaystyle x^+ = \nabla\omega_X^*[\nabla\omega(x) - t \nabla f(x)]


The interesting point is that this formulation is actually more general: in contrast with the first one, it does not require any assumptions besides strong convexity of  \omega.  Moreover, it is easy to see from the first-order optimality conditions that the map  \nabla\omega_X^*(\cdot)  is given by the following relation:

\displaystyle \nabla\omega_X^*(g) = \big[\partial(\omega + I_X)\big]^{-1}(g) = (\nabla \omega + N_X)^{-1}(g)

where  I_X  is the indicator function of  X,  N_X  is the normal cone to  X,  and the multi-valued map  \nabla\omega + N_X: X \to \mathbb{R}^n  is invertible (injective and surjective). The mirror formulation then reads: first pass 'through the mirror' by the gradient map  \nabla\omega(\cdot)  on  X,  then take the gradient step in the dual space 'beyond the mirror', and finally come back by the inverse map.
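
To make this concrete, here is a minimal sketch of one mirror step in the negative-entropy setup on the simplex, where  \nabla\omega(x)_i = 1 + \log x_i  and  \nabla\omega_X^*  is the softmax map; the resulting update is known as exponentiated gradient.

import numpy as np

def entropy_mirror_step(x, grad, t):
    # through the mirror: nabla omega(x) = 1 + log(x) for the entropy d.-g.f.
    g = 1.0 + np.log(x)
    # gradient step in the dual space
    g = g - t * grad
    # back through the mirror: nabla omega_X^* is the softmax map
    z = np.exp(g - g.max())  # subtracting the max for numerical stability
    return z / z.sum()

The constant shift by 1 is irrelevant, since the softmax map is invariant to adding a constant to all coordinates; the update simplifies to  x^+_i \propto x_i e^{-t \nabla f(x)_i}.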

Yet another formulation is obtained from the last line of the chaining argument above.


Proximal point formulation

\displaystyle x^+ = \arg\min_{\xi \in X}\, \langle \nabla f(x), \xi \rangle + \frac{D(\xi,x)}{t}


In other words, the algorithm minimizes the local linear model of the objective while staying close to the previous search point.
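
In the Euclidean case this proximal step is solved by the usual projection, again recovering Projected Subgradient Descent:

\displaystyle \arg\min_{\xi \in X}\, \langle \nabla f(x), \xi \rangle + \frac{\|\xi-x\|_2^2}{2t} = \pi_X\big(x - t\nabla f(x)\big).

Finally, elementary algebra gives the following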


Regularization formulation

\displaystyle x^+ = \arg\min_{\xi \in X}\, -\langle \nabla\omega(x) - t \nabla f(x), \xi \rangle + \omega(\xi)


It is most often used to actually implement the algorithm.
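
For illustration, here is a minimal self-contained sketch of the full algorithm in the simplex setup, where the regularization problem is solved in closed form by the softmax map; the quadratic objective and the data  A, b  are assumptions made purely for the example.

import numpy as np

def mirror_descent(grad_f, x0, t, n_steps):
    # Mirror descent on the simplex with the entropy d.-g.f.;
    # each iteration solves the regularization problem in closed form.
    x = x0
    for _ in range(n_steps):
        g = np.log(x) - t * grad_f(x)  # dual step (constant shift dropped)
        z = np.exp(g - g.max())        # softmax, numerically stable
        x = z / z.sum()
    return x

# minimize f(x) = 0.5 * ||A x - b||_2^2 over the simplex (A, b are illustrative data)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
grad_f = lambda x: A.T @ (A @ x - b)
x = mirror_descent(grad_f, x0=np.ones(3) / 3, t=0.1, n_steps=200)
print(x, x.sum())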
