A very nice reference for this post is [this blog post].

Mirror Descent is a first-order algorithm to solve a convex optimization problem In this post we will learn several equivalent formulations of this algorithm and discuss how they are related to each other.

Let be a *distance-generating function *(d.-g.f.), 1-strongly convex on the domain and differentiable on its interior, and consider the Bregman divergence on

**Projection formulation**

Here is any subgradient of the objective function, and is the stepsize parameter. This formulation, used extensively in [Bubeck 2015], permits to draw analogy with the Projected Subgradient Descent for which and Nevertheless, this formulation is a bit delusive since it is not clear how to compute the point itself, and under what conditions such exists. Actually the existence of is guaranteed as soon as is strongly convex and proper on Supposing it is indeed the case, let us obtain another formulation by some elementary convex analysis. Consider the Legendre conjugate of with respect to

Since is the pointwise maximum of convex functions, we have

where is the corresponding maximizer, unique due to the strong convexity of Combining that observation with the previous formulation,

we obtain

**Mirror formulation**

The interesting point is that actually this formulation is more general: it does not require (in contrast with the first one) any assumptions besides strong convexity. Moreover, it is easy to see from the first-order optimality conditions that the map is given by the following relation:

where is the indicator function, is the normal cone, and the multi-map is invertible (injective and surjective). The mirror formulation reads: first go ‘beyond the mirror’ by the map on then take a step beyond the mirror, and finally go back by the inverse map.

Yet another formulation is obtained from the last line of the chaining argument above.

**Proximal point formulation**

In other words, the algorithm is trying to minimize the local linear model of the function while staying close to the previous search point. Finally, elementary algebra gives the following

**Regularization formulation**

It is most often used to actually implement the algorithm.

### Like this:

Like Loading...

## About Dmitry Ostrovsky

PhD student with interests in learning theory, statistics, and optimization, working with Anatoli Juditsky and Zaid Harchaoui.