A very nice reference for this post is [this blog post].
Mirror Descent is a first-order algorithm to solve a convex optimization problem In this post we will learn several equivalent formulations of this algorithm and discuss how they are related to each other.
Let be a distance-generating function (d.-g.f.), 1-strongly convex on the domain and differentiable on its interior, and consider the Bregman divergence on
Here is any subgradient of the objective function, and is the stepsize parameter. This formulation, used extensively in [Bubeck 2015], permits to draw analogy with the Projected Subgradient Descent for which and Nevertheless, this formulation is a bit delusive since it is not clear how to compute the point itself, and under what conditions such exists. Actually the existence of is guaranteed as soon as is strongly convex and proper on Supposing it is indeed the case, let us obtain another formulation by some elementary convex analysis. Consider the Legendre conjugate of with respect to
Since is the pointwise maximum of convex functions, we have
where is the corresponding maximizer, unique due to the strong convexity of Combining that observation with the previous formulation,
The interesting point is that actually this formulation is more general: it does not require (in contrast with the first one) any assumptions besides strong convexity. Moreover, it is easy to see from the first-order optimality conditions that the map is given by the following relation:
where is the indicator function, is the normal cone, and the multi-map is invertible (injective and surjective). The mirror formulation reads: first go ‘beyond the mirror’ by the map on then take a step beyond the mirror, and finally go back by the inverse map.
Yet another formulation is obtained from the last line of the chaining argument above.
Proximal point formulation
In other words, the algorithm is trying to minimize the local linear model of the function while staying close to the previous search point. Finally, elementary algebra gives the following
It is most often used to actually implement the algorithm.