A very nice reference for this post is [this blog post].
Mirror Descent is a first-order algorithm to solve a convex optimization problem In this post we will learn several equivalent formulations of this algorithm and discuss how they are related to each other.
Let be a distance-generating function (d.-g.f.), 1-strongly convex on the domain
and differentiable on its interior, and consider the Bregman divergence on
Projection formulation
Here is any subgradient of the objective function, and
is the stepsize parameter. This formulation, used extensively in [Bubeck 2015], permits to draw analogy with the Projected Subgradient Descent for which
and
Nevertheless, this formulation is a bit delusive since it is not clear how to compute the point
itself, and under what conditions such
exists. Actually the existence of
is guaranteed as soon as
is strongly convex and proper on
Supposing it is indeed the case, let us obtain another formulation by some elementary convex analysis. Consider the Legendre conjugate of
with respect to
Since is the pointwise maximum of convex functions, we have
where is the corresponding maximizer, unique due to the strong convexity of
Combining that observation with the previous formulation,
we obtain
Mirror formulation
The interesting point is that actually this formulation is more general: it does not require (in contrast with the first one) any assumptions besides strong convexity. Moreover, it is easy to see from the first-order optimality conditions that the map is given by the following relation:
where is the indicator function,
is the normal cone, and the multi-map
is invertible (injective and surjective). The mirror formulation reads: first go ‘beyond the mirror’ by the map
on
then take a step beyond the mirror, and finally go back by the inverse map.
Yet another formulation is obtained from the last line of the chaining argument above.
Proximal point formulation
In other words, the algorithm is trying to minimize the local linear model of the function while staying close to the previous search point. Finally, elementary algebra gives the following
Regularization formulation
It is most often used to actually implement the algorithm.