To learn convex optimization methods and stack this structure in my head, I decided to write the material down by topics. The focus will be on (constrained) large-scale, or dimension-free, optimization corresponding to first-order methods, with applications to machine learning in mind. I don’t pretend to original contribution here; the material will be taken from the [Bubeck], [Nesterov], and [Nemirovski].
In this post we introduce the main framework we will subsequently be working with, the so-called black box model.
Suppose that we are given an optimization problem of the form
where will always be a convex function, and a convex and compact (i.e. closed and bounded) set with non-empty interior. Let denote the unknown minimum value, or the problem value.
We will study problem classes, each being a subset of problems characterized by some restrictions on and characteristics of (for example, is bounded by some constant and is contained in a euclidean ball with radius ). The black box model is then constructed as follows. Assume that an oracle is given, that is, a “machine’,’ which for any query point provides an answer with some information on in this point (e.g. the value of ). To solve the problem up to the desired accuracy level we employ some algorithm which will always take the following form:
- Query the oracle at the point
- Using the information to date, do the following. Check the termination condition:
- “Yes” -> output the guess on the minimum value and terminate;
- “No” -> chose a new point and go to 1.
We hope that the output of the method will at some point find a nearly optimal point (such that the value of the objective function in it differs from the problem value by no greater than ‘Information to date’ implies (1) the information from all of the previous oracle responses, and (2) the full information on
Definition (Oracle complexity of the problem class)
Oracle complexity of a problem class (with respect to the oracle ) is
where is the number of queries to which a method commits to solve the problem instance up to the accuracy level
For a given class often the “reasonable” oracle is determined by it (and therefore we omit in the complexity notation, slightly confusing the notation). With fixed problem class and oracle, is, of course, a decreasing function of
The study of a class usually consists in providing the upper and lower bounds on its oracle complexity as a function of If they coincide up to a multiplicative constant (which may depend on every quantity here except for ), then is considered to be fully studied.
Upper and lower bounds are provided differently. An upper bound may be given by some fixed method for which it may be proved that the method performs not more than a given number of oracle calls, for a fixed for any problem of the class. A lower bound provides a problem instance from which takes not less than some specified number of oracle calls, for any method to be used. Besides that, bounds are proved with the adversary idea, namely, that a hard problem instance may be built gradually as the method calls the oracle (it is easier to play for the adversary since he makes the second step in this two-step game, that is why such bounds are lower bounds).
As previously noted, our interest will be in huge-dimensional problems meaning that the dimension of the problem class may be very large (from ten thousands to billions and more). Here the situation is as follows. One may often get upper and lower bounds on complexities with the same dependence on but with multiplicative constants growing with Whereas for many problem classes these constants grow dramatically fast (exponentially and even faster) with it occurs that for some of them (essentially when the first-order information about the function is disposable), one may come up with lower bounds are independent of the problem dimension. Given such lower bounds, the challenge is in proving dimension-free upper bounds by discovering the corresponding ([almost] dimension-free) efficient methods.