To learn convex optimization methods and stack this structure in my head, I decided to write the material down by topics. The focus will be on (constrained) large-scale, or dimension-free, optimization corresponding to *first-order methods, *with applications to machine learning in mind.** **I don’t pretend to original contribution here; the material will be taken from the [Bubeck], [Nesterov], and [Nemirovski].

In this post we introduce the main framework we will subsequently be working with, the so-called* black box model.*

Suppose that we are given an optimization problem of the form

where will always be a convex function, and a convex and compact (i.e. closed and bounded) set with non-empty interior. Let denote the unknown minimum value, or the *problem value.*

We will study *problem classes, *each being a subset of problems characterized by some restrictions on and characteristics of (for example, is bounded by some constant and is contained in a euclidean ball with radius ). The *black box model* is then constructed as follows. Assume that an *oracle* is given, that is, a “machine’,’ which for any query point provides an answer with some information on in this point (e.g. the value of ). To solve the problem up to the desired accuracy level we employ some algorithm which will always take the following form:

- Query the oracle at the point
- Using the information to date, do the following. Check the termination condition:
- “Yes” -> output the guess on the minimum value and terminate;
- “No” -> chose a new point and go to 1.

We hope that the output of the method will** **at some point find a nearly optimal point (such that the value of the objective function in it differs from the problem value by no greater than ‘Information to date’ implies (1) the information from all of the previous oracle responses, and (2) the full information on

Definition (Oracle complexity of the problem class)

Oracle complexityof a problem class (with respect to the oracle ) iswhere is the number of queries to which a method commits to solve the problem instance up to the accuracy level

For a given class often the “reasonable” oracle is determined by it (and therefore we omit in the complexity notation, slightly confusing the notation). With fixed problem class and oracle, is, of course, a decreasing function of

The study of a class usually consists in providing the upper* *and lower bounds** **on its oracle complexity as a function of If they coincide up to a multiplicative constant (which may depend on every quantity here except for ), then is considered to be fully studied.

Upper and lower bounds are provided differently. An upper bound may be given by some *fixed method* for which it may be proved that the method performs not more than a given number of oracle calls, for a fixed for *any problem of the class*. A lower bound provides a *problem instance* from which takes not less than some specified number of oracle calls, for *any method to be used*. Besides that, bounds are proved with the *adversary* idea, namely, that a hard problem instance may be built *gradually* as the method calls the oracle (it is easier to play for the adversary since he makes the second step in this two-step game, that is why such bounds are lower bounds).

As previously noted, our interest will be in huge-dimensional problems meaning that the dimension of the problem class may be very large (from ten thousands to billions and more). Here the situation is as follows. One may often get upper and lower bounds on complexities with the same dependence on but with multiplicative constants growing with Whereas for many problem classes these constants grow dramatically fast (exponentially and even faster) with it occurs that for some of them (essentially when the first-order information about the function is disposable), one may come up with lower bounds are *independent* *of the problem dimension*. Given such lower bounds, the challenge is in proving *dimension-free upper bounds* by discovering the corresponding *([**almost]* *dimension-free)* *efficient methods.*

Мы уже отчаялись ждать, потеряли всякую надежду, а наша жизнь лишилась всякого смысла. И тут, свершилось, новый пост в Mimicking the Oracle!

Да-с.

Pingback: Subgradient Projection method | Look at the corners!