McDiarmid’s inequality

In the previous post of the mini-course, we proved Azuma–Hoeffding inequality. We were able to relax the assumption that  X_i  were independent but we were dealing only with their sum. Now we are to demonstrate that martingale method lets also to control more general functions  F  (now of independent arguments). The idea is to apply Azuma–Hoeffding inequality to the generic random sequence called Doob martingale.

For  F: \mathbb{R}^n \to \mathbb{R},  consider its “discrete partial derivatives”

\displaystyle D_i F (x) = \sup_{y} F(x_1, ..., x_{i-1}, y, x_{i+1}, ..., x_n) - \inf_{y} F(x_1,\, ...,\, x_{i-1},\, y,\, x_{i+1},\, ...,\, x_n).

It occurs that if all of them are uniformly bounded, then  F  is well-concentrated.

Theorem  (McDiarmid’s inequality)

Let  X = (X_i)_{i=1}^n  be independent. For any  F: \mathbb{R}^n \to \mathbb{R}  it holds

\displaystyle P\left(F(X_1,\, ...,\, X_n) - \mathbb{E} F(X_1,\, ...,\, X_n)\ge t\right) \le \exp\left(-\frac{2 t^2}{\sum_{i=1}^n \|D_i F\|^2_{\infty}}\right).

In other words,  F  is  \frac{1}{4}\sum_{i=1}^n \|D_i F\|^2_{\infty}-subgaussian.

Proof.  Note that  F \left(X_1,\, ...,\, X_n \right) = \mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_n\right)  which prompts the following expansion:

\displaystyle F(X_1,\, ..., \, X_n) - \mathbb{E} F(X_1,\, ..., \, X_n) = \sum_{i=1}^n \Delta_i,

\displaystyle \Delta_i = \mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_i\right) -\mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_{i-1}\right),

It is trivially checked (with the properties of conditional expectation) that  \Delta_i  is indeed an MDS with respect to the filtration  \mathcal{F}_i = \mathcal{F}(X_1,\, ...,\, X_i)  (the corresponding martingale  Y_i = \mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_i\right)  is called Doob martingale).

Note that  \Delta_i  is conditionally bounded:  a_i \le \Delta_i\le b_i  for some non-random  a_i, b_i  with  |a_i - b_i| \le \|D_i F\|_{\infty}.  The application of Azuma–Hoeffding inequality yields the result. ♦

Remark 1.  McDiarmid’s inequality can be applied to control the deviations of an empirical process. Namely, let  \mathcal{F}  be a class of bounded functions  f: \mathcal{X} \to [-1,1],  and define

\displaystyle F(X_1, ..., X_n) = \max_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n f(X_i).

Noting that F has differences bounded by 1/n, we obtain that with probability \delta,

\displaystyle F - \mathbb{E} F \le \sqrt{\frac{2\log\left(1/\delta\right)}{n}}.

This result is often referred to as McDiarmid’s inequality in the machine learning community (since this guys most often need such formulated  F  only). It is in fact the same bound as one gets for just one sum (Hoeffding’s inequality!), when  \mathcal{F}  is a singleton.

Remark 2.  McDiarmid’s inequality, just as Hoeffding’s inequality, may be sharpened by taking into account the variance. The resulting inequality is called Talagrand’s inequality; we state here its empirical process version. Under the same assumptions as in Remark 1, with probability at least  \delta,

\displaystyle F - \mathbb{E} F \le \sqrt{\frac{2 v_\mathcal{F} \log(1/\delta)}{n}} + \frac{2\log(1/\delta)}{3n},

where  v_\mathcal{F} = \sup_{f \in \mathcal{F}} \text{Var}\left[f(X_1)\right] + 2\,\mathbb{E} F.


About Dmitry Ostrovsky

PhD student with interests in learning theory, statistics, and optimization, working with Anatoli Juditsky and Zaid Harchaoui.
This entry was posted in Course on Concentration Inequalities and tagged . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s