McDiarmid’s inequality

In the previous post of the mini-course, we proved the Azuma–Hoeffding inequality. There we were able to relax the assumption that the  X_i  are independent, but we were still dealing only with their sum. Now we demonstrate that the martingale method also allows us to control more general functions  F  (this time of independent arguments). The idea is to apply the Azuma–Hoeffding inequality to a generic random sequence called the Doob martingale.

For  F: \mathbb{R}^n \to \mathbb{R},  consider its “discrete partial derivatives”

\displaystyle D_i F (x) = \sup_{y} F(x_1, ..., x_{i-1}, y, x_{i+1}, ..., x_n) - \inf_{y} F(x_1,\, ...,\, x_{i-1},\, y,\, x_{i+1},\, ...,\, x_n).

It turns out that if all of them are uniformly bounded, then  F  is well concentrated.
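For example, take  F  to be the sample mean on the cube  [a, b]^n  (so that the supremum and infimum in the definition run over  y \in [a, b]):

\displaystyle F(x_1,\, ...,\, x_n) = \frac{1}{n} \sum_{i=1}^n x_i, \qquad D_i F(x) = \frac{b - a}{n} \quad \text{for all } i \text{ and } x,

so the discrete partial derivatives are uniformly small when  n  is large.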


Theorem  (McDiarmid’s inequality)

Let  X = (X_i)_{i=1}^n  be a sequence of independent random variables. Then for any  F: \mathbb{R}^n \to \mathbb{R}  it holds that

\displaystyle P\left(F(X_1,\, ...,\, X_n) - \mathbb{E} F(X_1,\, ...,\, X_n)\ge t\right) \le \exp\left(-\frac{2 t^2}{\sum_{i=1}^n \|D_i F\|^2_{\infty}}\right).

In other words,  F  is  \frac{1}{4}\sum_{i=1}^n \|D_i F\|^2_{\infty}-subgaussian.
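As a quick numerical sanity check (a minimal sketch; the choice of  F,  the distribution of the  X_i,  and all numbers below are my own illustrative choices), one can compare the empirical tail with the bound in the simplest case where  F  is the sample mean of uniform variables, so that  \|D_i F\|_{\infty} = 1/n:

# Sanity check: F = mean of n i.i.d. Uniform[0,1] variables, so each ||D_i F||_inf = 1/n
# and McDiarmid's bound reads exp(-2 n t^2).
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 100, 50_000, 0.05

X = rng.uniform(size=(trials, n))
F = X.mean(axis=1)                        # F(X_1, ..., X_n) for each trial
empirical_tail = np.mean(F - 0.5 >= t)    # estimates P(F - E F >= t); here E F = 1/2
mcdiarmid_bound = np.exp(-2 * n * t**2)   # exp(-2 t^2 / sum_i ||D_i F||_inf^2)

print(f"empirical tail : {empirical_tail:.4f}")
print(f"McDiarmid bound: {mcdiarmid_bound:.4f}")

The bound is of course loose for this particular  F,  but it applies to any  F  with the same bounded differences.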


Proof.  Note that  F \left(X_1,\, ...,\, X_n \right) = \mathbb{E}\left[ F\left(X_1,\, ...,\, X_n\right) \,\middle|\, X_1,\, ...,\, X_n\right],  which suggests the following telescoping expansion:

\displaystyle F(X_1,\, ..., \, X_n) - \mathbb{E} F(X_1,\, ..., \, X_n) = \sum_{i=1}^n \Delta_i,

\displaystyle \Delta_i = \mathbb{E}\left[ F\left(X_1,\, ...,\, X_n\right) \,\middle|\, X_1,\, ...,\, X_i\right] - \mathbb{E}\left[ F\left(X_1,\, ...,\, X_n\right) \,\middle|\, X_1,\, ...,\, X_{i-1}\right],

where for  i = 1  the conditioning on the empty set of variables is understood as the unconditional expectation  \mathbb{E} F(X_1,\, ...,\, X_n),  so that the sum telescopes.

It is routinely checked (using the tower property of conditional expectation) that  (\Delta_i)_{i=1}^n  is indeed a martingale difference sequence (MDS) with respect to the filtration  \mathcal{F}_i = \sigma(X_1,\, ...,\, X_i)  (the corresponding martingale  Y_i = \mathbb{E}\left[ F\left(X_1,\, ...,\, X_n\right) \,\middle|\, X_1,\, ...,\, X_i\right]  is called the Doob martingale).

Note that  \Delta_i  is conditionally bounded: given  X_1,\, ...,\, X_{i-1},  the subtracted term is fixed, while  \mathbb{E}\left[ F\left(X_1,\, ...,\, X_n\right) \,\middle|\, X_1,\, ...,\, X_i\right]  varies with  X_i  within a range of at most  \|D_i F\|_{\infty}.  Hence  a_i \le \Delta_i \le b_i  for some  \mathcal{F}_{i-1}-measurable  a_i, b_i  with  b_i - a_i \le \|D_i F\|_{\infty}.  The application of the Azuma–Hoeffding inequality then yields the result. ♦
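To make the construction concrete, here is a tiny exact-enumeration check (a toy example of my own: three fair coin flips and  F(x) = \max(x_1, x_2, x_3),  for which every  \|D_i F\|_{\infty} \le 1):

# Toy check of the Doob martingale Y_i = E[F | X_1, ..., X_i] for three fair coin flips,
# with F(x) = max(x_1, x_2, x_3), so each ||D_i F||_inf <= 1.
import itertools
import numpy as np

def F(x):
    return max(x)

def Y(prefix):
    # E[F | X_1..X_k = prefix], averaging over the remaining independent fair coins
    rest = 3 - len(prefix)
    return np.mean([F(prefix + tail) for tail in itertools.product((0, 1), repeat=rest)])

for i in range(1, 4):
    for prefix in itertools.product((0, 1), repeat=i - 1):
        # Martingale property: averaging Y_i over X_i recovers Y_{i-1}
        assert abs(0.5 * (Y(prefix + (0,)) + Y(prefix + (1,))) - Y(prefix)) < 1e-12
        # Conditional boundedness: given the prefix, Delta_i ranges in an interval of length <= ||D_i F||_inf
        assert abs(Y(prefix + (1,)) - Y(prefix + (0,))) <= 1.0
print("Doob martingale checks passed")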

Remark 1.  McDiarmid’s inequality can be applied to control the deviations of an empirical process. Namely, let the  X_i  take values in a set  \mathcal{X},  let  \mathcal{F}  be a class of functions  f: \mathcal{X} \to [-1,1],  and define

\displaystyle F(X_1, ..., X_n) = \max_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n f(X_i).

Noting that  F  has bounded differences  \|D_i F\|_{\infty} \le 2/n  (changing a single  X_i  affects only one summand  f(X_i)/n,  and  |f| \le 1),  we obtain that with probability at least  1-\delta,

\displaystyle F - \mathbb{E} F \le \sqrt{\frac{2\log\left(1/\delta\right)}{n}}.

This result is often referred to simply as McDiarmid’s inequality in the machine learning community, since functions  F  of exactly this form are the ones most often needed there. Note that when  \mathcal{F}  is a singleton, it is the same bound as one gets for a single sum via Hoeffding’s inequality.
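To see Remark 1 in action, here is a small sketch with a finite class of my own choosing (five cosine functions, all mapping into  [-1,1],  applied to uniform  X_i);  none of these choices come from the remark itself:

# Finite class {f_k(x) = cos(k x), k = 1..5} on X_i ~ Uniform[0, 2*pi]; every f_k maps into [-1, 1].
import numpy as np

rng = np.random.default_rng(1)
n, trials, delta = 200, 10_000, 0.05
ks = np.arange(1, 6)

X = rng.uniform(0.0, 2 * np.pi, size=(trials, n))
# F(X_1, ..., X_n) = max_k (1/n) sum_i cos(k X_i), computed for each of the trials
F = np.cos(ks[None, None, :] * X[:, :, None]).mean(axis=1).max(axis=1)

bound = np.sqrt(2 * np.log(1 / delta) / n)      # the deviation bound from Remark 1
exceed_freq = np.mean(F - F.mean() >= bound)    # F.mean() over trials approximates E F
print(f"deviation bound      : {bound:.3f}")
print(f"fraction exceeding it: {exceed_freq:.4f}  (should be at most {delta})")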

Remark 2.  McDiarmid’s inequality, just as Hoeffding’s inequality, may be sharpened by taking the variance into account. The resulting inequality is called Talagrand’s inequality; we state here its empirical-process version. Under the same assumptions as in Remark 1, with probability at least  1-\delta,

\displaystyle F - \mathbb{E} F \le \sqrt{\frac{2 v_\mathcal{F} \log(1/\delta)}{n}} + \frac{2\log(1/\delta)}{3n},

where  v_\mathcal{F} = \sup_{f \in \mathcal{F}} \text{Var}\left[f(X_1)\right] + 2\,\mathbb{E} F.
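For a rough sense of the gain, one can compare the two deviation bounds numerically when  v_\mathcal{F}  is small (the numbers below are arbitrary illustrative choices):

# Compare the McDiarmid and Talagrand deviation bounds for a small variance proxy v_F.
import numpy as np

n, delta, v_F = 1000, 0.05, 0.05
log_term = np.log(1 / delta)
mcdiarmid = np.sqrt(2 * log_term / n)
talagrand = np.sqrt(2 * v_F * log_term / n) + 2 * log_term / (3 * n)
print(f"McDiarmid bound: {mcdiarmid:.4f}")   # ~ 0.077
print(f"Talagrand bound: {talagrand:.4f}")   # ~ 0.019, much smaller when v_F is small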
