## McDiarmid’s inequality

In the previous post of the mini-course, we proved Azuma–Hoeffding inequality. We were able to relax the assumption that $X_i$  were independent but we were dealing only with their sum. Now we are to demonstrate that martingale method lets also to control more general functions $F$  (now of independent arguments). The idea is to apply Azuma–Hoeffding inequality to the generic random sequence called Doob martingale.

For $F: \mathbb{R}^n \to \mathbb{R},$  consider its “discrete partial derivatives” $\displaystyle D_i F (x) = \sup_{y} F(x_1, ..., x_{i-1}, y, x_{i+1}, ..., x_n) - \inf_{y} F(x_1,\, ...,\, x_{i-1},\, y,\, x_{i+1},\, ...,\, x_n).$

It occurs that if all of them are uniformly bounded, then $F$  is well-concentrated.

Theorem  (McDiarmid’s inequality)

Let $X = (X_i)_{i=1}^n$  be independent. For any $F: \mathbb{R}^n \to \mathbb{R}$  it holds $\displaystyle P\left(F(X_1,\, ...,\, X_n) - \mathbb{E} F(X_1,\, ...,\, X_n)\ge t\right) \le \exp\left(-\frac{2 t^2}{\sum_{i=1}^n \|D_i F\|^2_{\infty}}\right).$

In other words, $F$  is $\frac{1}{4}\sum_{i=1}^n \|D_i F\|^2_{\infty}$-subgaussian.

Proof.  Note that $F \left(X_1,\, ...,\, X_n \right) = \mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_n\right)$  which prompts the following expansion: $\displaystyle F(X_1,\, ..., \, X_n) - \mathbb{E} F(X_1,\, ..., \, X_n) = \sum_{i=1}^n \Delta_i,$ $\displaystyle \Delta_i = \mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_i\right) -\mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_{i-1}\right),$

It is trivially checked (with the properties of conditional expectation) that $\Delta_i$  is indeed an MDS with respect to the filtration $\mathcal{F}_i = \mathcal{F}(X_1,\, ...,\, X_i)$  (the corresponding martingale $Y_i = \mathbb{E} F\left(X_1,\, ...,\, X_n \,\middle|\, X_1,\, ...,\, X_i\right)$  is called Doob martingale).

Note that $\Delta_i$  is conditionally bounded: $a_i \le \Delta_i\le b_i$  for some non-random $a_i, b_i$  with $|a_i - b_i| \le \|D_i F\|_{\infty}.$  The application of Azuma–Hoeffding inequality yields the result. ♦

Remark 1.  McDiarmid’s inequality can be applied to control the deviations of an empirical process. Namely, let $\mathcal{F}$  be a class of bounded functions $f: \mathcal{X} \to [-1,1],$  and define $\displaystyle F(X_1, ..., X_n) = \max_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n f(X_i).$

Noting that $F$ has differences bounded by 1/n, we obtain that with probability $\delta,$ $\displaystyle F - \mathbb{E} F \le \sqrt{\frac{2\log\left(1/\delta\right)}{n}}.$

This result is often referred to as McDiarmid’s inequality in the machine learning community (since this guys most often need such formulated $F$  only). It is in fact the same bound as one gets for just one sum (Hoeffding’s inequality!), when $\mathcal{F}$  is a singleton.

Remark 2.  McDiarmid’s inequality, just as Hoeffding’s inequality, may be sharpened by taking into account the variance. The resulting inequality is called Talagrand’s inequality; we state here its empirical process version. Under the same assumptions as in Remark 1, with probability at least $\delta,$ $\displaystyle F - \mathbb{E} F \le \sqrt{\frac{2 v_\mathcal{F} \log(1/\delta)}{n}} + \frac{2\log(1/\delta)}{3n},$

where $v_\mathcal{F} = \sup_{f \in \mathcal{F}} \text{Var}\left[f(X_1)\right] + 2\,\mathbb{E} F.$ 