Intuition behind concentration inequalities

In the mini-course to come, we will go through lots of concentration inequalities. There is a unite way to think about them which I will try to explain in this post.

Consider a sequence of i.i.d. random variables $X_i \sim \mathcal{N}(\mu, \sigma^2),$ $i = \overline{1, n}.$  Consider their empirical mean $\displaystyle F(X_1,\, \dotsc,\, X_n) = \frac{1}{n}\sum_{i = 1}^n X_i.$

The interesting point about it is that for any $n,$ $\mathbf{E} F(X_1,\, \dotsc,\, X_n) = \mu$  (so, in a sense, the scaling $\frac{1}{n}$ is natural), but $F(X_1,\, \dotsc,\, X_n)$  concentrates above its mean: $\displaystyle \text{Var}\, F(X_1,\, \dotsc,\, X_n) \le \frac{\sigma^2}{n}$

In fact, as we will see in the next post, $\displaystyle P\left(F(X_1,\, \dotsc,\, X_n) - \mu \ge x \right) \le \exp\left(-\frac{n x}{2\sigma^2}\right),$

i.e. $\displaystyle F(X_1,\, \dotsc,\, X_n) - \mu \le \sigma^2/n$  with exponentially high probability.

This is the simplest example of the  concentration of measure  phenomenon,  when a function which is weakly dependent on many independent random variables concentrates around its mean or median.  A collateral goal of the course is to observe similar properties in different settings: with broader classes of functions $F$  (Lipschitz functions, norms, quadratic forms, etc.), distributions of $X_i,$  and mutual dependency in the sequence $(X_1,\, \dotsc,\, X_n),$  e.g. martingales. 