## Subgaussian distributions

In this post we will discuss subgaussian distributions. In a nutshell, these are the ones that have as light tails as the Gaussian distribution.

Let us start out with a simple (centered) Gaussian random variable  $X \sim \mathcal{N}(0, \sigma^2).$  The well-known fact about it is that for any  $x \ge 0,$

$\displaystyle P(X > x) \le \frac{1}{2}\, \exp \left(-\frac{x^2}{2\sigma^2}\right).$

Probably, the easiest way to see is as follows. Consider  a complex random variable  $\zeta = \xi + \imath \eta$  where i.i.d.  $\xi, \eta \sim \mathcal{N}(0, 1)$  (note that $X \sim \sigma \xi$),  and observe that  $|\zeta|^2 \sim \chi_2^2 \sim \text{Exp}(1/2);$  $1/2$  multiplier is due to the symmetry of the normal density. The bound is also tight in a certain sense (one may check it by estimates involving the integration of gaussian moments by parts).

Another, more general (albeit maybe less beautiful) way to get this inequality is the so-called Chernoff’s technique. The main observation is that

$\displaystyle P(X > x) = P(e^{\lambda X} > e^{\lambda x}) \le \frac{\mathbb E\, e^{\lambda X}}{e^{\lambda x}},$

where the last line (due to Markov’s inequality) is true for any $\lambda > 0$  for which the expectation, called the  moment generating function (MGF),  exists. Since for  $X \sim \mathcal{N}(0,\sigma^2),$  $\mathbb{E}e^{\lambda X} = e^{\sigma^2 \lambda^2/2}$  for any  $\lambda > 0,$  we can minimize the right-hand side in  $\lambda,$ obtaining  $\displaystyle P(X > x) \le \exp (-x^2/2\sigma^2).$  As we see, such behaviour of MGF is the only thing needed to get a tail bound, which motivates the following

Definition

A random variable  $X$  with  $\mathbb{E} X =\mu$ is  $\sigma^2$subgaussian, if  for any $\lambda \ge 0$

$\log \displaystyle \mathbb{E}\, e^{\lambda (X-\mu)} \le \frac{\lambda^2\sigma^2 }{2}.$

Parameter  $\sigma^2$  is sometimes called the  variance proxy.  We just proved a bound on the tails of subgaussian distributions (we won’t prove the second bound here, it follows from the isoperimetric inequality, see e.g. [Johnstone], p. 46):

Theorem

If  $X$  is  $\sigma^2$-subgaussian, then

$\displaystyle P(X - \mu > t) \le \exp \left(-\frac{t^2}{2 \sigma^2}\right);$

$\displaystyle P(X - \text{med}(X)> t) \le \frac{1}{2}\exp \left(-\frac{t^2}{2 \sigma^2}\right).$

In other words,  $X$  is upper-bounded either by  $\mu + \sigma$  or  $\text{med}(X) + \sigma$  with high probability (pick the one you like the most). Another characterization of a subgaussian distribution, which I will give without a proof, is that its  $q-\text{th}$ central absolute moment behaves as  $O(\Gamma(q/2)) = O(q^{q/2}),$  giving, for some absolute constant,

$\left\|X - \mu \right\|_q = \mathbb{E}^{\frac{1}{q}} \left| X - \mu \right|^q \le C \sigma \sqrt{q}.$

It turns out that a bounded random variable can be characterized as subgaussian in a simple way.

Hoeffding’s lemma

Any  $X \in [a, b]$  is  $\sigma^2$-subgaussian with

$\displaystyle \sigma^2 = \frac{(b-a)^2}{4}.$

Proof.  Without the loss of generality, let  $\mu = 0$  (otherwise consider  $X - \mu$  instead). Consider the  cumulant  $\displaystyle L(\lambda) := \log \mathbb{E} e^{\lambda (X - \mu)}.$  Since  $L(0) = 0,$  the Taylor expansion of  $L(\lambda)$  at  $0$  is

$\displaystyle L(\lambda) = \lambda L'(0) + \frac{\lambda^2}{2} L''(\lambda_-)$

for some  $\lambda_- \in [0,\lambda].$  We have  $L'(0) = \mathbb{E} X = 0.$  We check that  $L''(\lambda) = \text{Var}[Z],$  where  $Z \in [a,b]$  has density given by

$\displaystyle Q_{\lambda}(dx) = \frac{e^{\lambda x} P(dx)}{\int_a^b e^{\lambda x} P(dx)},$

where  $P(dx)$  is the density of  $X.$  Since  $Z \in [a,b],$

$\displaystyle \text{Var}[Z] \le \max_{z \in [a, b]} \displaystyle\left[ z - \frac{a + b}{2} \right]^2 = \left(\frac{b-a}{2}\right)^2.$

Noting that this holds for any  $\lambda,$  we obtain the claim. ♦

It is straightforward to see that independent subgaussian random variables admit a very simple algebra. Namely, if  $X_{1, 2}$  are independent subgaussians with parameters  $\sigma^2_{1,2}$ correspondingly, then  $X_1 + X_2$  is subgaussian with  $\sigma^2_1+\sigma^2_2.$  As a consequence, we have the following

Theorem  (Subgaussian concentration)

For  $X_1, \,\dotsc,\, X_n$  $\sigma^2$-subgaussian and independent, it holds

$\displaystyle P\left( \sum_{i = 1}^n X_i -\mu_i \ge t \right) \le \exp \left(-\frac{t^2}{2 \sum_{i=1}^n \sigma^2_i} \right).$

Its version corresponding to the case of bounded random variables is formulated below for completeness

Theorem  (Hoeffding’s inequality)

For  $X_1, \,\dotsc,\, X_n$  independent,  $X_i \in [a_i, b_i],$  it holds

$\displaystyle P\left( \sum_{i = 1}^n X_i -\mu_i \ge t \right) \le \exp \left(-\frac{2 t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$

Both of these bounds are typical concentration inequalities as they were described in the previous post. Indeed, let  $X_i$  be i.i.d. with  $\mu_i = \mu$  and  $\sigma^2_i = \sigma^2.$  The sum of  $X_i$ deviates from its expectation, which is  $O(n),$  only by  $O(\sqrt{n}).$  Putting it another way, we may normalize to  $n$  and get the arithmetical mean has the average of  $O(1)$  but deviates from it only by  $O(1/\sqrt{n}).$

Finally, I give (the proof is quite technical so I omit it here) probably the most general result about subgaussian distributions:

Theorem  (Lipschitz functions are subgaussian)

Let  $f: \mathbb{R}^n \to \mathbb{R}$  be  $L$-Lipschitz with respect to  $\| \cdot\|_2,$  i.e.  $\forall x,x' \in \mathbb{R}^n,$

$|f(x) - f(x')| \le L \left\|x-x'\right\|_2.$

If  $z \sim \mathcal{N}(0, I_n),$  then  $f(z)$  is  $L$-subgaussian.

This theorem readily provides a lot of subgaussian distributions. Let us give several examples:

1. $\|z\|_p$  with  $2 \le p \le \infty$  is 1-subgaussian (and hence  $\max\limits_{i=1}^n z_i$  as well).
2. $(z^T Q z)^{1/2}$  with  $Q \succeq 0$  is  $\|Q\|_\infty$-subgaussian,  where  $\|Q\|_\infty$  is the operator norm of  $Q;$
3. Let  $Z \in \mathbb{R}^{n \times n}$  have i.i.d. standard Gaussian entries. Then its Schatten norm  $\|Z\|_p$  is 1-subgaussian for any  $p \ge 2,$  including the Hilbert-Schmidt norm  $\|Z\|_2$   and, most importantly,  the operator norm  $\|Z\|_\infty.$  To see this, use rotational invariance of  $\|Z\|_2.$

## About Dmitry Ostrovsky

PhD student with interests in statistics, optimization, and machine learning.
This entry was posted in Course on Concentration Inequalities. Bookmark the permalink.