Subgaussian distributions

In this post we will discuss subgaussian distributions. In a nutshell, these are the distributions whose tails are at least as light as those of a Gaussian.

Let us start with a simple (centered) Gaussian random variable  X \sim \mathcal{N}(0, \sigma^2).  A well-known fact about it is that for any  x \ge 0,

\displaystyle P(X > x) \le \frac{1}{2}\, \exp \left(-\frac{x^2}{2\sigma^2}\right).

Probably the easiest way to see this is as follows. Consider a complex random variable  \zeta = \xi + \imath \eta  with i.i.d.  \xi, \eta \sim \mathcal{N}(0, 1)  (note that X \sim \sigma \xi),  and observe that  |\zeta|^2 \sim \chi_2^2 \sim \text{Exp}(1/2),  so that  P(|\zeta| > t) = e^{-t^2/2}.  The  1/2  multiplier is due to the symmetry of the normal density: the disjoint events  \{\xi > t\}  and  \{\xi < -t\}  are equally likely and both contained in  \{|\zeta| > t\},  whence  P(\xi > t) \le \frac{1}{2} e^{-t^2/2};  it remains to take  t = x/\sigma.  The bound is also tight in a certain sense (one may check it by estimates obtained from integrating the Gaussian density by parts).

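For readers who like to double-check such bounds numerically, here is a small sanity check (my own sketch, assuming NumPy and SciPy are available; norm.sf gives the exact Gaussian tail):

    import numpy as np
    from scipy.stats import norm

    sigma = 2.0
    for x in [0.5, 1.0, 2.0, 4.0]:
        exact = norm.sf(x, scale=sigma)                # P(X > x) for X ~ N(0, sigma^2)
        bound = 0.5 * np.exp(-x**2 / (2 * sigma**2))   # the tail bound above
        print(f"x = {x:3.1f}: exact tail {exact:.3e} <= bound {bound:.3e}")
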
Another, more general (albeit maybe less beautiful) way to obtain such an inequality (up to the factor 1/2) is the so-called Chernoff technique. The main observation is that

\displaystyle P(X > x) = P(e^{\lambda X} > e^{\lambda x}) \le \frac{\mathbb E\, e^{\lambda X}}{e^{\lambda x}}, 

where the last inequality (due to Markov's inequality) holds for any \lambda > 0  for which the expectation, called the  moment generating function (MGF),  exists. Since  \mathbb{E}e^{\lambda X} = e^{\sigma^2 \lambda^2/2}  for  X \sim \mathcal{N}(0,\sigma^2)  and any  \lambda \in \mathbb{R},  we can minimize the right-hand side over  \lambda  (the minimizer is  \lambda = x/\sigma^2),  obtaining  \displaystyle P(X > x) \le \exp (-x^2/2\sigma^2).  As we see, such behaviour of the MGF is the only thing needed to get a tail bound, which motivates the following


Definition

A random variable  X  with  \mathbb{E} X =\mu  is  \sigma^2-subgaussian if for any  \lambda \in \mathbb{R}

\log \displaystyle \mathbb{E}\, e^{\lambda (X-\mu)} \le \frac{\lambda^2\sigma^2 }{2}.


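As a quick concrete illustration of the definition (a standard example, not part of the argument above): a Rademacher random variable  \varepsilon  with  P(\varepsilon = \pm 1) = 1/2  is 1-subgaussian, since

\displaystyle \mathbb{E}\, e^{\lambda \varepsilon} = \cosh \lambda = \sum_{k \ge 0} \frac{\lambda^{2k}}{(2k)!} \le \sum_{k \ge 0} \frac{(\lambda^2/2)^k}{k!} = e^{\lambda^2/2},

where we used  (2k)! \ge 2^k\, k!.
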
The parameter  \sigma^2  is sometimes called the  variance proxy.  We have essentially just proved a bound on the tails of subgaussian distributions (we won't prove the second bound here; it follows from the isoperimetric inequality, see e.g. [Johnstone], p. 46):


Theorem 

If  X  is  \sigma^2-subgaussian, then

\displaystyle P(X - \mu > t) \le \exp \left(-\frac{t^2}{2 \sigma^2}\right);

\displaystyle P(X - \text{med}(X)> t) \le \frac{1}{2}\exp \left(-\frac{t^2}{2 \sigma^2}\right).


In other words, for any  \delta \in (0,1),  with probability at least  1-\delta  we have  X \le \mu + \sigma\sqrt{2\log(1/\delta)}  (or  X \le \text{med}(X) + \sigma\sqrt{2\log(1/\delta)},  pick the one you like the most). Another characterization of a subgaussian distribution, which I will give without proof, is that its  q-\text{th}  central absolute moment behaves as  O(\Gamma(q/2)),  i.e. roughly as  q^{q/2},  giving, for some absolute constant  C,


\displaystyle \left\|X - \mu \right\|_q = \left(\mathbb{E} \left| X - \mu \right|^q\right)^{\frac{1}{q}} \le C \sigma \sqrt{q}.

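To get a feel for this moment bound, here is a small numerical check in the Gaussian case (my own sketch, assuming NumPy and SciPy; the exact absolute moments of a standard Gaussian are  \mathbb{E}|X|^q = 2^{q/2}\Gamma((q+1)/2)/\sqrt{\pi}):

    import numpy as np
    from scipy.special import gamma

    rng = np.random.default_rng(0)
    samples = rng.standard_normal(1_000_000)    # X ~ N(0, 1), so sigma = 1

    for q in [2, 4, 8, 16]:
        mc = np.mean(np.abs(samples) ** q) ** (1 / q)   # Monte Carlo estimate of ||X - mu||_q
        exact = (2 ** (q / 2) * gamma((q + 1) / 2) / np.sqrt(np.pi)) ** (1 / q)
        print(f"q = {q:2d}: ||X||_q = {mc:.3f} (exact {exact:.3f}), sqrt(q) = {np.sqrt(q):.3f}")

For these values of  q  the ratio  \|X\|_q / \sqrt{q}  stays below 1, consistent with the claim (with  C = 1  for the standard Gaussian).
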

It turns out that any bounded random variable is subgaussian, with a variance proxy that can be bounded in a simple way.


Hoeffding’s lemma

Any  X \in [a, b]  is  \sigma^2-subgaussian with

\displaystyle \sigma^2 = \frac{(b-a)^2}{4}.


Proof.  Without loss of generality, let  \mu = 0  (otherwise consider  X - \mu  instead). Consider the cumulant generating function  \displaystyle L(\lambda) := \log \mathbb{E} e^{\lambda X}.  Since  L(0) = 0,  the Taylor expansion of  L(\lambda)  at  0  (with the Lagrange form of the remainder) is

\displaystyle L(\lambda) = \lambda L'(0) + \frac{\lambda^2}{2} L''(\lambda_-)

for some  \lambda_- \in [0,\lambda].  We have  L'(0) = \mathbb{E} X = 0.  One checks (the computation is spelled out after the proof) that  L''(\lambda) = \text{Var}[Z],  where  Z \in [a,b]  has distribution given by

\displaystyle Q_{\lambda}(dx) = \frac{e^{\lambda x} P(dx)}{\int_a^b e^{\lambda x} P(dx)},

where  P(dx)  is the distribution of  X.  Since  Z \in [a,b],

\displaystyle \text{Var}[Z] \le \max_{z \in [a, b]} \displaystyle\left[ z - \frac{a + b}{2} \right]^2 = \left(\frac{b-a}{2}\right)^2.

Noting that this holds for any  \lambda,  we obtain the claim. ♦
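For completeness, here is the computation behind  L''(\lambda) = \text{Var}[Z],  spelled out (a standard exponential-tilting identity):

\displaystyle L'(\lambda) = \frac{\mathbb{E}\, X e^{\lambda X}}{\mathbb{E}\, e^{\lambda X}} = \mathbb{E}_{Q_\lambda} Z, \qquad L''(\lambda) = \frac{\mathbb{E}\, X^2 e^{\lambda X}}{\mathbb{E}\, e^{\lambda X}} - \left( \frac{\mathbb{E}\, X e^{\lambda X}}{\mathbb{E}\, e^{\lambda X}} \right)^2 = \mathbb{E}_{Q_\lambda} Z^2 - \left( \mathbb{E}_{Q_\lambda} Z \right)^2 = \text{Var}[Z].
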

It is straightforward to see that independent subgaussian random variables admit a very simple algebra. Namely, if  X_1  and  X_2  are independent and subgaussian with parameters  \sigma^2_1  and  \sigma^2_2  respectively, then  X_1 + X_2  is subgaussian with parameter  \sigma^2_1+\sigma^2_2  (indeed, by independence the MGFs of the centered variables multiply). As a consequence, we have the following


Theorem  (Subgaussian concentration) 

For  X_1, \,\dotsc,\, X_n  independent, with  X_i  being  \sigma_i^2-subgaussian and  \mu_i := \mathbb{E} X_i,  it holds that

\displaystyle P\left( \sum_{i = 1}^n (X_i -\mu_i) \ge t \right) \le \exp \left(-\frac{t^2}{2 \sum_{i=1}^n \sigma^2_i} \right).

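For completeness, here is a one-line proof sketch (just combining the additivity observation with the Chernoff technique from the beginning of the post): for any  \lambda > 0,

\displaystyle P\left( \sum_{i = 1}^n (X_i -\mu_i) \ge t \right) \le e^{-\lambda t} \prod_{i=1}^n \mathbb{E}\, e^{\lambda (X_i - \mu_i)} \le \exp\left( -\lambda t + \frac{\lambda^2}{2} \sum_{i=1}^n \sigma^2_i \right),

and minimizing over  \lambda  (the minimizer is  \lambda = t / \sum_{i=1}^n \sigma^2_i)  gives the claim.
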

Its version for the case of bounded random variables (obtained by plugging in Hoeffding's lemma) is formulated below for completeness:


Theorem  (Hoeffding’s inequality) 

For  X_1, \,\dotsc,\, X_n  independent, with  X_i \in [a_i, b_i]  and  \mu_i := \mathbb{E} X_i,  it holds that

\displaystyle P\left( \sum_{i = 1}^n (X_i -\mu_i) \ge t \right) \le \exp \left(-\frac{2 t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).

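As a quick illustration (my own simulation, not from the post), one can compare the empirical deviation probability of a sum of i.i.d. uniform variables with the Hoeffding bound (assuming NumPy is available):

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, t = 50, 100_000, 6.0
    a, b = 0.0, 1.0

    # X_i ~ Uniform[a, b], so mu_i = (a + b)/2 and (b_i - a_i)^2 = (b - a)^2
    X = rng.uniform(a, b, size=(reps, n))
    deviations = X.sum(axis=1) - n * (a + b) / 2

    empirical = np.mean(deviations >= t)
    hoeffding = np.exp(-2 * t**2 / (n * (b - a) ** 2))
    print(f"empirical P = {empirical:.2e}, Hoeffding bound = {hoeffding:.2e}")

The empirical probability is well below the bound, which is expected: Hoeffding's inequality only uses the range of the  X_i,  not their actual variance.
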

Both of these bounds are typical concentration inequalities in the sense described in the previous post. Indeed, let  X_i  be i.i.d. with  \mu_i = \mu  and  \sigma^2_i = \sigma^2.  The sum of the  X_i  deviates from its expectation, which is  O(n),  only by  O(\sqrt{n}).  Put another way, dividing by  n,  the arithmetic mean has expectation  \mu = O(1)  but deviates from it only by  O(1/\sqrt{n}).

Finally, I state (the proof is quite technical, so I omit it here) what is probably the most general result about subgaussian distributions:


Theorem  (Lipschitz functions are subgaussian) 

Let  f: \mathbb{R}^n \to \mathbb{R}  be  L-Lipschitz with respect to  \| \cdot\|_2,  i.e.  \forall x,x' \in \mathbb{R}^n,

|f(x) - f(x')| \le L \left\|x-x'\right\|_2.

If  z \sim \mathcal{N}(0, I_n),  then  f(z)  is  L^2-subgaussian.


This theorem readily provides a lot of subgaussian distributions. Let us give several examples:

  1. \|z\|_p  with  2 \le p \le \infty  is 1-subgaussian (and so is  \max\limits_{i=1}^n z_i); see the simulation after this list.
  2. (z^T Q z)^{1/2}  with  Q \succeq 0  is  \|Q\|_\infty-subgaussian,  where  \|Q\|_\infty  is the operator norm of  Q  (the map  z \mapsto \|Q^{1/2} z\|_2  is  \|Q\|_\infty^{1/2}-Lipschitz);
  3. Let  Z \in \mathbb{R}^{n \times n}  have i.i.d. standard Gaussian entries. Then its Schatten norm  \|Z\|_p  is 1-subgaussian for any  p \ge 2,  including the Hilbert-Schmidt norm  \|Z\|_2   and, most importantly,  the operator norm  \|Z\|_\infty.  To see this, note that  \|A\|_p \le \|A\|_2  for  p \ge 2,  so  \|\cdot\|_p  is 1-Lipschitz with respect to the Hilbert-Schmidt norm, which is just the Euclidean norm of the  n^2  entries.
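Here is a small simulation for the first example (my own sketch, assuming NumPy): the Euclidean norm of a standard Gaussian vector indeed exhibits subgaussian deviations above its mean.

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 50, 100_000

    Z = rng.standard_normal(size=(reps, n))
    norms = np.linalg.norm(Z, axis=1)      # f(z) = ||z||_2 is 1-Lipschitz
    mu = norms.mean()

    for t in [0.5, 1.0, 2.0, 3.0]:
        empirical = np.mean(norms - mu >= t)
        bound = np.exp(-t**2 / 2)          # tail bound for a 1-subgaussian variable
        print(f"t = {t:3.1f}: empirical {empirical:.2e} <= bound {bound:.2e}")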