In this post we will discuss subgaussian distributions. In a nutshell, these are the distributions whose tails are at least as light as Gaussian tails.
Let us start out with a simple (centered) Gaussian random variable $X \sim \mathcal{N}(0, \sigma^2)$. The well-known fact about it is that for any $t > 0$,

$$\Pr(X \ge t) \le \tfrac{1}{2} e^{-t^2/(2\sigma^2)}.$$
Probably, the easiest way to see this is as follows. Consider a complex random variable $Z = X_1 + i X_2$, where $X_1, X_2 \sim \mathcal{N}(0, \sigma^2)$ i.i.d. (note that $\Pr(|Z|^2 \ge s) = e^{-s/(2\sigma^2)}$), and observe that

$$\Pr(X \ge t)^2 = \Pr(X_1 \ge t,\; X_2 \ge t) = \tfrac{1}{4}\Pr(|X_1| \ge t,\; |X_2| \ge t) \le \tfrac{1}{4}\Pr\left(|Z|^2 \ge 2t^2\right) = \tfrac{1}{4} e^{-t^2/\sigma^2};$$

the multiplier $\tfrac{1}{4}$ is due to the symmetry of the normal density. The bound is also tight in a certain sense (one may check it by estimates involving the integration of Gaussian moments by parts).
Another, more general (albeit maybe less beautiful) way to get this inequality is the so-called Chernoff technique. The main observation is that for any $\lambda > 0$,

$$\Pr(X \ge t) = \Pr\left(e^{\lambda X} \ge e^{\lambda t}\right) \le e^{-\lambda t}\, \mathbb{E} e^{\lambda X},$$

where the last line (due to Markov's inequality) is true for any $\lambda$ for which the expectation $\mathbb{E} e^{\lambda X}$, called the moment generating function (MGF), exists. Since for $X \sim \mathcal{N}(0, \sigma^2)$ we have $\mathbb{E} e^{\lambda X} = e^{\lambda^2 \sigma^2/2}$ for any $\lambda$, we can minimize the right-hand side in $\lambda$ (the minimum is attained at $\lambda = t/\sigma^2$), obtaining

$$\Pr(X \ge t) \le e^{-t^2/(2\sigma^2)}.$$

As we see, such behaviour of the MGF is the only thing needed to get a tail bound, which motivates the following definition.
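As a quick numerical sanity check (a sketch, not part of the original argument), we can compare the exact standard-Gaussian tail, available through the complementary error function, with the Chernoff bound $e^{-t^2/(2\sigma^2)}$ for $\sigma = 1$:

```python
import math

# Exact standard-Gaussian tail: P(X >= t) = erfc(t / sqrt(2)) / 2.
# Chernoff bound from the MGF argument: exp(-t^2 / 2) for sigma = 1.
for t in [1.0, 2.0, 3.0]:
    exact = 0.5 * math.erfc(t / math.sqrt(2))
    chernoff = math.exp(-t * t / 2)
    assert exact <= chernoff  # the bound always dominates the true tail
    print(f"t={t}: exact tail {exact:.5f} <= Chernoff bound {chernoff:.5f}")
```

The bound is loose by a polynomial factor in $t$, but captures the correct exponential rate.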
A random variable $X$ with $\mathbb{E} X = 0$ is called $\sigma^2$-subgaussian if for any $\lambda \in \mathbb{R}$,

$$\mathbb{E} e^{\lambda X} \le e^{\lambda^2 \sigma^2/2}.$$
The parameter $\sigma^2$ is sometimes called the variance proxy. We just proved a bound on the tails of subgaussian distributions (we won't prove the second bound here; it follows from the isoperimetric inequality, see e.g. [Johnstone], p. 46):
If $X$ is $\sigma^2$-subgaussian, then for any $t > 0$,

$$\Pr(X \ge t) \le e^{-t^2/(2\sigma^2)} \quad \text{and} \quad \Pr(|X| \ge t) \le 2 e^{-t^2/(2\sigma^2)}.$$
In other words, with probability at least $1 - \delta$, $X$ is upper-bounded either by $\sigma\sqrt{2\log(1/\delta)}$ or, in absolute value, by $\sigma\sqrt{2\log(2/\delta)}$ (pick the one you like the most). Another characterization of a subgaussian distribution, which I will give without a proof, is that its central absolute moments behave as those of a Gaussian, giving, for some absolute constant $C$,

$$\left(\mathbb{E}|X|^p\right)^{1/p} \le C \sigma \sqrt{p}, \qquad p \ge 1.$$
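To illustrate the $\sqrt{p}$ moment growth (a sketch using the closed form $\mathbb{E}|X|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$ for the standard Gaussian), one can check numerically that $(\mathbb{E}|X|^p)^{1/p}/\sqrt{p}$ stays bounded:

```python
import math

# For X ~ N(0,1): E|X|^p = 2^(p/2) * Gamma((p+1)/2) / sqrt(pi).
# The ratio (E|X|^p)^(1/p) / sqrt(p) should stay bounded as p grows
# (it tends to 1/sqrt(e) ~ 0.607).
for p in [2, 4, 8, 16, 32]:
    moment = (2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)) ** (1 / p)
    ratio = moment / math.sqrt(p)
    assert 0.3 < ratio < 1.0  # bounded above and below by absolute constants
    print(f"p={p}: (E|X|^p)^(1/p) = {moment:.3f}, ratio to sqrt(p) = {ratio:.3f}")
```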
It turns out that any bounded random variable is subgaussian, with an explicit variance proxy.
Any $X \in [a, b]$ almost surely with $\mathbb{E} X = 0$ is $\sigma^2$-subgaussian with $\sigma^2 = \frac{(b-a)^2}{4}$.
Proof. Without loss of generality, let $\mathbb{E} X = 0$ (otherwise consider $X - \mathbb{E}X$ instead). Consider the cumulant $\psi(\lambda) = \log \mathbb{E} e^{\lambda X}$. Since $\psi(0) = \psi'(0) = 0$, the Taylor expansion of $\psi$ at $0$ is

$$\psi(\lambda) = \frac{\lambda^2}{2}\, \psi''(\theta)$$

for some $\theta \in [0, \lambda]$. We have

$$\psi''(\theta) = \frac{\mathbb{E}\left[X^2 e^{\theta X}\right]}{\mathbb{E} e^{\theta X}} - \left(\frac{\mathbb{E}\left[X e^{\theta X}\right]}{\mathbb{E} e^{\theta X}}\right)^2.$$

We check that $\psi''(\theta) = \operatorname{Var}(X_\theta)$, where $X_\theta$ has density given by

$$f_\theta(x) = \frac{e^{\theta x}}{\mathbb{E} e^{\theta X}}\, f(x),$$

where $f$ is the density of $X$. Since $X_\theta \in [a, b]$ almost surely,

$$\operatorname{Var}(X_\theta) \le \mathbb{E}\left(X_\theta - \frac{a+b}{2}\right)^2 \le \left(\frac{b-a}{2}\right)^2,$$

so that $\psi(\lambda) \le \frac{\lambda^2}{2} \cdot \frac{(b-a)^2}{4}$. Noting that this holds for any $\lambda$, we obtain the claim. ♦
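For a concrete instance of the lemma, take a Rademacher variable ($\pm 1$ with equal probability, so $a = -1$, $b = 1$): its cumulant is $\log\cosh\lambda$, and the lemma claims $\log\cosh\lambda \le \lambda^2/2$. A minimal numerical check:

```python
import math

# Hoeffding's lemma for a Rademacher variable X in {-1, +1} (a = -1, b = 1):
# psi(lam) = log E[e^{lam * X}] = log(cosh(lam)) <= lam^2 * (b - a)^2 / 8 = lam^2 / 2.
for lam in [0.1, 0.5, 1.0, 2.0, 5.0]:
    psi = math.log(math.cosh(lam))
    bound = lam * lam / 2
    assert psi <= bound  # the cumulant never exceeds the quadratic bound
    print(f"lambda={lam}: psi={psi:.4f} <= bound={bound:.4f}")
```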
It is straightforward to see that independent subgaussian random variables admit a very simple algebra. Namely, if $X_1, \dots, X_n$ are independent subgaussians with parameters $\sigma_1^2, \dots, \sigma_n^2$ correspondingly, then $X_1 + \dots + X_n$ is subgaussian with $\sigma^2 = \sigma_1^2 + \dots + \sigma_n^2$. As a consequence, we have the following
Theorem (Subgaussian concentration)
For $X_1, \dots, X_n$ $\sigma_i^2$-subgaussian and independent, it holds for any $t > 0$ that

$$\Pr\left(\sum_{i=1}^n X_i \ge t\right) \le \exp\left(-\frac{t^2}{2\sum_{i=1}^n \sigma_i^2}\right).$$
Its version corresponding to the case of bounded random variables is formulated below for completeness.
Theorem (Hoeffding’s inequality)
For $X_1, \dots, X_n$ independent with $X_i \in [a_i, b_i]$ almost surely, it holds for any $t > 0$ that

$$\Pr\left(\sum_{i=1}^n \left(X_i - \mathbb{E} X_i\right) \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
Both of these bounds are typical concentration inequalities as they were described in the previous post. Indeed, let $X_1, \dots, X_n$ be i.i.d. with $\mathbb{E} X_i = \mu$ and $X_i - \mu$ being $\sigma^2$-subgaussian. The sum $\sum_{i=1}^n X_i$ deviates from its expectation, which is $n\mu$, only by $O(\sigma\sqrt{n})$. Putting it another way, we may normalize to $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ and get that the arithmetical mean has the average of $\mu$ but deviates from it only by $O(\sigma/\sqrt{n})$.
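A quick Monte-Carlo illustration (a sketch; the sample size $n$, threshold $t$, and trial count below are arbitrary choices, not from the text): for a sum of $n$ Rademacher variables, each 1-subgaussian, the empirical tail should sit below the subgaussian bound $e^{-t^2/(2n)}$:

```python
import math
import random

# Monte-Carlo check of the subgaussian concentration bound for Rademacher sums:
# each X_i is 1-subgaussian, so P(sum >= t) <= exp(-t^2 / (2n)).
random.seed(0)
n, trials = 100, 20000
t = 2.0 * math.sqrt(n)  # a deviation of order sqrt(n)
hits = 0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    if s >= t:
        hits += 1
empirical = hits / trials
bound = math.exp(-t * t / (2 * n))  # = exp(-2)
assert empirical <= bound
print(f"empirical tail {empirical:.4f} <= subgaussian bound {bound:.4f}")
```

The empirical tail (close to the true Gaussian tail at two standard deviations, about $0.02$) is comfortably below the bound $e^{-2} \approx 0.135$.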
Finally, I state (the proof is quite technical, so I omit it here) probably the most general result about subgaussian distributions:
Theorem (Lipschitz functions are subgaussian)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be $L$-Lipschitz with respect to the Euclidean norm, i.e.

$$|f(x) - f(y)| \le L \|x - y\|_2 \quad \text{for all } x, y \in \mathbb{R}^n.$$
If $X \sim \mathcal{N}(0, I_n)$, then $f(X) - \mathbb{E} f(X)$ is $L^2$-subgaussian.
This theorem readily provides a lot of subgaussian distributions. Let us give several examples:
- $\|X\|_2 - \mathbb{E}\|X\|_2$ with $X \sim \mathcal{N}(0, I_n)$ is 1-subgaussian (and hence $\|X\|_\infty - \mathbb{E}\|X\|_\infty$ as well, since $\|\cdot\|_\infty$ is also 1-Lipschitz with respect to the Euclidean norm).
- $\|AX\|_2 - \mathbb{E}\|AX\|_2$ with $X \sim \mathcal{N}(0, I_n)$ is $\|A\|^2$-subgaussian, where $\|A\|$ is the operator norm of $A$.
- Let $W \in \mathbb{R}^{n \times n}$ have i.i.d. standard Gaussian entries. Then its centered Schatten norm $\|W\|_{S_p} - \mathbb{E}\|W\|_{S_p}$ is 1-subgaussian for any $p \ge 2$, including the Hilbert–Schmidt norm ($p = 2$) and, most importantly, the operator norm ($p = \infty$). To see this, use rotational invariance of the Gaussian distribution.
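To see this kind of concentration in action, here is a small simulation (a sketch with arbitrary sample sizes) of the 1-Lipschitz function $f(x) = \|x\|_2$ from the first example: the fluctuation of $\|X\|_2$ around its mean stays of order 1 no matter how large $n$ is, even though the mean itself grows like $\sqrt{n}$:

```python
import math
import random

# Concentration of the 1-Lipschitz function f(x) = ||x||_2 of a standard
# Gaussian vector: the standard deviation of ||X||_2 stays O(1) as n grows,
# while E||X||_2 grows like sqrt(n).
random.seed(1)
for n in [10, 100, 1000]:
    norms = []
    for _ in range(1000):
        x = [random.gauss(0, 1) for _ in range(n)]
        norms.append(math.sqrt(sum(v * v for v in x)))
    mean = sum(norms) / len(norms)
    std = math.sqrt(sum((v - mean) ** 2 for v in norms) / len(norms))
    assert std < 1.5  # variance proxy is 1, so fluctuations are O(1)
    print(f"n={n}: E||X|| ~ {mean:.2f} (sqrt(n)={math.sqrt(n):.2f}), std ~ {std:.2f}")
```

The printed standard deviation hovers around $0.7$ for every $n$, exactly the dimension-free behaviour the theorem promises.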