In this post we move on from subgaussian distributions to another important class of distributions called subexponential. The simplest and most common example of such distributions is chi-square. As usual, we are interested in tail probability bounds for such distributions, and the way to obtain them will be Chernoff’s technique.
We start by noting that $\chi^2_n$ is distributed as $\sum_{i=1}^n Z_i^2$, where $Z_1, \ldots, Z_n \sim \mathcal{N}(0,1)$ are i.i.d. Since we already know from one of the previous posts that $Z$ is $1$-subgaussian, so that $\mathbf{P}(Z^2 \ge t) \le 2e^{-t/2}$, the union bound gives for $t \ge 0$:

$$\mathbf{P}\Big(\max_{i \le n} Z_i^2 \ge t\Big) \le n\,\mathbf{P}(Z^2 \ge t) \le 2n e^{-t/2}. \tag{1}$$

Now, the observation that $\chi^2_n$ is distributed as the sum of $n$ i.i.d. copies of $Z^2$ suggests a naive union bound: if $\sum_{i=1}^n Z_i^2 \ge t$, then $\max_{i \le n} Z_i^2 \ge t/n$, whence

$$\mathbf{P}(\chi^2_n \ge t) \le 2n e^{-t/(2n)}.$$

The typical value of $\chi^2_n$ according to it is $\asymp n \log n$, which is completely wrong. Indeed, the independence of the different terms in $\sum_{i=1}^n Z_i^2$ was not used: the naive bound holds just as well for the intuitively ‘worst concentrated’ counterpart $n Z_1^2$ (all terms equal), whose typical value is in fact $\Theta(n)$. The naive bound performs even worse for $\chi^2_n$, since $\frac{1}{n}\chi^2_n$, being the empirical average of i.i.d. random variables with unit expectation each, should concentrate as $1 + O(n^{-1/2})$ w.h.p. (see post “Intuition behind concentration inequalities”), and therefore w.h.p. $\chi^2_n = n + O(\sqrt{n})$, which is much smaller than $n \log n$.
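To see the gap numerically, here is a quick Monte Carlo sketch (NumPy; the dimension $n$ and the number of trials are arbitrary choices, not from the text): it samples $\chi^2_n$ and compares its typical deviation from $n$ with the $n\log n$ scale suggested by the naive bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 2000

# chi^2_n realized as a sum of n squared standard normals
chi2 = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)

dev = np.abs(chi2 - n)                 # deviation from the mean E chi^2_n = n
print("typical deviation from n:", np.median(dev))   # order sqrt(n): a few tens here
print("largest deviation seen  :", dev.max())
print("naive union-bound scale :", n * np.log(n))    # far above anything we observe
```

Even the largest deviation over all trials stays on the $\sqrt{n}$ scale, orders of magnitude below $n\log n$.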
The looseness of the union bound is due to the fact that it does not exploit the independence of different terms inside $\sum_{i=1}^n Z_i^2$. The natural way to capture this independence is the so-called Chernoff’s technique. Instead of $\chi^2_n$, we will consider the “generalized $\chi^2$”:

$$S = \sum_{i=1}^n a_i Z_i^2,$$

with fixed weights $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$, not even requiring that $a_i \ge 0$. We have

$$\mathbf{E}\, e^{\lambda (S - \mathbf{E}S)} = \prod_{i=1}^n \mathbf{E}\, e^{\lambda a_i (Z_i^2 - 1)} = \prod_{i=1}^n \frac{e^{-\lambda a_i}}{\sqrt{1 - 2\lambda a_i}},$$

but, unlike the subgaussian case, this formula holds not for all $\lambda \in \mathbb{R}$: we need $2\lambda a_i < 1$ for each $i$. Indeed, for $\lambda a_i \ge \frac{1}{2}$ the corresponding factor is infinite.
Now we use the well-known bound

$$-\log(1 - x) \le x + x^2, \quad |x| \le \tfrac{1}{2},$$

noticing (with a bit of simple algebra, taking $x = 2u$) that

$$\frac{e^{-u}}{\sqrt{1 - 2u}} = \exp\Big(-u - \tfrac{1}{2}\log(1 - 2u)\Big) \le e^{2u^2} \quad \text{when } |u| \le \tfrac{1}{4}$$

(in words: the restriction on $u$ remains because the right tail of $Z^2$ is heavier than the Gaussian right tail), and hence

$$\mathbf{E}\, e^{\lambda a_i (Z_i^2 - 1)} \le e^{2\lambda^2 a_i^2} \quad \text{whenever } |\lambda a_i| \le \tfrac{1}{4}.$$

Summing the exponents over $i = 1, \ldots, n$, we obtain

$$\mathbf{E}\, e^{\lambda (S - \mathbf{E}S)} \le e^{2\lambda^2 \|a\|_2^2} \quad \text{for } |\lambda| \le \frac{1}{4\|a\|_\infty}.$$
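The scalar moment-generating-function bound above is easy to verify numerically; a minimal sketch (the grid is an arbitrary choice): the exact MGF of $Z^2 - 1$ at $u$ is $e^{-u}/\sqrt{1-2u}$, and we check it never exceeds $e^{2u^2}$ on $|u| \le 1/4$.

```python
import numpy as np

# Exact MGF of Z^2 - 1 at u: e^{-u} / sqrt(1 - 2u), defined for u < 1/2.
u = np.linspace(-0.25, 0.25, 100001)
mgf = np.exp(-u) / np.sqrt(1.0 - 2.0 * u)
bound = np.exp(2.0 * u ** 2)

print("max of mgf/bound:", (mgf / bound).max())  # <= 1, with equality at u = 0
```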
We make life a bit easier: combining the last bound with Chernoff’s inequality, as in (1), we get for any $0 \le \lambda \le \frac{1}{4\|a\|_\infty}$

$$\mathbf{P}(S - \mathbf{E}S \ge t) \le \exp\big(2\lambda^2 \|a\|_2^2 - \lambda t\big); \tag{1′}$$

an important observation is that for $t \le \|a\|_2^2 / \|a\|_\infty$ the unconstrained minimizer $\lambda = \frac{t}{4\|a\|_2^2}$ is feasible, so in this regime the last bound (1′) is already sufficient to give a subgaussian tail $e^{-t^2/(8\|a\|_2^2)}$. Finally, minimizing (1′) in feasible $\lambda$, and treating the left tail in the same manner, we obtain the tail bound:

$$\mathbf{P}\big(|S - \mathbf{E}S| \ge t\big) \le 2\exp\left(-\min\Big\{\frac{t^2}{8\|a\|_2^2},\ \frac{t}{8\|a\|_\infty}\Big\}\right). \tag{3}$$
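As a sanity check of (3), here is a small simulation sketch; the weights $a_i$ (some of them negative) and the deviation levels are arbitrary choices for illustration. The constants $8$ make the bound loose, but it must hold at every $t$.

```python
import numpy as np

rng = np.random.default_rng(1)

# A generalized chi-square S = sum_i a_i Z_i^2 with illustrative weights.
a = np.array([3.0, 1.0, 1.0, 0.5, -0.5, -2.0])
trials = 200_000
Z = rng.standard_normal((trials, a.size))
S = (Z ** 2) @ a
ES = a.sum()  # E S = sum_i a_i, since E Z_i^2 = 1

def tail_bound(t, a):
    """Bound (3): 2 exp(-min(t^2 / (8 ||a||_2^2), t / (8 ||a||_inf)))."""
    l2sq = np.sum(a ** 2)
    linf = np.max(np.abs(a))
    return 2.0 * np.exp(-min(t ** 2 / (8.0 * l2sq), t / (8.0 * linf)))

for t in [10.0, 20.0, 40.0]:
    emp = np.mean(np.abs(S - ES) >= t)
    print(f"t={t}: empirical {emp:.5f} <= bound {tail_bound(t, a):.5f}")
```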
A convenient reparametrization of this comes with $\nu := 2\|a\|_2$ and $\alpha := 4\|a\|_\infty$. In terms of these new parameters, we now introduce a new class of distributions which is closed under addition of independent random variables:
Definition (Subexponential distribution)
A random variable $X$ with $\mathbf{E}X = \mu$ is $(\nu, \alpha)$-subexponential, if

$$\mathbf{E}\, e^{\lambda (X - \mu)} \le e^{\nu^2 \lambda^2 / 2} \quad \text{for all } |\lambda| \le \frac{1}{\alpha}.$$
For example, $Z^2 \sim \chi^2_1$ is $(2, 4)$-subexponential. Note also that a $\sigma$-subgaussian random variable is $(\sigma, 0)$-subexponential, with the convention $\frac{1}{0} = \infty$ (and vice versa). For subexponential distributions we have, restating (3) in the new parameters,

$$\mathbf{P}\big(|X - \mu| \ge t\big) \le 2\exp\left(-\min\Big\{\frac{t^2}{2\nu^2},\ \frac{t}{2\alpha}\Big\}\right).$$
The bounds of this form (quadratic-over-linear in the exponent) are sometimes called Bernstein-type inequalities. We will examine this basic inequality more thoroughly in the next post. Now we return to the behaviour of the sum of independent subexponentials. Recall the derivation of (3): multiplying the moment generating functions, a sum of independent $(\nu_i, \alpha_i)$-subexponentials is subexponential with $\nu^2 = \sum_i \nu_i^2$ and $\alpha = \max_i \alpha_i$, and we get
Theorem 2 (Bernstein’s inequality)
For $X_1, \ldots, X_n$ independent $(\nu_i, \alpha_i)$-subexponentials with $\mathbf{E}X_i = \mu_i$,

$$\mathbf{P}\left(\Big|\sum_{i=1}^n (X_i - \mu_i)\Big| \ge t\right) \le 2\exp\left(-\min\Big\{\frac{t^2}{2\sum_{i=1}^n \nu_i^2},\ \frac{t}{2\max_{i \le n} \alpha_i}\Big\}\right).$$
The ‘crude’ message of this bound is that when we add up $n$ i.i.d. $(\nu, \alpha)$-subexponentials with mean $\mu$, the sum has mean $n\mu$ whereas its deviation is $O(\sqrt{n})$ w.h.p.; at a first glance we have the same concentration as for the subgaussian case. However, the devil is in the detail: if we added up $\nu$-subgaussians, there wouldn’t be the additional linear term in the exponent. For the moderate deviations $t \le n\nu^2/\alpha$, the so-called near zone, the linear term is inactive, and we have subgaussian tails, as if $\alpha = 0$. But in the far zone $t > n\nu^2/\alpha$, the tails are “a bit heavier”, corresponding to the exponential distribution with rate parameter $\frac{1}{2\alpha}$. We will discuss this subtlety in detail in the next post.
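To make the two zones concrete, here is a small numeric illustration; the choice $(\nu, \alpha) = (2, 4)$ matches a centered $\chi^2_1$ term, while $n$ and the probed deviation levels are arbitrary.

```python
# Near vs far zone for the sum of n i.i.d. (nu, alpha)-subexponentials.
nu, alpha, n = 2.0, 4.0, 100     # (2, 4): parameters of a centered chi^2_1 term
nu_sum_sq = n * nu ** 2          # nu^2 of the sum
t_star = nu_sum_sq / alpha       # crossover: t^2/(2 nu_sum_sq) = t/(2 alpha)

def exponent(t):
    """Exponent in the Bernstein tail bound for the sum."""
    return min(t ** 2 / (2 * nu_sum_sq), t / (2 * alpha))

for t in [0.1 * t_star, t_star, 10 * t_star]:
    quad, lin = t ** 2 / (2 * nu_sum_sq), t / (2 * alpha)
    zone = "near (subgaussian)" if quad <= lin else "far (subexponential)"
    print(f"t={t:8.1f}  quadratic={quad:10.2f}  linear={lin:10.2f}  -> {zone}")
```

Below the crossover $t^\ast = n\nu^2/\alpha$ the quadratic (subgaussian) term is the smaller one and governs the bound; above it, the linear term takes over.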
We finish our ride for today by stating the deviation bound for a quadratic form of a standard Gaussian vector. Let $z$ be a standard Gaussian vector of some dimension, possibly infinite, and $Q = z^\top A z$ a quadratic form of $z$ with symmetric matrix $A$ (we don’t require that $A \succeq 0$). Then, due to the rotational invariance of the Gaussian distribution, $Q$ is distributed the same as the “generalized chi-square” $\sum_i a_i Z_i^2$ with $(a_i)$ the eigenvalues of $A$. Introducing the Schatten $p$-norms $\|A\|_{\mathsf{S}_p} := \|a\|_p$, the $\ell_p$-norms of the spectrum, and coming back to (3), we have
Theorem 3 (Deviations of a quadratic form)
$$\mathbf{P}\big(|z^\top A z - \operatorname{tr} A| \ge t\big) \le 2\exp\left(-\min\Big\{\frac{t^2}{8\|A\|_{\mathsf{S}_2}^2},\ \frac{t}{8\|A\|_{\mathsf{S}_\infty}}\Big\}\right).$$

In particular, for $A = I_n$ we recover the deviation bound for $\chi^2_n$:

$$\mathbf{P}\big(|\chi^2_n - n| \ge t\big) \le 2\exp\left(-\min\Big\{\frac{t^2}{8n},\ \frac{t}{8}\Big\}\right).$$
The message of the last inequality is that with high probability $\chi^2_n = n + O(\sqrt{n})$.
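Theorem 3 can also be checked by simulation; a sketch with a random symmetric (and generally indefinite) matrix, where the dimension, matrix, and deviation level are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 30

B = rng.standard_normal((d, d))
A = (B + B.T) / 2.0                          # symmetric, generally not PSD

trA = np.trace(A)                            # E z^T A z = tr A
fro = np.linalg.norm(A, "fro")               # Schatten-2 norm: l2 norm of eigenvalues
spec = np.abs(np.linalg.eigvalsh(A)).max()   # Schatten-inf norm: largest |eigenvalue|

trials = 100_000
Z = rng.standard_normal((trials, d))
Q = np.sum((Z @ A) * Z, axis=1)              # quadratic forms z^T A z, one per row

t = 8.0 * fro                                # a deviation level well inside the tail
emp = np.mean(np.abs(Q - trA) >= t)
bound = 2.0 * np.exp(-min(t ** 2 / (8.0 * fro ** 2), t / (8.0 * spec)))
print(f"empirical tail {emp:.5f} <= bound {bound:.5f}")
```

Note that $\|A\|_{\mathsf{S}_\infty} \le \|A\|_{\mathsf{S}_2}$ always, so for a fixed spectrum the quadratic term dominates for small $t$ and the linear one for large $t$, exactly as in Theorem 2.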