## Subexponential distributions I — Bernstein inequality and quadratic forms

In this post we move on from *subgaussian* distributions to another important class of distributions called *subexponential*. The simplest and most common example of such a distribution is the chi-square. As usual, we are interested in tail probability bounds for such distributions, and the way to obtain them will be Chernoff’s technique.

We start by noting that $\chi_1^2 = \xi^2,$ where $\xi \sim \mathcal{N}(0,1).$ Since we already know from one of the previous posts that $P\left(|\xi| > t \right) \le \exp\left(-t^2/2\right),$ and since $\{X \ge t\} = \{|\xi| \ge \sqrt{t}\},$ we immediately get for $X \sim \chi_1^2$:

$\displaystyle P \left(X \ge t \right) \le e^{-t/2}.$
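As a quick numerical sanity check (an illustrative sketch, not part of the argument), note that this tail is available in closed form via the complementary error function: $P(X \ge t) = P(|\xi| \ge \sqrt{t}) = \operatorname{erfc}(\sqrt{t/2}).$

```python
import math

# Exact chi-square(1) tail: P(xi^2 >= t) = P(|xi| >= sqrt(t)) = erfc(sqrt(t/2)).
def chi2_1_tail(t: float) -> float:
    return math.erfc(math.sqrt(t / 2))

# Check the subgaussian-derived bound exp(-t/2) on a grid of t values.
for t in [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 25.0]:
    assert chi2_1_tail(t) <= math.exp(-t / 2)
```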

Now, the observation that $X_n \sim \chi^2_n$ is distributed as the sum of $n$ i.i.d. $\chi_1^2$ variables suggests a naive union bound: since $\{X_n \ge t\} \subseteq \bigcup_i \{\xi_i^2 \ge t/n\},$

$\displaystyle P \left(X_n \ge t \right) \le n \exp\left(-\frac{t}{2n}\right).$

According to this bound, the typical value of $X_n$ is $O(n \log n),$ which is completely wrong: the independence of the different terms in $X_n$ was never used. Indeed, for the intuitively ‘worst concentrated’ counterpart $Y_n \sim n \cdot \xi^2,$ in which all terms are fully dependent, the same naive bound holds, yet $Y_n = O(n)$ w.h.p. For $X_n$ the naive bound is even looser: $X_n / n$, being the empirical average of i.i.d. random variables with unit expectation, concentrates as $1 + o(1)$ w.h.p. (see the post “Intuition behind concentration inequalities”), so w.h.p. $X_n = n + o(n),$ with fluctuations of order $o(n)$ rather than $O(n).$
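A small Monte Carlo experiment (a sketch using only the standard library; the constants are illustrative) shows the actual scale of the fluctuations:

```python
import math
import random

random.seed(0)

def sample_chi2(n: int) -> float:
    # X_n ~ chi^2_n as a sum of n i.i.d. squared standard Gaussians.
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))

n = 2000
x = sample_chi2(n)
# X_n / n concentrates near 1: typical fluctuations are of order sqrt(2/n),
# nowhere near the log(n) scale suggested by the naive union bound.
assert abs(x / n - 1.0) < 5 * math.sqrt(2 / n)
```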

The looseness of the union bound comes from the fact that it does not exploit the independence of the terms inside $X_n.$ The natural way to capture this independence is Chernoff’s technique. Instead of $X_n \sim \chi_n^2$ we will consider the “generalized $\chi_n^2$”:

$\displaystyle Y \sim a_1 \xi_1^2 + ... + a_n \xi_n^2$

with  $\mu := \mathbb{E} Y = \mathbf{1}^T\mathbf{a},$  not even requiring that $\mathbf{a} \succeq 0.$  We have

$\displaystyle P \left(Y - \mu \ge t \right) \le \frac{\mathbb{E}\, e^{\lambda \left(Y - \mu\right)}}{e^{\lambda t}},$

but, unlike in the subgaussian case, this formula is useful not for all $\lambda \ge 0$: the moment generating function may be infinite. Indeed, for $\xi \sim \mathcal{N}(0,1),$

$\displaystyle \log \mathbb{E} \exp \left(\lambda a \xi^2\right) = \log \left( \frac{1}{\sqrt{1 - 2 a \lambda}} \right) = -\frac{1}{2} \log \left(1 - 2 a \lambda \right), \,\,\,\,\,\,\, 1-2a\lambda > 0.$
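For completeness, this closed form comes from a standard Gaussian integral: writing $s = a\lambda,$

$\displaystyle \mathbb{E}\, e^{s \xi^2} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{s x^2 - x^2/2}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1 - 2s)\, x^2/2}\, dx = \frac{1}{\sqrt{1 - 2s}}, \qquad s < \frac{1}{2},$

while for $s \ge 1/2$ the integral diverges.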

Now we use the well-known bound

$\displaystyle -\log (1 - u) \le u + \frac{u^2}{2(1-u)},$        $\displaystyle 0 \le u < 1;$
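This elementary inequality is easy to verify numerically (a quick grid check, for illustration only):

```python
import math

# Grid check of -log(1 - u) <= u + u^2 / (2 * (1 - u)) on [0, 1).
for k in range(0, 99):
    u = k / 100
    lhs = -math.log(1.0 - u)
    rhs = u + u * u / (2.0 * (1.0 - u))
    assert lhs <= rhs + 1e-12
```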

noticing (with a bit of simple algebra) that when  $1-2|a|\lambda > 0,$

$\displaystyle \log \mathbb{E}\exp \left[ \lambda |a| (1- \xi^2) \right] \le \log \mathbb{E}\exp \left[ \lambda |a| (\xi^2 - 1) \right]$

(in words: “the left tail of  $\chi_1^2$  is lighter than the right one”), and hence

$\displaystyle \log \mathbb{E} \exp \left[ \lambda a (\xi^2 - 1) \right] \le \frac{a^2\lambda^2}{1 - 2 |a| \lambda},$        $\displaystyle 0 \le \lambda < \frac{1}{2|a|}.$            (1)

Note that, since $1 - 2|a|\lambda \ge \frac{1}{2}$ whenever $\lambda \le \frac{1}{4|a|},$ (1) implies the cruder but simpler bound

$\displaystyle \log \mathbb{E} \exp \left[ \lambda a (\xi^2 - 1) \right] \le 2 a^2\lambda^2,$        $\displaystyle 0 \le \lambda \le \frac{1}{4|a|}.$            (1′)
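Both (1) and (1′) can be checked numerically. The sketch below (illustrative constants, stdlib only) evaluates the exact log-moment generating function $-\lambda a - \frac{1}{2}\log(1 - 2a\lambda)$ against the two bounds on a grid:

```python
import math

def log_mgf(a: float, lam: float) -> float:
    # log E exp[lam * a * (xi^2 - 1)] = -lam*a - 0.5 * log(1 - 2*a*lam).
    return -lam * a - 0.5 * math.log(1.0 - 2.0 * a * lam)

for a in (-2.0, -0.5, 0.3, 1.0):
    lam_max = 1.0 / (2.0 * abs(a))
    for k in range(1, 100):
        lam = 0.99 * lam_max * k / 100  # stay strictly inside [0, 1/(2|a|))
        bound1 = a * a * lam * lam / (1.0 - 2.0 * abs(a) * lam)
        assert log_mgf(a, lam) <= bound1 + 1e-12                  # inequality (1)
        if lam <= 1.0 / (4.0 * abs(a)):
            assert log_mgf(a, lam) <= 2.0 * a * a * lam * lam + 1e-12  # (1')
```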

Summing the bounds (1) over $i$ and using $|a_i| \le |\mathbf{a}|_\infty,$ we obtain

$\displaystyle \log \mathbb{E}\, \exp\left[\lambda \left(Y - \mu\right)\right] \le\frac{|\mathbf{a}|_2^2\, \lambda^2}{1 - 2 |\mathbf{a}|_\infty \lambda},$        $\displaystyle 0 \le \lambda < \frac{1}{2|\mathbf{a}|_\infty}.$            (2)

We make life a bit easier by passing, exactly as from (1) to (1′), to

$\displaystyle \log \mathbb{E}\, \exp\left[\lambda \left(Y - \mu\right)\right] \le 2|\mathbf{a}|_2^2\, \lambda^2,$        $\displaystyle 0 \le \lambda \le \frac{1}{4|\mathbf{a}|_\infty};$            (2′)

an important observation is that for this last bound, (1′) alone is already sufficient.

Finally, minimizing over feasible  $\lambda,$  we obtain the tail bound:

$\displaystyle P \left(Y - \mu \ge t \right) \le\exp\left( - \frac{t^2}{8 |\mathbf{a}|_2^2}\right) \, \bigvee \, \exp\left(-\frac{t}{8 |\mathbf{a}|_\infty}\right).$            (3)
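(The minimization behind (3) is the usual two-zone computation: by (2′) and Chernoff, for $0 \le \lambda \le \frac{1}{4|\mathbf{a}|_\infty},$

$\displaystyle P\left(Y - \mu \ge t\right) \le \exp\left(-\lambda t + 2 |\mathbf{a}|_2^2\, \lambda^2\right).$

The unconstrained minimizer $\lambda^* = \frac{t}{4|\mathbf{a}|_2^2}$ is feasible when $t \le \frac{|\mathbf{a}|_2^2}{|\mathbf{a}|_\infty},$ giving the exponent $-\frac{t^2}{8|\mathbf{a}|_2^2}$; otherwise we take $\lambda = \frac{1}{4|\mathbf{a}|_\infty},$ whose exponent $-\frac{t}{4|\mathbf{a}|_\infty} + \frac{|\mathbf{a}|_2^2}{8|\mathbf{a}|_\infty^2}$ is at most $-\frac{t}{8|\mathbf{a}|_\infty}$ in that zone.)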

A convenient reparametrization of this comes with $b = 4 |\mathbf{a}|_\infty$ and $\sigma^2 = 4 |\mathbf{a}|_2^2.$ In terms of these new parameters, we now introduce a class of distributions which is closed under addition of independent random variables:

Definition  (Subexponential distribution)

A random variable $X$ with $\mathbb{E}X = \mu$ is $(\sigma^2, b)$-subexponential if

$\displaystyle \log \mathbb{E} \exp \left[ \lambda (X - \mu) \right] \le \frac{\lambda^2\sigma^2}{2},$        $\displaystyle 0 \le \lambda \le \frac{1}{b}.$

For example, $\chi_1^2$ is $(4, 4)$-subexponential. Note also that a $\sigma^2$-subgaussian random variable is $(\sigma^2, 0)$-subexponential (and vice versa), with the convention $1/0 = +\infty.$ For subexponential distributions we have, restating (3),

Theorem 1

For a $(\sigma^2, b)$-subexponential $X,$

$\displaystyle P\left(X - \mu > t\right) \le\exp\left( - \frac{t^2}{2 \sigma^2}\right) \, \bigvee \, \exp\left(-\frac{t}{2 b}\right) \le \exp \left(-\frac{t^2}{2\sigma^2 + 2bt}\right).$
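The second inequality in Theorem 1 is a simple algebraic fact; here is a quick grid check (illustrative parameters, for sanity only):

```python
import math

# Grid check: max(exp(-t^2/(2*s2)), exp(-t/(2*b))) <= exp(-t^2/(2*s2 + 2*b*t)).
s2, b = 4.0, 4.0  # e.g. the chi-square(1) parameters
for k in range(1, 200):
    t = 0.25 * k
    lhs = max(math.exp(-t * t / (2 * s2)), math.exp(-t / (2 * b)))
    rhs = math.exp(-t * t / (2 * s2 + 2 * b * t))
    assert lhs <= rhs + 1e-15
```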

Bounds of this form, quadratic-over-linear in the exponent, are sometimes called Bernstein-type inequalities. We will examine this basic inequality more thoroughly in the next post. Now we return to the behaviour of a sum of independent subexponentials. As in the derivation of (2), the bounds on the log-moment generating functions simply add up: a sum of independent $(\sigma_i^2, b_i)$-subexponentials is subexponential with $\sigma^2 =\sum_i \sigma_i^2,$ $b = \max_i b_i,$ and we get

Theorem 2  (Bernstein’s inequality)

Let $X_i,$ $i = 1, \dots, n,$ be independent $(\sigma^2_i, b_i)$-subexponentials with means $\mu_i.$ Then, with $|b|_\infty := \max_i b_i,$

$\displaystyle P\left(\sum_{i=1}^n \left(X_i - \mu_i\right) > t\right) \le\exp \left(-\frac{t^2}{2\sum_{i=1}^n \sigma^2_i + 2 |b|_\infty t}\right).$
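In code, the bound is a one-liner. The sketch below (function name and parameters are mine, not from the post) evaluates it for a sum of i.i.d. $(4, 4)$-subexponentials, e.g. $\chi_1^2$ terms:

```python
import math

def bernstein_bound(t: float, sigma2s: list, bs: list) -> float:
    # Bernstein tail bound exp(-t^2 / (2*sum(sigma_i^2) + 2*max(b_i)*t))
    # for a sum of independent (sigma_i^2, b_i)-subexponentials.
    return math.exp(-t * t / (2.0 * sum(sigma2s) + 2.0 * max(bs) * t))

# Sum of n = 100 i.i.d. chi-square(1) variables, each (4, 4)-subexponential:
n = 100
bound = bernstein_bound(60.0, [4.0] * n, [4.0] * n)
```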

The ‘crude’ message of this bound is that when we add up $n$ i.i.d. $(\sigma^2, b)$-subexponentials, the mean $\mu = \sum_i \mu_i$ is $O(n),$ whereas the deviation $\sum_i X_i - \mu$ is w.h.p. $O(\sqrt{n}),$ so at first glance we have the same concentration as in the subgaussian case. However, the devil is in the details: if we added up $\sigma^2_i$-subgaussians, there would be no additional linear term $2 |b|_\infty\, t$ in the denominator. For moderate deviations $0 \le t \le \sigma^2/b,$ the so-called near zone, the linear term is small, and we have subgaussian tails, as if $b = 0.$ But in the far zone $t \ge \sigma^2/b,$ the tails are “a bit heavier”, matching an exponential distribution with rate parameter $1/(2b).$ We will discuss this subtlety in detail in the next post.

We finish our ride for today by stating the deviation bound for a quadratic form of a standard Gaussian vector. Let $\xi$ be a standard Gaussian vector of some dimension, possibly infinite, and let $\mathbf{Q}(\xi) := \xi^T Q \xi$ be a quadratic form of $\xi$ with a symmetric matrix $Q$ (we do not require that $Q \succeq 0$). Then, due to the rotational invariance of the Gaussian distribution, $\mathbf{Q}(\xi)$ has the same distribution as the “generalized chi-square” with $a_i = \lambda_i,$ the eigenvalues of $Q.$ Introducing the Schatten $p$-norms $\|Q\|_p := \|\lambda\|_p$ and coming back to (3), we have

Theorem 3  (Deviations of a quadratic form)

$\displaystyle P\left(\mathbf{Q}(\xi) - \text{Tr}(Q) \ge t\right) \le\exp\left(-\frac{t^2}{8\|Q\|_2^2 + 8 \| Q\|_\infty t}\right).$

Alternatively,

$\displaystyle P\left(\mathbf{Q}(\xi) - \text{Tr}(Q) \ge \sqrt{8\|Q\|_2^2 \, u} + 8 \| Q\|_\infty \, u\right) \le\exp\left(-u\right).$

In particular, for $X \sim \chi_n^2$ (take $Q = I_n,$ so that $\|Q\|_2^2 = n$ and $\|Q\|_\infty = 1$),

$P\left(X - n \ge \sqrt{8n \, u} + 8u\right) \le\exp\left(-u\right).$
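A Monte Carlo sanity check of the last display (a sketch with illustrative parameters; the empirical frequency should land well below $e^{-u}$):

```python
import math
import random

random.seed(1)

# Monte Carlo check of P(X - n >= sqrt(8*n*u) + 8*u) <= exp(-u) for X ~ chi2(n).
n, u, trials = 10, 1.0, 20000
threshold = n + math.sqrt(8 * n * u) + 8 * u
exceed = sum(
    sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) >= threshold
    for _ in range(trials)
)
assert exceed / trials <= math.exp(-u)
```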

The message of the last inequality is that with high probability  $X = n + O(\sqrt{n}).$
