In this post we move on from *subgaussian* distributions to another important class of distributions called *subexponential*. The simplest and most common example of such distributions is the chi-square distribution. As usual, we are interested in tail probability bounds for such distributions, and the way to obtain them will be Chernoff’s technique.

We start by noting that $\chi_1^2 = \xi^2$, where $\xi \sim \mathcal{N}(0,1)$. Since we already know from one of the previous posts that $\mathbb{P}(\xi \ge t) \le e^{-t^2/2}$, the union bound (over the two tails of $\xi$) gives for $t \ge 0$:

$$\mathbb{P}(\chi_1^2 \ge t) = \mathbb{P}(|\xi| \ge \sqrt{t}) \le 2e^{-t/2}. \qquad (1)$$

Now, the observation that $\chi_n^2$ is distributed as the sum of $n$ i.i.d. copies of $\chi_1^2$ suggests using a naive union bound: since $\chi_n^2 \ge nt$ implies $\max_i \xi_i^2 \ge t$,

$$\mathbb{P}(\chi_n^2 \ge nt) \le n\,\mathbb{P}(\chi_1^2 \ge t) \le 2n e^{-t/2}.$$

The typical value of $\chi_n^2$ according to it is $O(n\log n)$, which is completely wrong. Indeed, the independence of the different terms in $\chi_n^2$ was not used. In fact, for its intuitively ‘worst concentrated’ counterpart $n\chi_1^2$ (all $n$ terms equal) the naive bound still holds, but even there the typical value is only $O(n)$. The naive bound performs even worse for $\chi_n^2$: since $\chi_n^2/n$, being the empirical average of $n$ i.i.d. random variables with unit expectation each, should concentrate as $1 + O(1/\sqrt{n})$ w.h.p. (see post “Intuition behind concentration inequalities”), we have $\chi_n^2 = n + O(\sqrt{n})$ w.h.p., which is generally much smaller than the $n\log n$ scale predicted by the union bound.
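As a quick sanity check, here is a small Monte Carlo sketch (using numpy; the sample sizes are illustrative choices, not from the original post) showing that $\chi_n^2$ indeed sits at $n + O(\sqrt{n})$, far below the $\sim n\log n$ scale suggested by the naive union bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# 2000 independent draws of chi_n^2, i.e. sums of n squared standard normals.
samples = rng.chisquare(df=n, size=2_000)

print(samples.mean())         # close to n
print(samples.std())          # close to sqrt(2n), i.e. O(sqrt(n)) fluctuations
print(2 * n * np.log(2 * n))  # the much larger naive union-bound scale
```

The empirical standard deviation is on the order of $\sqrt{n}$, two orders of magnitude below the union-bound prediction.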

The looseness of the union bound is due to the fact that it does not exploit the independence of the different terms inside $\chi_n^2$. The natural way to capture this independence is the so-called *Chernoff’s technique.* Instead of $X$ itself we will consider the “generalized moments” $\mathbb{E}[e^{\lambda X}]$: applying Markov’s inequality to $e^{\lambda X}$,

$$\mathbb{P}(X \ge t) = \mathbb{P}\big(e^{\lambda X} \ge e^{\lambda t}\big) \le e^{-\lambda t}\,\mathbb{E}\big[e^{\lambda X}\big], \qquad \lambda \ge 0,$$

with not even requiring that $X \ge 0$. We have

$$\mathbb{E}\big[e^{\lambda \chi_1^2}\big] = \mathbb{E}\big[e^{\lambda \xi^2}\big] = \frac{1}{\sqrt{1-2\lambda}}, \qquad \lambda < \tfrac12,$$

but, unlike the subgaussian case, this formula holds *not for all* $\lambda$. Indeed, for $\lambda \ge \tfrac12$ the integral defining the expectation diverges:

$$\mathbb{E}\big[e^{\lambda \xi^2}\big] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{(\lambda - 1/2)x^2}\, dx = +\infty.$$

Now we use the well-known bound $-\log(1-u) \le u + u^2$ for $0 \le u \le \tfrac12$, noticing (with a bit of simple algebra) that

$$\mathbb{E}\big[e^{\lambda \xi^2}\big] = e^{-\frac12 \log(1-2\lambda)} \le e^{\lambda + 2\lambda^2} \quad \text{when } 0 \le \lambda \le \tfrac14$$

(in words: the restriction on $\lambda$ reflects that the right tail of $\chi_1^2$ is heavier than a Gaussian right tail), and hence

$$\mathbb{E}\big[e^{\lambda(\chi_1^2 - 1)}\big] \le e^{2\lambda^2}, \qquad 0 \le \lambda \le \tfrac14.$$
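The elementary MGF bound invoked above can be checked numerically on a grid; a short sketch (numpy assumed):

```python
import numpy as np

# Grid-check of the elementary bound used above:
#   1/sqrt(1 - 2*lam) <= exp(lam + 2*lam**2)   for 0 <= lam <= 1/4.
lam = np.linspace(0.0, 0.25, 1_000)
mgf = 1.0 / np.sqrt(1.0 - 2.0 * lam)   # E[exp(lam * xi^2)] for xi ~ N(0,1)
bound = np.exp(lam + 2.0 * lam ** 2)
assert np.all(mgf <= bound + 1e-12)
print(float((bound - mgf).min()))      # nonnegative gap (zero at lam = 0)
```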

Note that, by independence, the MGF of $\chi_n^2$ factorizes into the product of the individual MGFs. Summing the exponents over $i = 1, \dots, n$ we obtain

$$\mathbb{E}\big[e^{\lambda(\chi_n^2 - n)}\big] = \prod_{i=1}^n \mathbb{E}\big[e^{\lambda(\xi_i^2 - 1)}\big] \le e^{2n\lambda^2}, \qquad 0 \le \lambda \le \tfrac14,$$

whence, by the generalized Markov inequality,

$$\mathbb{P}\big(\chi_n^2 - n \ge t\big) \le e^{-\lambda t + 2n\lambda^2}, \qquad 0 \le \lambda \le \tfrac14. \qquad (2)$$

We make life a bit easier, noting, as in (1), that the same argument applies to the left tail, so the union bound over the two tails gives

$$\mathbb{P}\big(|\chi_n^2 - n| \ge t\big) \le 2e^{-\lambda t + 2n\lambda^2}, \qquad 0 \le \lambda \le \tfrac14; \qquad (2')$$

an important observation is that for what follows the last bound (2′) is already sufficient.

Finally, minimizing in feasible $\lambda \in [0, \tfrac14]$ (the minimizer is $\lambda = \min(t/(4n), \tfrac14)$) we obtain the tail bound:

$$\mathbb{P}\big(|\chi_n^2 - n| \ge t\big) \le 2\exp\left(-\min\left(\frac{t^2}{8n}, \frac{t}{8}\right)\right). \qquad (3)$$
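To see that a bound of the form (3) is honest, one can compare it against empirical tail frequencies; a small Monte Carlo sketch (numpy assumed, parameters chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
samples = rng.chisquare(df=n, size=100_000)

# Compare empirical two-sided tail frequencies with bound (3).
for t in [10.0, 20.0, 40.0, 80.0]:
    empirical = np.mean(np.abs(samples - n) >= t)
    bound = 2 * np.exp(-min(t ** 2 / (8 * n), t / 8))
    print(f"t={t:5.1f}: empirical {empirical:.5f} <= bound {bound:.5f}")
```

The bound is loose in the constants (as Chernoff bounds tend to be) but dominates the empirical frequencies at every level $t$.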

A convenient reparametrization of this comes with $\sigma^2 = 4n$ and $b = 4$. In terms of these new parameters, we now introduce a new class of distributions which is closed under addition of independent random variables:

**Definition (Subexponential distribution).** A random variable $X$ with $\mathbb{E}[X] = \mu$ is $(\sigma^2, b)$-subexponential if

$$\mathbb{E}\big[e^{\lambda(X-\mu)}\big] \le e^{\lambda^2\sigma^2/2} \quad \text{for all } |\lambda| \le \frac{1}{b}.$$

For example, $\chi_n^2$ is $(4n, 4)$-subexponential. Note also that a $\sigma^2$-subgaussian random variable is $(\sigma^2, 0)$-subexponential with the convention $1/0 = \infty$ (and vice versa). For subexponential distributions we have, restating (3),

**Theorem 1.** For $X$ $(\sigma^2, b)$-subexponential with mean $\mu$,

$$\mathbb{P}\big(|X - \mu| \ge t\big) \le 2\exp\left(-\min\left(\frac{t^2}{2\sigma^2}, \frac{t}{2b}\right)\right).$$

Bounds of this form, with a quadratic-over-linear exponent, are sometimes called *Bernstein-type inequalities.* We will examine this basic inequality more thoroughly in the next post. Now we return to the behaviour of the sum of independent subexponentials. Recall how (3) was obtained: a sum of independent $(\sigma_i^2, b_i)$-subexponentials is subexponential with $\sigma^2 = \sum_i \sigma_i^2$ and $b = \max_i b_i$ (multiply the individual MGF bounds, each valid for $|\lambda| \le 1/\max_i b_i$), and we get

**Theorem 2 (Bernstein’s inequality).** For $X_1, \dots, X_n$ independent $(\sigma_i^2, b_i)$-subexponentials with means $\mu_i$,

$$\mathbb{P}\left(\Big|\sum_{i=1}^n (X_i - \mu_i)\Big| \ge t\right) \le 2\exp\left(-\min\left(\frac{t^2}{2\sum_i \sigma_i^2}, \frac{t}{2\max_i b_i}\right)\right).$$
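A tiny illustration of the two regimes of Theorem 2 in the i.i.d. case (the parameters $n$, $\sigma^2$, $b$ below are hypothetical, chosen only to show the crossover):

```python
# Hypothetical parameters for n i.i.d. (sigma2, b)-subexponential terms;
# the exponent in Theorem 2 is then min(t^2 / (2*n*sigma2), t / (2*b)).
n, sigma2, b = 100, 4.0, 4.0

def exponent(t):
    return min(t ** 2 / (2 * n * sigma2), t / (2 * b))

# Crossover between regimes: t^2/(2n sigma2) = t/(2b) at t* = n*sigma2/b.
t_star = n * sigma2 / b
for t in [0.1 * t_star, t_star, 10 * t_star]:
    quad = t ** 2 / (2 * n * sigma2)
    lin = t / (2 * b)
    zone = "near zone (subgaussian)" if quad <= lin else "far zone (subexponential)"
    print(f"t={t:7.1f}: exponent {exponent(t):8.2f}  ->  {zone}")
```

Below the crossover the quadratic term is the minimum (subgaussian behaviour); above it the linear term takes over.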

The ‘crude’ message of this bound is that when we add up $n$ i.i.d. $(\sigma^2, b)$-subexponentials, the mean of the sum is $n\mu$ whereas the deviation is $O(\sigma\sqrt{n})$ w.h.p.; at a first glance we have the same concentration as for the subgaussian case. However, the devil is in the detail: if we added up $\sigma^2$-subgaussians, there wouldn’t be the additional linear term in the exponent. For the moderate deviations $t \lesssim n\sigma^2/b$, the so-called *near zone,* the linear term is inactive, and we have subgaussian tails, as if $b = 0$. But in the *far zone* $t \gtrsim n\sigma^2/b$, the tails are “a bit heavier”, corresponding to the exponential distribution with rate parameter $1/(2b)$. We will discuss this subtlety in detail in the next post.

We finish our ride for today by stating the deviation bound for a quadratic form of a standard Gaussian vector. Let $\xi$ be a standard Gaussian vector of some dimension, possibly infinite, and $Q = \xi^\top A\,\xi$ a quadratic form of $\xi$ with symmetric matrix $A$ (we don’t require that $A \succeq 0$). Then, due to the rotational invariance of the Gaussian distribution, $Q$ is distributed the same as the “generalized chi-square” $\sum_i \lambda_i \xi_i^2$, with $\lambda_i$ the eigenvalues of $A$ and $\xi_i$ i.i.d. standard Gaussian. Introducing the Schatten $p$-norms $\|A\|_{\mathrm{Sh},p} = \big(\sum_i |\lambda_i|^p\big)^{1/p}$ and coming back to (3), we have

**Theorem 3 (Deviations of a quadratic form).** For $\xi$ and $A$ as above,

$$\mathbb{P}\big(|\xi^\top A\,\xi - \operatorname{tr} A| \ge t\big) \le 2\exp\left(-\min\left(\frac{t^2}{8\|A\|_{\mathrm{Sh},2}^2}, \frac{t}{8\|A\|_{\mathrm{Sh},\infty}}\right)\right).$$

Alternatively, since $\min(a, b) \ge ab/(a+b)$,

$$\mathbb{P}\big(|\xi^\top A\,\xi - \operatorname{tr} A| \ge t\big) \le 2\exp\left(-\frac{t^2}{8\|A\|_{\mathrm{Sh},2}^2 + 8\|A\|_{\mathrm{Sh},\infty}\, t}\right).$$

In particular, for $A = I_n$ we recover (3).

The message of the last inequality is that with high probability

$$\xi^\top A\,\xi = \operatorname{tr} A + O\big(\|A\|_{\mathrm{Sh},2}\big).$$
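As an illustration of Theorem 3, the following Monte Carlo sketch (numpy assumed; the matrix is a random, hypothetical example) checks that $\xi^\top A\,\xi$ centers at $\operatorname{tr} A$ with fluctuations on the scale $\|A\|_{\mathrm{Sh},2}$ (the Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 30
M = rng.standard_normal((d, d))
A = (M + M.T) / 2                        # symmetric, not necessarily PSD

xi = rng.standard_normal((50_000, d))    # rows are standard Gaussian vectors
Q = np.einsum("ni,ij,nj->n", xi, A, xi)  # quadratic forms xi^T A xi

print(Q.mean(), np.trace(A))                    # centered at tr(A)
print(Q.std(), np.sqrt(2) * np.linalg.norm(A))  # fluctuations ~ ||A||_F
```

(The exact variance of $\xi^\top A\,\xi$ for symmetric $A$ is $2\|A\|_{\mathrm{Sh},2}^2$, which matches the $O(\|A\|_{\mathrm{Sh},2})$ deviation scale above.)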


At the very beginning why do you need the union bound? You could simply calculate: $P(X \ge t) = P( \xi^2 \ge t ) = P( |\xi| \ge \sqrt{t}) \le \exp( -(\sqrt{t})^2/2 ) = \exp(-t/2)$.

Csaba, you’re right of course, and thank you for the feedback.

The union bound here is for the “methodological purpose”, as seen by the next display (passing from $\chi_1^2$ to $\chi_n^2$). Basically, the whole first paragraph is just to introduce the idea that one can use the union bound to control the sum of i.i.d. r.v.’s. Of course, union bounding alone is not enough; one needs Chernoff on top, as shown in the remaining part of the post.