Subexponential distributions III — Johnson–Lindenstrauss lemma

In this post we will finish our affair with subexponential distributions, considering one more example where they can be applied: the Johnson–Lindenstrauss lemma. Essentially, it states that  n  points in a high-dimensional Euclidean space may be projected into a subspace of dimension only  O(\log n)  in such a way that all pairwise distances are well preserved.

Theorem  (Johnson–Lindenstrauss)

Let  \delta > 0  and  x_1, \dots, x_n \in \mathbb{R}^d.  For any

\displaystyle m \ge \frac{16}{\delta^2}\,\log n

there exists a linear map  F: \mathbb{R}^d \to \mathbb{R}^m  such that for any  \displaystyle 1 \le i < j \le n,

\displaystyle (1-\delta)\|x_i-x_j\|_2^2 \le \left\|F(x_i)-F(x_j)\right\|_2^2 \le(1+\delta)\|x_i-x_j\|_2^2.

Remark 1.  Note that  m  depends on  n  but not on  d.  This is no surprise, since the affine dimension of  n  points is obviously at most  n-1.

Remark 2.  One may actually require that  F  be a projection, but then the proof becomes a bit more technical.

Remark 3.  In the discrete case, the analogous statement holds for the Hamming distance.

Proof.  Consider the following random map, which is ‘almost’ a projection due to the normalization 1/\sqrt{m}:

\displaystyle F(x) = \frac{Y}{\sqrt{m}} \,\, x,

for the random matrix  \displaystyle Y \in \mathbb{R}^{m \times d}  with i.i.d.  \displaystyle \mathcal{N}(0,1)  entries.  Note that by rotational invariance of the Gaussian distribution,  \displaystyle \frac{\|Yx\|_2^2}{\|x\|_2^2} \sim \chi_m^2,  which is  (4m, 4)-subexponential.  As a result, we get that for any fixed  x \in \mathbb{R}^d,

\displaystyle P\left(\left|\frac{\left\|F(x)\right\|_2^2}{\left\|x\right\|_2^2} - 1\right| \ge \delta \right) \le 2 \exp\left( -\frac{m\delta^2}{8}\right).
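This concentration bound is easy to check numerically. Below is a small sketch (the parameters  m,  d,  \delta  and the trial count are illustrative choices, not from the text) comparing the empirical failure frequency with the subexponential tail bound:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, delta = 100, 200, 0.3      # illustrative parameters (assumptions)
x = rng.standard_normal(d)       # any fixed vector; only its direction matters

trials = 500
ratios = np.empty(trials)
for t in range(trials):
    Y = rng.standard_normal((m, d))      # i.i.d. N(0,1) entries
    Fx = (Y @ x) / np.sqrt(m)            # F(x) = Y x / sqrt(m)
    ratios[t] = (Fx @ Fx) / (x @ x)      # distributed as chi^2_m / m

emp = np.mean(np.abs(ratios - 1) >= delta)   # empirical failure frequency
bound = 2 * np.exp(-m * delta**2 / 8)        # tail bound from the display above
print(emp, bound)
```

The empirical frequency comes out well below the bound, as expected: subexponential tail bounds are loose by design but of the right exponential order.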

Applying the union bound over the  \displaystyle {n \choose 2}  points of the form  x = x_i - x_j,  1 \le i < j \le n,  we obtain

\displaystyle P\left(\exists i \ne j\,\,\, \text{s.t.}\, \left|\frac{\left\|F(x_i - x_j)\right\|_2^2}{\left\|x_i - x_j\right\|_2^2} - 1\right| \ge \delta \right) \le n^2 \exp\left( -\frac{m\delta^2}{8}\right),

and picking  m  as in the premise of the theorem makes the right-hand side at most  n^2 \exp(-2 \log n) = 1;  since the union is in fact over only  {n \choose 2} < n^2/2  pairs, the failure probability is strictly less than  1,  which guarantees the existence of the required map.  Moreover, choosing

\displaystyle m \ge \frac{16}{\delta^2} \, \log \left(\frac{n}{\sqrt{\alpha}}\right)

guarantees that a random map will preserve distances up to precision  \delta  with probability at least  1 - \alpha.  ♦
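The whole construction fits in a few lines of numpy. The sketch below (the values of  n,  d,  \delta,  \alpha  are arbitrary choices for illustration) draws a Gaussian  Y,  takes  m  from the refined bound above, and checks the worst pairwise distortion:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d, delta, alpha = 50, 1000, 0.5, 0.05      # illustrative parameters (assumptions)
# dimension from the refined bound m >= (16 / delta^2) log(n / sqrt(alpha))
m = int(np.ceil(16 / delta**2 * np.log(n / np.sqrt(alpha))))

X = rng.standard_normal((n, d))               # n points in R^d, as rows
Y = rng.standard_normal((m, d))               # the random matrix of the proof
FX = (X @ Y.T) / np.sqrt(m)                   # rows are F(x_i) = Y x_i / sqrt(m)

worst = 0.0                                   # worst relative distortion over all pairs
for i in range(n):
    for j in range(i + 1, n):
        num = np.sum((FX[i] - FX[j]) ** 2)
        den = np.sum((X[i] - X[j]) ** 2)
        worst = max(worst, abs(num / den - 1))
print(m, worst)
```

With probability at least  1 - \alpha  the printed distortion is below  \delta;  note that  m \approx 347  here, independent of  d = 1000,  in line with Remark 1.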

Remark.  As an immediate consequence, we get that a random Gaussian matrix satisfies RIP with high probability. A matrix  F \in \mathbb{R}^{m \times d}  satisfies the Restricted Isometry Property (RIP) with constant  \delta  if the double inequality of the Theorem holds for any  s-sparse vector  x.  Since this inequality is homogeneous in  x,  it suffices to prove it for sparse unit vectors; there are  {d \choose s} \le \left(\frac{ed}{s}\right)^s  possible supports, and a standard  \varepsilon-net argument within each  s-dimensional coordinate subspace reduces the claim to finitely many vectors per support. Taking the logarithm, we get that the random Gaussian matrix is  \delta-RIP w.h.p. whenever

\displaystyle m = O\left(\frac{1}{\delta^2}\right) s \log\left(\frac{d}{s}\right).

The theory of Compressed Sensing then shows that an  s-sparse signal in a high-dimensional ambient space  \mathbb{R}^d  may be recovered from  m  noisy linear measurements

\displaystyle y = Fx + \frac{\sigma}{\sqrt{m}} \xi, \quad \xi \sim \mathcal{N}(0,I_m).
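To make the remark concrete, here is a toy compressed-sensing experiment (all parameter values are made up for illustration) that recovers a sparse signal from noisy Gaussian measurements. For simplicity it uses Orthogonal Matching Pursuit, a greedy heuristic, rather than the  \ell_1-minimization usually analyzed via RIP:

```python
import numpy as np

rng = np.random.default_rng(1)
d, s, sigma = 400, 5, 0.05            # illustrative sizes (assumptions)
m = 120                               # on the order of s log(d/s), chosen generously

# s-sparse signal with coefficients bounded away from the noise level
x = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
x[support] = rng.uniform(1.0, 2.0, s) * rng.choice([-1.0, 1.0], s)

F = rng.standard_normal((m, d)) / np.sqrt(m)               # columns of norm ~ 1
y = F @ x + (sigma / np.sqrt(m)) * rng.standard_normal(m)  # noisy measurements

# Orthogonal Matching Pursuit: greedily pick the column most correlated
# with the residual, then refit by least squares on the chosen support.
S, r = [], y.copy()
for _ in range(s):
    k = int(np.argmax(np.abs(F.T @ r)))
    S.append(k)
    coef, *_ = np.linalg.lstsq(F[:, S], y, rcond=None)
    r = y - F[:, S] @ coef

x_hat = np.zeros(d)
x_hat[S] = coef
print(sorted(S), np.linalg.norm(x_hat - x))
```

With these sizes the greedy method recovers the true support with high probability, even though  m = 120 \ll d = 400.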

About Dmitry Ostrovsky

PhD student with interests in statistics, optimization, and machine learning.
This entry was posted in Course on Concentration Inequalities. Bookmark the permalink.
