## Subexponential distributions III — Johnson–Lindenstrauss lemma

In this post we will finish our affair with subexponential distributions by considering yet another application: an important statement called the Johnson–Lindenstrauss lemma. Essentially, it states that  $n$  points in a high-dimensional Euclidean space may be projected onto a subspace of dimension only  $O(\log n)$  in such a way that all pairwise distances are well preserved.

Theorem  (Johnson–Lindenstrauss)

Let  $\delta \in (0,1)$  and  $x_1, \dots, x_n \in \mathbb{R}^d.$  For any

$\displaystyle m \ge \frac{16}{\delta^2}\,\log n$

there exists a linear map $F: \mathbb{R}^d \to \mathbb{R}^m$  such that  for any  $\displaystyle 1 \le i < j \le n,$

$\displaystyle (1-\delta)\|x_i-x_j\|_2^2 \le \left\|F(x_i)-F(x_j)\right\|_2^2 \le(1+\delta)\|x_i-x_j\|_2^2.$

Remark 1.  Note that  $m$  depends on  $n$  but not on  $d.$  This is no surprise, since the affine dimension of  $n$  points  is  at most  $n-1.$

Remark 2.  One may actually require that  $F$  is a projection, but then the proof becomes a bit more technical.

Remark 3.  In the discrete case, an analogous statement holds for the Hamming distance.

Proof.  Consider the following random map, which is ‘almost’ a projection due to the normalization $1/\sqrt{m}:$

$\displaystyle F(x) = \frac{Yx}{\sqrt{m}},$

for the random matrix  $\displaystyle Y \in \mathbb{R}^{m \times d}$  with i.i.d.  $\displaystyle \mathcal{N}(0,1)$  entries.  Note that the entries of  $Yx/\|x\|_2$  are i.i.d.  $\mathcal{N}(0,1),$  so  $\displaystyle \frac{\|Yx\|_2^2}{\|x\|_2^2} \sim \chi_m^2,$  which is  $(4m, 4)$-subexponential.  As a result, we get that for any fixed  $x \in \mathbb{R}^d,$

$\displaystyle P\left(\left|\frac{\left\|F(x)\right\|_2^2}{\left\|x\right\|_2^2} - 1\right| \ge \delta \right) \le 2 \exp\left( -\frac{m\delta^2}{8}\right).$
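As a quick sanity check (not part of the proof), one can verify this tail bound by Monte Carlo: sample  $\chi_m^2/m$  and compare the empirical probability of a  $\delta$-deviation against  $2\exp(-m\delta^2/8).$  A minimal sketch, with  $m$  and  $\delta$  chosen arbitrarily:

```python
# Monte Carlo check of the subexponential tail bound for chi^2_m / m.
# The empirical probability of |chi^2_m / m - 1| >= delta should sit
# below the bound 2 * exp(-m * delta^2 / 8).
import numpy as np

rng = np.random.default_rng(0)
m, delta, trials = 100, 0.5, 200_000

# chi^2_m samples, normalized by their mean m
samples = rng.chisquare(df=m, size=trials) / m
empirical = np.mean(np.abs(samples - 1) >= delta)
bound = 2 * np.exp(-m * delta**2 / 8)

print(f"empirical tail: {empirical:.5f}, bound: {bound:.5f}")
assert empirical <= bound
```

The gap is typically large: the subexponential bound is far from tight for moderate  $m,$  which is exactly why the union bound below has room to spare.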

Applying the union bound for  $\displaystyle {n \choose 2}$  points of the form  $x = x_i - x_j,$  $1 \le i < j \le n,$  we obtain

$\displaystyle P\left(\exists i \ne j\,\,\, \text{s.t.}\, \left|\frac{\left\|F(x_i - x_j)\right\|_2^2}{\left\|x_i - x_j\right\|_2^2} - 1\right| \ge \delta \right) \le n^2 \exp\left( -\frac{m\delta^2}{8}\right),$

and picking  $m$  as in the premise of the theorem forces the probability to be strictly less than  $1$  (the union bound actually gives  $n(n-1)\exp(-m\delta^2/8) < n^2 \exp(-m\delta^2/8) \le 1$), which guarantees the existence of the required map.  Moreover, choosing

$\displaystyle m \ge \frac{16}{\delta^2} \, \log \left(\frac{n}{\sqrt{\alpha}}\right)$

guarantees that a random map will preserve distances up to precision  $\delta$  with probability at least  $1 - \alpha.$  ♦
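The whole construction fits in a few lines of code. The sketch below (with arbitrary choices of  $n,$  $d,$  $\delta,$  and  $\alpha$) draws the Gaussian matrix  $Y,$  forms  $F(x) = Yx/\sqrt{m}$  with  $m$  chosen as in the last display, and checks that every pairwise squared distance is preserved up to  $\delta$:

```python
# Empirical sketch of the theorem: project n points from R^d down to
# m = ceil(16/delta^2 * log(n / sqrt(alpha))) dimensions and check that
# all pairwise squared distances are preserved up to a factor 1 +/- delta.
import numpy as np

rng = np.random.default_rng(1)
n, d, delta, alpha = 50, 2_000, 0.5, 0.01
m = int(np.ceil(16 / delta**2 * np.log(n / np.sqrt(alpha))))  # = 398 here

X = rng.standard_normal((n, d))   # n points in R^d (one per row)
Y = rng.standard_normal((m, d))   # i.i.d. N(0,1) entries
FX = X @ Y.T / np.sqrt(m)         # F(x_i) for every point at once

# ratios ||F(x_i) - F(x_j)||^2 / ||x_i - x_j||^2 over all pairs i < j
i, j = np.triu_indices(n, k=1)
num = np.sum((FX[i] - FX[j]) ** 2, axis=1)
den = np.sum((X[i] - X[j]) ** 2, axis=1)
ratios = num / den

print(ratios.min(), ratios.max())
assert 1 - delta <= ratios.min() and ratios.max() <= 1 + delta
```

Note that  $m = 398$  here regardless of  $d$: replacing  $d = 2000$  by  $d = 10^6$  changes nothing in the target dimension, exactly as Remark 1 promises.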

Remark.  As an immediate consequence, we get that a random Gaussian matrix will satisfy RIP with high probability. A matrix  $F \in \mathbb{R}^{m \times d}$  satisfies the restricted isometry property (RIP) with constant  $\delta$  if the double inequality of the Theorem  holds for any  $s$-sparse vector  $x.$  Note that since this inequality is homogeneous in  $x,$  it suffices to prove it for  $s$-sparse unit vectors. These vectors live in a union of  ${d \choose s} \le \left(\frac{ed}{s}\right)^s$  coordinate subspaces, and a standard  $\varepsilon$-net argument within each subspace reduces the claim to a finite set of roughly this many points. Taking the logarithm, we get that the random Gaussian matrix will be  $\delta$-RIP w.h.p. if

$\displaystyle m = O\left(\frac{s}{\delta^2}\, \log\frac{d}{s}\right).$

Compressed sensing theory then shows that an  $s$-sparse signal in a high-dimensional ambient space  $\mathbb{R}^d$  may be recovered from  $m$  noisy linear measurements

$\displaystyle y = Fx + \frac{\sigma}{\sqrt{m}} \xi, \quad \xi \sim \mathcal{N}(0,I_m).$
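To make this concrete, here is a minimal recovery sketch in the noiseless case  $\sigma = 0$  (a simplification of the model above), using orthogonal matching pursuit (OMP) — one of several standard recovery algorithms, chosen here only for its brevity. The parameter choices are illustrative, not tight:

```python
# Compressed-sensing sketch (noiseless, sigma = 0): recover an s-sparse
# x in R^d from m << d random Gaussian measurements y = F x via
# orthogonal matching pursuit (OMP).
import numpy as np

rng = np.random.default_rng(2)
d, s = 256, 4
m = int(np.ceil(4 * s * np.log(d)))   # ~ s log d measurements, ample here

F = rng.standard_normal((m, d)) / np.sqrt(m)
x = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=s) * (1 + rng.random(s))
y = F @ x                              # noiseless measurements

# OMP: greedily pick the column most correlated with the residual,
# then re-fit by least squares on the support chosen so far.
S, r = [], y.copy()
for _ in range(s):
    S.append(int(np.argmax(np.abs(F.T @ r))))
    coef, *_ = np.linalg.lstsq(F[:, S], y, rcond=None)
    r = y - F[:, S] @ coef

x_hat = np.zeros(d)
x_hat[S] = coef
print(np.linalg.norm(x_hat - x))
assert np.linalg.norm(x_hat - x) < 1e-6
```

With  $m \approx 4 s \log d$  Gaussian measurements the support is identified exactly and the least-squares re-fit recovers  $x$  to machine precision; in the noisy model above one would instead use a convex relaxation such as the Lasso, with error controlled by  $\sigma.$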