## Thresholding for sparse signals. Square-Root Lasso

In this post we continue with soft thresholding. Our estimator $\widehat \theta$ is the optimal solution (unique, by strong convexity of the objective) to $\displaystyle \min_{\theta} \|\theta -y\|_2^2 + \lambda \|\theta\|_1,$

and the observation model is $y = \theta^* + z,$ $z \sim\sigma_n\,\mathcal{N}(0,I_n);$ note that we absorbed the noise level $\sigma_n := \frac{\sigma}{\sqrt{n}}$ into $z$. We are interested in how the loss of $\widehat \theta$ compares with that of ‘the oracle’ $\theta^*$, whose loss is zero. We will assume that the signal is sparse; specifically, its support is $S \subset [n]$ with $s:=|S|.$
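Since the objective separates across coordinates, the minimizer is given coordinate-wise by soft thresholding; with the squared loss written without a $\frac{1}{2}$ factor, the threshold is $\lambda/2$. A minimal NumPy sketch (the seed, sparsity pattern, and choice of $\lambda$ below are my illustration, not from the post):

```python
import numpy as np

def soft_threshold(y, t):
    """Coordinate-wise soft thresholding: sign(y_i) * max(|y_i| - t, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

# Illustrative s-sparse signal observed in Gaussian noise (hypothetical values).
rng = np.random.default_rng(0)
n, s, sigma = 1000, 10, 1.0
sigma_n = sigma / np.sqrt(n)          # noise level absorbed into z
theta_star = np.zeros(n)
theta_star[:s] = 1.0                  # support S of size s
y = theta_star + sigma_n * rng.standard_normal(n)

# For min ||theta - y||_2^2 + lam * ||theta||_1, the minimizer thresholds at lam/2.
lam = 2 * sigma_n * np.sqrt(2 * np.log(n))   # a "universal" choice, up to constants
theta_hat = soft_threshold(y, lam / 2)
print(np.sum((theta_hat - theta_star) ** 2))  # loss of theta_hat; the oracle's is 0
```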


## Thresholding over the l1-ball

For a long time I have wanted to write a post giving an elementary introduction to soft and hard thresholding. The time has come to pay my dues, so here we go.

## Formulations of Mirror Descent

A very nice reference for this post is [this blog post].

Mirror Descent is a first-order algorithm to solve a convex optimization problem $\min_{x \in X} f(x).$  In this post we will learn several equivalent formulations of this algorithm and discuss how they are related to each other.
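As a concrete instance (my own illustration; the post's formulations may differ): with the negative-entropy mirror map on the probability simplex, the mirror descent update becomes the multiplicative, exponentiated-gradient rule $x^+ \propto x \odot e^{-\eta \nabla f(x)}$. A minimal sketch:

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, eta=0.1, n_iters=500):
    """Entropic mirror descent (exponentiated gradient) on the simplex.

    With the negative-entropy mirror map, the gradient step in the dual
    followed by the Bregman projection reduces to a multiplicative update
    and a renormalization.
    """
    x = x0.copy()
    for _ in range(n_iters):
        x = x * np.exp(-eta * grad_f(x))  # step through the mirror map
        x /= x.sum()                      # Bregman projection back onto the simplex
    return x

# Toy usage: minimize f(x) = 0.5 * ||x - c||^2 over the simplex.
c = np.array([0.7, 0.2, -0.1])
x = mirror_descent_simplex(lambda x: x - c, x0=np.ones(3) / 3)
print(x, x.sum())  # a point of the simplex close to the projection of c
```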

## Brunn-Minkowski, Prékopa-Leindler, and isoperimetry

I found a great overview of the subject with a lot of insight. Another brief intro is here.

## Lagrange duality via the Fenchel conjugate. The dual of ERM

A recurring trick in optimization is the characterization of the Lagrange dual problem in terms of the much simpler Fenchel conjugate. This trick is used in dual methods for Machine Learning and Signal Processing, in ADMM, in the Augmented Lagrangian method, and so on.
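To fix ideas, here is the computation sketched in my own notation (the post's conventions may differ). For regularized ERM with data matrix $A$ (rows $a_i^\top$), losses $\ell_i$, and regularizer $R$, split off $u = Aw$ and dualize the constraint with multipliers $\alpha$: $\displaystyle \min_{w}\, \sum_{i=1}^m \ell_i(a_i^\top w) + \lambda R(w) \;=\; \min_{w,\,u:\, u = Aw}\, \sum_{i=1}^m \ell_i(u_i) + \lambda R(w).$ Minimizing the Lagrangian $\sum_i \ell_i(u_i) + \lambda R(w) + \alpha^\top(Aw - u)$ separately over $u$ and $w$ produces Fenchel conjugates: $\displaystyle \inf_{u}\Big[\sum_i \ell_i(u_i) - \alpha^\top u\Big] = -\sum_i \ell_i^*(\alpha_i)$ and $\displaystyle \inf_{w}\big[\lambda R(w) + \alpha^\top A w\big] = -\lambda R^*\!\big(-A^\top\alpha/\lambda\big),$ so the Lagrange dual is expressed purely via conjugates: $\displaystyle \max_{\alpha}\; -\sum_{i=1}^m \ell_i^*(\alpha_i) - \lambda R^*\!\big(-A^\top\alpha/\lambda\big).$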


## ‘Shear’ convex polyhedra: solution

Here I describe the solution to the problem from the last post. Spoilers!

## ‘Shear’ convex polyhedra

I came up with what seems to be quite a beautiful geometric construction solving the problem described below. For a compact convex set $C \subset \mathbb{R}^n$ consider the function $\displaystyle w_1(x) = \max_{y \in C}\, y_1 - \min_{y \in C}\, y_1 \quad \text{s.t.}\quad y_2=x_2,\, \ldots,\, y_n = x_n$ (the constraint applies to both the maximum and the minimum),

where we denote $x = (x_1,\, \ldots,\, x_n).$ In other words, “hanging” at the point $x,$ we measure the “width” of the set along the first axis of the Cartesian system (note that $w_1(x)$ in fact does not depend on $x_1$). In the same fashion we define the functions $w_2(x),\, \ldots,\, w_n(x)$ ($w_i(x)$ measures the “width” along the $i$-th axis). Now consider the following functionals of $C$: $\displaystyle w^*(C) = \max_{x \in C} \sum_{i=1}^n w_i(x), \qquad w_i^*(C) = \max_{x \in C}\, w_i(x), \qquad W^*(C) = \sum_{i=1}^n w_i^*(C).$
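When $C$ is a polytope in halfspace form, $C = \{y : Ay \le b\}$ (a hypothetical representation chosen for illustration; the post does not fix one), each $w_i(x)$ is the gap between two linear programs over the section of $C$ through $x$. A minimal sketch with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def width(A, b, x, i):
    """w_i(x) for the polytope C = {y : A y <= b}: the extent of C along
    axis i over the line through x parallel to e_i (it ignores x_i)."""
    n = len(x)
    c = np.zeros(n)
    c[i] = 1.0
    # Fix all coordinates except the i-th: y_j = x_j for j != i.
    A_eq = np.delete(np.eye(n), i, axis=0)
    b_eq = np.delete(x, i)
    opts = dict(A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq, bounds=[(None, None)] * n)
    lo = linprog(c, **opts)    # minimizes y_i over the section
    hi = linprog(-c, **opts)   # maximizes y_i over the section
    return -hi.fun - lo.fun

# Toy usage: the unit square [0, 1]^2, where every section has width 1.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
print(width(A, b, x=np.array([0.5, 0.5]), i=0))  # -> 1.0
```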
