Since long ago I wanted to write a post with an elementary introduction to soft / hard thresholding. The time has come to pay the dues, so here we go.

We are concerned with the standard problem of estimating a function on the unit grid on

Given an estimator of let us evaluate the error in terms of the mean square error restricted to the grid,

where is the empirical norm of the signal sampled signal which approximates the functional -norm on a finite grid. Since the loss is rotation-invariant, we may transfer the problem in the frequency domain by means of the Discrete Fourier transform, dealing now with the “small noise”-observations

for which Now, given a level consider the *soft* *thresholding *estimator

which can also be defined as the unique (due to strong convexity) solution of

and its counterpart, the *hard thresholding* estimator with if and zero otherwise.

TheoremGiven the norm of the true signal both estimators with thresholding level provide that with probability at least

Moreover, this is the minimax risk for an -ball.

**Remark.** As an illustration of this result, consider first the Bayesian formulation with the restriction As such, there is no hope of consistent estimation, since and are of the same order (if that ‘argument’ did not convince you, apply the Van Trees inequality). And indeed, one may apply the *shrinkage estimator *(also called the Wiener filter)

with the shrinkage parameter (to get the optimal shrinkage, one may guess that the least favorable distribution is Gaussian and then explicitly minimize the corresponding Bayesian risk). The corresponding MSE is just 1. But if we restrict the priors to be such that Theorem 1 will give

**Proof. **We will prove only the upper bound. For that, first notice that the loss of hard thresholding can never be greater than that of soft thresholding with the same threshold (it is the same if we are below the threshold and obviously not greater when we are above), hence we can only analyze soft thresholding.

The proof is based on a very simple idea of the *oracle feasibility*. It is easy to check that the soft thresholding estimator is the global minimizer of hence

Expanding the squared norm and subtracting from both sides, we get

Note that now in the left-hand side we have exactly what we needed, so we need to get things done with the right-hand side. The idea is to choose the minimal such that the sum of the last two terms (where the unknown messes things up) is negative. Since and using the concentration of (it is Lipschitz, and the expectation may be worked out easily), we can get that

whence with probability at least

Picking such and using Young’s inequality once more to upper bound we finish the proof. ♦