Since long ago I wanted to write a post with an elementary introduction to soft / hard thresholding. The time has come to pay the dues, so here we go.
We are concerned with the standard problem of estimating a function on the unit grid on
Given an estimator of let us evaluate the error in terms of the mean square error restricted to the grid,
where is the empirical norm of the signal sampled signal which approximates the functional -norm on a finite grid. Since the loss is rotation-invariant, we may transfer the problem in the frequency domain by means of the Discrete Fourier transform, dealing now with the “small noise”-observations
for which Now, given a level consider the soft thresholding estimator
which can also be defined as the unique (due to strong convexity) solution of
and its counterpart, the hard thresholding estimator with if and zero otherwise.
Given the norm of the true signal both estimators with thresholding level provide that with probability at least
Moreover, this is the minimax risk for an -ball.
Remark. As an illustration of this result, consider first the Bayesian formulation with the restriction As such, there is no hope of consistent estimation, since and are of the same order (if that ‘argument’ did not convince you, apply the Van Trees inequality). And indeed, one may apply the shrinkage estimator (also called the Wiener filter)
with the shrinkage parameter (to get the optimal shrinkage, one may guess that the least favorable distribution is Gaussian and then explicitly minimize the corresponding Bayesian risk). The corresponding MSE is just 1. But if we restrict the priors to be such that Theorem 1 will give
Proof. We will prove only the upper bound. For that, first notice that the loss of hard thresholding can never be greater than that of soft thresholding with the same threshold (it is the same if we are below the threshold and obviously not greater when we are above), hence we can only analyze soft thresholding.
The proof is based on a very simple idea of the oracle feasibility. It is easy to check that the soft thresholding estimator is the global minimizer of hence
Expanding the squared norm and subtracting from both sides, we get
Note that now in the left-hand side we have exactly what we needed, so we need to get things done with the right-hand side. The idea is to choose the minimal such that the sum of the last two terms (where the unknown messes things up) is negative. Since and using the concentration of (it is Lipschitz, and the expectation may be worked out easily), we can get that
whence with probability at least
Picking such and using Young’s inequality once more to upper bound we finish the proof. ♦