Compressed sensing review (1): Reconstruction Algorithms

CS reconstruction algorithms are always the most astonishing thing for people who know compressed sensing at the first time. Because only a few sampling (much less than Shannon-Nyquist sampling rate) can perfectly reconstruct the whole signal is really a big surprise. Rice’s single-pixel camera can recover a complex image after sampling only several random projections of its pixels:

Rice's Single-pixel camera

Another successful example is the CS reconstruction for fMRI imaging (reconstruction, denoising, deblurring). Since 1) the fMRI image is naturally compressible in certain transform domains, e.g., wavelet, DCT, DFT; 2) fMRI scanners naturally sample the image in spatial frequency domain, where the signal is sparse, CS can reconstruct the fMRI images from a few samples via fMRI scanners. SparseMRI by Michael Lustig, David Donoho and John M. Pauly is a successful application of CS in fMRI imaging.

The more astonishing thing than the accurate reconstruction is that the magical reconstruction can be easily obtained by easy convex optimization (e.g., linear programming). Roughly to say, in compressed sensing, we have measurements (samples) y, measuring matrix A, and we know y=Ax, wherein x is the sparse signal we need to recover. Since the number of measurements \|y\|_0 are much less than the dimension of the signal \|x\|_0, directly solving linear system y=Ax is impossible and ill-posed. Thus we have to further restrict the solution x in a small region where uniqueness of x can be guaranteed, just like we did in regularization. Since we know x is sparse, we can restrict the number of nonzero entries in x, i.e.,


Sometimes it is equivalent to

\min_x~~\|y-Ax\|_2^2~~~~~s.t.~~\|x\|_0\leq k.

Greedy algorithms can solve this problem by selecting the most significant variable in x for decreasing the least square error \|y-Ax\|_2^2 once a time. The greedy search starts from x=0. In each iteration, the most important and unselected variable (different importance criterion can be designed) is added to the support set, then \|y-Ax\|_2^2 is minimized on the support set composed of the selected variables. The iterations stop when the number of the selected variables reach k. Famous representatives are Orthogonal matching pursuit (OMP)compressive sampling matching pursuit (CoSaMP) and original lasso.

However, the \ell_0 norm of x is not convex and thus the convergence to the global optimum is hard to be ensured for greedy algorithms. In compressed sensing, E. J. Candès has proved that when the RIP of the measurement matrix A satisfies certain condition (\delta_{2k}\leq\sqrt{2}-1), the \ell_0 norm minimization can be equivalently replaced by its convex relaxation \ell_1 norm (we will go back to this in later notes). Thus the problems are relaxed as:



To see why \|x\|_1 can encourage a sparse solution x, we show the convergence processes of \ell_1 constrained least square and \ell_2 constrained least square: The black spot is the starting point and the red one is the final solution. The above optimization problems are convex and have global optimums. Thus many algorithms based on convex optimization methods are proposed. But the \ell_1 norm in them is not smooth and thus the gradient w.r.t x near x_i=0 is cannot be obtained. Unfortunately most convex optimization methods are based on gradient. Hence the \ell_1 based compressed sensing problem should be transformed to more convenient forms. The most common method to eliminate the \ell_1 norm is to double the size of x. In particular, the \ell_1 norm$ can be equivalently denoted by \|x\|_1=u+v, where u=\max(0,x) is the positive part of x and v=-\min(0,x) is the negative part of x. This is exactly the technique that has been applied in Basis Pursuit (BP) and LP decoding.

\min_{u,v}~~\sum_i(u_i+v_i)~~~~~s.t.~~y=A(u-v), u\geq 0, v\geq 0.

Then standard linear programming can be applied to solve this problem. However, the time cost of this LP is not satisfying and another easier optimization formulation using \ell_1 penalty is often adopted:


This is an unconstrained formulation of compressed sensing. Its equivalence to the problem \min_x~~\|y-Ax\|_2^2~~~~~s.t.~~\|x\|_1\leq\epsilon has been mentioned in many papers, but how to calculate \lambda from \epsilon and vice versa to obtain an equivalent pair is still an open problem. In existing algorithms, \lambda is determined by so called “continuation” or “warm start” or “homotopy” method, in which a series of above optimization problems with different \lambda from large to small are solved, the solution of the previous optimization is adopted as the initial point of the next one, a solution path is then constructed and the preferred solution is selected from the path. Least angle regression (LARS) and its variants are the representatives.

Gradient Projection for Sparse Reconstruction (GPSR) doubles the variable size of the \ell_1 penalize compressed sensing problem:

\min_{u,v}~~\|y-A(u-v)\|_2^2+\lambda(1^Tu+1^Tv)~~~~~s.t.~~u,v\geq 0.

Gradient projection method is applied to optimize the above problem. GPSR decreases the objective function on the negative gradient direction and projects the solution onto the nonnegative quadrant in each iteration. Another algorithm doubles the variable size and solves the \ell_1 penalize compressed sensing problem is an interior point method for large-scale \ell_1 regularized regression. It applies a truncated newton interior point method to solve the following problem:

\min_{x,u}~~\|y-Ax\|_2^2+\lambda 1^Tu~~~~~s.t.~~-u\leq x\leq u.

A logarithmic barrier is conveniently build from the bound constraint in above problem to keep the central path from the bound in primal interior point method. Then a preconditioned conjugate gradient method is utilized to approximate the Hessian matrix. Thus the computation is accelerated.

Another method to eliminate the \ell_1 norm is to replace it with a smooth approximation that is differentiable everywhere. An efficient smoothing method is proposed in Nesterov‘s Smooth minimization of non-smooth functions paper. It firstly writes the nonsmooth function minimization in a minimax way:

\min_x\|x\|_1=\min_x\max_uu^Tx, -1\leq u_i\leq 1.

Then smoothing \ell_1 norm is to add a prox-function of u to the minimax objective function:

smoothed~~\min_x\|x\|_1=\min_x\max_uu^Tx+\frac{\mu}{2}\|u\|_2^2, -1\leq u_i\leq 1.

The \ell_1 norm and smoothed \ell_1 norm are compared in the following figure. \mu is a parameter to adjust the smoothing accuracy.

L1 norm and its smooth approximation

When \ell_1 norm is replaced by its smooth approximation, most existing convex optimization methods are available to the compresses sensing problem. Nesterov’s method is a fast gradient method with the optimal \mathcal O(1/k^2) convergence rate. It combines the current gradient and the historical gradient information to determine the optimization direction of each iteration. In compressed sensing, NESTA and FISTA are based on Nesterov’s method, the former adopts the smooth approximation of \ell_1 norm.

Iterative thresholding is a another kind of compressed sensing algorithm with fast speed for solving the \ell_1 penalized least square problem. The algorithm is composed of two stages: optimization of the least square term and decreasing of \ell_1 norm. The former stage is accomplished by solving the optimization problem without \ell_1 penalty, while the later stage is finished by thresholding of the magnitudes of entries in x. There are two kinds of thresholding have been used. One is hard thresholding, which directly assigns the entries with smallest magnitudes to zero. Another is soft thresholding, which subscribes all the entries’ magnitudes by \delta and assigns the ones whose magnitudes are less than \delta to zero.

Representative algorithms are message passing, iterative splitting and thresholding (IST) and iterated hard shrinkage. Iterative thresholding methods are competitive in speed. But it is not easy to demonstrate their convergence to the global optimum in lots of situations.

Bregman iterative algorithm and Fixed point continuation (FPC) are developed by CAAM at Rice University around 2007. For \ell_1 penalized least square problem, the Bregman distance is defined by:

D(x,x^k)=\|x\|_1-\|x^k\|_1-\langle p, x-x^k\rangle, p=\partial \|x\|_1.

The subgradient p is defined as:

p_i={\rm sign}(x_i), {\rm if} ~~x_i\neq 0

p_i\in[-1,1], {\rm if} ~~x_i= 0

In each iteration, a subproblem with Bregman distance regularization is solved:

x^{k+1}=\arg\min_x\|y-Ax\|_2^2+\lambda D(x,x^k).

This gives a soft-thresholding of x. And then the subgradient p is updated:


The speed of Bregman iterative algorithm and FPC (FPC_AS is faster) is very appealing and this kind of method has very good theoretical properties in convergence and optimality. Moreover, it has connection to widely used soft-thresholding method and augmented Lagrangian method. FPC and FPC_AS has successful applications in totalvariation minimization in image denoising.

In this note, the existing compressed sensing algorithms are classified into four sorts: those based on greedy search, those based on convex optimization, those based on iterative thresholding and those based on Bregman iteration. I believe there are a great amount of compressed sensing algorithms that are not included in this note. To find the other algorithms, their papers and codes, you can refer to Compressed sensing: A Big Picture.

There are also some model selection algorithms proposed in statistics literatures which can be applied to compressed sensing problems, e.g., Least angle regression, Elastic net and so on. These algorithms will be discussed in another note about the relationship between compressed sensing and statistics.


About tianyizhou

Research Assistant at University Washington, Seattle, Working on Machine Learning & Statistics in MODE lab leaded by Prof. Carlos Guestrin, and MELODI lab leaded by Prof. Jeff Bilmes.
This entry was posted in notes and tagged , , , , , , , , . Bookmark the permalink.

6 Responses to Compressed sensing review (1): Reconstruction Algorithms

  1. idmsj says:

    Great review! But I wonder if it’s okay to miss reconstruction with noise here, which leads to quadratic constrained LP.

    Latex eqns not properly displayed several places, e.g. $\ell_1$ and $\ell_2$ above the 2nd figure, the $\mathcal O(1/k^2)$ convergence rate of Nestorov method, should the subgradient be defined explicitly?

    • tianyizhou says:

      Thx a lot for the suggestions! The LaTeX eqns are already revised and the definition of subgradient is given in the post.

      I am not very sure about the “reconstruction with noise” you mentioned, do you mean \min_x\|x\|_1~~~s.t.~~\|y-Ax\|_2^2\leq\epsilon?

  2. codemasters says:

    Good overview and explanation for people new to compressive reconstruction. Thanks!

  3. Hao Wang says:

    Forward your new note “relationship between compressed sensing and statistics”
    Could you please assess the algorithm “pathwise coordinate descent” proposed by Friedman
    Your note is the most comprehensive introduction of compress sensing I have ever seen.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s