Hamming Compressed Sensing-recovering k-bit quantization from 1-bit measurements with linear non-iterative algorithm

We developed a new compressed sensing type signal acquisition paradigm called “Hamming Compressed Sensing (HCS)” to recover signal’s k-bit quantization rather than itself. Directly recovering quantization is much more preferred in practical digital systems. HCS provides a linear, non-iterative quantization recovery algorithm requiring a small amount of measurements, with strong theoretical guarantee of the recovery success. The underlying idea of HCS is to sacrifice the inevitable quantization error for reducing measurements and improving recovery efficiency substantially. Another compelling property of HCS is that it provides a very low-cost sampling scheme.

You can find the submitted HCS paper on Arxiv and especially more information (such as experimental results) on the HCS homepage.

Here is the abstract:

Compressed sensing (CS) and 1-bit CS cannot directly recover quantized signals and require time consuming recovery. In this paper, we introduce \textit{Hamming compressed sensing} (HCS) that directly recovers a k-bit quantized signal of dimensional $n$ from its 1-bit measurements via invoking $n$ times of Kullback-Leibler divergence based nearest neighbor search. Compared with CS and 1-bit CS, HCS allows the signal to be dense, takes considerably less (linear) recovery time and requires substantially less measurements ($\mathcal O(\log n)$). Moreover, HCS recovery can accelerate the subsequent 1-bit CS dequantizer. We study a quantized recovery error bound of HCS for general signals and “HCS+dequantizer” recovery error bound for sparse signals. Extensive numerical simulations verify the appealing accuracy, robustness, efficiency and consistency of HCS.

HCS aims to:

Recovers a k-bit quantized signal from its 1-bit measurements.

Highlighting Features of HCS:

1. HCS provides simple and low-cost sampling and recovery schemes: the sampling and sensing are integrated to 1-bit measurements, while the recovery and quantization are integrated to quantized recovery. Furthermore, both the 1-bit measurement and the quantized recovery do not require analog-to-digital converters (ADC) with a high sampling rate.

2. The recovery in HCS only requires to compute nk Kullback-Leibler (KL) divergences for obtaining k-bit recovery of an n-dimensional signal, and thus is a non-iterative, linear algorithm. Only simple computations are required.

3. According to the theoretical analysis of HCS, merely m=O(log n) 1-bit measurements are sufficient to produce a successful quantized recovery with high probability. Note there is no sparse assumption to the signal x.
Posted in Tianyi's work | Tagged , , , | 3 Comments

News about GoDec code and ICML 2011 paper

We recently published a google site for Go Decomposition (GoDec), presented on ICML 2011. Now you can find all the available information and upcoming news about GoDec on


On the new site, there are 3 resources about GoDec we would like to highlight and share with you:

1. Videos for the background modeling experiments on 4 200-frame video sequences, all the sequences are decomposed as X=L+S+G and shown on Youtube.

2. The MatLab code for GoDec algorithm and background modeling. Note this is a beta version, so further revision and modification for acceleration and clear readability can be expected.

3. Video for ICML 2011 presentation can be found here.

Posted in Tianyi's work | Tagged , , , , , , , | 5 Comments

GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case

This paper is accepted by ICML 2011 for presentation. Now the final version is ready and can be downloaded from here GO.


Low-rank and sparse structures have been profoundly studied in matrix completion and compressed sensing. In this paper, we develop “Go Decomposition” (GoDec) to efficiently and robustly estimate the low-rank part $L$ and the sparse part $S$ of a matrix $X=L+S+G$ with noise $G$. GoDec alternatively assigns the low-rank approximation of $X-S$ to $L$ and the sparse approximation of $X-L$ to $S$. The algorithm can be significantly accelerated by bilateral random projections (BRP). We also propose GoDec for matrix completion as an important variant. We prove that the objective value $\|X-L-S\|_F^2$ converges to a local minimum, while $L$ and $S$ linearly converge to local optimums. Theoretically, we analyze the influence of $L$, $S$ and $G$ to the asymptotic/convergence speeds in order to discover the robustness of GoDec. Empirical studies suggest the efficiency, robustness and effectiveness of GoDec comparing with representative matrix decomposition and completion tools, e.g., Robust PCA and OptSpace.

The most interesting aspect of GoDec is that it builds an extremely simple model and develops an fast, randomized algorithm to produce an approximated low-rank + sparse decomposition when rank and cardinality are constrained, and the extra noise exists. This is always the case in real applications, where the model complexity needs to be pre-determined, and neither the low-rank nor sparse part is the noise.

Another interesting aspect is that the linear convergence, asymptotic speed and robustness to the noise of GoDec can be well proved in the scheme of “alternating projections on manifolds”. Actually, the naive GoDec can be written as alternating projections of one variable on two manifolds. By analysing the properties of the manifolds and projections, some significant theoretical conclusions can be drawn.

An important variant of GoDec can be applied to matrix completion problem with competitive efficiency.

The GoDec algorithm itself is fairly simple for implementation.

Here is two applications of GoDec in computer vision (similar effect with RPCA, but with different model assumption and algorithm, and much faster speed):

Experiment on background modeling of four 200-frame surveillance video sequences

Top left: lobby in an office building (resolution 128X160, learning time 39.75 seconds). Top right: shopping center (resolution 256X320, learning time 203.72 seconds). Bottom left: Restaurant (resolution 120X160, learning time 36.84 seconds). Bottom right: Hall of a business building (resolution 144X176, learning time 47.38 seconds).

Experiment on shadow/light removal of face images from four individuals in Yale B database.

Each individual has 64 images with resolution 192X168 and needs 24 seconds learning time.

It is worthy to note the difference between GoDec and Robust PCA with its extension stable principal component pursuit (SPCP). They have intersections on their application tasks. But their assumptions to data, their models and optimization algorithms are extirely different, thus they works for different situations.

Posted in Tianyi's work | Tagged , , , , , , , , | 2 Comments

Multi-label Learning via Structured Decomposition and Group Sparsity

This paper is now available on arxiv:


In multi-label learning, each sample is associated with several labels. Existing works indicate that exploring correlations between labels improve the prediction performance. However, embedding the label correlations into the training process significantly increases the problem size. Moreover, the mapping of the label structure in the feature space is not clear.

In this paper, we propose a novel multi-label learning method “Structured Decomposition + Group Sparsity (SDGS)”. In SDGS, we learn a feature subspace for each label from the structured decomposition of the training data, and predict the labels of a new sample from its group sparse representation on the multi-subspace obtained from the structured decomposition.

In particular, in the training stage, we decompose the data matrix $X\in R^{n\times p}$ as $X=\sum_{i=1}^kL^i+S$, wherein the rows of $L^i$ associated with samples that belong to label $i$ are nonzero and consist a low-rank matrix, while the other rows are all-zeros, the residual $S$ is a sparse matrix. The row space of $L_i$ is the feature subspace corresponding to label $i$. This decomposition can be efficiently obtained via randomized optimization. In the prediction stage, we estimate the group sparse representation of a new sample on the multi-subspace via group \emph{lasso}. The nonzero representation coefficients tend to concentrate on the subspaces of labels that the sample belongs to, and thus an effective prediction can be obtained.

SDGS finds the mapping of labels in the feature space, where the label correlations are naturally preserved in the corresponding mappings. Thus it explores the label structure without increasing the problem size. SDGS is robust to the imbalance between positive and negative samples, because it uses group sparsity in the multi-subspace to select the labels, which considers both the discriminative and relative information between the mappings of labels in feature subspace.

Posted in Tianyi's work | Tagged , , , , , , | Leave a comment

NESVM: a Fast Gradient Method for Support Vector Machines

This is a work we finished one year ago and is accepted as a regular paper in ICDM 2010. NESVM is a fast SVM algorithm for primal linear and kernel SVM problems, with the optimal convergence rate of first-order method and simple computation in each iteration. The PDF of the paper can be downloaded here:

NESVM: a Fast Gradient Method for Support Vector Machines

You can also find it on arXiv. SVM is a fundamental classification tool for various machine learning tasks. However, its efficiency is not satisfying for large-scale classification problems. There exist several appealing SVM solvers, e.g., LIBSVM, SVM-Light, SVM-Perf, Pegasos. Most of them focus on solving dual SVM or solving primal SVM in an online way.

Compared with existing SVM solvers, NESVM is competitive in three aspects:

  • NESVM is a fast gradient method with the optimal convergence rate at \mathcal O(1/k^2) and linear time complexity. It is based on Nesterov’s method proposed in paper Smooth minimization of non-smooth functions. Only two matrix-vector multiplications are required in each iteration round.
  • NESVM directly tackles the original primal problem of batch SVM and its implementation is fairly simple, thus it can be easily extended to and embedded in many other complex classification tasks.
  • NESVM proposes novel smooth and differentiable alternatives for hinge loss and \ell_1 norm, which are commonly used in machine learning and difficult to optimize because they are not differentiable everywhere.

We compared the performance of NESVM with other popular SVM solvers on several datasets, here is an example of event recognition in computer vision:

Time costs vs. constant C in event recognition

I will post the code of NESVM in a few days.

Posted in Tianyi's work | Tagged , , , , | 2 Comments

Compressed sensing review (1): Reconstruction Algorithms

CS reconstruction algorithms are always the most astonishing thing for people who know compressed sensing at the first time. Because only a few sampling (much less than Shannon-Nyquist sampling rate) can perfectly reconstruct the whole signal is really a big surprise. Rice’s single-pixel camera can recover a complex image after sampling only several random projections of its pixels:

Rice's Single-pixel camera

Another successful example is the CS reconstruction for fMRI imaging (reconstruction, denoising, deblurring). Since 1) the fMRI image is naturally compressible in certain transform domains, e.g., wavelet, DCT, DFT; 2) fMRI scanners naturally sample the image in spatial frequency domain, where the signal is sparse, CS can reconstruct the fMRI images from a few samples via fMRI scanners. SparseMRI by Michael Lustig, David Donoho and John M. Pauly is a successful application of CS in fMRI imaging.

The more astonishing thing than the accurate reconstruction is that the magical reconstruction can be easily obtained by easy convex optimization (e.g., linear programming). Roughly to say, in compressed sensing, we have measurements (samples) y, measuring matrix A, and we know y=Ax, wherein x is the sparse signal we need to recover. Since the number of measurements \|y\|_0 are much less than the dimension of the signal \|x\|_0, directly solving linear system y=Ax is impossible and ill-posed. Thus we have to further restrict the solution x in a small region where uniqueness of x can be guaranteed, just like we did in regularization. Since we know x is sparse, we can restrict the number of nonzero entries in x, i.e.,


Sometimes it is equivalent to

\min_x~~\|y-Ax\|_2^2~~~~~s.t.~~\|x\|_0\leq k.

Greedy algorithms can solve this problem by selecting the most significant variable in x for decreasing the least square error \|y-Ax\|_2^2 once a time. The greedy search starts from x=0. In each iteration, the most important and unselected variable (different importance criterion can be designed) is added to the support set, then \|y-Ax\|_2^2 is minimized on the support set composed of the selected variables. The iterations stop when the number of the selected variables reach k. Famous representatives are Orthogonal matching pursuit (OMP)compressive sampling matching pursuit (CoSaMP) and original lasso.

However, the \ell_0 norm of x is not convex and thus the convergence to the global optimum is hard to be ensured for greedy algorithms. In compressed sensing, E. J. Candès has proved that when the RIP of the measurement matrix A satisfies certain condition (\delta_{2k}\leq\sqrt{2}-1), the \ell_0 norm minimization can be equivalently replaced by its convex relaxation \ell_1 norm (we will go back to this in later notes). Thus the problems are relaxed as:



To see why \|x\|_1 can encourage a sparse solution x, we show the convergence processes of \ell_1 constrained least square and \ell_2 constrained least square: The black spot is the starting point and the red one is the final solution. The above optimization problems are convex and have global optimums. Thus many algorithms based on convex optimization methods are proposed. But the \ell_1 norm in them is not smooth and thus the gradient w.r.t x near x_i=0 is cannot be obtained. Unfortunately most convex optimization methods are based on gradient. Hence the \ell_1 based compressed sensing problem should be transformed to more convenient forms. The most common method to eliminate the \ell_1 norm is to double the size of x. In particular, the \ell_1 norm$ can be equivalently denoted by \|x\|_1=u+v, where u=\max(0,x) is the positive part of x and v=-\min(0,x) is the negative part of x. This is exactly the technique that has been applied in Basis Pursuit (BP) and LP decoding.

\min_{u,v}~~\sum_i(u_i+v_i)~~~~~s.t.~~y=A(u-v), u\geq 0, v\geq 0.

Then standard linear programming can be applied to solve this problem. However, the time cost of this LP is not satisfying and another easier optimization formulation using \ell_1 penalty is often adopted:


This is an unconstrained formulation of compressed sensing. Its equivalence to the problem \min_x~~\|y-Ax\|_2^2~~~~~s.t.~~\|x\|_1\leq\epsilon has been mentioned in many papers, but how to calculate \lambda from \epsilon and vice versa to obtain an equivalent pair is still an open problem. In existing algorithms, \lambda is determined by so called “continuation” or “warm start” or “homotopy” method, in which a series of above optimization problems with different \lambda from large to small are solved, the solution of the previous optimization is adopted as the initial point of the next one, a solution path is then constructed and the preferred solution is selected from the path. Least angle regression (LARS) and its variants are the representatives.

Gradient Projection for Sparse Reconstruction (GPSR) doubles the variable size of the \ell_1 penalize compressed sensing problem:

\min_{u,v}~~\|y-A(u-v)\|_2^2+\lambda(1^Tu+1^Tv)~~~~~s.t.~~u,v\geq 0.

Gradient projection method is applied to optimize the above problem. GPSR decreases the objective function on the negative gradient direction and projects the solution onto the nonnegative quadrant in each iteration. Another algorithm doubles the variable size and solves the \ell_1 penalize compressed sensing problem is an interior point method for large-scale \ell_1 regularized regression. It applies a truncated newton interior point method to solve the following problem:

\min_{x,u}~~\|y-Ax\|_2^2+\lambda 1^Tu~~~~~s.t.~~-u\leq x\leq u.

A logarithmic barrier is conveniently build from the bound constraint in above problem to keep the central path from the bound in primal interior point method. Then a preconditioned conjugate gradient method is utilized to approximate the Hessian matrix. Thus the computation is accelerated.

Another method to eliminate the \ell_1 norm is to replace it with a smooth approximation that is differentiable everywhere. An efficient smoothing method is proposed in Nesterov‘s Smooth minimization of non-smooth functions paper. It firstly writes the nonsmooth function minimization in a minimax way:

\min_x\|x\|_1=\min_x\max_uu^Tx, -1\leq u_i\leq 1.

Then smoothing \ell_1 norm is to add a prox-function of u to the minimax objective function:

smoothed~~\min_x\|x\|_1=\min_x\max_uu^Tx+\frac{\mu}{2}\|u\|_2^2, -1\leq u_i\leq 1.

The \ell_1 norm and smoothed \ell_1 norm are compared in the following figure. \mu is a parameter to adjust the smoothing accuracy.

L1 norm and its smooth approximation

When \ell_1 norm is replaced by its smooth approximation, most existing convex optimization methods are available to the compresses sensing problem. Nesterov’s method is a fast gradient method with the optimal \mathcal O(1/k^2) convergence rate. It combines the current gradient and the historical gradient information to determine the optimization direction of each iteration. In compressed sensing, NESTA and FISTA are based on Nesterov’s method, the former adopts the smooth approximation of \ell_1 norm.

Iterative thresholding is a another kind of compressed sensing algorithm with fast speed for solving the \ell_1 penalized least square problem. The algorithm is composed of two stages: optimization of the least square term and decreasing of \ell_1 norm. The former stage is accomplished by solving the optimization problem without \ell_1 penalty, while the later stage is finished by thresholding of the magnitudes of entries in x. There are two kinds of thresholding have been used. One is hard thresholding, which directly assigns the entries with smallest magnitudes to zero. Another is soft thresholding, which subscribes all the entries’ magnitudes by \delta and assigns the ones whose magnitudes are less than \delta to zero.

Representative algorithms are message passing, iterative splitting and thresholding (IST) and iterated hard shrinkage. Iterative thresholding methods are competitive in speed. But it is not easy to demonstrate their convergence to the global optimum in lots of situations.

Bregman iterative algorithm and Fixed point continuation (FPC) are developed by CAAM at Rice University around 2007. For \ell_1 penalized least square problem, the Bregman distance is defined by:

D(x,x^k)=\|x\|_1-\|x^k\|_1-\langle p, x-x^k\rangle, p=\partial \|x\|_1.

The subgradient p is defined as:

p_i={\rm sign}(x_i), {\rm if} ~~x_i\neq 0

p_i\in[-1,1], {\rm if} ~~x_i= 0

In each iteration, a subproblem with Bregman distance regularization is solved:

x^{k+1}=\arg\min_x\|y-Ax\|_2^2+\lambda D(x,x^k).

This gives a soft-thresholding of x. And then the subgradient p is updated:


The speed of Bregman iterative algorithm and FPC (FPC_AS is faster) is very appealing and this kind of method has very good theoretical properties in convergence and optimality. Moreover, it has connection to widely used soft-thresholding method and augmented Lagrangian method. FPC and FPC_AS has successful applications in totalvariation minimization in image denoising.

In this note, the existing compressed sensing algorithms are classified into four sorts: those based on greedy search, those based on convex optimization, those based on iterative thresholding and those based on Bregman iteration. I believe there are a great amount of compressed sensing algorithms that are not included in this note. To find the other algorithms, their papers and codes, you can refer to Compressed sensing: A Big Picture.

There are also some model selection algorithms proposed in statistics literatures which can be applied to compressed sensing problems, e.g., Least angle regression, Elastic net and so on. These algorithms will be discussed in another note about the relationship between compressed sensing and statistics.

Posted in notes | Tagged , , , , , , , , | 6 Comments

Compressed sensing review (0)

I plan to write an easy understanding review of compressed sensing, which is a popular topic that attracts considerable attention from 2004 and appears being related to more and more fields in recent years. The review will be posted in this blog as a series of notes about several aspects of compressed sensing:

  1. Compressed sensing reconstruction algorithms (\ell_1 minimization, sparse signal recovery, reweighted/adaptive \ell_1);
  2. Sparse coding (dictionary learning, K-SVD, MOD, MoTIF, FOCUSS);
  3. Compressed sensing measurement (RIPs);
  4. Why compressed sensing can exactly recover a sparse signal? (Incoherence, uncertainty principles)
  5. Compressed sensing and statistics (feature selection, lasso, structure sparsity, covariance selection, group testing, permutation test)
  6. Compressed sensing and learning (sparse dimension reduction, sparse PCA, data/label compression, model selection)

A brief history of compressed sensing:

The fundamental ideas of compressed sensing can be found from some literatures since  1960. David Dohono develops and completes the theories of compressed sensing in his early papers, e.g., UNCERTAINTY PRINCIPLES AND SIGNAL RECOVERY. Terence Tao and Emmanuel Candes construct the whole theoretical framework in several papers they published from 2005-2006, which can be found here. The rest, as they say, is history.

A brief introduction to compressed sensing:

For a sparse signal x, a few measurements (often linear and random, for example, y=Ax) much less than what Shannon-Nyquist sampling theorem determines are sufficient to exactly recover the signal x by simple convex optimization.

Below is an illustration:

Illustration of compressed sensing

Posted in notes | Tagged , , | 2 Comments

Compressed sensing games

Compressed sensing has been proved having intimate connections to tremendous problems in signal processing and harmonic analysis. The fundamental idea of compressed sensing and its extension matrix completion can be illustrated as:

It is possible to recover a signal (e.g., series, matrix, data, …) from a few measurements (e.g., FFT, DCT, sampling, linear, nonlinear, …) of it, if the signal has some particular properties (e.g., sparse, grouping, low-rank, low rank+sparse, on manifold, …).

What is more interesting is that compressed sensing is also strongly related to several intelligent games. To name a few:

1. Sudoku.

Example of Sudoku

The objective is to fill a 9×9 grid with digits so that each column, each row, and each of the nine 3×3 sub-grids that compose the grid (also called “boxes”, “blocks”, “regions”, or “sub-squares”) contains all of the digits from 1 to 9. The measurements are the known entries, and the properties is the ruleset. The problem can be formulated into a linear system with sparse solutions, which is exactly a compressed sensing problem. An interesting paper named “Linear Systems, Sparse Solutions, and Sudoku” can be found at:


The link of corresponding code in Python and MatLab can be found at


2. 12 balls problem. There are 12 balls with the same appearance. One of them has different weight compared with the others. One balance is given and only 3 measurements are permitted. The goal is to identify the different ball and justify whether it is heavier or lighter. Here is an example solution:

Example of 12-balls problem

Example of 12-balls problem

The signal is a vector with 1 or -1 on the index w.r.t the different ball and 0 on the other entries. The measurements are the 3 weighting experiments on the balance. So again, a compressed sensing can be formulated. It can also be formulated as a Group testing problem. There is a paper illustrating the relationship between compressed sensing and group testing:

Group Testing and Sparse Signal Recovery

The above are only two representatives and I believe there exist many. It is very interesting that some of these problems are discrete optimization problems. This may provide potential to solve discrete optimization problems by using compressed sensing techniques.

Igor makes a website about compressed sensing games, where other 3 games related to compressed sensing are introduced. You can find it here. Thanks a lot, Igor!

Posted in notes | Tagged , , | 2 Comments

Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction

Manifold Elastic Net is the work we have done one year ago and it is about to be published on Data Mining and Knowledge Discovery (Springer) recently. You can either find it on the website of DMKD:


or the arXiv:


An earlier published work about Manifold Elastic Net on 2009 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2009) can be found on:


The main contribution of this paper is to formulate the sparse dimension reduction problem into an penalized least square problem rather than a sparse PCA problem. For the former, there exists tremendous fast algorithms in compressed sensing and statistics. For the latter, the existing algorithms are not very satisfying in speed and the optimality of the solution.

Sparse dimension reduction can improve the dimension reduction results in several aspects, e.g., feature selection with clear interpretation, acceleration of subsequent machine learning tasks because of sparse representation. There is an interesting experiment in the paper to sequentially select the significant features on human face by using Manifold Elastic Net:

The solution paths of Manifold Elastic Net and feature selection on human face

The solution paths of Manifold Elastic Net and feature selection on human face

Another advantage of Manifold Elastic Net is that many existing manifold learning methods can be included into its framework. I also recommend Section 2.3 Classification error minimization of the paper, in which a novel indicator (label) matrix coding method is proposed. It can be used in other discriminative dimension reduction problems to obtain a least square regression formulation.

The framework of Manifold Elastic Net can also extended to sparse coding and manifold based compressed sensing. I believe there will be some future works about these important extensions.

Posted in Tianyi's work | Tagged , , , , , | Leave a comment

Fast Gradient Clustering

This is a work we have posted and given a spotlight talk on NIPS 09 Workshop on Discrete Optimization in Machine Learning: Structures, Algorithms and Applications (DISCML). You can find it on:


Fast Gradient Clustering (FGC) tackles the two fundamental clustering problems: the minimal sum-of-squares (MSSC) and normalized cut (N-cut). Both of these two problems are 0-1 programming with nonlinear object function, so various approximation algorithms are proposed for a practical solution. The most well-known are K-means and Spectral clustering.

However, the two approximation algorithms are loose relaxations of the original problem and thus has several shortcomings. K-means is sensitive to the initial clustering centroids. Spectral clustering is a loose convex relaxation and post-processing based on K-means is indispensable, which ruins the global optimality of the solutions.

In Fast Gradient Clustering, we adopt an unified and tighter SDP relaxation of the original clustering problems and develop an algorithm much faster then standard SDP solves to optimize it. The dual problem of the SDP relaxation is formulated as an unconstrained maximum eigenvalue minimization and then smoothed. FGC applies Nesterov’s method to optimize the smooth dual problem with fast convergence rate $O(1/k^2)$ and merely gradient decent time cost in each iterate. The gradient computation can further be accelerated by approximated matrix exponential calculation.

The experiments shows FGC is more stable in optimization and its clustering results are superior than K-means and Spectral clustering on common used datasets.

FGC has a lot of potential applications in image segmentation and text clustering owing to its robustness, high accuracy and fast speed.

Posted in Tianyi's work | Tagged , , , , , , , | Leave a comment