Compressed Labeling: An important extension of Hamming Compressed Sensing; at NIPS now

We have just been informed that our submission “Compressed Labeling (CL) on Distilled Labelsets (DL) for Multi-label Learning” has been accepted by the Machine Learning Journal (Springer). The online-first PDF can be downloaded here. CL is an important application and extension of Hamming Compressed Sensing (HCS) to 0-1 signals with specific structure or patterns, and label vectors in multi-label learning are exactly such structured 0-1 signals. We tested CL-DL on 21 datasets (covering almost all of the publicly available multi-label learning datasets). Its performance and speed are both impressive. This is a successful application of HCS in machine learning, beyond its applications in digital signal processing.

The draft abstract reads:

Directly applying single-label classification methods to multi-label learning problems substantially limits both performance and speed, due to the imbalance, dependence and high dimensionality of the given label matrix. Existing methods either ignore these three problems or reduce one at the price of aggravating another. In this paper, we propose a $\{0,1\}$ label matrix compression and recovery method termed “compressed labeling (CL)” to simultaneously solve, or at least reduce, these three problems. CL first compresses the original label matrix to improve balance and independence by preserving the signs of its Gaussian random projections. Afterward, we directly train popular binary classifiers (e.g., support vector machines) for each new label. A fast recovery algorithm is developed to recover the original labels from the predicted new labels. Within the recovery algorithm, a “labelset distilling method (LDM)” is designed to extract distilled labelsets (DLs), i.e., the frequently appearing label subsets in the original labels, via recursive clustering and subtraction. Given a distilled labelset (DL) and an original label vector, we discover that the signs of their random projections have an explicit joint distribution that can be quickly computed from a geometric inference. Based on this observation, the original label vector is exactly determined after performing a series of Kullback-Leibler divergence based hypothesis tests on the distributions of the new labels. CL significantly improves the balance of the training samples and reduces the dependence between different labels. Moreover, it accelerates the learning process by training fewer binary classifiers for the compressed labels, and it makes use of label dependence via DL-based tests. Theoretically, we prove a recovery bound for CL, which verifies the effectiveness of CL for label compression and the improvement in multi-label classification performance brought by the label correlations preserved in DLs. We show the effectiveness, efficiency and robustness of CL via 5 groups of experiments on 21 datasets from text classification, image annotation, scene classification, music categorization, genomics and web page classification.
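To make the core compression step concrete, here is a minimal Python/NumPy sketch of “preserving the signs of Gaussian random projections” of a 0-1 label matrix, written only from the abstract above. The function name compress_labels, the matrix names Y and A, and the choice of m are my own illustrative assumptions, not the released MATLAB implementation.

```python
import numpy as np

def compress_labels(Y, m, seed=0):
    """Illustrative sketch of the CL compression step (not the paper's code).

    Y : (n, k) 0-1 label matrix (n samples, k original labels).
    m : number of compressed labels, on the order of log(k).

    Returns the (n, m) sign matrix Z and the Gaussian projection matrix A.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Y.shape[1], m))  # k x m Gaussian random projection
    Z = np.sign(Y @ A)                        # keep only the signs of the projections
    return Z, A

# Toy usage: 6 samples, k = 8 labels, compressed to m ~ O(log k) new labels;
# one ordinary binary classifier would then be trained per column of Z.
Y = (np.random.default_rng(1).random((6, 8)) < 0.3).astype(int)
Z, A = compress_labels(Y, m=4)
print(Z.shape)  # (6, 4)
```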

CL-DL simultaneously addresses the following three main problems in multi-label learning within a simple, general framework:

1) It reduces the problem size to far below k, where k is the number of labels: CL-DL only needs to train O(log k) binary classifiers, a significant improvement in efficiency.

2) It fully leverages label correlations. CL-DL extracts label correlations by finding the frequently appearing 0-1 patterns in the label vectors of the training data, and exploits these correlations by testing for the presence of each frequent pattern when recovering the predictions. A clustering-based binary matrix decomposition is developed to extract the DLs (a rough sketch of this idea follows the list).

3) It eliminates the sample imbalance problem that arises when transforming a multi-label learning problem into multiple binary classification problems: the positive and negative training samples for each binary classifier are balanced in number. The binary classifier can be of any kind, e.g., an SVM or logistic regression.
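The exact LDM algorithm is not spelled out in this post, so the sketch below is only my rough reading of the recursive “cluster, intersect, subtract” idea in point 2. The function distill_labelsets, the use of scikit-learn’s KMeans, and the stopping parameters are all assumptions for illustration, not the paper’s LDM.

```python
import numpy as np
from sklearn.cluster import KMeans

def distill_labelsets(Y, n_clusters=4, max_rounds=3, min_support=2):
    """Hedged sketch of labelset distilling (not the paper's exact LDM).

    Repeatedly cluster the remaining label vectors, take the labels shared by
    every member of a cluster as a distilled labelset (DL), subtract that DL
    from the cluster members, and recurse on the residual label matrix.
    """
    R = Y.copy().astype(int)           # residual label matrix
    dls = []
    for _ in range(max_rounds):
        active = R.sum(axis=1) > 0     # rows that still carry labels
        if active.sum() < min_support:
            break
        km = KMeans(n_clusters=min(n_clusters, int(active.sum())),
                    n_init=10, random_state=0)
        cluster_ids = km.fit_predict(R[active])
        rows = np.where(active)[0]
        for c in np.unique(cluster_ids):
            members = rows[cluster_ids == c]
            if len(members) < min_support:
                continue
            dl = R[members].min(axis=0)        # labels shared by every member
            if dl.sum() > 1:                   # keep only multi-label patterns
                dls.append(dl)
                R[members] = R[members] - dl   # subtraction step
    return dls
```

The returned DLs would then play the role described in the abstract: during recovery, the presence of each distilled labelset in a predicted label vector is checked via the KL-divergence-based hypothesis tests.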

For those who are at NIPS 2011: I am now in Granada, Spain, attending NIPS 2011, which starts tomorrow and ends on Dec 18. Our paper on Hamming Compressed Sensing (HCS), entitled “1-bit Compressed Quantization”, will be presented in the poster session of the Workshop on Sparse Representation and Low-rank Approximation on Dec 16. If you are at NIPS 2011 and interested in HCS, I would be glad to discuss it with you.

For those who are interested in CL: Since I am in Spain, I am sorry that the final PDF version of the CL-DL paper will only be available after NIPS 2011. Its MATLAB code will be released after formal publication.

For those who are interested in the Hamming Compressed Sensing (HCS) code: Thank you very much for your attention and interest in HCS! I am sorry to say that the code will not be available until the first formal HCS paper is accepted; I hope this will happen soon. CL-DL is a degenerate case of HCS, and its code is expected to be released earlier than the HCS code. I hope this helps.

PS: Granada is a beautiful, ancient, small city. It is a little cloudy today; I hope it will be sunny tomorrow. Now I have to rest after a 25-hour journey.
