M3L: Efficient Max-Margin Multi-Label Classification

The objective in multi-label classification is to tag a data point with a set of relevant labels. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations.

The following is efficient code for learning a max-margin multi-label classifier. The code implements the M3L algorithm and the problem formulation, optimisation details and run times on benchmark datasets can be found in the paper. The code contains specialised algorithms for the kernelised and the linear cases. The code is very efficient at learning 1-vs-All SVMs as it re-utilises the kernel cache. We were able to train on the RCV1 dataset (23K training data points, 47K dimensional sparse features and 103 labels) using an RBF kernel in 817 seconds while training in a 1-vs-All fashion using LIBSVM took 5600 seconds. For the linear case, we were able to train on RCV1 with nearly a million training points in 18 minutes using a single core of an 2.66 GHz Xeon processor (we had more or less reached the global optimum within 6 minutes). The algorithm learns label correlations present in the training dataset but prior knowledge about how labels are correlated can also be incorporated.

Download source code and Windows/Linux binaries

The code is in C++ and should compile on 32/64 bit Windows/Linux machines. Please set "#define WINDOWS 1" in "include/definitions.h" when compiling on Windows machines (see the included README.txt). This code is made available as is for non-commercial research purposes.

Usage

Training: The command
M3L -train [options] training_file model_file
reads in training data from training_file and saves the learnt M3L classifier in model_file subject to the following options

-C misclassification penalty. Default 0.5
-a optimisation algorithm:
- 0: Kernelised -- SMO (Default)
- 1: Linear -- Dual co-ordinate ascent with shrinkage (choose this if you have a linear kernel)
-t tau. Stopping parameter. Algorithm stops when all projected gradients have magnitude less than tau. Default 0.001
-k kernel type (valid only with -a 0).
- 0: Linear: x'*y
- 1: RBF: exp(-gamma*||x-y||^2) (Default)
- 2: Polynomial: (gamma*x'*y+u_0)^degree
-g gamma (kernel parameter for RBF and Polynomial kernels). Default 1
-d degree. Default 3
-r u_0. Default 1
-b bias. Default 1
-m cache size in MB. Default 1000 (the best results are obtained when the cache is large enough to hold the entire kernel matrix)
-R correlation matrix file. Look at example_R.txt for an example correlation matrix file (Defaults to using the identity matrix).

Testing: The command
M3L -test testing_file model_file output_file
applies the learnt M3L classifier in model_file to the test data in testing_file and saves the predictions in output_file.

The training, testing, model and output prediction files all follow the LIBSVM file format.

Toy Example

A toy example has been included with the source code and binaries. Unpack the code and try

> cd toyexample
> ../bin/OS/M3L -train -k 1 example_data.svm your_example_model_kernel_withoutR.txt

where OS can be one of linux, linux64, windows or windowsx64. This trains the M3L classifier with an RBF kernel on the toy data. No a priori information about label correlations is provided. Compare your learnt model to the one provided to make sure that there were no errors.

To incorporate the correlation matrix given in example_R.txt try

> ../bin/OS/M3L -train -k 1 -R example_R.txt example_data.svm your_example_model_kernel_withR.txt

Results for the linear kernel can be generated by

> ../bin/OS/M3L -train -a 1 example_data.svm your_example_model_linear_withoutR.txt

Outputs on the test file example_test.svm can be generated by

> ../bin/OS/M3L -test example_test.svm your_example_model_linear_withoutR.txt your_example_output_linear_withoutR.txt

The Hamming Loss that should be obtained is as follows:

Hamming Loss for label 0 : 14.000002%
Hamming Loss for label 1 : 1.000000%
Net Hamming Loss: 7.500000%

Please go through the included README and the reference below for details.

References

B. Hariharan, S. V. N. Vishwanathan and M. Varma. Efficient max-margin multi-label classification with applications to zero-shot learning. Machine Learning Journal, , 88(1):127--155, 2012.
Bibtex source | Abstract | Download in pdf format

B. Hariharan, L. Zelnik-Manor, S. V. N. Vishwanathan and M. Varma. Large scale max-margin multi-label classification with priors. In Proceedings of the International Conference on Machine Learning, Haifa, Israel, June 2010.
Bibtex source | Abstract | Download in pdf format

B. Hariharan, L. Zelnik-Manor, S. V. N. Vishwanathan and M. Varma. Proof of convergence of the M3L algorithm.
Download in pdf format

Home Page