The objective in extreme multi-label learning is to learn a classifier that can automatically tag a datapoint with the most relevant subset of labels from an extremely large label set. Parabel is a hybrid 1-vs-All/tree ensemble which efficiently learns a balanced label hierarchy from training data so as to cut overall training and prediction costs from linear to logarithmic in the number of labels. It then learns a probabilistic model based on the learnt label hierarchy for the joint label distribution conditioned on a data point. This model generalizes the multi-class hierarchical softmax model used in language modelling and word-vector embeddings to the multi-label setting. Finally, Parabel develops efficient log-time training and prediction algorithms based on the proposed probabilistic model. As a result, Parabel is able to match state-of-the-art 1-vs-All accuracies while being up to 600-900x faster at training and 60-13,000x faster at prediction. Parabel can also be up to 20x faster at training, have 10x lower model size and be up to 9% more accurate than PfastreXML. Parabel can train on millions of labels and datapoints within a few hours on a single core of a desktop and make predictions in milliseconds per test point. Please refer to the paper  for more details.
This code is made available as is for non-commercial research purposes. Please make sure that you have read the license agreement in LICENSE.doc/pdf. Please do not install or use Parabel unless you agree to the terms of the license.Download Parabel source code in C++ and Matlab as well as precompiled Windows/Linux binaries.
The code for Parabel is written in C++ and should compile on 64 bit Windows/Linux machines using a C++11 enabled compiler. Matlab wrappers have also been provided with the code. Installation and usage instructions are provided below. The default parameters provided in the Usage Section work reasonably on the benchmark datasets in the Extreme Classification Repository
Please visit the Extreme Classification Repository to download the benchmark datasets and compare Parabel's performance to baseline algorithms.
./parabel_train [input feature file name] [input label file name] [output model folder name] -T 1 -s 0 -t 3 -b 1.0 -c 1.0 -m 100 -tcl 0.1 -tce 0 -e 0.0001 -n 20 -k 0 -q 0Matlab:
parabel_train([input feature matrix], [input label matrix], [output model folder name], param)where:
-T = param.num_thread : Number of threads default=1 -s = param.start_tree : Starting index of the trees default=0 -t = param.num_tree : Number of trees to be grown default=3 -b = param.bias_feat : Additional feature value to be appended to datapoint's features. Used for bias in linear separators similar to Liblinear. default=1.0 -c = param.classifier_cost : Cost co-efficient for linear classifiers default=1.0 -m = param.max_leaf : Maximum no. of labels in a leaf node. Larger nodes will be split into 2 balanced child nodes. default=100 -tcl = param.classifier_threshold : Threshold value for sparsifying linear classifiers' trained weights to reduce model size. default=0.1 -tce = param.centroid_threshold : Threshold value for sparsifying label centroids to speed up label clustering. default=0 -e = param.clustering_eps : Eps value for terminating balanced spherical 2-Means clustering algorithm. Algorithm is terminated when successive iterations decrease objective by less than this value. default=0.0001 -n = param.classifier_maxitr : Maximum iterations of algorithm for training linear classifiers default=20 -k = param.classifier_kind : Kind of linear classifier to use. 0=L2R_L2LOSS_SVC, 1=L2R_LR (Refer to Liblinear) default=L2R_L2LOSS_SVC -q = param.quiet : Quiet option to restrict the output for reporting progress and debugging purposes 0=no quiet, 1=quiet default=0For C++, the feature and label input files are expected to be in sparse matrix text format (refer to Miscellaneous section). For Matlab, the feature and label matrices are Matlab's sparse matrices.
./parabel_predict [input feature file name] [input model folder name] [output score file name] -T 1 -s 0 -t 3 -B 10 -q 0Matlab:
output_score_mat = parabel_predict( [input feature matrix], [input model folder name], param )where:
-T = param.num_thread : Number of threads default=[value saved in trained model] -s = param.start_tree : Starting index of the trees for prediction default=[value saved in trained model] -t = param.num_tree : Number of trees to be used for prediction default=[value saved in trained model] -B = param.beam_width : Beam search width for fast, approximate prediction default=10 -q = param.quiet : Quiet option to restrict the output for reporting progress and debugging purposes 0=no quiet, 1=quiet default=[value saved in trained model]For C++, the feature and score files are expected to be in sparse matrix text format (refer to Miscellaneous section). For Matlab, the feature and score matrices are Matlab's sparse matrices.
[metrics] = get_all_metrics([test score matrix], [test label matrix], [inverse label propensity vector])
perl convert_format.pl [repository data file] [output feature file name] [output label file name]
[matrix] = read_text_mat([text matrix name]);To write a Matlab matrix into text format:
write_text_mat([Matlab sparse matrix], [text matrix name to be written to]);
[weights vector] = inv_propensity([training label matrix],A,B);A,B are the parameters of the inverse propensity model. Following values are to be used over the benchmark datasets:
Wikipedia-LSHTC: A=0.5, B=0.4 Amazon: A=0.6, B=2.6 Other: A=0.55, B=1.5