Slice: Scalable Linear Extreme Classifiers


Himanshu Jain, Venkatesh B., Bhanu Teja Chunduri, Manik Varma

Extreme multi-label learning aims to annotate each data point with the most relevant subset of labels from an extremely large label set. Slice is an efficient 1-vs-All based extreme classifier that is specially designed for low-dimensional dense features. Slice achieves almost the same prediction accuracy as leading 1-vs-All extreme classifiers such as DiSMEC [4], but can be orders of magnitude faster at training and prediction as it cuts both costs from linear to logarithmic in the number of labels. For example, Slice can efficiently scale to datasets with as many as 100 million labels and 240 million training points. Slice is also much more accurate than all the other state-of-the-art extreme classifiers (PfastreXML [5], PPDSparse [3] and Parabel [2]) on low-dimensional dense deep learning features. Please refer to our WSDM-2019 paper [1] for more details.
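At prediction time Slice first uses a generative nearest-neighbour model to shortlist a small set of candidate labels and then evaluates its 1-vs-All linear classifiers only on that shortlist, which is where the logarithmic cost comes from. A minimal Python sketch of this shortlist-then-score idea (ann_index, W and b are illustrative names, not Slice's actual API):

    # Sketch of Slice's shortlist-then-score prediction (illustrative only).
    import numpy as np

    def predict_top_k(x, ann_index, W, b, num_nbrs=300, k=5):
        # x         : dense feature vector, shape (d,)
        # ann_index : an ANNS structure (e.g. HNSW) returning candidate label ids
        # W, b      : 1-vs-All classifier weights (L x d) and biases (L,)
        shortlist = ann_index.query(x, num_nbrs)    # O(log L) candidates instead of O(L)
        scores = W[shortlist] @ x + b[shortlist]    # score the shortlist only
        order = np.argsort(-scores)[:k]
        return [(shortlist[i], scores[i]) for i in order]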

Download Slice

This code is made available as is for non-commercial research purposes only. Please make sure that you have read the license agreement in LICENSE.doc/pdf. Please do not install or use Slice unless you agree to the terms of the license.

Download Slice source code in C++ as well as precompiled Windows/Linux binaries.

The code for Slice is written in C++ and should compile on 64-bit Windows/Linux machines using a C++11 enabled compiler. The code also uses the publicly available implementation of the HNSW algorithm for Approximate Nearest Neighbor Search (ANNS). Installation and usage instructions are provided below. Running Slice with the default parameter settings should provide reasonable results on benchmark datasets, but please verify that this works for you by running the sample_run.sh script provided in the toy example.
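For intuition about the HNSW parameters that Slice exposes (-m, -c and -s in the Usage section below), the following sketch builds a comparable index with the publicly available hnswlib Python binding. This only illustrates the ANNS component; it is not how Slice itself is invoked:

    # Illustrative HNSW index via hnswlib (pip install hnswlib); not part of Slice.
    import numpy as np
    import hnswlib

    data = np.random.rand(10000, 512).astype(np.float32)   # e.g. dense point embeddings
    index = hnswlib.Index(space='ip', dim=512)              # inner-product similarity
    index.init_index(max_elements=data.shape[0],
                     M=100,                 # graph degree, cf. Slice's -m
                     ef_construction=300)   # build-time beam width, cf. Slice's -c
    index.add_items(data)
    index.set_ef(300)                       # query-time beam width, cf. Slice's -s
    ids, dists = index.knn_query(data[:1], k=300)   # 300 neighbours, cf. Slice's -k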

Please contact Himanshu Jain and Manik Varma if you have any questions or feedback.

Datasets

The following datasets were used in our WSDM-2019 paper [1] to compare Slice's performance against other methods. All points are represented using XML-CNN [12] features, which were provided directly by the authors. Note that these dataset features differ from those available on The Extreme Classification Repository, where points are represented using pretrained fastText features. The raw text for these datasets is also available from the repository in case you would like to compute your own custom features.


Dataset          Download   Feature Dim.   Label Dim.   Train Points   Test Points   Avg. Points per Label   Avg. Labels per Point   Citations

EURLex-4K        Download   1024           3,956        11,585         3,865         15.59                   5.32                    [1] + [12]
Wikipedia-500K   Download   512            501,070      1,646,302      711,542       16.03                   4.87                    [1] + [12]
Amazon-670K      Download   512            670,091      490,449        153,025       3.99                    5.45                    [1] + [12]

Table 1: Dataset statistics & download

Benchmark Results

These results were reported in our WSDM-2019 paper. Please let us know if you would like your algorithm's performance added to the table.


Method P@1 P@3 P@5 Training time (hr) Test time/point (ms)

EURLex-4K

Slice [1] 77.72 63.78 52.05 0.02 1.23
DiSMEC [4] 76.12 62.91 51.51 0.13 4.36
Parabel [2] 74.54 61.72 50.48 0.01 0.91
PPD-Sparse [3] 76.32 62.79 51.40 0.013 4.36
PfastreXML [5] 73.63 60.31 49.69 0.037 1.82
SLEEC [7] 74.31 60.00 49.11 0.35 4.87
PD-Sparse [6] 73.53 60.80 49.37 0.12 4.36
CS [11] 58.01 44.85 35.50 0.01 140.10
LEML [8] 60.34 47.45 37.96 0.67 2.24
WSABIE [10] 76.09 61.69 49.11 0.13 2.24
CPLST [9] 61.01 47.43 38.04 0.18 2.24

Wikipedia-500K

Slice (|S|=2000) [1] 62.62 41.79 31.57 15.61 11.14
DiSMEC [4] 63.70 42.49 32.26 ~2133 316.29
Parabel [2] 59.34 39.05 29.35 6.29 2.94
PPD-Sparse [3] 50.40 33.15 25.54 5.85 316.29
PfastreXML [5] 55.00 36.14 27.38 11.14 6.36

Amazon-670K

Slice [1] 37.77 33.76 30.70 1.92 3.49
DiSMEC [4] 37.60 33.62 30.64 ~789 429
Parabel [2] 33.93 30.38 27.49 1.54 2.85
PPD-Sparse [3] 33.16 29.60 26.85 3.90 429
PfastreXML [5] 28.51 26.06 24.17 2.85 19.35
SLEEC [7] 18.77 16.50 14.97 7.12 22.54

Table 2: Precision@k on benchmark datasets

Usage

Linux/Windows makefiles for compiling Slice have been provided with the source code. To compile, run "make" (Linux) or "nmake -f Makefile.win" (Windows) in the Slice folder. Run the following commands from inside the Slice folder for training and testing.

Training

./slice_train [input feature file name] [input label file name] [output model folder name] -m 100 -c 300 -s 300 -k 300 -o 20 -t 1 -C 1 -f 0.000001 -siter 20 -stype 0 -q 0
where:
	-m = params.M                       :        HNSW M parameter. default=100
	-c = params.efC                     :        HNSW efConstruction parameter. default=300
	-s = params.efS                     :        HNSW efSearch parameter. default=300
	-k = params.num_nbrs                :        Number of labels to be shortlisted per training point according to the generative model. default=300
	-o = params.num_io_threads          :        Number of threads used to write the retrieved ANN points to file. default=20
	-t = params.num_threads             :        Number of threads used to build the ANNS data structure and train the discriminative classifiers. default=1
	-C = params.classifier_cost         :        SVM cost coefficient for the linear classifiers. default=1.0
	-f = params.classifier_threshold    :        Threshold value for sparsifying linear classifiers' trained weights to reduce model size. default=1e-6
	-siter = params.classifier_maxiter  :        Maximum number of iterations for training the linear classifiers. default=20
	-stype = params.classifier_kind     :        Kind of linear classifier to use. 0=L2R_L2LOSS_SVC, 1=L2R_LR (Refer to Liblinear). default=0
	-q = params.quiet                   :        Quiet option to restrict progress and debugging output. 0=not quiet, 1=quiet. default=0
The feature file should be in dense matrix text format and the label file in sparse matrix text format (refer to the Miscellaneous section).
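For illustration, here is a tiny made-up input pair, assuming the usual Extreme Classification Repository text conventions of a header line giving the matrix dimensions followed by one row per point, with sparse rows written as 0-indexed label:value pairs:

    Feature file (dense, 3 points, 4 dimensions):
    3 4
    0.12 0.50 -0.31 0.97
    0.08 -0.44 0.26 0.11
    0.73 0.02 0.19 -0.58

    Label file (sparse, 3 points, 5 labels):
    3 5
    0:1 3:1
    2:1
    1:1 4:1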

Testing

./slice_predict [input feature file name] [input model folder name] [output score file name] -s 300 -k 300 -o 20 -b 0 -t 1 -q 0
where:
	-s = params.efS                     :        HNSW efSearch parameter. default=[value saved in trained model]
	-k = params.num_nbrs                :        Number of labels to be shortlisted per test point according to the generative model. default=[value saved in trained model]
	-o = params.num_io_threads          :        Number of threads used to write the retrieved ANN points to file. default=[value saved in trained model]
	-b = params.b_gen                   :        Bias parameter for the generative model. default=0
	-t = params.num_threads             :        Number of threads. default=[value saved in trained model]
	-q = params.quiet                   :        Quiet option to restrict progress and debugging output. 0=not quiet, 1=quiet. default=[value saved in trained model]
	
Feature file should be in dense matrix text format (refer to the Miscellaneous section).

Performance Evaluation

Scripts for calculating Precision@k and nDCG@k are available in the Tools/metrics folder. The following commands can be used to compute these metrics:
  ./precision_k [test score matrix file name] [test label file name] [K]
  ./nDCG_k [test score matrix file name] [test label file name] [K]
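
For reference, the metrics these tools compute can be sketched in a few lines of Python over dense score and label matrices (an illustration of the definitions only, not the shipped binaries):

    # Illustrative definitions of P@k and nDCG@k over dense (N, L) arrays.
    import numpy as np

    def precision_at_k(scores, labels, k):
        topk = np.argsort(-scores, axis=1)[:, :k]         # top-k predicted labels per point
        hits = np.take_along_axis(labels, topk, axis=1)   # 1 where a prediction is relevant
        return hits.sum(axis=1).mean() / k

    def ndcg_at_k(scores, labels, k):
        topk = np.argsort(-scores, axis=1)[:, :k]
        gains = np.take_along_axis(labels, topk, axis=1)
        discount = 1.0 / np.log2(np.arange(2, k + 2))     # 1/log2(rank + 1)
        dcg = (gains * discount).sum(axis=1)
        ideal_len = np.minimum(labels.sum(axis=1), k).astype(int)
        idcg = np.array([discount[:n].sum() for n in ideal_len])
        return np.mean(dcg / np.maximum(idcg, 1e-12))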
	
You can also use the Matlab scripts provided in the Tools/metrics folder to calculate the propensity scored metrics. Please execute "make" in the Tools/Matlab folder from the Matlab terminal to compile these scripts. Both vanilla and propensity scored metrics can be computed by executing the following Matlab command from the Tools/metrics folder:
	[metrics] = get_all_metrics([test score matrix], [test label matrix], [inverse label propensity vector])
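
The inverse label propensity vector expected by get_all_metrics follows the propensity model of Jain et al. [5]. A sketch of that computation, where A and B are dataset-dependent constants (0.55 and 1.5 being the defaults suggested in [5]):

    # Inverse label propensities 1/p_l per Jain et al. [5] (illustrative sketch).
    import numpy as np

    def inverse_propensities(labels, A=0.55, B=1.5):
        # labels : (N, L) binary ground-truth matrix
        N = labels.shape[0]
        Nl = labels.sum(axis=0)                 # training points per label
        C = (np.log(N) - 1) * (B + 1) ** A
        return 1 + C * (Nl + B) ** (-A)         # 1/p_l; rarer labels get larger weights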
	

Miscellaneous

Toy Example

The zip file containing the source code also includes the EURLex-4K dataset with 1024-dimensional XML-CNN features as a toy example. To run Slice on the EURLex-4K dataset, execute "bash sample_run.sh" (Linux) or "sample_run" (Windows) in the Slice folder. You should get a Precision@1 of about 77.7% if everything is working correctly.

References

[1]     H. Jain, V. Balasubramanian, B. Chunduri and M. Varma, Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches, in WSDM, 2019.

[2]     Y. Prabhu, A. Kag, S. Harsola, R. Agrawal and M. Varma, Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising, in WWW, 2018.

[3]     I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar, I. S. Dhillon and E. P. Xing, PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification, in KDD, 2017.

[4]     R. Babbar and B. Schölkopf, DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification, in WSDM, 2017.

[5]     H. Jain, Y. Prabhu and M. Varma, Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications, in KDD, 2016.

[6]     I. E. H. Yen, X. Huang, K. Zhong, P. Ravikumar and I. S. Dhillon, PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification, in ICML, 2016.

[7]     K. Bhatia, H. Jain, P. Kar, M. Varma and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NIPS, 2015.

[8]     H. Yu, P. Jain, P. Kar and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML, 2014.

[9]     Y. Chen and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification, in NIPS, 2012.

[10]    J. Weston, S. Bengio and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation, in IJCAI, 2011.

[11]    D. Hsu, S. Kakade, J. Langford and T. Zhang, Multi-Label Prediction via Compressed Sensing, in NIPS, 2009.

[12]    J. Liu, W. C. Chang, Y. Wu and Y. Yang, Deep Learning for Extreme Multi-label Text Classification, in SIGIR, 2017.