Slice: Scalable Linear Extreme Classifiers

Himanshu Jain • Venkatesh B. • Bhanu Teja Chunduri • Manik Varma

Extreme multi-label learning aims to annotate each data point with the most relevant subset of labels from an extremely large label set. Slice is an efficient 1-vs-All based extreme classifier that is specially designed for low-dimensional dense features. Slice achieves almost the same prediction accuracy as leading 1-vs-All extreme classifiers such as DiSMEC [04] but can be orders of magnitude faster at training and prediction as it cuts down both costs from linear to logarithmic in the number of labels. For example, Slice can efficiently scale to datasets with as many as 100 million labels and 240 million training points. Slice is also much more accurate than all the other state-of-the-art extreme classifiers (PfastreXML [05], PPDSparse [03] and Parabel [02]) for low-dimensional dense deep learning features. Please refer to our WSDM-2019 paper [1] for more details.

Download Slice

This code is made available as is for non-commercial research purposes only. Please make sure that you have read the license agreement in LICENSE.doc/pdf. Please do not install or use Slice unless you agree to the terms of the license.

Download Slice source code in C++ as well as precompiled Windows/Linux binaries.

The code for Slice is written in C++ and should compile on 64 bit Windows/Linux machines using a C++11 enabled compiler. The code also uses the publically available implementation of the HNSW algorithm for Approximate Nearest Neighbor Search (ANNS). Installation and usage instructions are provided below. Running Slice with the default parameter settings should provide reasonable results on benchmark datasets but please verify that this works for you by running the sample_run.sh script provided in the toy example.

Please contact Himanshu Jain and Manik Varma if you have any questions or feedback.

Datasets

The following datasets were used in our WSDM-2019 paper [01] to compare Slice's performance against other methods. All points are represented using XML-CNN [12] features which were provided by the authors directly. Note that, these dataset features differ from those available on The Extreme Classification Repository where points have been represented using pretrained fasttext features. The raw text for these datasets is also available from the repository in case you would like to compute your own custom features.

Table 1: Dataset statistics & download
Dataset	Download	Feature	Label	Number of	Number of	Avg. Points	Avg. Labels	Citations

Dataset	Download	Dimensionality	Dimensionality	Train Points	Test Points	per Label	per Point	Citations


EURLex-4K	Download	1024	3956	11,585	3,865	15.59	5.32	[1] + [12]
Wikipedia-500K	Download	512	501,070	1,646,302	711,542	16.03	4.87	[1] + [12]
Amazon-670K	Download	512	670,091	490,449	153,025	3.99	5.45	[1] + [12]

Benchmark Results

These results were reported in our WSDM-2019 paper. Please let us know if you would like your algorithm's performance added to the table.

Table 2: Precision@k on benchmark datasets
Method	P@1	P@3	P@5	Training time (hr)	Test time/point (ms)


EURLex-4K

Slice [1]	77.72	63.78	52.05	0.02	1.23
DiSMEC [4]	76.12	62.91	51.51	0.13	4.36
Parabel [2]	74.54	61.72	50.48	0.01	0.91
PPD-Sparse [3]	76.32	62.79	51.40	0.013	4.36
PfastreXML [5]	73.63	60.31	49.69	0.037	1.82
SLEEC [7]	74.31	60.00	49.11	0.35	4.87
PD-Sparse [6]	73.53	60.80	49.37	0.12	4.36
CS [11]	58.01	44.85	35.50	0.01	140.10
LEML [8]	60.34	47.45	37.96	0.67	2.24
WSABIE [10]	76.09	61.69	49.11	0.13	2.24
CPLST [9]	61.01	47.43	38.04	0.18	2.24

Wikipedia-500K

Slice (\|S\|=2000) [1]	62.62	41.79	31.57	15.61	11.14
DiSMEC [4]	63.70	42.49	32.26	~2133	316.29
Parabel [2]	59.34	39.05	29.35	6.29	2.94
PPD-Sparse [3]	50.40	33.15	25.54	5.85	316.29
PfastreXML [5]	55.00	36.14	27.38	11.14	6.36

Amazon-670K

Slice [1]	37.77	33.76	30.70	1.92	3.49
DiSMEC [4]	37.60	33.62	30.64	~789	429
Parabel [2]	33.93	30.38	27.49	1.54	2.85
PPD-Sparse [3]	33.16	29.60	26.85	3.90	429
PfastreXML [5]	28.51	26.06	24.17	2.85	19.35
SLEEC [7]	18.77	16.50	14.97	7.12	22.54

Usage

Linux/Windows makefiles for compiling Slice have been provided with the source code. To compile, run "make" (Linux) or "nmake -f Makefile.win" (Windows) in the Slice folder. Run the following commands from inside the Slice folder for training and testing.

Training

./slice_train [input feature file name] [input label file name] [output model folder name] -m 100 -c 300 -s 300 -k 300 -o 20 -t 1 -C 1 -f 0.000001 -siter 20 -stype 0 -q 0

where:

	-m = params.M                       :        HNSW M parameter. default=100
	-c = params.efC                     :        HNSW efConstruction parameter. default=300
	-s = params.efS                     :        HNSW efSearch parameter. default=300
	-k = params.num_nbrs                :        Number of labels to be shortlisted per training point according to the generative model. default=300
	-o = params.num_io_threads          :        Number of threads used to write the retrived ANN points to file. default=20
	-t = params.num_threads             :        Number of threads used to train ANNS datastructure and the discriminative classifiers. default=1
	-C = params.classifier_cost         :        Cost co-efficient for linear classifiers            default=1.0 SVM weight co-efficient. default=1.0
	-f = params.classifier_threshold    :        Threshold value for sparsifying linear classifiers' trained weights to reduce model size. default=1e-6
	-siter = params.classifier_maxiter  :        Maximum iterations of algorithm for training linear classifiers. default=20
	-stype = params.classifier_kind     :        Kind of linear classifier to use. 0=L2R_L2LOSS_SVC, 1=L2R_LR (Refer to Liblinear). default=0
	-q = params.quiet                   :        Quiet option to restrict the output for reporting progress and debugging purposes 0=no quiet, 1=quiet. default=[value saved in trained model]

Feature file should be in dense matrix text format and label file should be in sparse matrix text format (refer to Miscellaneous section).

Testing

./slice_predict [feature file name] [model dir name] [output file name] -b 0 -t 1 -q 0

where:

	-s = params.efS                     :        HNSW efSearch parameter. default=[value saved in trained model]
	-k = params.num_nbrs                :        Number of labels to be shortlisted per training point according to the generative model. default=[value saved in trained model]
	-o = params.num_io_threads          :        Number of threads used to write the retrived ANN points to file. default=[value saved in trained model]
	-b = params.b_gen                   :        Bias parameter for the generative model. default=0
	-t = params.num_threads             :        Number of threads. default=[value saved in trained model]
	-q = params.quiet                   :        Quiet option to restrict the output for reporting progress and debugging purposes 0=no quiet, 1=quiet. default=[value saved in trained model]

Feature file should be in dense matrix text format (refer to the Miscellaneous section).

Performance Evaluation

Scripts for calculating Precision@k and nDCG@k are available in Tools/metrics folder. The following command can be used to compute these metrics:

  ./precision_k [test score matrix file name] [test label file name] [K]
  ./nDCG_k [test score matrix file name] [test label file name] [K]

You can also use the Matlab scripts provided in Tools/metrics folder for calculating the propensity scored metrics. Please execute "make" in Tools/Matlab folder from the Matlab terminal to compile these scripts. Both vanilla and propensity scored metrics can be computed by executing the following Matlab command from the Tools/metrics folder:

	[metrics] = get_all_metrics([test score matrix], [test label matrix], [inverse label propensity vector])

Miscellaneous

Unlike previous code releases from our group (Parabel, PfastreXML, FastXML and SLEEC), Slice stores features in a dense matrix format while labels are stored in a sparse format. The first line of both the files contains the number of rows and columns and the subsequent lines contain one data instance per row. For features, each line contains D (the dimensionality of the feature vectors), space separated, float values while each line of the label file contains indices of active labels and the corresponding value (always 1 in this case) starting from 0. Scripts to convert features from sparse format to dense format and vice versa are also provided in the Tools/c++ folder. The following commands can be used for these conversions:
```
	./smat_to_dmat [sparse feature text file] [dense feature text file]
	./dmat_to_smat [dense feature text file] [sparse feature text file]
		
```
Scripts are provided in the 'Tools' folder for sparse matrix inter conversion between Matlab .mat format and text format.
Please use the following commands to read a sparse matrix in Matlab and write it in text format:
```
	[sparse matrix] = read_text_mat([sparse text matrix file name]); 
```
```
	write_text_mat([Matlab sparse matrix], [sparse text matrix file name]);
```
Please run the following command inside 'Tools/metrics' folder on Matlab terminal to generate inverse propensity weights for each label:
```
	[weights vector] = inv_propensity([training label matrix],A,B); 
```
A,B are the parameters of the inverse propensity model. Following values are to be used over the benchmark datasets:
```
	Wikipedia-500K: A=0.5,  B=0.4
	Amazon-670K:    A=0.6,  B=2.6
	Other:          A=0.55, B=1.5
		
```

Toy Example

The zip file containing the source code also includes the EURLex-4K dataset with 1024 dimensional XML-CNN features, as a toy example. To run Slice on the EURLex-4K dataset, execute "bash sample_run.sh" (Linux) or "sample_run" (Windows) in the Slice folder. You should get at Precision@1 of 77.7% if everything is working correctly.

References

[01] H. Jain, V. Balasubramanian, B. Chunduri and M. Varma, Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches, in WSDM 2019.

[02] Y. Prabhu, A. Kag, S. Harsola, R. Agrawal and M. Varma, Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising in WWW, 2018.

[03] I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar I. S. Dhillon and E. P. Xing, PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification in KDD, 2017.

[04] R. Babbar, and B. Schölkopf, DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification in WSDM, 2017.

[05] H. Jain, Y. Prabhu, and M. Varma, Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications in KDD, 2016.

[06] I. E. H. Yen, X. Huang, K. Zhong, P. Ravikumar and I. S. Dhillon, PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification in ICML, 2016.

[07] K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NIPS, 2015.

[08] H. Yu, P. Jain, P. Kar, and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML, 2014.

[09] Y. Chen, and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification , in NIPS, 2012.

[10] J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation , in IJCAI, 2011.

[11] D. Hsu, S. Kakade, J. Langford, and T. Zhang, Multi-Label Prediction via Compressed Sensing , in NIPS, 2009.

[12] J. Liu, W. C. Chang, Y. Wu, and Y. Yang. 2017. Deep Learning for Extreme Multi-label Text Classification, in SIGIR, 2017.