The Extreme Classification Repository: Multi-label Datasets & Code

Kush Bhatia • Kunal Dahiya • Himanshu Jain • Purushottam Kar • Anshul Mittal • Yashoteja Prabhu • Manik Varma

The objective of extreme multi-label classification (XC) is to learn feature architectures and classifiers that can automatically tag a data point with the most relevant subset of labels from an extremely large label set. This repository provides resources, including XC datasets, code for leading XC methods and metrics to evaluate the performance of XC algorithms.

Citing the Repository

Please use the following citation if you use any of the datasets or results provided in this repository.

        @Misc{Bhatia16,
          author    = {Bhatia, K. and Dahiya, K. and Jain, H. and Kar, P. and Mittal, A. and Prabhu, Y. and Varma, M.},
          title     = {The extreme classification repository: Multi-label datasets and code},
          url       = {http://manikvarma.org/downloads/XC/XMLRepository.html},
          year      = {2016}
        }

Datasets

Useful Tools

Performance Metrics and Evaluation Protocols

Code for XC Methods

Benchmarked Results

Appendix

References

Datasets

The datasets below cover various XC problems, including webpage categorization, related webpage recommendation and product-to-product recommendation. These include multi-modal datasets and datasets whose labels have textual features. Information on the dataset file format can be found in the README file available [here]. Python and Matlab scripts for reading the datasets have been provided [below].

Please get in touch with Manik Varma if you would like to contribute a dataset.

Naming Conventions

Number of labels: The (rounded off) number of labels in the dataset is appended to the dataset name to disambiguate different versions of a dataset. Certain legacy datasets were renamed to ensure uniformity. The dataset previously referred to as DeliciousLarge was renamed to Delicious-200K and RCV1-X was renamed to RCV1-2K.
Label features: Datasets that contain label features have the token "LF" prepended to their names. These are usually short textual descriptions of the labels.
Multi-modal features: Datasets that contain multi-modal features have the token "MM" prepended to their names. These usually correspond to short textual descriptions and one or more images for each data point and label.
Short-text datasets: Datasets with the phrase "Titles" in their names, such as AmazonTitles-670K, are short-text datasets whose data points are represented by a 3-5 word textual description such as the name of a product or title of a webpage. For full-text datasets such as Amazon-670K, data points are represented using a more detailed description. Short-text tasks abound in ranking and recommendation applications where data points are user queries or products/webpages represented using only their titles.
Item-to-item datasets: Datasets with the phrase "SeeAlso" in their names correspond to tasks requiring related Wikipedia articles to be predicted for a given Wikipedia article.
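The naming conventions above are regular enough to be read mechanically. The sketch below is a hypothetical helper (`parse_dataset_name` and `DatasetNameInfo` are illustrative names, not part of any released tool) that recovers the properties encoded in a dataset name. Note the exceptions visible in the dataset table: ORCAS-800K is listed among datasets with label features without carrying the "LF" prefix, and legacy names such as Bibtex have no label-count suffix.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Trailing "-<number><K|M>" suffix, e.g. "-320K" or "-1.3M".
_SUFFIX = re.compile(r"-(\d+(?:\.\d+)?)([KM])$")
_MULTIPLIER = {"K": 1_000, "M": 1_000_000}


@dataclass
class DatasetNameInfo:
    label_features: bool               # "LF-" prefix
    multimodal: bool                   # "MM-" prefix
    short_text: bool                   # "Titles" in the name
    item_to_item: bool                 # "SeeAlso" in the name
    approx_num_labels: Optional[int]   # rounded label count from the suffix, if any


def parse_dataset_name(name: str) -> DatasetNameInfo:
    """Infer dataset properties from the repository's naming conventions."""
    match = _SUFFIX.search(name)
    approx = None
    if match:
        approx = int(round(float(match.group(1)) * _MULTIPLIER[match.group(2)]))
    return DatasetNameInfo(
        label_features=name.startswith("LF-"),
        multimodal=name.startswith("MM-"),
        short_text="Titles" in name,
        item_to_item="SeeAlso" in name,
        approx_num_labels=approx,
    )


print(parse_dataset_name("LF-WikiSeeAlsoTitles-320K"))
```

Because the suffix is a rounded count, `approx_num_labels` is only indicative; the exact label counts are those given in the dataset table.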

Datasets with/without Label Features

Note that there exist pairs of datasets whose names differ only in the "LF" prefix and the label-count suffix (e.g. LF-WikiSeeAlsoTitles-320K and WikiSeeAlsoTitles-350K) and which contain different numbers of labels and data points. This variation arises because the raw dumps from which these datasets were curated often contained labels for which label features were unavailable or could not be reliably retrieved. Such labels could be retained in the non-LF dataset but were excluded from the LF version. These exclusions could in turn leave certain data points with zero labels, and such data points were excluded from the dataset as well.
A special case in this respect is that of the Wikipedia-500K and LF-Wikipedia-500K datasets that are identical and have the same (number of) labels and data points. Wikipedia articles are the data points and Wikipedia categories are the labels for these datasets. As a convention, methods that do not use label features could choose to report their results on the Wikipedia-500K dataset whereas methods that do use label features could report results on the LF-Wikipedia-500K dataset. For this reason, these two datasets have not been released separately. The LF-Wikipedia-500K dataset has been released (see links below). Methods that wish to work on the Wikipedia-500K dataset can download the LF version and disregard the label features.

Multi-modal Datasets

The MM-AmazonTitles-300K dataset was created by taking raw data dumps and extracting all data points and labels for which a short textual description and at least one image were available. The images were resized to fit within a 128 x 128-pixel region, then centered and padded with white pixels to obtain a 1:1 aspect ratio. White padding was used since the natural background in most images was white. Subsequent operations such as tokenization, train-test split creation and reciprocal pair removal were carried out as explained below. The processed and unprocessed image sets are available upon request. To request them, please download the dataset using the links given in the table below, read the README file in the download for the terms of usage, and fill out the form available [here]. Tables comparing various methods on the MM-AmazonTitles-300K dataset are not provided on this webpage since most multi-modal benchmarks are not XC methods and most XC methods work only with textual features, not multi-modal features. Instead, please refer to the publication [65] for benchmark comparisons.
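The resize-and-pad step can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the pipeline used to build MM-AmazonTitles-300K: an image is assumed to be a list of rows of RGB tuples, nearest-neighbour resampling is an assumption (the resampling filter is not stated above), and `fit_and_pad` is a hypothetical helper name. In practice an imaging library such as Pillow would do the resizing.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (height x width)

WHITE: Pixel = (255, 255, 255)


def fit_and_pad(img: Image, size: int = 128, fill: Pixel = WHITE) -> Image:
    """Resize img to fit inside size x size (keeping its aspect ratio), then
    center it on a size x size canvas filled with `fill`."""
    h, w = len(img), len(img[0])
    scale = size / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Nearest-neighbour resize: map each output pixel back to a source pixel.
    resized = [
        [img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
         for x in range(new_w)]
        for y in range(new_h)
    ]
    # Center the resized image; an odd leftover pixel goes to the bottom/right.
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas = [[fill] * size for _ in range(size)]
    for y in range(new_h):
        canvas[top + y][left:left + new_w] = resized[y]
    return canvas
```

For a wide 2 x 4 image and `size=8`, the image is scaled to 4 x 8 and two white rows are added above and below it, giving the 1:1 aspect ratio described above.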

Legacy Datasets

Benchmarked results on datasets formerly popular in XC research have been moved to the Appendix available [here]. Some of these datasets are tiny, such as the Bibtex dataset with 159 labels. For others, the raw sources can no longer be reliably traced and only bag-of-words features are available. All such legacy datasets remain available via the links in the dataset table below.

**Dataset statistics & download**
Dataset	Download	BoW Feature Dimensionality	Number of Labels	Number of Train Points	Number of Test Points	Avg. Points per Label	Avg. Labels per Point	Original Source

Multi-modal Datasets
MM-AmazonTitles-300K	BoW Features Raw text	40,000	303,296	586,781	260,536	15.73	8.13	[64]

Datasets with Label Features
LF-AmazonTitles-131K	BoW Features Raw text	40,000	131,073	294,805	134,835	5.15	2.29	[28]
LF-Amazon-131K	BoW Features Raw text	80,000	131,073	294,805	134,835	5.15	2.29	[28]
LF-WikiSeeAlsoTitles-320K	BoW Features Raw text	40,000	312,330	693,082	177,515	4.67	2.11	-
LF-WikiSeeAlso-320K	BoW Features Raw text	80,000	312,330	693,082	177,515	4.67	2.11	-
LF-WikiTitles-500K	BoW Features Raw text	80,000	501,070	1,813,391	783,743	17.15	4.74	-
LF-Wikipedia-500K	BoW Features Raw text	2,381,304	501,070	1,813,391	783,743	24.75	4.77	-
ORCAS-800K	Dataset page	-	797,322	7,360,881	2,547,702	16.13	1.75	[70]
LF-AmazonTitles-1.3M	BoW Features Raw text	128,000	1,305,265	2,248,619	970,237	38.24	22.20	[29] + [30]

Datasets without Label Features
AmazonCat-13K	BoW Features Raw text	203,882	13,330	1,186,239	306,782	448.57	5.04	[28]
AmazonCat-14K	BoW Features Raw text	597,540	14,588	4,398,050	1,099,725	1330.1	3.53	[29] + [30]
WikiSeeAlsoTitles-350K	BoW Features Raw text	91,414	352,072	629,418	162,491	5.24	2.33	-
WikiTitles-500K	BoW Features Raw text	185,479	501,070	1,699,722	722,678	23.62	4.89	-
Wikipedia-500K	(same as LF-Wikipedia-500K)	2,381,304	501,070	1,813,391	783,743	24.75	4.77	-
AmazonTitles-670K	BoW Features Raw text	66,666	670,091	485,176	150,875	5.11	5.39	[28]
Amazon-670K	BoW Features Raw text	135,909	670,091	490,449	153,025	3.99	5.45	[28]
AmazonTitles-3M	BoW Features Raw text	165,431	2,812,281	1,712,536	739,665	31.55	36.18	[29] + [30]
Amazon-3M	BoW Features Raw text	337,067	2,812,281	1,717,899	742,507	31.64	36.17	[29] + [30]

Legacy Datasets
Mediamill	BoW Features	120	101	30,993	12,914	1902.15	4.38	[19]
Bibtex	BoW Features	1,836	159	4,880	2,515	111.71	2.40	[20]
Delicious	BoW Features	500	983	12,920	3,185	311.61	19.03	[21]
RCV1-2K	BoW Features	47,236	2,456	623,847	155,962	1218.56	4.79	[26]
EURLex-4K	BoW Features	5,000	3,993	15,539	3,809	25.73	5.31	[27] + [47]
EURLex-4.3K	BoW Features	200,000	4,271	45,000	6,000	60.57	5.07	[47] + [48]
Wiki10-31K	BoW Features	101,938	30,938	14,146	6,616	8.52	18.64	[23]
Delicious-200K	BoW Features	782,585	205,443	196,606	100,095	72.29	75.54	[24]
WikiLSHTC-325K	BoW Features	1,617,899	325,056	1,778,351	587,084	17.46	3.19	[25]
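Both "Avg." columns follow from the number of positive (point, label) pairs in a label matrix: dividing it by the number of labels gives the average number of points per label, and dividing it by the number of points gives the average number of labels per point. The sketch below assumes labels are given as one collection of label indices per data point; `label_stats` is a hypothetical helper, and since the table does not state whether the averages were computed on the train split, the test split or both, that choice is left to the caller.

```python
from typing import Iterable, Sequence, Tuple


def label_stats(label_sets: Sequence[Iterable[int]], num_labels: int) -> Tuple[float, float]:
    """Return (avg. points per label, avg. labels per point).

    label_sets holds the label indices of each data point; num_labels is the
    size of the label set (labels without any points still count)."""
    nnz = sum(len(set(labels)) for labels in label_sets)  # positive (point, label) pairs
    return nnz / num_labels, nnz / len(label_sets)


print(label_stats([[0, 1], [1], [1, 2, 3]], num_labels=4))
```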

Tokenization

The table above allows downloading precomputed bag-of-words features or raw text. The tokenization used to create the bag-of-words representation may differ across datasets (e.g. whitespace-separated for legacy datasets vs. WordPiece for more recent datasets). For XC methods that use a novel tokenizer, it is recommended that additional experiments be conducted to isolate improvements attributable to better tokenization from those attributable to the architecture or learning algorithm. One way to accomplish this is to run older XC methods with the novel tokenizer.
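One way to support such controlled comparisons is to keep the tokenizer a parameter of the feature pipeline, so that the same bag-of-words features can be regenerated with either tokenizer and fed to an older XC method. The sketch below does this; `whitespace_tokenize`, `build_vocab` and `bow_features` are hypothetical helpers, and lower-casing, frequency-based vocabulary truncation and raw term counts are assumptions (the released features may use a different weighting). A WordPiece tokenizer from a library could be passed in place of `whitespace_tokenize`.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    """Lower-cased whitespace tokenization (an assumed stand-in for the legacy scheme)."""
    return text.lower().split()


def build_vocab(texts: Sequence[str], tokenize: Tokenizer, max_size: int) -> Dict[str, int]:
    """Keep the max_size most frequent tokens; ties are broken alphabetically."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_size]
    return {tok: idx for idx, (tok, _) in enumerate(ranked)}


def bow_features(text: str, vocab: Dict[str, int], tokenize: Tokenizer) -> Dict[int, int]:
    """Sparse term-count vector {feature index: count}; unknown tokens are dropped."""
    counts = Counter(vocab[tok] for tok in tokenize(text) if tok in vocab)
    return dict(sorted(counts.items()))


texts = ["Red running shoes", "blue running shorts", "red hat"]
vocab = build_vocab(texts, whitespace_tokenize, max_size=2)
print(bow_features("Red red running socks", vocab, whitespace_tokenize))
```

Swapping only the `tokenize` argument while keeping the learner fixed isolates the effect of tokenization.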

Split Creation

For each dataset, a single split is offered. Splits were not created by uniform random sampling but in a way that ensured every label had at least one training point. This yielded more realistic train/test splits than uniform sampling, which could have dropped several infrequently occurring, hard-to-classify labels from the test set. For example, on the WikiLSHTC-325K dataset, uniformly random split creation could lose ninety thousand of the hardest-to-classify labels from the test set, whereas the adopted sampling procedure dropped only forty thousand labels from the test set.
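A split that guarantees every label at least one training point can be produced with a simple two-phase procedure. The sketch below is a hypothetical illustration (`label_covering_split` is not the repository's code, and the exact procedure used for the released splits is not described here): it first moves one randomly chosen point per uncovered label into the training set, then samples the test set uniformly from the remaining points.

```python
import random
from typing import Dict, List, Sequence, Tuple


def label_covering_split(labels_per_point: Sequence[Sequence[int]],
                         test_fraction: float = 0.3,
                         seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train indices, test indices) such that every label occurring
    in the data has at least one training point."""
    rng = random.Random(seed)
    n = len(labels_per_point)
    points_of_label: Dict[int, List[int]] = {}
    for i, labels in enumerate(labels_per_point):
        for l in labels:
            points_of_label.setdefault(l, []).append(i)

    # Phase 1: for every label not yet covered, move one random point carrying it
    # into the training set (that point also covers its other labels).
    train, covered = set(), set()
    for l in sorted(points_of_label):
        if l not in covered:
            i = rng.choice(points_of_label[l])
            train.add(i)
            covered.update(labels_per_point[i])

    # Phase 2: sample the test set uniformly from the remaining points.
    remaining = [i for i in range(n) if i not in train]
    rng.shuffle(remaining)
    n_test = min(len(remaining), round(test_fraction * n))
    test = set(remaining[:n_test])
    train.update(remaining[n_test:])
    return sorted(train), sorted(test)
```

Labels whose points all end up in the training set disappear from the test set, which is the effect quantified above for WikiLSHTC-325K; the test set may also be slightly smaller than `test_fraction` when phase 1 claims many points.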

Note: Results computed on the train/test splits provided on this page are not comparable to results computed on splits created using uniform sampling.

Reciprocal-pair Removal

For the "LF" datasets that concern related item prediction, additional care is required since introducing label features allowed "reciprocal pairs" to emerge. Specifically, these are pairs of items, say A and B, that are related to each other such that two distinct data points exist, with A appearing as a label for B in one data point and B appearing as a label for A in the other. Such pairs were removed from the ground truth in the test set to prevent algorithms from achieving artificially high scores by memorizing such pairs without learning anything meaningful. The recommended protocol for performing prediction while avoiding such reciprocal pairs using filter files provided with these datasets is described [here].
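Detecting reciprocal pairs amounts to checking, for every label B of a data point A, whether A is in turn a label of B. The sketch below assumes data points and labels share one identifier space (e.g. article titles in the SeeAlso datasets) and that `related` covers data points from both splits, since the two halves of a pair may fall into different splits. `find_reciprocal_pairs` and `drop_reciprocal` are hypothetical helpers, not the code used to build the datasets.

```python
from typing import Dict, Iterable, Set, Tuple


def find_reciprocal_pairs(related: Dict[str, Iterable[str]]) -> Set[Tuple[str, str]]:
    """Return unordered pairs (as sorted tuples) such that B is a label of
    data point A and A is a label of data point B."""
    label_sets = {item: set(labels) for item, labels in related.items()}
    pairs = set()
    for a, labels in label_sets.items():
        for b in labels:
            if b != a and a in label_sets.get(b, ()):
                pairs.add(tuple(sorted((a, b))))
    return pairs


def drop_reciprocal(ground_truth: Dict[str, Iterable[str]],
                    pairs: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Remove both directions of every reciprocal pair from a ground truth."""
    blocked = {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}
    return {item: {l for l in labels if (item, l) not in blocked}
            for item, labels in ground_truth.items()}


related = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": []}
print(find_reciprocal_pairs(related))
```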

Useful Tools

The following resources provide several tools

The above tools can be used to perform various useful operations, including:

Reading and writing the datasets in the given file format
Preprocessing raw text using various tokenizers to generate data point (and label) features, including bag-of-words features
Evaluating various performance measures such as precision, nDCG and their propensity-scored counterparts (see [here] for details)
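As an illustration of the first operation, the reader below assumes the sparse text layout commonly used for the bag-of-words files: a header line `num_points num_features num_labels`, followed by one line per data point of the form `l1,l2,... f1:v1 f2:v2 ...`, with the label field left empty for points without labels. This layout is an assumption here and should be checked against the README; `parse_xc_line` and `read_xc_text` are hypothetical helpers rather than functions of the linked tools.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[List[int], Dict[int, float]]


def parse_xc_line(line: str) -> Row:
    """Parse 'l1,l2,... f1:v1 f2:v2 ...' into (labels, {feature: value})."""
    tokens = line.rstrip("\r\n").split(" ")
    labels: List[int] = []
    start = 0
    if tokens and ":" not in tokens[0]:
        # First token is the comma-separated label list (empty if no labels).
        if tokens[0]:
            labels = [int(x) for x in tokens[0].split(",")]
        start = 1
    features: Dict[int, float] = {}
    for tok in tokens[start:]:
        if tok:
            idx, val = tok.split(":")
            features[int(idx)] = float(val)
    return labels, features


def read_xc_text(lines: Iterable[str]) -> Tuple[int, int, List[Row]]:
    """Read the header line and one row per data point; returns
    (num_features, num_labels, rows)."""
    it = iter(lines)
    num_points, num_features, num_labels = (int(x) for x in next(it).split())
    rows = [parse_xc_line(line) for line in it if line.strip()]  # skip blank lines
    if len(rows) != num_points:
        raise ValueError(f"header says {num_points} points, found {len(rows)}")
    return num_features, num_labels, rows


text = "2 5 3\n0,2 1:0.5 4:2\n 3:1.0\n"
print(read_xc_text(text.splitlines()))
```

Passing an open file object instead of `text.splitlines()` works the same way, since the reader only iterates over lines.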

Performance Metrics and Evaluation Protocols

The benchmarked results below compare various algorithms in terms of classification accuracy, evaluated using several performance measures. The discussion below describes protocols for evaluating XC methods, especially in the presence of head/tail labels and reciprocal pairs (see [here]).

Performance at the Top

The precision$@k$ and nDCG$@k$ metrics are defined for a predicted score vector $\hat{\mathbf y} \in \mathbb{R}^{L}$ and ground truth label vector $\mathbf y \in \left\lbrace 0, 1 \right\rbrace^L$ as
\[ \text{P}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \mathbf y_l \]
\[ \text{DCG}@k := \sum_{r=1}^{k} \frac{\mathbf y_{l_r}}{\log(r+1)} \]
\[ \text{nDCG}@k := \frac{\text{DCG}@k}{\sum_{r=1}^{\min(k, \|\mathbf y\|_0)} \frac{1}{\log(r+1)}}, \]
where $\text{rank}_k(\hat{\mathbf y})$ returns the indices of the $k$ largest entries of $\hat{\mathbf y}$, ranked in descending order of score, and $l_r$ denotes the label at position $r$ of this ranking.
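A per-point implementation of these definitions can be sketched as follows, assuming predictions are stored sparsely as a `{label: score}` dict and the ground truth as a set of label indices (both representations are assumptions). Dataset-level numbers would then be averages over test points, usually reported as percentages. The logarithm base cancels in nDCG, so `log2` is used for convenience; ties in the scores are broken by label index, which is also an assumption.

```python
import math
from typing import Dict, List, Set


def top_k(scores: Dict[int, float], k: int) -> List[int]:
    """rank_k: labels with the k largest scores, best first (ties by label index)."""
    return sorted(scores, key=lambda l: (-scores[l], l))[:k]


def precision_at_k(scores: Dict[int, float], truth: Set[int], k: int) -> float:
    return sum(1 for l in top_k(scores, k) if l in truth) / k


def ndcg_at_k(scores: Dict[int, float], truth: Set[int], k: int) -> float:
    # The discount depends on the 1-based rank position r of each predicted label.
    dcg = sum(1.0 / math.log2(r + 1)
              for r, l in enumerate(top_k(scores, k), start=1) if l in truth)
    # Ideal DCG: min(k, |truth|) relevant labels placed at the top positions.
    ideal = sum(1.0 / math.log2(r + 1) for r in range(1, min(k, len(truth)) + 1))
    return dcg / ideal if ideal > 0 else 0.0


scores = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.1}
print(precision_at_k(scores, {0, 2}, 3), ndcg_at_k(scores, {0, 2}, 3))
```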

Propensity-scored Performance at the Top

For datasets that contain excessively popular labels (often referred to as "head" labels), a high P@k may be achieved by simply predicting head labels repeatedly, irrespective of their relevance to the data point. To check for such trivial behavior, it is recommended that XC methods also be evaluated on the propensity-scored counterparts of the precision$@k$ and nDCG$@k$ metrics (PSP$@k$ and PSnDCG$@k$) defined below.
\[ \text{PSP}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{p_l} \]
\[ \text{PSDCG}@k := \sum_{r=1}^{k} \frac{\mathbf y_{l_r}}{p_{l_r}\log(r+1)} \]
\[ \text{PSnDCG}@k := \frac{\text{PSDCG}@k}{\sum_{r=1}^{k} \frac{1}{\log(r+1)}}, \]
where $l_r$ is the label at position $r$ of $\text{rank}_k(\hat{\mathbf y})$ and $p_l$ is the propensity score for label $l$, which helps make the metrics unbiased [31] with respect to missing labels. Propensity-scored metrics place particular emphasis on performing well on tail labels and give only small rewards for predicting popular or head labels. It is recommended that the scripts provided [here] be used to compute propensity-scored metrics in order to be consistent with the results reported below.
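The sketch below illustrates how propensities and PSP@k fit together. It assumes the empirical propensity model introduced with PfastreXML (Jain et al., KDD 2016), $p_l = 1/(1 + C (N_l + B)^{-A})$ with $C = (\ln N - 1)(B + 1)^A$, where $N$ is the number of training points and $N_l$ the number of training points tagged with label $l$. The values A = 0.55 and B = 1.5 are commonly used defaults but are assumptions here, since A and B are dataset-dependent. The function computes the unnormalized PSP@k exactly as in the formula above; the repository's scripts may normalize differently, so numbers from this sketch should not be compared directly with the tables below.

```python
import math
from typing import Dict, List, Set


def propensities(label_counts: Dict[int, int], num_points: int,
                 A: float = 0.55, B: float = 1.5) -> Dict[int, float]:
    """p_l = 1 / (1 + C * (N_l + B)^(-A)), C = (ln N - 1) * (B + 1)^A.
    label_counts maps each label to its number of training points N_l."""
    C = (math.log(num_points) - 1.0) * (B + 1.0) ** A
    return {l: 1.0 / (1.0 + C * (n + B) ** (-A)) for l, n in label_counts.items()}


def top_k(scores: Dict[int, float], k: int) -> List[int]:
    """Labels with the k largest scores, best first (ties by label index)."""
    return sorted(scores, key=lambda l: (-scores[l], l))[:k]


def psp_at_k(scores: Dict[int, float], truth: Set[int],
             prop: Dict[int, float], k: int) -> float:
    """Unnormalized PSP@k: each correct prediction is weighted by 1 / p_l."""
    return sum(1.0 / prop[l] for l in top_k(scores, k) if l in truth) / k


prop = propensities({0: 100, 1: 1}, num_points=1000)
print(prop)  # the rare label 1 receives the smaller propensity
```

Because rare labels receive small propensities, a correct tail prediction contributes much more to PSP@k than a correct head prediction.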

Removal of Reciprocal-pairs

As described [here], reciprocal pairs were removed from the ground truth in the test splits of the LF datasets to prevent trivial predictions from being rewarded. However, these reciprocal pairs must then also be removed from the test predictions of XC methods to avoid unnecessary penalization. It is recommended that the filter files provided along with the datasets and the tools in the PyXCLib library linked [here] be used to evaluate XC methods on LF datasets. Although reciprocal pairs were not removed from the train splits, a separate filter file enumerating the reciprocal pairs in the train splits is provided so that methods that wish to eliminate them from the train splits may do so. Note that these filter files are distinct from the ground truth files and only contain lists of reciprocal pairs.
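Applying a filter file amounts to deleting the listed (data point, label) scores before the top-k labels are selected, so that a filtered label can neither be rewarded nor displace a valid prediction. The sketch below assumes each filter-file line holds a `point_index label_index` pair separated by whitespace (please check the dataset README for the actual layout) and that predictions are stored as one `{label: score}` dict per test point; `read_filter_pairs` and `apply_filter` are hypothetical helpers, not PyXCLib functions.

```python
from typing import Dict, Iterable, List, Tuple


def read_filter_pairs(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse lines of the form '<point index> <label index>' (assumed layout)."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            pairs.append((int(parts[0]), int(parts[1])))
    return pairs


def apply_filter(scores: List[Dict[int, float]],
                 pairs: Iterable[Tuple[int, int]]) -> List[Dict[int, float]]:
    """Drop each filtered (point, label) score so it can never enter the top-k.
    Returns new dicts and leaves the input predictions untouched."""
    filtered = [dict(s) for s in scores]
    for i, j in pairs:
        filtered[i].pop(j, None)
    return filtered


pairs = read_filter_pairs(["0 0", "1 3"])
print(apply_filter([{0: 0.9, 1: 0.5}, {2: 0.8}], pairs))
```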

Code for XC Methods

The following lists provide links to code for leading XC methods. The methods have been categorized based on the kind of classifier used (e.g. one-vs-all, trees, embeddings) for easy identification. Methods that learn deep representations for data points jointly with the classifier are included as a separate category.

Method	Category	Features	Language
Slice (Jain et al., WSDM 2019)	1-vs-All	Pre-trained-dense	C++
Parabel (Prabhu et al., WWW 2018)	1-vs-All	Sparse-BoW	C++
DiSMEC++ (Schultheis and Babbar, ECML-MLJ 2022)	1-vs-All	Sparse-BoW	C++
DiSMEC (Babbar and Schölkopf, WSDM 2017)	1-vs-All	Sparse-BoW	Java
PPD-Sparse (Yen et al., KDD 2017)	1-vs-All	Sparse-BoW	C++
Label Filters (Niculescu-Mizil and Abbasnejad, AISTATS 2017)	1-vs-All	Sparse-BoW	C
PD-Sparse (Yen et al., ICML 2016)	1-vs-All	Sparse-BoW	C++
ProXML (Babbar and Schölkopf, Machine Learning 2019 & ECML 2019)	1-vs-All	Sparse-BoW	C++
Bonsai (Khandagale et al., ArXiv 2019)	1-vs-All	Sparse-BoW	C++
SwiftXML (Prabhu et al., WSDM 2018)	Trees	Sparse-BoW	C++
Probabilistic Label Trees (Jasinska et al., ICML 2017)	Trees	Sparse-BoW	C++
PfastreXML (Jain et al., KDD 2016)	Trees	Sparse-BoW	C++
FastXML (Prabhu & Varma, KDD 2014)	Trees	Sparse-BoW	C++
CRAFTML (Siblini et al., ICML 2018)	Trees	Sparse-BoW	Rust
DEFRAG (Jalan and Kar, IJCAI 2019)	Embeddings	Sparse-BoW	Rust
AnnexML (Tagami, KDD 2017)	Embeddings	Sparse-BoW	C
Randomized embeddings for extreme learning (Mineiro and Karampatziakis, CoRR 2017)	Embeddings	Sparse-BoW	Matlab
SLEEC (Bhatia et al., NIPS 2015)	Embeddings	Sparse-BoW	Matlab
LEML (Yu et al., ICML 2014)	Embeddings	Sparse-BoW	Matlab
W-LTLS (Evron et al., NeurIPS 2018)	Embeddings	Sparse-BoW	Python
ExMLDS-(4,1) (Gupta et al., AAAI 2019)	Embeddings	Sparse-BoW	C
fastTextLearnTree (Jernite et al., ICML 2017)	Deep-learning	Sparse-BoW	C
XML-CNN (Liu et al., SIGIR 2017)	Deep-learning	Custom	Python
AttentionXML (You et al., NeurIPS 2019)	Deep-learning	Custom	Python
X-Transformer (Chang et al., KDD 2020)	Deep-learning	Custom	Python
MACH (Medini et al., ICML 2019)	Deep-learning	Custom	Python
APLC-XLNet (Ye et al., ICML 2020)	Deep-learning	Custom	Python
DeepXML/Astec (Dahiya et al., WSDM 2021)	Deep-learning	Custom	Python
DECAF (Mittal et al., WSDM 2021)	Deep-learning	Custom	Python
LightXML (Jiang et al., AAAI 2021)	Deep-learning	Custom	Python
PWXMC (Qaraei et al., TheWebConf 2021)	Loss-function	Custom	Python
GalaXC (Saini et al., TheWebConf 2021)	Deep-learning	Custom	Python
ECLARE (Mittal et al., TheWebConf 2021)	Deep-learning	Custom	Python
SiameseXML (Dahiya et al., ICML 2021)	Deep-learning	Custom	Python
ZestXML (Gupta et al., KDD 2021)	Zero-shot-learning	Sparse-BoW	C++
MUFIN (Mittal et al., CVPR 2022)	Deep-learning	Custom	Python
InceptionXML (Kharbanda et al., SIGIR 2023)	Deep-learning	Custom	Python
CascadeXML (Kharbanda et al., NeurIPS 2022)	Deep-learning	Custom	Python
NGAME (Dahiya et al., WSDM 2023)	Deep-learning	Custom	Python
Renee (Jain et al., MLSys 2023)	Deep-learning	Custom	Python
MatchXML (Ye et al., TKDE 2024)	Deep-learning	Custom	Python
DEXA (Dahiya et al., KDD 2023)	Deep-learning	Custom	Python

Please contact Manik Varma if you would like us to provide a link to your code.

Benchmarked Results

The tables below provide benchmarked results for various XC methods on several datasets. Rows corresponding to XC methods that use deep-learnt features or label features in the LF datasets have been highlighted in light orange. Training times are reported on a single GPU, except where noted otherwise for methods that necessarily require multiple GPUs to scale. The model sizes listed for XC methods are either as reported in the respective publications or, otherwise, on-disk sizes, which may depend on compression. Note that executions using different platforms or libraries may introduce variance in model sizes and affect reproducibility.

Note 1: Deep learning methods were run on diverse hardware configurations (e.g. CPU-only or CPU-GPU). The symbols *, †, and ‡ specify the machine configuration used for each method (see legend below). AttentionXML and the X-Transformer could not be run on a single GPU. These methods were executed on a cluster with 8 GPUs, and training times were scaled accordingly before reporting.

Note 2: Results for methods marked with a ♦ symbol were taken directly from their respective publications. In some cases, this was done because publicly available implementations of the method could not be scaled. In other cases, this was done because a different version of the dataset was used in the publication. For instance, this website does not provide raw text for legacy datasets. Consequently, results of deep learning methods on legacy datasets are always marked with a ♦ symbol, since those methods used raw text from alternate sources that resulted in different train-test splits.

Legend:

*: 24-core Intel Xeon 2.6GHz
†: 24-core Intel Xeon 2.6GHz with 1 Nvidia P40 GPU
‡: 24-core Intel Xeon 2.6GHz with 1 Nvidia V100 GPU
♦: Results as reported in publication

LF-AmazonTitles-131K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	30.05	21.25	16.02	30.05	31.58	34.05	19.23	26.09	32.26	19.23	23.64	26.60	1.95	0.08
Astec^‡	37.12	25.20	18.24	37.12	38.17	40.16	29.22	34.64	39.49	29.22	32.73	35.03	3.24	1.83
AttentionXML^‡	32.25	21.70	15.61	32.25	32.83	34.42	23.97	28.60	32.57	23.97	26.88	28.75	2.61	20.73
Bonsai^*	34.11	23.06	16.63	34.11	34.81	36.57	24.75	30.35	34.86	24.75	28.32	30.47	0.24	0.10
DECAF^‡	38.40	25.84	18.65	38.40	39.43	41.46	30.85	36.44	41.42	30.85	34.69	37.13	0.81	2.16
DEXA^‡	46.42	30.50	21.59	46.42	47.06	49.00	39.11	44.69	49.65	39.11	43.10	45.58	-	13.01
DiSMEC^*	35.14	23.88	17.24	35.14	36.17	38.06	25.86	32.11	36.97	25.86	30.09	32.47	0.11	3.10
ECLARE^‡	40.74	27.54	19.88	40.74	42.01	44.16	33.51	39.55	44.70	33.51	37.70	40.21	0.72	2.16
GalaXC^‡	39.17	26.85	19.49	39.17	40.82	43.06	32.50	38.79	43.95	32.50	36.86	39.37	0.67	0.42
LightXML^‡	35.60	24.15	17.45	35.60	36.33	38.17	25.67	31.66	36.44	25.67	29.43	31.68	2.25	71.40
MACH^‡	33.49	22.71	16.45	33.49	34.36	36.16	24.97	30.23	34.72	24.97	28.41	30.54	2.35	3.30
NGAME^‡	46.01	30.28	21.47	46.01	46.69	48.67	38.81	44.40	49.43	38.81	42.79	45.31	1.20	12.59
Parabel^*	32.60	21.80	15.61	32.60	32.96	34.47	23.27	28.21	32.14	23.27	26.36	28.21	0.34	0.03
PfastreXML^*	32.56	22.25	16.05	32.56	33.62	35.26	26.81	30.61	34.24	26.81	29.02	30.67	3.02	0.26
Renee	46.05	30.81	22.04	46.05	47.46	49.68	39.08	45.12	50.48	39.08	43.56	46.24	-	-
SiameseXML^†	41.42	27.92	21.21	41.42	42.65	44.95	35.80	40.96	46.19	35.80	39.36	41.95	1.71	1.08
Slice+FastText^*	30.43	20.50	14.84	30.43	31.07	32.76	23.08	27.74	31.89	23.08	26.11	28.13	0.39	0.08
X-Transformer^‡	29.95	18.73	13.07	29.95	28.75	29.60	21.72	24.42	27.09	21.72	23.18	24.39	-	-
XR-Transformer^‡	38.10	25.57	18.32	38.10	38.89	40.71	28.86	34.85	39.59	28.86	32.92	35.21	-	35.40
XT^*	31.41	21.39	15.48	31.41	32.17	33.86	22.37	27.51	31.64	22.37	25.58	27.52	0.84	9.46

LF-Amazon-131K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	35.73	25.46	19.41	35.73	37.81	41.08	23.56	31.97	39.95	23.56	29.07	33.00	4.01	0.68
Astec^‡	42.22	28.62	20.85	42.22	43.57	46.06	32.95	39.42	45.30	32.95	37.45	40.35	5.52	3.39
AttentionXML^‡	42.90	28.96	20.97	42.90	44.07	46.44	32.92	39.51	45.24	32.92	37.49	40.33	5.04	50.17
Bonsai^*	40.23	27.29	19.87	40.23	41.46	43.84	29.60	36.52	42.39	29.60	34.43	37.34	0.46	0.40
DECAF^‡	42.94	28.79	21.00	42.94	44.25	46.84	34.52	41.14	47.33	34.52	39.35	42.48	1.86	1.80
DEXA^‡	47.16	31.45	22.42	47.16	48.20	50.36	38.70	45.43	50.97	38.70	43.44	46.19	-	41.41
DiSMEC^*	41.68	28.32	20.58	41.68	43.22	45.69	31.61	38.96	45.07	31.61	36.97	40.05	0.45	7.12
ECLARE^‡	43.56	29.65	21.57	43.56	45.24	47.82	34.98	42.38	48.53	34.98	40.30	43.37	1,118.78	2.15
LightXML^‡	41.49	28.32	20.75	41.49	42.70	45.23	30.27	37.71	44.10	30.27	35.20	38.28	2.03	56.03
MACH^‡	34.52	23.39	17.00	34.52	35.53	37.51	25.27	30.71	35.42	25.27	29.02	31.33	4.57	13.91
NGAME^‡	46.53	30.89	22.02	46.53	47.44	49.58	38.53	44.95	50.45	38.53	43.07	45.81	1.20	39.99
Parabel^*	39.57	26.64	19.26	39.57	40.48	42.61	28.99	35.36	40.69	28.99	33.36	35.97	0.62	0.10
PINA^♦	46.76	31.88	23.20	-	-	-	-	-	-	-	-	-	-	-
PfastreXML^*	35.83	24.35	17.60	35.83	36.97	38.85	28.99	33.24	37.40	28.99	31.65	33.62	5.30	1.54
SiameseXML^†	44.81	30.19	21.94	44.81	46.15	48.76	37.56	43.69	49.75	37.56	41.91	44.97	1.76	1.18
Renee	48.05	32.33	23.26	48.05	49.56	52.04	40.11	47.39	53.67	40.11	45.37	48.52	-	-
Slice+FastText^*	32.07	22.21	16.52	32.07	33.54	35.98	23.14	29.08	34.63	23.14	27.25	30.06	0.39	0.11
XR-Transformer^‡	45.61	30.85	22.32	45.61	47.10	49.65	34.93	42.83	49.24	34.93	40.67	43.91	-	38.40
XT^*	34.31	23.27	16.99	34.31	35.18	37.26	24.35	29.81	34.70	24.35	27.95	30.34	0.92	1.38

LF-WikiSeeAlsoTitles-320K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	16.30	11.24	8.84	16.30	16.19	17.14	7.24	9.63	11.75	7.24	9.06	10.43	4.22	0.21
Astec^‡	22.72	15.12	11.43	22.72	22.16	22.87	13.69	15.81	17.50	13.69	15.56	16.75	7.30	4.17
AttentionXML^‡	17.56	11.34	8.52	17.56	16.58	17.07	9.45	10.63	11.73	9.45	10.45	11.24	6.02	56.12
Bonsai^*	19.31	12.71	9.55	19.31	18.74	19.32	10.69	12.44	13.79	10.69	12.29	13.29	0.37	0.37
DECAF^‡	25.14	16.90	12.86	25.14	24.99	25.95	16.73	18.99	21.01	16.73	19.18	20.75	1.76	11.16
DiSMEC^*	19.12	12.93	9.87	19.12	18.93	19.71	10.56	13.01	14.82	10.56	12.70	14.02	0.19	15.56
ECLARE^‡	29.35	19.83	15.05	29.35	29.21	30.20	22.01	24.23	26.27	22.01	24.46	26.03	1.67	13.46
GalaXC^‡	27.87	18.75	14.30	27.87	26.84	27.60	19.77	22.25	24.47	19.77	21.70	23.16	1.08	1.08
MACH^‡	18.06	11.91	8.99	18.06	17.57	18.17	9.68	11.28	12.53	9.68	11.19	12.14	2.51	8.23
Parabel^*	17.68	11.48	8.59	17.68	16.96	17.44	9.24	10.65	11.80	9.24	10.49	11.32	0.60	0.07
PfastreXML^*	17.10	11.13	8.35	17.10	16.80	17.35	12.15	12.51	13.26	12.15	12.81	13.48	6.77	0.59
SiameseXML^†	31.97	21.43	16.24	31.97	31.57	32.59	26.82	28.42	30.36	26.82	28.74	30.27	2.62	1.90
Slice+FastText^*	18.55	12.62	9.68	18.55	18.29	19.07	11.24	13.45	15.20	11.24	13.03	14.23	0.94	0.20
XT^*	17.04	11.31	8.60	17.04	16.61	17.24	8.99	10.52	11.82	8.99	10.33	11.26	1.90	5.28

LF-WikiSeeAlso-320K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	30.79	20.88	16.47	30.79	30.02	31.64	13.48	17.92	22.21	13.48	16.52	19.08	12.13	2.40
Astec^‡	40.07	26.69	20.36	40.07	39.36	40.88	23.41	28.08	31.92	23.41	27.48	30.17	13.46	7.11
AttentionXML^‡	40.50	26.43	19.87	40.50	39.13	40.26	22.67	26.66	29.83	22.67	26.13	28.38	7.12	90.37
Bonsai^*	34.86	23.21	17.66	34.86	34.09	35.32	18.19	22.35	25.66	18.19	21.62	23.84	0.84	1.39
DECAF^‡	41.36	28.04	21.38	41.36	41.55	43.32	25.72	30.93	34.89	25.72	30.69	33.69	4.84	13.40
DEXA^‡	47.11	30.48	22.71	47.10	46.31	47.62	31.81	35.50	38.78	31.81	38.94	78.61	-	78.61
DiSMEC^*	34.59	23.58	18.26	34.59	34.43	36.11	18.95	23.92	27.90	18.95	23.04	25.76	1.28	58.79
ECLARE^‡	40.58	26.86	20.14	40.48	40.05	41.23	26.04	30.09	33.01	26.04	30.06	32.32	2.83	9.40
LightXML^‡	34.50	22.31	16.83	34.50	33.21	34.24	17.85	21.26	24.16	17.85	20.81	22.80	-	249.00
MACH^‡	27.18	17.38	12.89	27.18	26.09	26.80	13.11	15.28	16.93	13.11	15.17	16.48	11.41	50.22
NGAME^‡	47.65	31.56	23.68	47.65	47.50	48.99	33.83	37.79	41.03	33.83	38.36	41.01	2.51	75.39
PINA^♦	44.54	30.11	22.92	-	-	-	-	-	-	-	-	-	-	-
Parabel^*	33.46	22.03	16.61	33.46	32.40	33.34	17.10	20.73	23.53	17.10	20.02	21.88	1.18	0.33
PfastreXML^*	28.79	18.38	13.60	28.79	27.69	28.28	17.12	18.19	19.43	17.12	18.23	19.20	14.02	4.97
SiameseXML^†	42.16	28.14	21.39	42.16	41.79	43.36	29.02	32.68	36.03	29.02	32.64	35.17	2.70	2.33
Renee	47.86	31.91	24.05	47.86	47.93	49.63	32.02	37.07	40.90	32.02	37.52	40.60	-	-
Slice+FastText^*	27.74	19.39	15.47	27.74	27.84	29.65	13.07	17.50	21.55	13.07	16.36	18.90	0.94	0.20
XR-Transformer^‡	42.57	28.24	21.30	42.57	41.99	43.44	25.18	30.13	33.79	25.18	29.84	32.59	-	119.47
XT^*	30.10	19.60	14.92	30.10	28.65	29.58	14.43	17.13	19.69	14.43	16.37	17.97	2.20	3.27

LF-WikiTitles-500K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	39.00	20.66	14.55	39.00	28.40	26.80	13.91	13.38	13.75	13.91	14.63	15.88	11.18	1.98
Astec^‡	44.40	24.69	17.49	44.40	33.43	31.72	18.31	18.25	18.56	18.31	19.57	21.09	15.01	13.50
AttentionXML^‡	40.90	21.55	15.05	40.90	29.38	27.45	14.80	13.97	13.88	14.80	15.24	16.22	14.01	133.94
Bonsai^*	40.97	22.30	15.66	40.97	30.35	28.65	16.58	16.34	16.40	16.58	17.60	18.85	1.63	2.03
DECAF^‡	44.21	24.64	17.36	44.21	33.55	31.92	19.29	19.82	19.96	19.29	21.26	22.95	4.53	42.26
DiSMEC^*	39.42	21.10	14.85	39.42	28.87	27.29	15.88	15.54	15.89	15.88	16.76	18.13	0.68	48.27
ECLARE^‡	44.36	24.29	16.91	44.36	33.33	31.46	21.58	20.39	19.84	21.58	22.39	23.61	4.24	39.34
MACH^‡	37.74	19.11	13.26	37.74	26.63	24.94	13.71	12.14	12.00	13.71	13.63	14.54	4.73	22.46
Parabel^*	40.41	21.98	15.42	40.41	29.89	28.15	15.55	15.32	15.35	15.55	16.50	17.66	2.70	0.42
PfastreXML^*	35.71	19.27	13.64	35.71	26.45	25.15	18.23	15.42	15.08	18.23	17.34	18.24	20.41	3.79
Slice+FastText^*	25.48	15.06	10.98	25.48	20.67	20.52	13.90	13.33	13.82	13.90	14.50	15.90	2.30	0.74
XT^*	38.13	20.71	14.66	38.13	28.13	26.61	14.10	14.12	14.38	14.10	15.15	16.40	3.10	14.67

LF-AmazonTitles-1.3M
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	47.79	41.65	36.91	47.79	44.83	42.93	15.42	19.67	21.91	15.42	18.05	19.36	14.53	2.48
Astec^‡	48.82	42.62	38.44	48.82	46.11	44.80	21.47	25.41	27.86	21.47	24.08	25.66	26.66	18.54
AttentionXML^‡	45.04	39.71	36.25	45.04	42.42	41.23	15.97	19.90	22.54	15.97	18.23	19.60	28.84	380.02
Bonsai^*	47.87	42.19	38.34	47.87	45.47	44.35	18.48	23.06	25.95	18.48	21.52	23.33	9.02	7.89
DECAF^‡	50.67	44.49	40.35	50.67	48.05	46.85	22.07	26.54	29.30	22.07	25.06	26.85	9.62	74.47
DEXA^‡	56.63	49.05	43.90	56.60	53.81	52.37	29.12	32.69	34.86	29.12	32.02	33.86	-	103.13
ECLARE^‡	50.14	44.09	40.00	50.14	47.75	46.68	23.43	27.90	30.56	23.43	26.67	28.61	9.15	70.59
GalaXC^‡	49.81	44.23	40.12	49.81	47.64	46.47	25.22	29.12	31.44	25.22	27.81	29.36	2.69	9.55
MACH^‡	35.68	31.22	28.35	35.68	33.42	32.27	9.32	11.65	13.26	9.32	10.79	11.65	7.68	60.39
NGAME^‡	56.75	49.19	44.09	56.75	53.84	52.41	29.18	33.01	35.36	29.18	32.07	33.91	9.71	97.75
Parabel^*	46.79	41.36	37.65	46.79	44.39	43.25	16.94	21.31	24.13	16.94	19.70	21.34	11.75	1.50
PfastreXML^*	37.08	33.77	31.43	37.08	36.61	36.61	28.71	30.98	32.51	28.71	29.92	30.73	29.59	9.66
SiameseXML^†	49.02	42.72	38.52	49.02	46.38	45.15	27.12	30.43	32.52	27.12	29.41	30.90	14.58	9.89
Renee	56.04	49.91	45.32	56.04	54.21	53.15	28.54	33.38	36.14	28.54	32.15	34.18	-	-
Slice^*	34.80	30.58	27.71	34.80	32.72	31.69	13.96	17.08	19.14	13.96	15.83	16.97	5.98	0.79
XT^*	40.60	35.74	32.01	40.60	38.18	36.68	13.67	17.11	19.06	13.67	15.64	16.65	7.90	82.18
XR-Transformer^‡	50.14	44.07	39.98	50.14	47.71	46.59	20.06	24.85	27.79	20.06	23.44	25.41	-	132.00

WikiSeeAlsoTitles-350K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	14.96	10.20	8.11	14.96	14.20	14.76	5.63	7.04	8.59	5.63	6.79	7.76	3.59	0.20
Astec^†	20.61	14.58	11.49	20.61	20.08	20.80	9.91	12.16	14.04	9.91	11.76	12.98	7.41	4.36
AttentionXML^†	15.86	10.43	8.01	15.86	14.59	14.86	6.39	7.20	8.15	6.39	7.05	7.64	4.07	30.44
Bonsai^*	17.95	12.27	9.56	17.95	17.13	17.66	8.16	9.68	11.07	8.16	9.49	10.43	0.25	0.46
DiSMEC^*	16.61	11.57	9.14	16.61	16.09	16.72	7.48	9.19	10.74	7.48	8.95	9.99	0.09	6.62
InceptionXML^♦	21.87	15.48	12.20	-	-	-	11.13	13.31	15.20	-	-	-	-	-
MACH^†	14.79	9.57	7.13	14.79	13.83	14.05	6.45	7.02	7.54	6.45	7.20	7.73	5.22	7.44
Parabel^*	17.24	11.61	8.92	17.24	16.31	16.67	7.56	8.83	9.96	7.56	8.68	9.45	0.43	0.06
PfastreXML^*	15.09	10.49	8.24	15.09	14.98	15.59	9.03	9.69	10.64	9.03	9.82	10.52	5.22	0.51
Slice+FastText^*	18.13	12.87	10.29	18.13	17.71	18.52	8.63	10.78	12.74	8.63	10.37	11.63	0.97	0.22
XML-CNN	17.75	12.34	9.73	17.75	16.93	17.48	8.24	9.72	11.15	8.24	9.40	10.31	0.78	14.25
XT^*	16.55	11.37	8.93	16.55	15.88	16.47	7.38	8.75	10.05	7.38	8.57	9.46	2.00	3.25

WikiTitles-500K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	39.56	20.50	14.32	39.56	28.28	26.54	15.44	13.83	13.79	15.44	15.49	16.58	10.70	1.77
Astec^†	46.60	26.03	18.50	46.60	35.10	33.34	18.89	18.90	19.30	18.89	20.33	22.00	15.15	13.04
AttentionXML^†	42.89	22.71	15.89	42.89	30.92	28.93	15.12	14.32	14.22	15.12	15.69	16.75	9.21	102.43
Bonsai^*	42.60	23.08	16.25	42.60	31.34	29.58	17.38	16.85	16.90	17.38	18.28	19.62	1.18	2.94
DiSMEC^*	39.89	21.23	14.96	39.89	28.97	27.32	15.89	15.15	15.43	15.89	16.52	17.86	0.35	23.94
InceptionXML^♦	48.35	27.63	19.74	-	-	-	20.86	21.02	21.23	-	-	-	-	-
MACH^†	33.74	15.62	10.41	33.74	22.61	20.80	11.43	8.98	8.35	11.43	10.77	11.28	10.48	23.65
Parabel^*	42.50	23.04	16.21	42.50	31.24	29.45	16.55	16.12	16.16	16.55	17.49	18.77	2.15	0.34
PfastreXML^*	30.99	18.07	13.09	30.99	24.54	23.88	17.87	15.40	15.15	17.87	17.38	18.46	16.85	3.07
Slice+FastText^*	28.07	16.78	12.28	28.07	22.97	22.87	15.10	14.69	15.33	15.10	16.02	17.67	1.50	0.54
XML-CNN^†	43.45	23.24	16.53	43.45	31.69	29.95	15.64	14.74	14.98	15.64	16.17	17.45	1.17	55.21
XT^*	39.44	21.57	15.31	39.44	29.17	27.65	15.23	15.00	15.25	15.23	16.23	17.59	3.30	12.13

AmazonTitles-670K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	35.31	30.90	27.83	35.31	32.76	31.26	17.94	20.69	23.30	17.94	19.57	20.88	2.99	0.17
Astec^†	40.63	36.22	33.00	40.63	38.45	37.09	28.07	30.17	32.07	28.07	29.20	29.98	10.93	3.85
AttentionXML^†	37.92	33.73	30.57	37.92	35.78	34.35	24.24	26.43	28.39	24.24	25.48	26.33	12.11	37.50
Bonsai^*	38.46	33.91	30.53	38.46	36.05	34.48	23.62	26.19	28.41	23.62	25.16	26.21	0.66	0.53
DiSMEC^*	38.12	34.03	31.15	38.12	36.07	34.88	22.26	25.46	28.67	22.26	24.30	26.00	0.29	11.74
InceptionXML^♦	42.45	38.04	34.68	-	-	-	28.70	31.48	33.83	-	-	-	-	-
LightXML^‡	43.10	38.70	35.50	-	-	-	-	-	-	-	-	-	-	-
MACH^†	34.92	31.18	28.56	34.92	33.07	31.97	20.56	23.14	25.79	20.56	22.18	23.53	3.84	6.41
Parabel^*	38.00	33.54	30.10	38.00	35.62	33.98	23.10	25.57	27.61	23.10	24.55	25.48	1.06	0.09
PfastreXML^*	32.88	30.54	28.80	32.88	32.20	31.85	26.61	27.79	29.22	26.61	27.10	27.59	5.32	0.99
Renee	45.20	40.24	36.61	45.20	42.77	41.27	28.98	32.66	35.83	28.98	31.38	33.07	-	-
SLICE+FastText^*	33.85	30.07	26.97	33.85	31.97	30.56	21.91	24.15	25.81	21.91	23.26	24.03	2.01	0.22
XML-CNN^†	35.02	31.37	28.45	35.02	33.24	31.94	21.99	24.93	26.84	21.99	23.83	24.67	1.36	23.52
XR-Transformer^‡	41.94	37.44	34.19	41.89	39.67	38.32	25.34	28.86	32.14	25.34	27.58	29.30	-	-
XT^*	36.57	32.73	29.79	36.57	34.64	33.35	22.11	24.81	27.18	22.11	23.73	24.87	4.00	4.65

AmazonTitles-3M
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	48.37	44.68	42.24	48.37	45.93	44.43	11.47	13.84	15.72	11.47	13.02	14.15	10.23	1.68
Astec^†	48.74	45.70	43.31	48.74	46.96	45.67	16.10	18.89	20.94	16.10	18.00	19.33	40.60	13.04
AttentionXML^†	46.00	42.81	40.59	46.00	43.94	42.61	12.81	15.03	16.71	12.80	14.23	15.25	44.40	273.10
Bonsai^*	46.89	44.38	42.30	46.89	45.46	44.35	13.78	16.66	18.75	13.78	15.75	17.10	9.53	9.90
MACH^†	37.10	33.57	31.33	37.10	34.67	33.17	7.51	8.61	9.46	7.51	8.23	8.76	9.77	40.48
Parabel^*	46.42	43.81	41.71	46.42	44.86	43.70	12.94	15.58	17.55	12.94	14.70	15.94	13.20	1.54
PfastreXML^*	31.16	31.35	31.10	31.16	31.78	32.08	22.37	24.59	26.16	22.37	23.72	24.65	22.97	10.47
Renee	51.81	48.84	46.54	51.81	50.08	48.86	14.49	17.43	19.66	14.49	16.50	17.95	-	-
SLICE+FastText^*	35.39	33.33	31.74	35.39	34.12	33.21	11.32	13.37	14.94	11.32	12.65	13.61	12.22	0.64
XR-Transformer^‡	50.50	47.41	45.00	50.50	48.79	47.57	15.81	19.03	21.34	15.81	18.14	19.75	-	-
XT^*	27.99	25.24	23.57	27.99	25.98	24.78	4.45	5.06	5.57	4.45	4.78	5.03	16.00	15.80

LF-Wikipedia-500K / Wikipedia-500K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	64.64	43.20	32.77	64.64	54.54	52.42	26.88	30.24	32.79	26.88	30.71	33.33	48.32	15.50
APLC-XLNet^♦	72.83	50.50	38.55	72.83	62.06	59.27	30.03	35.25	38.27	30.03	35.01	37.86	1.40	-
Astec^†	73.02	52.02	40.53	73.02	64.10	62.32	30.69	36.48	40.38	30.69	36.33	39.84	28.06	20.35
AttentionXML^†	82.73	63.75	50.41	82.73	76.56	74.86	34.00	44.32	50.15	34.00	42.99	47.69	9.30	110.60
Bonsai^*	69.20	49.80	38.80	-	-	-	-	-	-	-	-	-	-	-
CascadeXML^♦	81.13	62.43	49.12	-	-	-	32.12	43.15	49.37	-	-	-	-	-
DEXA^‡	84.92	65.50	50.51	84.90	79.18	76.80	42.59	53.93	58.33	42.59	52.92	57.44	-	57.51
DiSMEC^*	70.20	50.60	39.70	70.20	42.10	40.50	31.20	33.40	37.00	31.20	33.70	37.10	-	-
ECLARE^‡	68.04	46.44	35.74	68.04	58.15	56.37	31.02	35.39	38.29	31.02	35.66	34.50	7.40	86.57
LightXML^‡	81.59	61.78	47.64	81.59	74.73	72.23	31.99	42.00	46.53	31.99	40.99	45.18	-	185.56
MACH^‡	52.78	32.39	23.75	52.78	42.05	39.70	17.65	18.06	18.66	17.64	19.18	45.18	4.50	31.20
MatchXML^♦	80.66	60.43	47.09	80.66	73.28	71.20	35.87	43.12	47.50	35.87	43.00	47.18	61	11.10
NGAME^‡	84.01	64.69	49.97	84.01	78.25	75.97	41.25	52.57	57.04	41.25	51.58	56.11	3.88	54.88
PINA^♦	82.83	63.14	50.11	-	-	-	-	-	-	-	-	-	-	-
Parabel^*	68.70	49.57	38.64	68.70	60.51	58.62	26.88	31.96	35.26	26.88	31.73	34.61	5.65	2.72
PfastreXML^*	59.50	40.20	30.70	59.50	30.10	28.70	29.20	27.60	27.70	29.20	28.70	28.30	-	63.59
ProXML^*	68.80	48.90	37.90	68.80	39.10	38.00	33.10	35.00	39.40	33.10	35.20	39.00	-	-
SiameseXML^†	67.26	44.82	33.73	67.26	56.64	54.29	33.95	35.46	37.07	33.95	36.58	38.93	5.73	7.31
Renee	84.95	66.25	51.68	84.95	79.79	77.83	39.89	51.77	56.70	39.89	50.73	55.57	-	-
X-Transformer^♦	76.95	58.42	46.14	-	-	-	-	-	-	-	-	-	-	-
XML-CNN^♦	59.85	39.28	29.81	59.85	48.67	46.12	-	-	-	-	-	-	-	117.23
XR-Transformer^‡	81.62	61.38	47.85	81.62	74.46	72.43	33.58	42.97	47.81	33.58	42.21	46.61	-	318.90
XT^*	64.48	45.84	35.46	-	-	-	-	-	-	-	-	-	5.50	20.88

Amazon-670K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	42.39	36.89	32.98	42.39	39.07	37.04	21.56	24.78	27.66	21.56	23.38	24.76	50.00	1.56
APLC-XLNet^♦	43.46	38.83	35.32	43.46	41.01	39.38	26.12	29.66	32.78	26.12	28.20	29.68	1.1	-
Astec^†	47.77	42.79	39.10	47.77	45.28	43.74	32.13	35.14	37.82	32.13	33.80	35.01	18.79	7.32
AttentionXML^♦	47.58	42.61	38.92	47.58	45.07	43.50	30.29	33.85	37.13	-	-	-	16.56	78.30
Bonsai^*	45.58	40.39	36.60	45.58	42.79	41.05	27.08	30.79	34.11	-	-	-	-	-
CascadeXML^♦	52.15	46.54	42.44	-	-	-	30.77	35.78	40.52	-	-	-	-	-
DiSMEC^*	44.70	39.70	36.10	44.70	42.10	40.50	27.80	30.60	34.20	27.80	28.80	30.70	3.75	56.02
FastXML^*	36.99	33.28	30.53	36.99	35.11	33.86	19.37	23.26	26.85	19.37	22.25	24.69	-	-
LEML^*	8.13	6.83	6.03	8.13	7.30	6.85	2.07	2.26	2.47	2.07	2.21	2.35	-	-
LPSR-NB^*	28.65	24.88	22.37	28.65	26.40	25.03	16.68	18.07	19.43	16.68	17.70	18.63	-	-
LightXML^♦	49.10	43.83	39.85	-	-	-	-	-	-	-	-	-	4.59	86.25
MatchXML^♦	51.64	46.17	42.05	51.64	48.81	47.04	30.30	35.28	39.78	30.30	33.46	35.87	18	3.30
PPD-Sparse^*	45.32	40.37	36.92	-	-	-	26.64	30.65	34.65	-	-	-	-	-
Parabel^*	44.89	39.80	36.00	44.89	42.14	40.36	25.43	29.43	32.85	25.43	28.38	30.71	2.41	0.41
PfastreXML^*	39.46	35.81	33.05	39.46	37.78	36.69	29.30	30.80	32.43	29.30	30.40	31.49	-	-
ProXML^*	43.50	38.70	35.30	43.50	41.10	39.70	30.80	32.80	35.10	30.80	31.70	32.70	-	-
Renee	54.23	48.22	43.83	54.23	51.23	49.41	34.16	39.14	43.39	34.16	37.48	39.83	-	-
SLEEC^*	35.05	31.25	28.56	34.77	32.74	31.53	20.62	23.32	25.98	20.62	22.63	24.43	-	-
SLICE+FastText^*	33.15	29.76	26.93	33.15	31.51	30.27	20.20	22.69	24.70	20.20	21.71	22.72	2.01	0.21
XML-CNN^♦	35.39	31.93	29.32	35.39	33.74	32.64	28.67	33.27	36.51	-	-	-	-	52.23
XR-Transformer^‡	50.13	44.60	40.69	50.13	47.28	45.60	29.90	34.35	38.63	29.90	32.75	35.03	-	-
XT^*	42.50	37.87	34.41	42.50	40.01	38.43	24.82	28.20	31.24	24.82	26.82	28.29	4.20	8.22

Amazon-3M
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	49.30	45.55	43.11	49.30	46.79	45.27	11.69	14.07	15.98	-	-	-	-	-
AttentionXML^†	50.86	48.04	45.83	50.86	49.16	47.94	15.52	18.45	20.60	-	-	-	-	-
Bonsai^*	48.45	45.65	43.49	48.45	46.78	45.59	13.79	16.71	18.87	-	-	-	-	-
CascadeXML^♦	53.91	51.24	49.52	-	-	-	-	-	-	-	-	-	-	-
DiSMEC^*	47.34	44.96	42.80	47.36	-	-	-	-	-	-	-	-	-	-
FastXML^*	44.24	40.83	38.59	44.24	41.92	40.47	9.77	11.69	13.25	9.77	11.20	12.29	-	-
MatchXML^♦	55.88	52.39	49.80	55.88	53.90	52.58	17.00	20.55	23.16	17.00	19.56	21.38	113	8.30
Parabel^*	47.48	44.65	42.53	47.48	45.73	44.53	12.82	15.61	17.73	12.82	14.89	16.38	-	-
PfastreXML^*	43.83	41.81	40.09	43.83	42.68	41.75	21.38	23.22	24.52	21.38	22.75	23.68	-	-
Renee	54.84	52.08	49.77	54.84	53.31	52.13	15.74	19.06	21.54	15.74	18.02	19.64	-	-
XR-Transformer^‡	53.67	50.29	47.74	53.67	51.74	50.42	16.54	19.94	22.39	16.54	18.99	20.71	-	-

AmazonCat-13K
Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	93.54	78.37	63.30	93.54	87.29	85.10	49.04	61.13	69.64	49.04	58.83	65.47	18.61	3.45
APLC-XLNet^♦	94.56	79.82	64.61	94.56	88.74	86.66	52.22	65.08	71.40	52.22	62.57	67.92	0.50	-
AttentionXML^♦	95.92	82.41	67.31	95.92	91.17	89.48	53.76	68.72	76.38	-	-	-	-	-
Bonsai^*	92.98	79.13	64.46	92.98	87.68	85.92	51.30	64.60	72.48	-	-	-	0.55	1.26
CascadeXML^♦	96.71	84.07	68.69	-	-	-	51.39	66.81	77.58	-	-	-	-	-
DiSMEC^*	93.40	79.10	64.10	93.40	87.70	85.80	59.10	67.10	71.20	59.10	65.20	68.80	-	-
FastXML^*	93.11	78.20	63.41	93.11	87.07	85.16	48.31	60.26	69.30	48.31	56.90	62.75	-	-
LightXML^♦	96.77	84.02	68.70	-	-	-	-	-	-	-	-	-	-	-
MatchXML^♦	96.83	83.83	68.20	96.83	92.59	90.62	48.02	64.26	75.65	48.02	60.85	69.30	2.2	6.60
PD-Sparse^*	90.60	75.14	60.69	90.60	84.00	82.05	49.58	61.63	68.23	49.58	58.28	62.68	-	-
Parabel^*	93.03	79.16	64.52	93.03	87.72	86.00	50.93	64.00	72.08	50.93	60.37	65.68	0.62	0.63
PfastreXML^*	91.75	77.97	63.68	91.75	86.48	84.96	69.52	73.22	75.48	69.52	72.21	73.67	19.02	5.69
SLEEC^*	90.53	76.33	61.52	90.53	84.96	82.77	46.75	58.46	65.96	46.75	55.19	60.08	-	-
XML-CNN^♦	93.26	77.06	61.40	93.26	86.20	83.43	52.42	62.83	67.10	-	-	-	-	-
XT^*	92.59	78.24	63.58	92.59	86.90	85.03	49.61	62.22	70.24	49.61	59.71	66.04	0.46	7.14
X-Transformer^♦	96.70	83.85	68.58	-	-	-	-	-	-	-	-	-	-	-
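
The P@k and N@k columns in the tables above report precision and normalized discounted cumulative gain over each method's top-k ranked labels, averaged over the test points and expressed as percentages. The following is a minimal Python sketch of these two metrics, assuming binary relevance and, for each test point, predictions given as a dictionary from label id to score. The helper names (top_k, precision_at_k, ndcg_at_k, mean_percent) are hypothetical and not part of any released evaluation tool; the sketch illustrates what the columns measure and is not intended to reproduce the published numbers.

```python
import math

def top_k(scores, k):
    """Label ids sorted by descending score, truncated to k (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def precision_at_k(scores, true_labels, k):
    """P@k: number of relevant labels among the top k predictions, divided by k."""
    return sum(1 for label in top_k(scores, k) if label in true_labels) / k

def ndcg_at_k(scores, true_labels, k):
    """nDCG@k with binary relevance: DCG@k divided by the best achievable DCG@k."""
    # Position l (1-based) is discounted by 1 / log(l + 1); `rank` below is 0-based.
    # The log base cancels in the ratio; base 2 is used here.
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, label in enumerate(top_k(scores, k))
              if label in true_labels)
    # The ideal ranking places all min(k, |y|) relevant labels first.
    ideal = sum(1.0 / math.log2(rank + 2)
                for rank in range(min(k, len(true_labels))))
    return dcg / ideal if ideal > 0 else 0.0

def mean_percent(metric, predictions, ground_truth, k):
    """Average a per-point metric over a test set and scale it to a percentage."""
    values = [metric(s, y, k) for s, y in zip(predictions, ground_truth)]
    return 100.0 * sum(values) / len(values)

# Example: four candidate labels, of which "a" and "c" are relevant.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
relevant = {"a", "c"}
print(precision_at_k(scores, relevant, 3))  # two of the top three labels are relevant
print(ndcg_at_k(scores, relevant, 3))
```

Because nDCG@k divides by the best DCG achievable for that point, it rewards placing relevant labels early in the ranking, whereas P@k only counts how many land in the top k.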

References

[01] K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NeurIPS 2015.
[02] R. Agrawal, A. Gupta, Y. Prabhu and M. Varma, Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages, in WWW 2013.
[03] Y. Prabhu and M. Varma, FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning, in KDD 2014.
[04] J. Weston, A. Makadia, and H. Yee, Label Partitioning For Sublinear Ranking, in ICML 2013.
[05] H. Yu, P. Jain, P. Kar, and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML 2014.
[06] D. Hsu, S. Kakade, J. Langford, and T. Zhang, Multi-Label Prediction via Compressed Sensing, in NeurIPS 2009.
[07] F. Tai and H. Lin, Multi-label Classification with Principal Label Space Transformation, in Neural Computation 2012.
[08] W. Bi and J. Kwok, Efficient Multi-label Classification with Many Labels, in ICML 2013.
[09] Y. Chen and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification, in NeurIPS 2012.
[10] C. Ferng and H. Lin, Multi-label Classification with Error-correcting Codes, in ACML 2011.
[11] J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation, in IJCAI 2011.
[12] S. Ji, L. Tang, S. Yu, and J. Ye, Extracting Shared Subspaces for Multi-label Classification, in KDD 2008.
[13] Z. Lin, G. Ding, M. Hu, and J. Wang, Multi-label Classification via Feature-aware Implicit Label Space Encoding, in ICML 2014.
[14] P. Mineiro and N. Karampatziakis, Fast Label Embeddings via Randomized Linear Algebra, Preprint, 2015.
[15] N. Karampatziakis and P. Mineiro, Scalable Multilabel Prediction via Randomized Methods, Preprint, 2015.
[16] K. Balasubramanian and G. Lebanon, The Landmark Selection Method for Multiple Output Prediction, Preprint, 2012.
[17] M. Cisse, N. Usunier, T. Artieres, and P. Gallinari, Robust Bloom Filters for Large Multilabel Classification Tasks, in NeurIPS 2013.
[18] B. Hariharan, S. Vishwanathan, and M. Varma, Efficient max-margin multi-label classification with applications to zero-shot learning, in Machine Learning Journal, 2012.
[19] C. Snoek, M. Worring, J. van Gemert, J.-M. Geusebroek, and A. Smeulders, The challenge problem for automated detection of 101 semantic concepts in multimedia, in ACM Multimedia, 2006.
[20] I. Katakis, G. Tsoumakas, and I. Vlahavas, Multilabel text classification for automated tag suggestion, in ECML/PKDD Discovery Challenge, 2008.
[21] G. Tsoumakas, I. Katakis, and I. Vlahavas, Effective and efficient multilabel classification in domains with large number of labels, in ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.
[22] J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection, 2014.
[23] A. Zubiaga, Enhancing navigation on wikipedia with social tags, Preprint, 2009.
[24] R. Wetzker, C. Zimmermann, and C. Bauckhage, Analyzing social bookmarking systems: A del.icio.us cookbook, in Mining Social Data (MSoDa) Workshop Proceedings, ECAI, 2008.
[25] I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artieres, G. Paliouras, E. Gaussier, I. Androutsopoulos, M.-R. Amini and P. Gallinari, LSHTC: A Benchmark for Large-Scale Text Classification, Preprint, 2015.
[26] D. D. Lewis, Y. Yang, T. Rose, and F. Li, RCV1: A New Benchmark Collection for Text Categorization Research, in JMLR 2004.
[27] E. L. Mencia and J. Furnkranz, Efficient pairwise multilabel classification for large-scale problems in the legal domain, in ECML/PKDD 2008.
[28] J. McAuley and J. Leskovec, Hidden factors and hidden topics: understanding rating dimensions with review text, in Proceedings of the 7th ACM Conference on Recommender Systems, 2013.
[29] J. McAuley, C. Targett, Q. Shi, and A. v. d. Hengel, Image-based Recommendations on Styles and Substitutes, in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.
[30] J. McAuley, R. Pandey, and J. Leskovec, Inferring networks of substitutable and complementary products, in KDD 2015.
[31] H. Jain, Y. Prabhu, and M. Varma, Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications, in KDD 2016.
[32] R. Babbar and B. Schölkopf, DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification, in WSDM 2017.
[33] I. E. H. Yen, X. Huang, K. Zhong, P. Ravikumar and I. S. Dhillon, PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification, in ICML 2016.
[34] I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar, I. S. Dhillon and E.-P. Xing, PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification, in KDD 2017.
[35] K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx and E. Hullermeier, Extreme F-Measure Maximization using Sparse Probability Estimates, in ICML 2017.
[36] J. Liu, W.-C. Chang, Y. Wu and Y. Yang, Deep Learning for Extreme Multi-label Text Classification, in SIGIR 2017.
[37] Y. Jernite, A. Choromanska and D. Sontag, Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation, in ICML 2017.
[38] Y. Prabhu, A. Kag, S. Harsola, R. Agrawal and M. Varma, Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising, in WWW 2018.
[39] I. Evron, E. Moroshko and K. Crammer, Efficient Loss-Based Decoding on Graphs for Extreme Classification, in NeurIPS 2018.
[40] A. Niculescu-Mizil and E. Abbasnejad, Label Filters for Large Scale Multilabel Classification, in AISTATS 2017.
[41] H. Jain, V. Balasubramanian, B. Chunduri and M. Varma, Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches, in WSDM 2019.
[42] A. Jalan and P. Kar, Accelerating Extreme Classification via Adaptive Feature Agglomeration, in IJCAI 2019.
[43] R. Babbar, and B. Schölkopf, Data Scarcity, Robustness and Extreme Multi-label Classification in Machine Learning Journal and European Conference on Machine Learning, 2019.
[44] S. Khandagale, H. Xiao and R. Babbar, Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification, in arXiv 2019.
[45] W. Siblini, F. Meyer and P. Kuntz, CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning, in ICML 2018.
[46] V. Gupta, R. Wadbude, N. Natarajan, H. Karnick, P. Jain and P. Rai, Distributional Semantics meets Multi-Label Learning, in AAAI 2019.
[47] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation, in Natural Legal Language Processing Workshop 2019.
[47b] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, EURLEX57K Dataset.
[48] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, and I. Androutsopoulos, Large-Scale Multi-Label Text Classification on EU Legislation, in ACL 2019.
[49] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek and I. Vlahavas, Mulan: A Java Library for Multi-Label Learning, in JMLR 2011.
[50] A. Jalan and P. Kar, Accelerating Extreme Classification via Adaptive Feature Agglomeration, in IJCAI 2019.
[51] R. You, S. Dai, Z. Zhang, H. Mamitsuka, and S. Zhu, AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Network, in NeurIPS 2019.
[52] T. K. R. Medini, Q. Huang, Y. Wang, V. Mohan, and A. Shrivastava, Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products, in NeurIPS 2019.
[53] W.-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, and I. Dhillon, Taming Pretrained Transformers for Extreme Multi-label Text Classification, in KDD 2020.
[54] T. Jiang, D. Wang, L. Sun, H. Yang, Z. Zhao and F. Zhuang, LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification, in AAAI 2021.
[55] K. Dahiya, D. Saini, A. Mittal, A. Shaw, K. Dave, A. Soni, H. Jain, S. Agarwal and M. Varma, DeepXML: A deep extreme multi-Label learning framework applied to short text documents, in WSDM 2021.
[56] A. Mittal, K. Dahiya, S. Agrawal, D. Saini, S. Agarwal, P. Kar and M. Varma, DECAF: Deep extreme classification with label features, in WSDM 2021.
[57] A. Mittal, N. Sachdeva, S. Agrawal, S. Agarwal, P. Kar and M. Varma, ECLARE: Extreme classification with label graph correlations, in TheWebConf 2021.
[58] D. Saini, A. K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang and M. Varma, GalaXC: Graph neural networks with labelwise attention for extreme classification, in TheWebConf 2021.
[59] H. Ye, Z. Chen, D.-H. Wang and B. D. Davison, Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification, in ICML 2020.
[60] Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal and M. Varma, Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation, in WSDM 2018.
[61] M. Qaraei, E. Schultheis, P. Gupta and R. Babbar, Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels, in TheWebConf 2021.
[62] N. Gupta, S. Bohra, Y. Prabhu, S. Purohit and M. Varma, Generalized zero-Shot extreme multi-label learning, in KDD 2021.
[63] K. Dahiya, A. Agarwal, D. Saini, K. Gururaj, J. Jiao, A. Singh, S. Agarwal, P. Kar and M. Varma, SiameseXML: Siamese networks meet extreme classifiers with 100M labels, in ICML 2021.
[64] J. Ni, J. Li and J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in Proceedings of Empirical Methods in Natural Language Processing (EMNLP) 2019.
[65] A. Mittal, K. Dahiya, S. Malani, J. Ramaswamy, S. Kuruvilla, J. Ajmera, K-h. Chang, S. Agarwal, P. Kar and M. Varma, Multimodal extreme classification, in CVPR 2022.
[66] K. Dahiya, N. Gupta, D. Saini, A. Soni, Y. Wang, K. Dave, J. Jiao, G. K, P. Dey, A. Singh, D. Hada, V. Jain, B. Paliwal, A. Mittal, S. Mehta, R. Ramjee, S. Agarwal, P. Kar and M. Varma, NGAME: Negative Mining-aware Mini-batching for Extreme Classification, in arXiv 2022.
[67] E. Schultheis and R. Babbar, Speeding-up One-vs-All Training for Extreme Classification via Smart Initialization, in ECML-MLJ 2022.
[68] E. Chien, J. Zhang, C.-J. Hsieh, J.-Y. Jiang, W.-C. Chang, O. Milenkovic and H.-F. Yu, PINA: Leveraging Side Information in eXtreme Multi-label Classification via Predicted Instance Neighborhood Aggregation, in ICML 2023.
[69] V. Jain, J. Prakash, D. Saini, J. Jiao, R. Ramjee and M. Varma, Renee: End-to-end training of extreme classification models, in MLSys 2023.
[70] K. Dahiya, S. Yadav, S. Sondhi, D. Saini, S. Mehta, J. Jiao, S. Agarwal, P. Kar and M. Varma, Deep encoders with auxiliary parameters for extreme classification, in KDD 2023.
[71] S. Kharbanda, A. Banerjee, E. Schultheis and R. Babbar, CascadeXML: Rethinking Transformers for End-to-end Multi-resolution Training in Extreme Multi-Label Classification, in NeurIPS 2022.
[72] S. Kharbanda, A. Banerjee, D. Gupta, A. Palrecha, and R. Babbar, InceptionXML: A Lightweight Framework with Synchronized Negative Sampling for Short Text Extreme Classification, in SIGIR 2023.
[73] H. Ye, R. Sunderraman and S. Ji, MatchXML: An Efficient Text-Label Matching Framework for Extreme Multi-Label Text Classification, in TKDE 2024.

Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	87.82	73.45	59.17	87.82	81.50	79.22	70.14	72.76	74.02	70.14	72.31	73.13	-	-
CPLST^*	83.82	67.32	52.80	83.82	75.29	71.92	66.23	65.28	63.70	66.23	65.89	64.77	-	-
CS^*	78.95	60.93	44.27	78.95	68.97	62.88	62.53	58.97	53.23	62.53	60.33	56.50	-	-
DiSMEC^*	81.86	62.52	45.11	81.86	70.21	63.71	62.23	59.85	54.03	62.25	61.05	57.26	-	-
FastXML^*	83.57	65.78	49.97	83.57	74.06	69.34	66.06	63.83	61.11	66.06	64.83	62.94	-	-
LEML^*	81.29	64.74	49.83	81.29	72.92	69.37	64.24	62.73	59.92	64.24	63.47	61.57	-	-
LPSR^*	83.57	65.50	48.57	83.57	73.84	68.18	66.06	63.53	59.38	66.06	64.63	61.84	-	-
ML-CSSP	83.98	67.37	53.02	83.98	75.31	72.21	66.88	65.90	64.90	66.88	66.47	65.71	-	-
PD-Sparse^*	-	-	-	-	-	-	-	-	-	-	-	-	-	-
PPD-Sparse^*	86.50	68.40	53.20	86.50	77.30	75.60	64.30	61.30	60.80	64.30	63.60	62.80	-	-
Parabel^*	-	-	-	-	-	-	-	-	-	-	-	-	-	-
PfastreXML^*	84.22	67.33	53.04	84.22	75.41	72.37	66.67	65.43	64.30	66.08	66.08	65.24	-	-
SLEEC^*	84.01	67.20	52.80	84.01	75.23	71.96	66.34	65.11	63.62	66.34	65.79	64.71	-	-
WSABIE	83.35	66.18	51.46	83.35	74.21	70.55	65.79	64.07	61.89	65.79	64.88	63.36	-	-
kNN^*	83.91	67.12	52.99	83.91	75.22	72.21	66.51	65.21	64.30	66.51	65.91	65.20	-	-

Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


1-vs-All	62.62	39.09	28.79	62.62	59.13	61.58	48.84	52.96	59.29	48.84	51.62	55.09	-	-
CPLST^*	62.38	37.84	27.62	62.38	57.63	59.71	48.17	50.86	56.42	48.17	49.94	52.96	-	-
CS^*	58.87	33.53	23.72	58.87	52.19	53.25	46.04	45.08	48.17	46.04	45.25	46.89	-	-
DiSMEC^*	-	-	-	-	-	-	-	-	-	-	-	-	-	-
FastXML^*	63.42	39.23	28.86	63.42	59.51	61.70	48.54	52.30	58.28	48.54	51.11	54.38	-	-
LEML^*	62.54	38.41	28.21	62.54	58.22	60.53	47.97	51.42	57.53	47.97	50.25	53.59	-	-
LPSR^*	62.11	36.65	26.53	62.11	56.50	58.23	49.20	50.14	55.01	49.20	49.78	52.41	-	-
ML-CSSP	44.98	30.43	23.53	44.98	44.67	47.97	32.38	38.68	45.96	32.38	36.73	40.74	-	-
PD-Sparse^*	61.29	35.82	25.74	61.29	55.83	57.35	48.34	48.77	52.93	48.34	48.49	50.72	-	-
PPD-Sparse^*	-	-	-	-	-	-	-	-	-	-	-	-	-	-
Parabel^*	64.53	38.56	27.94	64.53	59.35	61.06	50.88	52.42	57.36	50.88	51.90	54.58	-	-
PfastreXML^*	63.46	39.22	29.14	63.46	59.61	62.12	52.28	54.36	60.55	52.28	53.62	56.99	-	-
ProXML^*	64.60	39.00	28.20	64.40	59.20	61.50	50.10	52.00	58.30	50.10	52.00	55.10	-	-
SLEEC^*	65.08	39.64	28.87	65.08	60.47	62.64	51.12	53.95	59.56	51.12	52.99	56.04	-	-
WSABIE	54.78	32.39	23.98	54.78	50.11	52.39	43.39	44.00	49.30	43.39	43.64	46.50	-	-
kNN^*	57.04	34.38	25.44	57.04	52.29	54.64	43.71	45.82	51.64	43.71	45.04	48.20	-	-

Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	-	-	-	-	-	-	-	-	-	-	-	-	-	-
CPLST^*	65.31	59.95	55.31	65.31	61.16	57.80	31.10	32.40	33.02	31.10	32.07	32.55	-	-
CS^*	61.36	56.46	52.07	61.36	57.66	54.44	30.60	31.84	32.26	30.60	31.54	31.89	-	-
DiSMEC^*	-	-	-	-	-	-	-	-	-	-	-	-	-	-
FastXML^*	69.61	64.12	59.27	69.61	65.47	61.90	32.35	34.51	35.43	32.35	34.00	34.73	-	-
LEML^*	65.67	60.55	56.08	65.67	61.77	58.47	30.73	32.43	33.26	30.73	32.01	32.66	-	-
LPSR^*	65.01	58.96	53.49	65.01	60.45	56.38	31.34	32.57	32.77	31.34	32.29	32.50	-	-
ML-CSSP	63.04	56.26	50.16	63.04	57.91	53.36	29.48	30.27	30.02	29.48	30.10	29.98	-	-
PD-Sparse^*	51.82	44.18	38.95	51.82	46.00	42.02	25.22	24.63	23.85	25.22	24.80	24.25	-	-
Parabel^*	67.44	61.83	56.75	67.44	63.15	59.41	32.69	34.00	34.53	32.69	33.69	34.10	-	-
PfastreXML^*	67.13	62.33	58.62	67.13	63.48	60.74	34.57	34.80	35.86	34.57	34.71	35.42	-	-
SLEEC^*	67.59	61.38	56.56	67.59	62.87	59.28	32.11	33.21	33.83	32.11	32.93	33.41	-	-
WSABIE	64.13	58.13	53.64	64.13	59.59	56.25	31.25	32.02	32.47	31.25	31.84	32.18	-	-
kNN^*	64.95	58.89	54.11	64.95	60.32	56.77	31.03	32.02	32.43	31.03	31.76	32.09	-	-

Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	79.26	64.30	52.33	79.26	68.13	61.60	34.25	39.83	42.76	34.25	38.35	40.30	0.09	0.06
APLC-XLNet^♦	87.72	74.56	62.28	87.72	77.90	71.75	42.93	49.84	53.07	42.93	48.00	50.40	0.48	-
Bonsai^*	82.96	69.76	58.31	82.96	73.15	67.41	37.08	45.13	49.57	37.08	42.94	46.10	0.02	0.03
CPLST^*	58.52	45.51	32.47	58.52	48.67	40.79	24.97	27.46	25.04	24.97	26.82	25.57	-	-
CS^*	62.09	48.39	40.11	62.09	51.63	47.11	24.94	27.19	28.90	25.94	26.56	27.67	-	-
DiSMEC^*	82.40	68.50	57.70	82.40	72.50	66.70	41.20	45.40	49.30	41.20	44.30	46.90	-	-
FastXML^*	76.37	63.36	52.03	76.37	66.63	60.61	33.17	39.68	41.99	33.17	37.92	39.55	0.26	0.07
LEML^*	68.55	55.11	45.12	68.55	58.44	53.03	31.16	34.85	36.82	31.16	33.85	35.17	-	-
LPSR^*	79.89	66.01	53.80	79.89	69.62	63.04	37.97	44.01	46.17	37.97	42.44	43.97	-	-
MatchXML^♦	88.85	76.02	63.30	88.85	79.50	73.26	46.73	54.23	58.19	46.73	52.33	55.29	0.6	0.20
ML-CSSP^*	75.45	62.70	52.51	75.45	65.97	60.78	43.86	45.72	46.97	43.86	45.23	46.03	-	-
PD-Sparse^*	83.83	70.72	59.21	-	-	-	37.61	46.05	50.79	-	-	-	-	-
PPD-Sparse^*	83.40	70.90	59.10	83.40	74.40	68.20	45.20	48.50	51.00	45.20	47.50	49.10	-	-
Parabel^*	82.25	68.71	57.53	82.25	72.17	66.54	36.44	44.08	48.46	36.44	41.99	44.91	0.03	0.02
PfastreXML^*	71.36	59.90	50.39	71.36	62.87	58.06	26.62	34.16	38.96	26.62	32.07	35.23	-	-
SLEEC^*	63.40	50.35	41.28	63.40	53.56	48.47	24.10	27.20	29.09	24.10	26.37	27.62	-	-
WSABIE^*	72.28	58.16	47.73	72.28	61.64	55.92	28.60	32.49	34.46	28.60	31.45	32.77	-	-
XT^*	78.97	65.64	54.44	78.97	69.05	63.23	33.52	40.35	44.02	33.52	38.50	41.09	0.03	0.10
kNN^*	81.73	68.78	57.44	81.73	72.15	66.40	36.36	44.04	48.29	36.36	41.95	44.78	-	-

Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	86.49	74.27	64.20	86.49	77.13	69.44	11.90	12.76	13.58	11.90	12.53	13.10	0.62	0.39
APLC-XLNet^♦	89.44	78.93	69.73	89.44	81.38	74.41	14.84	15.85	17.04	14.84	15.58	16.40	0.54	-
AttentionXML^♦	87.47	78.48	69.37	87.47	80.61	73.79	15.57	16.80	17.82	-	-	-	-	-
Bonsai^*	84.69	73.69	64.39	84.69	76.25	69.17	11.78	13.27	14.28	11.78	12.89	13.61	0.13	0.64
CascadeXML^♦	89.18	79.71	71.19	-	-	-	13.32	15.35	17.45	-	-	-	-	-
DiSMEC^*	85.20	74.60	65.90	84.10	77.10	70.40	13.60	13.10	13.80	13.60	13.20	13.60	-	-
FastXML^*	83.03	67.47	57.76	84.31	75.35	63.36	9.80	10.17	10.54	9.80	10.08	10.33	-	-
LEML^*	73.47	62.43	54.35	73.47	64.92	58.69	9.41	10.07	10.55	9.41	9.90	10.24	-	-
LPSR-NB^*	72.72	58.51	49.50	72.72	61.71	54.63	12.79	12.26	12.13	12.79	12.38	12.27	-	-
LightXML^♦	89.45	78.96	69.85	-	-	-	-	-	-	-	-	-	-	-
MatchXML^♦	89.74	81.51	72.18	89.74	83.46	76.53	16.92	19.29	20.93	16.92	18.70	19.91	2.9	0.22
Parabel^*	84.17	72.46	63.37	84.17	75.22	68.22	11.68	12.73	13.69	11.68	12.47	13.14	0.18	0.20
PfastreXML^*	83.57	68.61	59.10	83.57	72.00	64.54	19.02	18.34	18.43	19.02	18.49	18.52	-	-
SLEEC^*	85.88	72.98	62.70	85.88	76.02	68.13	11.14	11.86	12.40	11.14	11.68	12.06	1.13	0.21
XML-CNN^♦	81.42	66.23	56.11	81.42	69.78	61.83	9.39	10.00	10.20	-	-	-	-	-
XT^*	86.15	75.18	65.41	86.15	77.76	70.35	11.87	13.08	13.89	11.87	12.78	13.36	0.37	0.39
X-Transformer^♦	88.51	78.71	69.62	-	-	-	-	-	-	-	-	-	-	-

Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	46.79	40.72	37.67	46.79	42.17	39.84	7.18	8.05	8.74	7.18	7.78	8.22	10.74	2.58
Bonsai^*	46.69	39.88	36.38	46.69	41.51	38.84	7.26	7.97	8.53	7.26	7.75	8.10	3.91	64.42
DiSMEC^*	45.50	38.70	35.50	45.50	40.90	37.80	6.50	7.60	8.40	6.50	7.50	7.90	-	-
FastXML^*	43.07	38.66	36.19	43.07	39.70	37.83	6.48	7.52	8.31	6.51	7.26	7.79	-	-
LEML^*	40.73	37.71	35.84	40.73	38.44	37.01	6.06	7.24	8.10	6.06	6.93	7.52	-	-
LPSR-NB	18.59	15.43	14.07	18.59	16.17	15.13	3.24	3.42	3.64	3.24	3.37	3.52	-	-
PD-Sparse^*	34.37	29.48	27.04	34.37	30.60	28.65	5.29	5.80	6.24	5.29	5.66	5.96	-	-
PPD-Sparse^*	-	-	-	-	-	-	-	-	-	-	-	-	-	-
Parabel^*	46.86	40.08	36.70	46.86	41.69	39.10	7.22	7.94	8.54	7.22	7.71	8.09	6.36	9.58
Parabel^*	46.97	40.08	36.63	46.97	41.72	39.07	7.25	7.94	8.52	7.25	7.75	8.15	-	-
PfastreXML^*	41.72	37.83	35.58	41.72	38.76	37.08	3.15	3.87	4.43	3.15	3.68	4.06	15.34	3.60
SLEEC^*	47.85	42.21	39.43	47.85	43.52	41.37	7.17	8.16	8.96	7.17	7.89	8.44	-	-
XT^*	45.59	39.10	35.92	45.59	40.62	38.17	6.96	7.71	8.33	6.96	7.47	7.86	2.70	31.22

Method	P@1	P@3	P@5	N@1	N@3	N@5	PSP@1	PSP@3	PSP@5	PSN@1	PSN@3	PSN@5	Model size (GB)	Train time (hr)


AnnexML^*	63.30	40.64	29.80	63.30	56.61	56.24	25.13	30.46	34.30	25.13	31.16	34.36	29.70	4.24
Bonsai^*	66.41	44.40	32.92	66.41	60.69	60.53	28.11	35.36	39.73	28.11	35.42	38.94	2.43	3.04
DiSMEC^*	64.40	42.50	31.50	64.40	58.50	58.40	29.10	35.60	39.50	29.10	35.90	39.40	-	-
FastXML^*	49.75	33.10	24.45	49.75	45.23	44.75	16.35	20.99	23.56	16.35	19.56	21.02	-	-
LEML^*	19.82	11.43	8.39	19.82	14.52	13.73	3.48	3.79	4.27	3.48	3.68	3.94	-	-
LPSR-NB	27.44	16.23	11.77	27.44	23.04	22.55	6.93	7.21	7.86	6.93	7.11	7.46	-	-
PD-Sparse^*	61.26	39.48	28.79	61.26	55.08	54.67	28.34	33.50	36.62	28.34	31.92	33.68	-	-
PPD-Sparse^*	64.08	41.26	30.12	-	-	-	27.47	33.00	36.29	-	-	-	-	-
Parabel^*	65.04	43.23	32.05	65.04	59.15	58.93	26.76	33.27	37.36	26.76	31.26	33.57	3.10	0.75
PfastreXML^*	56.05	36.79	27.09	56.05	50.59	50.13	30.66	31.55	33.12	30.66	31.24	32.09	14.23	6.34
ProXML^*	63.60	41.50	30.80	63.80	57.40	57.10	34.80	37.70	41.00	34.80	38.70	41.50	-	-
SLEEC^*	54.83	33.42	23.85	54.83	47.25	46.16	20.27	23.18	25.08	20.27	22.27	23.35	-	-
XT^*	56.54	37.17	27.73	56.54	50.48	50.36	20.56	25.42	28.90	20.56	25.30	27.90	4.50	1.89
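
The PSP@k and PSN@k columns are propensity-scored counterparts of P@k and N@k, introduced in [31] to give more credit for correct predictions on rare (tail) labels. The Python sketch below follows the un-normalized definitions from that paper, using the empirical propensity model p_l = 1/(1 + C e^{-A log(N_l + B)}), where N_l is the number of training points tagged with label l, N is the training-set size, and C = (log N - 1)(B + 1)^A. The defaults A = 0.55 and B = 1.5 are assumed here as the generic setting; dataset-specific values differ, and the evaluation code behind the published tables may apply further normalization. All function names and the example label counts are hypothetical, so treat this as an illustration of the metric rather than a way to reproduce the reported figures.

```python
import math

def propensities(label_counts, num_points, A=0.55, B=1.5):
    """Empirical label propensities p_l = 1 / (1 + C * exp(-A * log(N_l + B))).

    label_counts maps each label id to N_l, its number of training points;
    num_points is N, the training-set size. A and B are assumed defaults.
    """
    C = (math.log(num_points) - 1.0) * (B + 1.0) ** A
    # exp(-A * log(N_l + B)) is computed equivalently as (N_l + B) ** -A.
    return {label: 1.0 / (1.0 + C * (n + B) ** -A)
            for label, n in label_counts.items()}

def top_k(scores, k):
    """Label ids sorted by descending score, truncated to k (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def psp_at_k(scores, true_labels, prop, k):
    """Propensity-scored precision: each relevant label in the top k counts 1 / p_l."""
    return sum(1.0 / prop[l] for l in top_k(scores, k) if l in true_labels) / k

def psndcg_at_k(scores, true_labels, prop, k):
    """Propensity-scored nDCG: PSDCG@k divided by sum_{l=1..k} 1 / log(l + 1)."""
    psdcg = sum(1.0 / (prop[label] * math.log2(rank + 2))
                for rank, label in enumerate(top_k(scores, k))
                if label in true_labels)
    norm = sum(1.0 / math.log2(rank + 2) for rank in range(k))
    return psdcg / norm

# Example with hypothetical training frequencies: a head label and a tail label.
counts = {"head": 1000, "tail": 2}
prop = propensities(counts, num_points=10000)
scores = {"head": 0.9, "tail": 0.8}
print(psp_at_k(scores, {"tail"}, prop, 2))
```

Setting every propensity to 1 makes PSP@k coincide with P@k, which is a quick sanity check; with estimated propensities, a correct prediction on a tail label contributes more than a correct prediction on a head label.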