The Extreme Classification Repository: Multi-label Datasets & Code


Kush Bhatia, Kunal Dahiya, Himanshu Jain, Purushottam Kar, Anshul Mittal, Yashoteja Prabhu, Manik Varma

The objective in extreme multi-label classification is to learn feature architectures and classifiers that can automatically tag a data point with the most relevant subset of labels from an extremely large label set. This repository provides resources that can be used for evaluating the performance of extreme multi-label algorithms including datasets, code, and metrics.

Citing the Repository

If you use any of the datasets or results provided on this repository, then please cite

        @Misc{Bhatia16,
          author    = {Bhatia, K. and Dahiya, K. and Jain, H. and Kar, P. and Mittal, A. and Prabhu, Y. and Varma, M.},
          title     = {The extreme classification repository: Multi-label datasets and code},
          url       = {http://manikvarma.org/downloads/XC/XMLRepository.html},
          year      = {2016}
        }
        

Table of Contents

  1. Datasets
  2. Useful Tools
  3. Performance Metrics and Evaluation Protocols
  4. Code for XC Methods
  5. Benchmarked Results
  6. Appendix
  7. References

Datasets

The datasets below cover a variety of XC problems including webpage and product categorization, as well as webpage-to-webpage and product-to-product recommendation tasks. Also included are datasets in which the labels themselves have textual features. Information on the dataset file format can be found in the README file available [here]. Certain datasets formerly popular in XC research have been excluded from the main table but are provided in the appendix [here] for legacy reasons. Python and Matlab scripts for reading the datasets are provided [below].

Please contact Manik Varma if you would like to contribute a dataset.

Naming Convention

To disambiguate possible versions of datasets, the number of labels in the dataset, rounded off, is appended to the dataset name. The legacy dataset previously referred to as DeliciousLarge has been renamed to Delicious-200K and RCV1-X has been renamed to RCV1-2K (see the Appendix [here] for these datasets). Datasets that contain label features have the token "LF" prepended to their names. Datasets with the phrase "Titles" in their names are short-text datasets. These correspond to tasks that require predictions to be made given only a short 3-5 word description of the data point, such as the name of a product or the title of a webpage, rather than a complete and detailed description of the data point in question. Short-text tasks arise naturally in several ranking and recommendation settings, such as those involving product/webpage titles or user queries, which are frequently short texts. Datasets with the phrase "SeeAlso" in their names correspond to tasks requiring related Wikipedia articles to be predicted for a given Wikipedia article.


Dataset | Download | BoW Feature Dimensionality | Number of Labels | Number of Train Points | Number of Test Points | Avg. Points per Label | Avg. Labels per Point | Original Source

LF-AmazonTitles-131K BoW Features Raw text 40,000 131,073 294,805 134,835 5.15 2.29 [28]
LF-Amazon-131K BoW Features Raw text 80,000 131,073 294,805 134,835 5.15 2.29 [28]
LF-WikiSeeAlsoTitles-320K BoW Features Raw text 40,000 312,330 693,082 177,515 4.67 2.11 -
LF-WikiSeeAlso-320K BoW Features Raw text 80,000 312,330 693,082 177,515 4.67 2.11 -
LF-WikiTitles-500K BoW Features Raw text 80,000 501,070 1,813,391 783,743 17.15 4.74 -
LF-AmazonTitles-1.3M BoW Features Raw text 128,000 1,305,265 2,248,619 970,237 38.24 22.20 [29] + [30]

AmazonCat-13K BoW Features Raw text 203,882 13,330 1,186,239 306,782 448.57 5.04 [28]
AmazonCat-14K BoW Features Raw text 597,540 14,588 4,398,050 1,099,725 1330.1 3.53 [29] + [30]
WikiSeeAlsoTitles-350K BoW Features Raw text 91,414 352,072 629,418 162,491 5.24 2.33 -
WikiTitles-500K BoW Features Raw text 185,479 501,070 1,699,722 722,678 23.62 4.89 -
Wikipedia-500K BoW Features Raw text 2,381,304 501,070 1,813,391 783,743 24.75 4.77 -
AmazonTitles-670K BoW Features Raw text 66,666 670,091 485,176 150,875 5.11 5.39 [28]
Amazon-670K BoW Features Raw text 135,909 670,091 490,449 153,025 3.99 5.45 [28]
AmazonTitles-3M BoW Features Raw text 165,431 2,812,281 1,712,536 739,665 31.55 36.18 [29] + [30]
Amazon-3M BoW Features Raw text 337,067 2,812,281 1,717,899 742,507 31.64 36.17 [29] + [30]

Mediamill BoW Features 120 101 30,993 12,914 1902.15 4.38 [19]
Bibtex BoW Features 1,836 159 4,880 2,515 111.71 2.40 [20]
Delicious BoW Features 500 983 12,920 3,185 311.61 19.03 [21]
RCV1-2K BoW Features 47,236 2,456 623,847 155,962 1218.56 4.79 [26]
EURLex-4K BoW Features 5,000 3,993 15,539 3,809 25.73 5.31 [27] + [47]
EURLex-4.3K BoW Features 200,000 4,271 45,000 6,000 60.57 5.07 [47] + [48]
Wiki10-31K BoW Features 101,938 30,938 14,146 6,616 8.52 18.64 [23]
Delicious-200K BoW Features 782,585 205,443 196,606 100,095 72.29 75.54 [24]
WikiLSHTC-325K BoW Features 1,617,899 325,056 1,778,351 587,084 17.46 3.19 [25]

Dataset statistics & download

Tokenization

Datasets presented in the table above offer the option to either download precomputed (e.g. bag-of-words) features or raw text. The tokenization used to create the bag-of-words representation may differ across datasets (e.g. whitespace-separated for legacy datasets vs. WordPiece for more recent datasets). If an XC method uses a distinctly novel tokenizer, it is recommended that additional experiments be conducted to measure the improvement due to better tokenization alone. This can be done, for example, by re-executing older XC methods with the novel tokenizer.

Split Creation

For each dataset, a single split is offered. Splits were not created randomly but instead created in a way that ensured that every label has at least one training point. This yielded more realistic train/test splits as compared to uniform sampling, which could drop many of the infrequently occurring, hard-to-classify labels from the test set. For example, on the WikiLSHTC-325K dataset, uniformly random split creation could lose ninety thousand of the hardest-to-classify labels from the test set, whereas the adopted sampling procedure dropped only forty thousand labels from the test set.
Note: Results computed on the train/test splits provided on this page are therefore not comparable to results computed on the original sources or to those using splits created via uniform sampling.
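
As an illustration, the following is a minimal sketch of one such label-covering split procedure. This is an assumed greedy variant given for illustration only, not necessarily the exact procedure used to create the splits on this page: the first point encountered for any yet-uncovered label is forced into the training set, and the remaining points are assigned at random.

    import numpy as np
    from scipy.sparse import csr_matrix

    def label_covering_split(label_matrix: csr_matrix, test_fraction=0.3, seed=0):
        """Return (train_idx, test_idx) such that every label occurs in at least one train point."""
        rng = np.random.default_rng(seed)
        num_points, num_labels = label_matrix.shape
        covered = np.zeros(num_labels, dtype=bool)
        train, test = [], []
        for i in rng.permutation(num_points):
            # labels of point i, read directly from the CSR structure
            labels = label_matrix.indices[label_matrix.indptr[i]:label_matrix.indptr[i + 1]]
            if not covered[labels].all():
                # first point seen for some uncovered label: it must go to train
                train.append(i)
                covered[labels] = True
            elif rng.random() < test_fraction:
                test.append(i)
            else:
                train.append(i)
        return np.array(train), np.array(test)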

Reciprocal-pair Removal

For the "LF" datasets that concern related item prediction, additional care needs to be taken since introducing label features allowed "reciprocal pairs" to emerge. Specifically, these are pairs of items, say A and B, that are related to each other such that two distinct data points exist, with A appearing as a label for B in one data point, and B appearing as a label for A in the other. To prevent algorithms from achieving artificially high scores by memorizing such pairs without learning anything useful, such pairs were removed from the ground truth in the test set. Please see [here] for a protocol on how to perform prediction while avoiding such reciprocal pairs using filter files provided with these datasets.

Useful Tools

The following two resources provide several tools for working with the datasets. These tools can be used to perform various useful operations, including the following (a minimal file-reading sketch is given after the list):
  1. Reading and writing the datasets in the given file format
  2. Preprocessing raw text using various tokenizers to generate data point (and label) features, including bag-of-words features
  3. Evaluating various performance measures such as precision, nDCG and their propensity-scored counterparts (see [here] for details)
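
For instance, below is a minimal sketch of a reader for the bag-of-words files, assuming the format described in the dataset README: a header line containing the number of points, features and labels, followed by one line per data point of the form "lbl1,lbl2,... ft1:val1 ft2:val2 ..." with zero-based indices. The tools linked above should be preferred in practice.

    import numpy as np
    from scipy.sparse import csr_matrix

    def read_xc_data(path):
        """Read a bag-of-words dataset file into sparse feature (X) and label (Y) matrices."""
        x_rows, x_cols, x_vals = [], [], []
        y_rows, y_cols = [], []
        with open(path) as f:
            num_points, num_features, num_labels = map(int, f.readline().split())
            for i, line in enumerate(f):
                head, _, rest = line.partition(' ')
                if head and ':' not in head:
                    # comma-separated label indices precede the feature tokens
                    for lbl in head.split(','):
                        y_rows.append(i)
                        y_cols.append(int(lbl))
                else:
                    rest = line  # the point has no labels; the whole line is feature tokens
                for tok in rest.split():
                    ft, val = tok.split(':')
                    x_rows.append(i)
                    x_cols.append(int(ft))
                    x_vals.append(float(val))
        X = csr_matrix((x_vals, (x_rows, x_cols)), shape=(num_points, num_features))
        Y = csr_matrix((np.ones(len(y_rows)), (y_rows, y_cols)), shape=(num_points, num_labels))
        return X, Y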

Performance Metrics and Evaluation Protocols

The benchmarked results below compare various algorithms on several performance measures. The discussion here describes protocols for properly evaluating XC methods, especially in the presence of head/tail labels and reciprocal pairs (see [here]).

Performance at the Top

The precision$@k$ and nDCG$@k$ metrics are defined for a predicted score vector $\hat{\mathbf y} \in {\mathbb{R}}^{L}$ and ground truth label vector $\mathbf y \in \left\lbrace 0, 1 \right\rbrace^L$ as \[ \text{P}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \mathbf y_l \] \[ \text{DCG}@k := \sum_{l\in {\text{rank}}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{\log(l+1)} \] \[ \text{nDCG}@k := \frac{{\text{DCG}}@k}{\sum_{l=1}^{\min(k, \|\mathbf y\|_0)} \frac{1}{\log(l+1)}}, \] where $\text{rank}_k(\hat{\mathbf y})$ returns the indices of the $k$ largest entries of $\hat{\mathbf y}$, sorted in descending order of predicted score.
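
As a concrete reference, the following is a minimal sketch of these two metrics for a single test point. Log base 2 is assumed for the discount (the definitions above do not fix a base), and the discount is applied by rank position; in practice the metrics are averaged over all test points, and the evaluation scripts provided with the repository should be used for reported results.

    import numpy as np

    def precision_at_k(y_hat, y, k):
        """y_hat: dense score vector; y: binary ground-truth vector of the same length."""
        top_k = np.argsort(-y_hat)[:k]                         # rank_k(y_hat)
        return y[top_k].sum() / k

    def ndcg_at_k(y_hat, y, k):
        top_k = np.argsort(-y_hat)[:k]
        discounts = 1.0 / np.log2(np.arange(2, k + 2))         # 1/log(l+1) for ranks l = 1..k
        dcg = (y[top_k] * discounts).sum()
        ideal_dcg = discounts[:min(k, int(y.sum()))].sum()     # normaliser over min(k, ||y||_0)
        return dcg / ideal_dcg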

Propensity-scored Performance at the Top

For datasets that contain excessively popular labels (often referred to as "head" labels), high P@k may be achieved by simply predicting head labels repeatedly irrespective of their relevance to the data point. To check for such trivial behavior, it is recommended that XC methods also be evaluated with respect to propensity-scored counterparts of the precision$@k$ and nDCG$@k$ metrics (PSP$@k$ and PSnDCG$@k$) described below. \[ \text{PSP}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{p_l} \] \[ \text{PSDCG}@k := \sum_{l\in {\text{rank}}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{p_l\log(l+1)} \] \[ \text{PSnDCG}@k := \frac{{\text{PSDCG}}@k}{\sum_{l=1}^{k} \frac{1}{\log(l+1)}}, \] where $p_l$ is the propensity score for label $l$ which helps in making metrics unbiased [31] with respect to missing labels. Propensity-scored metrics place specific emphasis on performing well on tail labels and give feeble rewards for predicting popular or head labels. It is recommended that scripts provided [here] be used to compute propensity-scored metrics in order to be consistent with results reported below.
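
The following is a sketch of the empirically fitted propensity model of Jain et al. [31] together with PSP$@k$ for a single test point. The parameters A = 0.55 and B = 1.5 are commonly used defaults and are assumptions here; dataset-specific values (e.g. for the Wikipedia and Amazon datasets) differ, so the scripts linked above should be used for reported results.

    import numpy as np

    def compute_propensities(train_labels, A=0.55, B=1.5):
        """train_labels: (num_train_points x num_labels) binary matrix; returns one propensity per label."""
        n = train_labels.shape[0]
        freqs = np.asarray(train_labels.sum(axis=0)).ravel()    # N_l: training frequency of each label
        C = (np.log(n) - 1.0) * (B + 1.0) ** A
        return 1.0 / (1.0 + C * np.exp(-A * np.log(freqs + B)))

    def psp_at_k(y_hat, y, propensities, k):
        """Propensity-scored precision@k, per the definition above."""
        top_k = np.argsort(-y_hat)[:k]
        return (y[top_k] / propensities[top_k]).sum() / k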

Removal of Reciprocal-pairs

As described [here], reciprocal pairs were removed from the ground truth in the test splits of the LF datasets to avoid rewarding trivial predictions. However, these reciprocal pairs must also be removed from the test predictions of XC methods to avoid unnecessary penalization. To do so, it is recommended that predictions of all XC methods on the LF datasets be evaluated using the filter files provided along with the datasets and the tools provided in the PyXCLib library linked [here]. Although reciprocal pairs were not removed from the train splits, a separate filter file enumerating the reciprocal pairs in the train splits is provided so that methods that wish to eliminate them from the train splits may do so. Note that these filter files are distinct from the ground truth files and only contain lists of reciprocal pairs.
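
As an illustration, the sketch below zeroes out the filtered point-label pairs in a matrix of test-time scores before the metrics are computed. It assumes each line of the filter file holds a zero-based test-point index and a label index separated by whitespace; please consult the dataset README and PyXCLib for the authoritative format and tooling.

    import numpy as np

    def apply_reciprocal_filter(score_matrix, filter_path):
        """Zero out reciprocal-pair predictions so they are neither rewarded nor penalised."""
        pairs = np.loadtxt(filter_path, dtype=np.int64).reshape(-1, 2)
        score_matrix[pairs[:, 0], pairs[:, 1]] = 0.0
        return score_matrix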

Code for XC Methods

The following lists provide links to code for leading XC methods. For ease of identification, the methods have been categorized based on the kind of classifier used, specifically one-vs-all, tree-based, and embedding-based methods. Methods that learn deep representations for data points jointly with the classifier are included as a separate category.

Please contact Manik Varma if you would like us to provide a link to your code.

Benchmarked Results

The tables below provide benchmarked results for various XC methods on several datasets. Rows corresponding to XC methods that utilize deep-learnt features, as well as those that utilize label features on the LF datasets, have been highlighted in brown. Training times are reported on a single GPU, except where noted otherwise for methods that necessarily require multiple GPUs to scale. The model sizes listed alongside XC methods are either as reported by the respective authors or on-disk sizes subject to compression; note that executions on different platforms/libraries may cause some variance in model sizes, making them less reproducible.

LF-AmazonTitles-131K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 30.05 21.25 16.02 30.05 31.58 34.05 19.23 26.09 32.26 19.23 23.64 26.60 1.95 0.08
Astec 37.12 25.20 18.24 37.12 38.17 40.16 29.22 34.64 39.49 29.22 32.73 35.03 3.24 1.83
AttentionXML 32.25 21.70 15.61 32.25 32.83 34.42 23.97 28.60 32.57 23.97 26.88 28.75 2.61 20.73
Bonsai* 34.11 23.06 16.63 34.11 34.81 36.57 24.75 30.35 34.86 24.75 28.32 30.47 0.24 0.10
DECAF 38.40 25.84 18.65 38.40 39.43 41.46 30.85 36.44 41.42 30.85 34.69 37.13 0.81 2.16
DiSMEC* 35.14 23.88 17.24 35.14 36.17 38.06 25.86 32.11 36.97 25.86 30.09 32.47 0.11 3.10
ECLARE 40.74 27.54 19.88 40.74 42.01 44.16 33.51 39.55 44.70 33.51 37.70 40.21 0.72 2.16
GalaXC 39.17 26.85 19.49 39.17 40.82 43.06 32.50 38.79 43.95 32.50 36.86 39.37 0.67 0.42
MACH 33.49 22.71 16.45 33.49 34.36 36.16 24.97 30.23 34.72 24.97 28.41 30.54 2.35 3.30
Parabel* 32.60 21.80 15.61 32.60 32.96 34.47 23.27 28.21 32.14 23.27 26.36 28.21 0.34 0.03
PfastreXML* 32.56 22.25 16.05 32.56 33.62 35.26 26.81 30.61 34.24 26.81 29.02 30.67 3.02 0.26
Slice+FastText* 30.43 20.50 14.84 30.43 31.07 32.76 23.08 27.74 31.89 23.08 26.11 28.13 0.39 0.08
X-Transformer 29.95 18.73 13.07 29.95 28.75 29.60 21.72 24.42 27.09 21.72 23.18 24.39 - -
XT* 31.41 21.39 15.48 31.41 32.17 33.86 22.37 27.51 31.64 22.37 25.58 27.52 0.84 9.46

LF-Amazon-131K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 35.73 25.46 19.41 35.73 37.81 41.08 23.56 31.97 39.95 23.56 29.07 33.00 4.01 0.68
Astec 42.22 28.62 20.85 42.22 43.57 46.06 32.95 39.42 45.30 32.95 37.45 40.35 5.52 3.39
AttentionXML 42.90 28.96 20.97 42.90 44.07 46.44 32.92 39.51 45.24 32.92 37.49 40.33 5.04 50.17
Bonsai* 40.23 27.29 19.87 40.23 41.46 43.84 29.60 36.52 42.39 29.60 34.43 37.34 0.46 0.40
DECAF 42.94 28.79 21.00 42.94 44.25 46.84 34.52 41.14 47.33 34.52 39.35 42.48 1.86 1.80
DiSMEC* 41.68 28.32 20.58 41.68 43.22 45.69 31.61 38.96 45.07 31.61 36.97 40.05 0.45 7.12
MACH 34.52 23.39 17.00 34.52 35.53 37.51 25.27 30.71 35.42 25.27 29.02 31.33 4.57 13.91
Parabel* 39.57 26.64 19.26 39.57 40.48 42.61 28.99 35.36 40.69 28.99 33.36 35.97 0.62 0.10
PfastreXML* 35.83 24.35 17.60 35.83 36.97 38.85 28.99 33.24 37.40 28.99 31.65 33.62 5.30 1.54
Slice+FastText* 32.07 22.21 16.52 32.07 33.54 35.98 23.14 29.08 34.63 23.14 27.25 30.06 0.39 0.11
XT* 34.31 23.27 16.99 34.31 35.18 37.26 24.35 29.81 34.70 24.35 27.95 30.34 0.92 1.38

LF-WikiSeeAlsoTitles-320K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 16.30 11.24 8.84 16.30 16.19 17.14 7.24 9.63 11.75 7.24 9.06 10.43 4.22 0.21
Astec 22.72 15.12 11.43 22.72 22.16 22.87 13.69 15.81 17.50 13.69 15.56 16.75 7.30 4.17
AttentionXML 17.56 11.34 8.52 17.56 16.58 17.07 9.45 10.63 11.73 9.45 10.45 11.24 6.02 56.12
Bonsai* 19.31 12.71 9.55 19.31 18.74 19.32 10.69 12.44 13.79 10.69 12.29 13.29 0.37 0.37
DECAF 25.14 16.90 12.86 25.14 24.99 25.95 16.73 18.99 21.01 16.73 19.18 20.75 1.76 11.16
DiSMEC* 19.12 12.93 9.87 19.12 18.93 19.71 10.56 13.01 14.82 10.56 12.70 14.02 0.19 15.56
ECLARE 29.35 19.83 15.05 29.35 29.21 30.20 22.01 24.23 26.27 22.01 24.46 26.03 1.67 13.46
GalaXC 27.87 18.75 14.30 27.87 26.84 27.60 19.77 22.25 24.47 19.77 21.70 23.16 1.08 1.08
MACH 18.06 11.91 8.99 18.06 17.57 18.17 9.68 11.28 12.53 9.68 11.19 12.14 2.51 8.23
Parabel* 17.68 11.48 8.59 17.68 16.96 17.44 9.24 10.65 11.80 9.24 10.49 11.32 0.60 0.07
PfastreXML* 17.10 11.13 8.35 17.10 16.80 17.35 12.15 12.51 13.26 12.15 12.81 13.48 6.77 0.59
Slice+FastText* 18.55 12.62 9.68 18.55 18.29 19.07 11.24 13.45 15.20 11.24 13.03 14.23 0.94 0.20
XT* 17.04 11.31 8.60 17.04 16.61 17.24 8.99 10.52 11.82 8.99 10.33 11.26 1.90 5.28

LF-WikiSeeAlso-320K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 30.79 20.88 16.47 30.79 30.02 31.64 13.48 17.92 22.21 13.48 16.52 19.08 12.13 2.40
Astec 40.07 26.69 20.36 40.07 39.36 40.88 23.41 28.08 31.92 23.41 27.48 30.17 13.46 7.11
AttentionXML 40.50 26.43 19.87 40.50 39.13 40.26 22.67 26.66 29.83 22.67 26.13 28.38 7.12 90.37
Bonsai* 34.86 23.21 17.66 34.86 34.09 35.32 18.19 22.35 25.66 18.19 21.62 23.84 0.84 1.39
DECAF 41.36 28.04 21.38 41.36 41.55 43.32 25.72 30.93 34.89 25.72 30.69 33.69 4.84 13.40
DiSMEC* 34.59 23.58 18.26 34.59 34.43 36.11 18.95 23.92 27.90 18.95 23.04 25.76 1.28 58.79
MACH 27.18 17.38 12.89 27.18 26.09 26.80 13.11 15.28 16.93 13.11 15.17 16.48 11.41 50.22
Parabel* 33.46 22.03 16.61 33.46 32.40 33.34 17.10 20.73 23.53 17.10 20.02 21.88 1.18 0.33
PfastreXML* 28.79 18.38 13.60 28.79 27.69 28.28 17.12 18.19 19.43 17.12 18.23 19.20 14.02 4.97
Slice+FastText* 27.74 19.39 15.47 27.74 27.84 29.65 13.07 17.50 21.55 13.07 16.36 18.90 0.94 0.20
XT* 30.10 19.60 14.92 30.10 28.65 29.58 14.43 17.13 19.69 14.43 16.37 17.97 2.20 3.27

LF-WikiTitles-500K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 39.00 20.66 14.55 39.00 28.40 26.80 13.91 13.38 13.75 13.91 14.63 15.88 11.18 1.98
Astec 44.40 24.69 17.49 44.40 33.43 31.72 18.31 18.25 18.56 18.31 19.57 21.09 15.01 13.50
AttentionXML 40.90 21.55 15.05 40.90 29.38 27.45 14.80 13.97 13.88 14.80 15.24 16.22 14.01 133.94
Bonsai* 40.97 22.30 15.66 40.97 30.35 28.65 16.58 16.34 16.40 16.58 17.60 18.85 1.63 2.03
DECAF 44.21 24.64 17.36 44.21 33.55 31.92 19.29 19.82 19.96 19.29 21.26 22.95 4.53 42.26
DiSMEC* 39.42 21.10 14.85 39.42 28.87 27.29 15.88 15.54 15.89 15.88 16.76 18.13 0.68 48.27
ECLARE 44.36 24.29 16.91 44.36 33.33 31.46 21.58 20.39 19.84 21.58 22.39 23.61 4.24 39.34
MACH 37.74 19.11 13.26 37.74 26.63 24.94 13.71 12.14 12.00 13.71 13.63 14.54 4.73 22.46
Parabel* 40.41 21.98 15.42 40.41 29.89 28.15 15.55 15.32 15.35 15.55 16.50 17.66 2.70 0.42
PfastreXML* 35.71 19.27 13.64 35.71 26.45 25.15 18.23 15.42 15.08 18.23 17.34 18.24 20.41 3.79
Slice+FastText* 25.48 15.06 10.98 25.48 20.67 20.52 13.90 13.33 13.82 13.90 14.50 15.90 2.30 0.74
XT* 38.13 20.71 14.66 38.13 28.13 26.61 14.10 14.12 14.38 14.10 15.15 16.40 3.10 14.67

LF-AmazonTitles-1.3M


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 47.79 41.65 36.91 47.79 44.83 42.93 15.42 19.67 21.91 15.42 18.05 19.36 14.53 2.48
Astec 48.82 42.62 38.44 48.82 46.11 44.80 21.47 25.41 27.86 21.47 24.08 25.66 26.66 18.54
AttentionXML 45.04 39.71 36.25 45.04 42.42 41.23 15.97 19.90 22.54 15.97 18.23 19.60 28.84 380.02
Bonsai* 47.87 42.19 38.34 47.87 45.47 44.35 18.48 23.06 25.95 18.48 21.52 23.33 9.02 7.89
DECAF 50.67 44.49 40.35 50.67 48.05 46.85 22.07 26.54 29.30 22.07 25.06 26.85 9.62 74.47
ECLARE 50.14 44.09 40.00 50.14 47.75 46.68 23.43 27.90 30.56 23.43 26.67 28.61 9.15 70.59
GalaXC 49.81 44.23 40.12 49.81 47.64 46.47 25.22 29.12 31.44 25.22 27.81 29.36 2.69 9.55
MACH 35.68 31.22 28.35 35.68 33.42 32.27 9.32 11.65 13.26 9.32 10.79 11.65 7.68 60.39
Parabel* 46.79 41.36 37.65 46.79 44.39 43.25 16.94 21.31 24.13 16.94 19.70 21.34 11.75 1.50
PfastreXML* 37.08 33.77 31.43 37.08 36.61 36.61 28.71 30.98 32.51 28.71 29.92 30.73 29.59 9.66
Slice* 34.80 30.58 27.71 34.80 32.72 31.69 13.96 17.08 19.14 13.96 15.83 16.97 5.98 0.79
XT* 40.60 35.74 32.01 40.60 38.18 36.68 13.67 17.11 19.06 13.67 15.64 16.65 7.90 82.18

WikiSeeAlsoTitles-350K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 14.96 10.20 8.11 14.96 14.20 14.76 5.63 7.04 8.59 5.63 6.79 7.76 3.59 0.20
Astec 20.61 14.58 11.49 20.61 20.08 20.80 9.91 12.16 14.04 9.91 11.76 12.98 7.41 4.36
AttentionXML 15.86 10.43 8.01 15.86 14.59 14.86 6.39 7.20 8.15 6.39 7.05 7.64 4.07 30.44
Bonsai* 17.95 12.27 9.56 17.95 17.13 17.66 8.16 9.68 11.07 8.16 9.49 10.43 0.25 0.46
DiSMEC* 16.61 11.57 9.14 16.61 16.09 16.72 7.48 9.19 10.74 7.48 8.95 9.99 0.09 6.62
MACH 14.79 9.57 7.13 14.79 13.83 14.05 6.45 7.02 7.54 6.45 7.20 7.73 5.22 7.44
Parabel* 17.24 11.61 8.92 17.24 16.31 16.67 7.56 8.83 9.96 7.56 8.68 9.45 0.43 0.06
PfastreXML* 15.09 10.49 8.24 15.09 14.98 15.59 9.03 9.69 10.64 9.03 9.82 10.52 5.22 0.51
SLICE+FastText* 18.13 12.87 10.29 18.13 17.71 18.52 8.63 10.78 12.74 8.63 10.37 11.63 0.97 0.22
XML-CNN 17.75 12.34 9.73 17.75 16.93 17.48 8.24 9.72 11.15 8.24 9.40 10.31 0.78 14.25
XT* 16.55 11.37 8.93 16.55 15.88 16.47 7.38 8.75 10.05 7.38 8.57 9.46 2.00 3.25

WikiTitles-500K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 39.56 20.50 14.32 39.56 28.28 26.54 15.44 13.83 13.79 15.44 15.49 16.58 10.70 1.77
Astec 46.60 26.03 18.50 46.60 35.10 33.34 18.89 18.90 19.30 18.89 20.33 22.00 15.15 13.04
AttentionXML 42.89 22.71 15.89 42.89 30.92 28.93 15.12 14.32 14.22 15.12 15.69 16.75 9.21 102.43
Bonsai* 42.60 23.08 16.25 42.60 31.34 29.58 17.38 16.85 16.90 17.38 18.28 19.62 1.18 2.94
DiSMEC* 39.89 21.23 14.96 39.89 28.97 27.32 15.89 15.15 15.43 15.89 16.52 17.86 0.35 23.94
MACH 33.74 15.62 10.41 33.74 22.61 20.80 11.43 8.98 8.35 11.43 10.77 11.28 10.48 23.65
Parabel* 42.50 23.04 16.21 42.50 31.24 29.45 16.55 16.12 16.16 16.55 17.49 18.77 2.15 0.34
PfastreXML* 30.99 18.07 13.09 30.99 24.54 23.88 17.87 15.40 15.15 17.87 17.38 18.46 16.85 3.07
SLICE+FastText* 28.07 16.78 12.28 28.07 22.97 22.87 15.10 14.69 15.33 15.10 16.02 17.67 1.50 0.54
XML-CNN 43.45 23.24 16.53 43.45 31.69 29.95 15.64 14.74 14.98 15.64 16.17 17.45 1.17 55.21
XT* 39.44 21.57 15.31 39.44 29.17 27.65 15.23 15.00 15.25 15.23 16.23 17.59 3.30 12.13

AmazonTitles-670K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 35.31 30.90 27.83 35.31 32.76 31.26 17.94 20.69 23.30 17.94 19.57 20.88 2.99 0.17
Astec 40.63 36.22 33.00 40.63 38.45 37.09 28.07 30.17 32.07 28.07 29.20 29.98 10.93 3.85
AttentionXML 37.92 33.73 30.57 37.92 35.78 34.35 24.24 26.43 28.39 24.24 25.48 26.33 12.11 37.50
Bonsai* 38.46 33.91 30.53 38.46 36.05 34.48 23.62 26.19 28.41 23.62 25.16 26.21 0.66 0.53
DiSMEC* 38.12 34.03 31.15 38.12 36.07 34.88 22.26 25.46 28.67 22.26 24.30 26.00 0.29 11.74
MACH 34.92 31.18 28.56 34.92 33.07 31.97 20.56 23.14 25.79 20.56 22.18 23.53 3.84 6.41
Parabel* 38.00 33.54 30.10 38.00 35.62 33.98 23.10 25.57 27.61 23.10 24.55 25.48 1.06 0.09
PfastreXML* 32.88 30.54 28.80 32.88 32.20 31.85 26.61 27.79 29.22 26.61 27.10 27.59 5.32 0.99
SLICE+FastText* 33.85 30.07 26.97 33.85 31.97 30.56 21.91 24.15 25.81 21.91 23.26 24.03 2.01 0.22
XML-CNN 35.02 31.37 28.45 35.02 33.24 31.94 21.99 24.93 26.84 21.99 23.83 24.67 1.36 23.52
XT* 36.57 32.73 29.79 36.57 34.64 33.35 22.11 24.81 27.18 22.11 23.73 24.87 4.00 4.65

AmazonTitles-3M


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 48.37 44.68 42.24 48.37 45.93 44.43 11.47 13.84 15.72 11.47 13.02 14.15 10.23 1.68
Astec 48.74 45.70 43.31 48.74 46.96 45.67 16.10 18.89 20.94 16.10 18.00 19.33 40.60 13.04
AttentionXML 46.00 42.81 40.59 46.00 43.94 42.61 12.81 15.03 16.71 12.80 14.23 15.25 44.40 273.10
Bonsai* 46.89 44.38 42.30 46.89 45.46 44.35 13.78 16.66 18.75 13.78 15.75 17.10 9.53 9.90
MACH 37.10 33.57 31.33 37.10 34.67 33.17 7.51 8.61 9.46 7.51 8.23 8.76 9.77 40.48
Parabel* 46.42 43.81 41.71 46.42 44.86 43.70 12.94 15.58 17.55 12.94 14.70 15.94 13.20 1.54
PfastreXML* 31.16 31.35 31.10 31.16 31.78 32.08 22.37 24.59 26.16 22.37 23.72 24.65 22.97 10.47
SLICE+FastText* 35.39 33.33 31.74 35.39 34.12 33.21 11.32 13.37 14.94 11.32 12.65 13.61 12.22 0.64
XT* 27.99 25.24 23.57 27.99 25.98 24.78 4.45 5.06 5.57 4.45 4.78 5.03 16.00 15.80

Wikipedia-500K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 64.64 43.20 32.77 64.64 54.54 52.42 26.88 30.24 32.79 26.88 30.71 33.33 48.32 15.50
APLC-XLNet 72.83 50.50 38.55 72.83 62.06 59.27 30.03 35.25 38.27 30.03 35.01 37.86 1.40 -
Astec 73.02 52.02 40.53 73.02 64.10 62.32 30.69 36.48 40.38 30.69 36.33 39.84 28.06 20.35
AttentionXML 82.73 63.75 50.41 82.73 76.56 74.86 34.00 44.32 50.15 34.00 42.99 47.69 9.30 110.60
Bonsai* 69.20 49.80 38.80 - - - - - - - - - - -
DiSMEC* 70.20 50.60 39.70 70.20 42.10 40.50 31.20 33.40 37.00 31.20 33.70 37.10 - -
Parabel* 68.70 49.57 38.64 68.70 60.51 58.62 26.88 31.96 35.26 26.88 31.73 34.61 5.65 2.72
PfastreXML* 59.50 40.20 30.70 59.50 30.10 28.70 29.20 27.60 27.70 29.20 28.70 28.30 - 63.59
ProXML* 68.80 48.90 37.90 68.80 39.10 38.00 33.10 35.00 39.40 33.10 35.20 39.00 - -
X-Transformer 76.95 58.42 46.14 - - - - - - - - - - -
XML-CNN 59.85 39.28 29.81 59.85 48.67 46.12 - - - - - - - 117.23
XT* 64.48 45.84 35.46 - - - - - - - - - 5.50 20.88

Amazon-670K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 42.39 36.89 32.98 42.39 39.07 37.04 21.56 24.78 27.66 21.56 23.38 24.76 50.00 1.56
APLC-XLNet 43.46 38.83 35.32 43.46 41.01 39.38 26.12 29.66 32.78 26.12 28.20 29.68 1.1 -
Astec 47.77 42.79 39.10 47.77 45.28 43.74 32.13 35.14 37.82 32.13 33.80 35.01 18.79 7.32
AttentionXML 47.58 42.61 38.92 47.58 45.07 43.50 30.29 33.85 37.13 - - - 16.56 78.30
Bonsai* 45.58 40.39 36.60 45.58 42.79 41.05 27.08 30.79 34.11 - - - - -
DiSMEC* 44.70 39.70 36.10 44.70 42.10 40.50 27.80 30.60 34.20 27.80 28.80 30.70 3.75 56.02
FastXML* 36.99 33.28 30.53 36.99 35.11 33.86 19.37 23.26 26.85 19.37 22.25 24.69 - -
LEML* 8.13 6.83 6.03 8.13 7.30 6.85 2.07 2.26 2.47 2.07 2.21 2.35 - -
LPSR-NB* 28.65 24.88 22.37 28.65 26.40 25.03 16.68 18.07 19.43 16.68 17.70 18.63 - -
LightXML 49.10 43.83 39.85 - - - - - - - - - 4.59 86.25
PPD-Sparse* 45.32 40.37 36.92 - - - 26.64 30.65 34.65 - - - - -
Parabel* 44.89 39.80 36.00 44.89 42.14 40.36 25.43 29.43 32.85 25.43 28.38 30.71 2.41 0.41
PfastreXML* 39.46 35.81 33.05 39.46 37.78 36.69 29.30 30.80 32.43 29.30 30.40 31.49 - -
ProXML* 43.50 38.70 35.30 43.50 41.10 39.70 30.80 32.80 35.10 30.80 31.70 32.70 - -
SLEEC* 35.05 31.25 28.56 34.77 32.74 31.53 20.62 23.32 25.98 20.62 22.63 24.43 - -
SLICE+FastText* 33.15 29.76 26.93 33.15 31.51 30.27 20.20 22.69 24.70 20.20 21.71 22.72 2.01 0.21
XML-CNN 35.39 31.93 29.32 35.39 33.74 32.64 28.67 33.27 36.51 - - - - 52.23
XT* 42.50 37.87 34.41 42.50 40.01 38.43 24.82 28.20 31.24 24.82 26.82 28.29 4.20 8.22

Amazon-3M


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 49.30 45.55 43.11 49.30 46.79 45.27 11.69 14.07 15.98 - - - - -
AttentionXML 50.86 48.04 45.83 50.86 49.16 47.94 15.52 18.45 20.60 - - - - -
Bonsai* 48.45 45.65 43.49 48.45 46.78 45.59 13.79 16.71 18.87 - - - - -
DiSMEC* 47.34 44.96 42.80 47.36 - - - - - - - - - -
FastXML* 44.24 40.83 38.59 44.24 41.92 40.47 9.77 11.69 13.25 9.77 11.20 12.29 - -
Parabel* 47.48 44.65 42.53 47.48 45.73 44.53 12.82 15.61 17.73 12.82 14.89 16.38 - -
PfastreXML* 43.83 41.81 40.09 43.83 42.68 41.75 21.38 23.22 24.52 21.38 22.75 23.68 - -

Wiki10-31K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 86.49 74.27 64.20 86.49 77.13 69.44 11.90 12.76 13.58 11.90 12.53 13.10 0.62 0.39
APLC-XLNet 89.44 78.93 69.73 89.44 81.38 74.41 14.84 15.85 17.04 14.84 15.58 16.40 0.54 -
AttentionXML 87.47 78.48 69.37 87.47 80.61 73.79 15.57 16.80 17.82 - - - - -
Bonsai* 84.69 73.69 64.39 84.69 76.25 69.17 11.78 13.27 14.28 11.78 12.89 13.61 0.13 0.64
DiSMEC* 85.20 74.60 65.90 84.10 77.10 70.40 13.60 13.10 13.80 13.60 13.20 13.60 - -
FastXML* 83.03 67.47 57.76 84.31 75.35 63.36 9.80 10.17 10.54 9.80 10.08 10.33 - -
LEML* 73.47 62.43 54.35 73.47 64.92 58.69 9.41 10.07 10.55 9.41 9.90 10.24 - -
LPSR-NB* 72.72 58.51 49.50 72.72 61.71 54.63 12.79 12.26 12.13 12.79 12.38 12.27 - -
LightXML 89.45 78.96 69.85 - - - - - - - - - - -
Parabel* 84.17 72.46 63.37 84.17 75.22 68.22 11.68 12.73 13.69 11.68 12.47 13.14 0.18 0.20
PfastreXML* 83.57 68.61 59.10 83.57 72.00 64.54 19.02 18.34 18.43 19.02 18.49 18.52 - -
SLEEC* 85.88 72.98 62.70 85.88 76.02 68.13 11.14 11.86 12.40 11.14 11.68 12.06 1.13 0.21
XML-CNN 81.42 66.23 56.11 81.42 69.78 61.83 9.39 10.00 10.20 - - - - -
XT* 86.15 75.18 65.41 86.15 77.76 70.35 11.87 13.08 13.89 11.87 12.78 13.36 0.37 0.39
XTransformer 88.51 78.71 69.62 - - - - - - - - - - -

AmazonCat-13K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 93.54 78.37 63.30 93.54 87.29 85.10 49.04 61.13 69.64 49.04 58.83 65.47 18.61 3.45
APLC-XLNet 94.56 79.82 64.61 94.56 88.74 86.66 52.22 65.08 71.40 52.22 62.57 67.92 0.50 -
AttentionXML 95.92 82.41 67.31 95.92 91.17 89.48 53.76 68.72 76.38 - - - - -
Bonsai* 92.98 79.13 64.46 92.98 87.68 85.92 51.30 64.60 72.48 - - - 0.55 1.26
DiSMEC* 93.40 79.10 64.10 93.40 87.70 85.80 59.10 67.10 71.20 59.10 65.20 68.80 - -
FastXML* 93.11 78.20 63.41 93.11 87.07 85.16 48.31 60.26 69.30 48.31 56.90 62.75 - -
LightXML 96.77 84.02 68.70 - - - - - - - - - - -
PD-Sparse* 90.60 75.14 60.69 90.60 84.00 82.05 49.58 61.63 68.23 49.58 58.28 62.68 - -
Parabel* 93.03 79.16 64.52 93.03 87.72 86.00 50.93 64.00 72.08 50.93 60.37 65.68 0.62 0.63
PfastreXML* 91.75 77.97 63.68 91.75 86.48 84.96 69.52 73.22 75.48 69.52 72.21 73.67 19.02 5.69
SLEEC* 90.53 76.33 61.52 90.53 84.96 82.77 46.75 58.46 65.96 46.75 55.19 60.08 - -
XML-CNN 93.26 77.06 61.40 93.26 86.20 83.43 52.42 62.83 67.10 - - - - -
XT* 92.59 78.24 63.58 92.59 86.90 85.03 49.61 62.22 70.24 49.61 59.71 66.04 0.46 7.14
XTransformer 96.70 83.85 68.58 - - - - - - - - - - -

EURLex-4K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 79.26 64.30 52.33 79.26 68.13 61.60 34.25 39.83 42.76 34.25 38.35 40.30 0.09 0.06
APLC-XLNet 87.72 74.56 62.28 87.72 77.90 71.75 42.93 49.84 53.07 42.93 48.00 50.40 0.48 -
Bonsai* 82.96 69.76 58.31 82.96 73.15 67.41 37.08 45.13 49.57 37.08 42.94 46.10 0.02 0.03
CPLST* 58.52 45.51 32.47 58.52 48.67 40.79 24.97 27.46 25.04 24.97 26.82 25.57 - -
CS* 62.09 48.39 40.11 62.09 51.63 47.11 24.94 27.19 28.90 25.94 26.56 27.67 - -
DiSMEC* 82.40 68.50 57.70 82.40 72.50 66.70 41.20 45.40 49.30 41.20 44.30 46.90 - -
FastXML* 76.37 63.36 52.03 76.37 66.63 60.61 33.17 39.68 41.99 33.17 37.92 39.55 0.26 0.07
LEML* 68.55 55.11 45.12 68.55 58.44 53.03 31.16 34.85 36.82 31.16 33.85 35.17 - -
LPSR* 79.89 66.01 53.80 79.89 69.62 63.04 37.97 44.01 46.17 37.97 42.44 43.97 - -
ML-CSSP* 75.45 62.70 52.51 75.45 65.97 60.78 43.86 45.72 46.97 43.86 45.23 46.03 - -
PD-Sparse* 83.83 70.72 59.21 - - - 37.61 46.05 50.79 - - - - -
PPD-Sparse* 83.40 70.90 59.10 83.40 74.40 68.20 45.20 48.50 51.00 45.20 47.50 49.10 - -
Parabel* 82.25 68.71 57.53 82.25 72.17 66.54 36.44 44.08 48.46 36.44 41.99 44.91 0.03 0.02
PfastreXML* 71.36 59.90 50.39 71.36 62.87 58.06 26.62 34.16 38.96 26.62 32.07 35.23 - -
SLEEC* 63.40 50.35 41.28 63.40 53.56 48.47 24.10 27.20 29.09 24.10 26.37 27.62 - -
WSABIE* 72.28 58.16 47.73 72.28 61.64 55.92 28.60 32.49 34.46 28.60 31.45 32.77 - -
XT* 78.97 65.64 54.44 78.97 69.05 63.23 33.52 40.35 44.02 33.52 38.50 41.09 0.03 0.10
kNN* 81.73 68.78 57.44 81.73 72.15 66.40 36.36 44.04 48.29 36.36 41.95 44.78 - -

Note: Given the diversity of architectures used by deep learning methods (for example, CPU-only vs. CPU-GPU methods), the symbols *, †, and ‡ have been used to specify the machine configuration used for each method (see the legend below). Certain methods, for example AttentionXML and the X-Transformer, could not be run on a single GPU out-of-the-box, so they were run on a cluster with 8 GPUs and their training times were scaled accordingly. Moreover, for certain methods (marked with a ♦ symbol), results are currently reported as mentioned in their respective publications; results reproduced from independent executions should be available for these methods shortly.

Legend:

Appendix

WikiLSHTC-325K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 63.30 40.64 29.80 63.30 56.61 56.24 25.13 30.46 34.30 25.13 31.16 34.36 29.70 4.24
Bonsai* 66.41 44.40 32.92 66.41 60.69 60.53 28.11 35.36 39.73 28.11 35.42 38.94 2.43 3.04
DiSMEC* 64.40 42.50 31.50 64.40 58.50 58.40 29.10 35.60 39.50 29.10 35.90 39.40 - -
FastXML* 49.75 33.10 24.45 49.75 45.23 44.75 16.35 20.99 23.56 16.35 19.56 21.02 - -
LEML* 19.82 11.43 8.39 19.82 14.52 13.73 3.48 3.79 4.27 3.48 3.68 3.94 - -
LPSR-NB 27.44 16.23 11.77 27.44 23.04 22.55 6.93 7.21 7.86 6.93 7.11 7.46 - -
PD-Sparse* 61.26 39.48 28.79 61.26 55.08 54.67 28.34 33.50 36.62 28.34 31.92 33.68 - -
PPD-Sparse* 64.08 41.26 30.12 - - - 27.47 33.00 36.29 - - - - -
Parabel* 65.04 43.23 32.05 65.04 59.15 58.93 26.76 33.27 37.36 26.76 31.26 33.57 3.10 0.75
PfastreXML* 56.05 36.79 27.09 56.05 50.59 50.13 30.66 31.55 33.12 30.66 31.24 32.09 14.23 6.34
ProXML* 63.60 41.50 30.80 63.80 57.40 57.10 34.80 37.70 41.00 34.80 38.70 41.50 - -
SLEEC* 54.83 33.42 23.85 54.83 47.25 46.16 20.27 23.18 25.08 20.27 22.27 23.35 - -
XT* 56.54 37.17 27.73 56.54 50.48 50.36 20.56 25.42 28.90 20.56 25.30 27.90 4.50 1.89

Delicious-200K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 46.79 40.72 37.67 46.79 42.17 39.84 7.18 8.05 8.74 7.18 7.78 8.22 10.74 2.58
Bonsai* 46.69 39.88 36.38 46.69 41.51 38.84 7.26 7.97 8.53 7.26 7.75 8.10 3.91 64.42
DiSMEC* 45.50 38.70 35.50 45.50 40.90 37.80 6.50 7.60 8.40 6.50 7.50 7.90 - -
FastXML* 43.07 38.66 36.19 43.07 39.70 37.83 6.48 7.52 8.31 6.51 7.26 7.79 - -
LEML* 40.73 37.71 35.84 40.73 38.44 37.01 6.06 7.24 8.10 6.06 6.93 7.52 - -
LPSR-NB 18.59 15.43 14.07 18.59 16.17 15.13 3.24 3.42 3.64 3.24 3.37 3.52 - -
PD-Sparse* 34.37 29.48 27.04 34.37 30.60 28.65 5.29 5.80 6.24 5.29 5.66 5.96 - -
PPD-Sparse* - - - - - - - - - - - - - -
Parabel* 46.86 40.08 36.70 46.86 41.69 39.10 7.22 7.94 8.54 7.22 7.71 8.09 6.36 9.58
Parabel* 46.97 40.08 36.63 46.97 41.72 39.07 7.25 7.94 8.52 7.25 7.75 8.15 - -
PfastreXML* 41.72 37.83 35.58 41.72 38.76 37.08 3.15 3.87 4.43 3.15 3.68 4.06 15.34 3.60
SLEEC* 47.85 42.21 39.43 47.85 43.52 41.37 7.17 8.16 8.96 7.17 7.89 8.44 - -
XT* 45.59 39.10 35.92 45.59 40.62 38.17 6.96 7.71 8.33 6.96 7.47 7.86 2.70 31.22

Delicious


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* - - - - - - - - - - - - - -
CPLST* 65.31 59.95 55.31 65.31 61.16 57.80 31.10 32.40 33.02 31.10 32.07 32.55 - -
CS* 61.36 56.46 52.07 61.36 57.66 54.44 30.60 31.84 32.26 30.60 31.54 31.89 - -
DiSMEC* - - - - - - - - - - - - - -
FastXML* 69.61 64.12 59.27 69.61 65.47 61.90 32.35 34.51 35.43 32.35 34.00 34.73 - -
LEML* 65.67 60.55 56.08 65.67 61.77 58.47 30.73 32.43 33.26 30.73 32.01 32.66 - -
LPSR* 65.01 58.96 53.49 65.01 60.45 56.38 31.34 32.57 32.77 31.34 32.29 32.50 - -
ML-CSSP 63.04 56.26 50.16 63.04 57.91 53.36 29.48 30.27 30.02 29.48 30.10 29.98 - -
PD-Sparse* 51.82 44.18 38.95 51.82 46.00 42.02 25.22 24.63 23.85 25.22 24.80 24.25 - -
Parabel* 67.44 61.83 56.75 67.44 63.15 59.41 32.69 34.00 34.53 32.69 33.69 34.10 - -
PfastreXML* 67.13 62.33 58.62 67.13 63.48 60.74 34.57 34.80 35.86 34.57 34.71 35.42 - -
SLEEC* 67.59 61.38 56.56 67.59 62.87 59.28 32.11 33.21 33.83 32.11 32.93 33.41 - -
WSABIE 64.13 58.13 53.64 64.13 59.59 56.25 31.25 32.02 32.47 31.25 31.84 32.18 - -
kNN* 64.95 58.89 54.11 64.95 60.32 56.77 31.03 32.02 32.43 31.03 31.76 32.09 - -

Bibtex


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

1-vs-All 62.62 39.09 28.79 62.62 59.13 61.58 48.84 52.96 59.29 48.84 51.62 55.09 - -
CPLST* 62.38 37.84 27.62 62.38 57.63 59.71 48.17 50.86 56.42 48.17 49.94 52.96 - -
CS* 58.87 33.53 23.72 58.87 52.19 53.25 46.04 45.08 48.17 46.04 45.25 46.89 - -
DiSMEC* - - - - - - - - - - - - - -
FastXML* 63.42 39.23 28.86 63.42 59.51 61.70 48.54 52.30 58.28 48.54 51.11 54.38 - -
LEML* 62.54 38.41 28.21 62.54 58.22 60.53 47.97 51.42 57.53 47.97 50.25 53.59 - -
LPSR* 62.11 36.65 26.53 62.11 56.50 58.23 49.20 50.14 55.01 49.20 49.78 52.41 - -
ML-CSSP 44.98 30.43 23.53 44.98 44.67 47.97 32.38 38.68 45.96 32.38 36.73 40.74 - -
PD-Sparse* 61.29 35.82 25.74 61.29 55.83 57.35 48.34 48.77 52.93 48.34 48.49 50.72 - -
PPD-Sparse* - - - - - - - - - - - - - -
Parabel* 64.53 38.56 27.94 64.53 59.35 61.06 50.88 52.42 57.36 50.88 51.90 54.58 - -
PfastreXML* 63.46 39.22 29.14 63.46 59.61 62.12 52.28 54.36 60.55 52.28 53.62 56.99 - -
ProXML* 64.60 39.00 28.20 64.40 59.20 61.50 50.10 52.00 58.30 50.10 52.00 55.10 - -
SLEEC* 65.08 39.64 28.87 65.08 60.47 62.64 51.12 53.95 59.56 51.12 52.99 56.04 - -
WSABIE 54.78 32.39 23.98 54.78 50.11 52.39 43.39 44.00 49.30 43.39 43.64 46.50 - -
kNN* 57.04 34.38 25.44 57.04 52.29 54.64 43.71 45.82 51.64 43.71 45.04 48.20 - -

Mediamill


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 87.82 73.45 59.17 87.82 81.50 79.22 70.14 72.76 74.02 70.14 72.31 73.13 - -
CPLST* 83.82 67.32 52.80 83.82 75.29 71.92 66.23 65.28 63.70 66.23 65.89 64.77 - -
CS* 78.95 60.93 44.27 78.95 68.97 62.88 62.53 58.97 53.23 62.53 60.33 56.50 - -
DiSMEC* 81.86 62.52 45.11 81.86 70.21 63.71 62.23 59.85 54.03 62.25 61.05 57.26 - -
FastXML* 83.57 65.78 49.97 83.57 74.06 69.34 66.06 63.83 61.11 66.06 64.83 62.94 - -
LEML* 81.29 64.74 49.83 81.29 72.92 69.37 64.24 62.73 59.92 64.24 63.47 61.57 - -
LPSR* 83.57 65.50 48.57 83.57 73.84 68.18 66.06 63.53 59.38 66.06 64.63 61.84 - -
ML-CSSP 83.98 67.37 53.02 83.98 75.31 72.21 66.88 65.90 64.90 66.88 66.47 65.71 - -
PD-Sparse* - - - - - - - - - - - - - -
PPD-Sparse* 86.50 68.40 53.20 86.50 77.30 75.60 64.30 61.30 60.80 64.30 63.60 62.80 - -
Parabel* - - - - - - - - - - - - - -
PfastreXML* 84.22 67.33 53.04 84.22 75.41 72.37 66.67 65.43 64.30 66.08 66.08 65.24 - -
SLEEC* 84.01 67.20 52.80 84.01 75.23 71.96 66.34 65.11 63.62 66.34 65.79 64.71 - -
WSABIE 83.35 66.18 51.46 83.35 74.21 70.55 65.79 64.07 61.89 65.79 64.88 63.36 - -
kNN* 83.91 67.12 52.99 83.91 75.22 72.21 66.51 65.21 64.30 66.51 65.91 65.20 - -

References

[01] K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NeurIPS 2015.

[02] R. Agrawal, A. Gupta, Y. Prabhu and M. Varma, Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages, in WWW 2013.

[03] Y. Prabhu and M. Varma, FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning, in KDD 2014.

[04] J. Weston, A. Makadia, and H. Yee, Label Partitioning For Sublinear Ranking, in ICML 2013.

[05] H. Yu, P. Jain, P. Kar, and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML 2014.

[06] D. Hsu, S. Kakade, J. Langford, and T. Zhang, Multi-Label Prediction via Compressed Sensing, in NeurIPS 2009.

[07] F. Tai, and H. Lin, Multi-label Classification with Principal Label Space Transformation, in Neural Computation, 2012.

[08] W. Bi, and J. Kwok, Efficient Multi-label Classification with Many Labels, in ICML, 2013.

[09] Y. Chen, and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification, in NeurIPS, 2012.

[10] C. Ferng, and H. Lin, Multi-label Classification with Error-correcting Codes, in ACML, 2011.

[11] J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation, in IJCAI, 2011.

[12] S. Ji, L. Tang, S. Yu, and J. Ye, Extracting Shared Subspaces for Multi-label Classification, in KDD, 2008.

[13] Z. Lin, G. Ding, M. Hu, and J. Wang, Multi-label Classification via Feature-aware Implicit Label Space Encoding, in ICML, 2014.

[14] P. Mineiro, and N. Karampatziakis, Fast Label Embeddings via Randomized Linear Algebra, Preprint, 2015.

[15] N. Karampatziakis, and P. Mineiro, Scalable Multilabel Prediction via Randomized Methods, Preprint, 2015.

[16] K. Balasubramanian, and G. Lebanon, The Landmark Selection Method for Multiple Output Prediction, Preprint, 2012.

[17] M. Cisse, N. Usunier, T. Artieres, and P. Gallinari, Robust Bloom Filters for Large Multilabel Classification Tasks, in NeurIPS, 2013.

[18] B. Hariharan, S. Vishwanathan, and M. Varma, Efficient max-margin multi-label classification with applications to zero-shot learning, in Machine Learning Journal, 2012.

[19] C. Snoek, M. Worring, J. van Gemert, J.-M. Geusebroek, and A. Smeulders, The challenge problem for automated detection of 101 semantic concepts in multimedia, in ACM Multimedia, 2006.

[20] I. Katakis, G. Tsoumakas, and I. Vlahavas, Multilabel text classification for automated tag suggestion, in ECML/PKDD Discovery Challenge, 2008.

[21] G. Tsoumakas, I. Katakis, and I. Vlahavas, Effective and efficient multilabel classification in domains with large number of labels, in ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.

[22] J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection, 2014.

[23] A. Zubiaga, Enhancing navigation on wikipedia with social tags, Preprint, 2009.

[24] R. Wetzker, C. Zimmermann, and C. Bauckhage, Analyzing social bookmarking systems: A del.icio.us cookbook, in Mining Social Data (MSoDa) Workshop Proceedings, ECAI, 2008.

[25] I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artieres, G. Paliouras, E. Gaussier, I. Androutsopoulos, M.-R. Amini and P. Galinari, LSHTC: A Benchmark for Large-Scale Text Classification, Preprint, 2015.

[26] D. D. Lewis, Y. Yang, T. Rose, and F. Li, RCV1: A New Benchmark Collection for Text Categorization Research, in JMLR, 2004.

[27] E. L. Mencia, and J. Furnkranz, Efficient pairwise multilabel classification for large-scale problems in the legal domain, in ECML/PKDD, 2008.

[28] J. McAuley, and J. Leskovec, Hidden factors and hidden topics: understanding rating dimensions with review text, in Proceedings of the 7th ACM Conference on Recommender Systems (RecSys), 2013.

[29] J. McAuley, C. Targett, Q. Shi, and A. v. d. Hengel, Image-based Recommendations on Styles and Substitutes, in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.

[30] J. McAuley, R. Pandey, and J. Leskovec, Inferring networks of substitutable and complementary products, in KDD, 2015.

[31] H. Jain, Y. Prabhu, and M. Varma, Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications in KDD, 2016.

[32] R. Babbar, and B. Schölkopf, DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification in WSDM, 2017.

[33] I. E. H. Yen, X. Huang, K. Zhong, P. Ravikumar and I. S. Dhillon, PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification in ICML, 2016.

[34] I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar I. S. Dhillon and E.-P. Xing, PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification in KDD, 2017.

[35] K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx and E. Hullermeier, Extreme F-Measure Maximization using Sparse Probability Estimates in ICML, 2017.

[36] J. Liu, W-C. Chang, Y. Wu and Y. Yang, Deep Learning for Extreme Multi-label Text Classification in SIGIR, 2017.

[37] Y. Jernite, A. Choromanska, D. Sontag, Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation in ICML, 2017.

[38] Y. Prabhu, A. Kag, S. Harsola, R. Agrawal and M. Varma, Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising in WWW, 2018.

[39] I. Evron, E. Moroshko and K. Crammer, Efficient Loss-Based Decoding on Graphs for Extreme Classification in NeurIPS, 2018.

[40] A. Niculescu-Mizil and E. Abbasnejad, Label Filters for Large Scale Multilabel Classification in AISTATS, 2017.

[41] H. Jain, V. Balasubramanian, B. Chunduri and M. Varma, Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches, in WSDM 2019.

[42] A. Jalan, P. Kar, Accelerating Extreme Classification via Adaptive Feature Agglomeration, in IJCAI 2019.

[43] R. Babbar, and B. Schölkopf, Data Scarcity, Robustness and Extreme Multi-label Classification in Machine Learning Journal and European Conference on Machine Learning, 2019.

[44] S. Khandagale, H. Xiao and R. Babbar, Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification, in ArXiv 2019.

[45] W. Siblini, F. Meyer and P. Kuntz, CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning, in ICML 2018.

[46] V. Gupta, R. Wadbude, N. Natarajan, H. Karnick, P. Jain and P. Rai, Distributional Semantics meets Multi-Label Learning, in AAAI 2019.

[47] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation, in Natural Legal Language Processing Workshop 2019.

[47b] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, EURLEX57K Dataset.

[48] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, and I. Androutsopoulos, Large-Scale Multi-Label Text Classification on EU Legislation, in ACL 2019.

[49] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek and I. Vlahavas, Mulan: A Java Library for Multi-Label Learning, in JMLR 2011.

[50] A. Jalan and P. Kar, Accelerating Extreme Classification via Adaptive Feature Agglomeration, in IJCAI 2019.

[51] R. You, S. Dai, Z. Zhang, H. Mamitsuka, and S. Zhu, AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Network, in NeurIPS 2019.

[52] T. K. R. Medini, Q. Huang, Y. Wang, V. Mohan, and A. Shrivastava, Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products, in NeurIPS 2019.

[53] W-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, and I. Dhillon, Taming Pretrained Transformers for Extreme Multi-label Text Classification, in KDD 2020.

[54] T. Jiang, D. Wang, L. Sun, H. Yang, Z. Zhao, F. Zhuang, LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification, in AAAI 2021.

[55] K. Dahiya, D. Saini, A. Mittal, A. Shaw, K. Dave, A. Soni, H. Jain, S. Agarwal and M. Varma, DeepXML: A deep extreme multi-Label learning framework applied to short text documents, in WSDM 2021.

[56] A. Mittal, K. Dahiya, S. Agrawal, D. Saini, S. Agarwal, P. Kar and M. Varma, DECAF: Deep extreme classification with label features, in WSDM 2021.

[57] A. Mittal, N. Sachdeva, S. Agrawal, S. Agarwal, P. Kar and M. Varma, ECLARE: Extreme classification with label graph correlations, in TheWebConf 2021.

[58] D. Saini, A. K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang and M. Varma, GalaXC: Graph neural networks with labelwise attention for extreme classification, in TheWebConf 2021.

[59] H. Ye, Z. Chen, D.-H. Wang, B.-D. Davison, Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification, in ICML 2020.

[60] Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal and M. Varma, Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation in WSDM, 2018.

[61] M. Qaraei, E. Schultheis, P. Gupta, and R. Babbar, Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels in TheWebConf , 2021.

[62] N. Gupta, S. Bohra, Y. Prabhu, S. Purohit and M. Varma, Generalized zero-Shot extreme multi-label learning, in KDD 2021.

[63] K. Dahiya, A. Agarwal, D. Saini, K. Gururaj, J. Jiao, A. Singh, S. Agarwal, P. Kar and M. Varma, SiameseXML: Siamese networks meet extreme classifiers with 100M labels, in ICML 2021.