The Extreme Classification Repository: Multi-label Datasets & Code


Kush Bhatia, Kunal Dahiya, Himanshu Jain, Purushottam Kar, Anshul Mittal, Yashoteja Prabhu, Manik Varma

The objective of extreme multi-label classification (XC) is to learn feature architectures and classifiers that can automatically tag a data point with the most relevant subset of labels from an extremely large label set. This repository provides resources, including XC datasets, code for leading XC methods and metrics to evaluate the performance of XC algorithms.

Citing the Repository

Please use the following citation if you use any of the datasets or results provided on this repository.

        @Misc{Bhatia16,
          author    = {Bhatia, K. and Dahiya, K. and Jain, H. and Kar, P. and Mittal, A. and Prabhu, Y. and Varma, M.},
          title     = {The extreme classification repository: Multi-label datasets and code},
          url       = {http://manikvarma.org/downloads/XC/XMLRepository.html},
          year      = {2016}
        }
        

Table of Contents

  1. Datasets
  2. Useful Tools
  3. Performance Metrics and Evaluation Protocols
  4. Code for XC Methods
  5. Benchmarked Results
  6. Appendix
  7. References

Datasets

The datasets below consider various XC problems in webpage categorization, related webpage recommendation and product-to-product recommendation tasks. These include multi-modal datasets and datasets where labels have textual features. The dataset file format information can be found in the README file available [here]. Python and Matlab scripts for reading the datasets have been provided [below].

Please get in touch with Manik Varma if you would like to contribute a dataset.

Naming Conventions

  1. Number of labels: The (rounded off) number of labels in the dataset is appended to the dataset name to disambiguate various versions of datasets. Specific legacy datasets were renamed to ensure uniformity. The dataset previously referred to as DeliciousLarge was renamed to Delicious-200K and RCV1-X was renamed to RCV1-2K.
  2. Label features: Datasets that contain label features have the token "LF" prepended to their names. These are usually short textual descriptions of the labels.
  3. Multi-modal features: Datasets that contain multi-modal features have the token "MM" prepended to their names. These usually correspond to short textual descriptions and one or more images for each data point and label.
  4. Short-text datasets: Datasets with the phrase "Titles" in their names, such as AmazonTitles-670K, are short-text datasets whose data points are represented by a 3-5 word textual description such as the name of a product or title of a webpage. For full-text datasets such as Amazon-670K, data points are represented using a more detailed description. Short-text tasks abound in ranking and recommendation applications where data points are user queries or products/webpages represented using only their titles.
  5. Item-to-item datasets: Datasets with the phrase "SeeAlso" in their names correspond to tasks requiring related Wikipedia articles to be predicted for a given Wikipedia article.
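
The naming conventions above can be read off a dataset name mechanically. The following is a minimal sketch, assuming names follow the patterns listed here; the `DatasetName` type and `parse_dataset_name` function are hypothetical helpers written for illustration and are not part of any tool provided by the repository.

```python
import re
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    has_label_features: bool      # "LF" prefix
    is_multi_modal: bool          # "MM" prefix
    is_short_text: bool           # "Titles" in the name
    is_item_to_item: bool         # "SeeAlso" in the name
    approx_labels: Optional[str]  # rounded label count token, e.g. "670K"


def parse_dataset_name(name: str) -> DatasetName:
    """Decode the naming conventions from a dataset name (illustrative only)."""
    prefix = name.split("-")[0]
    # The rounded label count is the trailing token, e.g. "-131K", "-4.3K", "-1.3M".
    match = re.search(r"-(\d+(?:\.\d+)?[KM])$", name)
    return DatasetName(
        has_label_features=(prefix == "LF"),
        is_multi_modal=(prefix == "MM"),
        is_short_text=("Titles" in name),
        is_item_to_item=("SeeAlso" in name),
        approx_labels=match.group(1) if match else None,
    )


if __name__ == "__main__":
    for example in ("LF-WikiSeeAlsoTitles-320K", "Amazon-670K", "MM-AmazonTitles-300K"):
        print(example, parse_dataset_name(example))
```

Legacy names without a label count suffix (for example Bibtex) simply yield `approx_labels=None` under this sketch.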

Datasets with/without Label Features

Note that there exist pairs of datasets whose names differ only in the "LF" prefix and the rounded label count (e.g. LF-WikiSeeAlsoTitles-320K and WikiSeeAlsoTitles-350K) but which contain a different number of labels and data points. The reason for this variation is that the raw dumps from which these datasets were curated often contained labels for which label features were unavailable or could not be reliably retrieved. Labels of this kind could exist in the non-LF dataset but were excluded from the LF version. These exclusions could in turn leave specific data points with zero labels, and such data points were excluded from the dataset as well.
A special case in this respect is that of the Wikipedia-500K and LF-Wikipedia-500K datasets that are identical and have the same (number of) labels and data points. Wikipedia articles are the data points and Wikipedia categories are the labels for these datasets. As a convention, methods that do not use label features could choose to report their results on the Wikipedia-500K dataset whereas methods that do use label features could report results on the LF-Wikipedia-500K dataset. For this reason, these two datasets have not been released separately. The LF-Wikipedia-500K dataset has been released (see links below). Methods that wish to work on the Wikipedia-500K dataset can download the LF version and disregard the label features.

Multi-modal Datasets

The MM-AmazonTitles-300K dataset was created by taking raw data dumps and extracting all data points and labels for which a short textual description and at least one image were available. The images were resized to fit within a 128 x 128-pixel region and padded with white pixels in a centered manner to ensure a 1:1 aspect ratio. White padding was used since the natural background in most images was white. Subsequent operations such as tokenization, train-test split creation and reciprocal pair removal were done as explained below. The processed and unprocessed image sets are available upon request. To request them, please download the dataset using the links given in the table below, inspect the README file in the download for terms of usage and fill out the form available [here]. Tables comparing various methods on the MM-AmazonTitles-300K dataset are not provided on this webpage since most multi-modal benchmarks are not XC methods and most XC methods work only with textual features and not multi-modal features. Instead, please refer to the publication [65] for benchmark comparisons.
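
The resize-and-pad step described above comes down to a small amount of geometry. The sketch below computes the resized dimensions and the offsets at which the resized image would be pasted onto a white square canvas. It is an illustration under stated assumptions, not the script used to build the dataset: the function name `fit_and_pad` is hypothetical, and scaling small images up to the canvas size and rounding to the nearest pixel are choices made here that the description above does not specify.

```python
from typing import Tuple


def fit_and_pad(width: int, height: int, side: int = 128) -> Tuple[int, int, int, int]:
    """Return (new_width, new_height, left, top) for centering an image on a
    side x side canvas while preserving its aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so that the longer edge exactly fills the canvas.
    scale = side / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    # Split the leftover space evenly so the image sits in the center;
    # the uncovered border is what gets filled with white pixels.
    left = (side - new_width) // 2
    top = (side - new_height) // 2
    return new_width, new_height, left, top


if __name__ == "__main__":
    print(fit_and_pad(256, 128))
    print(fit_and_pad(100, 400))
```

With an imaging library, the resized image would then be pasted at `(left, top)` onto a white canvas of size `side` x `side`.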

Legacy Datasets

Benchmarked results on datasets formerly popular in XC research have been moved to the Appendix available [here]. Some of these datasets are tiny, such as the Bibtex dataset with 159 labels. For other datasets, the raw sources can no longer be reliably traced and only bag-of-words features are available. All such legacy datasets remain available using links in the dataset table below.


Dataset, Download, BoW Feature Dimensionality, Number of Labels, Number of Train Points, Number of Test Points, Avg. Points per Label, Avg. Labels per Point, Original Source

Multi-modal Datasets
MM-AmazonTitles-300K BoW Features Raw text 40,000 303,296 586,781 260,536 15.73 8.13 [64]

Datasets with Label Features
LF-AmazonTitles-131K BoW Features Raw text 40,000 131,073 294,805 134,835 5.15 2.29 [28]
LF-Amazon-131K BoW Features Raw text 80,000 131,073 294,805 134,835 5.15 2.29 [28]
LF-WikiSeeAlsoTitles-320K BoW Features Raw text 40,000 312,330 693,082 177,515 4.67 2.11 -
LF-WikiSeeAlso-320K BoW Features Raw text 80,000 312,330 693,082 177,515 4.67 2.11 -
LF-WikiTitles-500K BoW Features Raw text 80,000 501,070 1,813,391 783,743 17.15 4.74 -
LF-Wikipedia-500K BoW Features Raw text 2,381,304 501,070 1,813,391 783,743 24.75 4.77 -
LF-AmazonTitles-1.3M BoW Features Raw text 128,000 1,305,265 2,248,619 970,237 38.24 22.20 [29] + [30]

Datasets without Label Features
AmazonCat-13K BoW Features Raw text 203,882 13,330 1,186,239 306,782 448.57 5.04 [28]
AmazonCat-14K BoW Features Raw text 597,540 14,588 4,398,050 1,099,725 1330.1 3.53 [29] + [30]
WikiSeeAlsoTitles-350K BoW Features Raw text 91,414 352,072 629,418 162,491 5.24 2.33 -
WikiTitles-500K BoW Features Raw text 185,479 501,070 1,699,722 722,678 23.62 4.89 -
Wikipedia-500K (same as LF-Wikipedia-500K) 2,381,304 501,070 1,813,391 783,743 24.75 4.77 -
AmazonTitles-670K BoW Features Raw text 66,666 670,091 485,176 150,875 5.11 5.39 [28]
Amazon-670K BoW Features Raw text 135,909 670,091 490,449 153,025 3.99 5.45 [28]
AmazonTitles-3M BoW Features Raw text 165,431 2,812,281 1,712,536 739,665 31.55 36.18 [29] + [30]
Amazon-3M BoW Features Raw text 337,067 2,812,281 1,717,899 742,507 31.64 36.17 [29] + [30]

Legacy Datasets
Mediamill BoW Features 120 101 30,993 12,914 1902.15 4.38 [19]
Bibtex BoW Features 1,836 159 4,880 2,515 111.71 2.40 [20]
Delicious BoW Features 500 983 12,920 3,185 311.61 19.03 [21]
RCV1-2K BoW Features 47,236 2,456 623,847 155,962 1218.56 4.79 [26]
EURLex-4K BoW Features 5,000 3,993 15,539 3,809 25.73 5.31 [27] + [47]
EURLex-4.3K BoW Features 200,000 4,271 45,000 6,000 60.57 5.07 [47] + [48]
Wiki10-31K BoW Features 101,938 30,938 14,146 6,616 8.52 18.64 [23]
Delicious-200K BoW Features 782,585 205,443 196,606 100,095 72.29 75.54 [24]
WikiLSHTC-325K BoW Features 1,617,899 325,056 1,778,351 587,084 17.46 3.19 [25]

Dataset statistics & download

Tokenization

The table above allows downloading precomputed bag-of-words features or raw text. The tokenization used to create the bag-of-words representation may differ across datasets (e.g. whitespace-separated for legacy datasets vs. WordPiece for more recent datasets). It is recommended that additional experiments be conducted for XC methods that use a novel tokenizer to isolate improvements attributable to better tokenization rather than the architecture or learning algorithm. One way to accomplish this is to execute older XC methods with the novel tokenizer.
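
One way to make the tokenizer an explicit, swappable component when generating bag-of-words features is sketched below, so that the same older method can be re-run with different tokenizers. This is an illustrative sketch only: `bag_of_words` and `whitespace_tokenizer` are hypothetical names, lowercasing is an assumption, and the output is not in the repository's file format.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def whitespace_tokenizer(text: str) -> List[str]:
    """Split on whitespace after lowercasing (lowercasing is an assumption)."""
    return text.lower().split()


def bag_of_words(
    texts: Iterable[str],
    tokenizer: Callable[[str], List[str]] = whitespace_tokenizer,
) -> Tuple[Dict[str, int], List[Dict[int, int]]]:
    """Build a vocabulary and sparse count vectors using a pluggable tokenizer."""
    vocab: Dict[str, int] = {}
    rows: List[Dict[int, int]] = []
    for text in texts:
        row: Dict[int, int] = {}
        for token, count in Counter(tokenizer(text)).items():
            # Assign the next free feature index to tokens not seen before.
            index = vocab.setdefault(token, len(vocab))
            row[index] = count
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    vocab, rows = bag_of_words(["red shoe", "Red red sock"])
    print(vocab, rows)
```

Passing a different callable as `tokenizer` while keeping the learning algorithm fixed is one way to separate gains due to tokenization from gains due to the method itself.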

Split Creation

For each dataset, a single split is offered. Splits were not created randomly but instead in a way that ensured every label had at least one training point. This yielded more realistic train/test splits than uniform sampling which could have dropped several infrequently occurring and hard-to-classify labels from the test set. For example, on the WikiLSHTC-325K dataset, uniformly random split creation could lose ninety thousand of the hardest to classify labels from the test set whereas the adopted sampling procedure dropped only forty thousand labels from the test set.
Note: Results computed on the train/test splits provided on this page are not comparable to results computed on splits created using uniform sampling.
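
The property that every label has at least one training point can be illustrated with a simple greedy procedure. The sketch below is not the procedure used to create the splits on this page, which is not specified here; `split_with_label_coverage` is a hypothetical function that first moves into the training set any point carrying a label not yet covered, and then splits the remaining points at random.

```python
import random
from typing import List, Sequence, Set, Tuple


def split_with_label_coverage(
    labels_per_point: Sequence[Sequence[int]],
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) such that every label that occurs
    in the data occurs in at least one training point."""
    rng = random.Random(seed)
    order = list(range(len(labels_per_point)))
    rng.shuffle(order)

    train: Set[int] = set()
    covered: Set[int] = set()
    # Pass 1: a point goes to train if it carries any label not yet covered.
    for i in order:
        new_labels = set(labels_per_point[i]) - covered
        if new_labels:
            train.add(i)
            covered |= new_labels

    # Pass 2: the remaining points are free to go either way; fill the test
    # set up to the requested size and send the rest to train.
    rest = [i for i in order if i not in train]
    n_test = min(len(rest), round(test_fraction * len(order)))
    test = set(rest[:n_test])
    train |= set(rest[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = [[0], [0], [1], [1, 2], [2], [0, 1]]
    print(split_with_label_coverage(labels))
```

Because the first pass may claim many points, the realized test fraction can be smaller than the one requested.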

Reciprocal-pair Removal

For the "LF" datasets that concern related item prediction, additional care is required since introducing label features allowed "reciprocal pairs" to emerge. Specifically, these are pairs of items, say A and B, that are related to each other such that two distinct data points exist, with A appearing as a label for B in one data point and B appearing as a label for A in the other. Such pairs were removed from the ground truth in the test set to prevent algorithms from achieving artificially high scores by memorizing such pairs without learning anything meaningful. The recommended protocol for performing prediction while avoiding such reciprocal pairs using filter files provided with these datasets is described [here].
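
The definition of a reciprocal pair can be made concrete with a short sketch. It assumes that data points and labels share a single identifier space (as in related-item prediction, where both are items) and that the relations are given as a mapping from each item to the set of items it is labeled with. The function name `find_reciprocal_pairs` is hypothetical and this is not the tooling shipped with the datasets.

```python
from typing import Dict, Hashable, Set, Tuple


def find_reciprocal_pairs(
    related: Dict[Hashable, Set[Hashable]],
) -> Set[Tuple[Hashable, Hashable]]:
    """Return unordered pairs (a, b) where b is a label of a and a is a label of b."""
    pairs = set()
    for a, labels in related.items():
        for b in labels:
            # A pair is reciprocal only if the relation also holds in reverse.
            if a != b and a in related.get(b, ()):
                pairs.add((min(a, b), max(a, b)))
    return pairs


if __name__ == "__main__":
    related = {"A": {"B", "C"}, "B": {"A"}, "C": {"D"}}
    print(find_reciprocal_pairs(related))
```

In this toy example only A and B form a reciprocal pair, since C is a label of A but A is not a label of C.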

Useful Tools

The resources linked in this section provide several tools that can be used to perform various useful operations, including
  1. Reading and writing the datasets in the given file format
  2. Preprocessing raw text using various tokenizers to generate data point (and label) features, including bag-of-words features
  3. Evaluating various performance measures such as precision, nDCG and their propensity-scored counterparts (see [here] for details)

Performance Metrics and Evaluation Protocols

The benchmarked results below present comparative results of various algorithms with classification accuracy evaluated on several performance measures. The discussion below describes protocols for evaluating XC methods, especially in the presence of head/tail labels and reciprocal pairs (see [here]).

Performance at the Top

The precision$@k$ and nDCG$@k$ metrics are defined for a predicted score vector $\hat{\mathbf y} \in {\mathbb{R}}^{L}$ and ground truth label vector $\mathbf y \in \left\lbrace 0, 1 \right\rbrace^L$ as \[ \text{P}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \mathbf y_l \] \[ \text{DCG}@k := \sum_{l\in {\text{rank}}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{\log(l+1)} \] \[ \text{nDCG}@k := \frac{{\text{DCG}}@k}{\sum_{l=1}^{\min(k, \|\mathbf y\|_0)} \frac{1}{\log(l+1)}}, \] where $\text{rank}_k(\hat{\mathbf y})$ returns the indices of the $k$ largest entries of $\hat{\mathbf y}$, ranked in descending order of score. In the discount term $\log(l+1)$, $l$ denotes the position of a label in this ranked list (1 for the top-ranked label) rather than its index in the label set.
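
The definitions above can be sketched for a single data point as follows. This is an illustration and not the repository's evaluation script: the function names are hypothetical, dense Python lists stand in for the score and label vectors, the discount is taken over the position in the ranked list, and base-2 logarithms are an assumption (the base cancels in nDCG$@k$).

```python
import math
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring labels, best first."""
    return sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)[:k]


def precision_at_k(scores: Sequence[float], y: Sequence[int], k: int) -> float:
    """Fraction of the top-k predicted labels that are relevant."""
    return sum(y[l] for l in top_k(scores, k)) / k


def ndcg_at_k(scores: Sequence[float], y: Sequence[int], k: int) -> float:
    """DCG@k divided by the best DCG achievable with min(k, #relevant) hits."""
    ranked = top_k(scores, k)
    # pos is 0-based, so the discount for the top position is log2(2) = 1.
    dcg = sum(y[l] / math.log2(pos + 2) for pos, l in enumerate(ranked))
    n_relevant = min(k, sum(1 for v in y if v))
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(n_relevant))
    return dcg / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.8, 0.4]
    truth = [1, 0, 0, 1]
    print(precision_at_k(scores, truth, 3), ndcg_at_k(scores, truth, 3))
```

Benchmark numbers are averages of these per-point values over all test points, typically reported as percentages.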

Propensity-scored Performance at the Top

For datasets that contain excessively popular labels (often referred to as "head" labels), high P@k may be achieved by simply predicting head labels repeatedly irrespective of their relevance to the data point. To check for such trivial behavior, it is recommended that XC methods also be evaluated with respect to propensity-scored counterparts of the precision$@k$ and nDCG$@k$ metrics (PSP$@k$ and PSnDCG$@k$) described below. \[ \text{PSP}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{p_l} \] \[ \text{PSDCG}@k := \sum_{l\in {\text{rank}}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{p_l\log(l+1)} \] \[ \text{PSnDCG}@k := \frac{{\text{PSDCG}}@k}{\sum_{l=1}^{k} \frac{1}{\log(l+1)}}, \] where $p_l$ is the propensity score for label $l$ which helps in making metrics unbiased [31] with respect to missing labels. Propensity-scored metrics place specific emphasis on performing well on tail labels and give feeble rewards for predicting popular or head labels. It is recommended that scripts provided [here] be used to compute propensity-scored metrics in order to be consistent with results reported below.
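
The propensity-scored formulas above can be sketched in the same per-point fashion. The sketch takes the propensity scores $p_l$ as a given input and follows the formulas as written; the function names are hypothetical, base-2 logarithms and position-based discounts are assumptions, and any further normalization applied by the provided scripts is not reproduced here, so reported numbers should still be computed with those scripts.

```python
import math
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring labels, best first."""
    return sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)[:k]


def psp_at_k(
    scores: Sequence[float], y: Sequence[int], propensities: Sequence[float], k: int
) -> float:
    """Precision@k with each hit weighted by the inverse propensity of its label."""
    return sum(y[l] / propensities[l] for l in top_k(scores, k)) / k


def psndcg_at_k(
    scores: Sequence[float], y: Sequence[int], propensities: Sequence[float], k: int
) -> float:
    """PSDCG@k divided by the sum of the first k discount terms."""
    ranked = top_k(scores, k)
    psdcg = sum(
        y[l] / (propensities[l] * math.log2(pos + 2)) for pos, l in enumerate(ranked)
    )
    normalizer = sum(1.0 / math.log2(pos + 2) for pos in range(k))
    return psdcg / normalizer


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.8, 0.4]
    truth = [1, 0, 0, 1]
    props = [0.5, 0.2, 0.25, 0.1]  # hypothetical propensities; rare labels get small values
    print(psp_at_k(scores, truth, props, 3), psndcg_at_k(scores, truth, props, 2))
```

In the toy example, the hit on the label with propensity 0.1 contributes five times as much as the hit on the label with propensity 0.5, which is how these metrics emphasize tail labels.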

Removal of Reciprocal-pairs

As described [here], reciprocal pairs were removed from the ground truth in the test splits of the LF datasets to avoid trivial predictions from getting rewarded. However, these reciprocal pairs must now be removed from the test predictions of XC methods to avoid unnecessary penalization. It is recommended that filter files provided along with the datasets and the tools provided in the PyXCLib library linked [here] be used to evaluate XC methods on LF datasets. Although reciprocal pairs were not removed from the train splits, a separate filter file is provided for the train splits enumerating the reciprocal pairs therein so that methods that wish to eliminate them from train splits may do so. Note that these filter files are distinct from the ground truth files and only contain lists of reciprocal pairs.
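
Removing filtered pairs from predictions before evaluation can be sketched as below. This is not the PyXCLib implementation. It assumes, for illustration, that each line of a filter file holds a whitespace-separated pair of indices (data point, label), and that predictions are dense score lists; `parse_filter_lines` and `mask_filtered_predictions` are hypothetical helpers.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def parse_filter_lines(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse "point_index label_index" lines (format assumed for illustration)."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def mask_filtered_predictions(
    scores: Sequence[Sequence[float]], pairs: Iterable[Tuple[int, int]]
) -> List[List[float]]:
    """Return a copy of the score matrix in which every filtered (point, label)
    entry is set to -inf so that it can never enter the top-k ranking."""
    masked = [list(row) for row in scores]
    for point, label in pairs:
        masked[point][label] = -math.inf
    return masked


if __name__ == "__main__":
    scores = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.6]]
    pairs = parse_filter_lines(["0 0", "1 1"])
    print(mask_filtered_predictions(scores, pairs))
```

Masking the scores rather than deleting entries keeps label indices aligned with the ground truth, so the metrics can then be computed on the masked scores unchanged.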

Code for XC Methods

The following lists provide links to code for leading XC methods. The methods have been categorized based on the kind of classifier used (e.g. one-vs-all, trees, embeddings) for easy identification. Methods that learn deep representations for data points jointly with the classifier are included as a separate category.

Please contact Manik Varma if you would like us to provide a link to your code.

Benchmarked Results

The tables below provide benchmarked results for various XC methods on several datasets. Rows corresponding to XC methods that use deep-learnt features or label features in the LF datasets have been highlighted in light orange. Training times are reported on a single GPU, except where noted otherwise for methods that necessarily require multiple GPUs to scale. The model sizes mentioned alongside XC methods are either the sizes as reported or the on-disk sizes, which are subject to compression. Notably, executions using different platforms/libraries may introduce variance in model sizes and affect reproducibility.

Note 1: Deep learning methods use diverse architectures e.g. CPU-only or CPU-GPU. The symbols *, †, and ‡ are used to specify the machine configuration used for each method (see legend below). AttentionXML and the X-Transformer could not be run on a single GPU. These methods were executed on a cluster with 8 GPUs and training times were scaled accordingly before reporting.

Note 2: Results for methods marked with a ♦ symbol were directly taken from their respective publications. In some cases, this was done since publicly available implementations of the method could not be scaled. In other cases, this was done since a different version of the dataset was used in the publication. For instance, this website does not provide raw text for legacy datasets. Consequently, results on deep learning methods on legacy datasets are always marked with a ♦ symbol since those methods used raw text from alternate sources that resulted in different train-test splits.

Legend:

LF-AmazonTitles-131K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 30.05 21.25 16.02 30.05 31.58 34.05 19.23 26.09 32.26 19.23 23.64 26.60 1.95 0.08
Astec 37.12 25.20 18.24 37.12 38.17 40.16 29.22 34.64 39.49 29.22 32.73 35.03 3.24 1.83
AttentionXML 32.25 21.70 15.61 32.25 32.83 34.42 23.97 28.60 32.57 23.97 26.88 28.75 2.61 20.73
Bonsai* 34.11 23.06 16.63 34.11 34.81 36.57 24.75 30.35 34.86 24.75 28.32 30.47 0.24 0.10
DECAF 38.40 25.84 18.65 38.40 39.43 41.46 30.85 36.44 41.42 30.85 34.69 37.13 0.81 2.16
DiSMEC* 35.14 23.88 17.24 35.14 36.17 38.06 25.86 32.11 36.97 25.86 30.09 32.47 0.11 3.10
ECLARE 40.74 27.54 19.88 40.74 42.01 44.16 33.51 39.55 44.70 33.51 37.70 40.21 0.72 2.16
GalaXC 39.17 26.85 19.49 39.17 40.82 43.06 32.50 38.79 43.95 32.50 36.86 39.37 0.67 0.42
LightXML 35.60 24.15 17.45 35.60 36.33 38.17 25.67 31.66 36.44 25.67 29.43 31.68 2.25 71.40
MACH 33.49 22.71 16.45 33.49 34.36 36.16 24.97 30.23 34.72 24.97 28.41 30.54 2.35 3.30
NGAME 46.01 30.28 21.47 46.01 46.69 48.67 38.81 44.40 49.43 38.81 42.79 45.31 1.20 12.59
Parabel* 32.60 21.80 15.61 32.60 32.96 34.47 23.27 28.21 32.14 23.27 26.36 28.21 0.34 0.03
PfastreXML* 32.56 22.25 16.05 32.56 33.62 35.26 26.81 30.61 34.24 26.81 29.02 30.67 3.02 0.26
SiameseXML 41.42 27.92 21.21 41.42 42.65 44.95 35.80 40.96 46.19 35.80 39.36 41.95 1.71 1.08
Slice+FastText* 30.43 20.50 14.84 30.43 31.07 32.76 23.08 27.74 31.89 23.08 26.11 28.13 0.39 0.08
X-Transformer 29.95 18.73 13.07 29.95 28.75 29.60 21.72 24.42 27.09 21.72 23.18 24.39 - -
XR-Transformer 38.10 25.57 18.32 38.10 38.89 40.71 28.86 34.85 39.59 28.86 32.92 35.21 - 35.40
XT* 31.41 21.39 15.48 31.41 32.17 33.86 22.37 27.51 31.64 22.37 25.58 27.52 0.84 9.46

LF-Amazon-131K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 35.73 25.46 19.41 35.73 37.81 41.08 23.56 31.97 39.95 23.56 29.07 33.00 4.01 0.68
Astec 42.22 28.62 20.85 42.22 43.57 46.06 32.95 39.42 45.30 32.95 37.45 40.35 5.52 3.39
AttentionXML 42.90 28.96 20.97 42.90 44.07 46.44 32.92 39.51 45.24 32.92 37.49 40.33 5.04 50.17
Bonsai* 40.23 27.29 19.87 40.23 41.46 43.84 29.60 36.52 42.39 29.60 34.43 37.34 0.46 0.40
DECAF 42.94 28.79 21.00 42.94 44.25 46.84 34.52 41.14 47.33 34.52 39.35 42.48 1.86 1.80
DiSMEC* 41.68 28.32 20.58 41.68 43.22 45.69 31.61 38.96 45.07 31.61 36.97 40.05 0.45 7.12
ECLARE 43.56 29.65 21.57 43.56 45.24 47.82 34.98 42.38 48.53 34.98 40.30 43.37 1,118.78 2.15
LightXML 41.49 28.32 20.75 41.49 42.70 45.23 30.27 37.71 44.10 30.27 35.20 38.28 2.03 56.03
MACH 34.52 23.39 17.00 34.52 35.53 37.51 25.27 30.71 35.42 25.27 29.02 31.33 4.57 13.91
NGAME 46.53 30.89 22.02 46.53 47.44 49.58 38.53 44.95 50.45 38.53 43.07 45.81 1.20 39.99
Parabel* 39.57 26.64 19.26 39.57 40.48 42.61 28.99 35.36 40.69 28.99 33.36 35.97 0.62 0.10
PfastreXML* 35.83 24.35 17.60 35.83 36.97 38.85 28.99 33.24 37.40 28.99 31.65 33.62 5.30 1.54
SiameseXML 44.81 30.19 21.94 44.81 46.15 48.76 37.56 43.69 49.75 37.56 41.91 44.97 1.76 1.18
Slice+FastText* 32.07 22.21 16.52 32.07 33.54 35.98 23.14 29.08 34.63 23.14 27.25 30.06 0.39 0.11
XR-Transformer 45.61 30.85 22.32 45.61 47.10 49.65 34.93 42.83 49.24 34.93 40.67 43.91 - 38.40
XT* 34.31 23.27 16.99 34.31 35.18 37.26 24.35 29.81 34.70 24.35 27.95 30.34 0.92 1.38

LF-WikiSeeAlsoTitles-320K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 16.30 11.24 8.84 16.30 16.19 17.14 7.24 9.63 11.75 7.24 9.06 10.43 4.22 0.21
Astec 22.72 15.12 11.43 22.72 22.16 22.87 13.69 15.81 17.50 13.69 15.56 16.75 7.30 4.17
AttentionXML 17.56 11.34 8.52 17.56 16.58 17.07 9.45 10.63 11.73 9.45 10.45 11.24 6.02 56.12
Bonsai* 19.31 12.71 9.55 19.31 18.74 19.32 10.69 12.44 13.79 10.69 12.29 13.29 0.37 0.37
DECAF 25.14 16.90 12.86 25.14 24.99 25.95 16.73 18.99 21.01 16.73 19.18 20.75 1.76 11.16
DiSMEC* 19.12 12.93 9.87 19.12 18.93 19.71 10.56 13.01 14.82 10.56 12.70 14.02 0.19 15.56
ECLARE 29.35 19.83 15.05 29.35 29.21 30.20 22.01 24.23 26.27 22.01 24.46 26.03 1.67 13.46
GalaXC 27.87 18.75 14.30 27.87 26.84 27.60 19.77 22.25 24.47 19.77 21.70 23.16 1.08 1.08
MACH 18.06 11.91 8.99 18.06 17.57 18.17 9.68 11.28 12.53 9.68 11.19 12.14 2.51 8.23
Parabel* 17.68 11.48 8.59 17.68 16.96 17.44 9.24 10.65 11.80 9.24 10.49 11.32 0.60 0.07
PfastreXML* 17.10 11.13 8.35 17.10 16.80 17.35 12.15 12.51 13.26 12.15 12.81 13.48 6.77 0.59
SiameseXML 31.97 21.43 16.24 31.97 31.57 32.59 26.82 28.42 30.36 26.82 28.74 30.27 2.62 1.90
Slice+FastText* 18.55 12.62 9.68 18.55 18.29 19.07 11.24 13.45 15.20 11.24 13.03 14.23 0.94 0.20
XT* 17.04 11.31 8.60 17.04 16.61 17.24 8.99 10.52 11.82 8.99 10.33 11.26 1.90 5.28

LF-WikiSeeAlso-320K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 30.79 20.88 16.47 30.79 30.02 31.64 13.48 17.92 22.21 13.48 16.52 19.08 12.13 2.40
Astec 40.07 26.69 20.36 40.07 39.36 40.88 23.41 28.08 31.92 23.41 27.48 30.17 13.46 7.11
AttentionXML 40.50 26.43 19.87 40.50 39.13 40.26 22.67 26.66 29.83 22.67 26.13 28.38 7.12 90.37
Bonsai* 34.86 23.21 17.66 34.86 34.09 35.32 18.19 22.35 25.66 18.19 21.62 23.84 0.84 1.39
DECAF 41.36 28.04 21.38 41.36 41.55 43.32 25.72 30.93 34.89 25.72 30.69 33.69 4.84 13.40
DiSMEC* 34.59 23.58 18.26 34.59 34.43 36.11 18.95 23.92 27.90 18.95 23.04 25.76 1.28 58.79
ECLARE 40.58 26.86 20.14 40.48 40.05 41.23 26.04 30.09 33.01 26.04 30.06 32.32 2.83 9.40
LightXML 34.50 22.31 16.83 34.50 33.21 34.24 17.85 21.26 24.16 17.85 20.81 22.80 - 249.00
MACH 27.18 17.38 12.89 27.18 26.09 26.80 13.11 15.28 16.93 13.11 15.17 16.48 11.41 50.22
NGAME 47.65 31.56 23.68 47.65 47.50 48.99 33.83 37.79 41.03 33.83 38.36 41.01 2.51 75.39
Parabel* 33.46 22.03 16.61 33.46 32.40 33.34 17.10 20.73 23.53 17.10 20.02 21.88 1.18 0.33
PfastreXML* 28.79 18.38 13.60 28.79 27.69 28.28 17.12 18.19 19.43 17.12 18.23 19.20 14.02 4.97
SiameseXML 42.16 28.14 21.39 42.16 41.79 43.36 29.02 32.68 36.03 29.02 32.64 35.17 2.70 2.33
Slice+FastText* 27.74 19.39 15.47 27.74 27.84 29.65 13.07 17.50 21.55 13.07 16.36 18.90 0.94 0.20
XR-Transformer 42.57 28.24 21.30 42.57 41.99 43.44 25.18 30.13 33.79 25.18 29.84 32.59 - 119.47
XT* 30.10 19.60 14.92 30.10 28.65 29.58 14.43 17.13 19.69 14.43 16.37 17.97 2.20 3.27

LF-WikiTitles-500K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 39.00 20.66 14.55 39.00 28.40 26.80 13.91 13.38 13.75 13.91 14.63 15.88 11.18 1.98
Astec 44.40 24.69 17.49 44.40 33.43 31.72 18.31 18.25 18.56 18.31 19.57 21.09 15.01 13.50
AttentionXML 40.90 21.55 15.05 40.90 29.38 27.45 14.80 13.97 13.88 14.80 15.24 16.22 14.01 133.94
Bonsai* 40.97 22.30 15.66 40.97 30.35 28.65 16.58 16.34 16.40 16.58 17.60 18.85 1.63 2.03
DECAF 44.21 24.64 17.36 44.21 33.55 31.92 19.29 19.82 19.96 19.29 21.26 22.95 4.53 42.26
DiSMEC* 39.42 21.10 14.85 39.42 28.87 27.29 15.88 15.54 15.89 15.88 16.76 18.13 0.68 48.27
ECLARE 44.36 24.29 16.91 44.36 33.33 31.46 21.58 20.39 19.84 21.58 22.39 23.61 4.24 39.34
MACH 37.74 19.11 13.26 37.74 26.63 24.94 13.71 12.14 12.00 13.71 13.63 14.54 4.73 22.46
Parabel* 40.41 21.98 15.42 40.41 29.89 28.15 15.55 15.32 15.35 15.55 16.50 17.66 2.70 0.42
PfastreXML* 35.71 19.27 13.64 35.71 26.45 25.15 18.23 15.42 15.08 18.23 17.34 18.24 20.41 3.79
Slice+FastText* 25.48 15.06 10.98 25.48 20.67 20.52 13.90 13.33 13.82 13.90 14.50 15.90 2.30 0.74
XT* 38.13 20.71 14.66 38.13 28.13 26.61 14.10 14.12 14.38 14.10 15.15 16.40 3.10 14.67

LF-AmazonTitles-1.3M


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 47.79 41.65 36.91 47.79 44.83 42.93 15.42 19.67 21.91 15.42 18.05 19.36 14.53 2.48
Astec 48.82 42.62 38.44 48.82 46.11 44.80 21.47 25.41 27.86 21.47 24.08 25.66 26.66 18.54
AttentionXML 45.04 39.71 36.25 45.04 42.42 41.23 15.97 19.90 22.54 15.97 18.23 19.60 28.84 380.02
Bonsai* 47.87 42.19 38.34 47.87 45.47 44.35 18.48 23.06 25.95 18.48 21.52 23.33 9.02 7.89
DECAF 50.67 44.49 40.35 50.67 48.05 46.85 22.07 26.54 29.30 22.07 25.06 26.85 9.62 74.47
ECLARE 50.14 44.09 40.00 50.14 47.75 46.68 23.43 27.90 30.56 23.43 26.67 28.61 9.15 70.59
GalaXC 49.81 44.23 40.12 49.81 47.64 46.47 25.22 29.12 31.44 25.22 27.81 29.36 2.69 9.55
MACH 35.68 31.22 28.35 35.68 33.42 32.27 9.32 11.65 13.26 9.32 10.79 11.65 7.68 60.39
NGAME 56.75 49.19 44.09 56.75 53.84 52.41 29.18 33.01 35.36 29.18 32.07 33.91 9.71 97.75
Parabel* 46.79 41.36 37.65 46.79 44.39 43.25 16.94 21.31 24.13 16.94 19.70 21.34 11.75 1.50
PfastreXML* 37.08 33.77 31.43 37.08 36.61 36.61 28.71 30.98 32.51 28.71 29.92 30.73 29.59 9.66
SiameseXML 49.02 42.72 38.52 49.02 46.38 45.15 27.12 30.43 32.52 27.12 29.41 30.90 14.58 9.89
Slice* 34.80 30.58 27.71 34.80 32.72 31.69 13.96 17.08 19.14 13.96 15.83 16.97 5.98 0.79
XT* 40.60 35.74 32.01 40.60 38.18 36.68 13.67 17.11 19.06 13.67 15.64 16.65 7.90 82.18
XR-Transformer 50.14 44.07 39.98 50.14 47.71 46.59 20.06 24.85 27.79 20.06 23.44 25.41 - 132.00

WikiSeeAlsoTitles-350K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 14.96 10.20 8.11 14.96 14.20 14.76 5.63 7.04 8.59 5.63 6.79 7.76 3.59 0.20
Astec 20.61 14.58 11.49 20.61 20.08 20.80 9.91 12.16 14.04 9.91 11.76 12.98 7.41 4.36
AttentionXML 15.86 10.43 8.01 15.86 14.59 14.86 6.39 7.20 8.15 6.39 7.05 7.64 4.07 30.44
Bonsai* 17.95 12.27 9.56 17.95 17.13 17.66 8.16 9.68 11.07 8.16 9.49 10.43 0.25 0.46
DiSMEC* 16.61 11.57 9.14 16.61 16.09 16.72 7.48 9.19 10.74 7.48 8.95 9.99 0.09 6.62
MACH 14.79 9.57 7.13 14.79 13.83 14.05 6.45 7.02 7.54 6.45 7.20 7.73 5.22 7.44
Parabel* 17.24 11.61 8.92 17.24 16.31 16.67 7.56 8.83 9.96 7.56 8.68 9.45 0.43 0.06
PfastreXML* 15.09 10.49 8.24 15.09 14.98 15.59 9.03 9.69 10.64 9.03 9.82 10.52 5.22 0.51
Slice+FastText* 18.13 12.87 10.29 18.13 17.71 18.52 8.63 10.78 12.74 8.63 10.37 11.63 0.97 0.22
XML-CNN 17.75 12.34 9.73 17.75 16.93 17.48 8.24 9.72 11.15 8.24 9.40 10.31 0.78 14.25
XT* 16.55 11.37 8.93 16.55 15.88 16.47 7.38 8.75 10.05 7.38 8.57 9.46 2.00 3.25

WikiTitles-500K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 39.56 20.50 14.32 39.56 28.28 26.54 15.44 13.83 13.79 15.44 15.49 16.58 10.70 1.77
Astec 46.60 26.03 18.50 46.60 35.10 33.34 18.89 18.90 19.30 18.89 20.33 22.00 15.15 13.04
AttentionXML 42.89 22.71 15.89 42.89 30.92 28.93 15.12 14.32 14.22 15.12 15.69 16.75 9.21 102.43
Bonsai* 42.60 23.08 16.25 42.60 31.34 29.58 17.38 16.85 16.90 17.38 18.28 19.62 1.18 2.94
DiSMEC* 39.89 21.23 14.96 39.89 28.97 27.32 15.89 15.15 15.43 15.89 16.52 17.86 0.35 23.94
MACH 33.74 15.62 10.41 33.74 22.61 20.80 11.43 8.98 8.35 11.43 10.77 11.28 10.48 23.65
Parabel* 42.50 23.04 16.21 42.50 31.24 29.45 16.55 16.12 16.16 16.55 17.49 18.77 2.15 0.34
PfastreXML* 30.99 18.07 13.09 30.99 24.54 23.88 17.87 15.40 15.15 17.87 17.38 18.46 16.85 3.07
Slice+FastText* 28.07 16.78 12.28 28.07 22.97 22.87 15.10 14.69 15.33 15.10 16.02 17.67 1.50 0.54
XML-CNN 43.45 23.24 16.53 43.45 31.69 29.95 15.64 14.74 14.98 15.64 16.17 17.45 1.17 55.21
XT* 39.44 21.57 15.31 39.44 29.17 27.65 15.23 15.00 15.25 15.23 16.23 17.59 3.30 12.13

AmazonTitles-670K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 35.31 30.90 27.83 35.31 32.76 31.26 17.94 20.69 23.30 17.94 19.57 20.88 2.99 0.17
Astec 40.63 36.22 33.00 40.63 38.45 37.09 28.07 30.17 32.07 28.07 29.20 29.98 10.93 3.85
AttentionXML 37.92 33.73 30.57 37.92 35.78 34.35 24.24 26.43 28.39 24.24 25.48 26.33 12.11 37.50
Bonsai* 38.46 33.91 30.53 38.46 36.05 34.48 23.62 26.19 28.41 23.62 25.16 26.21 0.66 0.53
DiSMEC* 38.12 34.03 31.15 38.12 36.07 34.88 22.26 25.46 28.67 22.26 24.30 26.00 0.29 11.74
MACH 34.92 31.18 28.56 34.92 33.07 31.97 20.56 23.14 25.79 20.56 22.18 23.53 3.84 6.41
Parabel* 38.00 33.54 30.10 38.00 35.62 33.98 23.10 25.57 27.61 23.10 24.55 25.48 1.06 0.09
PfastreXML* 32.88 30.54 28.80 32.88 32.20 31.85 26.61 27.79 29.22 26.61 27.10 27.59 5.32 0.99
Slice+FastText* 33.85 30.07 26.97 33.85 31.97 30.56 21.91 24.15 25.81 21.91 23.26 24.03 2.01 0.22
XML-CNN 35.02 31.37 28.45 35.02 33.24 31.94 21.99 24.93 26.84 21.99 23.83 24.67 1.36 23.52
XT* 36.57 32.73 29.79 36.57 34.64 33.35 22.11 24.81 27.18 22.11 23.73 24.87 4.00 4.65

AmazonTitles-3M


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 48.37 44.68 42.24 48.37 45.93 44.43 11.47 13.84 15.72 11.47 13.02 14.15 10.23 1.68
Astec 48.74 45.70 43.31 48.74 46.96 45.67 16.10 18.89 20.94 16.10 18.00 19.33 40.60 13.04
AttentionXML 46.00 42.81 40.59 46.00 43.94 42.61 12.81 15.03 16.71 12.80 14.23 15.25 44.40 273.10
Bonsai* 46.89 44.38 42.30 46.89 45.46 44.35 13.78 16.66 18.75 13.78 15.75 17.10 9.53 9.90
MACH 37.10 33.57 31.33 37.10 34.67 33.17 7.51 8.61 9.46 7.51 8.23 8.76 9.77 40.48
Parabel* 46.42 43.81 41.71 46.42 44.86 43.70 12.94 15.58 17.55 12.94 14.70 15.94 13.20 1.54
PfastreXML* 31.16 31.35 31.10 31.16 31.78 32.08 22.37 24.59 26.16 22.37 23.72 24.65 22.97 10.47
Slice+FastText* 35.39 33.33 31.74 35.39 34.12 33.21 11.32 13.37 14.94 11.32 12.65 13.61 12.22 0.64
XT* 27.99 25.24 23.57 27.99 25.98 24.78 4.45 5.06 5.57 4.45 4.78 5.03 16.00 15.80

LF-Wikipedia-500K / Wikipedia-500K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 64.64 43.20 32.77 64.64 54.54 52.42 26.88 30.24 32.79 26.88 30.71 33.33 48.32 15.50
APLC-XLNet 72.83 50.50 38.55 72.83 62.06 59.27 30.03 35.25 38.27 30.03 35.01 37.86 1.40 -
Astec 73.02 52.02 40.53 73.02 64.10 62.32 30.69 36.48 40.38 30.69 36.33 39.84 28.06 20.35
AttentionXML 82.73 63.75 50.41 82.73 76.56 74.86 34.00 44.32 50.15 34.00 42.99 47.69 9.30 110.60
Bonsai* 69.20 49.80 38.80 - - - - - - - - - - -
DiSMEC* 70.20 50.60 39.70 70.20 42.10 40.50 31.20 33.40 37.00 31.20 33.70 37.10 - -
ECLARE 68.04 46.44 35.74 68.04 58.15 56.37 31.02 35.39 38.29 31.02 35.66 34.50 7.40 86.57
LightXML 81.59 61.78 47.64 81.59 74.73 72.23 31.99 42.00 46.53 31.99 40.99 45.18 - 185.56
MACH 52.78 32.39 23.75 52.78 42.05 39.70 17.65 18.06 18.66 17.64 19.18 45.18 4.50 31.20
NGAME 84.01 64.69 49.97 84.01 78.25 75.97 41.25 52.57 57.04 41.25 51.58 56.11 3.88 54.88
Parabel* 68.70 49.57 38.64 68.70 60.51 58.62 26.88 31.96 35.26 26.88 31.73 34.61 5.65 2.72
PfastreXML* 59.50 40.20 30.70 59.50 30.10 28.70 29.20 27.60 27.70 29.20 28.70 28.30 - 63.59
ProXML* 68.80 48.90 37.90 68.80 39.10 38.00 33.10 35.00 39.40 33.10 35.20 39.00 - -
SiameseXML 67.26 44.82 33.73 67.26 56.64 54.29 33.95 35.46 37.07 33.95 36.58 38.93 5.73 7.31
X-Transformer 76.95 58.42 46.14 - - - - - - - - - - -
XML-CNN 59.85 39.28 29.81 59.85 48.67 46.12 - - - - - - - 117.23
XR-Transformer 81.62 61.38 47.85 81.62 74.46 72.43 33.58 42.97 47.81 33.58 42.21 46.61 - 318.90
XT* 64.48 45.84 35.46 - - - - - - - - - 5.50 20.88

Amazon-670K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 42.39 36.89 32.98 42.39 39.07 37.04 21.56 24.78 27.66 21.56 23.38 24.76 50.00 1.56
APLC-XLNet 43.46 38.83 35.32 43.46 41.01 39.38 26.12 29.66 32.78 26.12 28.20 29.68 1.1 -
Astec 47.77 42.79 39.10 47.77 45.28 43.74 32.13 35.14 37.82 32.13 33.80 35.01 18.79 7.32
AttentionXML 47.58 42.61 38.92 47.58 45.07 43.50 30.29 33.85 37.13 - - - 16.56 78.30
Bonsai* 45.58 40.39 36.60 45.58 42.79 41.05 27.08 30.79 34.11 - - - - -
DiSMEC* 44.70 39.70 36.10 44.70 42.10 40.50 27.80 30.60 34.20 27.80 28.80 30.70 3.75 56.02
FastXML* 36.99 33.28 30.53 36.99 35.11 33.86 19.37 23.26 26.85 19.37 22.25 24.69 - -
LEML* 8.13 6.83 6.03 8.13 7.30 6.85 2.07 2.26 2.47 2.07 2.21 2.35 - -
LPSR-NB* 28.65 24.88 22.37 28.65 26.40 25.03 16.68 18.07 19.43 16.68 17.70 18.63 - -
LightXML 49.10 43.83 39.85 - - - - - - - - - 4.59 86.25
PPD-Sparse* 45.32 40.37 36.92 - - - 26.64 30.65 34.65 - - - - -
Parabel* 44.89 39.80 36.00 44.89 42.14 40.36 25.43 29.43 32.85 25.43 28.38 30.71 2.41 0.41
PfastreXML* 39.46 35.81 33.05 39.46 37.78 36.69 29.30 30.80 32.43 29.30 30.40 31.49 - -
ProXML* 43.50 38.70 35.30 43.50 41.10 39.70 30.80 32.80 35.10 30.80 31.70 32.70 - -
SLEEC* 35.05 31.25 28.56 34.77 32.74 31.53 20.62 23.32 25.98 20.62 22.63 24.43 - -
SLICE+FastText* 33.15 29.76 26.93 33.15 31.51 30.27 20.20 22.69 24.70 20.20 21.71 22.72 2.01 0.21
XML-CNN 35.39 31.93 29.32 35.39 33.74 32.64 28.67 33.27 36.51 - - - - 52.23
XT* 42.50 37.87 34.41 42.50 40.01 38.43 24.82 28.20 31.24 24.82 26.82 28.29 4.20 8.22

Amazon-3M


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 49.30 45.55 43.11 49.30 46.79 45.27 11.69 14.07 15.98 - - - - -
AttentionXML 50.86 48.04 45.83 50.86 49.16 47.94 15.52 18.45 20.60 - - - - -
Bonsai* 48.45 45.65 43.49 48.45 46.78 45.59 13.79 16.71 18.87 - - - - -
DiSMEC* 47.34 44.96 42.80 47.36 - - - - - - - - - -
FastXML* 44.24 40.83 38.59 44.24 41.92 40.47 9.77 11.69 13.25 9.77 11.20 12.29 - -
Parabel* 47.48 44.65 42.53 47.48 45.73 44.53 12.82 15.61 17.73 12.82 14.89 16.38 - -
PfastreXML* 43.83 41.81 40.09 43.83 42.68 41.75 21.38 23.22 24.52 21.38 22.75 23.68 - -

AmazonCat-13K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 93.54 78.37 63.30 93.54 87.29 85.10 49.04 61.13 69.64 49.04 58.83 65.47 18.61 3.45
APLC-XLNet 94.56 79.82 64.61 94.56 88.74 86.66 52.22 65.08 71.40 52.22 62.57 67.92 0.50 -
AttentionXML 95.92 82.41 67.31 95.92 91.17 89.48 53.76 68.72 76.38 - - - - -
Bonsai* 92.98 79.13 64.46 92.98 87.68 85.92 51.30 64.60 72.48 - - - 0.55 1.26
DiSMEC* 93.40 79.10 64.10 93.40 87.70 85.80 59.10 67.10 71.20 59.10 65.20 68.80 - -
FastXML* 93.11 78.20 63.41 93.11 87.07 85.16 48.31 60.26 69.30 48.31 56.90 62.75 - -
LightXML 96.77 84.02 68.70 - - - - - - - - - - -
PD-Sparse* 90.60 75.14 60.69 90.60 84.00 82.05 49.58 61.63 68.23 49.58 58.28 62.68 - -
Parabel* 93.03 79.16 64.52 93.03 87.72 86.00 50.93 64.00 72.08 50.93 60.37 65.68 0.62 0.63
PfastreXML* 91.75 77.97 63.68 91.75 86.48 84.96 69.52 73.22 75.48 69.52 72.21 73.67 19.02 5.69
SLEEC* 90.53 76.33 61.52 90.53 84.96 82.77 46.75 58.46 65.96 46.75 55.19 60.08 - -
XML-CNN 93.26 77.06 61.40 93.26 86.20 83.43 52.42 62.83 67.10 - - - - -
XT* 92.59 78.24 63.58 92.59 86.90 85.03 49.61 62.22 70.24 49.61 59.71 66.04 0.46 7.14
X-Transformer 96.70 83.85 68.58 - - - - - - - - - - -

References


[01] K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NeurIPS 2015.
[02] R. Agrawal, A. Gupta, Y. Prabhu and M. Varma, Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages, in WWW 2013.
[03] Y. Prabhu and M. Varma, FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning, in KDD 2014.
[04] J. Weston, A. Makadia, and H. Yee, Label Partitioning For Sublinear Ranking, in ICML 2013.
[05] H. Yu, P. Jain, P. Kar, and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML 2014.
[06] D. Hsu, S. Kakade, J. Langford, and T. Zhang, Multi-Label Prediction via Compressed Sensing, in NeurIPS 2009.
[07] F. Tai, and H. Lin, Multi-label Classification with Principal Label Space Transformation, in Neural Computation, 2012.
[08] W. Bi, and J. Kwok, Efficient Multi-label Classification with Many Labels, in ICML, 2013.
[09] Y. Chen, and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification, in NeurIPS, 2012.
[10] C. Ferng, and H. Lin, Multi-label Classification with Error-correcting Codes, in ACML, 2011.
[11] J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation, in IJCAI, 2011.
[12] S. Ji, L. Tang, S. Yu, and J. Ye, Extracting Shared Subspaces for Multi-label Classification, in KDD, 2008.
[13] Z. Lin, G. Ding, M. Hu, and J. Wang, Multi-label Classification via Feature-aware Implicit Label Space Encoding, in ICML, 2014.
[14] P. Mineiro, and N. Karampatziakis, Fast Label Embeddings via Randomized Linear Algebra, Preprint, 2015.
[15] N. Karampatziakis, and P. Mineiro, Scalable Multilabel Prediction via Randomized Methods, Preprint, 2015.
[16] K. Balasubramanian, and G. Lebanon, The Landmark Selection Method for Multiple Output Prediction, Preprint, 2012.
[17] M. Cisse, N. Usunier, T. Artieres, and P. Gallinari, Robust Bloom Filters for Large Multilabel Classification Tasks, in NeurIPS, 2013.
[18] B. Hariharan, S. Vishwanathan, and M. Varma, Efficient max-margin multi-label classification with applications to zero-shot learning, in Machine Learning Journal, 2012.
[19] C. Snoek, M. Worring, J. van Gemert, J.-M. Geusebroek, and A. Smeulders, The challenge problem for automated detection of 101 semantic concepts in multimedia, in ACM Multimedia, 2006.
[20] I. Katakis, G. Tsoumakas, and I. Vlahavas, Multilabel text classification for automated tag suggestion, in ECML/PKDD Discovery Challenge, 2008.
[21] G. Tsoumakas, I. Katakis, and I. Vlahavas, Effective and efficient multilabel classification in domains with large number of labels, in ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.
[22] J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection, 2014.
[23] A. Zubiaga, Enhancing navigation on wikipedia with social tags, Preprint, 2009.
[24] R. Wetzker, C. Zimmermann, and C. Bauckhage, Analyzing social bookmarking systems: A del.icio.us cookbook, in Mining Social Data (MSoDa) Workshop Proceedings, ECAI, 2008.
[25] I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artieres, G. Paliouras, E. Gaussier, I. Androutsopoulos, M.-R. Amini and P. Gallinari, LSHTC: A Benchmark for Large-Scale Text Classification, Preprint, 2015.
[26] D. D. Lewis, Y. Yang, T. Rose, and F. Li, RCV1: A New Benchmark Collection for Text Categorization Research, in JMLR, 2004.
[27] E. L. Mencia, and J. Furnkranz, Efficient pairwise multilabel classification for large-scale problems in the legal domain, in ECML/PKDD, 2008.
[28] J. McAuley, and J. Leskovec, Hidden factors and hidden topics: understanding rating dimensions with review text, in Proceedings of the 7th ACM conference on Recommender systems, ACM, 2013.
[29] J. McAuley, C. Targett, Q. Shi, and A. v. d. Hengel, Image-based Recommendations on Styles and Substitutes, in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.
[30] J. McAuley, R. Pandey, and J. Leskovec, Inferring networks of substitutable and complementary products, in KDD, 2015.
[31] H. Jain, Y. Prabhu, and M. Varma, Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications, in KDD, 2016.
[32] R. Babbar, and B. Schölkopf, DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification, in WSDM, 2017.
[33] I. E. H. Yen, X. Huang, K. Zhong, P. Ravikumar and I. S. Dhillon, PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification, in ICML, 2016.
[34] I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar, I. S. Dhillon and E.-P. Xing, PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification, in KDD, 2017.
[35] K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx and E. Hullermeier, Extreme F-Measure Maximization using Sparse Probability Estimates, in ICML, 2017.
[36] J. Liu, W-C. Chang, Y. Wu and Y. Yang, Deep Learning for Extreme Multi-label Text Classification, in SIGIR, 2017.
[37] Y. Jernite, A. Choromanska and D. Sontag, Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation, in ICML, 2017.
[38] Y. Prabhu, A. Kag, S. Harsola, R. Agrawal and M. Varma, Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising, in WWW, 2018.
[39] I. Evron, E. Moroshko and K. Crammer, Efficient Loss-Based Decoding on Graphs for Extreme Classification, in NeurIPS, 2018.
[40] A. Niculescu-Mizil and E. Abbasnejad, Label Filters for Large Scale Multilabel Classification, in AISTATS, 2017.
[41] H. Jain, V. Balasubramanian, B. Chunduri and M. Varma, Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches, in WSDM 2019.
[42] A. Jalan, P. Kar, Accelerating Extreme Classification via Adaptive Feature Agglomeration, in IJCAI 2019.
[43] R. Babbar, and B. Schölkopf, Data Scarcity, Robustness and Extreme Multi-label Classification, in Machine Learning Journal and European Conference on Machine Learning, 2019.
[44] S. Khandagale, H. Xiao and R. Babbar, Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification, in ArXiv 2019.
[45] W. Siblini, F. Meyer and P. Kuntz, CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning, in ICML 2018.
[46] V. Gupta, R. Wadbude, N. Natarajan, H. Karnick, P. Jain and P. Rai, Distributional Semantics meets Multi-Label Learning, in AAAI 2019.
[47] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation, in Natural Legal Language Processing Workshop 2019.
[47b] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras and I. Androutsopoulos, EURLEX57K Dataset.
[48] I. Chalkidis, E. Fergadiotis, P. Malakasiotis, and I. Androutsopoulos, Large-Scale Multi-Label Text Classification on EU Legislation, in ACL 2019.
[49] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek and I. Vlahavas, Mulan: A Java Library for Multi-Label Learning, in JMLR 2011.
[50] A. Jalan and P. Kar, Accelerating Extreme Classification via Adaptive Feature Agglomeration, in IJCAI 2019.
[51] R. You, S. Dai, Z. Zhang, H. Mamitsuka, and S. Zhu, AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Network, in NeurIPS 2019.
[52] T. K. R. Medini, Q. Huang, Y. Wang, V. Mohan, and A. Shrivastava, Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products, in NeurIPS 2019.
[53] W-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, and I. Dhillon, Taming Pretrained Transformers for Extreme Multi-label Text Classification, in KDD 2020.
[54] T. Jiang, D. Wang, L. Sun, H. Yang, Z. Zhao, F. Zhuang, LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification, in AAAI 2021.
[55] K. Dahiya, D. Saini, A. Mittal, A. Shaw, K. Dave, A. Soni, H. Jain, S. Agarwal and M. Varma, DeepXML: A deep extreme multi-Label learning framework applied to short text documents, in WSDM 2021.
[56] A. Mittal, K. Dahiya, S. Agrawal, D. Saini, S. Agarwal, P. Kar and M. Varma, DECAF: Deep extreme classification with label features, in WSDM 2021.
[57] A. Mittal, N. Sachdeva, S. Agrawal, S. Agarwal, P. Kar and M. Varma, ECLARE: Extreme classification with label graph correlations, in TheWebConf 2021.
[58] D. Saini, A. K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang and M. Varma, GalaXC: Graph neural networks with labelwise attention for extreme classification, in TheWebConf 2021.
[59] H. Ye, Z. Chen, D.-H. Wang, B.-D. Davison, Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification, in ICML 2020.
[60] Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal and M. Varma, Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation, in WSDM, 2018.
[61] M. Qaraei, E. Schultheis, P. Gupta, and R. Babbar, Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels, in TheWebConf, 2021.
[62] N. Gupta, S. Bohra, Y. Prabhu, S. Purohit and M. Varma, Generalized zero-Shot extreme multi-label learning, in KDD 2021.
[63] K. Dahiya, A. Agarwal, D. Saini, K. Gururaj, J. Jiao, A. Singh, S. Agarwal, P. Kar and M. Varma, SiameseXML: Siamese networks meet extreme classifiers with 100M labels, in ICML 2021.
[64] J. Ni, J. Li and J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2019.
[65] A. Mittal, K. Dahiya, S. Malani, J. Ramaswamy, S. Kuruvilla, J. Ajmera, K-h. Chang, S. Agarwal, P. Kar and M. Varma, Multimodal extreme classification, in CVPR 2022.
[66] K. Dahiya, N. Gupta, D. Saini, A. Soni, Y. Wang, K. Dave, J. Jiao, G. K, P. Dey, A. Singh, D. Hada, V. Jain, B. Paliwal, A. Mittal, S. Mehta, R. Ramjee, S. Agarwal, P. Kar, M. Varma, NGAME: Negative Mining-aware Mini-batching for Extreme Classification, in ArXiv 2022.
[67] E. Schultheis and R. Babbar, Speeding-up One-vs-All Training for Extreme Classification via Smart Initialization, in ECML-MLJ 2022.

Appendix

Mediamill


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 87.82 73.45 59.17 87.82 81.50 79.22 70.14 72.76 74.02 70.14 72.31 73.13 - -
CPLST* 83.82 67.32 52.80 83.82 75.29 71.92 66.23 65.28 63.70 66.23 65.89 64.77 - -
CS* 78.95 60.93 44.27 78.95 68.97 62.88 62.53 58.97 53.23 62.53 60.33 56.50 - -
DiSMEC* 81.86 62.52 45.11 81.86 70.21 63.71 62.23 59.85 54.03 62.25 61.05 57.26 - -
FastXML* 83.57 65.78 49.97 83.57 74.06 69.34 66.06 63.83 61.11 66.06 64.83 62.94 - -
LEML* 81.29 64.74 49.83 81.29 72.92 69.37 64.24 62.73 59.92 64.24 63.47 61.57 - -
LPSR* 83.57 65.50 48.57 83.57 73.84 68.18 66.06 63.53 59.38 66.06 64.63 61.84 - -
ML-CSSP 83.98 67.37 53.02 83.98 75.31 72.21 66.88 65.90 64.90 66.88 66.47 65.71 - -
PD-Sparse* - - - - - - - - - - - - - -
PPD-Sparse* 86.50 68.40 53.20 86.50 77.30 75.60 64.30 61.30 60.80 64.30 63.60 62.80 - -
Parabel* - - - - - - - - - - - - - -
PfastreXML* 84.22 67.33 53.04 84.22 75.41 72.37 66.67 65.43 64.30 66.08 66.08 65.24 - -
SLEEC* 84.01 67.20 52.80 84.01 75.23 71.96 66.34 65.11 63.62 66.34 65.79 64.71 - -
WSABIE 83.35 66.18 51.46 83.35 74.21 70.55 65.79 64.07 61.89 65.79 64.88 63.36 - -
kNN* 83.91 67.12 52.99 83.91 75.22 72.21 66.51 65.21 64.30 66.51 65.91 65.20 - -

Bibtex


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

1-vs-All 62.62 39.09 28.79 62.62 59.13 61.58 48.84 52.96 59.29 48.84 51.62 55.09 - -
CPLST* 62.38 37.84 27.62 62.38 57.63 59.71 48.17 50.86 56.42 48.17 49.94 52.96 - -
CS* 58.87 33.53 23.72 58.87 52.19 53.25 46.04 45.08 48.17 46.04 45.25 46.89 - -
DiSMEC* - - - - - - - - - - - - - -
FastXML* 63.42 39.23 28.86 63.42 59.51 61.70 48.54 52.30 58.28 48.54 51.11 54.38 - -
LEML* 62.54 38.41 28.21 62.54 58.22 60.53 47.97 51.42 57.53 47.97 50.25 53.59 - -
LPSR* 62.11 36.65 26.53 62.11 56.50 58.23 49.20 50.14 55.01 49.20 49.78 52.41 - -
ML-CSSP 44.98 30.43 23.53 44.98 44.67 47.97 32.38 38.68 45.96 32.38 36.73 40.74 - -
PD-Sparse* 61.29 35.82 25.74 61.29 55.83 57.35 48.34 48.77 52.93 48.34 48.49 50.72 - -
PPD-Sparse* - - - - - - - - - - - - - -
Parabel* 64.53 38.56 27.94 64.53 59.35 61.06 50.88 52.42 57.36 50.88 51.90 54.58 - -
PfastreXML* 63.46 39.22 29.14 63.46 59.61 62.12 52.28 54.36 60.55 52.28 53.62 56.99 - -
ProXML* 64.60 39.00 28.20 64.40 59.20 61.50 50.10 52.00 58.30 50.10 52.00 55.10 - -
SLEEC* 65.08 39.64 28.87 65.08 60.47 62.64 51.12 53.95 59.56 51.12 52.99 56.04 - -
WSABIE 54.78 32.39 23.98 54.78 50.11 52.39 43.39 44.00 49.30 43.39 43.64 46.50 - -
kNN* 57.04 34.38 25.44 57.04 52.29 54.64 43.71 45.82 51.64 43.71 45.04 48.20 - -

Delicious


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* - - - - - - - - - - - - - -
CPLST* 65.31 59.95 55.31 65.31 61.16 57.80 31.10 32.40 33.02 31.10 32.07 32.55 - -
CS* 61.36 56.46 52.07 61.36 57.66 54.44 30.60 31.84 32.26 30.60 31.54 31.89 - -
DiSMEC* - - - - - - - - - - - - - -
FastXML* 69.61 64.12 59.27 69.61 65.47 61.90 32.35 34.51 35.43 32.35 34.00 34.73 - -
LEML* 65.67 60.55 56.08 65.67 61.77 58.47 30.73 32.43 33.26 30.73 32.01 32.66 - -
LPSR* 65.01 58.96 53.49 65.01 60.45 56.38 31.34 32.57 32.77 31.34 32.29 32.50 - -
ML-CSSP 63.04 56.26 50.16 63.04 57.91 53.36 29.48 30.27 30.02 29.48 30.10 29.98 - -
PD-Sparse* 51.82 44.18 38.95 51.82 46.00 42.02 25.22 24.63 23.85 25.22 24.80 24.25 - -
Parabel* 67.44 61.83 56.75 67.44 63.15 59.41 32.69 34.00 34.53 32.69 33.69 34.10 - -
PfastreXML* 67.13 62.33 58.62 67.13 63.48 60.74 34.57 34.80 35.86 34.57 34.71 35.42 - -
SLEEC* 67.59 61.38 56.56 67.59 62.87 59.28 32.11 33.21 33.83 32.11 32.93 33.41 - -
WSABIE 64.13 58.13 53.64 64.13 59.59 56.25 31.25 32.02 32.47 31.25 31.84 32.18 - -
kNN* 64.95 58.89 54.11 64.95 60.32 56.77 31.03 32.02 32.43 31.03 31.76 32.09 - -

EURLex-4K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 79.26 64.30 52.33 79.26 68.13 61.60 34.25 39.83 42.76 34.25 38.35 40.30 0.09 0.06
APLC-XLNet 87.72 74.56 62.28 87.72 77.90 71.75 42.93 49.84 53.07 42.93 48.00 50.40 0.48 -
Bonsai* 82.96 69.76 58.31 82.96 73.15 67.41 37.08 45.13 49.57 37.08 42.94 46.10 0.02 0.03
CPLST* 58.52 45.51 32.47 58.52 48.67 40.79 24.97 27.46 25.04 24.97 26.82 25.57 - -
CS* 62.09 48.39 40.11 62.09 51.63 47.11 24.94 27.19 28.90 25.94 26.56 27.67 - -
DiSMEC* 82.40 68.50 57.70 82.40 72.50 66.70 41.20 45.40 49.30 41.20 44.30 46.90 - -
FastXML* 76.37 63.36 52.03 76.37 66.63 60.61 33.17 39.68 41.99 33.17 37.92 39.55 0.26 0.07
LEML* 68.55 55.11 45.12 68.55 58.44 53.03 31.16 34.85 36.82 31.16 33.85 35.17 - -
LPSR* 79.89 66.01 53.80 79.89 69.62 63.04 37.97 44.01 46.17 37.97 42.44 43.97 - -
ML-CSSP* 75.45 62.70 52.51 75.45 65.97 60.78 43.86 45.72 46.97 43.86 45.23 46.03 - -
PD-Sparse* 83.83 70.72 59.21 - - - 37.61 46.05 50.79 - - - - -
PPD-Sparse* 83.40 70.90 59.10 83.40 74.40 68.20 45.20 48.50 51.00 45.20 47.50 49.10 - -
Parabel* 82.25 68.71 57.53 82.25 72.17 66.54 36.44 44.08 48.46 36.44 41.99 44.91 0.03 0.02
PfastreXML* 71.36 59.90 50.39 71.36 62.87 58.06 26.62 34.16 38.96 26.62 32.07 35.23 - -
SLEEC* 63.40 50.35 41.28 63.40 53.56 48.47 24.10 27.20 29.09 24.10 26.37 27.62 - -
WSABIE* 72.28 58.16 47.73 72.28 61.64 55.92 28.60 32.49 34.46 28.60 31.45 32.77 - -
XT* 78.97 65.64 54.44 78.97 69.05 63.23 33.52 40.35 44.02 33.52 38.50 41.09 0.03 0.10
kNN* 81.73 68.78 57.44 81.73 72.15 66.40 36.36 44.04 48.29 36.36 41.95 44.78 - -

Wiki10-31K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 86.49 74.27 64.20 86.49 77.13 69.44 11.90 12.76 13.58 11.90 12.53 13.10 0.62 0.39
APLC-XLNet 89.44 78.93 69.73 89.44 81.38 74.41 14.84 15.85 17.04 14.84 15.58 16.40 0.54 -
AttentionXML 87.47 78.48 69.37 87.47 80.61 73.79 15.57 16.80 17.82 - - - - -
Bonsai* 84.69 73.69 64.39 84.69 76.25 69.17 11.78 13.27 14.28 11.78 12.89 13.61 0.13 0.64
DiSMEC* 85.20 74.60 65.90 84.10 77.10 70.40 13.60 13.10 13.80 13.60 13.20 13.60 - -
FastXML* 83.03 67.47 57.76 84.31 75.35 63.36 9.80 10.17 10.54 9.80 10.08 10.33 - -
LEML* 73.47 62.43 54.35 73.47 64.92 58.69 9.41 10.07 10.55 9.41 9.90 10.24 - -
LPSR-NB* 72.72 58.51 49.50 72.72 61.71 54.63 12.79 12.26 12.13 12.79 12.38 12.27 - -
LightXML 89.45 78.96 69.85 - - - - - - - - - - -
Parabel* 84.17 72.46 63.37 84.17 75.22 68.22 11.68 12.73 13.69 11.68 12.47 13.14 0.18 0.20
PfastreXML* 83.57 68.61 59.10 83.57 72.00 64.54 19.02 18.34 18.43 19.02 18.49 18.52 - -
SLEEC* 85.88 72.98 62.70 85.88 76.02 68.13 11.14 11.86 12.40 11.14 11.68 12.06 1.13 0.21
XML-CNN 81.42 66.23 56.11 81.42 69.78 61.83 9.39 10.00 10.20 - - - - -
XT* 86.15 75.18 65.41 86.15 77.76 70.35 11.87 13.08 13.89 11.87 12.78 13.36 0.37 0.39
X-Transformer 88.51 78.71 69.62 - - - - - - - - - - -

Delicious-200K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 46.79 40.72 37.67 46.79 42.17 39.84 7.18 8.05 8.74 7.18 7.78 8.22 10.74 2.58
Bonsai* 46.69 39.88 36.38 46.69 41.51 38.84 7.26 7.97 8.53 7.26 7.75 8.10 3.91 64.42
DiSMEC* 45.50 38.70 35.50 45.50 40.90 37.80 6.50 7.60 8.40 6.50 7.50 7.90 - -
FastXML* 43.07 38.66 36.19 43.07 39.70 37.83 6.48 7.52 8.31 6.51 7.26 7.79 - -
LEML* 40.73 37.71 35.84 40.73 38.44 37.01 6.06 7.24 8.10 6.06 6.93 7.52 - -
LPSR-NB 18.59 15.43 14.07 18.59 16.17 15.13 3.24 3.42 3.64 3.24 3.37 3.52 - -
PD-Sparse* 34.37 29.48 27.04 34.37 30.60 28.65 5.29 5.80 6.24 5.29 5.66 5.96 - -
PPD-Sparse* - - - - - - - - - - - - - -
Parabel* 46.86 40.08 36.70 46.86 41.69 39.10 7.22 7.94 8.54 7.22 7.71 8.09 6.36 9.58
Parabel* 46.97 40.08 36.63 46.97 41.72 39.07 7.25 7.94 8.52 7.25 7.75 8.15 - -
PfastreXML* 41.72 37.83 35.58 41.72 38.76 37.08 3.15 3.87 4.43 3.15 3.68 4.06 15.34 3.60
SLEEC* 47.85 42.21 39.43 47.85 43.52 41.37 7.17 8.16 8.96 7.17 7.89 8.44 - -
XT* 45.59 39.10 35.92 45.59 40.62 38.17 6.96 7.71 8.33 6.96 7.47 7.86 2.70 31.22

WikiLSHTC-325K


Method P@1 P@3 P@5 N@1 N@3 N@5 PSP@1 PSP@3 PSP@5 PSN@1 PSN@3 PSN@5 Model size (GB) Train time (hr)

AnnexML* 63.30 40.64 29.80 63.30 56.61 56.24 25.13 30.46 34.30 25.13 31.16 34.36 29.70 4.24
Bonsai* 66.41 44.40 32.92 66.41 60.69 60.53 28.11 35.36 39.73 28.11 35.42 38.94 2.43 3.04
DiSMEC* 64.40 42.50 31.50 64.40 58.50 58.40 29.10 35.60 39.50 29.10 35.90 39.40 - -
FastXML* 49.75 33.10 24.45 49.75 45.23 44.75 16.35 20.99 23.56 16.35 19.56 21.02 - -
LEML* 19.82 11.43 8.39 19.82 14.52 13.73 3.48 3.79 4.27 3.48 3.68 3.94 - -
LPSR-NB 27.44 16.23 11.77 27.44 23.04 22.55 6.93 7.21 7.86 6.93 7.11 7.46 - -
PD-Sparse* 61.26 39.48 28.79 61.26 55.08 54.67 28.34 33.50 36.62 28.34 31.92 33.68 - -
PPD-Sparse* 64.08 41.26 30.12 - - - 27.47 33.00 36.29 - - - - -
Parabel* 65.04 43.23 32.05 65.04 59.15 58.93 26.76 33.27 37.36 26.76 31.26 33.57 3.10 0.75
PfastreXML* 56.05 36.79 27.09 56.05 50.59 50.13 30.66 31.55 33.12 30.66 31.24 32.09 14.23 6.34
ProXML* 63.60 41.50 30.80 63.80 57.40 57.10 34.80 37.70 41.00 34.80 38.70 41.50 - -
SLEEC* 54.83 33.42 23.85 54.83 47.25 46.16 20.27 23.18 25.08 20.27 22.27 23.35 - -
XT* 56.54 37.17 27.73 56.54 50.48 50.36 20.56 25.42 28.90 20.56 25.30 27.90 4.50 1.89
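
The result tables above share one header and use "-" for unreported values, so individual rows can be checked mechanically. The sketch below is illustrative only and is not part of the repository's tools: `parse_row` and `mismatched` are hypothetical helper names, and the 0.005 tolerance is an assumption. It relies on the fact that, with binary relevance and test points that have at least one relevant label, precision and nDCG coincide at k = 1, so a row whose P@1 and N@1 differ may deserve a second look.

```python
from typing import Dict, Optional, Tuple

# Column names copied from the header row used by every table above.
COLUMNS = [
    "P@1", "P@3", "P@5", "N@1", "N@3", "N@5",
    "PSP@1", "PSP@3", "PSP@5", "PSN@1", "PSN@3", "PSN@5",
    "Model size (GB)", "Train time (hr)",
]


def parse_row(line: str) -> Tuple[str, Dict[str, Optional[float]]]:
    """Split one whitespace-separated results row into (method, values).

    A "-" cell (value not reported) is mapped to None.
    """
    parts = line.split()
    n = len(COLUMNS)
    if len(parts) < n + 1:
        raise ValueError(f"expected a method name and {n} cells: {line!r}")
    method = " ".join(parts[:-n])  # everything before the last 14 cells
    cells = parts[-n:]
    values = {
        name: (None if cell == "-" else float(cell))
        for name, cell in zip(COLUMNS, cells)
    }
    return method, values


def mismatched(values: Dict[str, Optional[float]],
               a: str = "P@1", b: str = "N@1",
               tol: float = 0.005) -> bool:
    """Return True if both cells are reported and differ by more than tol."""
    x, y = values[a], values[b]
    return x is not None and y is not None and abs(x - y) > tol


if __name__ == "__main__":
    # Two rows copied verbatim from the Amazon-670K table.
    rows = [
        "Parabel* 44.89 39.80 36.00 44.89 42.14 40.36 25.43 29.43 32.85 25.43 28.38 30.71 2.41 0.41",
        "SLEEC* 35.05 31.25 28.56 34.77 32.74 31.53 20.62 23.32 25.98 20.62 22.63 24.43 - -",
    ]
    for line in rows:
        method, values = parse_row(line)
        print(method, "P@1/N@1 mismatch:", mismatched(values))
```

Applied to the Amazon-670K table, this check passes for the Parabel* row and flags the SLEEC* row (P@1 35.05, N@1 34.77). It only reports the disagreement and cannot say which of the two values is the intended one.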