The Extreme Classification Repository: Multi-label Datasets & Code


Kush Bhatia, Kunal Dahiya, Himanshu Jain, Anshul Mittal, Yashoteja Prabhu, Manik Varma

The objective in extreme multi-label learning is to learn features and classifiers that can automatically tag a datapoint with the most relevant subset of labels from an extremely large label set. This page provides benchmark datasets, metrics, results and code that can be used for evaluating the performance of extreme multi-label algorithms.

Citing the Repository

If you use any of the datasets or results provided on this repository, then please cite

        @Misc{Bhatia16,
          author    = {Bhatia, K. and Dahiya, K. and Jain, H. and Mittal, A. and Prabhu, Y. and Varma, M.},
          title     = {The extreme classification repository: Multi-label datasets and code},
          url       = {http://manikvarma.org/downloads/XC/XMLRepository.html},
          year      = {2016}
        }
        

Download Datasets

These multi-label datasets have been processed from their original sources to create train/test splits ensuring that the test set contains as many training labels as possible. This yields more realistic train/test splits than uniform sampling, which can drop many of the infrequently occurring, hard-to-classify labels from the test set. For example, on the WikiLSHTC-325K dataset, uniform sampling might lose ninety thousand of the hardest-to-classify labels from the test set, whereas our sampling procedure drops only forty thousand labels. Results computed on the train/test splits provided on this page are therefore not comparable to results computed on the original sources or through uniform sampling. Please also note that the Ads-1M and Ads-9M datasets are proprietary and not available for download. Bag-of-words (BoW) features have been provided for all other datasets. The raw text of some of the datasets has also been provided for deep learning.

The dataset file format is described in the README file, and Python and Matlab scripts for reading the datasets are provided below.
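
For reference, here is a minimal Python sketch of a reader for this format. It assumes the sparse format described in the README (a header line "num_points num_features num_labels", followed by one line per point of the form "l1,l2,... f1:v1 f2:v2 ..."); it is an illustration rather than the repository's official script, and in particular the 0-based indexing is an assumption, so please consult the README.

    import numpy as np
    from scipy.sparse import csr_matrix

    def read_xc_dataset(path):
        """Minimal reader for the repository's sparse format (see the README).

        First line: 'num_points num_features num_labels'. Every other line:
        comma-separated label ids, a space, then space-separated id:value
        feature pairs (ids assumed 0-based here). Returns CSR matrices (X, Y).
        """
        x_val, x_row, x_col = [], [], []
        y_row, y_col = [], []
        with open(path) as f:
            num_points, num_features, num_labels = map(int, f.readline().split())
            for row, line in enumerate(f):
                head, _, rest = line.rstrip().partition(' ')
                if ':' in head:  # point with no labels: the line starts with a feature
                    head, rest = '', line.rstrip()
                if head:
                    for lbl in head.split(','):
                        y_row.append(row)
                        y_col.append(int(lbl))
                for pair in rest.split():
                    idx, val = pair.split(':')
                    x_row.append(row)
                    x_col.append(int(idx))
                    x_val.append(float(val))
        X = csr_matrix((x_val, (x_row, x_col)), shape=(num_points, num_features))
        Y = csr_matrix((np.ones(len(y_row)), (y_row, y_col)), shape=(num_points, num_labels))
        return X, Y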


Dataset | Download | BoW Feature Dimensionality | Label Dimensionality | Number of Train Points | Number of Test Points | Avg. Points per Label | Avg. Labels per Point | Original Source

Mediamill | BoW Features | 120 | 101 | 30993 | 12914 | 1902.15 | 4.38 | [19]
Bibtex | BoW Features | 1836 | 159 | 4880 | 2515 | 111.71 | 2.40 | [20]
Delicious | BoW Features | 500 | 983 | 12920 | 3185 | 311.61 | 19.03 | [21]
RCV1-2K | BoW Features | 47236 | 2456 | 623847 | 155962 | 1218.56 | 4.79 | [26]
EURLex-4K | BoW Features | 5000 | 3993 | 15539 | 3809 | 25.73 | 5.31 | [27] + [45]
EURLex-4.3K | BoW Features | 200000 | 4271 | 45000 | 6000 | 60.57 | 5.07 | [43] + [44]
AmazonCat-13K | BoW Features, Raw text | 203882 | 13330 | 1186239 | 306782 | 448.57 | 5.04 | [28]
AmazonCat-14K | BoW Features, Raw text | 597540 | 14588 | 4398050 | 1099725 | 1330.1 | 3.53 | [29] + [30]
Wiki10-31K | BoW Features, Raw text | 101938 | 30938 | 14146 | 6616 | 8.52 | 18.64 | [23]
Delicious-200K | BoW Features | 782585 | 205443 | 196606 | 100095 | 72.29 | 75.54 | [24]
WikiLSHTC-325K | BoW Features | 1617899 | 325056 | 1778351 | 587084 | 17.46 | 3.19 | [25]
WikiSeeAlsoTitles-350K | BoW Features | 91414 | 352072 | 629418 | 162491 | 5.24 | 2.33 | -
WikiTitles-500K | BoW Features | 185479 | 501070 | 1699722 | 722678 | 23.62 | 4.89 | -
Wikipedia-500K | BoW Features, Raw text | 2381304 | 501070 | 1813391 | 783743 | 24.75 | 4.77 | -
AmazonTitles-670K | BoW Features | 66666 | 670091 | 485176 | 150875 | 5.11 | 5.39 | -
Amazon-670K | BoW Features, Raw text | 135909 | 670091 | 490449 | 153025 | 3.99 | 5.45 | [28]
Ads-1M | - | 164592 | 1082898 | 3917928 | 1563137 | 7.07 | 1.95 | -
AmazonTitles-3M | BoW Features | 165431 | 2812281 | 1712536 | 739665 | 31.55 | 36.18 | [29] + [30]
Amazon-3M | BoW Features, Raw text | 337067 | 2812281 | 1717899 | 742507 | 31.64 | 36.17 | [29] + [30]
Ads-9M | - | 2082698 | 8838461 | 70455530 | 22629136 | 14.32 | 1.79 | -

Table 1: Dataset statistics & download

We have followed the naming convention of appending the number of labels to the dataset name so as to disambiguate various versions of the datasets. Thus, DeliciousLarge has been renamed to Delicious-200K, RCV1-X to RCV1-2K, etc.

Please contact Manik Varma if you would like to contribute a dataset.

Download Tools

The following scripts can be used to read and write datasets in the given file format, to evaluate performance measures such as precision and nDCG, and to pre-process the raw text for deep learning as well as to extract bag-of-words features.
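
As an illustration of the bag-of-words step (the repository's own scripts should be preferred for reproducing the provided features), the sketch below uses scikit-learn, which is an assumption of convenience rather than the repository's pipeline; the key point is that the vocabulary is fitted on the training split only and reused on the test split so that feature ids stay consistent.

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Placeholder documents; in practice these would come from the raw-text files above.
    train_texts = ["regulation on the protection of personal data",
                   "customer review of a wireless keyboard"]
    test_texts = ["article about large-scale text classification"]

    # Fit the vocabulary on the training split only, then reuse it on the
    # test split so that feature ids match across the two files.
    vectorizer = TfidfVectorizer(lowercase=True)
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    print(X_train.shape, X_test.shape)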

Download Code

  1. 1-vs-All classifiers
  2. Trees
  3. Embeddings
  4. Deep Learning

Please contact Manik Varma if you would like us to provide a link to your code.

Metrics and Benchmark Results

Tables 2, 3, 6 and 7 present comparative results of various algorithms on the small scale datasets. Tables 4, 5, 8 and 9 present results on the larger datasets. If an algorithm cannot scale to a dataset then its results are either not shown or reported as "-". Classification accuracy is evaluated according to (PS = Propensity Scored) Precision$@k$ and nDCG$@k$, defined for a predicted score vector $\hat{\mathbf y} \in \mathbb{R}^{L}$ and ground truth label vector $\mathbf y \in \left\lbrace 0, 1 \right\rbrace^L$ as \[ \text{P}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \mathbf y_l \] \[ \text{PSP}@k := \frac{1}{k} \sum_{l\in \text{rank}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{p_l} \] \[ \text{DCG}@k := \sum_{l\in {\text{rank}}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{\log(l+1)} \] \[ \text{PSDCG}@k := \sum_{l\in {\text{rank}}_k (\hat{\mathbf y})} \frac{\mathbf y_l}{p_l\log(l+1)} \] \[ \text{nDCG}@k := \frac{{\text{DCG}}@k}{\sum_{l=1}^{\min(k, \|\mathbf y\|_0)} \frac{1}{\log(l+1)}} \] \[ \text{PSnDCG}@k := \frac{{\text{PSDCG}}@k}{\sum_{l=1}^{k} \frac{1}{\log(l+1)}} \] where $\text{rank}_k(\hat{\mathbf y})$ returns the indices of the $k$ largest entries of $\hat{\mathbf y}$, sorted in descending order, and $p_l$ is the propensity score of label $l$, which helps make the metrics unbiased [31]. Propensity scores for each of the datasets are included in the evaluation script below, and it is recommended that you use the script to compute (Propensity Scored) Precision and nDCG so as to be consistent with the results reported in Tables 2-9.

Tools to compute (propensity scored) Precision and nDCG in Python and Matlab.
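
To make the definitions concrete, here is a minimal NumPy sketch of P$@k$, PSP$@k$ and nDCG$@k$ for a single test point, together with the empirical propensity model of Jain et al. [31], $p_l = 1/(1 + C\,e^{-A \log(N_l + B)})$ with $C = (\log N - 1)(B + 1)^A$. The default $A$ and $B$ shown are assumptions (the values are dataset-specific), and the official evaluation script remains the reference for the numbers in Tables 2-9.

    import numpy as np

    def propensities(label_counts, num_train_points, A=0.55, B=1.5):
        """Empirical propensity model of Jain et al. [31]:
        p_l = 1 / (1 + C * (N_l + B)^(-A)), C = (log N - 1) * (B + 1)^A.
        A and B are dataset-specific; the official script has the exact values."""
        C = (np.log(num_train_points) - 1.0) * (1.0 + B) ** A
        return 1.0 / (1.0 + C * (label_counts + B) ** (-A))

    def precision_at_k(y_true, y_score, k, inv_prop=None):
        """P@k for one test point; PSP@k when inv_prop (= 1/p_l) is given."""
        top = np.argsort(-y_score)[:k]
        gains = y_true[top] if inv_prop is None else y_true[top] * inv_prop[top]
        return gains.sum() / k

    def ndcg_at_k(y_true, y_score, k):
        """nDCG@k for one test point, per the definition above (log base 2 assumed)."""
        top = np.argsort(-y_score)[:k]
        disc = 1.0 / np.log2(np.arange(2, k + 2))        # 1/log(l+1) for ranks l = 1..k
        dcg = float((y_true[top] * disc).sum())
        ideal = disc[: min(k, int(y_true.sum()))].sum()  # normaliser from the definition
        return dcg / ideal if ideal > 0 else 0.0

    # Toy usage: 5 labels, ground truth {0, 3}, scores favouring labels 3 and 1.
    y = np.array([1, 0, 0, 1, 0], dtype=float)
    s = np.array([0.1, 0.7, 0.2, 0.9, 0.0])
    print(precision_at_k(y, s, k=3), ndcg_at_k(y, s, k=3))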


Dataset | Metric | SLEEC [1] | LEML [5] | WSABIE [11] | CPLST [9] | CS [6] | ML-CSSP [8] | PfastreXML [31] | FastXML [2] | LPSR [4] | 1-vs-All [18] | kNN | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | PPD-Sparse [34] | ProXML [39]
(Methods run left to right through embedding-based, tree-based and other approaches, following the original column grouping.)

Bibtex | PSP@1 | 51.12 | 47.97 | 43.39 | 48.17 | 46.04 | 32.38 | 52.28 | 48.54 | 49.20 | 48.84 | 43.71 | 50.88 | - | 48.34 | - | 50.1
Bibtex | PSP@3 | 53.95 | 51.42 | 44.00 | 50.86 | 45.08 | 38.68 | 54.36 | 52.30 | 50.14 | 52.96 | 45.82 | 52.42 | - | 48.77 | - | 52.0
Bibtex | PSP@5 | 59.56 | 57.53 | 49.30 | 56.42 | 48.17 | 45.96 | 60.55 | 58.28 | 55.01 | 59.29 | 51.64 | 57.36 | - | 52.93 | - | 58.3

Delicious | PSP@1 | 32.11 | 30.73 | 31.25 | 31.10 | 30.60 | 29.48 | 34.57 | 32.35 | 31.34 | 31.95 | 31.03 | 32.69 | - | 25.22 | - | -
Delicious | PSP@3 | 33.21 | 32.43 | 32.02 | 32.40 | 31.84 | 30.27 | 34.80 | 34.51 | 32.57 | 33.24 | 32.02 | 34.00 | - | 24.63 | - | -
Delicious | PSP@5 | 33.83 | 33.26 | 32.47 | 33.02 | 32.26 | 30.02 | 35.86 | 35.43 | 32.77 | 33.47 | 32.43 | 34.53 | - | 23.85 | - | -

Mediamill | PSP@1 | 70.14 | 66.34 | 64.24 | 65.79 | 66.23 | 62.53 | 66.88 | 66.67 | 66.06 | 66.06 | 65.71 | 66.51 | - | 62.23 | - | 64.3
Mediamill | PSP@3 | 72.76 | 65.11 | 62.73 | 64.07 | 65.28 | 58.97 | 65.90 | 65.43 | 63.83 | 63.53 | 66.23 | 65.21 | - | 59.85 | - | 61.3
Mediamill | PSP@5 | 74.02 | 63.62 | 59.92 | 61.89 | 63.70 | 53.23 | 64.90 | 64.30 | 61.11 | 59.38 | 66.14 | 64.30 | - | 54.03 | - | 60.8

EURLex-4K | PSP@1 | 34.25 | 24.10 | 31.16 | 28.60 | 24.97 | 24.94 | 43.86 | 26.62 | 33.17 | 37.97 | - | 36.36 | 41.20 | 38.28 | 37.61 | 45.2
EURLex-4K | PSP@3 | 39.83 | 27.20 | 34.85 | 32.49 | 27.46 | 27.19 | 45.72 | 34.16 | 39.68 | 44.01 | - | 44.04 | 45.40 | 42.00 | 46.05 | 48.5
EURLex-4K | PSP@5 | 42.76 | 29.09 | 36.82 | 34.46 | 25.04 | 28.90 | 46.97 | 38.96 | 41.99 | 46.17 | - | 48.29 | 49.30 | 44.89 | 50.79 | 51.0

Table 2: Propensity Scored Precision@k on the small scale datasets


Dataset | Metric | SLEEC [1] | LEML [5] | WSABIE [11] | CPLST [9] | CS [6] | ML-CSSP [8] | PfastreXML [31] | FastXML [2] | LPSR [4] | 1-vs-All [18] | kNN | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | ProXML [39]
(Methods run left to right through embedding-based, tree-based and other approaches, following the original column grouping.)

Bibtex | PSnDCG@1 | 51.12 | 47.97 | 43.39 | 48.17 | 46.04 | 32.38 | 52.28 | 48.54 | 49.20 | 48.84 | 43.71 | 50.88 | - | 48.34 | 50.1
Bibtex | PSnDCG@3 | 52.99 | 50.25 | 43.64 | 49.94 | 45.25 | 36.73 | 53.62 | 51.11 | 49.78 | 51.62 | 45.04 | 51.90 | - | 48.49 | 52.0
Bibtex | PSnDCG@5 | 56.04 | 53.59 | 46.50 | 52.96 | 46.89 | 40.74 | 56.99 | 54.38 | 52.41 | 55.09 | 48.20 | 54.58 | - | 50.72 | 55.1

Delicious | PSnDCG@1 | 32.11 | 30.73 | 31.25 | 31.10 | 30.60 | 29.48 | 34.57 | 32.35 | 31.34 | 31.95 | 31.03 | 32.69 | - | 25.22 | -
Delicious | PSnDCG@3 | 32.93 | 32.01 | 31.84 | 32.07 | 31.54 | 30.10 | 34.71 | 34.00 | 32.29 | 32.95 | 31.76 | 33.69 | - | 24.80 | -
Delicious | PSnDCG@5 | 33.41 | 32.66 | 32.18 | 32.55 | 31.89 | 29.98 | 35.42 | 34.73 | 32.50 | 33.17 | 32.09 | 34.10 | - | 24.25 | -

Mediamill | PSnDCG@1 | 70.14 | 66.34 | 64.24 | 65.79 | 66.23 | 62.53 | 66.88 | 66.08 | 66.06 | 66.06 | 65.71 | 66.51 | - | 62.25 | 64.3
Mediamill | PSnDCG@3 | 72.31 | 65.79 | 63.47 | 64.88 | 65.89 | 60.33 | 66.47 | 66.08 | 64.83 | 64.63 | 66.39 | 65.91 | - | 61.05 | 63.6
Mediamill | PSnDCG@5 | 73.13 | 64.71 | 61.57 | 63.36 | 64.77 | 56.50 | 65.71 | 65.24 | 62.94 | 61.84 | 66.27 | 65.20 | - | 57.26 | 62.8

EURLex-4K | PSnDCG@1 | 34.25 | 24.10 | 31.16 | 28.60 | 24.97 | 25.94 | 43.86 | 26.62 | 33.17 | 37.97 | - | 36.36 | 41.20 | 38.28 | 45.2
EURLex-4K | PSnDCG@3 | 38.35 | 26.37 | 33.85 | 31.45 | 26.82 | 26.56 | 45.23 | 32.07 | 37.92 | 42.44 | - | 41.95 | 44.30 | 40.96 | 47.5
EURLex-4K | PSnDCG@5 | 40.30 | 27.62 | 35.17 | 32.77 | 25.57 | 27.67 | 46.03 | 35.23 | 39.55 | 43.97 | - | 44.78 | 46.90 | 42.84 | 49.1

Table 3: Propensity Scored nDCG@k on the small scale datasets


Dataset | Metric | SLEEC [1] | LEML [5] | PfastreXML [31] | FastXML [2] | LPSR-NB [4] | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | PPD-Sparse [34] | ProXML [39]

AmazonCat-13K | PSP@1 | 46.75 | - | 69.52 | 48.31 | - | 50.93 | 59.10 | 49.58 | - | -
AmazonCat-13K | PSP@3 | 58.46 | - | 73.22 | 60.26 | - | 64.00 | 67.10 | 61.63 | - | -
AmazonCat-13K | PSP@5 | 65.96 | - | 75.48 | 69.30 | - | 72.08 | 71.20 | 68.23 | - | -

Wiki10-31K | PSP@1 | 11.14 | 9.41 | 19.02 | 9.80 | 12.79 | 11.66 | 13.60 | - | - | -
Wiki10-31K | PSP@3 | 11.86 | 10.07 | 18.34 | 10.17 | 12.26 | 12.73 | 13.10 | - | - | -
Wiki10-31K | PSP@5 | 12.40 | 10.55 | 18.43 | 10.54 | 12.13 | 13.68 | 13.80 | - | - | -

Delicious-200K | PSP@1 | 7.17 | 6.06 | 3.15 | 6.48 | 3.24 | 7.25 | 6.5 | 5.29 | - | -
Delicious-200K | PSP@3 | 8.16 | 7.24 | 3.87 | 7.52 | 3.42 | 7.94 | 7.6 | 5.80 | - | -
Delicious-200K | PSP@5 | 8.96 | 8.10 | 4.43 | 8.31 | 3.64 | 8.52 | 8.4 | 6.24 | - | -

WikiLSHTC-325K | PSP@1 | 20.27 | 3.48 | 30.66 | 16.35 | 6.93 | 26.76 | 29.1 | 28.34 | 27.47 | 34.8
WikiLSHTC-325K | PSP@3 | 23.18 | 3.79 | 31.55 | 20.99 | 7.21 | 33.27 | 35.6 | 33.50 | 33.00 | 37.7
WikiLSHTC-325K | PSP@5 | 25.08 | 4.27 | 33.12 | 23.56 | 7.86 | 37.36 | 39.5 | 36.62 | 36.29 | 41.0

Amazon-670K | PSP@1 | 20.62 | 2.07 | 29.30 | 19.37 | 16.68 | 25.43 | 27.8 | - | 26.64 | 30.8
Amazon-670K | PSP@3 | 23.32 | 2.26 | 30.80 | 23.26 | 18.07 | 29.43 | 30.6 | - | 30.65 | 32.8
Amazon-670K | PSP@5 | 25.98 | 2.47 | 32.43 | 26.85 | 19.43 | 32.85 | 34.2 | - | 34.65 | 35.1

Ads-1M | PSP@1 | 10.75 | - | 15.81 | 12.69 | 6.91 | 10.63 | - | - | - | -
Ads-1M | PSP@3 | 15.87 | - | 20.02 | 16.42 | 10.17 | 16.27 | - | - | - | -
Ads-1M | PSP@5 | 19.11 | - | 22.68 | 18.44 | 12.41 | 20.08 | - | - | - | -

Amazon-3M | PSP@1 | - | - | 21.38 | 9.77 | - | 12.82 | - | - | - | -
Amazon-3M | PSP@3 | - | - | 23.22 | 11.69 | - | 15.61 | - | - | - | -
Amazon-3M | PSP@5 | - | - | 24.52 | 13.25 | - | 17.73 | - | - | - | -

Ads-9M | PSP@1 | - | - | 13.52 | 12.89 | - | 6.54 | - | - | - | -
Ads-9M | PSP@3 | - | - | 17.95 | 15.88 | - | 10.81 | - | - | - | -
Ads-9M | PSP@5 | - | - | 20.50 | 17.26 | - | 13.79 | - | - | - | -

Table 4: Propensity Scored Precision@k on the large scale datasets


Dataset | Metric | SLEEC [1] | LEML [5] | PfastreXML [31] | FastXML [2] | LPSR-NB [4] | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | ProXML [39]

AmazonCat-13K | PSnDCG@1 | 46.75 | - | 69.52 | 48.31 | - | 50.93 | 59.10 | 49.58 | -
AmazonCat-13K | PSnDCG@3 | 55.19 | - | 72.21 | 56.90 | - | 60.37 | 65.20 | 58.28 | -
AmazonCat-13K | PSnDCG@5 | 60.08 | - | 73.67 | 62.75 | - | 65.68 | 68.80 | 62.68 | -

Wiki10-31K | PSnDCG@1 | 11.14 | 9.41 | 19.02 | 9.80 | 12.79 | 11.66 | 13.60 | - | -
Wiki10-31K | PSnDCG@3 | 11.68 | 9.90 | 18.49 | 10.08 | 12.38 | 12.48 | 13.20 | - | -
Wiki10-31K | PSnDCG@5 | 12.06 | 10.24 | 18.52 | 10.33 | 12.27 | 13.13 | 13.60 | - | -

Delicious-200K | PSnDCG@1 | 7.17 | 6.06 | 3.15 | 6.51 | 3.24 | 7.25 | 6.5 | 5.29 | -
Delicious-200K | PSnDCG@3 | 7.89 | 6.93 | 3.68 | 7.26 | 3.37 | 7.75 | 7.5 | 5.66 | -
Delicious-200K | PSnDCG@5 | 8.44 | 7.52 | 4.06 | 7.79 | 3.52 | 8.15 | 7.9 | 5.96 | -

WikiLSHTC-325K | PSnDCG@1 | 20.27 | 3.48 | 30.66 | 16.35 | 6.93 | 26.76 | 29.1 | 28.34 | 34.8
WikiLSHTC-325K | PSnDCG@3 | 22.27 | 3.68 | 31.24 | 19.56 | 7.11 | 31.26 | 35.9 | 31.92 | 38.7
WikiLSHTC-325K | PSnDCG@5 | 23.35 | 3.94 | 32.09 | 21.02 | 7.46 | 33.57 | 39.4 | 33.68 | 41.5

Amazon-670K | PSnDCG@1 | 20.62 | 2.07 | 29.30 | 19.37 | 16.68 | 25.43 | 27.8 | - | 30.8
Amazon-670K | PSnDCG@3 | 22.63 | 2.21 | 30.40 | 22.25 | 17.70 | 28.38 | 28.8 | - | 31.7
Amazon-670K | PSnDCG@5 | 24.43 | 2.35 | 31.49 | 24.69 | 18.63 | 30.71 | 30.7 | - | 32.7

Ads-1M | PSnDCG@1 | 10.75 | - | 15.81 | 12.69 | 6.91 | 10.63 | - | - | -
Ads-1M | PSnDCG@3 | 14.03 | - | 18.54 | 15.12 | 9.02 | 14.28 | - | - | -
Ads-1M | PSnDCG@5 | 15.67 | - | 19.93 | 16.18 | 10.18 | 16.26 | - | - | -

Amazon-3M | PSnDCG@1 | - | - | 21.38 | 9.77 | - | 12.82 | - | - | -
Amazon-3M | PSnDCG@3 | - | - | 22.75 | 11.20 | - | 14.89 | - | - | -
Amazon-3M | PSnDCG@5 | - | - | 23.68 | 12.29 | - | 16.38 | - | - | -

Ads-9M | PSnDCG@1 | - | - | 13.52 | 12.89 | - | 6.54 | - | - | -
Ads-9M | PSnDCG@3 | - | - | 16.43 | 14.86 | - | 9.26 | - | - | -
Ads-9M | PSnDCG@5 | - | - | 17.79 | 15.61 | - | 10.76 | - | - | -

Table 5: Propensity Scored nDCG@k on the large scale datasets


Dataset | Metric | SLEEC [1] | LEML [5] | WSABIE [11] | CPLST [9] | CS [6] | ML-CSSP [8] | PfastreXML [31] | FastXML [2] | LPSR [4] | 1-vs-All [18] | kNN | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | PPD-Sparse [34] | ProXML [39]
(Methods run left to right through embedding-based, tree-based and other approaches, following the original column grouping.)

Bibtex | P@1 | 65.08 | 62.54 | 54.78 | 62.38 | 58.87 | 44.98 | 63.46 | 63.42 | 62.11 | 62.62 | 57.04 | 64.53 | - | 61.29 | - | 64.6
Bibtex | P@3 | 39.64 | 38.41 | 32.39 | 37.84 | 33.53 | 30.43 | 39.22 | 39.23 | 36.65 | 39.09 | 34.38 | 38.56 | - | 35.82 | - | 39.0
Bibtex | P@5 | 28.87 | 28.21 | 23.98 | 27.62 | 23.72 | 23.53 | 29.14 | 28.86 | 26.53 | 28.79 | 25.44 | 27.94 | - | 25.74 | - | 28.2

Delicious | P@1 | 67.59 | 65.67 | 64.13 | 65.31 | 61.36 | 63.04 | 67.13 | 69.61 | 65.01 | 65.02 | 64.95 | 67.44 | - | 51.82 | - | -
Delicious | P@3 | 61.38 | 60.55 | 58.13 | 59.95 | 56.46 | 56.26 | 62.33 | 64.12 | 58.96 | 58.88 | 58.89 | 61.83 | - | 44.18 | - | -
Delicious | P@5 | 56.56 | 56.08 | 53.64 | 55.31 | 52.07 | 50.16 | 58.62 | 59.27 | 53.49 | 53.28 | 54.11 | 56.75 | - | 38.95 | - | -

Mediamill | P@1 | 87.82 | 84.01 | 81.29 | 83.35 | 83.82 | 78.95 | 83.98 | 84.22 | 83.57 | 83.57 | 82.97 | 83.91 | - | 81.86 | - | 86.5
Mediamill | P@3 | 73.45 | 67.20 | 64.74 | 66.18 | 67.32 | 60.93 | 67.37 | 67.33 | 65.78 | 65.50 | 67.91 | 67.12 | - | 62.52 | 39.0 | 68.4
Mediamill | P@5 | 59.17 | 52.80 | 49.83 | 51.46 | 52.80 | 44.27 | 53.02 | 53.04 | 49.97 | 48.57 | 54.23 | 52.99 | - | 45.11 | 28.2 | 53.2

EURLex-4K | P@1 | 79.26 | 63.40 | 68.55 | 72.28 | 58.52 | 62.09 | 75.45 | 71.36 | 76.37 | 79.89 | - | 81.73 | 82.40 | 76.43 | 83.83 | 83.4
EURLex-4K | P@3 | 64.30 | 50.35 | 55.11 | 58.16 | 45.51 | 48.39 | 62.70 | 59.90 | 63.36 | 66.01 | - | 68.78 | 68.50 | 60.37 | 70.72 | 70.9
EURLex-4K | P@5 | 52.33 | 41.28 | 45.12 | 47.73 | 32.47 | 40.11 | 52.51 | 50.39 | 52.03 | 53.80 | - | 57.44 | 57.70 | 49.72 | 59.21 | 59.1

Table 6: Precision@k on the small scale datasets


Dataset | Metric | SLEEC [1] | LEML [5] | WSABIE [11] | CPLST [9] | CS [6] | ML-CSSP [8] | PfastreXML [31] | FastXML [2] | LPSR [4] | 1-vs-All [18] | kNN | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | ProXML [39]
(Methods run left to right through embedding-based, tree-based and other approaches, following the original column grouping.)

Bibtex | nDCG@1 | 65.08 | 62.54 | 54.78 | 62.38 | 58.87 | 44.98 | 63.46 | 63.42 | 62.11 | 62.62 | 57.04 | 64.53 | - | 61.29 | 64.4
Bibtex | nDCG@3 | 60.47 | 58.22 | 50.11 | 57.63 | 52.19 | 44.67 | 59.61 | 59.51 | 56.50 | 59.13 | 52.29 | 59.35 | - | 55.83 | 59.2
Bibtex | nDCG@5 | 62.64 | 60.53 | 52.39 | 59.71 | 53.25 | 47.97 | 62.12 | 61.70 | 58.23 | 61.58 | 54.64 | 61.06 | - | 57.35 | 61.5

Delicious | nDCG@1 | 67.59 | 65.67 | 64.13 | 65.31 | 61.36 | 63.04 | 67.13 | 69.61 | 65.01 | 65.02 | 64.95 | 67.44 | - | 51.82 | -
Delicious | nDCG@3 | 62.87 | 61.77 | 59.59 | 61.16 | 57.66 | 57.91 | 63.48 | 65.47 | 60.45 | 60.43 | 60.32 | 63.15 | - | 46.00 | -
Delicious | nDCG@5 | 59.28 | 58.47 | 56.25 | 57.80 | 54.44 | 53.36 | 60.74 | 61.90 | 56.38 | 56.28 | 56.77 | 59.41 | - | 42.02 | -

Mediamill | nDCG@1 | 87.82 | 84.01 | 81.29 | 83.35 | 83.82 | 78.95 | 83.98 | 84.22 | 83.57 | 83.57 | 82.97 | 83.91 | - | 81.86 | 86.5
Mediamill | nDCG@3 | 81.50 | 75.23 | 72.92 | 74.21 | 75.29 | 68.97 | 75.31 | 75.41 | 74.06 | 73.84 | 75.44 | 75.22 | - | 70.21 | 77.3
Mediamill | nDCG@5 | 79.22 | 71.96 | 69.37 | 70.55 | 71.92 | 62.88 | 72.21 | 72.37 | 69.34 | 68.18 | 72.83 | 72.21 | - | 63.71 | 75.6

EURLex-4K | nDCG@1 | 79.26 | 63.40 | 68.55 | 72.28 | 58.52 | 62.09 | 75.45 | 71.36 | 76.37 | 79.89 | - | 81.73 | 82.40 | 76.43 | 83.4
EURLex-4K | nDCG@3 | 68.13 | 53.56 | 58.44 | 61.64 | 48.67 | 51.63 | 65.97 | 62.87 | 66.63 | 69.62 | - | 72.15 | 72.50 | 64.31 | 74.4
EURLex-4K | nDCG@5 | 61.60 | 48.47 | 53.03 | 55.92 | 40.79 | 47.11 | 60.78 | 58.06 | 60.61 | 63.04 | - | 66.40 | 66.70 | 58.78 | 68.2

Table 7: nDCG@k on the small scale datasets


Dataset | Metric | SLEEC [1] | LEML [5] | PfastreXML [31] | FastXML [2] | LPSR-NB [4] | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | PPD-Sparse [34] | ProXML [39]

AmazonCat-13K | P@1 | 90.53 | - | 91.75 | 93.11 | - | 93.03 | 93.40 | 90.60 | - | -
AmazonCat-13K | P@3 | 76.33 | - | 77.97 | 78.2 | - | 79.16 | 79.10 | 75.14 | - | -
AmazonCat-13K | P@5 | 61.52 | - | 63.68 | 63.41 | - | 64.52 | 64.10 | 60.69 | - | -

Wiki10-31K | P@1 | 85.88 | 73.47 | 83.57 | 83.03 | 72.72 | 84.31 | 85.20 | - | - | -
Wiki10-31K | P@3 | 72.98 | 62.43 | 68.61 | 67.47 | 58.51 | 72.57 | 74.60 | - | - | -
Wiki10-31K | P@5 | 62.70 | 54.35 | 59.10 | 57.76 | 49.50 | 63.39 | 65.90 | - | - | -

Delicious-200K | P@1 | 47.85 | 40.73 | 41.72 | 43.07 | 18.59 | 46.97 | 45.50 | 34.37 | - | -
Delicious-200K | P@3 | 42.21 | 37.71 | 37.83 | 38.66 | 15.43 | 40.08 | 38.70 | 29.48 | - | -
Delicious-200K | P@5 | 39.43 | 35.84 | 35.58 | 36.19 | 14.07 | 36.63 | 35.50 | 27.04 | - | -

WikiLSHTC-325K | P@1 | 54.83 | 19.82 | 56.05 | 49.75 | 27.44 | 65.04 | 64.40 | 61.26 | 64.08 | 63.6
WikiLSHTC-325K | P@3 | 33.42 | 11.43 | 36.79 | 33.10 | 16.23 | 43.23 | 42.50 | 39.48 | 41.26 | 41.5
WikiLSHTC-325K | P@5 | 23.85 | 8.39 | 27.09 | 24.45 | 11.77 | 32.05 | 31.50 | 28.79 | 30.12 | 30.8

Amazon-670K | P@1 | 35.05 | 8.13 | 39.46 | 36.99 | 28.65 | 44.89 | 44.70 | - | 45.32 | 43.5
Amazon-670K | P@3 | 31.25 | 6.83 | 35.81 | 33.28 | 24.88 | 39.80 | 39.70 | - | 40.37 | 38.7
Amazon-670K | P@5 | 28.56 | 6.03 | 33.05 | 30.53 | 22.37 | 36.00 | 36.10 | - | 36.92 | 35.3

Ads-1M | P@1 | 22.03 | - | 21.70 | 24.03 | 17.95 | 23.23 | - | - | - | -
Ads-1M | P@3 | 13.71 | - | 14.17 | 14.71 | 11.98 | 15.87 | - | - | - | -
Ads-1M | P@5 | 10.33 | - | 10.97 | 10.85 | 9.33 | 12.48 | - | - | - | -

Amazon-3M | P@1 | - | - | 43.83 | 44.24 | - | 47.48 | - | - | - | -
Amazon-3M | P@3 | - | - | 41.81 | 40.83 | - | 44.65 | - | - | - | -
Amazon-3M | P@5 | - | - | 40.09 | 38.59 | - | 42.53 | - | - | - | -

Ads-9M | P@1 | - | - | 15.57 | 15.11 | - | 17.10 | - | - | - | -
Ads-9M | P@3 | - | - | 10.15 | 9.10 | - | 11.83 | - | - | - | -
Ads-9M | P@5 | - | - | 7.73 | 6.62 | - | 9.40 | - | - | - | -

Table 8: Precision@k on the large scale datasets


Dataset | Metric | SLEEC [1] | LEML [5] | PfastreXML [31] | FastXML [2] | LPSR-NB [4] | Parabel [36] | DiSMEC [32] | PD-Sparse [33] | PPD-Sparse [34] | ProXML [39]

AmazonCat-13K | nDCG@1 | 90.53 | - | 91.75 | 93.11 | - | 93.03 | 93.40 | 90.60 | - | -
AmazonCat-13K | nDCG@3 | 84.96 | - | 86.48 | 87.07 | - | 87.72 | 87.70 | 84.00 | - | -
AmazonCat-13K | nDCG@5 | 82.77 | - | 84.96 | 85.16 | - | 86.00 | 85.80 | 82.05 | - | -

Wiki10-31K | nDCG@1 | 85.88 | 73.47 | 83.57 | 84.31 | 72.72 | 83.03 | 84.10 | - | - | -
Wiki10-31K | nDCG@3 | 76.02 | 64.92 | 72.00 | 75.35 | 61.71 | 71.01 | 77.10 | - | - | -
Wiki10-31K | nDCG@5 | 68.13 | 58.69 | 64.54 | 63.36 | 54.63 | 68.30 | 70.40 | - | - | -

Delicious-200K | nDCG@1 | 47.85 | 40.73 | 41.72 | 43.07 | 18.59 | 46.97 | 45.50 | 34.37 | - | -
Delicious-200K | nDCG@3 | 43.52 | 38.44 | 38.76 | 39.70 | 16.17 | 41.72 | 40.90 | 30.60 | - | -
Delicious-200K | nDCG@5 | 41.37 | 37.01 | 37.08 | 37.83 | 15.13 | 39.07 | 37.80 | 28.65 | - | -

WikiLSHTC-325K | nDCG@1 | 54.83 | 19.82 | 56.05 | 49.75 | 27.44 | 65.04 | 64.40 | 61.26 | - | 63.8
WikiLSHTC-325K | nDCG@3 | 47.25 | 14.52 | 50.59 | 45.23 | 23.04 | 59.15 | 58.50 | 55.08 | - | 57.4
WikiLSHTC-325K | nDCG@5 | 46.16 | 13.73 | 50.13 | 44.75 | 22.55 | 58.93 | 58.40 | 54.67 | - | 57.1

Amazon-670K | nDCG@1 | 34.77 | 8.13 | 39.46 | 36.99 | 28.65 | 44.89 | 44.70 | - | - | 43.5
Amazon-670K | nDCG@3 | 32.74 | 7.30 | 37.78 | 35.11 | 26.40 | 42.14 | 42.10 | - | - | 41.1
Amazon-670K | nDCG@5 | 31.53 | 6.85 | 36.69 | 33.86 | 25.03 | 40.36 | 40.50 | - | - | 39.7

Ads-1M | nDCG@1 | 22.03 | - | 21.70 | 24.03 | 17.95 | 23.23 | - | - | - | -
Ads-1M | nDCG@3 | 24.32 | - | 24.09 | 25.02 | 19.50 | 26.13 | - | - | - | -
Ads-1M | nDCG@5 | 25.80 | - | 25.68 | 25.82 | 20.65 | 28.04 | - | - | - | -

Amazon-3M | nDCG@1 | - | - | 43.83 | 44.24 | - | 47.48 | - | - | - | -
Amazon-3M | nDCG@3 | - | - | 42.68 | 41.92 | - | 45.73 | - | - | - | -
Amazon-3M | nDCG@5 | - | - | 41.75 | 40.47 | - | 44.53 | - | - | - | -

Ads-9M | nDCG@1 | - | - | 15.57 | 15.11 | - | 17.10 | - | - | - | -
Ads-9M | nDCG@3 | - | - | 17.24 | 15.58 | - | 19.89 | - | - | - | -
Ads-9M | nDCG@5 | - | - | 18.29 | 16.01 | - | 21.79 | - | - | - | -

Table 9: nDCG@k on the large scale datasets

References

[01]     K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NIPS, 2015.

[02]     Y. Prabhu, and M. Varma, FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning, in KDD, 2014.

[03]     R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma, Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages, in WWW, 2013.

[04]     J. Weston, A. Makadia, and H. Yee, Label Partitioning For Sublinear Ranking, in ICML, 2013.

[05]     H. Yu, P. Jain, P. Kar, and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML, 2014.

[06]     D. Hsu, S. Kakade, J. Langford, and T. Zhang, Multi-Label Prediction via Compressed Sensing, in NIPS, 2009.

[07]     F. Tai, and H. Lin, Multi-label Classification with Principal Label Space Transformation, in Neural Computation, 2012.

[08]     W. Bi, and J. Kwok, Efficient Multi-label Classification with Many Labels, in ICML, 2013.

[09]     Y. Chen, and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification, in NIPS, 2012.

[10]     C. Ferng, and H. Lin, Multi-label Classification with Error-correcting Codes, in ACML, 2011.

[11]     J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation, in IJCAI, 2011.

[12]     S. Ji, L. Tang, S. Yu, and J. Ye, Extracting Shared Subspaces for Multi-label Classification, in KDD, 2008.

[13]     Z. Lin, G. Ding, M. Hu, and J. Wang, Multi-label Classification via Feature-aware Implicit Label Space Encoding, in ICML, 2014.

[14]     P. Mineiro, and N. Karampatziakis, Fast Label Embeddings via Randomized Linear Algebra, Preprint, 2015.

[15]     N. Karampatziakis, and P. Mineiro, Scalable Multilabel Prediction via Randomized Methods, Preprint, 2015.

[16]     K. Balasubramanian, and G. Lebanon, The Landmark Selection Method for Multiple Output Prediction, Preprint, 2012.

[17]     M. Cisse, N. Usunier, T. Artieres, and P. Gallinari, Robust Bloom Filters for Large Multilabel Classification Tasks, in NIPS, 2013.

[18]     B. Hariharan, S. Vishwanathan, and M. Varma, Efficient max-margin multi-label classification with applications to zero-shot learning, in Machine Learning Journal, 2012.

[19]     C. Snoek, M. Worring, J. van Gemert, J.-M. Geusebroek, and A. Smeulders, The challenge problem for automated detection of 101 semantic concepts in multimedia, in ACM Multimedia, 2006.

[20]     I. Katakis, G. Tsoumakas, and I. Vlahavas, Multilabel text classification for automated tag suggestion, in ECML/PKDD Discovery Challenge, 2008.

[21]     G. Tsoumakas, I. Katakis, and I. Vlahavas, Effective and efficient multilabel classification in domains with large number of labels, in ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.

[22]     J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection, 2014.

[23]     A. Zubiaga, Enhancing navigation on wikipedia with social tags, Preprint, 2009.

[24]     R. Wetzker, C. Zimmermann, and C. Bauckhage, Analyzing social bookmarking systems: A del.icio.us cookbook, in Mining Social Data (MSoDa) Workshop Proceedings, ECAI, 2008.

[25]     I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artieres, G. Paliouras, E. Gaussier, I. Androutsopoulos, M.-R. Amini, and P. Gallinari, LSHTC: A Benchmark for Large-Scale Text Classification, Preprint, 2015.

[26]     D. D. Lewis, Y. Yang, T. Rose, and F. Li, RCV1: A New Benchmark Collection for Text Categorization Research, in Journal of Machine Learning Research, 2004.

[27]     E. L. Mencia, and J. Furnkranz, Efficient Pairwise Multilabel Classification for Large-scale Problems in the Legal Domain, in ECML/PKDD, 2008.

[28]     J. McAuley, and J. Leskovec, Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text, in Proceedings of the 7th ACM Conference on Recommender Systems, 2013.

[29]     J. McAuley, C. Targett, Q. Shi, and A. v. d. Hengel, Image-based Recommendations on Styles and Substitutes, in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.

[30]     J. McAuley, R. Pandey, and J. Leskovec, Inferring Networks of Substitutable and Complementary Products, in KDD, 2015.

[31]     H. Jain, Y. Prabhu, and M. Varma, Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications, in KDD, 2016.

[32]     R. Babbar, and B. Schölkopf, DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification, in WSDM, 2017.

[33]     I. E. H. Yen, X. Huang, K. Zhong, P. Ravikumar, and I. S. Dhillon, PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification, in ICML, 2016.

[34]     I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar, I. S. Dhillon, and E. P. Xing, PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification, in KDD, 2017.

[35]     K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hullermeier, Extreme F-Measure Maximization using Sparse Probability Estimates, in Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1435-1444, 2016.

[36]     Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma, Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising, in WWW, 2018.

[37]     A. Niculescu-Mizil, and E. Abbasnejad, Label Filters for Large Scale Multilabel Classification, in AISTATS, 2017.

[38]     H. Jain, V. Balasubramanian, B. Chunduri, and M. Varma, Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches, in WSDM, 2019.

[39]     R. Babbar, and B. Schölkopf, Data Scarcity, Robustness and Extreme Multi-label Classification, in Machine Learning Journal and European Conference on Machine Learning, 2019.

[40]     S. Khandagale, H. Xiao, and R. Babbar, Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification, arXiv preprint, 2019.

[41]     W. Siblini, F. Meyer, and P. Kuntz, CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning, in ICML, 2018.

[42]     V. Gupta, R. Wadbude, N. Natarajan, H. Karnick, P. Jain, and P. Rai, Distributional Semantics meets Multi-Label Learning, in AAAI, 2019.

[43]     I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation, in Natural Legal Language Processing Workshop, 2019.

[43b]    I. Chalkidis, E. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, EURLEX57K Dataset.

[44]     I. Chalkidis, E. Fergadiotis, P. Malakasiotis, and I. Androutsopoulos, Large-Scale Multi-Label Text Classification on EU Legislation, in ACL, 2019.

[45]     G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, and I. Vlahavas, Mulan: A Java Library for Multi-Label Learning, in JMLR, 2011.