diff --git a/README.md b/README.md
index 0bdaa74..99bedee 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,21 @@
 ---
 language: 
 - multilingual
-- en
+- en
+- ar
+- bg
+- de
+- el
+- es
+- fr
+- hi
+- ru
+- sw
+- th
+- tr
+- ur
+- vi
+- zh
 tags:
 - zero-shot-classification
 - text-classification
@@ -10,8 +24,8 @@ tags:
 metrics:
 - accuracy
 datasets:
-- mnli
 - xnli
+- mnli
 pipeline_tag: zero-shot-classification
 widget:
 - text: "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
@@ -42,7 +56,7 @@ print(prediction)
 ```
 
 ### Training data
-This model was trained on the development set of the XNLI dataset and the MNLI dataset. The XNLI development set consists of 5010 professionally translated texts for each of 15 languages (see [this paper](https://arxiv.org/pdf/1809.05053.pdf)). Note that the XNLI train set also contains machine 15 machine translated versions of the MNLI dataset, but due to quality issues with these machine translations, the model was only trained on the XNLI development and the original English MNLI training set (392 702 texts). Not using machine translated texts can avoid overfitting the model to the 15 languages and avoid catastrophic forgetting of the other 85 languages mDeBERTa was pre-trained on. 
+This model was trained on the XNLI development set and the MNLI training set. The XNLI development set consists of 5010 professionally translated texts for each of 15 languages (see [this paper](https://arxiv.org/pdf/1809.05053.pdf)). Note that XNLI also contains a training set with machine-translated versions of the MNLI training data for the other XNLI languages. Because of quality issues with these machine translations, this model was only trained on the professional translations from the XNLI development set and on the original English MNLI training set (392 702 texts). Avoiding machine-translated texts can help prevent overfitting the model to these 15 languages and catastrophic forgetting of the other 85 languages that mDeBERTa was pre-trained on.
 
 ### Training procedure
 DeBERTa-v3-base-mnli was trained using the Hugging Face trainer with the following hyperparameters.
@@ -57,13 +71,12 @@ training_args = TrainingArguments(
 )
 ```
 ### Eval results
-The model was evaluated using the matched test set and achieves 0.90 accuracy.
+The model was evaluated on the XNLI test set. Note that if other multilingual models on the model hub claim accuracy of around 90% on languages other than English, their authors most likely made a mistake during testing: none of the latest papers reports a multilingual average accuracy more than a few points above 80% on XNLI (see [here](https://arxiv.org/pdf/2111.09543.pdf) or [here](https://arxiv.org/pdf/1911.02116.pdf)).
 
 average | ar | bg | de | el | en | es | fr | hi | ru | sw | th | tr | ur | vu | zh 
 ---------|----------|---------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------
 0.808 | 0.802 | 0.829 | 0.825 | 0.826 | 0.883 | 0.845 | 0.834 | 0.771 | 0.813 | 0.748 | 0.793 | 0.807 | 0.740 | 0.795 | 0.8116
 
-{'ar': 0.8017964071856287, 'bg': 0.8287425149700599, 'de': 0.8253493013972056, 'el': 0.8267465069860279, 'en': 0.8830339321357286, 'es': 0.8449101796407186, 'fr': 0.8343313373253493, 'hi': 0.7712574850299401, 'ru': 0.8127744510978044, 'sw': 0.7483033932135729, 'th': 0.792814371257485, 'tr': 0.8065868263473054, 'ur': 0.7403193612774451, 'vi': 0.7954091816367266, 'zh': 0.8115768463073852}
 
 ## Limitations and bias
 Please consult the original DeBERTa-V3 paper and literature on different NLI datasets for potential biases.