From dada2f51640768682a07331e7c244c0d25e9dc85 Mon Sep 17 00:00:00 2001
From: Moritz Laurer
Date: Sat, 18 Jun 2022 09:27:15 +0000
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c139fb1..3a8b4a9 100644
--- a/README.md
+++ b/README.md
@@ -61,7 +61,7 @@ print(prediction)
 ```
 
 ### Training data
-This model was trained on the XNLI development dataset and the MNLI train dataset. The XNLI development set consists of 2490 professionally translated texts for each of 15 languages (37350 in total) (see [this paper](https://arxiv.org/pdf/1809.05053.pdf)). Note that the XNLI contains a training set of 15 machine translated versions of the MNLI dataset for 15 languages, but due to quality issues with these machine translations, this model was only trained on the professional translations from the XNLI development set and the original English MNLI training set (392 702 texts). Not using machine translated texts can avoid overfitting the model to the 15 languages; avoids catastrophic forgetting of the other 85 languages mDeBERTa was pre-trained on; and significantly reduces training costs.
+This model was trained on the XNLI development dataset and the MNLI train dataset. The XNLI development set consists of 2490 professionally translated texts from English to 14 other languages (37350 texts in total) (see [this paper](https://arxiv.org/pdf/1809.05053.pdf)). Note that the XNLI contains a training set of 15 machine translated versions of the MNLI dataset for 15 languages, but due to quality issues with these machine translations, this model was only trained on the professional translations from the XNLI development set and the original English MNLI training set (392 702 texts). Not using machine translated texts can avoid overfitting the model to the 15 languages; avoids catastrophic forgetting of the other 85 languages mDeBERTa was pre-trained on; and significantly reduces training costs.
 
 ### Training procedure
 mDeBERTa-v3-base-mnli-xnli was trained using the Hugging Face trainer with the following hyperparameters.