---
license: mit
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: xlm-roberta-base-language-detection
  results: []
---

# xlm-roberta-base-language-detection

This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.

## Intended uses & limitations

You can use this model directly as a language detector, that is, as a sequence classifier that assigns a single language label to an input text. It currently supports the following 20 languages:

`Arabic (ar), Bulgarian (bg), German (de), Modern Greek (el), English (en), Spanish (es), French (fr), Hindi (hi), Italian (it), Japanese (ja), Dutch (nl), Polish (pl), Portuguese (pt), Russian (ru), Swahili (sw), Thai (th), Turkish (tr), Urdu (ur), Vietnamese (vi), and Chinese (zh)`
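As an illustration of the language-detector use case, here is a minimal sketch based on the `transformers` text-classification pipeline. The repository id `papluca/xlm-roberta-base-language-detection` is an assumption inferred from the dataset namespace, and `load_classifier` and `detect_language` are hypothetical helper names; adjust them to your setup. The sketch also assumes the pipeline returns a list of `{"label", "score"}` dicts with the top prediction first.

```python
from typing import Callable, Dict, List, Tuple

# Assumed repository id (not stated in this card); replace with the real one.
ASSUMED_MODEL_ID = "papluca/xlm-roberta-base-language-detection"


def load_classifier(model_id: str = ASSUMED_MODEL_ID):
    """Build a text-classification pipeline for the fine-tuned model."""
    # Imported lazily so detect_language can be used with any callable
    # that has the same output shape, even without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def detect_language(
    text: str,
    classifier: Callable[[str], List[Dict[str, object]]],
) -> Tuple[str, float]:
    """Return the predicted label and its score for a single text."""
    # For one input string the pipeline is expected to return a list whose
    # first element is the highest-scoring prediction.
    best = classifier(text)[0]
    return str(best["label"]), float(best["score"])


if __name__ == "__main__":
    clf = load_classifier()  # downloads the model on first use
    print(detect_language("Brevity is the soul of wit.", clf))
```

Passing the classifier in as an argument keeps model loading (slow, network-bound) separate from prediction, so the pipeline can be built once and reused across many calls.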

## Training and evaluation data

The fine-tuned model achieves the following results on the evaluation set:
- Loss: 0.0103
- Accuracy: 0.9977
- F1: 0.9977
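For readers who want to recompute these metrics on their own predictions, here is a small dependency-free sketch. The card does not say how F1 is averaged across the 20 classes, so the macro average used here is an assumption, and the function names are illustrative.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (averaging mode is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    refs = ["en", "en", "fr", "de"]
    preds = ["en", "fr", "fr", "de"]
    print(accuracy(refs, preds), macro_f1(refs, preds))
```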

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP
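To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (none is listed above) and takes the total of 2188 steps from the training results table; `linear_lr` is an illustrative name, not a library function.

```python
def linear_lr(
    step: int,
    total_steps: int = 2188,   # final step reported in the training results
    base_lr: float = 2e-5,     # learning_rate listed above
    warmup_steps: int = 0,     # assumption: no warmup is listed in this card
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 1094, 2188):
        print(s, linear_lr(s))
```

Under these assumptions the learning rate is about half of its initial value at the end of the first epoch and reaches zero at the final step.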

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.2492        | 1.0   | 1094 | 0.0149          | 0.9969   | 0.9969 |
| 0.0101        | 2.0   | 2188 | 0.0103          | 0.9977   | 0.9977 |
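The step counts in the table can be cross-checked against the batch size. The sketch below bounds the training set size implied by 1094 steps per epoch at a batch size of 64. It assumes a single device, no gradient accumulation, and that a final partial batch is kept; none of these is stated in the card.

```python
from typing import Tuple


def train_set_size_bounds(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Smallest and largest dataset sizes giving ceil(n / batch_size) == steps_per_epoch."""
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


if __name__ == "__main__":
    # 1094 steps per epoch and train_batch_size 64 are taken from this card.
    print(train_set_size_bounds(1094, 64))  # (69953, 70016)
```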


### Framework versions

- Transformers 4.12.5
- PyTorch 1.10.0+cu111
- Datasets 1.15.1
- Tokenizers 0.10.3