New improved model trained on full dataset

jeanpoll 2021-04-27 23:35:49 -04:00
parent 6de73976f0
commit 58670260a4
2 changed files with 22 additions and 19 deletions

@@ -3,7 +3,7 @@ language: fr
datasets:
- Jean-Baptiste/wikiner_fr
widget:
-- text: "Je m'appelle Jean-Baptiste et je vis à Paris"
+- text: "Je m'appelle jean-baptiste et je vis à montréal"
---
# camembert-ner: model fine-tuned from camemBERT for NER task.
@@ -11,7 +11,9 @@ widget:
## Introduction
[camembert-ner] is a NER model that was fine-tuned from camemBERT on wikiner-fr dataset.
-Model was trained on subset of wikiner-fr dataset (~36 000 sentences)
+Model was trained on wikiner-fr dataset (~170 634 sentences).
+Model was validated on emails/chat data and outperformed other models on this type of data specifically.
+In particular the model seems to work better on entities that don't start with an upper case letter.
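
The lowercase claim above can be probed informally by running the same sentence through the model in its original and lowercased forms. The sketch below is only an illustration: `compare_casing` is a hypothetical helper that does not appear in the model card, and it accepts any callable that maps a string to a list of entity dicts, so it can be tried without downloading a model.

```python
from typing import Callable, Dict, List


def compare_casing(
    ner: Callable[[str], List[dict]], text: str
) -> Dict[str, List[dict]]:
    """Run `ner` on the text as given and on its lowercased form.

    Hypothetical helper for illustration only; `ner` can be any callable
    that returns a list of entity dicts for a string.
    """
    return {"original": ner(text), "lowercased": ner(text.lower())}


# Assumed usage with a real model (needs `transformers` and network access;
# the model id below is an assumption inferred from the dataset namespace):
#
#   from transformers import pipeline
#   ner = pipeline("ner", model="Jean-Baptiste/camembert-ner",
#                  aggregation_strategy="simple")
#   print(compare_casing(ner, "Je m'appelle Jean-Baptiste et je vis à Montréal"))
```

Comparing the two lists side by side shows whether entities such as names and cities are still detected once the capital letters are gone.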
## How to use camembert-ner with HuggingFace
@@ -81,24 +83,25 @@ nlp("Apple est créée le 1er avril 1976 dans le garage de la maison d'enfance d
Global
```
-'precision': 0.8830965723967158
-'recall': 0.8915789473684211
-'f1': 0.8873174883781837
+'precision': 0.8859
+'recall': 0.8971
+'f1': 0.8914
```
By entity
```
-'LOC': {'precision': 0.8701754385964913,
-'recall': 0.8878281622911695,
-'f1': 0.8789131718842291},
-'MISC': {'precision': 0.831053901850362,
-'recall': 0.815955766192733,
-'f1': 0.823435631725787},
-'ORG': {'precision': 0.8620199146514936,
-'recall': 0.8335625859697386,
-'f1': 0.8475524475524475},
-'PER': {'precision': 0.9367143476376246,
-'recall': 0.9583148558758315,
-'f1': 0.947391494958}
+'LOC': {'precision': 0.8905576596578294,
+'recall': 0.900554675118859,
+'f1': 0.8955282684352223},
+'MISC': {'precision': 0.8175627240143369,
+'recall': 0.8117437722419929,
+'f1': 0.8146428571428571},
+'ORG': {'precision': 0.8099480326651819,
+'recall': 0.8265151515151515,
+'f1': 0.8181477315335584},
+'PER': {'precision': 0.9372509960159362,
+'recall': 0.959812321501428,
+'f1': 0.9483975005039308}
```
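
As a sanity check on the figures above, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it for the LOC and PER rows of the second set of per-entity results (taken here to be the retrained model's). `f1_score` is a local helper written for this example, not a function from the model card or from any evaluation library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the per-entity results above.
reported = {
    "LOC": (0.8905576596578294, 0.900554675118859, 0.8955282684352223),
    "PER": (0.9372509960159362, 0.959812321501428, 0.9483975005039308),
}

for label, (p, r, f1) in reported.items():
    # Print the recomputed value next to the reported one for comparison.
    print(label, round(f1_score(p, r), 6), round(f1, 6))
```

The global figures are rounded to four decimals, so recomputing F1 from them can differ from the reported value in the last digit.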

BIN
pytorch_model.bin (Stored with Git LFS)

Binary file not shown.