Update README.md
parent 12cca13ad4
commit ea64d5f399
64
README.md
|
@@ -24,16 +24,18 @@ Training data was classified as follow:
|
||||||
|
|
||||||
Abbreviation|Description
|
Abbreviation|Description
|
||||||
-|-
|
-|-
|
||||||
O| Outside of a named entity
|
O |Outside of a named entity
|
||||||
MISC | Miscellaneous entity
|
MISC |Miscellaneous entity
|
||||||
PER | Person’s name
|
PER |Person’s name
|
||||||
ORG | Organization
|
ORG |Organization
|
||||||
LOC | Location
|
LOC |Location
|
||||||
|
|
||||||
In order to simplify, the prefix B- or I- from original conll2003 was removed.
|
In order to simplify, the prefix B- or I- from original conll2003 was removed.
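The prefix removal described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's preprocessing code: the helper name `strip_bio_prefix` and the sample tags are hypothetical, and it assumes the tags are plain strings such as `B-PER` or `I-ORG`.

```python
def strip_bio_prefix(tag: str) -> str:
    """Map 'B-PER' / 'I-PER' to 'PER'; leave 'O' and unprefixed tags unchanged."""
    if tag.startswith(("B-", "I-")):
        return tag[2:]
    return tag


# Hypothetical sample of conll2003-style tags (not taken from the dataset).
tags = ["B-PER", "I-PER", "O", "B-ORG", "B-MISC", "I-LOC"]
simplified = [strip_bio_prefix(t) for t in tags]
print(simplified)
```

After this mapping, adjacent entities of the same type can no longer be told apart by the tags alone, which is the trade-off of the simplification.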
|
||||||
I used the train and test dataset from original conll2003 for training and the "validation" dataset for validation. This resulted in a dataset of size:
|
I used the train and test dataset from original conll2003 for training and the "validation" dataset for validation. This resulted in a dataset of size:
|
||||||
Train | 17494
|
|
||||||
Validation | 3250
|
Train | Validation
|
||||||
|
-|-
|
||||||
|
17494 | 3250
|
||||||
|
|
||||||
## How to use camembert-ner with HuggingFace
|
## How to use camembert-ner with HuggingFace
|
||||||
|
|
||||||
|
@@ -90,31 +92,31 @@ nlp("Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne to
|
||||||
## Model performances
|
## Model performances
|
||||||
|
|
||||||
Model performances computed on conll2003 validation dataset (computed on the tokens predictions)
|
Model performances computed on conll2003 validation dataset (computed on the tokens predictions)
|
||||||
```
|
|
||||||
entity | precision | recall | f1
|
entity|precision|recall|f1
|
||||||
- | - | - | -
|
-|-|-|-
|
||||||
PER | 0.9914 | 0.9927 | 0.9920
|
PER|0.9914|0.9927|0.9920
|
||||||
ORG | 0.9627 | 0.9661 | 0.9644
|
PER|0.9914|0.9927|0.9920
|
||||||
LOC | 0.9795 | 0.9862 | 0.9828
|
ORG|0.9627|0.9661|0.9644
|
||||||
MISC | 0.9292 | 0.9262 | 0.9277
|
LOC|0.9795|0.9862|0.9828
|
||||||
Overall | 0.9740 | 0.9766 | 0.9753
|
MISC|0.9292|0.9262|0.9277
|
||||||
```
|
Overall|0.9740|0.9766|0.9753
|
||||||
|
|
||||||
|
|
||||||
On private dataset (email, chat, informal discussion), computed on word predictions:
|
On private dataset (email, chat, informal discussion), computed on word predictions:
|
||||||
```
|
|
||||||
entity | precision | recall | f1
|
|
||||||
- | - | - | -
|
|
||||||
PER | 0.8823 | 0.9116 | 0.8967
|
|
||||||
ORG | 0.7694 | 0.7292 | 0.7487
|
|
||||||
LOC | 0.8619 | 0.7768 | 0.8171
|
|
||||||
```
|
|
||||||
|
|
||||||
Spacy (en_core_web_trf-3.2.0) on the same private dataset was giving:
|
entity|precision|recall|f1
|
||||||
```
|
-|-|-|-
|
||||||
entity | precision | recall | f1
|
PER|0.8823|0.9116|0.8967
|
||||||
- | - | - | -
|
ORG|0.7694|0.7292|0.7487
|
||||||
PER | 0.9146 | 0.8287 | 0.8695
|
LOC|0.8619|0.7768|0.8171
|
||||||
ORG | 0.7655 | 0.6437 | 0.6993
|
|
||||||
LOC | 0.8727 | 0.6180 | 0.7236
|
By comparison on the same private dataset, Spacy (en_core_web_trf-3.2.0) was giving:
|
||||||
```
|
|
||||||
|
entity|precision|recall|f1
|
||||||
|
-|-|-|-
|
||||||
|
PER|0.9146|0.8287|0.8695
|
||||||
|
ORG|0.7655|0.6437|0.6993
|
||||||
|
LOC|0.8727|0.6180|0.7236
|
||||||
|
|
||||||
|
|
||||||
|
|
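The README reports per-entity precision, recall and f1 computed on token (or word) predictions, but does not show how they were computed. The sketch below is one plausible way to obtain such figures, under the assumption that gold and predicted labels are aligned lists of simplified tags and that `O` is excluded from scoring. The function name `per_entity_scores` and the sample labels are hypothetical.

```python
from collections import Counter


def per_entity_scores(gold, pred, ignore=("O",)):
    """Return {label: (precision, recall, f1)} from aligned label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g not in ignore:
                tp[g] += 1  # correct entity token
        else:
            if p not in ignore:
                fp[p] += 1  # predicted an entity label that is wrong
            if g not in ignore:
                fn[g] += 1  # missed the gold entity label
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy example, one label per token.
gold = ["PER", "PER", "O", "ORG", "LOC", "O"]
pred = ["PER", "O", "O", "ORG", "ORG", "O"]
scores = per_entity_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label}|{p:.4f}|{r:.4f}|{f:.4f}")
```

Scoring per token rather than per entity span gives partial credit for partially recognized names, so these numbers are not directly comparable to span-level conll2003 scores.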