Jean-Baptiste/camembert-ner is a repository forked from Hugging Face. License: MIT

language: fr
datasets: Jean-Baptiste/wikiner_fr
widget text: "Je m'appelle jean-baptiste et je vis à montréal"

camembert-ner: a model fine-tuned from CamemBERT for the NER task

Introduction

camembert-ner is a NER model fine-tuned from CamemBERT on the wikiner-fr dataset (~170,634 sentences). The model was validated on email and chat data, where it outperformed other models on this type of data specifically. In particular, it seems to work better on entities that don't start with an uppercase letter.

How to use camembert-ner with HuggingFace

Load camembert-ner and its sub-word tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/camembert-ner")
model = AutoModelForTokenClassification.from_pretrained("Jean-Baptiste/camembert-ner")


# Process a text sample (from Wikipedia)

from transformers import pipeline

# aggregation_strategy="simple" groups sub-word predictions into whole entities;
# it replaces the deprecated grouped_entities=True argument.
nlp = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
nlp("Apple est créée le 1er avril 1976 dans le garage de la maison d'enfance de Steve Jobs à Los Altos en Californie par Steve Jobs, Steve Wozniak et Ronald Wayne14, puis constituée sous forme de société le 3 janvier 1977 à l'origine sous le nom d'Apple Computer, mais pour ses 30 ans et pour refléter la diversification de ses produits, le mot « computer » est retiré le 9 janvier 2015.")
```


```
[{'entity_group': 'ORG',
  'score': 0.9472818374633789,
  'word': 'Apple',
  'start': 0,
  'end': 5},
 {'entity_group': 'PER',
  'score': 0.9838564991950989,
  'word': 'Steve Jobs',
  'start': 74,
  'end': 85},
 {'entity_group': 'LOC',
  'score': 0.9831605950991312,
  'word': 'Los Altos',
  'start': 87,
  'end': 97},
 {'entity_group': 'LOC',
  'score': 0.9834540486335754,
  'word': 'Californie',
  'start': 100,
  'end': 111},
 {'entity_group': 'PER',
  'score': 0.9841555754343668,
  'word': 'Steve Jobs',
  'start': 115,
  'end': 126},
 {'entity_group': 'PER',
  'score': 0.9843501806259155,
  'word': 'Steve Wozniak',
  'start': 127,
  'end': 141},
 {'entity_group': 'PER',
  'score': 0.9841533899307251,
  'word': 'Ronald Wayne',
  'start': 144,
  'end': 157},
 {'entity_group': 'ORG',
  'score': 0.9468960364659628,
  'word': 'Apple Computer',
  'start': 243,
  'end': 257}]
```
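Each `entity_group` entry in the pipeline output merges several token-level predictions into one span. The sketch below illustrates that idea in plain Python. It is a simplified, hypothetical stand-in, not the transformers implementation of `aggregation_strategy="simple"`. It assumes every token arrives with character offsets, an IO-style label (such as `I-PER`, `I-LOC` or `O`; the exact camembert-ner label set is an assumption here) and a score, then merges consecutive tokens of the same entity type and averages their scores. The `Token` class, the `group_entities` helper and the example scores are all invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Token:
    """Hypothetical token-level prediction (illustration only)."""
    start: int    # character offset where the token starts
    end: int      # character offset where the token ends (exclusive)
    label: str    # IO-style label, e.g. "I-PER", "I-LOC" or "O"
    score: float  # model score for that label


def group_entities(text, tokens):
    """Merge consecutive tokens sharing an entity type into one group.

    Simplified sketch: the real pipeline also handles B- tags, sub-word
    merging and several aggregation strategies.
    """
    groups = []
    current = None  # (entity type, tokens) of the group being built
    # A trailing "O" sentinel flushes the last open group.
    for tok in list(tokens) + [Token(len(text), len(text), "O", 0.0)]:
        ent_type = None if tok.label == "O" else tok.label.split("-", 1)[-1]
        if current is not None and ent_type == current[0]:
            current[1].append(tok)  # same type: extend the current group
            continue
        if current is not None:  # type changed: close the current group
            ent, toks = current
            groups.append({
                "entity_group": ent,
                "score": mean(t.score for t in toks),
                "word": text[toks[0].start:toks[-1].end],
                "start": toks[0].start,
                "end": toks[-1].end,
            })
        current = (ent_type, [tok]) if ent_type is not None else None
    return groups


# Toy input with invented scores, not real camembert-ner output.
text = "Steve Jobs vit à Los Altos"
tokens = [
    Token(0, 5, "I-PER", 0.99),    # Steve
    Token(6, 10, "I-PER", 0.97),   # Jobs
    Token(11, 14, "O", 0.99),      # vit
    Token(15, 16, "O", 0.99),      # à
    Token(17, 20, "I-LOC", 0.98),  # Los
    Token(21, 26, "I-LOC", 0.96),  # Altos
]
for group in group_entities(text, tokens):
    print(group)
```

Because IO labels carry no B- boundary marker, this sketch would merge two adjacent entities of the same type into one group unless an `O` token separates them; in the Apple example, the comma between "Steve Jobs" and "Steve Wozniak" plays that role.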

Model performance (metric: seqeval)

Global

| precision | recall | f1 |
|---|---|---|
| 0.8859 | 0.8971 | 0.8914 |
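seqeval scores entities rather than individual tokens: a predicted entity counts as correct only if both its type and its exact span match a gold entity. Its precision, recall and F1 functions micro-average over all entities by default, so the global figures are presumably micro-averaged. The sketch below reproduces that counting on IO/BIO tag sequences. It is a simplified, hypothetical re-implementation for illustration (it ignores the E-/S- tags and strict mode that seqeval also supports), and the tag sequences are toy data, not the wikiner-fr evaluation set.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) entity spans in one tag sequence.

    Simplified: a chunk continues while the tag keeps the same entity type
    and is not a B- tag; `end` is exclusive.
    """
    entities = set()
    start, ent_type = None, None
    # A trailing "O" closes any chunk still open at the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, typ = tag.partition("-")
        typ = typ or None  # "O" has no entity type
        continues = typ is not None and typ == ent_type and prefix != "B"
        if ent_type is not None and not continues:
            entities.add((ent_type, start, i))
            ent_type = None
        if typ is not None and not continues:
            start, ent_type = i, typ
    return entities


def micro_prf(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall and F1."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_ents, pred_ents = extract_entities(gold), extract_entities(pred)
        tp += len(gold_ents & pred_ents)  # same type and same span
        n_pred += len(pred_ents)
        n_gold += len(gold_ents)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy tag sequences (one sentence), not real evaluation data.
gold = [["I-PER", "I-PER", "O", "O", "I-LOC", "I-LOC"]]
pred = [["I-PER", "I-PER", "O", "O", "I-LOC", "O"]]
print(micro_prf(gold, pred))
```

In the example the predicted LOC span is one token short, so it counts as both a false positive and a false negative. That halves precision and recall even though five of the six tags match.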

By entity

| entity | precision | recall | f1 |
|---|---|---|---|
| LOC | 0.8906 | 0.9006 | 0.8955 |
| MISC | 0.8176 | 0.8117 | 0.8146 |
| ORG | 0.8099 | 0.8265 | 0.8181 |
| PER | 0.9373 | 0.9598 | 0.9484 |
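Each F1 value is the harmonic mean of the corresponding precision and recall, F1 = 2PR / (P + R). The short check below recomputes the per-entity F1 scores from the full-precision precision and recall values reported for this model. It only verifies the arithmetic and says nothing about how the underlying predictions were produced; `REPORTED` and `f1_score` are illustrative names.

```python
# Full-precision per-entity scores as reported: (precision, recall, f1).
REPORTED = {
    "LOC": (0.8905576596578294, 0.900554675118859, 0.8955282684352223),
    "MISC": (0.8175627240143369, 0.8117437722419929, 0.8146428571428571),
    "ORG": (0.8099480326651819, 0.8265151515151515, 0.8181477315335584),
    "PER": (0.9372509960159362, 0.959812321501428, 0.9483975005039308),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for entity, (p, r, f1) in REPORTED.items():
    print(f"{entity}: reported {f1:.4f}, recomputed {f1_score(p, r):.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, MISC and ORG end up with the lowest F1 scores, while PER benefits from both high precision and high recall.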