diff --git a/README.md b/README.md
index 3d5e894..631f277 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,17 @@ Model was trained on wikiner-fr dataset (~170 634 sentences).
 Model was validated on emails/chat data and overperformed other models on this type of data specifically.
 In particular the model seems to work better on entity that don't start with an upper case.
 
+## Training data
+Training data was classified as follows:
+
+Abbreviation|Description
+-|-
+O |Outside of a named entity
+MISC |Miscellaneous entity
+PER |Person’s name
+ORG |Organization
+LOC |Location
+
 
 ## How to use camembert-ner with HuggingFace
 
@@ -81,29 +92,23 @@ nlp("Apple est créée le 1er avril 1976 dans le garage de la maison d'enfance d
 
 ## Model performances (metric: seqeval)
 
-Global
-```
-'precision': 0.8859
-'recall': 0.8971
-'f1': 0.8914
-```
+Overall
+
+precision|recall|f1
+-|-|-
+0.8859|0.8971|0.8914
 
 By entity
 
-```
-'LOC': {'precision': 0.8905576596578294,
- 'recall': 0.900554675118859,
- 'f1': 0.8955282684352223},
-'MISC': {'precision': 0.8175627240143369,
- 'recall': 0.8117437722419929,
- 'f1': 0.8146428571428571},
-'ORG': {'precision': 0.8099480326651819,
- 'recall': 0.8265151515151515,
- 'f1': 0.8181477315335584},
-'PER': {'precision': 0.9372509960159362,
- 'recall': 0.959812321501428,
- 'f1': 0.9483975005039308}
- ```
+entity|precision|recall|f1
+-|-|-|-
+PER|0.9372|0.9598|0.9483
+ORG|0.8099|0.8265|0.8181
+LOC|0.8905|0.9005|0.8955
+MISC|0.8175|0.8117|0.8146
+
+
+
 A short article on how I used the result of this model to train a LSTM model for signature detection in emails:
 https://medium.com/@jean-baptiste.polle/lstm-model-for-email-signature-detection-8e990384fefa
 
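The second hunk replaces the raw per-entity metrics dict with a pipe table, and the table values are cut to four decimals rather than rounded (0.89055… becomes 0.8905, not 0.8906). Below is a minimal Python sketch of that conversion, assuming the dict shape shown in the removed lines. `truncate4` and `to_markdown_table` are hypothetical helpers written for illustration; they are not part of camembert-ner, seqeval, or transformers.

```python
from decimal import Decimal, ROUND_DOWN

# Per-entity scores copied from the dict removed by the diff.
# Assumption: scores arrive as {"ENTITY": {"precision", "recall", "f1"}};
# no specific seqeval or transformers call is implied.
scores = {
    "LOC": {"precision": 0.8905576596578294, "recall": 0.900554675118859, "f1": 0.8955282684352223},
    "MISC": {"precision": 0.8175627240143369, "recall": 0.8117437722419929, "f1": 0.8146428571428571},
    "ORG": {"precision": 0.8099480326651819, "recall": 0.8265151515151515, "f1": 0.8181477315335584},
    "PER": {"precision": 0.9372509960159362, "recall": 0.959812321501428, "f1": 0.9483975005039308},
}


def truncate4(value: float) -> str:
    """Hypothetical helper: cut a score to 4 decimals without rounding."""
    # str(value) gives the shortest repr, so Decimal sees the printed digits.
    return str(Decimal(str(value)).quantize(Decimal("0.0001"), rounding=ROUND_DOWN))


def to_markdown_table(scores: dict, order: list) -> str:
    """Hypothetical helper: render per-entity scores as a pipe table."""
    lines = ["entity|precision|recall|f1", "-|-|-|-"]
    for entity in order:
        s = scores[entity]
        cells = [entity, truncate4(s["precision"]), truncate4(s["recall"]), truncate4(s["f1"])]
        lines.append("|".join(cells))
    return "\n".join(lines)


# Same row order as the table added by the diff.
table = to_markdown_table(scores, ["PER", "ORG", "LOC", "MISC"])
print(table)
```

Building the `Decimal` from `str(value)` and quantizing with `ROUND_DOWN` reproduces the truncated figures in the diff; switching to `ROUND_HALF_UP` would give conventionally rounded values, which differ in the last digit for several entries (for example LOC precision and PER f1).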