Camembert-base model fine-tuned on the French part of the XNLI dataset.
One of the few Zero-Shot classification models working on French 🇫🇷
Intended uses & limitations
How to use
Two different usages:
As a Zero-Shot sequence classifier:
```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="BaptisteDoyen/camembert-base-xnli"
)

sequence = "L'équipe de France joue aujourd'hui au Parc des Princes"
candidate_labels = ["sport", "politique", "science"]
hypothesis_template = "Ce texte parle de {}."

classifier(sequence, candidate_labels, hypothesis_template=hypothesis_template)
# outputs:
# {'sequence': "L'équipe de France joue aujourd'hui au Parc des Princes",
#  'labels': ['sport', 'politique', 'science'],
#  'scores': [0.8595073223114014, 0.10821866989135742, 0.0322740375995636]}
```
As a premise/hypothesis checker:
The idea here is to compute a probability of the form \(P(premise \mid hypothesis)\), i.e. the probability that the premise entails the hypothesis.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# load model and tokenizer
nli_model = AutoModelForSequenceClassification.from_pretrained("BaptisteDoyen/camembert-base-xnli")
tokenizer = AutoTokenizer.from_pretrained("BaptisteDoyen/camembert-base-xnli")

# sequences
premise = "le score pour les bleus est élevé"
hypothesis = "L'équipe de France a fait un bon match"

# tokenize and run through model
x = tokenizer.encode(premise, hypothesis, return_tensors='pt')
logits = nli_model(x)[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (0) as the probability of the label being true
entail_contradiction_logits = logits[:, ::2]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:, 0]
prob_label_is_true[0].tolist() * 100
# outputs
# 86.40775084495544
```
Training data
The training data is the French part of the XNLI dataset, released in 2018 by Facebook.
It is easily available through the datasets library.
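A minimal sketch of loading it (assuming the standard `xnli` dataset configuration and the `fr` language code):

```python
from datasets import load_dataset

# load the French portion of XNLI (train / validation / test splits)
dataset = load_dataset("xnli", "fr")
```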