cardiffnlp/twitter-roberta-base-sentiment-latest is a forked repo from huggingface. License: None
Latest commit: f8f82141e1 "Add dataset" (Jose Camacho Collados, 2023-01-14 05:48:40 +00:00)

Files (last commit message, date):
.gitattributes: initial commit, 2022-03-15 01:21:58 +00:00
README.md: Add dataset, 2023-01-14 05:48:40 +00:00
config.json: Update config.json, 2022-11-28 11:30:04 +00:00
merges.txt: Adding merges file, 2022-03-15 01:42:43 +00:00
pytorch_model.bin: Adding tweeteval classifier, 2022-03-15 01:24:25 +00:00
special_tokens_map.json: Adding tweeteval classifier, 2022-03-15 01:24:25 +00:00
tf_model.h5: Adding tweeteval classifier, 2022-03-15 01:24:25 +00:00
vocab.json: Adding tweeteval classifier, 2022-03-15 01:24:25 +00:00

README.md

language: en
widget:
- text: "Covid cases are increasing fast!"
datasets:
- tweet_eval

Twitter-roBERTa-base for Sentiment Analysis - UPDATED (2022)

This is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark. The original Twitter-based RoBERTa model can be found here, and the original reference paper is TweetEval. This model is intended for English-language text.

Labels: 0 -> Negative; 1 -> Neutral; 2 -> Positive
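As a small illustration of the label order above, the sketch below maps the index of the highest probability to its label name. The `ID2LABEL` dictionary and the `decode` helper are hypothetical, hard-coded only to keep the example self-contained; with the real model, read the mapping from `config.id2label`, as the full classification example does. The input vector reuses the scores reported for the example tweet.

```python
import numpy as np

# Hypothetical hard-coded mapping, following the documented label order.
# With the real model, prefer config.id2label from AutoConfig.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def decode(probs):
    """Return (label, probability) for the highest-scoring class index."""
    probs = np.asarray(probs, dtype=float)
    idx = int(np.argmax(probs))
    return ID2LABEL[idx], float(probs[idx])

# Probabilities in class-index order (0, 1, 2).
label, score = decode([0.7236, 0.2287, 0.0477])
print(label, score)  # Negative 0.7236
```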

This sentiment analysis model has been integrated into TweetNLP. You can access the demo here.

Example Pipeline

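from transformers import pipeline
# Hugging Face Hub ID of this model (a local directory containing the model files also works)
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"
sentiment_task = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)
sentiment_task("Covid cases are increasing fast!")
# [{'label': 'Negative', 'score': 0.7236}]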
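The pipeline example above passes the raw tweet, whereas the full classification example below first replaces user mentions and links with placeholders. A minimal sketch, assuming you want the pipeline to see the same normalized text: apply that placeholder rule to a batch of tweets before calling the pipeline. The `preprocess` function restates the rule from the full example so this block runs on its own; the tweets are hypothetical.

```python
def preprocess(text):
    # Same placeholder rule as the full classification example:
    # '@someone' -> '@user' (a lone '@' is kept), tokens starting with 'http' -> 'http'.
    new_text = []
    for t in text.split(" "):
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)

# Hypothetical tweets, used only for illustration.
tweets = [
    "@WHO Covid cases are increasing fast! https://example.com/report",
    "Loving the weather today @ the beach",
]
batch = [preprocess(t) for t in tweets]
print(batch)
```

Since transformers pipelines accept a list of inputs, `sentiment_task(batch)` should then return one result per tweet.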

Full classification example

from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax

# Preprocess text (replace user mentions and links with placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        # '@someone' -> '@user' (a lone '@' is kept as-is)
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # any token starting with 'http' -> 'http'
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)

# PT (PyTorch)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)  # optional: save a local copy to a directory named like MODEL
text = "Covid cases are increasing fast!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()  # logits for the single input text
scores = softmax(scores)  # convert logits to probabilities

# # TF (TensorFlow)
# from transformers import TFAutoModelForSequenceClassification
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)
# text = "Covid cases are increasing fast!"
# text = preprocess(text)
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)

# Print labels and scores, from most to least likely
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")

Output:

1) Negative 0.7236
2) Neutral 0.2287
3) Positive 0.0477
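The post-processing in the full example (a softmax over the logits, then sorting class indices by probability) can be checked without downloading the model. A minimal sketch with made-up logits, assuming the label order documented above: it swaps `scipy.special.softmax` for a NumPy-only, max-shifted softmax, which gives the same probabilities for a 1-D vector. `ID2LABEL` and `rank_labels` are hypothetical names introduced for this illustration.

```python
import numpy as np

# Hypothetical hard-coded mapping; with the real model, use config.id2label.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def softmax(logits):
    # Subtract the max before exponentiating to avoid overflow on large logits.
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def rank_labels(logits):
    """Return [(label, probability), ...] sorted from most to least likely."""
    scores = softmax(logits)
    ranking = np.argsort(scores)[::-1]
    return [(ID2LABEL[int(i)], round(float(scores[i]), 4)) for i in ranking]

# Made-up logits, not actual model output.
for rank, (label, prob) in enumerate(rank_labels([1.5, 0.3, -1.2]), start=1):
    print(f"{rank}) {label} {prob}")
```

The print format mirrors the loop in the full example, so the same ranking code works whether the logits come from the PyTorch or the TensorFlow model.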