Compare commits


10 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Jose Camacho Collados | daefdd1f6a | Add reference paper | 2023-01-20 09:52:13 +00:00 |
| Jose Camacho Collados | 08b4d993d8 | Add metadata, links | 2023-01-11 21:33:18 +00:00 |
| Jose Camacho Collados | b636d90b2e | Add link to sentiment-latest | 2022-04-06 08:10:31 +00:00 |
| Cardiff NLP | d6af26ce06 | Update language | 2022-02-18 08:09:12 +00:00 |
| Cardiff NLP | 8956fe4147 | Add labels | 2022-01-20 17:02:25 +00:00 |
| Patrick von Platen | c8c5458081 | upload flax model | 2021-05-20 15:06:21 +00:00 |
| Patrick von Platen | b9aa737625 | allow flax | 2021-05-20 15:06:02 +00:00 |
| Cardiff NLP | ad3b2523ea | Update README.md | 2020-11-13 11:23:30 +00:00 |
| Cardiff NLP | 34a6247c25 | Update README.md | 2020-11-12 19:23:21 +00:00 |
| Cardiff NLP | 8eacbc59c0 | Update README.md | 2020-11-12 19:21:55 +00:00 |
3 changed files with 57 additions and 4 deletions

.gitattributes

@@ -6,3 +6,4 @@
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
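A rough sketch of which paths the LFS rules in this hunk would send through Git LFS. It assumes the gitattributes convention that a pattern containing no slash is matched against the file's basename at any directory depth, and approximates that with Python's `fnmatch.fnmatchcase`. Only the four patterns visible in this hunk are included (the rules above line 6 are not shown), and the example paths are hypothetical.

```python
import fnmatch
import posixpath

# Only the LFS patterns visible in this hunk; the real .gitattributes
# has further rules above line 6 that are not reproduced here.
LFS_PATTERNS = ["*.tar.gz", "*.ot", "*.onnx", "*.msgpack"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate gitattributes matching for slash-free patterns:
    compare each pattern against the basename, case-sensitively."""
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, pat) for pat in LFS_PATTERNS)


# Hypothetical example paths.
for path in ["flax_model.msgpack", "README.md", "onnx/model.onnx"]:
    print(path, is_lfs_tracked(path))
```

The new `*.msgpack` rule is what makes the Flax weights file in this comparison get stored with Git LFS instead of as a regular blob.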

README.md

@@ -1,8 +1,23 @@
---
datasets:
- tweet_eval
language:
- en
---
# Twitter-roBERTa-base for Sentiment Analysis
This is a roBERTa-base model trained on ~58M tweets and finetuned for sentiment analysis with the TweetEval benchmark. This model is suitable for English (for a similar multilingual model, see [XLM-T](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)).
- Reference Paper: [_TweetEval_ (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf).
- Git Repo: [TweetEval official repository](https://github.com/cardiffnlp/tweeteval).
<b>Labels</b>:
0 -> Negative;
1 -> Neutral;
2 -> Positive
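A minimal sketch of how the label indices listed above map back to names when decoding a single prediction. `ID2LABEL` and `decode` are hypothetical names, and the scores used in the usage line are the ones reported in this model card's example output.

```python
# Hypothetical mapping built from the Labels list: class index -> sentiment name.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode(scores):
    """Return the sentiment name of the highest-scoring class index."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best]


# Scores in index order (negative, neutral, positive).
print(decode([0.0076, 0.1458, 0.8466]))  # positive
```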
<b>New!</b> We just released a new sentiment analysis model trained on a larger and more recent set of tweets.
See [twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) and [TweetNLP](https://tweetnlp.org) for more details.
## Example of classification
@@ -15,6 +30,17 @@ from scipy.special import softmax
import csv
import urllib.request
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
# Tasks:
# emoji, emotion, hate, irony, offensive, sentiment
# stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary
@@ -37,6 +63,7 @@ model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.save_pretrained(MODEL)
text = "Good night 😊"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
@@ -68,3 +95,25 @@ Output:
2) neutral 0.1458
3) negative 0.0076
```
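The model-independent steps of the classification example can be checked without downloading the model. The sketch below repeats the `preprocess` helper so it runs standalone, and replaces `scipy.special.softmax` plus NumPy's `argsort` with a standard-library softmax and sort. `rank_labels` is a hypothetical helper, and the logits passed to it are made-up stand-ins for `output[0][0]`, not real model output.

```python
import math


def preprocess(text: str) -> str:
    """Replace @mentions with '@user' and links with 'http', as in the example."""
    new_text = []
    for t in text.split(" "):
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Index order follows the Labels list (0 negative, 1 neutral, 2 positive).
LABELS = ["negative", "neutral", "positive"]


def rank_labels(logits):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(LABELS[i], probs[i]) for i in order]


print(preprocess("@cardiffnlp see https://tweetnlp.org"))  # @user see http
# Hypothetical logits, used only to exercise the ranking step.
for rank, (label, p) in enumerate(rank_labels([-2.0, 1.0, 2.5]), start=1):
    print(f"{rank}) {label} {round(p, 4)}")
```

Subtracting the maximum logit before `math.exp` leaves the result unchanged but avoids overflow for large logits.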
### BibTeX entry and citation info
Please cite the [reference paper](https://aclanthology.org/2020.findings-emnlp.148/) if you use this model.
```bibtex
@inproceedings{barbieri-etal-2020-tweeteval,
title = "{T}weet{E}val: Unified Benchmark and Comparative Evaluation for Tweet Classification",
author = "Barbieri, Francesco and
Camacho-Collados, Jose and
Espinosa Anke, Luis and
Neves, Leonardo",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.148",
doi = "10.18653/v1/2020.findings-emnlp.148",
pages = "1644--1650"
}
```

flax_model.msgpack (Stored with Git LFS)

Binary file not shown.