diff --git a/README.md b/README.md
index dd252ff..ea4fd5c 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,9 @@
 # Twitter-roBERTa-base for Sentiment Analysis
 
-This is a roBERTa-base model trained on ~58M tweets and finetuned for the Sentiment Analysis task at Semeval 2018.
-For full description: [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf).
-To evaluate this and other models on Twitter-specific data, please refer to the [Tweeteval official repository](https://github.com/cardiffnlp/tweeteval).
+This is a roBERTa-base model trained on ~58M tweets and finetuned for sentiment analysis with the TweetEval benchmark.
+
+- Paper: [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf).
+- Git Repo: [Tweeteval official repository](https://github.com/cardiffnlp/tweeteval).
 
 ## Example of classification
 
@@ -18,6 +19,8 @@ import urllib.request
 # Preprocess text (username and link placeholders)
 def preprocess(text):
     new_text = []
+
+
     for t in text.split(" "):
         t = '@user' if t.startswith('@') and len(t) > 1 else t
         t = 'http' if t.startswith('http') else t
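
The second hunk ends partway through `preprocess`, so the rest of the function is not visible here. As a rough sketch of what the visible lines do, the version below keeps the three lines shown in the hunk and assumes the loop then collects each token and re-joins them with single spaces. That tail (`new_text.append(t)` and the `return`) is a guess for illustration and may not match the README.

```python
def preprocess(text):
    """Replace user mentions and links with placeholder tokens."""
    new_text = []
    for t in text.split(" "):
        # Mentions such as "@alice" become "@user"; a bare "@" is left alone.
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Any token that begins with "http" becomes the literal "http".
        t = 'http' if t.startswith('http') else t
        # Assumption: the hunk is cut off before this point; collecting the
        # tokens and re-joining them is a plausible completion, not confirmed.
        new_text.append(t)
    return " ".join(new_text)


example = preprocess("@alice check https://example.com now @")
print(example)  # prints "@user check http now @"
```

Splitting on a literal `" "` rather than on arbitrary whitespace means repeated spaces survive the round trip, since each empty token is re-joined unchanged.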