j-hartmann/emotion-english-distilroberta-base is a fork of a repository on Hugging Face. License: None
Hartmann a51ab62183 Update README.md 2021-06-28 07:30:44 +00:00
.gitattributes initial commit 2021-06-15 09:43:00 +00:00
README.md Update README.md 2021-06-28 07:30:44 +00:00
config.json updated labels 2021-06-15 10:33:11 +00:00
merges.txt initial commit 2021-06-15 09:55:44 +00:00
pytorch_model.bin initial commit 2021-06-15 09:55:44 +00:00
special_tokens_map.json initial commit 2021-06-15 09:55:44 +00:00
tokenizer.json initial commit 2021-06-15 09:55:44 +00:00
tokenizer_config.json initial commit 2021-06-15 09:55:44 +00:00
training_args.bin initial commit 2021-06-15 09:55:44 +00:00
vocab.json initial commit 2021-06-15 09:55:44 +00:00

README.md

language: en
tags: distilroberta, sentiment, emotion, twitter, reddit
widget (example texts):
  - Oh wow. I didn't know that.
  - This movie always makes me cry..
  - Oh Happy Day

Description

With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:

  1. anger 🤬
  2. disgust 🤢
  3. fear 😨
  4. joy 😀
  5. neutral 😐
  6. sadness 😭
  7. surprise 😲

The model is a fine-tuned checkpoint of DistilRoBERTa-base.
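Because the model is a standard sequence-classification checkpoint, it should also be usable without the pipeline helper by loading the tokenizer and model directly and applying a softmax to the output logits. The sketch below assumes recent `transformers` and `torch` installations and loads the checkpoint by the repository ID shown at the top of this page; `logits_to_scores` and `predict` are hypothetical helper names, not part of the model's published API. Class names are read from the checkpoint's own `config.id2label` rather than hard-coded.

```python
import math
from typing import Dict, List, Sequence

MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"


def logits_to_scores(logits: Sequence[float], id2label: Dict[int, str]) -> List[dict]:
    """Turn one row of logits into {"label", "score"} dicts, sorted by probability."""
    # Subtract the maximum before exponentiating for a numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scores = [{"label": id2label[i], "score": e / total} for i, e in enumerate(exps)]
    return sorted(scores, key=lambda item: item["score"], reverse=True)


def predict(texts: List[str], model_id: str = MODEL_ID) -> List[List[dict]]:
    """Score each text against every emotion class of the fine-tuned checkpoint."""
    # Imported lazily so logits_to_scores can be used without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [logits_to_scores(row.tolist(), model.config.id2label) for row in logits]


if __name__ == "__main__":
    for scores in predict(["This movie always makes me cry.."]):
        print(scores[0]["label"], round(scores[0]["score"], 3))
```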

Application 🚀

a) Run the emotion model on a single text example with 3 lines of code, using Hugging Face's pipeline function on Google Colab:

Open In Colab
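Outside Colab, a single-text call might look like the minimal sketch below. It assumes a recent `transformers` release in which `top_k=None` makes the text-classification pipeline return a score for every class; `load_classifier` and `top_emotion` are hypothetical helper names added for illustration.

```python
from typing import Dict, List

MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"


def top_emotion(scores: List[Dict]) -> str:
    """Return the label with the highest score from a list of {"label", "score"} dicts."""
    return max(scores, key=lambda item: item["score"])["label"]


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline that scores every emotion class."""
    # Imported lazily so top_emotion can be used without transformers installed.
    from transformers import pipeline

    # top_k=None requests all class scores instead of only the best one.
    return pipeline("text-classification", model=model_id, top_k=None)


if __name__ == "__main__":
    classifier = load_classifier()
    # A list input yields one list of {"label", "score"} dicts per text.
    scores = classifier(["Oh wow. I didn't know that."])[0]
    print(top_emotion(scores), scores)
```

Returning all seven scores rather than only the top label makes it easy to inspect close calls, for example between surprise and joy.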

b) Run the emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab:

Open In Colab
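For a full .csv file, one possible approach is to read the rows, classify one text column in batches, and write the rows back with the predicted emotion appended. The sketch below is an assumption-laden illustration, not the notebook's code: the column name `text`, the output columns `emotion` and `emotion_score`, the file names, and the helpers `classify_rows` and `classify_csv` are all hypothetical. The classifier is passed in as a plain callable, so any text-classification pipeline built with `top_k=None` should fit.

```python
import csv
from typing import Callable, Dict, List

# Hypothetical type: maps a batch of texts to one list of
# {"label": ..., "score": ...} dicts per text.
Classifier = Callable[[List[str]], List[List[Dict]]]


def classify_rows(rows: List[Dict[str, str]], classifier: Classifier,
                  text_column: str = "text", batch_size: int = 32) -> List[Dict]:
    """Add the top-scoring emotion and its score to every row, classifying in batches."""
    labelled = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        results = classifier([row[text_column] for row in batch])
        if len(results) != len(batch):
            raise ValueError("classifier must return one result per input text")
        for row, scores in zip(batch, results):
            best = max(scores, key=lambda item: item["score"])
            labelled.append({**row, "emotion": best["label"], "emotion_score": best["score"]})
    return labelled


def classify_csv(in_path: str, out_path: str, classifier: Classifier,
                 text_column: str = "text", batch_size: int = 32) -> None:
    """Read a CSV file, classify one text column, and write the rows plus two new columns."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or []) + ["emotion", "emotion_score"]
        rows = list(reader)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(classify_rows(rows, classifier, text_column, batch_size))


if __name__ == "__main__":
    from transformers import pipeline  # requires the transformers package

    clf = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base", top_k=None)
    classify_csv("input.csv", "output.csv", clf)  # hypothetical file names
```

Batching keeps memory bounded on large files, and injecting the classifier lets the CSV handling be checked with a stub before downloading the model.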

Contact 💻

Please reach out to jochen.hartmann@uni-hamburg.de if you have any questions or feedback.

Thanks to Samuel Domdey and chrsiebert for their support in making this model available.

Appendix 📚

Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets.

| Name | anger | disgust | fear | joy | neutral | sadness | surprise |
|---|---|---|---|---|---|---|---|
| Crowdflower (2016) | Yes | - | - | Yes | Yes | Yes | Yes |
| Emotion Dataset, Elvis et al. (2018) | Yes | - | Yes | Yes | - | Yes | Yes |
| GoEmotions, Demszky et al. (2020) | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| ISEAR, Vikash (2018) | Yes | Yes | Yes | Yes | - | Yes | - |
| MELD, Poria et al. (2019) | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| SemEval-2018, EI-reg (Mohammad et al. 2018) | Yes | - | Yes | Yes | - | Yes | - |

The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here.