---
language: "en"
tags:
- distilroberta
- sentiment
- emotion
- twitter
- reddit
widget:
- text: "Oh wow. I didn't know that."
- text: "This movie always makes me cry.."
- text: "Oh Happy Day"
---
## Description

With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:

1) anger 🤬
2) disgust 🤢
3) fear 😨
4) joy 😀
5) neutral 😐
6) sadness 😭
7) surprise 😲

The model is a fine-tuned checkpoint of [DistilRoBERTa-base](https://huggingface.co/distilroberta-base).
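
As a rough illustration of what predicting one of seven classes means in practice: a sequence-classification head of this kind emits one raw score (logit) per class, and a softmax turns those scores into probabilities. The sketch below uses invented logits and assumes the label order matches the alphabetical list above; the checkpoint's own `id2label` mapping in its config is the authority on the actual order.

```python
import math

# Label order is an assumption (alphabetical, as in the list above).
LABELS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_emotion(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits, invented for illustration only.
example_logits = [0.1, -1.2, 0.3, 2.5, 0.8, -0.4, 0.0]
label, prob = top_emotion(example_logits)
print(f"{label}: {prob:.3f}")
```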
## Application 🚀

a) Run the emotion model on a single text example with 3 lines of code, using Hugging Face's `pipeline` function, on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/simple_emotion_pipeline.ipynb)
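
For readers who prefer a local script over the notebook, a minimal sketch of the single-text case follows. It assumes the `transformers` library (plus a backend such as PyTorch) is installed and that the checkpoint is published on the Hugging Face Hub under `j-hartmann/emotion-english-distilroberta-base`, a name inferred from the GitHub path in the badge link. The helper names `load_classifier` and `top_label` are illustrative and are not taken from the notebook.

```python
# Hub ID inferred from the GitHub repository name; treat it as an assumption.
MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy optional dependency

    # top_k=None requests scores for all classes instead of only the best one.
    # Older transformers releases used return_all_scores=True for this.
    return pipeline("text-classification", model=model_id, top_k=None)


def top_label(classifier, text):
    """Return the highest-scoring {'label', 'score'} dict for one text."""
    result = classifier(text)
    # Depending on the library version, a single input yields either a flat
    # list of score dicts or that list wrapped in an outer list.
    scores = result[0] if isinstance(result[0], list) else result
    return max(scores, key=lambda item: item["score"])


# Usage (needs network access on the first run to fetch the checkpoint):
#   clf = load_classifier()
#   print(top_label(clf, "Oh wow. I didn't know that."))
```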
b) Run the emotion model on multiple examples or full datasets (e.g., .csv files) on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/emotion_prediction_example.ipynb)
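
For the batch case, a comparable sketch outside Colab could read a CSV file, classify one column, and write the predictions back out. It uses only the standard library for file handling. The classifier is passed in as a plain callable that maps a list of texts to one list of `{"label", "score"}` dicts per text, which is the shape a `transformers` text-classification pipeline created with `top_k=None` is expected to return for list input. The column names `text`, `emotion`, and `score` and both function names are hypothetical.

```python
import csv


def predict_rows(rows, classifier, text_column="text"):
    """Return copies of the row dicts with the top emotion and its score added."""
    rows = list(rows)
    texts = [row[text_column] for row in rows]
    all_scores = classifier(texts) if texts else []
    annotated = []
    for row, scores in zip(rows, all_scores):
        best = max(scores, key=lambda item: item["score"])
        annotated.append({**row, "emotion": best["label"], "score": best["score"]})
    return annotated


def predict_csv(in_path, out_path, classifier, text_column="text"):
    """Read in_path, classify text_column, and write the result to out_path."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    annotated = predict_rows(rows, classifier, text_column)
    # Fall back to a minimal header when the input file has no data rows.
    fieldnames = list(annotated[0]) if annotated else [text_column, "emotion", "score"]
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(annotated)
    return len(annotated)
```

With a real pipeline, `classifier` would be the pipeline object itself. Very long texts may exceed the model's maximum input length, so truncation settings are worth checking before running a full dataset.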
## Contact 💻

Please reach out to jochen.hartmann@uni-hamburg.de if you have any questions or feedback.

Thanks to Samuel Domdey and chrsiebert for their support in making this model available.
## Appendix 📚

Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets.

|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes|
|Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes|
|GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-|
|MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|SemEval-2018, EI-reg, Mohammad et al. (2018)|Yes|-|Yes|Yes|-|Yes|-|

The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here.
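
The coverage table above can also be expressed as data, which makes it easy to see how many datasets contribute to each emotion. The sketch below is transcribed by hand from the table; the shortened dataset names and the function name are illustrative. It counts datasets only, not training examples, so it says nothing about class balance.

```python
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]

# Transcribed from the table above: the emotions each dataset provides.
COVERAGE = {
    "Crowdflower": {"anger", "joy", "neutral", "sadness", "surprise"},
    "Emotion Dataset": {"anger", "fear", "joy", "sadness", "surprise"},
    "GoEmotions": set(EMOTIONS),
    "ISEAR": {"anger", "disgust", "fear", "joy", "sadness"},
    "MELD": set(EMOTIONS),
    "SemEval-2018 EI-reg": {"anger", "fear", "joy", "sadness"},
}


def datasets_per_emotion(coverage=COVERAGE):
    """Map each emotion to the sorted names of the datasets that label it."""
    return {
        emotion: sorted(name for name, labels in coverage.items() if emotion in labels)
        for emotion in EMOTIONS
    }


for emotion, names in datasets_per_emotion().items():
    print(f"{emotion:<9} {len(names)}  {', '.join(names)}")
```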