Compare commits

10 Commits

Author SHA1 Message Date
Lysandre d5411c3ee9 Adding `safetensors` variant of this model (#2)
- Adding `safetensors` variant of this model (5cbd7d214707beb941608d50e734179140c5f93f)


Co-authored-by: Nicolas Patry <Narsil@users.noreply.huggingface.co>
2022-11-16 23:22:40 +00:00
sgugger c114932082 Update model card (#1)
- Update model card (ae7a275b1e90a24f2e25105eaeea6a2015953a45)


Co-authored-by: Marissa Gerchick <Marissa@users.noreply.huggingface.co>
2022-07-22 08:13:21 +00:00
Patrick von Platen ec58a5b7f7 upload flax model 2021-05-20 22:47:11 +00:00
Patrick von Platen 159dbe0874 allow flax 2021-05-20 22:46:55 +00:00
Julien Chaumond a547ce14d1 Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/distilroberta-base-README.md
2020-12-11 22:24:18 +01:00
Julien Chaumond 3124ec82ab For clarity, delete deprecated modelcard.json
We now use the README.md model card instead

Approved-by: Julien Chaumond <julien@huggingface.co>
2020-12-09 19:29:54 +01:00
Guillaume B d92efe08f9 addition of Rust model 2020-11-24 16:51:19 +01:00
system 2862d045b2 Update tokenizer.json 2020-10-12 12:56:43 +00:00
system 83205b1a91 Update config.json 2020-04-24 15:58:11 +00:00
system 7c2a3205fc Update config.json 2020-01-31 23:00:24 +00:00
8 changed files with 204 additions and 16 deletions

2
.gitattributes vendored

@@ -6,3 +6,5 @@
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
model.safetensors filter=lfs diff=lfs merge=lfs -text

185
README.md Normal file

@@ -0,0 +1,185 @@
---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- openwebtext
---
# Model Card for DistilRoBERTa base
# Table of Contents
1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4. [Training Details](#training-details)
5. [Evaluation](#evaluation)
6. [Environmental Impact](#environmental-impact)
7. [Citation](#citation)
8. [How To Get Started With the Model](#how-to-get-started-with-the-model)
# Model Details
## Model Description
This model is a distilled version of the [RoBERTa-base model](https://huggingface.co/roberta-base). It follows the same training procedure as [DistilBERT](https://huggingface.co/distilbert-base-uncased).
The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/distillation).
This model is case-sensitive: it distinguishes between english and English.
The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base).
On average, DistilRoBERTa is twice as fast as RoBERTa-base.
We encourage users of this model card to check out the [RoBERTa-base model card](https://huggingface.co/roberta-base) to learn more about usage, limitations and potential biases.
- **Developed by:** Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Related Models:** [RoBERTa-base model card](https://huggingface.co/roberta-base)
- **Resources for more information:**
- [GitHub Repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)
- [Associated Paper](https://arxiv.org/abs/1910.01108)
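The 82M and 125M parameter figures quoted in the model description can be reproduced approximately from the architecture alone. The following is a minimal sketch, not the authors' own accounting. It assumes the standard BERT-style encoder layout (query/key/value/output projections with biases, a two-layer feed-forward block, and LayerNorm after each sub-block), counts the pooler, and leaves out the masked-language-modeling head. The function name `roberta_encoder_params` is hypothetical, and the default sizes are the RoBERTa-base values.

```python
def roberta_encoder_params(num_layers, hidden=768, intermediate=3072,
                           vocab=50265, max_positions=514, type_vocab=1,
                           include_pooler=True):
    """Approximate parameter count for a RoBERTa-style encoder (assumed layout)."""
    layer_norm = 2 * hidden  # weight + bias
    # Word, position and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    # Q, K, V and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden), then LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden if include_pooler else 0
    return embeddings + num_layers * (attention + feed_forward) + pooler


distil = roberta_encoder_params(num_layers=6)
base = roberta_encoder_params(num_layers=12)
print(f"{distil / 1e6:.1f}M vs {base / 1e6:.1f}M")  # 82.1M vs 124.6M
```

Under these assumptions, halving the number of layers removes only about a third of the parameters, because the embedding matrix (roughly 39M parameters) is shared by both sizes.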
# Uses
## Direct Use and Downstream Use
You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=roberta) to look for fine-tuned versions on a task that interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at a model like GPT-2.
## Out of Scope Use
The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for this model.
# Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("The man worked as a <mask>.")
[{'score': 0.1237526461482048,
'sequence': 'The man worked as a waiter.',
'token': 38233,
'token_str': ' waiter'},
{'score': 0.08968018740415573,
'sequence': 'The man worked as a waitress.',
'token': 35698,
'token_str': ' waitress'},
{'score': 0.08387645334005356,
'sequence': 'The man worked as a bartender.',
'token': 33080,
'token_str': ' bartender'},
{'score': 0.061059024184942245,
'sequence': 'The man worked as a mechanic.',
'token': 25682,
'token_str': ' mechanic'},
{'score': 0.03804653510451317,
'sequence': 'The man worked as a courier.',
'token': 37171,
'token_str': ' courier'}]
>>> unmasker("The woman worked as a <mask>.")
[{'score': 0.23149248957633972,
'sequence': 'The woman worked as a waitress.',
'token': 35698,
'token_str': ' waitress'},
{'score': 0.07563332468271255,
'sequence': 'The woman worked as a waiter.',
'token': 38233,
'token_str': ' waiter'},
{'score': 0.06983394920825958,
'sequence': 'The woman worked as a bartender.',
'token': 33080,
'token_str': ' bartender'},
{'score': 0.05411609262228012,
'sequence': 'The woman worked as a nurse.',
'token': 9008,
'token_str': ' nurse'},
{'score': 0.04995106905698776,
'sequence': 'The woman worked as a maid.',
'token': 29754,
'token_str': ' maid'}]
```
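The two outputs above can be compared directly. The sketch below is only an illustration: it copies the scores printed above into plain dictionaries (no model is loaded), then lists the occupations that appear for only one prompt and the score ratio for the shared ones. Because only the top five predictions are shown, an occupation missing from one list may still have a non-zero probability for that prompt.

```python
# Scores copied from the two pipeline outputs above (top-5 predictions only).
man = {
    "waiter": 0.1237526461482048,
    "waitress": 0.08968018740415573,
    "bartender": 0.08387645334005356,
    "mechanic": 0.061059024184942245,
    "courier": 0.03804653510451317,
}
woman = {
    "waitress": 0.23149248957633972,
    "waiter": 0.07563332468271255,
    "bartender": 0.06983394920825958,
    "nurse": 0.05411609262228012,
    "maid": 0.04995106905698776,
}

# Occupations that reach the top five for one prompt but not the other.
only_man = sorted(set(man) - set(woman))
only_woman = sorted(set(woman) - set(man))
print("only for 'man':  ", only_man)
print("only for 'woman':", only_woman)

# For shared occupations, compare how the score shifts between prompts.
for job in sorted(set(man) & set(woman)):
    ratio = woman[job] / man[job]
    print(f"{job:10s} man={man[job]:.3f} woman={woman[job]:.3f} ratio={ratio:.2f}")
```

In these outputs, "waitress" scores roughly 2.6 times higher for the "woman" prompt, while "mechanic" and "courier" appear only for "man" and "nurse" and "maid" only for "woman".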
## Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
# Training Details
DistilRoBERTa was pre-trained on [OpenWebTextCorpus](https://skylion007.github.io/OpenWebTextCorpus/), a reproduction of OpenAI's WebText dataset. This is roughly a quarter of the training data used for the teacher RoBERTa. See the [roberta-base model card](https://huggingface.co/roberta-base/blob/main/README.md) for further details on training.
# Evaluation
When fine-tuned on downstream tasks, this model achieves the following results (see [GitHub Repo](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)):
GLUE test results:
| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
| | 84.0 | 89.4 | 90.8 | 92.5 | 59.3 | 88.3 | 86.6 | 67.9 |
# Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** More information needed
- **Hours used:** More information needed
- **Cloud Provider:** More information needed
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed
# Citation
```bibtex
@article{Sanh2019DistilBERTAD,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
journal={ArXiv},
year={2019},
volume={abs/1910.01108}
}
```
APA
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
# How to Get Started With the Model
You can use the model directly with a pipeline for masked language modeling:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilroberta-base')
>>> unmasker("Hello I'm a <mask> model.")
[{'score': 0.04673689603805542,
'sequence': "Hello I'm a business model.",
'token': 265,
'token_str': ' business'},
{'score': 0.03846118599176407,
'sequence': "Hello I'm a freelance model.",
'token': 18150,
'token_str': ' freelance'},
{'score': 0.03308931365609169,
'sequence': "Hello I'm a fashion model.",
'token': 2734,
'token_str': ' fashion'},
{'score': 0.03018997237086296,
'sequence': "Hello I'm a role model.",
'token': 774,
'token_str': ' role'},
{'score': 0.02111748233437538,
'sequence': "Hello I'm a Playboy model.",
'token': 24526,
'token_str': ' Playboy'}]
```
<a href="https://huggingface.co/exbert/?model=distilroberta-base">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>


@@ -1,6 +1,10 @@
{
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"finetuning_task": null,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
@@ -8,12 +12,10 @@
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 12,
"num_hidden_layers": 6,
"num_labels": 2,
"output_attentions": false,
"output_hidden_states": false,
"torchscript": false,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
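
A few derived quantities follow from the configuration values shown above. This is a minimal sketch using only the standard library, with the relevant fields copied by hand into a JSON string rather than read from the repository. The names `head_dim` and `max_tokens` are illustrative. The sequence-length calculation relies on RoBERTa numbering positions from `pad_token_id + 1`, so two of the 514 position slots are not available for tokens.

```python
import json

# Subset of the config fields shown in the diff above, copied by hand.
config = json.loads("""
{
  "hidden_size": 768,
  "intermediate_size": 3072,
  "max_position_embeddings": 514,
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "vocab_size": 50265
}
""")

# Each attention head works on an equal slice of the hidden dimension.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Position ids start at pad_token_id + 1, so that many slots are unusable.
max_tokens = config["max_position_embeddings"] - (config["pad_token_id"] + 1)

print(head_dim, max_tokens)  # 64 512
```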

BIN
flax_model.msgpack (Stored with Git LFS) Normal file

Binary file not shown.

BIN
model.safetensors (Stored with Git LFS) Normal file

Binary file not shown.


@@ -1,11 +0,0 @@
{
"caveats_and_recommendations": {},
"ethical_considerations": {},
"evaluation_data": {},
"factors": {},
"intended_use": {},
"metrics": {},
"model_details": {},
"quantitative_analyses": {},
"training_data": {}
}

BIN
rust_model.ot (Stored with Git LFS) Normal file

Binary file not shown.

1
tokenizer.json Normal file

File diff suppressed because one or more lines are too long