Update README.md
This commit is contained in:
parent 7a0f2c191a
commit 337a855425
license: mit
---

# OPT: Open Pre-trained Transformer Language Models

OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.

**Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the [paper](https://arxiv.org/pdf/2205.01068.pdf).
Content from **this** model card has been written by the Hugging Face team.

## Intro

To quote the first two paragraphs of the [official paper](https://arxiv.org/abs/2205.01068):

> Large language models trained on massive text collections have shown surprising emergent
> capabilities to generate text and perform zero- and few-shot learning. While in some cases the public
> can interact with these models through paid APIs, full model access is currently limited to only a
> few highly resourced labs. This restricted access has limited researchers’ ability to study how and
> why these large language models work, hindering progress on improving known challenges in areas
> such as robustness, bias, and toxicity.

> We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M
> to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match
> the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data
> collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and
> to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the
> collective research community as a whole, which is only possible when models are available for study.

## Model description

OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective.
OPT belongs to the same family of decoder-only models as [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modeling objective.

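To make the CLM objective concrete, the snippet below is a minimal sketch of how that loss can be computed with the Hugging Face `transformers` library; the `facebook/opt-125m` checkpoint is chosen purely for illustration and is an assumption, not necessarily the checkpoint this card describes.

```python
# Minimal sketch of the causal language modeling (CLM) objective:
# predict each token from the tokens to its left.
# The checkpoint name is an illustrative assumption; any facebook/opt-* size
# behaves the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

text = "Large language models trained on massive text collections"
inputs = tokenizer(text, return_tensors="pt")

# For causal LM training/evaluation the labels are the input ids themselves;
# the model shifts them internally so that position t predicts token t+1.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"mean next-token cross-entropy: {outputs.loss.item():.3f}")
```

Passing the input ids as labels is enough because the shift between predictions and targets happens inside the model.
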
For evaluation, OPT follows [GPT-3](https://arxiv.org/abs/2005.14165) by using their prompts and overall experimental setup. For more details, please read the [official paper](https://arxiv.org/abs/2205.01068).

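The evaluation setup itself is described in the paper rather than reproduced here. As a rough sketch of the GPT-3-style recipe (rank each candidate completion by the log-likelihood the model assigns to it, then pick the highest), something like the following could be used; the prompt, the candidate answers, and the checkpoint are illustrative assumptions, not the authors' actual setup.

```python
# Rough sketch of GPT-3-style zero-shot evaluation: score each candidate
# completion by its log-likelihood under the model, then pick the best one.
# Prompt, candidates, and checkpoint are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model.eval()

prompt = "Review: The movie was a complete waste of time.\nSentiment:"
candidates = [" positive", " negative"]

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities of the completion tokens, given the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)  # position t predicts token t+1
    targets = full_ids[0, 1:]
    # Only score the positions whose target token belongs to the completion.
    positions = range(prompt_len - 1, full_ids.shape[1] - 1)
    return sum(log_probs[t, targets[t]].item() for t in positions)

scores = {c: completion_logprob(prompt, c) for c in candidates}
print(max(scores, key=scores.get))  # prints the higher-scoring candidate
```

The published evaluations use task-specific prompt templates and scoring details from the GPT-3 setup, so this sketch is only meant to convey the general idea.
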
## Intended uses & limitations

The pretrained-only model can be used for prompt-based evaluation of downstream tasks as well as for text generation.

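For text generation, a minimal example with the `transformers` pipeline could look like the following; the checkpoint size, the random seed, and the sampling settings are assumptions made for illustration, not recommendations from the OPT authors.

```python
# Minimal text-generation sketch using the high-level pipeline API.
# Checkpoint choice, seed, and sampling settings are illustrative assumptions.
from transformers import pipeline, set_seed

set_seed(32)
generator = pipeline("text-generation", model="facebook/opt-125m", do_sample=True)
print(generator("Hello, I am conscious and", max_length=30))
```

Leaving out `do_sample=True` gives deterministic (greedy) completions; enabling it samples from the model's distribution, which is usually preferable for open-ended generation.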