Compare commits

...

10 Commits

Author SHA1 Message Date
Stella Biderman a0e5677e86 Correcting metadata 2022-12-08 16:52:36 +00:00
Stella Biderman 51568a6e0a Update README.md 2021-12-31 13:46:21 +00:00
Stella Biderman 5e755b1c9d Updated tags to correctly link with the Pile 2021-12-31 13:45:57 +00:00
Stella Biderman 6f231487a5 Update README.md 2021-09-11 18:26:25 +00:00
Stella Biderman 88f8889ee0 Update README.md 2021-09-11 18:25:28 +00:00
patil-suraj 0b8087bb43 add flax model 2021-07-04 09:15:19 +00:00
Leo Gao b41a392439 Update README.md 2021-05-21 00:00:44 +00:00
Stella Biderman 1172dffaf8 Updated citation info 2021-05-18 19:38:48 +00:00
guillaume df3bd66031 Updated LFS tracked files 2021-05-06 07:30:55 +02:00
guillaume 9b4ecbcecd Addition of Rust model 2021-05-05 22:07:04 +02:00
5 changed files with 37 additions and 12 deletions

.gitattributes

@@ -15,3 +15,4 @@
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
+rust_model.ot filter=lfs diff=lfs merge=lfs -text
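The hunk above adds one more Git LFS tracking rule: `*.pt`-style lines are glob patterns, while bare names like `rust_model.ot` match a single file. As a rough illustration only (real gitattributes matching has extra path semantics, and `config.json` here is just a sample name), Python's `fnmatch` approximates which file names the rules capture:

```python
from fnmatch import fnmatch

# Patterns from the .gitattributes hunk above; globs match by name,
# bare filenames match exactly (fnmatch treats both uniformly).
lfs_patterns = ["*.pt", "*.pth", "pytorch_model.bin", "rust_model.ot"]

def tracked_by_lfs(filename):
    """Return True if any LFS pattern matches the file name."""
    return any(fnmatch(filename, p) for p in lfs_patterns)

print(tracked_by_lfs("rust_model.ot"))  # True: the newly added rule
print(tracked_by_lfs("config.json"))    # False: plain text, not LFS-tracked
```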

README.md

@@ -4,11 +4,10 @@ language:
 tags:
 - text generation
 - pytorch
-- the Pile
 - causal-lm
-license: apache-2.0
+license: mit
 datasets:
-- the Pile
+- the_pile
 ---
 # GPT-Neo 2.7B
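The lines edited in this hunk live in the model card's YAML front matter, the `---`-fenced metadata block at the top of README.md that Hugging Face parses for the license and dataset links. A minimal stdlib-only sketch of how such a block is separated from the card body (the sample text is abbreviated from the diff):

```python
def split_front_matter(text):
    """Split a '---'-fenced YAML front-matter block from the body."""
    if not text.startswith("---\n"):
        return None, text
    fence_end = text.index("\n---\n", 4)  # locate the closing fence
    return text[4:fence_end], text[fence_end + 5:]

card = "---\nlicense: mit\ndatasets:\n- the_pile\n---\n# GPT-Neo 2.7B\n"
meta, body = split_front_matter(card)
print(meta.splitlines()[0])  # license: mit
print(body.splitlines()[0])  # # GPT-Neo 2.7B
```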
@@ -23,7 +22,7 @@ GPT-Neo 2.7B was trained on the Pile, a large scale curated dataset created by E
 ## Training procedure
-This model was trained for 400,000 steps on the Pile. It was trained as a masked autoregressive language model, using cross-entropy loss.
+This model was trained for 420 billion tokens over 400,000 steps. It was trained as a masked autoregressive language model, using cross-entropy loss.
 ## Intended Use and Limitations
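The corrected training line implies roughly 1.05 million tokens per optimizer step (420B tokens / 400,000 steps). For context, that is close to, e.g., a batch of 512 sequences at a 2048-token context length — though that batch shape is an inference for illustration, not something stated in the diff:

```python
# Figures taken from the corrected training-procedure line.
total_tokens = 420_000_000_000
steps = 400_000

tokens_per_step = total_tokens // steps
print(f"{tokens_per_step:,} tokens per step")  # 1,050,000

# One hypothetical batch shape that lands near this figure:
# 512 sequences x 2048-token context = 1,048,576 tokens per step.
print(512 * 2048)  # 1048576
```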
@@ -77,7 +76,26 @@ TBD
 ### BibTeX entry and citation info
 To cite this model, use
 ```bibtex
+@software{gpt-neo,
+  author       = {Black, Sid and
+                  Leo, Gao and
+                  Wang, Phil and
+                  Leahy, Connor and
+                  Biderman, Stella},
+  title        = {{GPT-Neo: Large Scale Autoregressive Language
+                   Modeling with Mesh-Tensorflow}},
+  month        = mar,
+  year         = 2021,
+  note         = {{If you use this software, please cite it using
+                   these metadata.}},
+  publisher    = {Zenodo},
+  version      = {1.0},
+  doi          = {10.5281/zenodo.5297715},
+  url          = {https://doi.org/10.5281/zenodo.5297715}
+}
 @article{gao2020pile,
   title={The Pile: An 800GB Dataset of Diverse Text for Language Modeling},
   author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and others},

config.json

@@ -65,16 +65,16 @@
   "summary_proj_to_labels": true,
   "summary_type": "cls_index",
   "summary_use_proj": true,
-  "transformers_version": "4.5.0.dev0",
-  "use_cache": true,
-  "vocab_size": 50257,
-  "window_size": 256,
-  "tokenizer_class": "GPT2Tokenizer",
   "task_specific_params": {
     "text-generation": {
       "do_sample": true,
-      "temperature": 0.9,
-      "max_length": 50
+      "max_length": 50,
+      "temperature": 0.9
     }
-  }
+  },
+  "tokenizer_class": "GPT2Tokenizer",
+  "transformers_version": "4.9.0.dev0",
+  "use_cache": true,
+  "vocab_size": 50257,
+  "window_size": 256
 }
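Most of the config.json hunk is a cosmetic reordering of keys; since JSON objects are unordered, only the `transformers_version` bump (4.5.0.dev0 to 4.9.0.dev0) changes the parsed config. A quick sketch, using fragments abbreviated from the hunk, confirming that key order is irrelevant to the loaded values:

```python
import json

# task_specific_params in the old and new key orderings from the hunk.
old = json.loads('{"do_sample": true, "temperature": 0.9, "max_length": 50}')
new = json.loads('{"do_sample": true, "max_length": 50, "temperature": 0.9}')
print(old == new)  # True: JSON key order does not affect parsed values

# The only substantive change in the hunk is the version bump.
old_cfg = {"transformers_version": "4.5.0.dev0", **old}
new_cfg = {"transformers_version": "4.9.0.dev0", **new}
changed = {k for k in old_cfg if old_cfg[k] != new_cfg[k]}
print(changed)  # {'transformers_version'}
```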

BIN
flax_model.msgpack (Stored with Git LFS) Normal file

Binary file not shown.

BIN
rust_model.ot (Stored with Git LFS) Normal file

Binary file not shown.