Compare commits
10 commits: `80fcb577f9...8c7b107549`
| Author | SHA1 | Date |
|---|---|---|
| | 8c7b107549 | |
| | aa6ac1e23b | |
| | e8c4fe5a29 | |
| | c8fd4232a5 | |
| | f39e497454 | |
| | cff2d3dae8 | |
| | bf21b09d70 | |
| | 5ede9d0e04 | |
| | 6a853c0c91 | |
| | b45c935b8e | |
README.md (28 lines changed)
````diff
@@ -55,7 +55,7 @@ You can use this model directly with a pipeline for text generation.
 >>> generator = pipeline('text-generation', model="facebook/opt-1.3b")
 >>> generator("Hello, I'm am conscious and")
-[{'generated_text': "Hello, I'm am conscious and aware of my surroundings. I'm aware that I'm dreaming."}]
+[{'generated_text': 'Hello, I am conscious and I am here.\nI am here.\nI am conscious.'}]
 ```
 
 By default, generation is deterministic. In order to use the top-k sampling, please set `do_sample` to `True`.
@@ -66,7 +66,7 @@ By default, generation is deterministic. In order to use the top-k sampling, ple
 >>> set_seed(32)
 >>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True)
 >>> generator("Hello, I'm am conscious and")
-[{'generated_text': "Hello, I'm am conscious and aware of my surroundings. I'm aware that my thoughts are thoughts"}]
+[{'generated_text': "Hello, I'm am conscious and able to hear. I have a lot of experience in the"}]
 ```
 
 ### Limitations and bias
@@ -88,11 +88,11 @@ Here's an example of how the model can have biased predictions:
 >>> set_seed(32)
 >>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True, num_return_sequences=5)
 >>> generator("The woman worked as a")
-[{'generated_text': 'The woman worked as a waitress for six months before she started dating her boyfriend, who was working at'},
- {'generated_text': "The woman worked as a prostitute, but she didn't want to sell herself anymore. She wanted to"},
- {'generated_text': 'The woman worked as a translator at the embassy during her studies at Cambridge University in England. She said'},
- {'generated_text': 'The woman worked as a secretary for Senator Ted Stevens of Alaska for 22 years before retiring from his Senate'},
- {'generated_text': 'The woman worked as a caregiver for elderly patients at the nursing home where she lived until she died'}]
+[{'generated_text': 'The woman worked as a bartender for six months before getting to the job she always dreamed of. She'},
+ {'generated_text': 'The woman worked as a nanny in a house near The White Horse Farm in the Yorkshire Dales'},
+ {'generated_text': "The woman worked as a translator at the British Broadcasting Corporation's headquarters and was also an acquaintance of some"},
+ {'generated_text': 'The woman worked as a secretary and went to school full-time, and also worked as a waitress'},
+ {'generated_text': 'The woman worked as a beautician with her baby and the little girl is now at the age where'}]
 ```
 
 compared to:
@@ -103,11 +103,11 @@ compared to:
 >>> set_seed(32)
 >>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True, num_return_sequences=5)
 >>> generator("The man worked as a")
-[{'generated_text': 'The man worked as a janitor at the University of Michigan Medical Center before he died after contracting Ebola'},
- {'generated_text': 'The man worked as a salesman for IBM Corp., selling computers to businesses around the globe. He traveled'},
- {'generated_text': 'The man worked as a translator for the British Broadcasting Corporation between 1956 and 1961. During that period he'},
- {'generated_text': 'The man worked as a salesman for IBM Corp., selling computers for computers. He traveled extensively and lived'},
- {'generated_text': 'The man worked as a security guard for nearly 30 years before he was shot dead by police officers responding'}]
+[{'generated_text': 'The man worked as a janitor and the owner of the house he worked at caught him cheating on'},
+ {'generated_text': 'The man worked as a software engineer.\n\nFor over 10 years, he had been at Amazon'},
+ {'generated_text': 'The man worked as a car salesman - and was a man of his word to her\nA T'},
+ {'generated_text': 'The man worked as a private contractor for five years. He went to the Bahamas in the summer of'},
+ {'generated_text': 'The man worked as a computer systems consultant. After leaving the job, he became a prolific internet hacker'}]
 ```
 
 This bias will also affect all fine-tuned versions of this model.
@@ -140,6 +140,8 @@ re-formatting practices, including removing repetitive/non-informative text like
 
 ## Training procedure
+
+
 
 ### Preprocessing
 
 The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
@@ -158,4 +160,4 @@ The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was ro
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
 ```
````
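The README context above notes that texts are tokenized with GPT-2's byte-level BPE. As background, byte-level BPE first maps every possible byte to a printable unicode stand-in so the merge rules can operate on visible characters. A minimal sketch of that mapping, adapted from the widely published GPT-2 encoder logic (names here are illustrative, not from this repo):

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable unicode character.

    Printable bytes map to themselves; the rest (control characters,
    space, etc.) are shifted into the 256+ codepoint range so every
    byte has a visible stand-in.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\xa1"), ord("\xac") + 1))
          + list(range(ord("\xae"), ord("\xff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

byte_to_char = bytes_to_unicode()
# The space byte (0x20) gets the stand-in 'Ġ', which is why GPT-2/OPT
# vocabulary entries show tokens like 'Ġhello' for ' hello'.
```

This is why the OPT tokenizer never emits an unknown token: any byte sequence, unicode or not, has a representation in this 256-symbol base alphabet.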
config.json

````diff
@@ -1,4 +1,5 @@
 {
+  "_name_or_path": "facebook/opt-1.3b",
   "activation_dropout": 0.0,
   "activation_function": "relu",
   "architectures": [
@@ -6,11 +7,11 @@
   ],
   "attention_dropout": 0.0,
   "bos_token_id": 2,
-  "hidden_size": 2048,
   "do_layer_norm_before": true,
   "dropout": 0.1,
   "eos_token_id": 2,
   "ffn_dim": 8192,
+  "hidden_size": 2048,
   "init_std": 0.02,
   "layerdrop": 0.0,
   "max_position_embeddings": 2048,
@@ -18,10 +19,10 @@
   "num_attention_heads": 32,
   "num_hidden_layers": 24,
   "pad_token_id": 1,
+  "prefix": "</s>",
   "torch_dtype": "float16",
-  "transformers_version": "4.19.0.dev0",
+  "transformers_version": "4.21.0.dev0",
   "use_cache": true,
   "vocab_size": 50272,
-  "word_embed_proj_dim": 2048,
-  "prefix": "</s>"
+  "word_embed_proj_dim": 2048
 }
````
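For orientation, the config values shown in these hunks describe the 1.3B architecture. A quick sanity check over the post-change values (the relationships asserted here, such as the 4x FFN expansion and the unprojected embedding dimension, are standard for OPT configs but are not stated in the diff itself):

```python
import json

# Field values copied from the post-change config.json hunks above.
config = json.loads("""
{
  "hidden_size": 2048,
  "ffn_dim": 8192,
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "max_position_embeddings": 2048,
  "word_embed_proj_dim": 2048,
  "vocab_size": 50272
}
""")

# Per-head dimension: 2048 / 32 = 64.
head_dim = config["hidden_size"] // config["num_attention_heads"]
# Feed-forward expansion factor: 8192 / 2048 = 4.
ffn_ratio = config["ffn_dim"] // config["hidden_size"]
# word_embed_proj_dim == hidden_size means no input/output projection
# around the embedding matrix (larger OPT variants use one).
no_projection = config["word_embed_proj_dim"] == config["hidden_size"]
```

Note the hunks themselves only reorder `"hidden_size"` and `"prefix"` alphabetically, add `"_name_or_path"`, and bump `transformers_version`; no architectural value changes.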
Binary file not shown.
generation_config.json (new file)

````diff
@@ -0,0 +1,7 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 2,
+  "eos_token_id": 2,
+  "pad_token_id": 1,
+  "transformers_version": "4.27.0.dev0"
+}
````
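This new file follows the convention in newer versions of transformers of splitting generation defaults out of config.json into a separate generation_config.json. The added file is plain JSON, and its token ids mirror the ones already present in config.json, which `"_from_model_config"` records:

```python
import json

# The added file, reproduced from the +1,7 hunk above.
generation_config = json.loads("""
{
  "_from_model_config": true,
  "bos_token_id": 2,
  "eos_token_id": 2,
  "pad_token_id": 1,
  "transformers_version": "4.27.0.dev0"
}
""")

# OPT uses </s> (id 2) as both BOS and EOS, and <pad> is id 1,
# matching the bos_token_id / eos_token_id / pad_token_id in config.json.
```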
pytorch_model.bin (Stored with Git LFS): binary file not shown.