Update README.md

Lysandre 2021-01-13 13:40:54 +00:00 committed by huggingface-web
parent b8df2493f9
commit b279e8a7d4
1 changed file with 4 additions and 1 deletion


@@ -77,7 +77,10 @@ The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total)
of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer
used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
learning rate warmup for 10,000 steps and linear decay of the learning rate after.
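For reference, a minimal sketch (not part of the original card) of an equivalent optimizer and schedule in PyTorch with 🤗 Transformers; `model` and `num_training_steps` are placeholder assumptions, not values from the README:

```python
# Sketch of the optimizer/schedule described above (assumed PyTorch equivalent).
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(768, 768)   # placeholder for the actual BERT model
num_training_steps = 1_000_000      # assumed total number of training steps

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,                 # learning rate of 1e-4
    betas=(0.9, 0.999),      # beta_1 = 0.9, beta_2 = 0.999
    weight_decay=0.01,       # weight decay of 0.01
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,               # warmup for 10,000 steps
    num_training_steps=num_training_steps, # linear decay afterwards
)
```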
### Fine-tuningAfter pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. In order to reproduce the training, you may use the following command:
### Fine-tuning
After pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. In order to reproduce the training, you may use the following command:
```
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-uncased-whole-word-masking \