From b279e8a7d4b1342d43c3694fb223afd1bf7668a1 Mon Sep 17 00:00:00 2001
From: Lysandre
Date: Wed, 13 Jan 2021 13:40:54 +0000
Subject: [PATCH] Update README.md

---
 README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9c05e78..61a7db6 100644
--- a/README.md
+++ b/README.md
@@ -77,7 +77,10 @@ The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total)
 of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer
 used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
 learning rate warmup for 10,000 steps and linear decay of the learning rate after.
-### Fine-tuningAfter pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. In order to reproduce the training, you may use the following command:
+
+### Fine-tuning
+
+After pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. In order to reproduce the training, you may use the following command:
 ```
 python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
   --model_name_or_path bert-large-uncased-whole-word-masking \