From 406982260f3b131f2a94aa3a52bb8234a1974311 Mon Sep 17 00:00:00 2001
From: Lysandre
Date: Wed, 13 Jan 2021 14:18:35 +0000
Subject: [PATCH] Update dimensions

---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index 61a7db6..05f22c5 100644
--- a/README.md
+++ b/README.md
@@ -42,6 +42,13 @@
 This way, the model learns an inner representation of the English language that is useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the BERT model as inputs.
 
+This model has the following configuration:
+
+- 24-layer
+- 1024 hidden dimension
+- 16 attention heads
+- 336M parameters.
+
 ## Intended uses & limitations
 
 This model should be used as a question-answering model. You may use it in a question answering pipeline, or use it to output raw results given a query and a context. You may see other use cases in the [task summary](https://huggingface.co/transformers/task_summary.html#extractive-question-answering) of the transformers documentation.
 ## Training data
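As a sanity check on the dimensions this patch adds, the parameter count implied by the listed configuration can be worked out directly. The sketch below is a rough back-of-the-envelope calculation, assuming standard BERT defaults that do not appear in the patch itself (30,522-token vocabulary, 4096 intermediate size, 512 position embeddings, 2 token types, plus the `[CLS]` pooler):

```python
# Rough parameter count for a BERT-style encoder with the configuration
# listed in the card: 24 layers, hidden size 1024, 16 attention heads.
# Vocabulary, intermediate, and position sizes are standard BERT defaults,
# assumed here since the patch does not state them.

def bert_param_count(num_layers=24, hidden=1024, intermediate=4096,
                     vocab=30522, max_pos=512, type_vocab=2):
    # Embeddings: word + position + token-type tables, plus one LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden

    # Per encoder layer:
    attn_qkv = 3 * (hidden * hidden + hidden)      # Q, K, V projections
    attn_out = hidden * hidden + hidden            # attention output dense
    ffn_in = hidden * intermediate + intermediate  # intermediate dense
    ffn_out = intermediate * hidden + hidden       # output dense
    layer_norms = 2 * (2 * hidden)                 # two LayerNorms per layer
    per_layer = attn_qkv + attn_out + ffn_in + ffn_out + layer_norms

    pooler = hidden * hidden + hidden              # [CLS] pooler dense
    return embeddings + num_layers * per_layer + pooler

total = bert_param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # ~335M
```

Under these assumptions the count comes out to roughly 335M, consistent with the 336M the card quotes (the exact figure depends on whether task heads are included). Note that the number of attention heads does not change the total: 16 heads of size 64 partition the same 1024-dimensional projections.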