Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/sentence-transformers/bert-base-nli-mean-tokens/README.md
2020-12-11 23:01:45 +01:00 · 2020-12-11 23:01:45 +01:00 · d84dc57432
parent cb5c300720
commit d84dc57432
1 changed file with 85 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,85 @@
 ---
 language: en
 tags:
 - exbert
 license: apache-2.0
 datasets:
 - snli
 - multi_nli
 ---
 # BERT base model (uncased) for Sentence Embeddings
 This is the `bert-base-nli-mean-tokens` model from the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) repository. The sentence-transformers repository allows you to train and use Transformer models for generating sentence and text embeddings. This model was trained on the SNLI and MultiNLI datasets and produces a 768-dimensional sentence embedding by mean-pooling the BERT token embeddings.
 The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
 ## Usage (HuggingFace Models Repository)
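 You can use the model directly from the model repository to compute sentence embeddings. First, pass the input through the transformer model; then apply mean pooling over the token embeddings, using the attention mask so that padding tokens are not counted: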
 ```python
 from transformers import AutoTokenizer, AutoModel
 import torch
 # Mean pooling: average the token embeddings, using the attention mask so padding tokens are ignored
 def mean_pooling(model_output, attention_mask):
     token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
     sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
     sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)  # Avoid division by zero
     return sum_embeddings / sum_mask
 # Sentences we want sentence embeddings for
 sentences = ['This framework generates embeddings for each input sentence',
              'Sentences are passed as a list of strings.',
              'The quick brown fox jumps over the lazy dog.']
 # Load the tokenizer and model from the Hugging Face model repository
 tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
 model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
 # Tokenize sentences: pad to the longest sentence, truncate to at most 128 tokens
 encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
 # Compute token embeddings
 with torch.no_grad():
     model_output = model(**encoded_input)
 # Perform pooling. In this case, mean pooling
 sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
 ```
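To see why the attention mask matters, here is a framework-free sketch of the same computation that `mean_pooling` performs for a single sentence, using plain Python lists instead of PyTorch tensors. The helper name `masked_mean` and the toy 2-dimensional vectors are hypothetical, chosen only for illustration. Positions with mask value 0 (padding) are excluded from both the sum and the token count, and the count is clamped to 1e-9, playing the same role as `torch.clamp(..., min=1e-9)`, so an all-padding input yields zeros instead of a division by zero.

```python
from typing import List


def masked_mean(token_embeddings: List[List[float]], attention_mask: List[int],
                eps: float = 1e-9) -> List[float]:
    """Average the token vectors whose mask value is 1 (hypothetical helper).

    Mirrors mean_pooling() for one sentence: padded positions (mask 0)
    contribute neither to the sum nor to the token count.
    """
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    for vector, mask in zip(token_embeddings, attention_mask):
        for i in range(dim):
            sums[i] += vector[i] * mask  # mask 0 zeroes out padding tokens
    # Clamp the count so an all-padding input does not divide by zero
    count = max(float(sum(attention_mask)), eps)
    return [s / count for s in sums]


# Two real tokens followed by one padding token (toy vectors)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

print(masked_mean(tokens, mask))        # [2.0, 3.0]: padding is ignored
print(masked_mean(tokens, [1, 1, 1]))   # unmasked: pulled toward the padding vector
print(masked_mean(tokens, [0, 0, 0]))   # [0.0, 0.0]: no division by zero
```

In the PyTorch version above, the same masking is done for the whole batch at once by expanding the mask to the embedding shape and multiplying element-wise.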
 ## Usage (Sentence-Transformers)
 Using this model becomes more convenient when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
 ```
 pip install -U sentence-transformers
 ```
 Then you can use the model like this:
 ```python
 from sentence_transformers import SentenceTransformer
 model = SentenceTransformer('bert-base-nli-mean-tokens')
 sentences = ['This framework generates embeddings for each input sentence',
              'Sentences are passed as a list of strings.',
              'The quick brown fox jumps over the lazy dog.']
 sentence_embeddings = model.encode(sentences)
 print("Sentence embeddings:")
 print(sentence_embeddings)
 ```
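`model.encode` returns one fixed-size vector per sentence (768 values for this BERT-base model). The Sentence-BERT paper compares such vectors with cosine similarity; the sketch below shows that computation in plain Python. The helper `cosine_similarity` and the made-up 3-dimensional vectors are hypothetical stand-ins; with real output you would pass two rows of `sentence_embeddings`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (hypothetical helper).

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real sentence embeddings
emb_a = [3.0, 4.0, 0.0]
emb_b = [6.0, 8.0, 0.0]   # same direction as emb_a
emb_c = [0.0, 0.0, 2.0]   # orthogonal to emb_a
emb_d = [4.0, 3.0, 0.0]   # similar but not identical direction

print(cosine_similarity(emb_a, emb_b))  # 1.0
print(cosine_similarity(emb_a, emb_c))  # 0.0
print(cosine_similarity(emb_a, emb_d))  # 0.96
```

If you already work with PyTorch tensors, `torch.nn.functional.cosine_similarity` performs the same computation over whole batches.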
 ## Citing & Authors
 If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
 ```bibtex
 @inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
 }
 ```