diff --git a/README.md b/README.md
index 767a87c..f35c0e0 100644
--- a/README.md
+++ b/README.md
@@ -9,22 +9,36 @@ tags:
 ---
 # shibing624/text2vec
 This is a CoSENT(Cosine Sentence) model: It maps sentences to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
 ## Usage (text2vec)
 Using this model becomes easy when you have [text2vec](https://github.com/shibing624/text2vec) installed:
+
 ```
 pip install -U text2vec
 ```
+
 Then you can use the model like this:
+
 ```python
-from text2vec import SBert
+from text2vec import SentenceModel
 sentences = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']
 
-model = SBert('shibing624/text2vec-base-chinese')
+model = SentenceModel('shibing624/text2vec-base-chinese')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
+
 ## Usage (HuggingFace Transformers)
-Without [text2vec](https://github.com/shibing624/text2vec), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
+Without [text2vec](https://github.com/shibing624/text2vec), you can use the model like this:
+
+First, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings.
+
+Install transformers:
+```
+pip install transformers
+```
+
+Then load model and predict:
 ```python
 from transformers import BertTokenizer, BertModel
 import torch
@@ -50,6 +64,28 @@
 sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
 print("Sentence embeddings:")
 print(sentence_embeddings)
 ```
+
+## Usage (sentence-transformers)
+[sentence-transformers](https://github.com/UKPLab/sentence-transformers) is a popular library to compute dense vector representations for sentences.
+
+Install sentence-transformers:
+```
+pip install -U sentence-transformers
+```
+
+Then load model and predict:
+
+```python
+from sentence_transformers import SentenceTransformer
+
+m = SentenceTransformer("shibing624/text2vec-base-chinese")
+sentences = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']
+
+sentence_embeddings = m.encode(sentences)
+print("Sentence embeddings:")
+print(sentence_embeddings)
+```
+
 ## Evaluation Results
 For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [text2vec](https://github.com/shibing624/text2vec)
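
The Transformers snippet in the diff calls `mean_pooling(model_output, encoded_input['attention_mask'])`, but its definition sits between the two hunks and is not shown. Below is a minimal sketch of the standard mean-pooling helper used in sentence-transformers style model cards; it assumes `model_output[0]` holds the per-token embeddings, which is an assumption about code outside the hunks rather than the card's exact body.

```python
import torch

def mean_pooling(model_output, attention_mask):
    # First element of the model output holds the per-token embeddings
    # (assumption: the raw BertModel output is passed in unchanged).
    token_embeddings = model_output[0]
    # Expand the attention mask to the embedding size so padded tokens are zeroed out.
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum the unmasked token embeddings and divide by the number of real tokens.
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
```

Dividing by the clamped mask sum gives a length-normalized average, so padded positions do not dilute the sentence vector.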
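
The card motivates the embeddings with clustering and semantic search. A short sketch of scoring the two example sentences with cosine similarity, assuming the sentence-transformers loading shown in the last hunk; the similarity glue itself is illustrative and not part of the card.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Same loading call as in the sentence-transformers section of the card.
m = SentenceTransformer("shibing624/text2vec-base-chinese")
sentences = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']

# encode() returns one 768-dimensional vector per sentence (numpy arrays by default).
emb = np.asarray(m.encode(sentences))

# Cosine similarity between the two example sentences; values near 1 indicate near-paraphrases.
a, b = emb[0], emb[1]
score = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {score:.4f}")
```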