# Hubert-Base for Keyword Spotting
This model was ported from the [S3PRL speech toolkit](https://github.com/s3prl/s3prl) and builds on [Facebook's Hubert](https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression). The base model is [hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960), which is pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
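If your audio was recorded at a different rate, resample it before running the model. A minimal sketch using `torchaudio` (not used elsewhere in this card; the 48kHz random waveform is a stand-in for your own data):
```python
import torch
import torchaudio.functional as F

# hypothetical input: one second of mono audio sampled at 48kHz
waveform = torch.randn(1, 48_000)

# resample to the 16kHz rate the model expects
waveform_16k = F.resample(waveform, orig_freq=48_000, new_freq=16_000)
```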
The original model can be found under https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/speech_commands.
# Usage examples
You can use the model via the Audio Classification pipeline:
```python
import numpy as np
from datasets import load_dataset
from transformers import pipeline

superb_ks = load_dataset("anton-l/superb_dummy", "ks", split="test")

# the pipeline loads both the model and its feature extractor;
# audio classification does not need a tokenizer
classifier = pipeline("audio-classification", model="superb/hubert-base-superb-ks")

audio = np.array(superb_ks[0]["speech"])
labels = classifier(audio, top_k=5)
```
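The pipeline returns `labels` as a list of `{"score", "label"}` dicts sorted by descending score. Continuing from the snippet above, you can inspect them like this:
```python
# print each predicted keyword with its confidence
for result in labels:
    print(f"{result['label']}: {result['score']:.3f}")
```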
Or use the model directly:
```python
import torch
import numpy as np
from datasets import load_dataset
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

superb_ks = load_dataset("anton-l/superb_dummy", "ks", split="test")

model = HubertForSequenceClassification.from_pretrained("superb/hubert-base-superb-ks")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/hubert-base-superb-ks")

audio = np.array(superb_ks[0]["speech"])

# compute attention masks and normalize the waveform if needed
inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt")

# forward pass without gradient tracking, since we are only predicting
with torch.no_grad():
    logits = model(**inputs).logits

# map the highest-scoring class id to its keyword label
predicted_ids = torch.argmax(logits, dim=-1)
labels = [model.config.id2label[_id] for _id in predicted_ids.tolist()]
```
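To mirror the pipeline's `top_k=5` output, you can turn the logits into per-keyword probabilities yourself. A minimal sketch continuing from the snippet above:
```python
import torch

# softmax over the class dimension gives per-keyword probabilities
probs = torch.nn.functional.softmax(logits, dim=-1)

# take the five most likely keywords for the first (and only) example
top5 = torch.topk(probs[0], k=5)
for score, class_id in zip(top5.values.tolist(), top5.indices.tolist()):
    print(f"{model.config.id2label[class_id]}: {score:.3f}")
```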