From ff022d095678a2995f3c49bab18a96a9e553f782 Mon Sep 17 00:00:00 2001
From: Patrick von Platen <patrick.v.platen@gmail.com>
Date: Fri, 5 Nov 2021 12:42:57 +0000
Subject: [PATCH] Update README.md

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a30d2b0..06eb884 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,9 @@ license: apache-2.0
 
 [Facebook's Hubert](https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression)
 
-The large model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz. Note that this model should be fine-tuned on a downstream task, like Automatic Speech Recognition, Speaker Identification, Intent Classification, Emotion Recognition, etc...
+The large model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz.
+
+**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for more in-detail explanation of how to fine-tune the model.
 
 The model was pretrained on [Libri-Light](https://github.com/facebookresearch/libri-light).