diff --git a/README.md b/README.md
index 358287a..da2ce18 100644
--- a/README.md
+++ b/README.md
@@ -18,8 +18,7 @@ Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART).
 
 ## Intended uses & limitations
 
-You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=google/vit) to look for
-fine-tuned versions on a task that interests you.
+This model is meant to be fine-tuned on a downstream task, like document image classification or document parsing. See the [model hub](https://huggingface.co/models?search=donut) to look for fine-tuned versions on a task that interests you.
 
 ### How to use