diff --git a/README.md b/README.md
index 358287a..da2ce18 100644
--- a/README.md
+++ b/README.md
@@ -18,8 +18,7 @@ Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART).
 
 ## Intended uses & limitations
 
-You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=google/vit) to look for
-fine-tuned versions on a task that interests you.
+This model is meant to be fine-tuned on a downstream task, like document image classification or document parsing. See the [model hub](https://huggingface.co/models?search=donut) to look for fine-tuned versions on a task that interests you.
 
 ### How to use
 