From 4355f59b0bb6b8382b98fba55129bb7ce8ea52f5 Mon Sep 17 00:00:00 2001
From: Niels Rogge
Date: Sun, 23 Jan 2022 09:40:33 +0000
Subject: [PATCH] Add code example

---
 README.md | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 5927263..51abe91 100644
--- a/README.md
+++ b/README.md
@@ -9,17 +9,36 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by
 Disclaimer: The team releasing ViLT did not write a model card for this model so
 this model card has been written by the Hugging Face team.
 
-## Model description
-
-(to do)
-
 ## Intended uses & limitations
 
 You can use the raw model for visual question answering.
 
 ### How to use
 
-(to do)
+Here is how to use this model in PyTorch:
+
+```python
+from transformers import ViltProcessor, ViltForQuestionAnswering
+import requests
+from PIL import Image
+
+# prepare image + question
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+text = "How many cats are there?"
+
+processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+
+# prepare inputs
+encoding = processor(image, text, return_tensors="pt")
+
+# forward pass
+outputs = model(**encoding)
+logits = outputs.logits
+idx = logits.argmax(-1).item()
+print("Predicted answer:", model.config.id2label[idx])
+```
 
 ## Training data
 
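Not part of the patch above, but a minimal sketch of how the added snippet could be extended to inspect the five most probable answers instead of only the argmax. It reuses the same checkpoint, image URL, and question as the patch; the softmax/top-k step is an assumption about how one might present scores, not something the original change includes.

```python
import torch
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# same image, question, and checkpoint as in the patch
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "How many cats are there?"

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
encoding = processor(image, text, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits

# convert logits to probabilities and look at the 5 most likely answers
probs = logits.softmax(dim=-1)[0]
top_probs, top_ids = probs.topk(5)
for p, i in zip(top_probs.tolist(), top_ids.tolist()):
    print(f"{model.config.id2label[i]}: {p:.3f}")
```

The scores are a plain softmax over the answer vocabulary, so they should be read as a rough confidence signal rather than calibrated probabilities.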