Add code example

2022-01-23 09:40:33 +00:00 · 2022-01-23 09:40:33 +00:00 · 4355f59b0b
parent 1d0521195d
commit 4355f59b0b
1 changed file with 24 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -9,17 +9,36 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by

 Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.

-## Model description
-
-(to do)
-
 ## Intended uses & limitations

 You can use the raw model for visual question answering. 

 ### How to use

-(to do)
+Here is how to use this model in PyTorch:
+
+```python
+from transformers import ViltProcessor, ViltForQuestionAnswering
+import requests
+from PIL import Image
+
+# prepare image + question
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+text = "How many cats are there?"
+
+processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+
+# prepare inputs
+encoding = processor(image, text, return_tensors="pt")
+
+# forward pass
+outputs = model(**encoding)
+logits = outputs.logits
+idx = logits.argmax(-1).item()
+print("Predicted answer:", model.config.id2label[idx])
+```

 ## Training data