From 4355f59b0bb6b8382b98fba55129bb7ce8ea52f5 Mon Sep 17 00:00:00 2001
From: Niels Rogge
Date: Sun, 23 Jan 2022 09:40:33 +0000
Subject: [PATCH] Add code example

---
 README.md | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 5927263..51abe91 100644
--- a/README.md
+++ b/README.md
@@ -9,17 +9,36 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by
 Disclaimer: The team releasing ViLT did not write a model card for this model so
 this model card has been written by the Hugging Face team.
 
-## Model description
-
-(to do)
-
 ## Intended uses & limitations
 
 You can use the raw model for visual question answering.
 
 ### How to use
 
-(to do)
+Here is how to use this model in PyTorch:
+
+```python
+from transformers import ViltProcessor, ViltForQuestionAnswering
+import requests
+from PIL import Image
+
+# prepare image + question
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+text = "How many cats are there?"
+
+processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+
+# prepare inputs
+encoding = processor(image, text, return_tensors="pt")
+
+# forward pass
+outputs = model(**encoding)
+logits = outputs.logits
+idx = logits.argmax(-1).item()
+print("Predicted answer:", model.config.id2label[idx])
+```
 
 ## Training data
 
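Not part of the patch above, but a minimal sketch of how the added snippet could be extended to inspect the five most probable answers instead of only the argmax. It reuses the same checkpoint, image URL, and question as the patch; the softmax/top-k step is an assumption about how one might present scores, not something the original change includes.

```python
import torch
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# same image, question, and checkpoint as in the patch
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "How many cats are there?"

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
encoding = processor(image, text, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits

# convert logits to probabilities and look at the 5 most likely answers
probs = logits.softmax(dim=-1)[0]
top_probs, top_ids = probs.topk(5)
for p, i in zip(top_probs.tolist(), top_ids.tolist()):
    print(f"{model.config.id2label[i]}: {p:.3f}")
```

The scores are a plain softmax over the answer vocabulary, so they should be read as a rough confidence signal rather than calibrated probabilities.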