Add code example

This commit is contained in:
Niels Rogge 2022-01-23 09:40:33 +00:00 committed by huggingface-web
parent 1d0521195d
commit 4355f59b0b
1 changed files with 24 additions and 5 deletions

View File

@ -9,17 +9,36 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by
Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
## Model description
(to do)
## Intended uses & limitations
You can use the raw model for visual question answering.
### How to use
(to do)
Here is how to use this model in PyTorch:
```python
from transformers import ViltProcessor, ViltForQuestionAnswering
import requests
from PIL import Image
# prepare image + question
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "How many cats are there?"
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
# prepare inputs
encoding = processor(image, text, return_tensors="pt")
# forward pass
outputs = model(**encoding)
logits = outputs.logits
idx = logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```
## Training data