Add code example
parent 1d0521195d
commit 4355f59b0b
29
README.md
@@ -9,17 +9,36 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by
 
 Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
 
-## Model description
-
-(to do)
-
 ## Intended uses & limitations
 
 You can use the raw model for visual question answering.
 
 ### How to use
 
-(to do)
+Here is how to use this model in PyTorch:
+
+```python
+from transformers import ViltProcessor, ViltForQuestionAnswering
+import requests
+from PIL import Image
+
+# prepare image + question
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+text = "How many cats are there?"
+
+processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+
+# prepare inputs
+encoding = processor(image, text, return_tensors="pt")
+
+# forward pass
+outputs = model(**encoding)
+logits = outputs.logits
+idx = logits.argmax(-1).item()
+print("Predicted answer:", model.config.id2label[idx])
+```
 
 ## Training data
 
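
The last two lines of the added example reduce the model output to a single answer: `argmax` picks the highest-scoring class index and `model.config.id2label` maps it to a string. The sketch below reproduces that decoding step in plain Python and extends it to the top-k candidates, which can help when the best answer is a close call. The logit values and the three-entry label map are made-up placeholders (the real model has a far larger answer vocabulary), and `top_answers` is a hypothetical helper, not part of `transformers`. Using a sigmoid to turn logits into scores is an assumption about how the VQA head was trained; it does not change the ranking.

```python
import math


def top_answers(logits, id2label, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    # Sigmoid maps each logit to (0, 1) independently; the ordering is unchanged.
    scored = [
        (id2label[i], 1.0 / (1.0 + math.exp(-x)))
        for i, x in enumerate(logits)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical values, for illustration only.
toy_id2label = {0: "0", 1: "1", 2: "2"}
toy_logits = [-3.0, 0.5, 4.0]

best_label, best_score = top_answers(toy_logits, toy_id2label, k=1)[0]
print("Predicted answer:", best_label)
```

With the real model, the same helper could presumably be called as `top_answers(outputs.logits[0].tolist(), model.config.id2label)`.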