From c6b2049c8aac3ebe34ab3e7733ab10359a541bda Mon Sep 17 00:00:00 2001
From: Niels Rogge
Date: Sat, 27 Nov 2021 10:13:58 +0000
Subject: [PATCH] Add link

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4d5abaa..91f580d 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ datasets:
 
 # Vision-and-Language Transformer (ViLT), fine-tuned on VQAv2
 
-Vision-and-Language Transformer (ViLT) model fine-tuned on [VQAv2](). It was introduced in the paper [ViLT: Vision-and-Language Transformer
+Vision-and-Language Transformer (ViLT) model fine-tuned on [VQAv2](https://visualqa.org/). It was introduced in the paper [ViLT: Vision-and-Language Transformer
 Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Kim et al. and first released in [this repository](https://github.com/dandelin/ViLT). Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
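
The README hunk above describes a ViLT checkpoint fine-tuned for visual question answering on VQAv2. As a minimal sketch of how such a checkpoint might be queried with the `transformers` library's `ViltProcessor` and `ViltForQuestionAnswering` classes: the repository id `dandelin/vilt-b32-finetuned-vqa` and the sample image URL are assumptions, not stated in the patch.

```python
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# Assumed checkpoint id; the patch itself does not name the model repository.
checkpoint = "dandelin/vilt-b32-finetuned-vqa"
processor = ViltProcessor.from_pretrained(checkpoint)
model = ViltForQuestionAnswering.from_pretrained(checkpoint)

# Example image (COCO validation image, used here purely for illustration).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

# Encode the image-question pair and pick the highest-scoring answer class.
encoding = processor(image, question, return_tensors="pt")
outputs = model(**encoding)
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```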