---
license: apache-2.0
tags:
datasets:
- imagenet-21k
---
# Vision-and-Language Transformer (ViLT), fine-tuned on VQAv2

Vision-and-Language Transformer (ViLT) model fine-tuned on [VQAv2](https://visualqa.org/). It was introduced in the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Kim et al. and first released in [this repository](https://github.com/dandelin/ViLT).

Disclaimer: The team releasing ViLT did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Model description
Per the paper, ViLT is a minimal vision-and-language transformer: text tokens and image patches are embedded by shallow, convolution-free layers and processed jointly by a single transformer encoder, with no region features or separate object detector involved.
## Intended uses & limitations
You can use the raw model for visual question answering.
### How to use
You can run inference with the `transformers` library. The snippet below is a minimal sketch rather than an official example: the checkpoint name `dandelin/vilt-b32-finetuned-vqa`, the image URL, and the question are assumptions for illustration:
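```python
from transformers import ViltProcessor, ViltForQuestionAnswering
import requests
from PIL import Image

# Prepare an example image and question (illustrative values).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "How many cats are there?"

# The checkpoint name is an assumption; replace with the actual Hub repository.
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# Encode the image and question together; the processor handles image
# resizing/normalization and text tokenization.
encoding = processor(image, text, return_tensors="pt")

# Forward pass: the model scores every answer in its fixed answer vocabulary.
outputs = model(**encoding)
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```
Recent versions of `transformers` also ship a `visual-question-answering` pipeline that wraps these steps in a single call.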
## Training data
Per the paper, ViLT is pretrained on four image-caption datasets (MS-COCO, Visual Genome, SBU Captions, and Google Conceptual Captions); this checkpoint is then fine-tuned on the VQAv2 visual question answering dataset.
## Training procedure
### Preprocessing
Per the paper, images are resized so the shorter edge is 384 pixels (with the longer edge capped at 640, preserving aspect ratio) and embedded as 32×32 pixel patches; questions are tokenized with a BERT-style wordpiece tokenizer.
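As a sketch of how the bundled processor applies this (using the same hypothetical checkpoint name as in the usage example above):
```python
from transformers import ViltProcessor
from PIL import Image
import requests

# Hypothetical checkpoint name, as in the usage example above.
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# Example image and question (illustrative values).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
encoding = processor(image, "How many cats are there?", return_tensors="pt")

# One dict holds both modalities: input_ids/token_type_ids/attention_mask
# for the question, and pixel_values/pixel_mask for the resized image.
for name, tensor in encoding.items():
    print(name, tuple(tensor.shape))
```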
### Pretraining
Per the paper, ViLT is pretrained with image-text matching and masked language modeling objectives (using whole-word masking); see the paper for the full optimization setup.
## Evaluation results
(to do)
## BibTeX entry and citation info
```bibtex
@misc{kim2021vilt,
    title={ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision},
    author={Wonjae Kim and Bokyung Son and Ildoo Kim},
    year={2021},
    eprint={2102.03334},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}
```