---
license: apache-2.0
tags:
datasets:
- imagenet-21k
---

# Vision-and-Language Transformer (ViLT), fine-tuned on VQAv2

Vision-and-Language Transformer (ViLT) model fine-tuned on [VQAv2](https://visualqa.org/). It was introduced in the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Kim et al. and first released in [this repository](https://github.com/dandelin/ViLT).

Disclaimer: The team releasing ViLT did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Model description

(to do)

## Intended uses & limitations

You can use the raw model for visual question answering.

### How to use

(to do)
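Until an official snippet is added here, below is a minimal sketch of how the model might be used for visual question answering with the `transformers` library. The checkpoint id `dandelin/vilt-b32-finetuned-vqa` and the example image URL are assumptions for illustration, not values confirmed by this card.

```python
# Minimal sketch (untested): visual question answering with ViLT via transformers.
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# An example image and a question about it (URL is an assumed example, not from this card)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

# Load processor and model (checkpoint id is an assumption)
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# Encode image + question together and run a forward pass
encoding = processor(image, question, return_tensors="pt")
outputs = model(**encoding)

# ViLT treats VQA as classification over a fixed answer vocabulary,
# so the prediction is the highest-scoring answer class.
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```

Because the VQA head is a classifier over a fixed answer vocabulary, the model cannot produce free-form answers outside that set.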
## Training data

(to do)

## Training procedure

### Preprocessing

(to do)

### Pretraining

(to do)

## Evaluation results

(to do)
### BibTeX entry and citation info

```bibtex
@misc{kim2021vilt,
    title={ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision},
    author={Wonjae Kim and Bokyung Son and Ildoo Kim},
    year={2021},
    eprint={2102.03334},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}
```