add image encoder's information

2022-05-16 11:05:47 +09:00 · 2022-05-16 11:05:47 +09:00 · 055792af34
parent 58c914a486
commit 055792af34
1 changed file with 1 addition and 1 deletion
--- a/README.md
+++ b/README.md
@@ -62,7 +62,7 @@ print("Label probs:", text_probs)  # prints: [[1.0, 0.0, 0.0]]
 ```
 # Model architecture
-The model was trained with a ViT-B/16 Transformer architecture as an image encoder and uses a 12-layer RoBERTa as a text encoder. The text encoder was trained upon the Japanese pre-trained RoBERTa model [rinna/japanese-roberta-base](https://huggingface.co/rinna/japanese-roberta-base) with the same sentencepiece tokenizer.
+The model was trained with a ViT-B/16 Transformer architecture as an image encoder and uses a 12-layer RoBERTa as a text encoder. The image encoder was initialized from [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224), and the text encoder from the Japanese pre-trained RoBERTa model [rinna/japanese-roberta-base](https://huggingface.co/rinna/japanese-roberta-base), whose sentencepiece tokenizer it reuses.
 # Training
 The model was trained on [CC12M](https://github.com/google-research-datasets/conceptual-12m), with the captions translated into Japanese.
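
Note on the new wording: the checkpoint name `vit-base-patch16-224` implies a 224x224 input split into 16x16 patches. The sketch below only works out the resulting token count for such an image encoder. The `ViTShape` class and its field names are hypothetical and not part of this repository; the width and depth are the standard ViT-Base values, assumed here rather than read from the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTShape:
    """Hypothetical helper describing a ViT-B/16 input layout (illustration only)."""

    image_size: int = 224   # input resolution implied by "patch16-224"
    patch_size: int = 16    # the "/16" in ViT-B/16
    hidden_size: int = 768  # standard ViT-Base width (assumed)
    num_layers: int = 12    # standard ViT-Base depth (assumed)

    @property
    def num_patches(self) -> int:
        # The image is cut into a square grid of non-overlapping patches.
        per_side = self.image_size // self.patch_size
        return per_side * per_side

    @property
    def sequence_length(self) -> int:
        # Patch tokens plus one classification token.
        return self.num_patches + 1


vit_b16 = ViTShape()
print(vit_b16.num_patches, vit_b16.sequence_length)  # prints: 196 197
```

Under these assumptions each image becomes 196 patch tokens plus one classification token before it reaches the Transformer layers.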