release v0.2.0 models

mkshing 2022-07-19 14:43:12 +09:00
parent 055792af34
commit 2707159f64
3 changed files with 17 additions and 2018 deletions


@@ -62,7 +62,7 @@ print("Label probs:", text_probs) # prints: [[1.0, 0.0, 0.0]]
```
# Model architecture
-The model uses a ViT-B/16 Transformer architecture as an image encoder and a 12-layer RoBERTa as a text encoder. The image encoder was initialized from [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224), and the text encoder from the Japanese pre-trained RoBERTa model [rinna/japanese-roberta-base](https://huggingface.co/rinna/japanese-roberta-base) with the same sentencepiece tokenizer.
+The model uses a ViT-B/16 Transformer architecture as an image encoder and a 12-layer BERT as a text encoder. The image encoder was initialized from the [AugReg `vit-base-patch16-224` model](https://github.com/google-research/vision_transformer).
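
As a rough check on what "ViT-B/16" implies for the image encoder's input, the sketch below computes the token sequence length for one image. It assumes a 224×224 input (suggested by the `vit-base-patch16-224` checkpoint name) and a single class token, as in the standard ViT design. `vit_sequence_length` is a hypothetical helper for illustration, not part of this repository.

```python
def vit_sequence_length(image_size: int = 224,
                        patch_size: int = 16,
                        use_cls_token: bool = True) -> int:
    """Return the number of tokens a ViT sees for one square image.

    Assumptions (not stated in the README): square input, non-overlapping
    patches, and at most one extra class token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size   # 224 // 16 = 14
    num_patches = patches_per_side ** 2           # 14 * 14 = 196
    return num_patches + (1 if use_cls_token else 0)


print(vit_sequence_length())  # 197
```

Under these assumptions, each image becomes 196 patch tokens plus one class token.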
# Training
The model was trained on [CC12M](https://github.com/google-research-datasets/conceptual-12m) with the captions translated into Japanese.

File diff suppressed because it is too large

BIN
pytorch_model.bin (Stored with Git LFS)

Binary file not shown.