---
language: ja
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: apache-2.0
tags:
- feature-extraction
- ja
- japanese
- clip
---

![rinna-icon](./rinna.png)
This is a Japanese [CLIP (Contrastive Language-Image Pre-Training)](https://arxiv.org/abs/2103.00020) model trained by [rinna Co., Ltd.](https://corp.rinna.co.jp/).
Please see [japanese-clip](https://github.com/rinnakk/japanese-clip) for the other available models.
# How to use the model
1. Install the package
```shell
$ pip install git+https://github.com/rinnakk/japanese-clip.git
```
2. Run
```python
import io
import requests
from PIL import Image
import torch
import japanese_clip as ja_clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load this repository's model and the matching Japanese tokenizer.
model, preprocess = ja_clip.load("rinna/japanese-clip-vit-b-16", device=device)
tokenizer = ja_clip.load_tokenizer()

# Open an example image; use any dog photo so that the first label wins.
image_url = "https://example.com/dog.jpg"  # placeholder URL, replace with a real one
img = Image.open(io.BytesIO(requests.get(image_url).content))
image = preprocess(img).unsqueeze(0).to(device)

# Tokenize candidate Japanese labels: "dog", "cat", "elephant".
encodings = ja_clip.tokenize(
    texts=["犬", "猫", "象"],
    max_seq_len=77,
    device=device,
    tokenizer=tokenizer,
)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(**encodings)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1.0, 0.0, 0.0]]
```
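Beyond zero-shot classification, the image and text encoders can be used directly for feature extraction, for example to rank a set of images against a Japanese text query. The sketch below is illustrative only: it reuses the `model`, `preprocess`, `tokenizer`, and `device` objects from the example above, and `image_paths` and the query string are hypothetical placeholders.
```python
# Illustrative sketch: text-to-image retrieval with the encoders loaded above.
# Assumes `model`, `preprocess`, `tokenizer`, and `device` from the previous example;
# `image_paths` and the query are hypothetical placeholders.
from PIL import Image
import torch
import japanese_clip as ja_clip

image_paths = ["img0.jpg", "img1.jpg", "img2.jpg"]  # placeholder file names
images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)

# "a beach at sunset" as an example query
query = ja_clip.tokenize(texts=["夕日の海辺"], device=device, tokenizer=tokenizer)

with torch.no_grad():
    image_features = model.get_image_features(images)
    text_features = model.get_text_features(**query)

# Normalize both sides, score by cosine similarity, and rank the images.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
scores = (image_features @ text_features.T).squeeze(-1)
ranking = scores.argsort(descending=True)
print([image_paths[i] for i in ranking])
```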
# Model architecture
The model uses a ViT-B/16 Transformer architecture as an image encoder and a 12-layer BERT as a text encoder. The image encoder was initialized from the [AugReg `vit-base-patch16-224` model](https://github.com/google-research/vision_transformer).
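As a rough schematic of this dual-encoder layout (not rinna's actual implementation), the sketch below pairs a ViT-B/16-sized image tower with a 12-layer BERT text tower and projects both into a shared embedding space; the hidden sizes and the 512-dimensional projection are illustrative assumptions.
```python
# Schematic dual encoder illustrating the layout described above.
# NOT the actual rinna implementation; the projection size and any sizes
# beyond "ViT-B/16" and "12-layer BERT" are illustrative assumptions.
import torch
from torch import nn
from transformers import ViTConfig, ViTModel, BertConfig, BertModel

class DualEncoder(nn.Module):
    def __init__(self, embed_dim: int = 512):  # embed_dim is an assumption
        super().__init__()
        self.image_encoder = ViTModel(ViTConfig(
            image_size=224, patch_size=16, hidden_size=768,
            num_hidden_layers=12, num_attention_heads=12,
        ))
        self.text_encoder = BertModel(BertConfig(num_hidden_layers=12))
        self.image_proj = nn.Linear(768, embed_dim)
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, embed_dim)

    def forward(self, pixel_values, input_ids, attention_mask):
        # Encode each modality, then project into the shared embedding space.
        img = self.image_encoder(pixel_values=pixel_values).pooler_output
        txt = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return self.image_proj(img), self.text_proj(txt)
```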
# Training
The model was trained on [CC12M](https://github.com/google-research-datasets/conceptual-12m) with the captions translated into Japanese.
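CLIP-style training, as described in the paper linked above, optimizes a symmetric contrastive (InfoNCE) loss over the image-caption pairs in each batch. A minimal sketch of that objective (not rinna's training code; the temperature value is an assumption):
```python
# Minimal sketch of the symmetric CLIP contrastive loss over a batch of
# image-caption pairs, following the CLIP paper; not rinna's training code.
import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, temperature: float = 0.07):
    # Normalize both towers and compute the pairwise similarity matrix.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.T / temperature  # temperature is illustrative

    # The i-th image matches the i-th caption; all other pairs are negatives.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2
```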
# License
[The Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)
