Compare commits: a7d435bbac...577833e503 (10 commits)

Commits: 577833e503, 2707159f64, 055792af34, 58c914a486, 05b501981a, 357abcbe9d, a7dfb0a85f, da2e08c95d, f093721140, 062b689ee1
README.md (20 lines changed)
@@ -1,7 +1,9 @@
---
language: ja
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: apache-2.0
tags:
- feature-extraction
- ja
- japanese
- clip
@@ -12,16 +14,22 @@ tags:

This is a Japanese [CLIP (Contrastive Language-Image Pre-Training)](https://arxiv.org/abs/2103.00020) model trained by [rinna Co., Ltd.](https://corp.rinna.co.jp/).

Please see [japanese-clip](https://github.com/rinnakk/japanese-clip) for the other available models.
# How to use the model

1. Install package

```shell
$ pip install git+https://github.com/rinnakk/japanese-clip.git
```
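A quick way to confirm the install is to import the package; this is a minimal sketch assuming it installs as the module `japanese_clip`.

```python
# Minimal install check; assumes the package installs as the module `japanese_clip`.
import japanese_clip as ja_clip

print(ja_clip.__name__)  # prints "japanese_clip" if the import succeeded
```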
2. Run

```python
import io
import requests
```
@@ -53,3 +61,13 @@ with torch.no_grad():

```python
print("Label probs:", text_probs)  # prints: [[1.0, 0.0, 0.0]]
```
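Since the diff shows only the beginning and end of the example, here is a hedged end-to-end sketch of the zero-shot classification flow. The `ja_clip.load`, `ja_clip.load_tokenizer`, `ja_clip.tokenize`, `get_image_features`/`get_text_features` calls, the model name, the placeholder image URL, and the candidate labels are assumptions drawn from the japanese-clip repository, not a copy of the elided lines.

```python
# Hedged sketch of the elided zero-shot classification steps; the japanese_clip
# API calls, model name, and example inputs below are assumptions, not the
# original README lines.
import io
import requests
from PIL import Image
import torch
import japanese_clip as ja_clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model, preprocessing transform, and tokenizer (assumed API).
model, preprocess = ja_clip.load("rinna/japanese-clip-vit-b-16", device=device)
tokenizer = ja_clip.load_tokenizer()

# Fetch an example image; substitute any reachable image URL.
IMAGE_URL = "https://example.com/dog.jpg"  # placeholder
img = Image.open(io.BytesIO(requests.get(IMAGE_URL).content))
image = preprocess(img).unsqueeze(0).to(device)

# Tokenize Japanese candidate labels: "dog", "cat", "elephant".
encodings = ja_clip.tokenize(
    texts=["犬", "猫", "象"],
    max_seq_len=77,
    device=device,
    tokenizer=tokenizer,
)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(**encodings)
    # Similarity logits scaled by 100, softmaxed over the candidate labels.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # e.g. [[1.0, 0.0, 0.0]] for a dog photo
```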
# Model architecture

The model uses a ViT-B/16 Transformer architecture as its image encoder and a 12-layer BERT as its text encoder. The image encoder was initialized from the [AugReg `vit-base-patch16-224` model](https://github.com/google-research/vision_transformer).
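As a rough sanity check of those encoder sizes, one can count parameters per top-level submodule of the loaded checkpoint. A minimal sketch, assuming the model loads through `ja_clip.load` as in the usage example above and is an ordinary `torch.nn.Module`.

```python
# Sketch: count parameters per top-level submodule of the loaded model.
# Assumes ja_clip.load (see usage above) returns a standard torch.nn.Module.
import japanese_clip as ja_clip

model, _ = ja_clip.load("rinna/japanese-clip-vit-b-16", device="cpu")

for name, child in model.named_children():
    n_params = sum(p.numel() for p in child.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```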
# Training

The model was trained on [CC12M](https://github.com/google-research-datasets/conceptual-12m) with its captions translated into Japanese.
# License

[The Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)
config.json (2028 lines changed): file diff suppressed because it is too large.

pytorch_model.bin (stored with Git LFS): binary file not shown.