---
language: ja
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: apache-2.0
tags:
- feature-extraction
- ja
- japanese
- clip
---

This is a Japanese [CLIP (Contrastive Language-Image Pre-Training)](https://arxiv.org/abs/2103.00020) model trained by [rinna Co., Ltd.](https://corp.rinna.co.jp/).

Please see [japanese-clip](https://github.com/rinnakk/japanese-clip) for the other available models.

# How to use the model

1. Install package

```shell
$ pip install git+https://github.com/rinnakk/japanese-clip.git
```

2. Run

```python
import io
import requests
from PIL import Image
import torch
import japanese_clip as ja_clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = ja_clip.load("rinna/japanese-clip-vit-b-16", cache_dir="/tmp/japanese_clip", device=device)
tokenizer = ja_clip.load_tokenizer()

# Fetch an example image (a dog) and preprocess it into a batch of one.
img = Image.open(io.BytesIO(requests.get('https://images.pexels.com/photos/2253275/pexels-photo-2253275.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260').content))
image = preprocess(img).unsqueeze(0).to(device)

# Tokenize the candidate Japanese labels: dog, cat, elephant.
encodings = ja_clip.tokenize(
    texts=["犬", "猫", "象"],
    max_seq_len=77,
    device=device,
    tokenizer=tokenizer,  # optional; if omitted, the tokenizer is loaded on each call
)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(**encodings)

    # Scaled image-text similarities, softmaxed into label probabilities.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1.0, 0.0, 0.0]]
```
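
The final line converts scaled image-text similarity scores into a probability distribution over the candidate labels. As a minimal sketch of what that one-liner computes (not part of the original example; the explicit normalization is an assumption, included in case the returned features are not already unit-length):

```python
import torch

def clip_label_probs(image_features: torch.Tensor,
                     text_features: torch.Tensor,
                     scale: float = 100.0) -> torch.Tensor:
    """Cosine similarity between one image and N candidate texts, softmaxed."""
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    logits = scale * image_features @ text_features.T  # shape: (1, N)
    return logits.softmax(dim=-1)                      # probabilities over the N labels
```

Because of the softmax, the probabilities always sum to 1 across the labels; the `100.0` scale sharpens the distribution toward the best-matching text.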

# Model architecture
The model uses a ViT-B/16 Transformer architecture as its image encoder and a 12-layer BERT as its text encoder. The image encoder was initialized from the [AugReg `vit-base-patch16-224` model](https://github.com/google-research/vision_transformer).
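
As a quick sanity check of this two-encoder design, the sketch below encodes a placeholder image and caption and prints the shapes of the resulting features. It reuses the `ja_clip` API from the usage example above; the blank test image and the sample caption are assumptions for illustration, not part of this card:

```python
import torch
from PIL import Image
import japanese_clip as ja_clip

device = "cpu"
model, preprocess = ja_clip.load("rinna/japanese-clip-vit-b-16", cache_dir="/tmp/japanese_clip", device=device)
tokenizer = ja_clip.load_tokenizer()

# A blank 224x224 RGB image serves as placeholder input for the ViT-B/16 image encoder.
image = preprocess(Image.new("RGB", (224, 224))).unsqueeze(0).to(device)
# A short sample caption ("hello") for the 12-layer BERT text encoder.
encodings = ja_clip.tokenize(texts=["こんにちは"], max_seq_len=77, device=device, tokenizer=tokenizer)

with torch.no_grad():
    print(model.get_image_features(image).shape)       # (1, embedding_dim)
    print(model.get_text_features(**encodings).shape)  # (1, embedding_dim)
```

Both encoders project into the same joint embedding space, which is what makes the dot-product comparison in the usage example meaningful.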

# Training
The model was trained on [CC12M](https://github.com/google-research-datasets/conceptual-12m), with the captions translated into Japanese.
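
For context on how a CLIP model learns from such image-caption pairs, the sketch below shows the standard symmetric contrastive objective in plain PyTorch. It illustrates the general CLIP training objective, not rinna's actual training code, and the temperature value is an arbitrary assumption:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Matched image/caption pairs in a batch are positives; all other pairings are negatives."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(len(logits), device=logits.device)  # diagonal entries are the matches
    loss_i2t = F.cross_entropy(logits, targets)                # image -> matching caption
    loss_t2i = F.cross_entropy(logits.T, targets)              # caption -> matching image
    return (loss_i2t + loss_t2i) / 2
```

Each image is pulled toward its own translated caption and pushed away from every other caption in the batch, and vice versa.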

# License
[The Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)