Compare commits


No commits in common. "56fa1691779eaa22d603ca6ffa463f9adc05ac5f" and "2786583d5e4f113e69dd8fc7873537f38e681f04" have entirely different histories.

5 changed files with 83 additions and 202 deletions

README.md

@ -1,159 +0,0 @@
---
language: en
license: mit
tags:
- vision
- image-to-text
- image-captioning
- visual-question-answering
pipeline_tag: image-to-text
inference: false
---
# BLIP-2, Flan T5-xl, pre-trained only
BLIP-2 model, leveraging [Flan T5-xl](https://huggingface.co/google/flan-t5-xl) (a large language model).
It was introduced in the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Li et al. and first released in [this repository](https://github.com/salesforce/LAVIS/tree/main/projects/blip2).
Disclaimer: The team releasing BLIP-2 did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Model description
BLIP-2 consists of 3 models: a CLIP-like image encoder, a Querying Transformer (Q-Former) and a large language model.
The authors initialize the weights of the image encoder and large language model from pre-trained checkpoints and keep them frozen
while training the Querying Transformer, a BERT-like Transformer encoder that maps a set of "query tokens" to query embeddings,
which bridge the gap between the embedding space of the image encoder and that of the large language model.
The goal of the model is simply to predict the next text token, given the query embeddings and the previous text.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/blip2_architecture.jpg"
alt="drawing" width="600"/>
This design allows the model to be used for tasks like:
- image captioning
- visual question answering (VQA)
- chat-like conversations, by feeding the image and the previous conversation to the model as a prompt (a sketch of this is included after the usage examples below)
## Direct Use and Downstream Use
You can use the raw model for conditional text generation given an image and optional text. See the [model hub](https://huggingface.co/models?search=Salesforce/blip) to look for fine-tuned versions for a task that interests you.
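For example, leaving out the text prompt turns the task into plain image captioning. A minimal sketch on CPU, analogous to the examples under "How to use" below:
```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# No question is passed: the model generates a caption for the image.
inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```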
## Bias, Risks, Limitations, and Ethical Considerations
BLIP2-FlanT5 uses off-the-shelf Flan-T5 as the language model. It inherits the same risks and limitations from [Flan-T5](https://arxiv.org/pdf/2210.11416.pdf):
> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.
BLIP2 is fine-tuned on image-text datasets (e.g. [LAION](https://laion.ai/blog/laion-400-open-dataset/)) collected from the internet. As a result, the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.
BLIP2 has not been tested in real-world applications. It should not be directly deployed in any application. Researchers should first carefully assess the safety and fairness of the model in relation to the specific context in which it will be deployed.
### How to use
For code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example).
#### Running the model on CPU
<details>
<summary> Click to expand </summary>
```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>
#### Running the model on GPU
##### In full precision
<details>
<summary> Click to expand </summary>
```python
# pip install accelerate
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl", device_map="auto")
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt").to("cuda")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>
##### In half precision (`float16`)
<details>
<summary> Click to expand </summary>
```python
# pip install accelerate
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl", torch_dtype=torch.float16, device_map="auto")
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>
##### In 8-bit precision (`int8`)
<details>
<summary> Click to expand </summary>
```python
# pip install accelerate bitsandbytes
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl", load_in_8bit=True, device_map="auto")
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>
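The task list above also mentions chat-like conversations. The examples here only show single questions; the sketch below illustrates one way previous turns could be folded into the text prompt. The "Question: ... Answer: ..." template and the example context turn are illustrative assumptions, not an official API.
```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Previous turns of the conversation (question, answer) -- made up for the example.
history = [("how many dogs are in the picture?", "1")]
question = "what is the dog doing?"

prompt = " ".join(f"Question: {q} Answer: {a}" for q, a in history)
prompt = f"{prompt} Question: {question} Answer:"

inputs = processor(raw_image, prompt, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```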


@ -94,7 +94,7 @@
 ],
 "bad_words_ids": null,
 "begin_suppress_tokens": null,
-"bos_token_id": 1,
+"bos_token_id": null,
 "chunk_size_feed_forward": 0,
 "cross_attention_hidden_size": null,
 "d_ff": 5120,


@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:78da3eebf4732877fd22b73dc2ed57dae30c466d5a44b7c329de376fa4c79a88
-size 9441403325
+oid sha256:537bb68668d2cc0e84297ee5c23a77e5a86fc28f528afc642155bf5b02dc7e7f
+size 9441197415
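The remaining hunks below update what appears to be the sharded checkpoint index (`pytorch_model.bin.index.json`): the fused `qkv.bias` entries of the vision tower are replaced by separate `q_bias` and `v_bias` entries. As a hedged sketch (assuming `huggingface_hub` is installed), one could check which layout a given revision of the repository uses like this:
```python
import json

from huggingface_hub import hf_hub_download

# Download only the small index file; it can optionally be pinned to one of the
# two commits being compared via the `revision` argument.
index_path = hf_hub_download(
    repo_id="Salesforce/blip2-flan-t5-xl",
    filename="pytorch_model.bin.index.json",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Keys like ...self_attn.q_bias / ...self_attn.v_bias indicate the split-bias
# layout shown below; ...self_attn.qkv.bias would indicate the older fused layout.
print(sorted(k for k in weight_map if "layers.0.self_attn" in k))
```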


@ -1,6 +1,6 @@
{ {
"metadata": { "metadata": {
"total_size": 16296171520 "total_size": 16295951872
}, },
"weight_map": { "weight_map": {
"language_model.decoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin", "language_model.decoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
@ -834,8 +834,9 @@
"vision_model.encoder.layers.0.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.0.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.0.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.0.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.0.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.0.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.0.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.0.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.0.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.0.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.0.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -846,8 +847,9 @@
"vision_model.encoder.layers.1.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.1.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.1.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -858,8 +860,9 @@
"vision_model.encoder.layers.10.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.10.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.10.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -870,8 +873,9 @@
"vision_model.encoder.layers.11.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.11.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.11.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -882,8 +886,9 @@
"vision_model.encoder.layers.12.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.12.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.12.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -894,8 +899,9 @@
"vision_model.encoder.layers.13.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.13.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.13.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -906,8 +912,9 @@
"vision_model.encoder.layers.14.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.14.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.14.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -918,8 +925,9 @@
"vision_model.encoder.layers.15.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.15.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.15.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -930,8 +938,9 @@
"vision_model.encoder.layers.16.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.16.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.16.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -942,8 +951,9 @@
"vision_model.encoder.layers.17.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.17.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.17.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -954,8 +964,9 @@
"vision_model.encoder.layers.18.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.18.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.18.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -966,8 +977,9 @@
"vision_model.encoder.layers.19.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.19.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.19.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -978,8 +990,9 @@
"vision_model.encoder.layers.2.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.2.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.2.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -990,8 +1003,9 @@
"vision_model.encoder.layers.20.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.20.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.20.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1002,8 +1016,9 @@
"vision_model.encoder.layers.21.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.21.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.21.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1014,8 +1029,9 @@
"vision_model.encoder.layers.22.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.22.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.22.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1026,8 +1042,9 @@
"vision_model.encoder.layers.23.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.23.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.23.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1038,8 +1055,9 @@
"vision_model.encoder.layers.24.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.24.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.24.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1050,8 +1068,9 @@
"vision_model.encoder.layers.25.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.25.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.25.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1062,8 +1081,9 @@
"vision_model.encoder.layers.26.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.26.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.26.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1074,8 +1094,9 @@
"vision_model.encoder.layers.27.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.27.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.27.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1086,8 +1107,9 @@
"vision_model.encoder.layers.28.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.28.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.28.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1098,8 +1120,9 @@
"vision_model.encoder.layers.29.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.29.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.29.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1110,8 +1133,9 @@
"vision_model.encoder.layers.3.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.3.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.3.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1122,8 +1146,9 @@
"vision_model.encoder.layers.30.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.30.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.30.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1134,8 +1159,9 @@
"vision_model.encoder.layers.31.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.31.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.31.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1146,8 +1172,9 @@
"vision_model.encoder.layers.32.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.32.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.32.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1158,8 +1185,9 @@
"vision_model.encoder.layers.33.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.33.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.33.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1170,8 +1198,9 @@
"vision_model.encoder.layers.34.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.34.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.34.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1182,8 +1211,9 @@
"vision_model.encoder.layers.35.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.35.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.35.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1194,8 +1224,9 @@
"vision_model.encoder.layers.36.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.36.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.36.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1206,8 +1237,9 @@
"vision_model.encoder.layers.37.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.37.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.37.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1218,8 +1250,9 @@
"vision_model.encoder.layers.38.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.38.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.38.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1230,8 +1263,9 @@
"vision_model.encoder.layers.4.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.4.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.4.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1242,8 +1276,9 @@
"vision_model.encoder.layers.5.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.5.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.5.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1254,8 +1289,9 @@
"vision_model.encoder.layers.6.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.6.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.6.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1266,8 +1302,9 @@
"vision_model.encoder.layers.7.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.7.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.7.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1278,8 +1315,9 @@
"vision_model.encoder.layers.8.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.8.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.8.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.layer_norm1.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.layer_norm1.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.layer_norm1.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.layer_norm1.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.layer_norm2.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.layer_norm2.bias": "pytorch_model-00001-of-00002.bin",
@ -1290,8 +1328,9 @@
"vision_model.encoder.layers.9.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.mlp.fc2.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.self_attn.projection.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.self_attn.projection.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.self_attn.qkv.bias": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.self_attn.q_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin", "vision_model.encoder.layers.9.self_attn.qkv.weight": "pytorch_model-00001-of-00002.bin",
"vision_model.encoder.layers.9.self_attn.v_bias": "pytorch_model-00001-of-00002.bin",
"vision_model.post_layernorm.bias": "pytorch_model-00001-of-00002.bin", "vision_model.post_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"vision_model.post_layernorm.weight": "pytorch_model-00001-of-00002.bin" "vision_model.post_layernorm.weight": "pytorch_model-00001-of-00002.bin"
} }
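Across this series of hunks, every vision encoder layer's fused `self_attn.qkv.bias` entry is renamed to `self_attn.q_bias` and a separate `self_attn.v_bias` entry is added, which appears to reflect an EVA-ViT-style attention parametrization (a fused qkv projection with distinct query and value biases and no key bias). The file being diffed is a standard sharded-checkpoint index whose `weight_map` ties each parameter name to a shard file. Below is a minimal sketch of inspecting such an index; it is not part of the original repository, and the local path and lookup key are assumptions for illustration.

```python
import json
from collections import Counter

# Assumed local copy of the sharded checkpoint index for this model.
with open("pytorch_model.bin.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]  # parameter name -> shard filename

# Hypothetical lookup: which shard holds one of the renamed bias parameters?
key = "vision_model.encoder.layers.36.self_attn.q_bias"
print(key, "->", weight_map.get(key, "<not in this index>"))

# Count how many parameters each shard file carries.
print(Counter(weight_map.values()))
```

Because `from_pretrained` resolves each parameter through this map, renaming entries here has to stay in sync with the parameter names the modeling code expects.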

View File

@@ -104,6 +104,7 @@
  "eos_token": "</s>",
  "extra_ids": 100,
  "model_max_length": 512,
+ "name_or_path": "google/flan-t5-xl",
  "pad_token": "<pad>",
  "processor_class": "Blip2Processor",
  "sp_model_kwargs": {},