diff --git a/README.md b/README.md
index 6cbcfd5..4a180ff 100644
--- a/README.md
+++ b/README.md
@@ -43,4 +43,107 @@ fine-tuned versions on a task that interests you.
 
 ### How to use
 
-For code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example).
\ No newline at end of file
+For code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example), or use the snippets below, depending on your use case:
+
+#### Running the model on CPU
+
+<details>
+<summary> Click to expand </summary>
+
+```python
+import requests
+from PIL import Image
+from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+# load the processor and the model; with no device or dtype argument,
+# the weights stay in float32 on the CPU
+processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+
+img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+# preprocess the image together with the question
+question = "how many dogs are in the picture?"
+inputs = processor(raw_image, question, return_tensors="pt")
+
+out = model.generate(**inputs)
+print(processor.decode(out[0], skip_special_tokens=True))
+```
+</details>
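+
+The same checkpoint can also caption an image when no question is supplied. A minimal sketch, reusing the `processor`, `model`, and `raw_image` objects loaded above:
+
+```python
+# unconditional image captioning: encode only the image, with no text prompt
+inputs = processor(raw_image, return_tensors="pt")
+
+out = model.generate(**inputs)
+print(processor.decode(out[0], skip_special_tokens=True))
+```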
+ +#### Running the model on GPU + +##### In full precision + +
+<summary> Click to expand </summary>
+
+```python
+# pip install accelerate
+import requests
+from PIL import Image
+from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+# device_map="auto" places the weights on the available GPU(s) (requires accelerate)
+model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", device_map="auto")
+
+img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+question = "how many dogs are in the picture?"
+# move the preprocessed inputs to the GPU as well
+inputs = processor(raw_image, question, return_tensors="pt").to("cuda")
+
+out = model.generate(**inputs)
+print(processor.decode(out[0], skip_special_tokens=True))
+```
+</details>
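+
+`model.generate` also accepts the usual Transformers generation keyword arguments, which can help when the default greedy, short output is not enough. A small sketch, reusing the objects loaded above; the parameter values are illustrative assumptions, not tuned recommendations:
+
+```python
+# allow a longer answer and use beam search instead of greedy decoding
+out = model.generate(**inputs, max_new_tokens=30, num_beams=5)
+print(processor.decode(out[0], skip_special_tokens=True))
+```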
+ +##### In half precision (`float16`) + +
+<summary> Click to expand </summary>
+
+```python
+# pip install accelerate
+import torch
+import requests
+from PIL import Image
+from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+# load the weights in float16 to roughly halve the GPU memory footprint
+model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", torch_dtype=torch.float16, device_map="auto")
+
+img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+question = "how many dogs are in the picture?"
+# cast the pixel values to float16 so they match the model weights
+inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+out = model.generate(**inputs)
+print(processor.decode(out[0], skip_special_tokens=True))
+```
+</details>
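+
+Loading in `float16` roughly halves the GPU memory needed for the weights compared to full precision. To check the actual footprint on your hardware, a quick sketch (exact numbers depend on your GPU and library versions):
+
+```python
+# peak GPU memory allocated by PyTorch so far, in GiB
+print(f"{torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")
+```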
+ +##### In 8-bit precision (`int8`) + +
+<summary> Click to expand </summary>
+
+```python
+# pip install accelerate bitsandbytes
+import torch
+import requests
+from PIL import Image
+from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+# quantize the weights to 8-bit with bitsandbytes to cut the memory footprint further
+model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", load_in_8bit=True, device_map="auto")
+
+img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+question = "how many dogs are in the picture?"
+# the inputs stay in float16; only the model weights are quantized
+inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+out = model.generate(**inputs)
+print(processor.decode(out[0], skip_special_tokens=True))
+```
+</details>
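+
+##### In 4-bit precision
+
+If your installed `transformers` and `bitsandbytes` versions expose the `load_in_4bit` flag (a version-dependent assumption, not available in every setup), the memory footprint can be reduced further. A sketch along the same lines as the 8-bit snippet above:
+
+<details>
+<summary> Click to expand </summary>
+
+```python
+# pip install accelerate bitsandbytes
+import torch
+import requests
+from PIL import Image
+from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+# quantize the weights to 4-bit; roughly halves the footprint again compared to int8
+model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", load_in_4bit=True, device_map="auto")
+
+img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+question = "how many dogs are in the picture?"
+inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+out = model.generate(**inputs)
+print(processor.decode(out[0], skip_special_tokens=True))
+```
+</details>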
\ No newline at end of file