diff --git a/README.md b/README.md
index 6a2cdd1..e6ff437 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ Disclaimer: The team releasing DPT did not write a model card for this model so
 
 DPT uses the Vision Transformer (ViT) as backbone and adds a neck + head on top for monocular depth estimation.
 
-![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.png)
+![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)
 
 ## Intended uses & limitations