Dense Prediction Transformer (DPT) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. and first released in [this repository](https://github.com/isl-org/DPT). This Hugging Face repository hosts the "hybrid" variant of the model (DPT-Hybrid) described in the paper.
Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.
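DPT-Hybrid differs from the pure-ViT DPT variants by using [ViT-hybrid](https://huggingface.co/google/vit-hybrid-base-bit-384) as its backbone, which runs a convolutional (BiT/ResNet-style) feature extractor before the transformer layers, and by feeding some of the backbone's intermediate activations into the decoder.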
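To make the hybrid backbone a little more concrete, here is a minimal sketch that builds a default DPT-Hybrid configuration locally (no weights are downloaded) and prints which backbone it wraps. It assumes a `transformers` release whose `DPTConfig` accepts the `is_hybrid` flag; the printed values are the library defaults and may not exactly match the `Intel/dpt-hybrid-midas` checkpoint.

```python
# Minimal sketch: inspect a default DPT-Hybrid configuration offline.
# Assumes a transformers version whose DPTConfig supports `is_hybrid`.
from transformers import DPTConfig

# With is_hybrid=True and no explicit backbone_config, transformers fills in
# a default BiT (ResNet-style) configuration for the convolutional stem.
config = DPTConfig(is_hybrid=True)

print("hybrid:", config.is_hybrid)
print("backbone model type:", config.backbone_config.model_type)
# Stages of the convolutional backbone exposed as feature maps.
print("backbone out_features:", config.backbone_config.out_features)
# Indices of transformer layers referenced when collecting activations.
print("backbone_out_indices:", config.backbone_out_indices)
```

To see the values actually used by this checkpoint, `DPTConfig.from_pretrained("Intel/dpt-hybrid-midas")` loads its stored configuration instead, at the cost of requiring network access.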
## Intended uses & limitations
You can use the raw model for zero-shot monocular depth estimation. See the [model hub](https://huggingface.co/models?search=dpt) to look for
fine-tuned versions for a task that interests you.
### How to use
Here is how to use this model for zero-shot depth estimation on an image:
```python
from PIL import Image
import numpy as np
import requests
import torch
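from transformers import DPTForDepthEstimation, DPTFeatureExtractor

# Load the pretrained DPT-Hybrid weights. low_cpu_mem_usage=True limits peak CPU
# memory while loading (recent transformers versions require `accelerate` for it).
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas", low_cpu_mem_usage=True)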