diff --git a/README.md b/README.md
index f09ce75..7421c88 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ The model card has been written in combination by the Hugging Face team and Inte
 | Version | 1 |
 | Type | Computer Vision - Monocular Depth Estimation |
 | Paper or Other Resources | [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) and [GitHub Repo](https://github.com/isl-org/DPT) |
-| License | [MIT](https://github.com/isl-org/DPT/blob/main/LICENSE) |
+| License | Apache 2.0 |
 | Questions or Comments | [Community Tab](https://huggingface.co/Intel/dpt-large/discussions) and [Intel Developers Discord](https://discord.gg/rv2Gp55UJQ)|
 
 | Intended Use | Description |
@@ -53,6 +53,48 @@ The model card has been written in combination by the Hugging Face team and Inte
 | Primary intended users | Anyone doing monocular depth estimation |
 | Out-of-scope uses | This model in most cases will need to be fine-tuned for your particular task. |
 
+
+### How to use
+
+Here is how to use this model for zero-shot depth estimation on an image:
+
+```python
+from transformers import DPTFeatureExtractor, DPTForDepthEstimation
+import torch
+import numpy as np
+from PIL import Image
+import requests
+
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+
+feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
+model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")
+
+# prepare image for the model
+inputs = feature_extractor(images=image, return_tensors="pt")
+
+with torch.no_grad():
+    outputs = model(**inputs)
+    predicted_depth = outputs.predicted_depth
+
+# interpolate to original size
+prediction = torch.nn.functional.interpolate(
+    predicted_depth.unsqueeze(1),
+    size=image.size[::-1],
+    mode="bicubic",
+    align_corners=False,
+)
+
+# visualize the prediction
+output = prediction.squeeze().cpu().numpy()
+formatted = (output * 255 / np.max(output)).astype("uint8")
+depth = Image.fromarray(formatted)
+```
+
+For more code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/master/en/model_doc/dpt).
+
+
 | Factors | Description |
 | ----------- | ----------- |
 | Groups | Multiple datasets compiled together |
@@ -102,45 +144,6 @@ protocol defined in [30]. Relative performance is computed with respect to the o
 | There are no additional caveats or recommendations for this model. |
 
-### How to use
-
-Here is how to use this model for zero-shot depth estimation on an image:
-
-```python
-from transformers import DPTFeatureExtractor, DPTForDepthEstimation
-import torch
-import numpy as np
-from PIL import Image
-import requests
-
-url = "http://images.cocodataset.org/val2017/000000039769.jpg"
-image = Image.open(requests.get(url, stream=True).raw)
-
-feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
-model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")
-
-# prepare image for the model
-inputs = feature_extractor(images=image, return_tensors="pt")
-
-with torch.no_grad():
-    outputs = model(**inputs)
-    predicted_depth = outputs.predicted_depth
-
-# interpolate to original size
-prediction = torch.nn.functional.interpolate(
-    predicted_depth.unsqueeze(1),
-    size=image.size[::-1],
-    mode="bicubic",
-    align_corners=False,
-)
-
-# visualize the prediction
-output = prediction.squeeze().cpu().numpy()
-formatted = (output * 255 / np.max(output)).astype("uint8")
-depth = Image.fromarray(formatted)
-```
-
-For more code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/master/en/model_doc/dpt).
 
 ### BibTeX entry and citation info
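As a quick way to exercise the snippet this change moves, the same checkpoint can also be driven through the high-level `pipeline` API in `transformers`, which wraps the preprocessing, inference, and resizing steps shown above. This is a minimal sketch rather than part of the model card; the output filename is an arbitrary choice:

```python
from transformers import pipeline

# Sketch: the depth-estimation pipeline handles preprocessing, inference,
# and interpolation of the predicted depth back to the input image size.
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator("http://images.cocodataset.org/val2017/000000039769.jpg")

# result["depth"] is a PIL image; "depth.png" is just an example path
result["depth"].save("depth.png")
```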