Depth Anything V2

Lihe Yang¹ · Bingyi Kang²† · Zilong Huang²
Zhen Zhao · Xiaogang Xu · Jiashi Feng² · Hengshuang Zhao¹*

¹HKU ²TikTok
†project lead *corresponding author

This work presents Depth Anything V2. It significantly outperforms V1 in fine-grained details and robustness. Compared with Stable Diffusion (SD) based models, it offers faster inference, fewer parameters, and higher depth accuracy.

News

2024-06-22: We release smaller metric depth models based on Depth-Anything-V2-Small and Base.
2024-06-20: Our repository and project page were flagged by GitHub and removed from public view for 6 days. Sorry for the inconvenience.
2024-06-14: Paper, project page, code, models, demo, and benchmark are all released.

Pre-trained Models

We provide four models of varying scales for robust relative depth estimation:

Model	Params	Checkpoint
Depth-Anything-V2-Small	24.8M	Download
Depth-Anything-V2-Base	97.5M	Download
Depth-Anything-V2-Large	335.3M	Download
Depth-Anything-V2-Giant	1.3B	Coming soon

Usage

Preparation

git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt

Download the checkpoints listed in the Pre-trained Models table above and put them under the checkpoints directory.
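
As a quick sanity check before loading a model, the sketch below lists which checkpoint files are present. It assumes the file naming pattern checkpoints/depth_anything_v2_<encoder>.pth that the loading example in this README uses; the helper name find_checkpoints is hypothetical and not part of this repository.

```python
from pathlib import Path

# Encoder keys as used elsewhere in this README.
ENCODERS = ("vits", "vitb", "vitl", "vitg")


def find_checkpoints(checkpoint_dir="checkpoints"):
    """Return {encoder: path} for every checkpoint file that exists.

    Hypothetical helper; assumes files are named
    depth_anything_v2_<encoder>.pth inside `checkpoint_dir`.
    """
    found = {}
    for encoder in ENCODERS:
        path = Path(checkpoint_dir) / f"depth_anything_v2_{encoder}.pth"
        if path.is_file():
            found[encoder] = path
    return found


if __name__ == "__main__":
    available = find_checkpoints()
    print("available encoders:", sorted(available) or "none")
```

An empty result most likely means the files were saved elsewhere or renamed after download.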

Use our models

import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
}

encoder = 'vitl' # or 'vits', 'vitb', 'vitg'

model = DepthAnythingV2(**model_configs[encoder])
model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))
model.eval()

raw_img = cv2.imread('your/image/path')  # returns None if the file cannot be read
depth = model.infer_image(raw_img)  # HxW raw depth map as a NumPy array
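
The raw depth map is a floating-point array, so it usually has to be rescaled before it can be saved as an ordinary image. Below is a minimal sketch of min-max scaling to 8 bits, assuming only NumPy. The function name depth_to_uint8 is hypothetical, and a small synthetic array stands in for the model output.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a 2-D depth map to the 0..255 range (uint8)."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Constant map: avoid division by zero and return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min) * 255.0
    return scaled.astype(np.uint8)


# Synthetic stand-in for `depth = model.infer_image(raw_img)`.
fake_depth = np.array([[0.0, 2.5], [5.0, 10.0]], dtype=np.float32)
gray = depth_to_uint8(fake_depth)
```

The resulting array can be written with cv2.imwrite, or passed through cv2.applyColorMap first for a colored visualization.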

Running script on images

python run.py \
  --encoder <vits | vitb | vitl | vitg> \
  --img-path <path> --outdir <outdir> \
  [--input-size <size>] [--pred-only] [--grayscale]

Options:

--img-path: Accepts 1) a directory containing all images of interest, 2) a single image, or 3) a text file listing all image paths.
--input-size (optional): By default, we use an input size of 518 for model inference. You can increase the size for even more fine-grained results.
--pred-only (optional): Only save the predicted depth map, without the raw image.
--grayscale (optional): Save the grayscale depth map, without applying a color palette.
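
The three accepted forms of --img-path can be sketched as a small dispatch function. This is an illustration, not the code in run.py: the helper name resolve_image_paths, the extension filter, and the one-path-per-line layout of the text file are all assumptions.

```python
import os

# Assumed set of image extensions, for illustration only.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def resolve_image_paths(img_path):
    """Expand an --img-path style argument into a list of image paths."""
    if os.path.isdir(img_path):
        # Case 1: a directory; collect image files recursively.
        paths = []
        for root, _dirs, files in os.walk(img_path):
            for name in files:
                if name.lower().endswith(IMAGE_EXTENSIONS):
                    paths.append(os.path.join(root, name))
        return sorted(paths)
    if img_path.lower().endswith(".txt"):
        # Case 3: a text file, assumed to hold one image path per line.
        with open(img_path, "r", encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    # Case 2: a single image file.
    return [img_path]
```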

For example:

python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
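
The default input size of 518 is a multiple of 14 (518 = 37 × 14), which is the patch size of the DINOv2 ViT encoders. The sketch below assumes that a custom --input-size is best chosen as a multiple of the patch size as well. The helper snap_to_patch_multiple is hypothetical, and the script's own resizing logic may differ.

```python
PATCH_SIZE = 14  # DINOv2 ViT patch size


def snap_to_patch_multiple(size, patch=PATCH_SIZE):
    """Round `size` to the nearest positive multiple of `patch` (ties round up)."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    snapped = (size + patch // 2) // patch * patch
    return max(patch, snapped)


for requested in (518, 700, 1000):
    print(requested, "->", snap_to_patch_multiple(requested))
```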

Running script on videos

python run_video.py \
  --encoder <vits | vitb | vitl | vitg> \
  --video-path assets/examples_video --outdir video_depth_vis \
  [--input-size <size>] [--pred-only] [--grayscale]

Our larger model has better temporal consistency on videos.

Gradio demo

To use our gradio demo locally:

python app.py

You can also try our online demo.

Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this issue). In V1, we unintentionally used features from the last four layers of DINOv2 for decoding. In V2, we use intermediate features instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
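
To illustrate the difference described in the note, the sketch below compares two ways of picking four feature taps from a 24-block encoder such as ViT-L. Even spacing is an assumption made here for illustration; the exact layer indices used in the repository may differ.

```python
def last_n_layers(depth, n=4):
    """Indices of the final `n` transformer blocks (the V1-style choice)."""
    return list(range(depth - n, depth))


def evenly_spaced_layers(depth, n=4):
    """Indices of `n` blocks spread evenly through the encoder, ending at the last block."""
    step = depth // n
    return [step * (i + 1) - 1 for i in range(n)]


print(last_n_layers(24))         # [20, 21, 22, 23]
print(evenly_spaced_layers(24))  # [5, 11, 17, 23]
```

With the last-four choice, all taps come from the deepest blocks; with intermediate taps, the decoder also receives features from earlier in the encoder.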

Fine-tuned to Metric Depth Estimation

Please refer to the metric depth estimation instructions in the metric_depth directory.

DA-2K Evaluation Benchmark

Please refer to the DA-2K benchmark description in DA-2K.md.

Community Support

We sincerely appreciate all the community support for our Depth Anything series. Thank you very much!

TensorRT: https://github.com/spacewalk01/depth-anything-tensorrt
ComfyUI: https://github.com/kijai/ComfyUI-DepthAnythingV2
Transformers.js (real-time depth in web): https://huggingface.co/spaces/Xenova/webgpu-realtime-depth-estimation
Android:
- https://github.com/shubham0204/Depth-Anything-Android
- https://github.com/FeiGeChuanShu/ncnn-android-depth_anything

LICENSE

The Depth-Anything-V2-Small model is released under the Apache-2.0 license. The Depth-Anything-V2-Base/Large/Giant models are released under the CC-BY-NC-4.0 license.

Citation

If you find this project useful, please consider citing:

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}
