
ComfyUI Depth Anything 3 Introduction

Depth Anything 3 (DA3), from ByteDance Seed, is a vision transformer that recovers spatially consistent geometry from arbitrary visual inputs, with or without known camera poses. A single plain DINO encoder and a unified depth-ray representation let one model family cover monocular depth, multi-view depth, camera pose estimation, and 3D reconstruction. Key capabilities:
  • Unified monocular & multi-view depth: estimate depth from a single image or multiple views
  • Camera pose estimation: recover camera positions from unordered image sets
  • 3D reconstruction from multi-view inputs
  • Video depth estimation: per-frame depth sequences for video inputs
  • Multiple model variants: Small, Base, Mono-Large, and Metric-Large
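To make the depth-ray idea concrete, the sketch below uses the standard pinhole camera model: each pixel (u, v) defines a ray from the camera centre, and scaling that ray by the pixel's depth gives a 3D point (point = origin + depth × direction). This is a generic illustration, not DA3's internal code; the intrinsics `fx, fy, cx, cy`, the optional 4×4 camera-to-world matrix, and the function name `depth_to_points` are all assumptions for illustration.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy, cam_to_world=None):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud.

    Hypothetical helper. Each pixel (u, v) gets a ray direction
    ((u - cx) / fx, (v - cy) / fy, 1) in camera coordinates; multiplying it by
    the pixel's depth gives the 3D point. If a 4x4 camera-to-world matrix is
    supplied, the points are rotated and then shifted by the camera origin.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids, each (H, W)
    dirs = np.stack(
        [(u - cx) / fx, (v - cy) / fy, np.ones_like(depth, dtype=float)], axis=-1
    )  # (H, W, 3) ray directions with unit z
    points = (dirs * depth[..., None]).reshape(-1, 3)  # camera-space points, row-major
    if cam_to_world is not None:
        rotation = cam_to_world[:3, :3]
        origin = cam_to_world[:3, 3]
        points = points @ rotation.T + origin  # origin + depth * rotated direction
    return points
```

Feeding in per-view depth maps and camera poses this way yields points in a shared world frame, which is the basic step behind fusing multiple views into one reconstruction.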
Make sure your ComfyUI is up to date. The workflows in this guide are available in the Workflow Templates; if you can't find them there, your ComfyUI may be outdated. (Updates to the Desktop version may lag behind.) If nodes are missing when loading a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup
ComfyUI now natively supports Depth Anything 3 nodes, so updating to the latest version of ComfyUI is all that is needed before starting.
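If you are unsure whether your installation has the nodes, a running ComfyUI server lists every registered node class at its `/object_info` endpoint. The sketch below assumes the default local address `http://127.0.0.1:8188` and assumes the class names match the node names used in this guide; `REQUIRED_NODES`, `missing_nodes`, and `fetch_object_info` are hypothetical names.

```python
import json
import urllib.request

# Assumption: these class names match the node names shown in this guide.
REQUIRED_NODES = (
    "LoadImage",
    "LoadVideo",
    "LoadDA3Model",
    "GetVideoComponents",
    "SetVideoComponents",
)


def missing_nodes(object_info, required=REQUIRED_NODES):
    """Return the required node class names absent from an /object_info response."""
    return [name for name in required if name not in object_info]


def fetch_object_info(server="http://127.0.0.1:8188"):
    """Fetch the node registry (a dict keyed by node class name) from ComfyUI."""
    with urllib.request.urlopen(f"{server}/object_info") as resp:
        return json.load(resp)
```

With the server running, `missing_nodes(fetch_object_info())` returns an empty list when everything is available; any names it returns point to an outdated install or a node that failed to import.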

Model Installation

Download the Depth Anything 3 checkpoint(s) and save them to the corresponding ComfyUI folder:
ComfyUI/
├── models/
│   ├── geometry_estimation/
│   │   ├── depth_anything_3_small.safetensors
│   │   ├── depth_anything_3_base.safetensors
│   │   ├── depth_anything_3_mono_large.safetensors
│   │   └── depth_anything_3_metric_large.safetensors
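A quick way to confirm the files landed in the right place is to check for them under `models/geometry_estimation`. The file names below are copied from the layout above; `check_checkpoints` is a hypothetical helper, and you only need the variants you actually plan to use.

```python
from pathlib import Path

# File names taken from the folder layout above.
EXPECTED_CHECKPOINTS = (
    "depth_anything_3_small.safetensors",
    "depth_anything_3_base.safetensors",
    "depth_anything_3_mono_large.safetensors",
    "depth_anything_3_metric_large.safetensors",
)


def check_checkpoints(comfy_root):
    """Map each expected DA3 checkpoint name to whether it exists on disk."""
    folder = Path(comfy_root) / "models" / "geometry_estimation"
    return {name: (folder / name).is_file() for name in EXPECTED_CHECKPOINTS}
```

For example, `check_checkpoints("ComfyUI")` run from the parent directory of your install returns a dict such as `{"depth_anything_3_small.safetensors": True, ...}`.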

Example Workflows


1. Image Depth Estimation

What it does: Upload one image and run Image Depth Estimation (Depth Anything 3) to produce a depth map. The result is shown in Depth Preview, with a side-by-side comparison view of the original image and depth output.
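How Depth Preview renders the map internally is not described here; as a generic sketch of what a depth visualisation and side-by-side view involve, the code below min-max normalises depth to 8-bit grayscale and concatenates it next to the original image. `depth_to_gray` and `side_by_side` are hypothetical names, and whether near or far appears bright is only a display convention (invert `scaled` to flip it).

```python
import numpy as np


def depth_to_gray(depth):
    """Min-max normalise an (H, W) depth map to an 8-bit, 3-channel grayscale image."""
    d = depth.astype(np.float64)
    lo, hi = d.min(), d.max()
    scaled = (d - lo) / (hi - lo) if hi > lo else np.zeros_like(d)  # avoid divide-by-zero
    gray = np.round(scaled * 255).astype(np.uint8)
    return np.repeat(gray[..., None], 3, axis=-1)  # (H, W, 3) so it matches RGB


def side_by_side(image, depth):
    """Place an (H, W, 3) uint8 image and its depth visualisation next to each other."""
    if image.shape[:2] != depth.shape:
        raise ValueError("image and depth map must have the same height and width")
    return np.concatenate([image, depth_to_gray(depth)], axis=1)  # widths add up
```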

Download Workflow

Download the JSON file, or search for “Depth Anything 3” in the Template Library

Download Sample Image

Get the example input image for this workflow
Figure: Image Depth Estimation output and side-by-side comparison

Steps to Run

  1. LoadImage — load your input image
  2. LoadDA3Model — select a Depth Anything 3 variant
  3. Run — click Queue or press Ctrl+Enter (Cmd+Enter on macOS)
  4. The workflow outputs a depth map and a side-by-side comparison
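Besides clicking Queue, the same workflow can be queued programmatically through ComfyUI's HTTP API by POSTing to `/prompt`. The sketch assumes the workflow was exported in API format (a dict of node id to `class_type` and `inputs`), that the server runs at the default `http://127.0.0.1:8188`, and that the image file is already in ComfyUI's input folder; `build_prompt_payload` and `queue_prompt` are hypothetical names.

```python
import copy
import json
import urllib.request
import uuid


def build_prompt_payload(workflow_api, image_name=None, client_id=None):
    """Prepare the JSON body for ComfyUI's POST /prompt endpoint.

    If image_name is given, every LoadImage node is pointed at that file.
    """
    workflow = copy.deepcopy(workflow_api)  # leave the caller's workflow untouched
    if image_name is not None:
        for node in workflow.values():
            if node.get("class_type") == "LoadImage":
                node["inputs"]["image"] = image_name
    return {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}


def queue_prompt(payload, server="http://127.0.0.1:8188"):
    """Submit the payload to a running ComfyUI server and return its JSON reply."""
    req = urllib.request.Request(
        f"{server}/prompt",
        data=json.dumps(payload).encode("utf-8"),  # providing data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Typical use is `queue_prompt(build_prompt_payload(workflow, image_name="example.png"))`, where `workflow` was loaded with `json.load` from the exported API-format file.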

Learn about Subgraph

This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.

2. Video Depth Estimation

What it does: Upload a video and run Video Depth Estimation (Depth Anything 3) to produce a per-frame depth sequence. Inside the subgraph, GetVideoComponents splits the input video into frames, LoadDA3Model loads the model, and SetVideoComponents reassembles the depth frames back into a video output.
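The split, per-frame, reassemble structure of this subgraph can be sketched in plain Python. Here `estimate_depth` stands in for the DA3 model, and the `Video` container and `video_depth` function are hypothetical. The guide does not say whether the nodes normalise each frame separately or the whole clip at once; this sketch uses one min/max across the clip, which keeps brightness from jumping between frames.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Video:
    frames: List[np.ndarray]  # each (H, W, 3) uint8
    fps: float


def video_depth(video: Video, estimate_depth: Callable[[np.ndarray], np.ndarray]) -> Video:
    """Split into frames, estimate depth per frame, reassemble at the same frame rate."""
    depths = np.stack([estimate_depth(frame) for frame in video.frames]).astype(np.float64)
    # One min/max for the whole clip so a given depth maps to the same gray level in every frame.
    lo, hi = depths.min(), depths.max()
    scaled = (depths - lo) / (hi - lo) if hi > lo else np.zeros_like(depths)
    gray = np.round(scaled * 255).astype(np.uint8)
    frames = [np.repeat(g[..., None], 3, axis=-1) for g in gray]  # back to 3 channels
    return Video(frames=frames, fps=video.fps)  # keep the source frame rate
```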

Download Workflow

Download the JSON file, or search for “Depth Anything 3” in the Template Library

Run on Comfy Cloud

Open in Comfy Cloud
Figure: Video Depth Estimation preview

Steps to Run

  1. LoadVideo — load your input video
  2. Select Model — choose among Small, Base, Mono-Large, and Metric-Large
  3. Run — click Queue or press Ctrl+Enter (Cmd+Enter on macOS)
  4. The workflow outputs a video with per-frame depth maps


Model Variants

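Variant      | head_type | has_sky | has_confidence | camera_decoder | Best for
-------------|-----------|---------|----------------|----------------|------------------------------------
Small        | dualdpt   | no      | yes            | yes            | Fast inference, mobile/edge
Base         | dualdpt   | no      | yes            | yes            | Balanced performance
Mono-Large   | dpt       | yes     | no             | no             | Monocular depth with sky detection
Metric-Large | dpt       | yes     | no             | no             | Physical metric depth in metres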
  • Small and Base use the dualdpt head type with confidence estimation and camera decoder support for multi-view applications.
  • Mono-Large and Metric-Large use the dpt head type with sky detection. Metric-Large outputs raw depth in metres.
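If you script variant selection, the capability table can be encoded as data and filtered by requirement. The flags below follow the two bullets above (Small and Base: confidence and camera decoder; Mono-Large and Metric-Large: sky detection), with absent features read as unsupported; `VariantInfo`, `DA3_VARIANTS`, and `candidates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantInfo:
    head_type: str
    has_sky: bool
    has_confidence: bool
    camera_decoder: bool


# Capabilities as summarised in the variant table and bullets above.
DA3_VARIANTS = {
    "Small": VariantInfo("dualdpt", has_sky=False, has_confidence=True, camera_decoder=True),
    "Base": VariantInfo("dualdpt", has_sky=False, has_confidence=True, camera_decoder=True),
    "Mono-Large": VariantInfo("dpt", has_sky=True, has_confidence=False, camera_decoder=False),
    "Metric-Large": VariantInfo("dpt", has_sky=True, has_confidence=False, camera_decoder=False),
}


def candidates(need_sky=False, need_confidence=False, need_camera=False):
    """Return the variant names (in table order) that satisfy every requested feature."""
    return [
        name
        for name, v in DA3_VARIANTS.items()
        if (v.has_sky or not need_sky)
        and (v.has_confidence or not need_confidence)
        and (v.camera_decoder or not need_camera)
    ]
```

For example, `candidates(need_camera=True)` returns `["Small", "Base"]`, which matches the multi-view use case described above.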

Community Resources