Depth Estimation

Depth estimation predicts how far each pixel in an image is from the camera, producing a dense depth map from a single 2D photo or a stereo pair. This is a core capability for autonomous driving, robotics, augmented reality, and 3D scene reconstruction — any application that needs to understand spatial layout from camera input.

Monocular depth estimation (from a single image) has improved dramatically with deep learning. Models like MiDaS, DPT, and Depth Anything learn relative depth ordering from large-scale training data, while ZoeDepth and Metric3D go further by predicting metric (absolute) depth values. These models pick up on visual cues that humans use intuitively: perspective convergence, object sizes, texture gradients, and occlusion patterns.

Stereo depth estimation instead uses two images taken from slightly different viewpoints and computes depth through triangulation, similar to how human binocular vision works. Models like RAFT-Stereo and CREStereo use deep networks to find dense pixel correspondences between the two views.
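The triangulation step in stereo depth estimation reduces to a simple similar-triangles relation once correspondences are known. The sketch below illustrates it with made-up camera numbers (the focal length, baseline, and disparity values are hypothetical, not from any particular rig):

```python
# Stereo triangulation sketch: depth from disparity.
# For a rectified stereo pair, a point seen at x_left in the left image and
# x_right in the right image has disparity d = x_left - x_right (in pixels).
# Similar triangles give the depth: Z = f * B / d, where f is the focal
# length in pixels and B is the baseline (distance between the cameras).

def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Convert a pixel disparity to metric depth (metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Hypothetical example: 700 px focal length, 12 cm baseline, 42 px disparity.
depth = disparity_to_depth(42.0, 700.0, 0.12)
print(f"{depth:.2f} m")  # 700 * 0.12 / 42 = 2.00 m
```

Note how depth is inversely proportional to disparity: nearby objects shift a lot between the two views, distant ones barely at all, which is why stereo accuracy degrades with range.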

Depth maps are used as input to other vision tasks: 3D object detection from monocular cameras, point cloud generation for robotic grasping, AR object placement, and obstacle avoidance for drones and mobile robots.
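To make the point-cloud use case concrete, here is a minimal sketch of back-projecting a depth map into 3D with a pinhole camera model. The intrinsics and the tiny depth map are invented for illustration; a real pipeline would read them from camera calibration:

```python
# Back-project a depth map into a 3D point cloud (pinhole camera model).
# A pixel (u, v) with depth Z maps to camera-frame coordinates:
#   X = (u - cx) * Z / fx,   Y = (v - cy) * Z / fy
# where (fx, fy) are focal lengths in pixels and (cx, cy) is the principal point.

def depth_map_to_points(depth, fx, fy, cx, cy):
    """depth: 2D list of metric depths; returns a list of (X, Y, Z) points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip invalid / missing depth readings
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# Hypothetical 2x2 depth map with one invalid pixel, made-up intrinsics.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
points = depth_map_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(len(points))  # 3 valid 3D points
```

The same back-projection underlies monocular 3D object detection and AR placement: once each pixel has a depth value, the scene can be reasoned about in metric 3D space rather than image coordinates.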

Get Started Now

Get Started using Datature’s platform now for free.