Depth Estimation

Depth estimation predicts how far each pixel in an image is from the camera, producing a dense depth map from a single 2D photo or a stereo pair. This is a core capability for autonomous driving, robotics, augmented reality, and 3D scene reconstruction — any application that needs to understand spatial layout from camera input.

Monocular depth estimation (from a single image) has improved dramatically with deep learning. Models like MiDaS, DPT, and Depth Anything learn relative depth ordering from large-scale training data, while ZoeDepth and Metric3D go further by predicting metric (absolute) depth values. These models pick up on visual cues that humans use intuitively: perspective convergence, object sizes, texture gradients, and occlusion patterns.

Stereo depth estimation instead uses two images taken from slightly different viewpoints and computes depth through triangulation, similar to how human binocular vision works. Models like RAFT-Stereo and CREStereo use deep networks to find dense pixel correspondences between the two views.
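The triangulation step in stereo depth estimation reduces to a simple similar-triangles relation once correspondences are known. The sketch below illustrates it with made-up camera numbers (the focal length, baseline, and disparity values are hypothetical, not from any particular rig):

```python
# Stereo triangulation sketch: depth from disparity.
# For a rectified stereo pair, a point seen at x_left in the left image and
# x_right in the right image has disparity d = x_left - x_right (in pixels).
# Similar triangles give the depth: Z = f * B / d, where f is the focal
# length in pixels and B is the baseline (distance between the cameras).

def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Convert a pixel disparity to metric depth (metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Hypothetical example: 700 px focal length, 12 cm baseline, 42 px disparity.
depth = disparity_to_depth(42.0, 700.0, 0.12)
print(f"{depth:.2f} m")  # 700 * 0.12 / 42 = 2.00 m
```

Note how depth is inversely proportional to disparity: nearby objects shift a lot between the two views, distant ones barely at all, which is why stereo accuracy degrades with range.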

Depth maps are used as input to other vision tasks: 3D object detection from monocular cameras, point cloud generation for robotic grasping, AR object placement, and obstacle avoidance for drones and mobile robots.
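To make the point-cloud use case concrete, here is a minimal sketch of back-projecting a depth map into 3D with a pinhole camera model. The intrinsics and the tiny depth map are invented for illustration; a real pipeline would read them from camera calibration:

```python
# Back-project a depth map into a 3D point cloud (pinhole camera model).
# A pixel (u, v) with depth Z maps to camera-frame coordinates:
#   X = (u - cx) * Z / fx,   Y = (v - cy) * Z / fy
# where (fx, fy) are focal lengths in pixels and (cx, cy) is the principal point.

def depth_map_to_points(depth, fx, fy, cx, cy):
    """depth: 2D list of metric depths; returns a list of (X, Y, Z) points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip invalid / missing depth readings
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# Hypothetical 2x2 depth map with one invalid pixel, made-up intrinsics.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
points = depth_map_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(len(points))  # 3 valid 3D points
```

The same back-projection underlies monocular 3D object detection and AR placement: once each pixel has a depth value, the scene can be reasoned about in metric 3D space rather than image coordinates.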

Get Started Now

Get Started using Datature’s platform now for free.