The Segment Anything Model (SAM) is a foundation model for image segmentation developed by Meta AI, trained on the SA-1B dataset of 11 million images and over 1 billion masks. Released in April 2023, SAM introduced promptable segmentation: given an image and a prompt (a point click, a bounding box, a rough mask, or, in the paper's exploratory experiments, a text description), it produces a high-quality segmentation mask in real time, zero-shot, without task-specific training.
The architecture has three parts. A heavyweight image encoder (a ViT) processes the image once to produce an image embedding. A lightweight prompt encoder converts the user's click, box, or coarse mask into prompt embeddings. A fast, transformer-based mask decoder (about 50 ms per prompt) combines the two to predict the final mask. Because the expensive image encoding runs only once per image, users can query multiple objects interactively with near-instant response.
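The encode-once, query-many design can be illustrated with a deliberately simplified sketch. These are toy stand-in functions, not Meta's implementation: the "encoder" just normalizes pixels and the "decoder" thresholds feature similarity to the clicked pixel, purely to show where the expensive and cheap steps sit.

```python
import numpy as np

def image_encoder(image: np.ndarray) -> np.ndarray:
    """Stand-in for the heavyweight ViT image encoder: one expensive pass.
    Here it merely normalizes pixels to [0, 1]."""
    return image.astype(np.float32) / 255.0

def prompt_encoder(point: tuple) -> tuple:
    """Stand-in for the lightweight prompt encoder (a single point click)."""
    return point

def mask_decoder(features: np.ndarray, point: tuple) -> np.ndarray:
    """Stand-in for the fast mask decoder: selects pixels whose feature
    value is close to that of the clicked pixel."""
    y, x = point
    seed = features[y, x]
    return (np.abs(features - seed) < 0.1).astype(np.uint8)

# One image, encoded once; many interactive prompts, each decoded cheaply.
image = np.zeros((8, 8), dtype=np.uint8)
image[2:6, 2:6] = 200                    # a bright "object" on dark background
features = image_encoder(image)          # expensive step, runs exactly once

mask_a = mask_decoder(features, prompt_encoder((3, 3)))  # click the object
mask_b = mask_decoder(features, prompt_encoder((0, 0)))  # click the background
```

In the real model, the same split is what makes interactive annotation feel instant: the ViT encoding dominates latency, while each new prompt costs only the lightweight encoder plus the ~50 ms decoder.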
SAM 2 (2024) extended the model to video, adding memory-based temporal propagation so you can click on an object in one frame and track its mask across the entire video. SAM2Long improved long-video performance through memory tree search. SAM 3 (2025) advanced boundary quality and multi-granularity outputs. SAM's impact has been substantial: it enables zero-shot segmentation across domains (medical, satellite, industrial) and powers interactive annotation tools, including Datature Nexus's smart segmentation feature for one-click mask generation.

