Semantic Segmentation

Semantic segmentation assigns a class label to every pixel in an image, producing a dense map where each pixel is colored according to its category (road, sidewalk, building, sky, person, car, etc.). Unlike object detection (which draws bounding boxes) or instance segmentation (which separates individual objects), semantic segmentation does not distinguish between separate instances of the same class. All pixels belonging to "road" get the same label, regardless of whether they belong to one road or multiple road segments.
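The dense output can be pictured as a small grid of class ids. The sketch below is a toy illustration (the class ids and map are made up, not from any real model): every pixel holds exactly one label, and all "road" pixels share the same id whether they form one region or several.

```python
# Toy 4x4 semantic label map: each entry is the class id of one pixel.
# Hypothetical class ids: 0 = sky, 1 = road, 2 = car.
label_map = [
    [0, 0, 0, 0],
    [0, 0, 2, 0],
    [1, 1, 2, 1],
    [1, 1, 1, 1],
]

# Semantic segmentation does not separate instances: every "road" pixel
# gets id 1, regardless of how many distinct road regions there are.
road_pixels = sum(row.count(1) for row in label_map)
print(road_pixels)  # → 7
```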

Most architectures for semantic segmentation follow an encoder-decoder pattern. The encoder (a CNN or ViT backbone) extracts features at progressively lower resolutions, and the decoder upsamples them back to the original image size for per-pixel classification. Key architectures include FCN (the first fully convolutional approach, 2015), U-Net (an encoder-decoder with skip connections, dominant in medical imaging), DeepLabV3+ (atrous/dilated convolutions with ASPP for multi-scale context), and SegFormer (a lightweight transformer-based model, efficient enough for real-time applications).
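The decoder's final step can be sketched without any deep learning framework. This is a minimal, untrained illustration (all scores and class names are invented): coarse per-class score maps are upsampled back to image resolution, then each pixel takes the class with the highest score.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D grid by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([wide] * factor)
    return out

# Coarse 2x2 score maps for two classes, as an encoder might produce
# at reduced resolution (values are made up for illustration).
scores = {
    "background": [[0.9, 0.8],
                   [0.2, 0.1]],
    "road":       [[0.1, 0.2],
                   [0.8, 0.9]],
}

# Decoder sketch: upsample each class map to 4x4, then per-pixel argmax.
up = {c: upsample_nearest(m, 2) for c, m in scores.items()}
seg_map = [
    [max(up, key=lambda c: up[c][i][j]) for j in range(4)]
    for i in range(4)
]
for row in seg_map:
    print(row)  # top half "background", bottom half "road"
```

Real decoders use learned upsampling (transposed convolutions or bilinear interpolation plus convolutions), but the upsample-then-classify structure is the same.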

Semantic segmentation is used in autonomous driving (parsing road scenes into drivable area, lanes, and obstacles), medical imaging (delineating organs, tumors, and tissue types), satellite and aerial imagery (land use classification, building footprint extraction), agriculture (crop vs. weed classification), and robotics (understanding navigable surfaces). Evaluation typically uses mean Intersection over Union (mIoU): for each class, the overlap between the predicted and ground-truth pixel masks is divided by their union, and the per-class IoUs are averaged.
