Intersection over Union (IoU) measures how well a predicted bounding box or mask overlaps with the ground truth. It's calculated by dividing the area of overlap between the two regions by the area of their union. An IoU of 1.0 means perfect overlap; 0.0 means no overlap at all. IoU is the most widely used metric for evaluating localization quality in object detection and segmentation.
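The definition above can be sketched directly for axis-aligned boxes. This is a minimal illustration, assuming boxes in (x1, y1, x2, y2) corner format; the function name `box_iou` is just a placeholder:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle: overlap of the two boxes (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union = area(a) + area(b) - intersection.
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit-offset 2x2 boxes `(0, 0, 2, 2)` and `(1, 1, 3, 3)` share a 1x1 overlap, giving IoU = 1 / (4 + 4 - 1) = 1/7.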
In detection evaluation, IoU determines whether a prediction counts as a true positive. An IoU threshold of 0.5 is the classic PASCAL VOC criterion; the COCO benchmark reports AP@0.5 and the stricter AP@0.75, but its primary metric is AP@[0.5:0.95], the mean AP across thresholds from 0.5 to 0.95 in steps of 0.05. Higher IoU thresholds demand tighter bounding boxes, penalizing loose or offset predictions. During training, IoU-based losses (GIoU, DIoU, CIoU, SIoU) have largely replaced plain L1/L2 box regression losses because they directly optimize the overlap metric and, unlike raw IoU, still provide a useful gradient when the boxes do not overlap.
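To make the non-overlapping case concrete, here is a sketch of the GIoU loss (1 - GIoU), which penalizes the empty space inside the smallest box enclosing both inputs. Boxes are again assumed to be (x1, y1, x2, y2) tuples; this is an illustrative scalar version, not a batched implementation from any particular library:

```python
def giou_loss(a, b):
    """GIoU loss (1 - GIoU) for two boxes in (x1, y1, x2, y2) format."""
    # Plain IoU, as before.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box C; the term (|C| - union) / |C| measures
    # wasted space, so GIoU keeps shrinking as boxes drift apart.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

For identical boxes the loss is 0; for disjoint boxes plain IoU saturates at 0 (loss 1 regardless of distance), while the GIoU loss exceeds 1 and grows with separation, which is exactly the "handles non-overlapping boxes" property mentioned above.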
For segmentation, IoU is computed per class over pixels: a pixel counts toward the intersection if both the predicted and ground-truth masks assign it to the class, and toward the union if either does. Mean IoU (mIoU) averages the per-class IoUs and is the standard metric for semantic segmentation benchmarks like ADE20K and Cityscapes. Low IoU on specific classes often indicates class confusion at boundaries or insufficient training examples for that class.
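The per-class computation can be sketched on flattened label maps. This assumes dense per-pixel class labels (one integer per pixel); classes absent from both prediction and ground truth are skipped rather than counted as 0, which is one common convention but not the only one:

```python
def mean_iou(pred, gt, num_classes):
    """mIoU over flat lists of per-pixel class labels."""
    ious = []
    for c in range(num_classes):
        # Intersection: pixels both maps assign to class c.
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        # Union: pixels either map assigns to class c.
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For instance, with `pred = [0, 0, 1, 1]` and `gt = [0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, so mIoU = 7/12. A per-class breakdown of the same counts is what reveals the weak classes mentioned above.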


