COCO
COCO (Common Objects in Context) is one of the most widely used benchmark datasets in computer vision. It contains over 200,000 labeled images covering 80 object categories with bounding boxes, segmentation masks, keypoints, and captions. Researchers and practitioners use COCO to train models, evaluate performance, and compare results against published baselines.
The COCO annotation format has become a standard beyond the dataset itself. It stores annotations in JSON with fields for image metadata, category definitions, and per-instance annotations including bounding boxes (x, y, width, height), segmentation polygons (lists of vertices), and keypoint coordinates. Most detection and segmentation frameworks (Detectron2, MMDetection, Ultralytics) accept COCO-format input natively.
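To make the schema concrete, here is a minimal sketch of a COCO-format annotation file built as a Python dict and round-tripped through JSON. The image file name, category, and coordinates are hypothetical; the field names and layout follow the COCO JSON schema described above.

```python
import json

# Minimal COCO-format annotation file. The image, category, and
# coordinates are made up for illustration; the field names follow
# the standard COCO JSON schema.
coco = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}
    ],
    "categories": [
        {"id": 18, "name": "dog", "supercategory": "animal"}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 18,
            # bbox is [x, y, width, height] in pixels, origin at top-left
            "bbox": [100.0, 120.0, 200.0, 150.0],
            "area": 200.0 * 150.0,
            # segmentation: a list of polygons, each a flat
            # [x1, y1, x2, y2, ...] vertex list
            "segmentation": [[100, 120, 300, 120, 300, 270, 100, 270]],
            "iscrowd": 0,
        }
    ],
}

# Round-trip through JSON to confirm the structure serializes cleanly.
restored = json.loads(json.dumps(coco))
print(restored["annotations"][0]["bbox"])  # [100.0, 120.0, 200.0, 150.0]
```

Note that `bbox` uses width and height rather than a second corner point, which is a common source of conversion bugs when moving between COCO and formats that store [x1, y1, x2, y2].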
COCO evaluation metrics are the standard reporting format for object detection and segmentation research. The primary metric is mAP (mean Average Precision) averaged across IoU thresholds from 0.50 to 0.95 in steps of 0.05 (ten thresholds in total), written as AP@[.50:.95]. COCO also reports AP@.50 (the VOC-style metric), AP@.75 (a stricter threshold), and AP broken down by object size (small, medium, large). When a paper reports "AP" without qualification, it almost always means COCO AP@[.50:.95].
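The threshold averaging above can be sketched directly. The example below is illustrative, not the official evaluator (that is pycocotools' COCOeval): it computes IoU for two boxes in COCO [x, y, width, height] format and checks at which of the ten thresholds a shifted prediction would still count as a true positive. The box coordinates are invented for the example.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given in COCO [x, y, width, height] format."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# The ten IoU thresholds that AP@[.50:.95] averages over.
thresholds = np.arange(0.50, 1.00, 0.05)

# A hypothetical prediction shifted 20 px right of its ground truth:
# its IoU determines at which thresholds it counts as a true positive.
gt = [100, 100, 100, 100]
pred = [120, 100, 100, 100]
iou = box_iou(gt, pred)        # 2/3: intersection 80x100, union 12000
matched = thresholds[thresholds <= iou]
print(round(iou, 4), matched.size)  # 0.6667 4
```

This is why AP@[.50:.95] is stricter than the VOC-style AP@.50: a detection with IoU 0.67 is a full true positive under AP@.50 but only matches at four of the ten thresholds, so sloppy localization is penalized even when the object is found.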
