COCO
COCO (Common Objects in Context) is one of the most widely used benchmark datasets in computer vision. It contains over 200,000 labeled images covering 80 object categories with bounding boxes, segmentation masks, keypoints, and captions. Researchers and practitioners use COCO to train models, evaluate performance, and compare results against published baselines.
The COCO annotation format has become a standard beyond the dataset itself. It stores annotations in JSON with fields for image metadata, category definitions, and per-instance annotations including bounding boxes (x, y, width, height), segmentation polygons (lists of vertices), and keypoint coordinates. Most detection and segmentation frameworks (Detectron2, MMDetection, Ultralytics) accept COCO-format input natively.
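To make the schema concrete, here is a minimal sketch of a COCO-format annotation file built as a Python dict and round-tripped through JSON. The image file name, category, and coordinates are hypothetical; the field names and layout follow the COCO JSON schema described above.

```python
import json

# Minimal COCO-format annotation file. The image, category, and
# coordinates are made up for illustration; the field names follow
# the standard COCO JSON schema.
coco = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}
    ],
    "categories": [
        {"id": 18, "name": "dog", "supercategory": "animal"}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 18,
            # bbox is [x, y, width, height] in pixels, origin at top-left
            "bbox": [100.0, 120.0, 200.0, 150.0],
            "area": 200.0 * 150.0,
            # segmentation: a list of polygons, each a flat
            # [x1, y1, x2, y2, ...] vertex list
            "segmentation": [[100, 120, 300, 120, 300, 270, 100, 270]],
            "iscrowd": 0,
        }
    ],
}

# Round-trip through JSON to confirm the structure serializes cleanly.
restored = json.loads(json.dumps(coco))
print(restored["annotations"][0]["bbox"])  # [100.0, 120.0, 200.0, 150.0]
```

Note that `bbox` uses width and height rather than a second corner point, which is a common source of conversion bugs when moving between COCO and formats that store [x1, y1, x2, y2].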
COCO evaluation metrics are the standard reporting format for object detection and segmentation research. The primary metric is mAP (mean Average Precision) averaged across IoU thresholds from 0.50 to 0.95 in steps of 0.05 (ten thresholds in total), written as AP@[.50:.95]. COCO also reports AP@.50 (the VOC-style metric), AP@.75 (a stricter threshold), and AP broken down by object size (small, medium, large). When a paper reports "AP" without qualification, it almost always means COCO AP@[.50:.95].
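The threshold averaging above can be sketched directly. The example below is illustrative, not the official evaluator (that is pycocotools' COCOeval): it computes IoU for two boxes in COCO [x, y, width, height] format and checks at which of the ten thresholds a shifted prediction would still count as a true positive. The box coordinates are invented for the example.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given in COCO [x, y, width, height] format."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# The ten IoU thresholds that AP@[.50:.95] averages over.
thresholds = np.arange(0.50, 1.00, 0.05)

# A hypothetical prediction shifted 20 px right of its ground truth:
# its IoU determines at which thresholds it counts as a true positive.
gt = [100, 100, 100, 100]
pred = [120, 100, 100, 100]
iou = box_iou(gt, pred)        # 2/3: intersection 80x100, union 12000
matched = thresholds[thresholds <= iou]
print(round(iou, 4), matched.size)  # 0.6667 4
```

This is why AP@[.50:.95] is stricter than the VOC-style AP@.50: a detection with IoU 0.67 is a full true positive under AP@.50 but only matches at four of the ten thresholds, so sloppy localization is penalized even when the object is found.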
