Ground truth is the set of correct, human-verified annotations that a machine learning model is trained and evaluated against. In object detection, ground truth consists of bounding boxes with class labels drawn around every object of interest in each image. In segmentation, it's pixel-level masks. In classification, it's the correct class tag. The model's job is to produce predictions that match the ground truth as closely as possible.
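The three annotation shapes described above can be sketched as simple Python records (the field names here are illustrative, not from any particular dataset format):

```python
# Object detection ground truth: one bounding box [x_min, y_min, x_max, y_max]
# and one class label per object of interest in the image.
detection_gt = {
    "image_id": 17,
    "boxes": [[34, 50, 120, 210], [200, 40, 310, 180]],
    "labels": ["person", "bicycle"],
}

# Classification ground truth: a single correct class tag per image.
classification_gt = {"image_id": 17, "label": "street_scene"}

# Segmentation ground truth: a pixel-level mask with the same height and
# width as the image, where each entry is a class index (0 = background).
segmentation_gt = {
    "image_id": 17,
    "mask": [[0, 0, 1],
             [0, 1, 1],
             [2, 2, 0]],  # tiny 3x3 image for illustration
}

# Every box needs a label, and vice versa.
assert len(detection_gt["boxes"]) == len(detection_gt["labels"])
```

Real formats (COCO JSON, Pascal VOC XML, etc.) add more fields, but all of them reduce to variations on these three structures.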
Ground truth quality sets the ceiling for model performance. If the bounding boxes are sloppy (not tight around objects), the model learns sloppy localization. If annotators disagree on ambiguous cases (is that shadow a crack or not?), the model gets conflicting training signals. If classes are mislabeled, the model learns wrong associations. This is why annotation guidelines, quality control workflows, and inter-annotator agreement checks matter as much as the model architecture itself.
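One common inter-annotator agreement check for class labels is Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. A minimal sketch (the crack/shadow labels below are hypothetical data echoing the example above):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    rate and p_e is the agreement expected by chance, computed from each
    annotator's marginal label frequencies.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(labels_a) | set(labels_b))
    if p_e == 1.0:  # degenerate case: both annotators used one label only
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Two annotators labeling the same six ambiguous image regions:
a = ["crack", "crack", "shadow", "crack", "shadow", "shadow"]
b = ["crack", "shadow", "shadow", "crack", "shadow", "shadow"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

A kappa well below 1.0 on a labeling task is a signal that the annotation guidelines need tightening before more data is labeled, since the disagreement will otherwise flow straight into the training signal.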
In evaluation, ground truth serves as the reference for computing all metrics: IoU between predicted and ground truth boxes determines true/false positives, which flow into precision, recall, mAP, and F1 calculations. Some datasets have known ground truth errors (COCO has approximately 1-2% label noise), so perfect scores are neither expected nor necessarily desirable.
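The IoU-based matching step can be sketched as follows: compute IoU between each prediction and the ground-truth boxes, then greedily assign predictions (highest confidence first) to unused ground truths. This is a simplified single-class version of the matching used in mAP evaluation, not any specific benchmark's exact protocol:

```python
def iou(box_a, box_b):
    """Intersection over Union for [x_min, y_min, x_max, y_max] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_predictions(preds, gts, iou_threshold=0.5):
    """Greedily match predictions (sorted by descending confidence) to
    unused ground-truth boxes; a match at or above the IoU threshold is a
    true positive, anything else is a false positive."""
    results, used = [], set()
    for box, score in sorted(preds, key=lambda p: -p[1]):
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(gts):
            if i in used:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_gt is not None and best_iou >= iou_threshold:
            used.add(best_gt)
            results.append(("TP", score))
        else:
            results.append(("FP", score))
    return results  # any ground truth left unmatched is a false negative

gts = [[0, 0, 10, 10], [20, 20, 30, 30]]
preds = [([1, 1, 10, 10], 0.9), ([50, 50, 60, 60], 0.8)]
print(match_predictions(preds, gts))  # → [('TP', 0.9), ('FP', 0.8)]
```

The resulting TP/FP labels, ordered by confidence, are exactly what precision-recall curves (and hence mAP and F1) are computed from; the unmatched second ground-truth box counts as a false negative against recall.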

