Anchor Box

An anchor box is a predefined rectangular template used by certain object detection models to propose candidate object locations. Before the model processes any image, a fixed set of boxes at various sizes and aspect ratios (for example, 1:1, 1:2, 2:1) is tiled across every position in the feature map. During training, the model learns to classify each anchor as object or background and to refine its coordinates to match the actual object boundaries.
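The tiling described above can be sketched in a few lines. This is a simplified illustration, not any particular library's implementation; the function name, the convention that each feature-map cell corresponds to a `stride`-pixel patch of the input image, and the choice to hold each anchor's area constant across aspect ratios are all assumptions made for the example.

```python
import itertools

def generate_anchors(feature_size, stride, scales, aspect_ratios):
    """Tile one anchor per (scale, ratio) pair at every feature-map cell.

    Illustrative sketch: returns (cx, cy, w, h) boxes in input-image
    pixel coordinates for a square feature map of size `feature_size`.
    """
    anchors = []
    for y, x in itertools.product(range(feature_size), repeat=2):
        # Each cell maps to a stride x stride patch of the input image;
        # center the anchors on that patch.
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for scale, ratio in itertools.product(scales, aspect_ratios):
            # ratio = h / w; scale w and h so the area stays scale**2
            w = scale * (1.0 / ratio) ** 0.5
            h = scale * ratio ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors

# A 2x2 feature map with stride 16, one scale, three aspect ratios
boxes = generate_anchors(feature_size=2, stride=16,
                         scales=[32], aspect_ratios=[0.5, 1.0, 2.0])
print(len(boxes))  # 2 * 2 * 1 * 3 = 12 anchors
```

Note how the anchor count grows multiplicatively: cells × scales × ratios. A real detector scores and regresses every one of these boxes, which is why anchor configuration matters for both accuracy and compute.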

Anchor-based detectors like Faster R-CNN, SSD, and earlier YOLO versions (v2 through v5) rely heavily on this mechanism. The anchor sizes are usually set by clustering the bounding box dimensions in the training dataset using k-means, so the anchors match the typical object shapes in that domain. Getting the anchor configuration wrong (too few sizes, wrong aspect ratios) directly hurts detection accuracy, especially for objects with unusual proportions.
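A minimal sketch of that clustering step follows. It mirrors the approach popularized by YOLOv2, clustering only box widths and heights with distance defined as 1 − IoU (computed as if all boxes shared a corner, so only shape matters). The function names, the toy dataset, and the fixed random seed are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """Pairwise IoU between (w, h) shapes, as if boxes shared one corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    areas = boxes[:, 0] * boxes[:, 1]
    c_areas = centroids[:, 0] * centroids[:, 1]
    return inter / (areas[:, None] + c_areas[None, :] - inter)

def kmeans_anchors(wh, k, iters=100, seed=0):
    """Cluster training-set (w, h) pairs into k anchor shapes (toy sketch)."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # Assign each box to its highest-IoU centroid (distance = 1 - IoU)
        assign = iou_wh(wh, centroids).argmax(axis=1)
        new = np.array([wh[assign == i].mean(axis=0) if (assign == i).any()
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]  # sort by area

# Toy dataset: small squares and wide boxes separate into two anchors
wh = np.array([[10, 10], [12, 11], [11, 12],
               [40, 20], [42, 19], [38, 21]], dtype=float)
anchors = kmeans_anchors(wh, k=2)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clusters, so the resulting anchors reflect typical object shapes, not just typical object sizes.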

The trend in recent architectures has moved away from anchors. Anchor-free detectors like FCOS, CenterNet, and the latest YOLO versions predict object locations directly from feature map points, removing the need to predefine box templates. Transformer-based detectors like DETR go further, replacing anchors entirely with learned object queries. These approaches simplify the training pipeline and eliminate anchor-related hyperparameter tuning.
