Small object detection focuses on finding and localizing targets that occupy a tiny fraction of the image area. The COCO benchmark, for example, counts an object as small when its area is below 32x32 pixels. These objects are harder to detect than large ones because they contain very few pixels of useful information, and repeated downsampling through the network can shrink them to less than a single feature-map cell, erasing their features before the detection head ever sees them.
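To make the size threshold and the downsampling problem concrete, here is a minimal sketch. The function names are hypothetical, and it uses bounding-box area as a stand-in for the segmentation area that COCO evaluation actually measures. The 96x96 boundary between medium and large is COCO's convention and is not stated in the text above.

```python
def coco_size_bucket(width, height):
    """Classify an object by area using COCO-style thresholds.

    Assumption: box area approximates the mask area COCO measures.
    """
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def footprint_on_feature_map(size_px, stride):
    """Side length of an object, in feature-map cells, after downsampling.

    A network stage with total stride `stride` maps `stride` input pixels
    onto one feature-map cell along each axis.
    """
    return size_px / stride


if __name__ == "__main__":
    # A 16-pixel-wide object at increasingly coarse feature levels.
    for stride in (4, 8, 16, 32):
        cells = footprint_on_feature_map(16, stride)
        print(f"stride {stride:>2}: {cells:.2f} cells wide")
```

At stride 32 the 16-pixel object covers only half a cell per axis, which is why detectors that rely solely on their coarsest feature map tend to miss it.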
Several architectural techniques address this challenge. Feature Pyramid Networks (FPN) build multi-scale feature maps and merge deep, semantically strong features into shallow, high-resolution ones through a top-down pathway, so that high-resolution levels handle small objects while coarser levels handle large ones. Tiling or slicing strategies break large images into overlapping patches, run detection on each patch, then map the results back to full-image coordinates and merge duplicate detections from the overlap regions. SAHI (Slicing Aided Hyper Inference) automates this workflow and is widely used for aerial imagery and surveillance applications, where targets are often just a handful of pixels.
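The tiling step can be sketched in plain Python as follows. This is an illustration of the general idea only, not SAHI's API; the function names, the 512-pixel tile size, and the 128-pixel overlap are all assumptions chosen for the example.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles that cover [0, length) along one axis."""
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    if tile >= length:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def make_tiles(img_w, img_h, tile=512, overlap=128):
    """Return tile boxes as (x0, y0, x1, y1) in full-image coordinates."""
    return [
        (x, y, min(x + tile, img_w), min(y + tile, img_h))
        for y in tile_origins(img_h, tile, overlap)
        for x in tile_origins(img_w, tile, overlap)
    ]


def to_image_coords(box, tile_box):
    """Shift a detection from tile-local to full-image coordinates."""
    off_x, off_y = tile_box[0], tile_box[1]
    x0, y0, x1, y1 = box
    return (x0 + off_x, y0 + off_y, x1 + off_x, y1 + off_y)


if __name__ == "__main__":
    tiles = make_tiles(1000, 400, tile=400, overlap=100)
    print(tiles)
    # A detection found at (10, 20, 30, 40) inside the second tile:
    print(to_image_coords((10, 20, 30, 40), tiles[1]))
```

A complete pipeline would also remove duplicate detections of objects that fall inside the overlap between neighboring tiles, for example with non-maximum suppression. That step is omitted here to keep the sketch short.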
Training strategies matter as much as architecture. Oversampling images that contain small objects, using higher input resolutions, and, in anchor-based detectors, tuning anchor box sizes to match the target scale can each improve recall. In domains like satellite imaging, drone inspection, and microscopy, small object detection is not a niche concern but the primary challenge. Getting it right often means the difference between a useful system and one that misses most of what it should find.
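As one illustration of the oversampling idea, the sketch below builds a list of training indices in which images containing at least one small object appear several times per epoch. The function name, the repeat factor of 3, and the use of box area with a 32x32 threshold are assumptions for the example, not a prescribed recipe.

```python
def oversample_indices(boxes_per_image, small_area=32 ** 2, repeat=3):
    """Build an index list that repeats images containing small objects.

    boxes_per_image: one list per image of (width, height) box sizes.
    Images with at least one box below `small_area` appear `repeat` times;
    all other images appear once.
    """
    indices = []
    for i, boxes in enumerate(boxes_per_image):
        has_small = any(w * h < small_area for w, h in boxes)
        indices.extend([i] * (repeat if has_small else 1))
    return indices


if __name__ == "__main__":
    dataset = [
        [(10, 10), (200, 150)],  # image 0: contains a small object
        [(120, 90)],             # image 1: only a larger object
        [],                      # image 2: no annotations
    ]
    print(oversample_indices(dataset))
```

The resulting list can be shuffled and used to drive a data loader. The right repeat factor depends on how rare small objects are in the dataset and is best chosen by validation.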

