Feature Pyramid Network (FPN)

A Feature Pyramid Network (FPN) is a multi-scale feature extraction architecture that adds a top-down pathway with lateral connections to a backbone, producing semantically rich feature maps at every resolution level. The problem it solves: deep neural networks naturally produce features at multiple scales as they downsample through layers, but only the deepest features (smallest spatial resolution) carry strong semantic meaning, while shallow features (high resolution) retain spatial detail but lack semantic context. FPN combines both.

The architecture works in two passes. The bottom-up pass is just the normal forward pass through a backbone like ResNet, producing feature maps at progressively lower resolutions (1/4, 1/8, 1/16, 1/32 of input size). The top-down pass starts from the deepest, most semantic features and upsamples them level by level. At each level, the corresponding bottom-up feature map passes through a lateral 1x1 convolution that matches its channel count to the pyramid's, and the result is merged with the upsampled map by element-wise addition. The result is a pyramid of feature maps that all carry strong semantics but at different spatial resolutions.
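The top-down pass described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (conv1x1, upsample2x, fpn_top_down), the channel widths, the 64-pixel input size, and the random weights are all hypothetical, and random arrays stand in for real backbone outputs. It also omits the 3x3 smoothing convolution that the original FPN design applies to each merged map.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: a per-pixel linear map over channels.
    x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(bottom_up, lateral_weights):
    """Merge bottom-up maps (ordered shallow -> deep, each level half the
    resolution of the previous one) into a pyramid with a shared channel count."""
    # Lateral connections: project every level to the same number of channels.
    laterals = [conv1x1(f, w) for f, w in zip(bottom_up, lateral_weights)]
    # Start from the deepest level and walk towards the shallowest.
    pyramid = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        pyramid.append(lateral + upsample2x(pyramid[-1]))
    return pyramid[::-1]  # back to shallow -> deep order

# Hypothetical setup: random arrays stand in for backbone feature maps.
rng = np.random.default_rng(0)
input_size = 64
strides = [4, 8, 16, 32]       # 1/4, 1/8, 1/16, 1/32 of the input size
channels = [8, 16, 32, 64]     # assumed widths, smaller than a real ResNet
out_channels = 16              # assumed pyramid width

bottom_up = [
    rng.standard_normal((c, input_size // s, input_size // s))
    for c, s in zip(channels, strides)
]
lateral_weights = [rng.standard_normal((out_channels, c)) for c in channels]

pyramid = fpn_top_down(bottom_up, lateral_weights)
for s, p in zip(strides, pyramid):
    print(f"stride {s}: shape {p.shape}")
```

Every output level keeps the spatial size of its bottom-up input but shares one channel count, which is what lets a single detection head run unchanged on all levels.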

FPN and its variants are the default neck in most modern detectors. Several YOLO versions use a PANet-style neck (FPN plus an extra bottom-up path, making it bidirectional), EfficientDet uses BiFPN (a weighted bidirectional variant), and even transformer-based detectors often use FPN-style multi-scale features. Without multi-scale features, detectors struggle with objects at extreme sizes, especially small ones.
