Batch Normalization

Batch normalization (BatchNorm) normalizes the inputs to each layer during training by adjusting them to have zero mean and unit variance across the current mini-batch. Introduced by Ioffe and Szegedy in 2015, it addressed a practical problem: as weights update during training, the distribution of inputs to each layer shifts constantly, making optimization slow and unstable.

BatchNorm computes the mean and variance of activations across all samples in a mini-batch, normalizes them, then applies two learnable parameters (scale and shift) that let the network recover any representation it needs. During inference, it uses running averages computed during training instead of batch statistics. This stabilizes training, allows higher learning rates, and reduces sensitivity to weight initialization. It also adds a slight regularization effect because the per-batch statistics introduce noise.
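The mechanics above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name, argument names, and defaults (momentum 0.1, eps 1e-5) are illustrative choices, and it handles only a 2D (batch, features) input.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running_mean, running_var,
                      training=True, momentum=0.1, eps=1e-5):
    """Sketch of a BatchNorm forward pass for x of shape (batch, features).

    gamma/beta are the learnable scale and shift; running_mean/running_var
    are the averages used at inference time.
    """
    if training:
        mean = x.mean(axis=0)  # per-feature mean across the mini-batch
        var = x.var(axis=0)    # per-feature variance across the mini-batch
        # Update running statistics via an exponential moving average,
        # so inference does not depend on the current batch.
        running_mean = (1 - momentum) * running_mean + momentum * mean
        running_var = (1 - momentum) * running_var + momentum * var
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta, running_mean, running_var
```

With gamma initialized to ones and beta to zeros, the output of a training-mode pass has approximately zero mean and unit variance per feature; the learnable parameters then let the network scale and shift away from that if useful.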

Layer Normalization (used in transformers, normalizes across features per sample) and Group Normalization (normalizes across channel groups, works with small batches) are common alternatives. Nearly every CNN-based vision model — ResNet, EfficientNet, YOLO — uses BatchNorm after convolutional layers as standard practice.
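The difference between BatchNorm and LayerNorm comes down to which axis the statistics are computed over. A small NumPy sketch (toy data, unscaled normalization only, no learnable parameters) makes this concrete:

```python
import numpy as np

# Toy activations: 4 samples, 8 features
x = np.random.randn(4, 8) * 3 + 2
eps = 1e-5

# BatchNorm: statistics per feature, computed across the batch (axis 0)
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# LayerNorm: statistics per sample, computed across features (axis 1)
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(
    x.var(axis=1, keepdims=True) + eps)
```

Because LayerNorm's statistics never cross sample boundaries, it behaves identically at batch size 1, which is one reason it suits transformers and variable-length sequence models.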
