Regularization is a collection of techniques used during training to prevent overfitting, where a model memorizes the training data (including its noise) instead of learning patterns that generalize to new data. An overfit model scores well on the training set but poorly on data it has never seen before.
The most common forms include: L2 regularization (weight decay), which adds a penalty proportional to the squared magnitude of weights, encouraging smaller, simpler weight values; L1 regularization, which promotes sparsity by penalizing the sum of absolute weights; dropout, which randomly turns off neurons during each training step, forcing the network to spread information across more neurons; and data augmentation, which creates training variations through random flips, crops, rotations, color changes, and more advanced transforms like Mosaic (YOLO), CutMix, and RandAugment.
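The penalty terms and dropout described above can be sketched in a few lines. This is an illustrative NumPy sketch, not any particular library's implementation; the names `lam`, `dropout`, and the weight matrix `W` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix standing in for one layer's parameters.
W = rng.normal(size=(4, 3))

# L2 penalty: lambda times the sum of squared weights, added to the loss.
# Encourages small weight values.
lam = 1e-2  # illustrative regularization strength
l2_penalty = lam * np.sum(W ** 2)

# L1 penalty: lambda times the sum of absolute weights.
# Encourages sparsity (many weights driven exactly to zero).
l1_penalty = lam * np.sum(np.abs(W))

# "Inverted" dropout: zero each activation with probability p during
# training and scale the survivors by 1/(1-p), so the expected
# activation matches what the network sees at inference time.
def dropout(x, p=0.5, training=True, rng=rng):
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

activations = np.ones((2, 3))
dropped = dropout(activations, p=0.5)  # entries are either 0.0 or 2.0
```

In a typical training loop the penalty would be added to the task loss before backpropagation (for L2 this is equivalent to weight decay in plain SGD), while `dropout(..., training=False)` is the identity at evaluation time.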
Other regularization methods include batch normalization (which has an implicit regularizing effect through mini-batch noise), label smoothing (replacing hard 0/1 targets with softened values like 0.1/0.9), stochastic depth (randomly skipping entire layers during training), and early stopping (halting training when validation performance plateaus). In computer vision, data augmentation is typically the single most effective regularizer, often having more impact than all other techniques combined.
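Label smoothing is simple enough to show directly. The sketch below uses one common variant, in which the true class gets 1 - eps and the remaining eps is spread over the other classes (so with eps = 0.1 and two classes, the hard 0/1 targets become the 0.1/0.9 values mentioned above); the function name and parameters are illustrative.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Turn hard integer class labels into smoothed target distributions.

    Each row gets 1 - eps on the true class and eps / (num_classes - 1)
    on every other class, so rows still sum to 1.
    """
    labels = np.asarray(labels)
    targets = np.full((len(labels), num_classes), eps / (num_classes - 1))
    targets[np.arange(len(labels)), labels] = 1.0 - eps
    return targets

# Two samples, three classes: true classes 0 and 2.
targets = smooth_labels([0, 2], num_classes=3, eps=0.1)
# Row 0 is [0.9, 0.05, 0.05]; row 1 is [0.05, 0.05, 0.9].
```

The softened targets discourage the network from producing extremely confident logits, which tends to improve calibration and generalization.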