Dropout is a regularization technique that randomly deactivates a fraction of neurons during each training step. On every forward pass, each neuron is temporarily "turned off" with some probability p (typically 0.2–0.5), meaning its output is set to zero. This prevents the network from relying on any single neuron or small group of neurons, spreading learned features more evenly across the network.
The effect is similar to training an ensemble of many thinned networks that share weights. At inference time, dropout is turned off and all neurons are active; in the original formulation, their outputs are scaled by the keep probability 1 − p so that expected activations match those seen during training. Most modern implementations use "inverted dropout" instead: surviving activations are scaled up by 1/(1 − p) at training time, so inference needs no adjustment at all. This train-time noise acts as a strong regularizer, reducing overfitting on small datasets without requiring additional data.
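A minimal sketch of the inverted-dropout convention in plain Python (the function name and signature are illustrative, not taken from any library):

```python
import random

def dropout(x, p=0.5, training=True, seed=None):
    """Inverted dropout: during training, zero each unit with
    probability p and scale survivors by 1/(1 - p) so the expected
    activation is unchanged. At inference it is the identity."""
    if not training or p == 0.0:
        return list(x)
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - p)
    # Keep each unit with probability 1 - p; scale the survivors.
    return [xi * scale if rng.random() >= p else 0.0 for xi in x]
```

Because the scaling happens at training time, the same network can be deployed with dropout simply disabled, which is why this convention is the default in frameworks such as PyTorch and TensorFlow.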
In modern convolutional networks, dropout is less common in intermediate layers (BatchNorm provides similar regularization), but it's still widely used in fully connected classification heads. Spatial dropout (dropping entire feature map channels instead of individual neurons) works better for convolutional layers. Stochastic depth (randomly dropping entire residual blocks) applies the same principle at the layer level and is standard in training deep transformers and EfficientNet variants.
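Spatial dropout differs from standard dropout only in the granularity of the mask: a single keep/drop decision is made per channel rather than per activation. A sketch in plain Python, representing one example's feature maps as a list of 2-D channel grids (the function name and layout are illustrative):

```python
import random

def spatial_dropout(fmap, p=0.3, seed=None):
    """Spatial (channel) dropout: fmap is a list of channels, each a
    2-D list of activations. Entire channels are zeroed with
    probability p; survivors are scaled by 1/(1 - p), following the
    inverted-dropout convention."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - p)
    out = []
    for ch in fmap:
        if rng.random() < p:
            # Drop the whole channel: adjacent pixels are strongly
            # correlated, so per-pixel dropout would barely perturb it.
            out.append([[0.0 for _ in row] for row in ch])
        else:
            out.append([[v * scale for v in row] for row in ch])
    return out
```

Stochastic depth applies the same per-unit coin flip one level up: instead of a channel, an entire residual block is skipped (reduced to its identity shortcut) with some probability during training.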