Backpropagation
Backpropagation is the algorithm neural networks use to learn from their mistakes. After the model makes a prediction and the loss function measures how wrong it was, backpropagation calculates how much each weight in the network contributed to that error. It does this by applying the chain rule from calculus, working backward from the output layer through each hidden layer to the input.
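That chain-rule bookkeeping can be written out by hand for a one-hidden-layer network. Here is a minimal NumPy sketch (the shapes, initial values, and variable names are illustrative, not from any particular framework), with the backward pass spelled out term by term:

```python
import numpy as np

# Minimal sketch of backpropagation for x -> Linear(W1) -> ReLU ->
# Linear(W2) -> prediction, with squared-error loss. Shapes and
# initial values are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(3,))           # input
y = np.array([1.0])                 # target
W1 = 0.1 * rng.normal(size=(4, 3))  # hidden-layer weights
W2 = 0.1 * rng.normal(size=(1, 4))  # output-layer weights

# Forward pass: keep every intermediate, the backward pass needs them.
z1 = W1 @ x                          # pre-activation
h = np.maximum(z1, 0.0)              # ReLU
y_hat = W2 @ h                       # prediction
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: apply the chain rule from the output layer inward.
d_y_hat = y_hat - y                  # dL/d(y_hat)
dW2 = np.outer(d_y_hat, h)           # dL/dW2
d_h = W2.T @ d_y_hat                 # gradient flowing back into h
d_z1 = d_h * (z1 > 0)                # ReLU passes gradient only where z1 > 0
dW1 = np.outer(d_z1, x)              # dL/dW1
```

Each entry of dW1 and dW2 says how much the loss changes per unit change in that weight; a finite-difference check (nudge one weight, recompute the loss) recovers the same numbers.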
The result is a gradient for every weight in the network: a number giving both the direction and the size of that weight's effect on the loss. An optimizer (like SGD or Adam) then adjusts each weight by a small step against its gradient, the direction that reduces the loss. This cycle of forward pass, loss computation, backward pass, and weight update repeats for every batch of training data, gradually improving the model's predictions.
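The full cycle can be sketched end to end on a toy problem. This example fits a 1-D linear model with plain SGD (the data, learning rate, and step count are made up for illustration):

```python
import numpy as np

# Toy sketch of the training cycle: forward pass, loss, backward pass
# (gradients), SGD weight update. Data and hyperparameters are illustrative.
rng = np.random.default_rng(1)
X = rng.normal(size=(32,))
y = 3.0 * X + 0.5                   # true relationship the model should learn

w, b = 0.0, 0.0                     # model parameters
lr = 0.1                            # learning rate

for step in range(200):
    y_hat = w * X + b               # forward pass
    err = y_hat - y
    loss = np.mean(err ** 2)        # loss computation
    dw = 2 * np.mean(err * X)       # backward pass: dL/dw
    db = 2 * np.mean(err)           # backward pass: dL/db
    w -= lr * dw                    # SGD: step against the gradient
    b -= lr * db
```

After a few hundred updates, w and b converge close to the true values (3.0 and 0.5) because every step moves them slightly downhill on the loss.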
Backpropagation works because neural networks are composed of differentiable operations (matrix multiplications, activation functions, pooling), so gradients can flow smoothly from output to input. Problems like vanishing gradients (gradients shrinking to near-zero in deep networks) and exploding gradients (gradients growing uncontrollably) led to architectural innovations like skip connections (ResNet), batch normalization, and careful weight initialization that keep gradient flow healthy in modern deep networks.
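A toy scalar "network" (not any real architecture) makes the vanishing-gradient problem and the skip-connection fix concrete: in a plain stack, the small local derivatives of each layer multiply together and the product collapses toward zero, while a residual block computes h + f(h), so each layer contributes a factor of 1 + f'(h) and the identity path keeps the gradient alive:

```python
import numpy as np

def end_to_end_grad(h, ws, residual):
    # Push h through a stack of tanh layers while accumulating the
    # exact derivative of the final output with respect to the input.
    g = 1.0
    for w in ws:
        local = w * (1.0 - np.tanh(w * h) ** 2)  # d tanh(w*h) / dh
        if residual:
            g *= 1.0 + local        # skip connection adds an identity path
            h = h + np.tanh(w * h)
        else:
            g *= local              # plain layer: small factors just multiply
            h = np.tanh(w * h)
    return g

ws = [0.5] * 30                     # 30 layers with smallish weights
g_plain = end_to_end_grad(0.1, ws, residual=False)
g_res = end_to_end_grad(0.1, ws, residual=True)
# g_plain collapses toward zero with depth; g_res never drops below 1,
# because every layer's factor is 1 + local >= 1.
```

Batch normalization and careful initialization attack the same problem from the other side, by keeping each layer's local derivative close to 1 in the first place.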