An epoch is one complete pass through the entire training dataset. If you have 10,000 training images and a batch size of 32, one epoch consists of ~313 training steps (10,000 / 32). Most models train for multiple epochs, seeing each image many times, because a single pass rarely extracts all learnable patterns from the data.
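The steps-per-epoch arithmetic can be sketched in a few lines of Python. Note that 10,000 / 32 is 312.5, so the count is rounded up: the final batch of each epoch is simply smaller than the rest.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    # One epoch = one full pass over the dataset. The last batch may be
    # partially filled, so round up rather than down.
    return math.ceil(num_examples / batch_size)

print(steps_per_epoch(10_000, 32))  # 313
```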
The number of epochs is a key training hyperparameter. Too few epochs and the model underfits (hasn't learned enough). Too many and the model overfits (memorizes training data instead of generalizing). The sweet spot depends on dataset size, model capacity, learning rate schedule, and augmentation. YOLO models typically train for 100-300 epochs on custom datasets, while fine-tuning a pre-trained model might need only 10-50 epochs.
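The underfitting side of this trade-off is easy to demonstrate with a toy problem (plain Python, hypothetical setup: fitting y = 2x with a single weight via SGD). After one epoch the weight is still far from the target; after many epochs it converges.

```python
# Toy dataset for y = 2x; the model is a single weight w.
data = [(x, 2.0 * x) for x in range(1, 11)]

def train(num_epochs: int, lr: float = 0.001) -> float:
    w = 0.0
    for _ in range(num_epochs):
        for x, y in data:  # one full pass over `data` = one epoch
            grad = 2 * (w * x - y) * x  # gradient of squared error
            w -= lr * grad
    return w

print(train(1))   # underfit: w still well below 2.0
print(train(50))  # converged: w very close to 2.0
```

Real models behave the same way in spirit, just with far more parameters and the added risk that continuing past convergence starts memorizing noise (overfitting).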
In practice, you rarely pick a fixed epoch count upfront. Early stopping monitors validation loss and halts training when performance stops improving, effectively choosing the epoch count for you. Learning rate schedulers (cosine annealing, step decay, warm restarts) adjust the learning rate across epochs to balance fast initial progress with fine-grained convergence later. The loss curves over epochs are the primary diagnostic for spotting problems: a growing gap between training and validation loss signals overfitting, while a flat training loss suggests the learning rate is too low.
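Both mechanisms are small pieces of logic. A minimal sketch of each (the function names and the patience/lr values are illustrative, not from any particular framework):

```python
import math

def cosine_lr(epoch: int, total_epochs: int,
              lr_max: float = 0.01, lr_min: float = 0.0001) -> float:
    # Cosine annealing: the learning rate decays smoothly from lr_max
    # at epoch 0 to lr_min at the final epoch.
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs))

def early_stop_epoch(val_losses: list[float], patience: int = 3) -> int:
    # Return the epoch at which early stopping halts: the first epoch
    # where validation loss has not improved for `patience` epochs.
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# Validation loss bottoms out at epoch 2, then creeps upward (overfitting):
print(early_stop_epoch([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]))  # 5
```

In a real loop you would also restore the weights saved at the best epoch, since the model at the stopping epoch has already drifted past its optimum.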

