Training Metrics

Training metrics are the quantitative measurements used to evaluate how well a model is learning during training and how it performs on held-out data. Monitoring these metrics across epochs is the primary way to diagnose training problems, compare experiments, and decide when to stop training or adjust hyperparameters.

The most fundamental metric is loss (the objective function the optimizer is minimizing), tracked separately for the training and validation sets. A decreasing training loss with a stable or increasing validation loss signals overfitting. Task-specific metrics provide more interpretable evaluation: accuracy and F1 score for classification, mAP (mean Average Precision) for detection, mIoU (mean Intersection over Union) for segmentation, and OKS (Object Keypoint Similarity) for pose estimation. Learning rate, gradient norms, and GPU utilization are operational metrics that help diagnose optimization and throughput issues.
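To make the classification metrics concrete, here is a minimal sketch computing accuracy, precision, recall, and F1 from confusion-matrix counts. The function name and the counts are illustrative examples, not from any particular library:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up imbalanced example: 100 samples, only 12 true positives in the data.
m = classification_metrics(tp=8, fp=2, fn=4, tn=86)
print(f"accuracy={m['accuracy']:.2f}  recall={m['recall']:.2f}  f1={m['f1']:.2f}")
# accuracy=0.94  recall=0.67  f1=0.73
```

Note how accuracy looks strong on this imbalanced example even though a third of the positives were missed, which is why F1 or recall is often the better headline metric.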

Best practices include logging metrics to an experiment tracker (Weights & Biases, MLflow, TensorBoard), saving model checkpoints at the best validation metric rather than the last epoch, using early stopping to halt training when validation performance plateaus, and comparing metrics across experiments with controlled variables. A common mistake is optimizing for a metric that does not match the deployment objective (for example, maximizing accuracy on an imbalanced dataset when recall is what actually matters).
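The early-stopping and best-checkpoint practices above can be sketched as follows. The class name, `patience` parameter, and the validation-loss sequence are illustrative assumptions, not from any specific framework:

```python
class EarlyStopper:
    """Stop training once validation loss has not improved for `patience` epochs,
    while remembering which epoch produced the best checkpoint."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum improvement to count as progress
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.counter = 0

    def step(self, epoch, val_loss):
        """Record this epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: this epoch's checkpoint is the one to keep.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience

# Simulated validation losses: improvement stalls after epoch 2.
stopper = EarlyStopper(patience=3)
for epoch, val_loss in enumerate([0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.64]):
    if stopper.step(epoch, val_loss):
        break
print(f"stopped at epoch {epoch}, best checkpoint from epoch {stopper.best_epoch}")
# stopped at epoch 5, best checkpoint from epoch 2
```

In a real training loop, the "new best" branch is where you would save the checkpoint; restoring that checkpoint at the end gives you the model from the best validation epoch rather than the last one.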

Get Started Now

Get Started using Datature’s computer vision platform now for free.