Class imbalance happens when some classes in a dataset have far more training examples than others. A pedestrian detection dataset might have 50,000 background patches for every 500 pedestrian patches. A defect detection dataset might be 99% good parts and 1% defective. Left unaddressed, models trained on imbalanced data learn to predict the majority class almost exclusively — achieving high accuracy while missing the cases that actually matter.
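The accuracy trap is easy to demonstrate. Below is a minimal sketch using the 99%/1% defect-detection split from the text (the labels and the degenerate always-majority "model" are illustrative, not from any real dataset):

```python
import numpy as np

# Hypothetical defect-detection labels: 990 good parts (0), 10 defective (1)
y_true = np.array([0] * 990 + [1] * 10)

# A degenerate model that always predicts the majority class
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()         # high: it is right on every good part
recall_defects = y_pred[y_true == 1].mean()  # zero: it catches no defects at all
print(f"accuracy={accuracy:.2f}, defect recall={recall_defects:.2f}")
```

The model scores 99% accuracy while missing every defective part, which is exactly why accuracy alone is the wrong metric on imbalanced data.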
Fixes fall into three categories:

- **Data-level:** oversample the minority class (random duplication or SMOTE-style synthesis), undersample the majority class, or use augmentation to generate more minority examples.
- **Loss-level:** Focal Loss down-weights easy/majority examples so the model focuses on hard/minority cases; weighted cross-entropy assigns higher loss to minority classes; class-balanced losses scale inversely with class frequency.
- **Architecture-level:** two-stage detectors separate region proposal from classification, and hard example mining feeds the model its worst predictions for retraining.
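The loss-level idea can be made concrete with a small numpy sketch of binary focal loss in the standard form, `-alpha_t * (1 - p_t)^gamma * log(p_t)`. The function name and defaults here are illustrative, not a reference implementation:

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss sketch.

    probs:   predicted probability of the positive (minority) class
    targets: 0/1 ground-truth labels
    gamma:   focusing parameter; gamma=0 reduces to alpha-weighted CE
    alpha:   weight on the positive class
    """
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    # p_t: probability the model assigned to the true class
    p_t = np.where(targets == 1, probs, 1 - probs)
    # alpha_t: class weight for the true class
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma shrinks the loss on easy, well-classified examples,
    # so gradient signal concentrates on hard/minority cases
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy positive (p=0.9) contributes far less loss than a hard one (p=0.1)
easy = focal_loss(np.array([0.9]), np.array([1]))[0]
hard = focal_loss(np.array([0.1]), np.array([1]))[0]
print(f"easy={easy:.5f}, hard={hard:.5f}")
```

With `gamma=2` the easy example's loss is suppressed by a factor of `(1 - 0.9)^2 = 0.01`, which is the whole point: the thousands of easy background patches stop dominating the gradient.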
In practice, combining Focal Loss with targeted augmentation on rare classes works well for most detection and segmentation tasks.
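The data-level half of that combination can be sketched for tabular features as resample-with-jitter: duplicate minority rows and perturb them slightly, a crude stand-in for real image augmentation. The helper name, the jitter scheme, and the parameters below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def oversample_with_jitter(X, y, minority_label, target_count, noise_scale=0.01):
    """Grow the minority class to target_count rows by resampling with
    replacement and adding small Gaussian noise (a naive augmentation
    sketch; for images you would use flips, crops, color jitter, etc.)."""
    minority = X[y == minority_label]
    n_extra = target_count - len(minority)
    if n_extra <= 0:
        return X, y  # already at or above the target count
    idx = rng.integers(0, len(minority), size=n_extra)
    synthetic = minority[idx] + rng.normal(0.0, noise_scale, size=(n_extra, X.shape[1]))
    X_aug = np.vstack([X, synthetic])
    y_aug = np.concatenate([y, np.full(n_extra, minority_label)])
    return X_aug, y_aug

# 100 majority rows, 5 minority rows -> balance the minority up to 100
X = np.vstack([np.zeros((100, 3)), np.ones((5, 3))])
y = np.array([0] * 100 + [1] * 5)
X_bal, y_bal = oversample_with_jitter(X, y, minority_label=1, target_count=100)
print((y_bal == 0).sum(), (y_bal == 1).sum())
```

Note that oversampling must happen after the train/validation split; balancing first leaks near-duplicates of minority examples into the validation set and inflates the metrics.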
