Introducing Class Metrics and Low Confidence Sampling for Deeper Model Evaluation Insights

This article introduces the concepts of evaluation class metrics and low-confidence sampling, and shows how they can enable deeper model evaluation insights that improve your computer vision model's performance, using a helmet detection model as an example.

Marcus Neo
Editor

To build a high-quality computer vision model, its performance must be consistently accurate across all classes and data types it has been trained on, even when dealing with ambiguous features or underrepresented categories. Inconsistent results can undermine the model's reliability, making it unsuitable for real-world applications. A granular analysis is essential to pinpoint and resolve these weaknesses, ensuring the model performs optimally in all scenarios. In this article, we'll introduce the concepts of evaluation class metrics and low-confidence sampling and show how they can enable deeper model evaluation insights that improve your computer vision model's performance.

What are Evaluation Class Metrics?

In computer vision, evaluation class metrics assess how well a model performs at classifying objects within images. Key metrics include precision, recall, F1-score, and accuracy, which measure the model's ability to correctly identify objects, minimize false positives/negatives, and provide an overall performance evaluation across different object classes. 

Consider the COCO evaluation framework. COCO, or Common Objects in Context, offers a comprehensive set of metrics tailored for object detection models. Among these are precision and recall, calculated at various Intersection over Union (IoU) thresholds, which help gauge model performance in real-world scenarios.

Precision and Recall Formula for COCO Evaluation Framework
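
For reference, these curves build on the standard precision and recall definitions, where TP, FP, and FN denote true positives, false positives, and false negatives counted at a given IoU threshold:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
```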

Class metrics refine this by breaking down overall metrics for each class in the dataset. Analyzing performance at this level reveals specific areas of underperformance, allowing us to make targeted adjustments in future training and improve the model's effectiveness on weaker classes. 
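
To make this concrete, here is a minimal sketch of how per-class precision, recall, and F1 could be computed once detections have been matched to ground truths at an IoU threshold. The counts shown are hypothetical and for illustration only; they are not taken from the helmet model.

```python
def per_class_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from
    true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return precision, recall, f1

# Hypothetical per-class counts at IoU >= 0.50, for illustration only.
counts = {
    "Helmet On":  {"tp": 420, "fp": 35, "fn": 28},
    "Helmet Off": {"tp": 95, "fp": 40, "fn": 55},
}

for cls, c in counts.items():
    p, r, f1 = per_class_metrics(c["tp"], c["fp"], c["fn"])
    print(f"{cls:<11}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")
```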

What is Low-Confidence Sampling?

Low-confidence sampling is a technique used in training computer vision models to identify instances where the model exhibits uncertainty in its predictions. These low-confidence samples often stem from ambiguous data or challenging features within the dataset.

By pinpointing these instances, we can update the dataset to better represent such scenarios in future training iterations. This approach increases the model's robustness and accuracy on complex data. Low-confidence sampling involves monitoring the model’s prediction scores for each image, focusing on those with confidence scores below a specified threshold.
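
As a rough illustration of this idea (not the Nexus implementation), the sketch below assumes you already have per-image detection confidence scores and simply ranks images by their average score, keeping those below a chosen threshold:

```python
def low_confidence_samples(predictions, threshold=0.5, top_k=12):
    """Rank images by average detection confidence and return the least
    confident ones below `threshold`.

    `predictions` maps an image id to the confidence scores of its
    detections; this structure is an assumption for illustration,
    not the Nexus API.
    """
    avg_scores = {
        image_id: sum(scores) / len(scores)
        for image_id, scores in predictions.items()
        if scores  # skip images with no detections
    }
    uncertain = [(img, score) for img, score in avg_scores.items() if score < threshold]
    uncertain.sort(key=lambda item: item[1])  # lowest average confidence first
    return uncertain[:top_k]

# Example usage with made-up scores:
preds = {
    "img_001.jpg": [0.92, 0.88],
    "img_002.jpg": [0.41, 0.37, 0.55],
    "img_003.jpg": [0.23],
}
print(low_confidence_samples(preds, threshold=0.5))
```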

Why Should You Use Class Metrics and Low-Confidence Sampling as Evaluation Methods?

Using Class Metrics and Low-Confidence Sampling as evaluation methods in model training offers several advantages:

  • Class metrics offer a detailed breakdown of the model's performance across different classes, highlighting specific areas of underperformance. By focusing on these insights, you can address imbalances or weaknesses that overall metrics may obscure. This ensures consistent performance across all classes, which is crucial in real-world applications where success often hinges on effectively handling minority or challenging classes.
  • Low-Confidence Sampling complements Class Metrics by uncovering potential weaknesses in the model through the cases where it struggles the most. These low-confidence predictions often represent ambiguous or complex scenarios that standard evaluation methods may miss. By focusing on such instances, you can fine-tune the model to handle critical edge cases, improving its robustness and real-world performance.

Example: Helmet Detection Model

Using Datature Nexus’ platform to visualize this example, we will look at a computer vision model that has been trained to identify whether an individual has their helmet on or off. The image below showcases the evaluation graphs for some of the most important metrics of the model’s performance. By default we see the overall performance of the model, but by selecting the filter menu we can view the performance of each class. In this case, the classes are “Helmet On” and “Helmet Off”.

Evaluation Metrics with Class-Specific Line Charts

Taking a closer look, we see that the mAP for the "Helmet On" class is significantly higher than that for the "Helmet Off" class. As shown below, this disparity is attributed to a greater number of annotations for "Helmet On," highlighting a class imbalance that negatively impacts the model's performance on the less-represented "Helmet Off" class.

Precision/mAP@50IOU Metric with Class-Specific Line Charts
Tag Distribution Table for the Dataset
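
Outside the platform, a quick way to check for this kind of imbalance is to count annotations per class in your dataset. The sketch below assumes COCO-format annotations and uses a hypothetical file path:

```python
import json
from collections import Counter

# Path and COCO-format structure are assumptions for illustration.
with open("annotations/train.json") as f:
    coco = json.load(f)

id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
tag_counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])

for name, count in tag_counts.most_common():
    print(f"{name:<11} {count}")
```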

In addition, we can examine the low-confidence samples. In Datature’s Nexus Platform, the recommendations tab highlights up to 12 assets in the evaluation dataset with the lowest average prediction scores, allowing you to visualize areas where the model shows uncertainty. You can use the confidence threshold slider to filter out high-confidence predictions, helping you focus on the objects that challenge the model the most.

Confidence Threshold Slider in the Nexus Platform's Recommendations Tab

Finally, we can examine the confusion matrix, where we see that the model frequently misclassifies the “Helmet Off” class as “Helmet On.”

Confusion Matrix Visualization in the Nexus Platform
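
For intuition, a detection confusion matrix can be built from matched ground-truth/prediction class pairs, with a "Background" entry covering missed detections and spurious predictions. The sketch below uses hypothetical pairs and is not the Nexus implementation:

```python
from collections import Counter

CLASSES = ["Helmet On", "Helmet Off", "Background"]

def detection_confusion_matrix(matched_pairs):
    """Build a confusion matrix from (ground_truth, prediction) class pairs.

    Pairs are assumed to come from an IoU-based matching step: unmatched
    ground truths pair with "Background" (missed detections) and unmatched
    predictions have "Background" as their ground truth (false positives).
    """
    counts = Counter(matched_pairs)
    return [[counts[(gt, pred)] for pred in CLASSES] for gt in CLASSES]

# Hypothetical pairs for illustration only:
pairs = [
    ("Helmet On", "Helmet On"),
    ("Helmet Off", "Helmet On"),   # the misclassification highlighted above
    ("Helmet Off", "Helmet Off"),
    ("Helmet On", "Background"),   # a missed detection
]
for gt_label, row in zip(CLASSES, detection_confusion_matrix(pairs)):
    print(f"{gt_label:<11} {row}")
```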

In summary, we’ve revealed some important insights:

  • The Mean Average Precision at 0.50 IoU (mAP) indicates that the model’s performance improves consistently throughout the training run.
    • There is potential for further enhancement, as the mAP continues to rise beyond step 2000.
  • The "Helmet On" class shows significantly higher mAP than the "Helmet Off" class due to a greater number of annotations, indicating a class imbalance that adversely affects model performance on the underrepresented "Helmet Off" class.
    • This leads to frequent misclassifications of "Helmet Off" as "Helmet On."

What Should You Do Next After Learning More About The Model’s Performance?

After analysing the evaluation class metrics and low-confidence samples, the next step is to strategically incorporate these insights into the model's training process.

  • Address Class Imbalance:
    • Identify underrepresented classes 
    • Gather additional data for these classes in the next model iteration.
  • Enhance Data Diversity:
    • Apply data augmentation techniques if the low-confidence samples show the model struggling with blurred images.
    • Use methods such as simulated motion blur or Generative AI to create more blurred images for training (see the sketch after this list).
  • Ensure Label Accuracy:
    • Review samples for inconsistencies, such as mislabeled images or ambiguous annotations.
    • Correct any identified errors using the Annotation suite before reintroducing samples into training.
  • Investigate Low-Confidence Predictions:
    • Assess if low-confidence predictions dominate the validation set, indicating potential data quality issues.
    • Consider reviewing data collection methods and experimenting with different model architectures or hyperparameter tuning to improve visibility of objects of interest.
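
As a concrete illustration of the motion-blur augmentation mentioned above, here is a minimal sketch using OpenCV; the kernel size and file paths are arbitrary, illustrative choices:

```python
import cv2
import numpy as np

def simulate_motion_blur(image, kernel_size=15):
    """Apply a horizontal motion-blur kernel to an image.

    kernel_size controls the blur strength and is an arbitrary,
    illustrative choice.
    """
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size  # horizontal streak
    return cv2.filter2D(image, -1, kernel)

# Example usage (file paths are hypothetical):
image = cv2.imread("helmet_sample.jpg")
blurred = simulate_motion_blur(image, kernel_size=21)
cv2.imwrite("helmet_sample_blurred.jpg", blurred)
```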

Want to Build Your Own Computer Vision Model?

Datature enables you to create and deploy custom models without any coding required. Our user-friendly platform streamlines the process, allowing you to focus on innovation and results. Click here to learn more!

If you have questions, feel free to join our Community Slack to post your questions or contact us if you wish to learn more about the Class Metrics and Low Confidence Sampling features on Datature Nexus. For more detailed information about the Class Metrics and Low Confidence Sampling functionalities, customisation options, or answers to any common questions you might have, read more on our Developer Portal.
