The F1 score is the harmonic mean of precision and recall, providing a single number that balances both metrics. It's calculated as 2 * (precision * recall) / (precision + recall), and ranges from 0 to 1. Unlike a simple average, the harmonic mean penalizes extreme imbalances: if precision is 0.95 but recall is 0.10, the F1 score is 0.18 rather than the 0.53 an arithmetic mean would suggest, correctly reflecting that the model is missing most positive cases.
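A minimal sketch of the calculation, reproducing the 0.95/0.10 example above (the `f1` helper is just an illustration, not a library API):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High precision but very low recall: the harmonic mean stays low.
p, r = 0.95, 0.10
print(round(f1(p, r), 2))     # harmonic mean  -> 0.18
print(round((p + r) / 2, 2))  # arithmetic mean -> 0.53 (misleadingly high)
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot buy a good F1 by excelling at only one metric.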
F1 is most useful when you care equally about false positives and false negatives, and when the class distribution is imbalanced (making accuracy misleading). A defect detection system that catches 95% of defects but also flags 30% of good parts as defective would have a mediocre F1, surfacing the precision problem that accuracy alone might hide.
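The defect-detection scenario can be worked through with concrete counts. The part mix below (100 defective, 1,000 good parts) is an assumption chosen for illustration; the 95% catch rate and 30% false-flag rate come from the text:

```python
# Assumed mix: 100 defective parts, 1,000 good parts.
tp = 95    # defects caught (95% of 100)
fn = 5     # defects missed
fp = 300   # good parts wrongly flagged (30% of 1,000)
tn = 700   # good parts correctly passed

precision = tp / (tp + fp)                  # ~0.24: most flags are false alarms
recall = tp / (tp + fn)                     # 0.95
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```

Despite the impressive 95% recall, F1 lands well below accuracy because the poor precision drags the harmonic mean down.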
Variants handle multi-class scenarios: macro F1 computes F1 per class then averages (treats all classes equally), micro F1 pools true positives, false positives, and false negatives across all classes (biased toward frequent classes), and weighted F1 scales each class's F1 by its support count. In object detection, F1 is computed at specific confidence thresholds, and the threshold that maximizes F1 is often reported alongside precision-recall curves and mAP.
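The three averaging variants can be sketched from scratch on a toy three-class problem (the labels below are made up for illustration; the `2*TP / (2*TP + FP + FN)` form is algebraically identical to `2PR/(P+R)`):

```python
from collections import Counter

def per_class_f1(y_true, y_pred, cls):
    """F1 for one class, treating it as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if 2 * tp + fp + fn == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)  # same as 2PR/(P+R)

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 2, 0]
classes = sorted(set(y_true))
support = Counter(y_true)

f1s = {c: per_class_f1(y_true, y_pred, c) for c in classes}
macro = sum(f1s.values()) / len(classes)
weighted = sum(support[c] * f1s[c] for c in classes) / len(y_true)

# Micro F1 pools TP/FP/FN over all classes; in single-label multi-class
# classification every error is one FP and one FN, so it equals accuracy.
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"macro={macro:.3f} micro={micro:.3f} weighted={weighted:.3f}")
```

Note how the three averages disagree even on this tiny example: macro treats the rare classes as equals, while weighted leans toward the well-supported class 0.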


