DETR (Detection Transformer)

DETR (Detection Transformer) is a 2020 architecture from Facebook AI Research (now Meta AI) that reformulated object detection as a direct set prediction problem, replacing the complex post-processing pipelines of traditional detectors with a clean end-to-end design. Instead of generating thousands of proposals and filtering them with Non-Maximum Suppression (NMS), DETR uses a fixed set of learned "object queries" that attend to the image features through a transformer decoder and directly output the final set of detections.
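The query-based decoding described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the real DETR: the class `MiniDETRHead` and all dimensions are invented for the example, and the actual model uses a ResNet backbone, a transformer encoder, positional encodings, and auxiliary losses that are omitted here.

```python
import torch
import torch.nn as nn

class MiniDETRHead(nn.Module):
    """Toy sketch of DETR's query-based decoding (illustrative only)."""
    def __init__(self, d_model=64, num_queries=10, num_classes=5, nhead=4):
        super().__init__()
        # A fixed set of learned object queries -- each one can become a detection
        self.queries = nn.Embedding(num_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)  # (cx, cy, w, h), normalized to [0, 1]

    def forward(self, img_feats):
        # img_feats: (batch, H*W, d_model) -- flattened backbone feature map
        b = img_feats.size(0)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        h = self.decoder(q, img_feats)  # queries cross-attend to image features
        return self.class_head(h), self.box_head(h).sigmoid()

feats = torch.randn(2, 49, 64)        # e.g. a 7x7 feature map, flattened
logits, boxes = MiniDETRHead()(feats)
print(logits.shape, boxes.shape)      # one class score vector and box per query
```

Each query yields exactly one (class, box) prediction, so the model's output is a fixed-size set rather than a variable number of filtered proposals.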

The training process uses Hungarian matching — a bipartite assignment algorithm that pairs each prediction with a ground-truth object (or "no object") to compute the loss. This one-to-one matching eliminates duplicate detections by design, removing the need for NMS. The original DETR was slow to converge (requiring around 500 training epochs) and struggled with small objects. Deformable DETR fixed this by using deformable attention — attending to a small, learned set of sampling points instead of every pixel — cutting the required training epochs by roughly 10x.
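Hungarian matching itself is just an optimal one-to-one assignment over a cost matrix, and can be computed with `scipy.optimize.linear_sum_assignment`. The sketch below is a toy example with made-up numbers: the cost combines a classification term and an L1 box term (the real DETR matcher also adds a generalized IoU term and different weights).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy setup: 4 predictions, 2 ground-truth objects (boxes are cx, cy, w, h)
pred_boxes = np.array([[0.2, 0.2, 0.1, 0.1],
                       [0.8, 0.8, 0.2, 0.2],
                       [0.5, 0.5, 0.3, 0.3],
                       [0.1, 0.9, 0.1, 0.1]])
pred_probs = np.array([[0.9, 0.1],   # per-prediction class probabilities
                       [0.2, 0.8],
                       [0.5, 0.5],
                       [0.6, 0.4]])
gt_boxes = np.array([[0.25, 0.20, 0.1, 0.1],
                     [0.80, 0.75, 0.2, 0.2]])
gt_labels = np.array([0, 1])

# cost[i, j] = cost of assigning prediction i to ground truth j:
# low cost when the predicted class probability is high and the boxes are close
cls_cost = -pred_probs[:, gt_labels]
box_cost = np.abs(pred_boxes[:, None] - gt_boxes[None]).sum(-1)
cost = cls_cost + 5.0 * box_cost     # the 5.0 weight is illustrative

rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
print(list(zip(rows.tolist(), cols.tolist())))  # matched (prediction, gt) pairs
```

Matched predictions are supervised with the paired ground truth; all unmatched predictions are trained to output "no object", which is what suppresses duplicates without NMS.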

The DETR family has since expanded: RT-DETR (Baidu, 2023) achieved real-time inference speeds competitive with YOLO, D-FINE improved detection accuracy through fine-grained distribution refinement, and the architecture influenced SAM's mask decoder design. DETR proved that transformers could handle detection without hand-designed components like anchors or NMS.
