Contrastive Learning

Contrastive learning trains a model to produce similar representations for related inputs and different representations for unrelated inputs, without needing class labels. Take an image, create two augmented versions (random crop, color jitter, flip), and train the model to map both versions close together in embedding space while pushing embeddings of different images apart.
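The two-view idea can be sketched in a few lines. This is an illustrative numpy stand-in for the crop/color-jitter/flip pipelines used in practice (real training pipelines typically use torchvision or albumentations transforms):

```python
import numpy as np

def two_views(image, crop=24, rng=None):
    """Create two augmented views of one image (an H x W x C array)
    using a random crop plus a random horizontal flip — a minimal
    stand-in for a SimCLR-style augmentation pipeline."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    views = []
    for _ in range(2):
        top = rng.integers(0, h - crop + 1)
        left = rng.integers(0, w - crop + 1)
        view = image[top:top + crop, left:left + crop]
        if rng.random() < 0.5:
            view = view[:, ::-1]  # horizontal flip
        views.append(view)
    return views
```

Both views come from the same source image, so the model is trained to treat them as a positive pair.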

SimCLR uses a shared encoder that maps augmented pairs through a projection head, with a contrastive loss (NT-Xent) that maximizes agreement between positive pairs relative to negatives in the batch. MoCo maintains a momentum-updated encoder and a queue of negative embeddings, removing the need for very large batch sizes. BYOL and SimSiam showed that negative pairs aren't strictly necessary — asymmetric architectures with stop-gradients can learn good representations from positive pairs alone. DINO and DINOv2 apply self-distillation with vision transformers and produce features with strong emergent properties, including semantic segmentation without any pixel-level training.
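To make the NT-Xent loss concrete, here is a plain-numpy sketch. Embeddings are L2-normalized, similarities are scaled by a temperature, and each view's positive partner is scored against all other embeddings in the batch (in real training you would use a differentiable framework such as PyTorch; this version just computes the loss value):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) embeddings of two augmented views; row i of z1 and
    row i of z2 form a positive pair. All other 2N - 2 embeddings in
    the batch serve as negatives.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    sim = z @ z.T / temperature                        # scaled cosine sims
    n = z1.shape[0]
    # the positive for index i is i + n (and vice versa)
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # exclude self-similarity from the softmax denominator
    np.fill_diagonal(sim, -np.inf)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()
```

The loss is lowest when each pair of views agrees while staying dissimilar from everything else in the batch, which is exactly the "pull positives together, push negatives apart" objective described above.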

These methods produce general-purpose visual representations that transfer well to downstream tasks — classification, detection, segmentation — with minimal labeled data. They're especially useful in domains where labels are expensive: medical imaging, satellite analysis, and industrial inspection.
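Transfer quality is commonly measured with a linear probe: freeze the pretrained encoder and fit only a linear classifier on its embeddings. A minimal numpy sketch of that evaluation step, assuming the features have already been extracted (in practice you would use scikit-learn's LogisticRegression or a single linear layer):

```python
import numpy as np

def linear_probe(features, labels, num_classes, lr=0.5, steps=200):
    """Fit a softmax linear classifier on frozen embeddings — the
    standard linear-evaluation protocol for self-supervised models.
    Plain gradient descent on the cross-entropy loss."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                    # softmax CE gradient
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(features, labels, W, b):
    """Accuracy of the fitted linear probe."""
    return (np.argmax(features @ W + b, axis=1) == labels).mean()
```

If a handful of labels and a frozen linear layer already separate the classes well, the self-supervised features are doing most of the work.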

Get Started Now

Get Started using Datature’s platform now for free.