Optical flow describes the pattern of motion between consecutive video frames, represented as a vector field that assigns each pixel a displacement vector giving the direction and magnitude of its apparent motion. If a car moves 10 pixels to the right between two frames, the optical flow vectors in that region point right with magnitude 10. This motion information is a building block for video understanding tasks.
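A minimal sketch of this representation: a dense flow field for an H x W frame is naturally stored as an (H, W, 2) array. The frame size and the "car" region below are made-up values for illustration.

```python
import numpy as np

# Dense flow for an H x W frame: flow[y, x] = (dx, dy), the
# displacement of the pixel at (x, y) between two frames.
H, W = 120, 160
flow = np.zeros((H, W, 2), dtype=np.float32)

# Hypothetical car occupying rows 40..80, cols 30..90, moving
# 10 pixels to the right: every vector in that region is (10, 0).
flow[40:80, 30:90] = (10.0, 0.0)

# Per-pixel motion magnitude (speed in pixels per frame).
magnitude = np.linalg.norm(flow, axis=-1)
```

Many libraries (OpenCV included) use this same (H, W, 2) layout for dense flow, which makes magnitude and direction maps one vectorized operation away.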
Classical methods include Lucas-Kanade (sparse flow, tracks individual feature points), Horn-Schunck (dense flow, assumes smooth motion across the image), and Farneback (polynomial expansion for dense estimation). Deep learning brought major accuracy improvements: FlowNet and FlowNet2 were early learned approaches, while RAFT (Recurrent All-Pairs Field Transforms) became the dominant architecture by using a correlation volume with iterative GRU-based refinement. FlowFormer added transformer attention, and VideoFlow extended to multi-frame estimation.
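To make the Lucas-Kanade idea concrete, here is a single-point sketch in NumPy under the brightness-constancy assumption: within a small window, solve the least-squares system `Ix*u + Iy*v = -It` for the displacement (u, v). The synthetic Gaussian-blob image pair is invented for the demo; production trackers (e.g. OpenCV's pyramidal Lucas-Kanade) add image pyramids and iterative refinement on top of this core step.

```python
import numpy as np

def lucas_kanade_point(I1, I2, y, x, win=9):
    """Estimate flow (u, v) at one point from the Lucas-Kanade
    least-squares system over a win x win window. A sketch only:
    assumes small motion and brightness constancy."""
    Iy, Ix = np.gradient(I1)   # spatial gradients of frame 1
    It = I2 - I1               # temporal gradient
    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    # Stack window gradients into the linear system A @ (u, v) = b.
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic pair: a Gaussian blob that shifts one pixel to the right.
ys, xs = np.mgrid[0:41, 0:41]
blob = lambda cx: np.exp(-((xs - cx) ** 2 + (ys - 20) ** 2) / (2 * 3.0 ** 2))
u, v = lucas_kanade_point(blob(20.0), blob(21.0), 20, 20)
# u should come out close to 1.0 (rightward), v close to 0.0.
```

The recovered u is approximate rather than exactly 1.0 because the method linearizes the image around the current position, which is also why pyramids are needed for larger motions.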
Optical flow feeds into many downstream tasks: object tracking (associating detections across frames), action recognition (capturing motion patterns), video interpolation (generating intermediate frames), video stabilization, and temporal segmentation. In autonomous driving, flow helps predict where pedestrians and vehicles are heading. In sports analytics, it quantifies player movement. Dense flow is computationally expensive, so real-time applications often use sparse flow or lightweight learned models.
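As one concrete downstream use, video interpolation can be sketched as a backward warp: sample the first frame at positions displaced by a fraction t of the flow to approximate an intermediate frame. This nearest-neighbor version assumes a smooth (here uniform) flow field; real interpolators use bilinear sampling and handle occlusions.

```python
import numpy as np

def warp_midframe(frame, flow, t=0.5):
    """Approximate the frame at fractional time t by backward-warping
    `frame` along t * flow. Nearest-neighbor sampling; a sketch that
    ignores occlusion handling and sub-pixel interpolation."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - t * flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - t * flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

# A single bright pixel moving 10 px right lands 5 px right at t=0.5.
frame1 = np.zeros((20, 20))
frame1[5, 10] = 1.0
flow = np.zeros((20, 20, 2))
flow[..., 0] = 10.0
mid = warp_midframe(frame1, flow, t=0.5)
```

The same warp is the core of flow-based video stabilization as well, with the flow replaced by a smoothed camera-motion field.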