Super-resolution is the process of generating a higher-resolution image from a lower-resolution input. The goal is to recover fine details, sharp edges, and textures that were lost during downsampling or never captured due to sensor limitations. Deep learning models, particularly convolutional and generative networks, have largely replaced classical interpolation methods (bilinear, bicubic) because they can hallucinate plausible high-frequency detail that simple upscaling cannot.
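To make the limitation of "simple upscaling" concrete, here is a minimal sketch of nearest-neighbor interpolation on a tiny grayscale image represented as a list of rows. The function name and toy values are illustrative, not from any specific library:

```python
# Minimal sketch: classic interpolation can only redistribute existing pixels,
# never invent new detail. Nearest-neighbor 2x upscaling on a tiny grayscale
# image represented as a list of rows.

def upscale_nearest(image, factor=2):
    """Upscale by repeating each pixel `factor` times in both dimensions."""
    out = []
    for row in image:
        # Repeat each pixel horizontally ...
        stretched = [px for px in row for _ in range(factor)]
        # ... then repeat the stretched row vertically.
        out.extend([list(stretched) for _ in range(factor)])
    return out

lowres = [
    [10, 200],
    [200, 10],
]
highres = upscale_nearest(lowres)
# Every output pixel is a copy of an input pixel: no new high-frequency
# content appears, which is exactly the gap learned models try to fill.
```

Bilinear and bicubic kernels blend neighbors instead of copying them, but the same constraint holds: the output is a weighted average of input pixels, so edges blur rather than sharpen.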
The field broadly splits into single-image super-resolution (SISR) and multi-frame approaches. SISR models like ESRGAN and SwinIR take one low-res image and predict a 2x or 4x upscaled version. They train on paired datasets where high-res images are artificially downsampled to create low-res inputs. Multi-frame methods instead combine information from several slightly shifted frames; the sub-pixel shifts between frames carry genuine additional information, which is why this approach is common in video and satellite imaging.
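The paired-dataset construction can be sketched in a few lines. This is an assumption-laden simplification: real pipelines typically use bicubic downsampling plus blur and noise models, but a 2x2 box filter shows the idea of deriving the low-res input from the high-res target:

```python
# Sketch of paired-data creation for SISR training. Assumption: a simple
# 2x2 box filter stands in for the degradation pipeline a real dataset uses.
# The (lowres, highres) pair becomes one training sample: the model sees
# `lowres` as input and is penalized against `highres` as the target.

def downsample_2x(highres):
    """Average non-overlapping 2x2 blocks to produce the low-res input."""
    out = []
    for r in range(0, len(highres) - 1, 2):
        row = []
        for c in range(0, len(highres[0]) - 1, 2):
            block_sum = (highres[r][c] + highres[r][c + 1] +
                         highres[r + 1][c] + highres[r + 1][c + 1])
            row.append(block_sum / 4)
        out.append(row)
    return out

hr = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
lr = downsample_2x(hr)  # -> [[0.0, 100.0], [50.0, 200.0]]
pair = (lr, hr)         # one (input, target) training sample
```

Because the degradation is applied by the dataset builder, every low-res input has a pixel-exact high-res ground truth, which is what makes supervised training straightforward; the well-known downside is that models can overfit to the synthetic degradation and generalize poorly to real camera blur.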
Practical applications span medical imaging, satellite analysis, security footage enhancement, and preprocessing for downstream vision tasks. Because learned models produce plausible rather than guaranteed-real detail, enhanced output should be treated with caution in forensic and diagnostic settings. Upscaling low-resolution training images can improve detection and segmentation accuracy when high-res capture is not feasible. The main trade-off is computational cost versus quality: lightweight models run in real time on edge devices, while larger diffusion-based approaches produce sharper results but need GPU resources.
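To make the quality side of that trade-off measurable, a standard fidelity metric in super-resolution benchmarks is peak signal-to-noise ratio (PSNR). The metric choice here is our illustration; the text above does not name one, and perceptual metrics like SSIM or LPIPS are often preferred for GAN and diffusion outputs:

```python
import math

def psnr(reference, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio between two same-sized grayscale images.

    PSNR = 10 * log10(MAX^2 / MSE); higher is better, and identical
    images give infinity.
    """
    flat_ref = [px for row in reference for px in row]
    flat_rec = [px for row in reconstructed for px in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_rec)) / len(flat_ref)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

ref = [[0, 255], [255, 0]]
noisy = [[10, 245], [245, 10]]  # each pixel off by 10 -> MSE of 100
score = psnr(ref, noisy)
```

A caveat worth keeping in mind: PSNR rewards pixel-wise fidelity, so a slightly blurry reconstruction can outscore a sharper GAN output that hallucinated texture, which is one reason benchmark tables often report both PSNR and a perceptual score.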
