Pose Estimation
Pose estimation detects the position and orientation of a person's body (or an object's structure) by locating a set of predefined keypoints in an image or video frame. For human pose estimation, this typically means locating 17 to 33 body keypoints, depending on the convention (the COCO format uses 17; MediaPipe Pose uses 33), such as the nose, eyes, shoulders, elbows, wrists, hips, knees, and ankles, and connecting them to form a skeleton representation.
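As a concrete sketch of what a skeleton representation looks like, the snippet below encodes the 17-keypoint COCO convention: keypoint names, skeleton edges as index pairs (used to draw limbs), and a pose as one (x, y, confidence) triple per keypoint. The `valid_pose` helper is an illustrative addition, not part of any particular library.

```python
# The 17-keypoint layout from the COCO dataset, a common convention
# for human pose estimation outputs.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Skeleton edges as index pairs into COCO_KEYPOINTS, used to connect
# keypoints into limbs when drawing the skeleton.
COCO_SKELETON = [
    (5, 7), (7, 9),      # left arm: shoulder -> elbow -> wrist
    (6, 8), (8, 10),     # right arm
    (11, 13), (13, 15),  # left leg: hip -> knee -> ankle
    (12, 14), (14, 16),  # right leg
    (5, 6), (11, 12),    # shoulder line, hip line
    (5, 11), (6, 12),    # torso sides
]

# A single pose is then just (x, y, confidence) per keypoint.
def visible_keypoints(pose, min_conf=0.3):
    """Count keypoints detected above a confidence threshold.
    (Hypothetical helper for illustration.)"""
    return sum(1 for (_x, _y, c) in pose if c >= min_conf)
```

Models that output 33 landmarks (e.g. MediaPipe Pose) follow the same idea with a larger name list and edge set.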
Two main approaches exist. Top-down methods first detect each person with a bounding box, then run a keypoint estimator on each crop (e.g. HRNet, ViTPose, RTMPose). This tends to be more accurate but slower, because inference cost grows linearly with the number of people. Bottom-up methods detect all keypoints in the image in a single pass, then group them into individual skeletons (e.g. OpenPose, HigherHRNet). This scales better to crowded scenes because the network runs once per image, regardless of how many people it contains.
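The structural difference between the two pipelines can be sketched as follows. Here `detect_people`, `estimate_keypoints_in_crop`, `detect_all_keypoints`, and `group_keypoints` are hypothetical stand-ins for real detector and estimator models, passed in as callables so the control flow is the only thing shown:

```python
def top_down(image, detect_people, estimate_keypoints_in_crop):
    """Top-down: run the keypoint estimator once per detected person.
    Cost grows linearly with the number of people in the image."""
    poses = []
    for box in detect_people(image):
        crop = image  # real code would crop `image` to `box` first
        poses.append(estimate_keypoints_in_crop(crop, box))
    return poses

def bottom_up(image, detect_all_keypoints, group_keypoints):
    """Bottom-up: find every keypoint in one forward pass, then group
    them into skeletons. Cost is roughly constant in person count."""
    keypoints = detect_all_keypoints(image)
    return group_keypoints(keypoints)
```

The grouping step in real bottom-up systems (e.g. part affinity fields in OpenPose, associative embeddings in HigherHRNet) is where most of the algorithmic complexity lives; the sketch hides it behind `group_keypoints`.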
Applications include fitness and sports analytics (tracking form, counting reps, analyzing biomechanics), healthcare and physical therapy (monitoring patient movement and recovery), animation and motion capture (driving 3D character rigs from video), action recognition (classifying activities based on pose sequences), and safety monitoring (detecting falls or unsafe postures in industrial environments). Real-time pose estimation on mobile devices is enabled by lightweight models like MoveNet and MediaPipe Pose.
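To make the fitness use case concrete, a sketch of rep counting from pose output: compute a joint angle from three keypoints (e.g. shoulder, elbow, wrist), then count repetitions with simple hysteresis over the angle sequence. The function names and the 90°/160° thresholds are illustrative assumptions, not values from any specific system:

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by keypoints a-b-c.
    e.g. elbow angle from (shoulder, elbow, wrist) (x, y) positions."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp for float safety
    return math.degrees(math.acos(cos))

def count_reps(elbow_angles, down=90.0, up=160.0):
    """Count push-up-style reps from a per-frame elbow-angle sequence.
    Hysteresis: a rep completes on each flexed -> extended transition,
    so jitter around a single threshold is not double-counted."""
    reps, flexed = 0, False
    for angle in elbow_angles:
        if angle < down:
            flexed = True
        elif flexed and angle > up:
            reps += 1
            flexed = False
    return reps
```

For example, `joint_angle((0, 0), (1, 0), (1, 1))` is 90 degrees, and a sequence that dips below 90° and recovers above 160° twice yields `count_reps(...) == 2`.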

