Keypoint Detection

Keypoint detection locates specific points of interest on objects or bodies within an image. For human pose estimation, keypoints mark joint locations like shoulders, elbows, wrists, hips, knees, and ankles. For faces, they mark eyes, nose tip, mouth corners, and jawline. For objects, they can mark functional parts like door handles, wheel centers, or component attachment points.

Architectures for keypoint detection typically produce heatmaps, one per keypoint type, where bright spots indicate the predicted location of each point. Top-down methods first detect objects with bounding boxes, then estimate keypoints within each box (HRNet, ViTPose). Bottom-up methods detect all keypoints in the image at once, then group them into individual instances (OpenPose, HigherHRNet). Top-down approaches are generally more accurate but slower because they run the keypoint estimator once per detected object.
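The heatmap decoding step described above can be sketched in a few lines: take the argmax of each keypoint channel and keep its peak value as a confidence score. This is a minimal illustration using NumPy with synthetic Gaussian heatmaps; the function name, threshold value, and tensor layout `(K, H, W)` are assumptions for the example, not any particular library's API.

```python
import numpy as np

def decode_heatmaps(heatmaps, threshold=0.1):
    """Decode (K, H, W) heatmaps into one (x, y, score) per keypoint via argmax."""
    keypoints = []
    for hm in heatmaps:
        # Flat argmax, then convert back to 2D (row, col) coordinates
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        score = float(hm[y, x])
        # Drop keypoints whose peak falls below the confidence threshold
        keypoints.append((int(x), int(y), score) if score >= threshold else None)
    return keypoints

# Synthetic example: one Gaussian peak per keypoint channel
H, W = 64, 48
ys, xs = np.mgrid[0:H, 0:W]
centers = [(20, 30), (10, 40), (45, 5)]  # assumed (x, y) peak locations
heatmaps = np.stack([
    np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * 2.0 ** 2))
    for cx, cy in centers
])
print(decode_heatmaps(heatmaps))
```

Real systems refine this with sub-pixel offsets (e.g. a quarter-pixel shift toward the second-highest neighbor) because the argmax alone is quantized to the heatmap grid, which is typically a fraction of the input resolution.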

Applications include human pose estimation for sports analytics and fitness tracking, hand keypoint detection for gesture recognition and sign language, facial landmark detection for face alignment and expression analysis, animal pose estimation for wildlife monitoring and veterinary science, and industrial keypoint detection for measuring component positions in manufacturing assembly. MediaPipe provides lightweight real-time keypoint models for hands, faces, and full bodies on mobile devices.

Get Started Now

Get started with Datature's computer vision platform now for free.