Keypoints And Pose Labels

Learn Keypoints And Pose Labels for free with explanations, exercises, and a quick test (for Computer Vision Engineers).

Published: January 5, 2026 | Updated: January 5, 2026

Why this matters

Bounding boxes say where an object is. Keypoints and pose labels say how it is shaped and oriented. In real projects, you will:

  • Track human joints for fitness, AR try-on, or ergonomics analysis.
  • Align faces for recognition or expression analysis.
  • Estimate object pose for robotics grasping or quality control.
  • Drive downstream features like angles, gait metrics, or gesture commands.

What you will learn in this subskill

  • Designing a keypoint schema (names, order, left/right conventions, edges).
  • Coordinates and visibility flags that models understand.
  • How to annotate consistently and review quality.
  • How to transform keypoints under resizing, normalization, and flips.

Concept explained simply

Keypoints are predefined landmarks you place on an object or person (for example, left eye, right knee, bottle cap center). A pose label is the collection of all keypoints for an instance, often with a skeleton (edges between points), and sometimes angles or 3D coordinates.

Quick mental model

Imagine a stick figure. The dots are keypoints (joints). The lines between dots are the skeleton edges that guide training and help visualize pose. Consistency matters: same points, same order, same naming every time.

Key design decisions

  1. Define the point set: Which landmarks matter for your use case? Fewer points are faster to annotate; more points can unlock richer features.
  2. Consistent ordering: Always store points in the exact same order (e.g., [nose, left_eye, right_eye, ...]). Models and metrics rely on it.
  3. Left/right convention: Use the subject's left/right (not the camera's). Decide how to handle mirrored images and write it down.
  4. Visibility flags: Suggestion: 0 = not labeled/unknown, 1 = labeled but not visible (occluded), 2 = visible. Or use boolean visible + present. Pick and document one.
  5. Coordinate system: Pixel coordinates (x, y) with origin at top-left is standard. Optionally store a normalized copy (x/W, y/H) in [0,1].
  6. Skeleton edges: Define adjacency pairs (e.g., left_shoulder–left_elbow–left_wrist). Helps training and visualization.
  7. Augmentation rules: When you flip or rotate images, update points accordingly and swap left/right labels if needed.
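The decisions above can be captured in a small schema module so every record is checked against one source of truth. A minimal sketch in Python, using one common 17-point human layout (the names, flip pairs, and edges here are illustrative choices, not a standard):

```python
# Fixed names in a fixed order: index position is part of the schema.
KEYPOINT_ORDER = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Map each name to its index so records can be validated and reordered.
KEYPOINT_INDEX = {name: i for i, name in enumerate(KEYPOINT_ORDER)}

# Pairs to swap whenever an image is mirrored horizontally.
FLIP_PAIRS = [(n, n.replace("left", "right"))
              for n in KEYPOINT_ORDER if n.startswith("left")]

# Skeleton edges stored as (name, name) adjacency pairs.
EDGES = [
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_shoulder", "right_shoulder"), ("left_hip", "right_hip"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
]

# Sanity checks: every edge endpoint and flip partner must exist.
assert all(a in KEYPOINT_INDEX and b in KEYPOINT_INDEX for a, b in EDGES)
assert all(b in KEYPOINT_INDEX for _, b in FLIP_PAIRS)
```

Keeping the order, flip pairs, and edges in one place means annotation tools, augmentation code, and QA scripts cannot drift apart.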

2D vs 3D notes

  • 2D keypoints: (x, y) in the image. Simple and common.
  • 2.5D: (x, y) plus relative depth per joint.
  • 3D: (X, Y, Z) in a camera or world frame. Requires calibration and more careful definitions.

Worked examples

Example 1: Human pose (2D, 17 points)

Suppose you choose 17 points including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles. You store them in a fixed order and add edges for limbs.

Sample record
{
  "image_id": 42,
  "person_id": 1,
  "keypoints_order": [
    "nose","left_eye","right_eye","left_ear","right_ear",
    "left_shoulder","right_shoulder","left_elbow","right_elbow",
    "left_wrist","right_wrist","left_hip","right_hip",
    "left_knee","right_knee","left_ankle","right_ankle"
  ],
  "keypoints": [
    {"x": 420, "y": 180, "v": 2},
    {"x": 400, "y": 170, "v": 2},
    {"x": 440, "y": 170, "v": 2},
    {"x": 385, "y": 175, "v": 1},
    {"x": 455, "y": 175, "v": 0},
    {"x": 360, "y": 260, "v": 2},
    {"x": 480, "y": 260, "v": 2},
    {"x": 330, "y": 330, "v": 2},
    {"x": 510, "y": 330, "v": 2},
    {"x": 310, "y": 395, "v": 2},
    {"x": 530, "y": 395, "v": 2},
    {"x": 380, "y": 380, "v": 2},
    {"x": 460, "y": 380, "v": 2},
    {"x": 370, "y": 480, "v": 2},
    {"x": 470, "y": 480, "v": 2},
    {"x": 360, "y": 560, "v": 2},
    {"x": 480, "y": 560, "v": 2}
  ],
  "edges": [
    ["left_shoulder","left_elbow"], ["left_elbow","left_wrist"],
    ["right_shoulder","right_elbow"], ["right_elbow","right_wrist"],
    ["left_shoulder","right_shoulder"], ["left_hip","right_hip"],
    ["left_shoulder","left_hip"], ["right_shoulder","right_hip"],
    ["left_hip","left_knee"], ["left_knee","left_ankle"],
    ["right_hip","right_knee"], ["right_knee","right_ankle"]
  ],
  "image_size": {"width": 640, "height": 720}
}

Here v is visibility: 0 = unknown/not labeled, 1 = labeled but occluded, 2 = visible.
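A record in this layout can be checked automatically before it enters a dataset. A hedged sketch, assuming the field names used in the sample above:

```python
def validate_record(record):
    """Check a pose record: order/points match, valid visibility, in-bounds coords.

    Returns a list of error strings; an empty list means the record passed.
    Field names (keypoints_order, keypoints, image_size) follow the sample
    layout and are assumptions, not a fixed standard.
    """
    errors = []
    order = record["keypoints_order"]
    points = record["keypoints"]
    w = record["image_size"]["width"]
    h = record["image_size"]["height"]

    if len(order) != len(points):
        errors.append("keypoints count does not match keypoints_order")
    for name, pt in zip(order, points):
        if pt["v"] not in (0, 1, 2):
            errors.append(f"{name}: visibility {pt['v']} not in {{0,1,2}}")
        # Only labeled points (v > 0) need in-bounds coordinates.
        if pt["v"] > 0 and not (0 <= pt["x"] < w and 0 <= pt["y"] < h):
            errors.append(f"{name}: ({pt['x']}, {pt['y']}) outside {w}x{h}")
    return errors
```

Running this over every record catches count mismatches and out-of-frame points long before training does.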

Example 2: Face landmarks (5 points) with normalization

Define order: left_eye, right_eye, nose_tip, left_mouth, right_mouth.

Pixel and normalized coordinates
{
  "order": ["left_eye","right_eye","nose_tip","left_mouth","right_mouth"],
  "pixels": [
    {"x": 210, "y": 120, "v": 2},
    {"x": 270, "y": 122, "v": 2},
    {"x": 240, "y": 145, "v": 2},
    {"x": 220, "y": 170, "v": 2},
    {"x": 260, "y": 171, "v": 2}
  ],
  "size": {"width": 480, "height": 270},
  "normalized": [
    {"x": 210/480, "y": 120/270},
    {"x": 270/480, "y": 122/270},
    {"x": 240/480, "y": 145/270},
    {"x": 220/480, "y": 170/270},
    {"x": 260/480, "y": 171/270}
  ]
}

Normalization puts coordinates in [0,1], making models robust to different image sizes.
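The conversion is a single division per axis. A minimal sketch, assuming points are stored as dicts with x, y, and v as in the samples above:

```python
def normalize(points, width, height):
    """Convert pixel keypoints to normalized [0, 1] coordinates.

    Visibility flags are carried through unchanged.
    """
    return [{"x": p["x"] / width, "y": p["y"] / height, "v": p["v"]}
            for p in points]


# Example: the left_eye at (210, 120) in a 480x270 image.
eye = normalize([{"x": 210, "y": 120, "v": 2}], 480, 270)[0]
# eye["x"] is 210/480 = 0.4375; eye["y"] is 120/270.
```

To convert back, multiply by width and height; storing the image size next to the normalized copy makes that round trip possible.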

Example 3: Rigid object landmarks with flip handling

Four keypoints for a bottle: top_left, top_right, bottom_right, bottom_left. After a horizontal flip, x becomes 1 - x in normalized coordinates (in pixels, x becomes W - 1 - x, where W is the image width). Also swap left/right labels if your schema encodes sides.

Before and after flip (normalized)
before = [
  {"name":"top_left", "x":0.30, "y":0.10},
  {"name":"top_right","x":0.60, "y":0.10},
  {"name":"bottom_right","x":0.61,"y":0.80},
  {"name":"bottom_left", "x":0.29,"y":0.81}
]
after = [
  {"name":"top_left", "x":0.40, "y":0.10},
  {"name":"top_right","x":0.70, "y":0.10},
  {"name":"bottom_right","x":0.71,"y":0.81},
  {"name":"bottom_left", "x":0.39,"y":0.80}
]

If your naming is semantic (left/right from object perspective), rename points accordingly to keep meaning correct after flip.
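Mirroring coordinates and renaming can be done in one pass. A sketch, assuming normalized coordinates and a flip map that pairs each sided name with its mirror partner (both the data layout and map are illustrative):

```python
def hflip(points, flip_map):
    """Mirror normalized keypoints horizontally and swap sided names.

    `points` maps name -> {"x", "y"}; `flip_map` maps each sided name to
    its mirror partner (names absent from the map keep their name).
    """
    flipped = {}
    for name, p in points.items():
        new_name = flip_map.get(name, name)
        flipped[new_name] = {"x": 1.0 - p["x"], "y": p["y"]}
    return flipped


# Example with the bottle's top corners from above.
before = {"top_left": {"x": 0.30, "y": 0.10},
          "top_right": {"x": 0.60, "y": 0.10}}
flip_map = {"top_left": "top_right", "top_right": "top_left"}
after = hflip(before, flip_map)
# The old top_left (x=0.30) is now top_right at x=0.70.
```

Because the renaming happens alongside the coordinate update, the two can never fall out of sync.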

Quality and QA

  • Write a short guideline with images of correct/incorrect placement and how to handle occlusions.
  • Use a tolerance (e.g., within 3–5 pixels or a small fraction of image size) for review checks.
  • Spot-check inter-annotator agreement on the same images.
  • Log common errors (left/right swaps, wrong order) and update the guideline.
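The review tolerance in the second bullet can be expressed as a small helper. A sketch, using a fraction of the image diagonal as the threshold (the default fraction is an illustrative choice to tune per project):

```python
import math

def within_tolerance(a, b, image_size, frac=0.01):
    """True if two annotations of the same point agree within tolerance.

    `a` and `b` are (x, y) pixel coordinates; the tolerance is `frac`
    times the image diagonal, so it scales with image size.
    """
    w, h = image_size
    tol = frac * math.hypot(w, h)
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= tol
```

Applying this to pairs of annotators on the same images gives a quick, reproducible agreement check.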

Exercises

Do these inside your notes or a JSON editor. A checklist is below to self-verify. Solutions are provided for reference.

Exercise 1: Design a human keypoint schema (17 points)

Define names, fixed order, visibility convention, and skeleton edges. Output a compact JSON example with an image_size and one annotated person.

Exercise 2: Normalize and flip keypoints

Given image size 1280x720 and wrists at (100, 500) left, (1180, 520) right, both visible: 1) produce normalized coordinates; 2) compute coordinates after a horizontal flip using normalized coords (x' = 1 - x); 3) swap left/right labels appropriately.
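One way to check your answer, as a sketch; the values follow directly from the formulas in the examples above:

```python
W, H = 1280, 720
left_wrist, right_wrist = (100, 500), (1180, 520)

# 1) Normalize: divide x by width, y by height.
ln = (left_wrist[0] / W, left_wrist[1] / H)    # (0.078125, ~0.6944)
rn = (right_wrist[0] / W, right_wrist[1] / H)  # (0.921875, ~0.7222)

# 2) Horizontal flip in normalized coords: x' = 1 - x, y unchanged.
ln_f = (1 - ln[0], ln[1])
rn_f = (1 - rn[0], rn[1])

# 3) Swap labels: the mirrored right wrist is now the left wrist.
flipped = {"left_wrist": rn_f, "right_wrist": ln_f}
```

After the flip and swap, the left wrist sits at normalized x = 0.078125 and the right at 0.921875, mirroring the original positions.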

Exercise checklist
  • All keypoints have a well-defined name and a fixed index.
  • Left/right naming uses the subject's perspective.
  • Visibility convention is stated and used consistently.
  • Skeleton edges connect meaningful adjacent joints.
  • Normalized coordinates are within [0,1].
  • Flip logic updates coordinates and labels correctly.

Common mistakes and self-checks

  • Swapping left/right: Self-check by overlaying points on the image; after a horizontal flip, points labeled left should still land on the subject's left side, which happens only if the labels were swapped along with the coordinates.
  • Inconsistent order: Self-check by printing the first three names per record; they must match your schema every time.
  • Mixing units: Pixels vs normalized. Self-check: the largest x should be below the image width (pixels) or at most 1.0 (normalized), never a mix within one record.
  • Guessing occluded points: Mark as occluded or unknown instead of guessing. Self-check: if the point is behind another object, visibility must not be 2.
  • Forgetting the skeleton: Missing edges reduce the training signal. Self-check: the number of edges must equal your defined list.
  • Not updating after augmentation: Always apply the same transforms to keypoints. Self-check: automate a unit test that reprojects points and verifies they stay in bounds.
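The last self-check can be a one-line property: flipping twice must reproduce the original points. A sketch, assuming normalized name -> (x, y) points and a flip map as used earlier (both layouts are illustrative):

```python
def hflip_once(points, flip_map):
    """Mirror normalized points horizontally and swap sided names."""
    return {flip_map.get(n, n): (1.0 - x, y) for n, (x, y) in points.items()}

def check_flip_roundtrip(points, flip_map, eps=1e-9):
    """A cheap unit test: two flips must return every point to its start."""
    twice = hflip_once(hflip_once(points, flip_map), flip_map)
    return all(abs(twice[n][0] - x) <= eps and abs(twice[n][1] - y) <= eps
               for n, (x, y) in points.items())
```

Running this over a few records catches both broken coordinate math and an incomplete flip map.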

Practical projects

  • Mini human pose dataset: Annotate 100 images with 17 joints, define edges, and visualize overlays to catch mistakes.
  • Face alignment: Annotate 5 landmarks on 200 faces and test a simple alignment pipeline using eye positions.
  • Rigid object pose: Choose a box-like object, mark 8 physical corners in images, and compute its 2D keypoint reprojection error after simple homography-based alignment.

Who this is for

  • Computer Vision Engineers defining datasets for pose/landmark tasks.
  • Annotation leads creating clear labeling guidelines.
  • ML practitioners preparing data for keypoint-based models.

Prerequisites

  • Basic image coordinates (x right, y down), and resizing/normalization.
  • Familiarity with bounding boxes and segmentation is helpful.

Learning path

  1. Review boxes and polygons to understand object extents.
  2. Define your keypoint schema and visibility rules.
  3. Practice with 20–50 annotated images; revise guidelines.
  4. Add augmentation rules and test transformations.
  5. Scale up annotation and implement QA checks.

Mini challenge

Pick any daily object (e.g., a mug). Define 6 meaningful keypoints and a simple skeleton. Annotate 10 images, include at least 3 with occlusion. Write one paragraph on how you handled left/right and flips.

Next steps

  • Extend to multi-person scenes and crowd cases.
  • Add 3D or depth where needed, starting with a small pilot.
  • Automate sanity checks: ordering, ranges, left/right swaps.

About the quick test

The quick test is available to everyone. If you are logged in, your progress will be saved automatically.

Practice Exercises

2 exercises to complete

Instructions

Create a compact JSON example for a 640x720 image with one person. Include:

  • keypoints_order: array of 17 names
  • keypoints: list of objects with x, y, v (0/1/2)
  • edges: adjacency pairs forming limbs and torso
  • image_size

Ensure left/right is the subject's perspective and ordering is consistent.

Expected Output
A valid JSON object containing: keypoints_order of 17 items, keypoints of 17 entries with x,y,v, an edges array connecting adjacent joints, and image_size width=640 height=720.

Keypoints And Pose Labels — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

