Keypoints And Pose Labels

Learn Keypoints And Pose Labels for free with explanations, exercises, and a quick test (for Computer Vision Engineers).

Published: January 5, 2026 | Updated: January 5, 2026

Why this matters

Bounding boxes say where an object is. Keypoints and pose labels say how it is shaped and oriented. In real projects, you will:

  • Track human joints for fitness, AR try-on, or ergonomics analysis.
  • Align faces for recognition or expression analysis.
  • Estimate object pose for robotics grasping or quality control.
  • Drive downstream features like angles, gait metrics, or gesture commands.

What you will learn in this subskill

  • Designing a keypoint schema (names, order, left/right conventions, edges).
  • Coordinates and visibility flags that models understand.
  • How to annotate consistently and review quality.
  • How to transform keypoints under resizing, normalization, and flips.

Concept explained simply

Keypoints are predefined landmarks you place on an object or person (for example, left eye, right knee, bottle cap center). A pose label is the collection of all keypoints for an instance, often with a skeleton (edges between points), and sometimes angles or 3D coordinates.

Quick mental model

Imagine a stick figure. The dots are keypoints (joints). The lines between dots are the skeleton edges that guide training and help visualize pose. Consistency matters: same points, same order, same naming every time.

Key design decisions

  1. Define the point set: Which landmarks matter for your use case? Fewer points are faster to annotate; more points can unlock richer features.
  2. Consistent ordering: Always store points in the exact same order (e.g., [nose, left_eye, right_eye, ...]). Models and metrics rely on it.
  3. Left/right convention: Use the subject's left/right (not the camera's). Decide how to handle mirrored images and write it down.
  4. Visibility flags: Suggestion: 0 = not labeled/unknown, 1 = labeled but not visible (occluded), 2 = visible. Or use boolean visible + present. Pick and document one.
  5. Coordinate system: Pixel coordinates (x, y) with origin at top-left is standard. Optionally store a normalized copy (x/W, y/H) in [0,1].
  6. Skeleton edges: Define adjacency pairs (e.g., left_shoulder–left_elbow–left_wrist). Helps training and visualization.
  7. Augmentation rules: When you flip or rotate images, update points accordingly and swap left/right labels if needed.
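The decisions above can be captured in a small schema module so every record is checked against one source of truth. A minimal sketch in Python, using one common 17-point human layout (the names, flip pairs, and edges here are illustrative choices, not a standard):

```python
# Fixed names in a fixed order: index position is part of the schema.
KEYPOINT_ORDER = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Map each name to its index so records can be validated and reordered.
KEYPOINT_INDEX = {name: i for i, name in enumerate(KEYPOINT_ORDER)}

# Pairs to swap whenever an image is mirrored horizontally.
FLIP_PAIRS = [(n, n.replace("left", "right"))
              for n in KEYPOINT_ORDER if n.startswith("left")]

# Skeleton edges stored as (name, name) adjacency pairs.
EDGES = [
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_shoulder", "right_shoulder"), ("left_hip", "right_hip"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
]

# Sanity checks: every edge endpoint and flip partner must exist.
assert all(a in KEYPOINT_INDEX and b in KEYPOINT_INDEX for a, b in EDGES)
assert all(b in KEYPOINT_INDEX for _, b in FLIP_PAIRS)
```

Keeping the order, flip pairs, and edges in one place means annotation tools, augmentation code, and QA scripts cannot drift apart.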

2D vs 3D notes

  • 2D keypoints: (x, y) in the image. Simple and common.
  • 2.5D: (x, y) plus relative depth per joint.
  • 3D: (X, Y, Z) in a camera or world frame. Requires calibration and more careful definitions.

Worked examples

Example 1: Human pose (2D, 17 points)

Suppose you choose 17 points including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles. You store them in a fixed order and add edges for limbs.

Sample record
{
  "image_id": 42,
  "person_id": 1,
  "keypoints_order": [
    "nose","left_eye","right_eye","left_ear","right_ear",
    "left_shoulder","right_shoulder","left_elbow","right_elbow",
    "left_wrist","right_wrist","left_hip","right_hip",
    "left_knee","right_knee","left_ankle","right_ankle"
  ],
  "keypoints": [
    {"x": 420, "y": 180, "v": 2},
    {"x": 400, "y": 170, "v": 2},
    {"x": 440, "y": 170, "v": 2},
    {"x": 385, "y": 175, "v": 1},
    {"x": 455, "y": 175, "v": 0},
    {"x": 360, "y": 260, "v": 2},
    {"x": 480, "y": 260, "v": 2},
    {"x": 330, "y": 330, "v": 2},
    {"x": 510, "y": 330, "v": 2},
    {"x": 310, "y": 395, "v": 2},
    {"x": 530, "y": 395, "v": 2},
    {"x": 380, "y": 380, "v": 2},
    {"x": 460, "y": 380, "v": 2},
    {"x": 370, "y": 480, "v": 2},
    {"x": 470, "y": 480, "v": 2},
    {"x": 360, "y": 560, "v": 2},
    {"x": 480, "y": 560, "v": 2}
  ],
  "edges": [
    ["left_shoulder","left_elbow"], ["left_elbow","left_wrist"],
    ["right_shoulder","right_elbow"], ["right_elbow","right_wrist"],
    ["left_shoulder","right_shoulder"], ["left_hip","right_hip"],
    ["left_shoulder","left_hip"], ["right_shoulder","right_hip"],
    ["left_hip","left_knee"], ["left_knee","left_ankle"],
    ["right_hip","right_knee"], ["right_knee","right_ankle"]
  ],
  "image_size": {"width": 640, "height": 720}
}

Here v is visibility: 0 = unknown/not labeled, 1 = labeled but occluded, 2 = visible.
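A record in this layout can be checked automatically before it enters a dataset. A hedged sketch, assuming the field names used in the sample above:

```python
def validate_record(record):
    """Check a pose record: order/points match, valid visibility, in-bounds coords.

    Returns a list of error strings; an empty list means the record passed.
    Field names (keypoints_order, keypoints, image_size) follow the sample
    layout and are assumptions, not a fixed standard.
    """
    errors = []
    order = record["keypoints_order"]
    points = record["keypoints"]
    w = record["image_size"]["width"]
    h = record["image_size"]["height"]

    if len(order) != len(points):
        errors.append("keypoints count does not match keypoints_order")
    for name, pt in zip(order, points):
        if pt["v"] not in (0, 1, 2):
            errors.append(f"{name}: visibility {pt['v']} not in {{0,1,2}}")
        # Only labeled points (v > 0) need in-bounds coordinates.
        if pt["v"] > 0 and not (0 <= pt["x"] < w and 0 <= pt["y"] < h):
            errors.append(f"{name}: ({pt['x']}, {pt['y']}) outside {w}x{h}")
    return errors
```

Running this over every record catches count mismatches and out-of-frame points long before training does.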

Example 2: Face landmarks (5 points) with normalization

Define order: left_eye, right_eye, nose_tip, left_mouth, right_mouth.

Pixel and normalized coordinates
{
  "order": ["left_eye","right_eye","nose_tip","left_mouth","right_mouth"],
  "pixels": [
    {"x": 210, "y": 120, "v": 2},
    {"x": 270, "y": 122, "v": 2},
    {"x": 240, "y": 145, "v": 2},
    {"x": 220, "y": 170, "v": 2},
    {"x": 260, "y": 171, "v": 2}
  ],
  "size": {"width": 480, "height": 270},
  "normalized": [
    {"x": 210/480, "y": 120/270},
    {"x": 270/480, "y": 122/270},
    {"x": 240/480, "y": 145/270},
    {"x": 220/480, "y": 170/270},
    {"x": 260/480, "y": 171/270}
  ]
}

Normalization puts coordinates in [0,1], making models robust to different image sizes.
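The conversion is a single division per axis. A minimal sketch, assuming points are stored as dicts with x, y, and v as in the samples above:

```python
def normalize(points, width, height):
    """Convert pixel keypoints to normalized [0, 1] coordinates.

    Visibility flags are carried through unchanged.
    """
    return [{"x": p["x"] / width, "y": p["y"] / height, "v": p["v"]}
            for p in points]


# Example: the left_eye at (210, 120) in a 480x270 image.
eye = normalize([{"x": 210, "y": 120, "v": 2}], 480, 270)[0]
# eye["x"] is 210/480 = 0.4375; eye["y"] is 120/270.
```

To convert back, multiply by width and height; storing the image size next to the normalized copy makes that round trip possible.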

Example 3: Rigid object landmarks with flip handling

Four keypoints for a bottle: top_left, top_right, bottom_right, bottom_left. After a horizontal flip, x becomes 1 - x in normalized coordinates (in pixels, x becomes W - 1 - x, where W is the image width). Also swap left/right labels if your schema encodes sides.

Before and after flip (normalized)
before = [
  {"name":"top_left", "x":0.30, "y":0.10},
  {"name":"top_right","x":0.60, "y":0.10},
  {"name":"bottom_right","x":0.61,"y":0.80},
  {"name":"bottom_left", "x":0.29,"y":0.81}
]
after = [
  {"name":"top_left", "x":0.40, "y":0.10},
  {"name":"top_right","x":0.70, "y":0.10},
  {"name":"bottom_right","x":0.71,"y":0.81},
  {"name":"bottom_left", "x":0.39,"y":0.80}
]

If your naming is semantic (left/right from object perspective), rename points accordingly to keep meaning correct after flip.
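Mirroring coordinates and renaming can be done in one pass. A sketch, assuming normalized coordinates and a flip map that pairs each sided name with its mirror partner (both the data layout and map are illustrative):

```python
def hflip(points, flip_map):
    """Mirror normalized keypoints horizontally and swap sided names.

    `points` maps name -> {"x", "y"}; `flip_map` maps each sided name to
    its mirror partner (names absent from the map keep their name).
    """
    flipped = {}
    for name, p in points.items():
        new_name = flip_map.get(name, name)
        flipped[new_name] = {"x": 1.0 - p["x"], "y": p["y"]}
    return flipped


# Example with the bottle's top corners from above.
before = {"top_left": {"x": 0.30, "y": 0.10},
          "top_right": {"x": 0.60, "y": 0.10}}
flip_map = {"top_left": "top_right", "top_right": "top_left"}
after = hflip(before, flip_map)
# The old top_left (x=0.30) is now top_right at x=0.70.
```

Because the renaming happens alongside the coordinate update, the two can never fall out of sync.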

Quality and QA

  • Write a short guideline with images of correct/incorrect placement and how to handle occlusions.
  • Use a tolerance (e.g., within 3–5 pixels or a small fraction of image size) for review checks.
  • Spot-check inter-annotator agreement on the same images.
  • Log common errors (left/right swaps, wrong order) and update the guideline.
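The review tolerance in the second bullet can be expressed as a small helper. A sketch, using a fraction of the image diagonal as the threshold (the default fraction is an illustrative choice to tune per project):

```python
import math

def within_tolerance(a, b, image_size, frac=0.01):
    """True if two annotations of the same point agree within tolerance.

    `a` and `b` are (x, y) pixel coordinates; the tolerance is `frac`
    times the image diagonal, so it scales with image size.
    """
    w, h = image_size
    tol = frac * math.hypot(w, h)
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= tol
```

Applying this to pairs of annotators on the same images gives a quick, reproducible agreement check.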

Exercises

Do these inside your notes or a JSON editor. A checklist is below to self-verify. Solutions are provided for reference.

Exercise 1: Design a human keypoint schema (17 points)

Define names, fixed order, visibility convention, and skeleton edges. Output a compact JSON example with an image_size and one annotated person.

Exercise 2: Normalize and flip keypoints

Given image size 1280x720 and wrists at (100, 500) left, (1180, 520) right, both visible: 1) produce normalized coordinates; 2) compute coordinates after a horizontal flip using normalized coords (x' = 1 - x); 3) swap left/right labels appropriately.
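One way to check your answer, as a sketch; the values follow directly from the formulas in the examples above:

```python
W, H = 1280, 720
left_wrist, right_wrist = (100, 500), (1180, 520)

# 1) Normalize: divide x by width, y by height.
ln = (left_wrist[0] / W, left_wrist[1] / H)    # (0.078125, ~0.6944)
rn = (right_wrist[0] / W, right_wrist[1] / H)  # (0.921875, ~0.7222)

# 2) Horizontal flip in normalized coords: x' = 1 - x, y unchanged.
ln_f = (1 - ln[0], ln[1])
rn_f = (1 - rn[0], rn[1])

# 3) Swap labels: the mirrored right wrist is now the left wrist.
flipped = {"left_wrist": rn_f, "right_wrist": ln_f}
```

After the flip and swap, the left wrist sits at normalized x = 0.078125 and the right at 0.921875, mirroring the original positions.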

Exercise checklist
  • All keypoints have a well-defined name and a fixed index.
  • Left/right naming uses the subject's perspective.
  • Visibility convention is stated and used consistently.
  • Skeleton edges connect meaningful adjacent joints.
  • Normalized coordinates are within [0,1].
  • Flip logic updates coordinates and labels correctly.

Common mistakes and self-checks

  • Swapping left/right: Self-check by overlaying points on the image; after a horizontal flip, points labeled left should still land on the subject's left side, which happens only if the labels were swapped along with the coordinates.
  • Inconsistent order: Self-check by printing the first three names per record; they must match your schema every time.
  • Mixing units: Pixels vs normalized. Self-check: the largest x should be below the image width (pixels) or at most 1.0 (normalized), never a mix within one record.
  • Guessing occluded points: Mark as occluded or unknown instead of guessing. Self-check: if the point is behind another object, visibility must not be 2.
  • Forgetting the skeleton: Missing edges reduce the training signal. Self-check: the number of edges must equal your defined list.
  • Not updating after augmentation: Always apply the same transforms to keypoints. Self-check: automate a unit test that reprojects points and verifies they stay in bounds.
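The last self-check can be a one-line property: flipping twice must reproduce the original points. A sketch, assuming normalized name -> (x, y) points and a flip map as used earlier (both layouts are illustrative):

```python
def hflip_once(points, flip_map):
    """Mirror normalized points horizontally and swap sided names."""
    return {flip_map.get(n, n): (1.0 - x, y) for n, (x, y) in points.items()}

def check_flip_roundtrip(points, flip_map, eps=1e-9):
    """A cheap unit test: two flips must return every point to its start."""
    twice = hflip_once(hflip_once(points, flip_map), flip_map)
    return all(abs(twice[n][0] - x) <= eps and abs(twice[n][1] - y) <= eps
               for n, (x, y) in points.items())
```

Running this over a few records catches both broken coordinate math and an incomplete flip map.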

Practical projects

  • Mini human pose dataset: Annotate 100 images with 17 joints, define edges, and visualize overlays to catch mistakes.
  • Face alignment: Annotate 5 landmarks on 200 faces and test a simple alignment pipeline using eye positions.
  • Rigid object pose: Choose a box-like object, mark 8 physical corners in images, and compute its 2D keypoint reprojection error after simple homography-based alignment.

Who this is for

  • Computer Vision Engineers defining datasets for pose/landmark tasks.
  • Annotation leads creating clear labeling guidelines.
  • ML practitioners preparing data for keypoint-based models.

Prerequisites

  • Basic image coordinates (x right, y down), and resizing/normalization.
  • Familiarity with bounding boxes and segmentation is helpful.

Learning path

  1. Review boxes and polygons to understand object extents.
  2. Define your keypoint schema and visibility rules.
  3. Practice with 20–50 annotated images; revise guidelines.
  4. Add augmentation rules and test transformations.
  5. Scale up annotation and implement QA checks.

Mini challenge

Pick any daily object (e.g., a mug). Define 6 meaningful keypoints and a simple skeleton. Annotate 10 images, include at least 3 with occlusion. Write one paragraph on how you handled left/right and flips.

Next steps

  • Extend to multi-person scenes and crowd cases.
  • Add 3D or depth where needed, starting with a small pilot.
  • Automate sanity checks: ordering, ranges, left/right swaps.

About the quick test

The quick test is available to everyone. If you are logged in, your progress will be saved automatically.

Practice Exercises

2 exercises to complete

Instructions

Create a compact JSON example for a 640x720 image with one person. Include:

  • keypoints_order: array of 17 names
  • keypoints: list of objects with x, y, v (0/1/2)
  • edges: adjacency pairs forming limbs and torso
  • image_size

Ensure left/right is the subject's perspective and ordering is consistent.

Expected Output
A valid JSON object containing: keypoints_order of 17 items, keypoints of 17 entries with x,y,v, an edges array connecting adjacent joints, and image_size width=640 height=720.

Keypoints And Pose Labels — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

