Data Collection And Annotation

Learn Data Collection And Annotation for Computer Vision Engineers for free: roadmap, examples, subskills, and a skill exam.

Published: January 5, 2026 | Updated: January 5, 2026

Why this skill matters for Computer Vision Engineers

High-performing computer vision models start with the right data. Data collection and annotation determine what your model can learn, how robust it is in the real world, and how quickly you can iterate. A clear label schema, reliable guidelines, and disciplined dataset versioning keep experiments reproducible and models trustworthy.

  • Ship reliable models faster by reducing label noise and rework.
  • Unlock advanced tasks (detection, segmentation, pose) with appropriate labels.
  • Maintain repeatable experiments via clean splits and dataset lineage.

What you'll learn

  • Define a task and label schema that matches product goals.
  • Write annotation guidelines and quality checks that reduce ambiguity.
  • Create bounding boxes, polygons, masks, and keypoints correctly.
  • Handle class imbalance and rare cases intentionally.
  • Make sound train/val/test splits and track dataset versions.

Who this is for

  • Junior to mid-level Computer Vision Engineers building datasets or reviewing vendor labels.
  • Data scientists transitioning to vision tasks.
  • Technical product owners who need to scope annotation projects.

Prerequisites

  • Comfort with Python and NumPy.
  • Basic understanding of image formats and arrays.
  • Familiarity with a CV task (classification, detection, segmentation, or pose).

Learning path

  1. Define task and label schema – What problem, what objects, which attributes, and how to measure success.
  2. Write guidelines and QA plan – Unambiguous instructions, examples, edge cases, auditing strategy.
  3. Choose annotation shapes – Boxes vs polygons vs masks vs keypoints; pick what you need for the model and metrics.
  4. Plan data coverage – Collect diverse scenes, handle class imbalance, and cover rare cases deliberately.
  5. Split data right – Train/val/test by scene, time, or location to avoid leakage.
  6. Version and track lineage – Tie every model run to a dataset version and changelog.
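
A lightweight way to start step 6 is a manifest file written next to the data. The sketch below is one minimal approach, not a standard format: the folder layout (images under root/images) and the manifest fields are assumptions to adapt to your own tooling (DVC, lakeFS, or plain git tags).

import json
import hashlib
from pathlib import Path

def write_manifest(root, version, class_counts, changelog, out="manifest.json"):
    """Write a small dataset manifest: version tag, counts, and a cheap fingerprint."""
    image_paths = sorted(Path(root, "images").glob("*.jpg"))
    digest = hashlib.sha256()
    for p in image_paths:
        digest.update(p.name.encode())                 # fingerprint file names...
        digest.update(str(p.stat().st_size).encode())  # ...and sizes for a cheap lineage check
    manifest = {
        "version": version,                            # e.g. "v0.1"
        "num_images": len(image_paths),
        "class_counts": class_counts,                  # e.g. {"stop_sign": 412}
        "content_fingerprint": digest.hexdigest(),
        "changelog": changelog,                        # list of human-readable entries
    }
    Path(root, out).write_text(json.dumps(manifest, indent=2))
    return manifest

Record the manifest's version string in every training run's config so each result can be traced back to the exact dataset it was trained on.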

Milestone checklist

  • Clear label schema approved by stakeholders.
  • Guidelines document with 10+ visual examples and edge cases.
  • QA rubric and acceptance thresholds (e.g., IoU ≥ 0.5 for box matches; see the IoU sketch after this checklist).
  • Annotated pilot set (100–300 images) reviewed and revised.
  • Documented split strategy and rationale.
  • Dataset version tag and changelog.
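
Applying a threshold like IoU ≥ 0.5 during QA requires a box IoU function. A minimal sketch for xywh boxes (the format used in the worked examples below):

def box_iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) at the top-left."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)  # intersection rectangle
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

print(box_iou_xywh((10, 10, 40, 40), (30, 30, 40, 40)))  # 0.142857...

An annotator's box counts as a match when its IoU with the gold box clears the threshold; the match rate across a review sample estimates annotator precision and recall.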

Worked examples (with code)

Example 1 — Minimal label schema for traffic-sign detection
{
  "task": "object_detection",
  "classes": [
    {"id": 1, "name": "stop_sign"},
    {"id": 2, "name": "yield_sign"},
    {"id": 3, "name": "speed_limit"}
  ],
  "attributes": {
    "occluded": {"type": "boolean"},
    "truncated": {"type": "boolean"}
  },
  "annotation_shape": "bbox",
  "bbox_format": "xywh",  // x,y top-left; width,height
  "metrics": {"primary": "mAP@0.5"}
}

Tip: Keep class names stable and unambiguous, and document the box format (here xywh: x,y is the top-left corner, w and h the width and height). Avoid synonyms that annotators could mix up.

Example 2 — Convert Pascal VOC box (xmin,ymin,xmax,ymax) to COCO (x,y,w,h)
def voc_to_coco(xmin, ymin, xmax, ymax):
    x = xmin
    y = ymin
    w = max(0, xmax - xmin)
    h = max(0, ymax - ymin)
    return x, y, w, h

print(voc_to_coco(12, 20, 60, 100))  # (12, 20, 48, 80)

Always validate boxes: clamp coordinates to the image bounds, and guard against swapped corners that would produce negative widths/heights.
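
A small helper makes that check explicit. This is a minimal sketch, assuming VOC-style corner coordinates as in the function above:

def clamp_voc_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Clip box corners to the image and re-order swapped coordinates."""
    xmin, xmax = sorted((max(0, min(xmin, img_w)), max(0, min(xmax, img_w))))
    ymin, ymax = sorted((max(0, min(ymin, img_h)), max(0, min(ymax, img_h))))
    return xmin, ymin, xmax, ymax

print(clamp_voc_box(-5, 20, 320, 100, img_w=300, img_h=200))  # (0, 20, 300, 100)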

Example 3 — Rasterize a polygon into a binary mask
from PIL import Image, ImageDraw
import numpy as np

def polygon_to_mask(height, width, polygon_xy):
    # polygon_xy: [x1,y1, x2,y2, ..., xn,yn]
    img = Image.new("L", (width, height), 0)
    ImageDraw.Draw(img).polygon(polygon_xy, outline=1, fill=1)
    return np.array(img, dtype=np.uint8)

h, w = 200, 300
poly = [10,10, 150,20, 140,100, 20,120]
mask = polygon_to_mask(h, w, poly)
print(mask.shape, mask.max())  # (200, 300) 1

Use uint8 masks (0/1). For multiple instances, keep separate masks or use distinct ids.
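
Mask-versus-mask comparison (for example, checking two annotators' polygons against each other) is a common QA step; a minimal IoU sketch reusing polygon_to_mask from above:

def mask_iou(mask_a, mask_b):
    """IoU of two binary masks of the same shape (values 0/1)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union > 0 else 0.0

other = polygon_to_mask(h, w, [15,15, 150,25, 135,95, 25,115])
print(round(mask_iou(mask, other), 3))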

Example 4 — Keypoints with visibility flags
# Each keypoint: (x, y, v) where v ∈ {0: not labeled, 1: labeled but invisible, 2: visible}
person = {
  "keypoints": [
    (120, 180, 2), # nose
    (110, 170, 2), # left_eye
    (130, 170, 1), # right_eye occluded
  ],
  "skeleton": [(0,1),(1,2)] # indices into keypoints list
}

visible = [kp for kp in person["keypoints"] if kp[2] == 2]
print(len(visible))  # 2 (nose and left_eye)

Visibility flags help evaluation ignore occluded joints that cannot be annotated reliably.
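
A sketch of how those flags feed into evaluation: the PCK-style check below scores only gold keypoints with v == 2 and skips the rest. The fixed pixel threshold and the predicted pose are made up for illustration; real PCK normalizes the threshold by a reference length such as head or torso size.

import math

def pck(pred, gold, threshold_px=10.0):
    """Fraction of visible gold keypoints predicted within threshold_px pixels."""
    scored, correct = 0, 0
    for (px, py, _), (gx, gy, gv) in zip(pred, gold):
        if gv != 2:
            continue  # unlabeled or occluded gold point: ignore
        scored += 1
        if math.hypot(px - gx, py - gy) <= threshold_px:
            correct += 1
    return correct / scored if scored else 0.0

pred = [(122, 182, 2), (112, 168, 2), (128, 171, 2)]
print(pck(pred, person["keypoints"]))  # scores the nose and left_eye only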

Example 5 — Simple stratified split by image-level class presence
import random
from collections import defaultdict

random.seed(7)

# image_id -> set of classes present
img_classes = {
  "img_001": {1,2},
  "img_002": {2},
  "img_003": {1,3},
  "img_004": {3},
  "img_005": {1},
  "img_006": {2,3},
}

# Target ratios
ratios = {"train":0.7, "val":0.15, "test":0.15}

imgs = list(img_classes.keys())
random.shuffle(imgs)

splits = {"train":[], "val":[], "test":[]}
class_counts = {k: defaultdict(int) for k in splits}

for img in imgs:
    # Greedy assign to split that least increases imbalance
    best_split = None
    best_score = 1e9
    for s in splits:
        score = 0
        for c in img_classes[img]:
            score += class_counts[s][c]
        # penalty for exceeding target size
        size_penalty = max(0, len(splits[s]) + 1 - ratios[s]*len(imgs))
        if score + size_penalty < best_score:
            best_score = score + size_penalty
            best_split = s
    splits[best_split].append(img)
    for c in img_classes[img]:
        class_counts[best_split][c] += 1

print(splits)

Greedy heuristics can approximate stratification when true stratification is hard.
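
Whatever heuristic you use, verify the outcome. A quick check that extends the snippet above and prints per-split sizes and class counts:

for s in splits:
    counts = dict(sorted(class_counts[s].items()))
    print(f"{s}: {len(splits[s])} images, class counts {counts}")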

Example 6 — Sampling weights to fight class imbalance
from collections import Counter

# One label per instance (e.g., detection instances aggregated)
labels = ["cat","cat","dog","dog","dog","bird"]
counts = Counter(labels)

# Inverse frequency weights
weights = {cls: 1.0/c for cls, c in counts.items()}

# Image-level weight as max instance weight contained
img_to_instances = {
  "img1": ["cat"],
  "img2": ["dog","dog"],
  "img3": ["bird"],
}
img_weight = {img: max(weights[l] for l in lbls) for img, lbls in img_to_instances.items()}
print(weights, img_weight)

Combine sampling with targeted data collection for rare classes to avoid overfitting to resampled noise.
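
One way to use those weights is weighted sampling when assembling an oversampled epoch; a minimal sketch with random.choices (in PyTorch the analogous tool is WeightedRandomSampler):

import random

random.seed(7)
img_ids = list(img_weight)
epoch = random.choices(img_ids, weights=[img_weight[i] for i in img_ids], k=12)
print(Counter(epoch))  # the rare bird image is drawn most often in expectation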

Drills and exercises

  • [ ] Write a one-page label schema for a 3-class detection task, including attributes and metrics.
  • [ ] Create 10 annotated images with both correct and intentionally wrong boxes; have a peer run QA using your rubric.
  • [ ] Convert five VOC boxes to COCO format and back; verify no information is lost.
  • [ ] Rasterize two polygons into masks and compute their IoU.
  • [ ] Build a small stratified split (train/val/test) for 60 images and document leakage checks.
  • [ ] Tag a dataset version (v0.1) with a changelog describing class changes and new images.

Common mistakes and debugging tips

  • Vague classes: Overlapping class definitions cause annotator disagreement. Fix by adding positive/negative examples and decision rules.
  • Inconsistent box formats: Mixing xywh and xyxy corrupts training. Store format in metadata and validate on load.
  • Mask off-by-one: Incorrect polygon winding or coordinate rounding creates holes. Visualize overlays and verify mask sums.
  • Keypoint visibility misuse: Marking occluded points as visible hurts evaluation. Use visibility=1 for labeled-but-invisible.
  • Leakage in splits: Frames from the same video in train and test inflate metrics. Split by scene/source, not by image only.
  • Untracked dataset edits: Silent relabels break reproducibility. Version everything and keep a changelog.

Debugging checklist

  • Visual diff: draw boxes/masks from both old and new annotations on the same image.
  • Distribution check: class counts per split and per location/time.
  • Format validation: assert bbox within image bounds and mask dtype ∈ {uint8,bool}; see the sketch below.
  • Spot-check QA: random 50 samples with a strict rubric; measure precision/recall of annotators vs gold labels.
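
The format-validation check can live in a small script run before every training job. This is a minimal sketch; the record structure (a bbox in xywh plus an optional NumPy mask) is an assumption, so adapt the field names to your own format.

import numpy as np

def validate_record(record, img_w, img_h):
    """Return a list of problems for one annotation record (empty list means OK)."""
    problems = []
    x, y, w, h = record["bbox"]  # assumed xywh
    if w <= 0 or h <= 0:
        problems.append("non-positive box size")
    if x < 0 or y < 0 or x + w > img_w or y + h > img_h:
        problems.append("box outside image bounds")
    mask = record.get("mask")
    if mask is not None and mask.dtype not in (np.uint8, np.bool_):
        problems.append(f"unexpected mask dtype {mask.dtype}")
    return problems

rec = {"bbox": (10, 10, 500, 40), "mask": np.zeros((200, 300), dtype=np.uint8)}
print(validate_record(rec, img_w=300, img_h=200))  # ['box outside image bounds']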

Mini project: Build a small segmentation dataset

Goal: Train-ready dataset for sidewalk vs road segmentation (two classes).

  1. Define schema: background, road, sidewalk; masks as uint8 with ids {0,1,2}.
  2. Collect 60 images across different times and weather.
  3. Annotate polygons and rasterize to masks.
  4. QA: 15 random samples, require ≥95% pixel agreement between two annotators on sidewalk class.
  5. Split: 42 train / 9 val / 9 test by location (no leakage).
  6. Version: Tag v0.1, store counts and example overlays.

Acceptance criteria

  • All masks load and align with images; no out-of-bounds polygons.
  • Per-class pixel counts reported (see the sketch below); sidewalk has at least 500k pixels in train.
  • Val mIoU baseline ≥ 0.45 with a small UNet; document failure cases.
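
Per-class pixel counts are straightforward once masks are uint8 id maps (0 = background, 1 = road, 2 = sidewalk). A minimal sketch over a list of mask arrays; the two toy masks stand in for the real training set:

import numpy as np

def per_class_pixel_counts(masks, class_ids=(0, 1, 2)):
    """Sum pixel counts per class id over a list of id-map masks."""
    totals = {c: 0 for c in class_ids}
    for m in masks:
        ids, counts = np.unique(m, return_counts=True)
        for c, n in zip(ids, counts):
            totals[int(c)] = totals.get(int(c), 0) + int(n)
    return totals

masks = [np.zeros((4, 4), dtype=np.uint8), np.full((4, 4), 2, dtype=np.uint8)]
print(per_class_pixel_counts(masks))  # {0: 16, 1: 0, 2: 16}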

Practical projects

  • Retail shelf detection: boxes for products, attribute for facing count; compare mAP with and without small items.
  • Pose for fitness: 17-keypoint skeleton with visibility; evaluate PCK and build a feedback demo.
  • Damage segmentation: polygons for dents/scratches; add hard negative images and measure false positive rate.

Subskills

  • Defining Task And Label Schema — Outcome: Translate product goals into concrete classes, attributes, shapes, and metrics. Estimated time: 45–90 min.
  • Annotation Guidelines And QA — Outcome: Write clear instructions and a QA rubric with acceptance thresholds. Estimated time: 60–120 min.
  • Bounding Boxes And Polygons — Outcome: Choose and apply the right geometry; convert formats safely. Estimated time: 45–90 min.
  • Segmentation Masks Basics — Outcome: Create clean masks, manage ids, and validate overlaps. Estimated time: 60–120 min.
  • Keypoints And Pose Labels — Outcome: Label points with visibility and skeletons; avoid common errors. Estimated time: 45–90 min.
  • Handling Class Imbalance — Outcome: Diagnose imbalance and apply sampling/collection strategies. Estimated time: 45–90 min.
  • Train Validation Test Splits — Outcome: Build leakage-safe, stratified splits with rationale. Estimated time: 45–90 min.
  • Dataset Versioning And Lineage — Outcome: Version datasets and tie them to experiments and changelogs. Estimated time: 45–90 min.

Next steps

  • Pick one practical project and complete the mini project flow end-to-end.
  • Instrument a validation script that checks formats, bounds, and class distributions.
  • Plan your next iteration: add rare cases, refine guidelines, and bump the dataset version.

Data Collection And Annotation — Skill Exam

15 questions. Estimated time: 15–25 minutes. You can retake as many times as you like. Progress is saved for logged-in users; everyone can take the exam for free. Passing score: 70%. You will see explanations after each question on review.

