Who this is for
Beginner to intermediate Computer Vision practitioners who need to understand and create pixel-accurate labels for training and evaluating segmentation models.
Prerequisites
- Basic image concepts: pixels, width/height, channels (RGB vs grayscale).
- Comfort with simple math (areas, ratios).
- Familiarity with classification or detection is helpful but not required.
Learning path
- Start here: what masks are and why they matter.
- Practice binary, multiclass, and instance masks.
- Learn formats (PNG, polygons, RLE) and export rules.
- Do QA with IoU/Dice, fix common mistakes.
- Move on to panoptic labeling and dataset quality control.
Why this matters
In real projects, you will:
- Label pixels to teach models where objects start and end (e.g., road vs sidewalk, tumor vs healthy tissue, crop vs weeds).
- Evaluate models with precise metrics (IoU/Dice) that depend on correct masks.
- Debug data issues (wrong label map, overlapping polygons, palette errors) that can silently break training.
Concept explained simply
A segmentation mask is an image-size grid of numbers where every pixel has a label. For binary masks, 1 means the object and 0 means background. For multiclass masks, each class has its own ID (e.g., 0 background, 1 road, 2 car). For instance masks, each object instance gets a unique ID, even if it is the same class.
Mental model
Imagine laying transparent colored sheets on top of an image. Each sheet marks where one class appears. Stack them into a single map by writing the class ID at each pixel. For instances, think of numbering each separate blob.
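The three mask types above can be sketched with a few lines of NumPy (assumed available; the label map here is a made-up example):

```python
import numpy as np

# 4x6 image: a multiclass mask with assumed label map {0: background, 1: road, 2: car}
mask = np.zeros((4, 6), dtype=np.uint8)
mask[2:4, :] = 1          # bottom two rows are road
mask[2, 1:3] = 2          # a small car overwrites part of the road

print(mask)

# A binary mask for a single class is just an equality test:
road = (mask == 1).astype(np.uint8)
print(int(road.sum()))    # count of road pixels
```

Every pixel holds exactly one integer ID, which is why overlap rules matter: the car pixels replaced the road pixels they covered.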
Types of masks
- Binary segmentation: one foreground class vs background (values typically 0 and 1).
- Multiclass segmentation: each pixel belongs to exactly one class (0..K).
- Instance segmentation: separates individual objects (IDs per object, plus a mapping to class).
- Panoptic segmentation: combines semantic class and instance in one representation.
Formats and label maps
- Per-pixel PNG: often stored as 8-bit single-channel where pixel values are class IDs. May use an indexed color palette for visualization.
- Polygons: vertices outlining objects; later rasterized to masks.
- RLE (run-length encoding): compact text/array form storing runs of foreground pixels.
- Label map: a stable mapping like {0: background, 1: road, 2: car}. Keep it consistent across the dataset.
Deep dive: common export rules
- Background is 0 by default unless otherwise required.
- For multiclass PNGs, ensure no anti-aliasing. Values must be exact integers.
- For polygons, decide overlap priority: later polygon wins or class-priority order.
- For holes (donut shapes), use interior rings or subtractive polygons.
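A quick way to catch anti-aliasing or palette drift at export time is to validate pixel values against the label map. A sketch, assuming masks load as integer arrays and using a made-up label map:

```python
import numpy as np

LABEL_MAP = {0: "background", 1: "road", 2: "car"}  # assumed label map

def validate_mask(mask, label_map=LABEL_MAP):
    """Return the set of pixel values NOT in the label map (empty means OK)."""
    return set(np.unique(mask).tolist()) - set(label_map)

clean = np.array([[0, 1], [2, 1]], dtype=np.uint8)
dirty = np.array([[0, 1], [2, 137]], dtype=np.uint8)  # 137: stray anti-aliased value
print(validate_mask(clean))   # set()
print(validate_mask(dirty))   # {137}
```

Running this over every exported mask turns a silent training bug into a loud, early failure.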
Workflow: from image to mask
- Define label map and class priority (which class wins on overlaps).
- Set tool to discrete labels (no feathering, no soft edges).
- Annotate with polygons or brush; keep edges tight to object boundaries.
- Handle occlusion: label only the visible area; when instance labeling, give each partially hidden object its own ID.
- QA your masks: spot-check edges, tiny islands, and correct class IDs.
- Export in required format (PNG, polygons, RLE) and verify with a viewer.
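The overlap-priority step above can be made concrete: write classes into the mask in priority order, so the last (highest-priority) class wins on overlaps. A minimal sketch using axis-aligned rectangles to stand in for rasterized polygons:

```python
import numpy as np

def paint_rect(mask, top, left, bottom, right, class_id):
    """Write class_id over an inclusive rectangle, overwriting earlier classes."""
    mask[top:bottom + 1, left:right + 1] = class_id

mask = np.zeros((6, 6), dtype=np.uint8)
# Priority order: road first, then car (car wins where they overlap)
paint_rect(mask, 1, 1, 4, 4, 1)   # road: rows 1-4, cols 1-4
paint_rect(mask, 3, 3, 5, 5, 2)   # car overlaps the road's lower-right corner
print(mask)
print(mask[4, 4])                 # 2: the overlap pixel resolved to car
```

Reversing the paint order would flip the result, which is exactly why the priority must be fixed once and applied identically to every image.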
Worked examples
Example 1 — Binary mask
Task: Label "road" pixels as 1; everything else 0.
If the road occupies a rectangular region, the mask is a block of 1s in that area. Area is the count of 1-pixels; coverage is area / total pixels.
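Example 1 in code, assuming a 6x6 image where the road fills a 4x4 block:

```python
import numpy as np

mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:5, 1:5] = 1                # road occupies rows 1-4, cols 1-4

area = int(mask.sum())            # count of 1-pixels
coverage = area / mask.size       # fraction of the image that is road
print(area)                       # 16
print(round(coverage, 3))         # 0.444
```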
Example 2 — Multiclass mask
Label map: 0 background, 1 cat, 2 dog. A pixel cannot be both cat and dog. If cat occludes dog, the visible surface gets the cat ID. Ensure the same rule across the dataset.
Example 3 — Instance vs multiclass
Two apples touching: a multiclass mask labels the pixels of both as "apple" (same ID). An instance mask assigns apple-1 and apple-2 distinct IDs (e.g., 11 and 12) plus a lookup indicating both are class "apple".
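Separated blobs of one class can be split into instances with a connected-components pass. A sketch assuming SciPy is available; `ndimage.label` uses 4-connectivity by default for 2D arrays:

```python
import numpy as np
from scipy import ndimage

# Binary "apple" mask with two separate blobs
apple = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=np.uint8)

labels, n = ndimage.label(apple)                      # 4-connectivity by default
instance_mask = np.where(labels > 0, labels + 10, 0)  # remap to IDs 11, 12, ...
class_of = {i + 10: "apple" for i in range(1, n + 1)} # instance ID -> class lookup
print(n)                                              # 2
print(class_of)                                       # {11: 'apple', 12: 'apple'}
```

Note the limitation: if the two apples actually touch, they form one connected component, so connectivity alone cannot separate them; the annotator must draw the split explicitly.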
Quality checks and metrics
- Intersection over Union (IoU): overlap / union of predicted vs ground-truth pixels. Ranges 0–1.
- Dice (F1) score: 2 × overlap / (predicted area + ground-truth area). Also 0–1.
- Edge tightness: visually inspect zoomed boundaries; avoid stair-stepping or gaps.
- Connectivity: ensure instances are single connected components unless the object is naturally disjoint.
Mini calculation
If ground truth area is 16 px, prediction area 16 px, and intersection 9 px, then IoU = 9 / (16 + 16 − 9) = 9 / 23 ≈ 0.391 and Dice = 2 × 9 / (16 + 16) = 18 / 32 = 0.5625.
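The same numbers fall out of a direct implementation. A sketch with NumPy, using boolean masks built to match the areas above:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def dice(a, b):
    """Dice (F1) score of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2 * inter / total if total else 1.0

gt = np.zeros((6, 6), dtype=bool)
gt[1:5, 1:5] = True               # 16 px
pred = np.zeros((6, 6), dtype=bool)
pred[2:6, 2:6] = True             # 16 px; intersection with gt is 9 px
print(round(iou(gt, pred), 3))    # 0.391
print(dice(gt, pred))             # 0.5625
```

Both functions treat two empty masks as a perfect match (score 1.0); decide explicitly how your pipeline should handle that edge case.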
Exercises
Do these to lock in the concepts. A checklist follows each exercise. Compare with the solution only after you try.
Exercise 1 — Rasterize a rectangle and compute IoU
Image grid: 6×6 with rows and columns 0–5. Ground-truth polygon is a rectangle with corners (1,1) → (1,4) → (4,4) → (4,1). Include boundary pixels. Background=0, object=1. Also consider a predicted rectangle with corners (2,2) → (2,5) → (5,5) → (5,2) (also boundary-inclusive).
- Create the 6×6 binary mask (rows as strings of 0/1) for the ground-truth rectangle.
- Report its area (count of 1s).
- Compute IoU with the predicted rectangle mask.
Solution
Ground-truth mask rows:
000000
011110
011110
011110
011110
000000
Area = 16. Predicted rectangle area = 16. Intersection = rows 2–4 and cols 2–4 = 9. Union = 16 + 16 − 9 = 23. IoU = 9/23 ≈ 0.391.
- [ ] Mask rows written with correct 0/1 counts
- [ ] Area counted correctly
- [ ] IoU computed with union = A + B − intersection
Exercise 2 — Split multiclass into instances and compute Dice
Label map: 0 background, 1 leaf, 2 stem. Given a 5×5 multiclass mask (rows 0–4, cols 0–4):
Row0: 0 0 1 0 0
Row1: 0 1 1 1 0
Row2: 0 0 0 2 2
Row3: 0 1 0 0 0
Row4: 0 1 0 0 0
- Split the leaf class (value 1) into instances using 4-connectivity. Assign IDs 11, 12, …
- List pixel coordinates (row,col) for each leaf instance.
- Model prediction (leaf pixels): {(0,2),(1,1),(1,2),(1,3),(3,1)}. Compute Dice between predicted leaf mask and ground-truth leaf mask.
Solution
Ground-truth leaf pixels: {(0,2),(1,1),(1,2),(1,3),(3,1),(4,1)}.
Instances (4-connectivity):
- ID 11: {(0,2),(1,1),(1,2),(1,3)}
- ID 12: {(3,1),(4,1)}
Prediction size = 5, GT size = 6, intersection = 5. Dice = 2×5/(5+6) = 10/11 ≈ 0.909.
- [ ] Instances formed with 4-connectivity
- [ ] Coordinates listed clearly
- [ ] Dice computed as 2×overlap / (pred + gt)
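After attempting the exercise by hand, you can check your work in code. A sketch assuming NumPy and SciPy (whose `ndimage.label` defaults to 4-connectivity for 2D):

```python
import numpy as np
from scipy import ndimage

mc = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 2, 2],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
])
leaf = (mc == 1)                          # ground-truth leaf mask
labels, n = ndimage.label(leaf)           # split into 4-connected instances
print(n)                                  # 2

pred = np.zeros_like(leaf)
for r, c in [(0, 2), (1, 1), (1, 2), (1, 3), (3, 1)]:
    pred[r, c] = True                     # predicted leaf pixels

inter = np.logical_and(leaf, pred).sum()
dice = 2 * inter / (pred.sum() + leaf.sum())
print(round(dice, 3))                     # 0.909
```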
Common mistakes and self-checks
- Anti-aliased edges: soft boundaries create non-integer labels. Fix by disabling feathering/anti-aliasing and exporting as indexed or integer masks.
- Palette confusion: assuming palette color equals class ID. Verify the numeric pixel values, not the displayed colors.
- Overlap rules undefined: polygons drawn in any order. Define a class priority or consistent z-order and stick to it.
- Inconsistent label map: changing class IDs mid-project. Keep a single source of truth and validate masks against it.
- Disconnected instances: multiple blobs labeled as one instance. Run connected component checks and split them.
Self-check routine
- Open a mask as raw integers and confirm values are only the allowed IDs.
- Render contours over the image and visually inspect edges at 200% zoom.
- Compute class pixel counts and compare across batches for anomalies.
- Run IoU/Dice on a validation subset to catch sudden drops.
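The first and third checks in the routine automate well. A sketch of a batch QA pass, assuming masks are available as NumPy arrays and using a made-up set of allowed IDs:

```python
import numpy as np

ALLOWED = {0, 1, 2}  # assumed label map IDs

def qa_report(masks):
    """Flag masks with illegal IDs and tally per-class pixel counts."""
    counts, bad = {}, []
    for name, m in masks.items():
        vals = set(np.unique(m).tolist())
        illegal = vals - ALLOWED
        if illegal:
            bad.append((name, sorted(illegal)))
        for v in sorted(vals & ALLOWED):
            counts[v] = counts.get(v, 0) + int((m == v).sum())
    return counts, bad

masks = {
    "img_001": np.array([[0, 1], [1, 2]]),
    "img_002": np.array([[0, 0], [9, 2]]),   # 9 is not in the label map
}
counts, bad = qa_report(masks)
print(counts)   # {0: 3, 1: 2, 2: 2}
print(bad)      # [('img_002', [9])]
```

Comparing the per-class counts across batches makes anomalies (a class that suddenly vanishes, or a stray ID) easy to spot.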
Practical projects
- Mini dataset: Collect 30 images of a simple scene (e.g., desk). Label two classes (object vs background). Export masks and compute per-image IoU for a basic baseline.
- Multiclass set: Label 3–4 classes (e.g., keyboard, mouse, mug, background). Create a confusion report from pixel counts.
- Instance labeling: Label multiple instances of the same class (e.g., fruits). Verify instance connectivity and counts per image.
Next steps
- Learn polygon-to-mask rasterization details and hole handling.
- Automate dataset QA: ID validation, class coverage histograms, component checks.
- Explore panoptic formats that store both category and instance IDs.
Mini challenge
Pick a photo on your device (no upload needed). Define a 3-class label map. Sketch on paper where each class would be in the mask, then write down approximate pixel percentages per class (must sum to 100%). Finally, state which overlap rule you would use and why.