Who this is for
Beginner to intermediate Computer Vision practitioners who need to understand and create pixel-accurate labels for training and evaluating segmentation models.
Prerequisites
- Basic image concepts: pixels, width/height, channels (RGB vs grayscale).
- Comfort with simple math (areas, ratios).
- Familiarity with classification or detection is helpful but not required.
Learning path
- Start here: what masks are and why they matter.
- Practice binary, multiclass, and instance masks.
- Learn formats (PNG, polygons, RLE) and export rules.
- Do QA with IoU/Dice, fix common mistakes.
- Move on to panoptic labeling and dataset quality control.
Why this matters
In real projects, you will:
- Label pixels to teach models where objects start and end (e.g., road vs sidewalk, tumor vs healthy tissue, crop vs weeds).
- Evaluate models with precise metrics (IoU/Dice) that depend on correct masks.
- Debug data issues (wrong label map, overlapping polygons, palette errors) that can silently break training.
Concept explained simply
A segmentation mask is an image-size grid of numbers where every pixel has a label. For binary masks, 1 means the object and 0 means background. For multiclass masks, each class has its own ID (e.g., 0 background, 1 road, 2 car). For instance masks, each object instance gets a unique ID, even if it is the same class.
Mental model
Imagine laying transparent colored sheets on top of an image. Each sheet marks where one class appears. Stack them into a single map by writing the class ID at each pixel. For instances, think of numbering each separate blob.
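The three mask types above can be sketched with a few lines of NumPy (assumed available; the label map here is a made-up example):

```python
import numpy as np

# 4x6 image: a multiclass mask with assumed label map {0: background, 1: road, 2: car}
mask = np.zeros((4, 6), dtype=np.uint8)
mask[2:4, :] = 1          # bottom two rows are road
mask[2, 1:3] = 2          # a small car overwrites part of the road

print(mask)

# A binary mask for a single class is just an equality test:
road = (mask == 1).astype(np.uint8)
print(int(road.sum()))    # count of road pixels
```

Every pixel holds exactly one integer ID, which is why overlap rules matter: the car pixels replaced the road pixels they covered.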
Types of masks
- Binary segmentation: one foreground class vs background (values typically 0 and 1).
- Multiclass segmentation: each pixel belongs to exactly one class (0..K).
- Instance segmentation: separates individual objects (IDs per object, plus a mapping to class).
- Panoptic segmentation: combines semantic class and instance in one representation.
Formats and label maps
- Per-pixel PNG: often stored as 8-bit single-channel where pixel values are class IDs. May use an indexed color palette for visualization.
- Polygons: vertices outlining objects; later rasterized to masks.
- RLE (run-length encoding): compact text/array form storing runs of foreground pixels.
- Label map: a stable mapping like {0: background, 1: road, 2: car}. Keep it consistent across the dataset.
Deep dive: common export rules
- Background is 0 by default unless otherwise required.
- For multiclass PNGs, ensure no anti-aliasing. Values must be exact integers.
- For polygons, decide overlap priority: later polygon wins or class-priority order.
- For holes (donut shapes), use interior rings or subtractive polygons.
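A quick way to catch anti-aliasing or palette drift at export time is to validate pixel values against the label map. A sketch, assuming masks load as integer arrays and using a made-up label map:

```python
import numpy as np

LABEL_MAP = {0: "background", 1: "road", 2: "car"}  # assumed label map

def validate_mask(mask, label_map=LABEL_MAP):
    """Return the set of pixel values NOT in the label map (empty means OK)."""
    return set(np.unique(mask).tolist()) - set(label_map)

clean = np.array([[0, 1], [2, 1]], dtype=np.uint8)
dirty = np.array([[0, 1], [2, 137]], dtype=np.uint8)  # 137: stray anti-aliased value
print(validate_mask(clean))   # set()
print(validate_mask(dirty))   # {137}
```

Running this over every exported mask turns a silent training bug into a loud, early failure.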
Workflow: from image to mask
- Define label map and class priority (which class wins on overlaps).
- Set tool to discrete labels (no feathering, no soft edges).
- Annotate with polygons or brush; keep edges tight to object boundaries.
- Handle occlusion: label only the visible area; when instance labeling, give each partially hidden object its own ID.
- QA your masks: spot-check edges, tiny islands, and correct class IDs.
- Export in required format (PNG, polygons, RLE) and verify with a viewer.
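The overlap-priority step above can be made concrete: write classes into the mask in priority order, so the last (highest-priority) class wins on overlaps. A minimal sketch using axis-aligned rectangles to stand in for rasterized polygons:

```python
import numpy as np

def paint_rect(mask, top, left, bottom, right, class_id):
    """Write class_id over an inclusive rectangle, overwriting earlier classes."""
    mask[top:bottom + 1, left:right + 1] = class_id

mask = np.zeros((6, 6), dtype=np.uint8)
# Priority order: road first, then car (car wins where they overlap)
paint_rect(mask, 1, 1, 4, 4, 1)   # road: rows 1-4, cols 1-4
paint_rect(mask, 3, 3, 5, 5, 2)   # car overlaps the road's lower-right corner
print(mask)
print(mask[4, 4])                 # 2: the overlap pixel resolved to car
```

Reversing the paint order would flip the result, which is exactly why the priority must be fixed once and applied identically to every image.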
Worked examples
Example 1 — Binary mask
Task: Label "road" pixels as 1; everything else 0.
If the road occupies a rectangular region, the mask is a block of 1s in that area. Area is the count of 1-pixels; coverage is area / total pixels.
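Example 1 in code, assuming a 6x6 image where the road fills a 4x4 block:

```python
import numpy as np

mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:5, 1:5] = 1                # road occupies rows 1-4, cols 1-4

area = int(mask.sum())            # count of 1-pixels
coverage = area / mask.size       # fraction of the image that is road
print(area)                       # 16
print(round(coverage, 3))         # 0.444
```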
Example 2 — Multiclass mask
Label map: 0 background, 1 cat, 2 dog. A pixel cannot be both cat and dog. If cat occludes dog, the visible surface gets the cat ID. Ensure the same rule across the dataset.
Example 3 — Instance vs multiclass
Two apples touching: a multiclass mask labels the pixels of both as "apple" (same ID). An instance mask assigns apple-1 and apple-2 distinct IDs (e.g., 11 and 12) plus a lookup indicating both are class "apple".
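Separated blobs of one class can be split into instances with a connected-components pass. A sketch assuming SciPy is available; `ndimage.label` uses 4-connectivity by default for 2D arrays:

```python
import numpy as np
from scipy import ndimage

# Binary "apple" mask with two separate blobs
apple = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=np.uint8)

labels, n = ndimage.label(apple)                      # 4-connectivity by default
instance_mask = np.where(labels > 0, labels + 10, 0)  # remap to IDs 11, 12, ...
class_of = {i + 10: "apple" for i in range(1, n + 1)} # instance ID -> class lookup
print(n)                                              # 2
print(class_of)                                       # {11: 'apple', 12: 'apple'}
```

Note the limitation: if the two apples actually touch, they form one connected component, so connectivity alone cannot separate them; the annotator must draw the split explicitly.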
Quality checks and metrics
- Intersection over Union (IoU): overlap / union of predicted vs ground-truth pixels. Ranges 0–1.
- Dice (F1) score: 2 × overlap / (predicted area + ground-truth area). Also 0–1.
- Edge tightness: visually inspect zoomed boundaries; avoid stair-stepping or gaps.
- Connectivity: ensure instances are single connected components unless the object is naturally disjoint.
Mini calculation
If ground truth area is 16 px, prediction area 16 px, and intersection 9 px, then IoU = 9 / (16 + 16 − 9) = 9 / 23 ≈ 0.391 and Dice = 2 × 9 / (16 + 16) = 18 / 32 = 0.5625.
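The same numbers fall out of a direct implementation. A sketch with NumPy, using boolean masks built to match the areas above:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def dice(a, b):
    """Dice (F1) score of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2 * inter / total if total else 1.0

gt = np.zeros((6, 6), dtype=bool)
gt[1:5, 1:5] = True               # 16 px
pred = np.zeros((6, 6), dtype=bool)
pred[2:6, 2:6] = True             # 16 px; intersection with gt is 9 px
print(round(iou(gt, pred), 3))    # 0.391
print(dice(gt, pred))             # 0.5625
```

Both functions treat two empty masks as a perfect match (score 1.0); decide explicitly how your pipeline should handle that edge case.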
Exercises
Do these to lock in the concepts. A checklist follows each exercise. Compare with the solution only after you try.
Exercise 1 — Rasterize a rectangle and compute IoU
Image grid: 6×6 with rows and columns 0–5. Ground-truth polygon is a rectangle with corners (1,1) → (1,4) → (4,4) → (4,1). Include boundary pixels. Background=0, object=1. Also consider a predicted rectangle with corners (2,2) → (2,5) → (5,5) → (5,2) (also boundary-inclusive).
- Create the 6×6 binary mask (rows as strings of 0/1) for the ground-truth rectangle.
- Report its area (count of 1s).
- Compute IoU with the predicted rectangle mask.
Solution
Ground-truth mask rows:
000000
011110
011110
011110
011110
000000
Area = 16. Predicted rectangle area = 16. Intersection = rows 2–4 and cols 2–4 = 9. Union = 16 + 16 − 9 = 23. IoU = 9/23 ≈ 0.391.
- [ ] Mask rows written with correct 0/1 counts
- [ ] Area counted correctly
- [ ] IoU computed with union = A + B − intersection
Exercise 2 — Split multiclass into instances and compute Dice
Label map: 0 background, 1 leaf, 2 stem. Given a 5×5 multiclass mask (rows 0–4, cols 0–4):
Row0: 0 0 1 0 0
Row1: 0 1 1 1 0
Row2: 0 0 0 2 2
Row3: 0 1 0 0 0
Row4: 0 1 0 0 0
- Split the leaf class (value 1) into instances using 4-connectivity. Assign IDs 11, 12, …
- List pixel coordinates (row,col) for each leaf instance.
- Model prediction (leaf pixels): {(0,2),(1,1),(1,2),(1,3),(3,1)}. Compute Dice between predicted leaf mask and ground-truth leaf mask.
Solution
Ground-truth leaf pixels: {(0,2),(1,1),(1,2),(1,3),(3,1),(4,1)}.
Instances (4-connectivity):
- ID 11: {(0,2),(1,1),(1,2),(1,3)}
- ID 12: {(3,1),(4,1)}
Prediction size = 5, GT size = 6, intersection = 5. Dice = 2×5/(5+6) = 10/11 ≈ 0.909.
- [ ] Instances formed with 4-connectivity
- [ ] Coordinates listed clearly
- [ ] Dice computed as 2×overlap / (pred + gt)
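After attempting the exercise by hand, you can check your work in code. A sketch assuming NumPy and SciPy (whose `ndimage.label` defaults to 4-connectivity for 2D):

```python
import numpy as np
from scipy import ndimage

mc = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 2, 2],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
])
leaf = (mc == 1)                          # ground-truth leaf mask
labels, n = ndimage.label(leaf)           # split into 4-connected instances
print(n)                                  # 2

pred = np.zeros_like(leaf)
for r, c in [(0, 2), (1, 1), (1, 2), (1, 3), (3, 1)]:
    pred[r, c] = True                     # predicted leaf pixels

inter = np.logical_and(leaf, pred).sum()
dice = 2 * inter / (pred.sum() + leaf.sum())
print(round(dice, 3))                     # 0.909
```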
Common mistakes and self-checks
- Anti-aliased edges: soft boundaries create non-integer labels. Fix by disabling feathering/anti-aliasing and exporting as indexed or integer masks.
- Palette confusion: assuming palette color equals class ID. Verify the numeric pixel values, not the displayed colors.
- Overlap rules undefined: polygons drawn in any order. Define a class priority or consistent z-order and stick to it.
- Inconsistent label map: changing class IDs mid-project. Keep a single source of truth and validate masks against it.
- Disconnected instances: multiple blobs labeled as one instance. Run connected component checks and split them.
Self-check routine
- Open a mask as raw integers and confirm values are only the allowed IDs.
- Render contours over the image and visually inspect edges at 200% zoom.
- Compute class pixel counts and compare across batches for anomalies.
- Run IoU/Dice on a validation subset to catch sudden drops.
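The first and third checks in the routine automate well. A sketch of a batch QA pass, assuming masks are available as NumPy arrays and using a made-up set of allowed IDs:

```python
import numpy as np

ALLOWED = {0, 1, 2}  # assumed label map IDs

def qa_report(masks):
    """Flag masks with illegal IDs and tally per-class pixel counts."""
    counts, bad = {}, []
    for name, m in masks.items():
        vals = set(np.unique(m).tolist())
        illegal = vals - ALLOWED
        if illegal:
            bad.append((name, sorted(illegal)))
        for v in sorted(vals & ALLOWED):
            counts[v] = counts.get(v, 0) + int((m == v).sum())
    return counts, bad

masks = {
    "img_001": np.array([[0, 1], [1, 2]]),
    "img_002": np.array([[0, 0], [9, 2]]),   # 9 is not in the label map
}
counts, bad = qa_report(masks)
print(counts)   # {0: 3, 1: 2, 2: 2}
print(bad)      # [('img_002', [9])]
```

Comparing the per-class counts across batches makes anomalies (a class that suddenly vanishes, or a stray ID) easy to spot.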
Practical projects
- Mini dataset: Collect 30 images of a simple scene (e.g., desk). Label two classes (object vs background). Export masks and compute per-image IoU for a basic baseline.
- Multiclass set: Label 3–4 classes (e.g., keyboard, mouse, mug, background). Create a confusion report from pixel counts.
- Instance labeling: Label multiple instances of the same class (e.g., fruits). Verify instance connectivity and counts per image.
Next steps
- Learn polygon-to-mask rasterization details and hole handling.
- Automate dataset QA: ID validation, class coverage histograms, component checks.
- Explore panoptic formats that store both category and instance IDs.
Mini challenge
Pick a photo on your device (no upload needed). Define a 3-class label map. Sketch on paper where each class would be in the mask, then write down approximate pixel percentages per class (must sum to 100%). Finally, state which overlap rule you would use and why.