Why this matters
In computer vision, the model can only learn what you ask it to learn. A clear task and label schema prevents wasted annotation, enables consistent quality checks, and directly impacts model accuracy and cost. Real tasks include: identifying defects on a production line, counting people in retail footage, localizing vehicles for ADAS, segmenting tumors in medical scans, and classifying product images for e-commerce.
Typical professional tasks this unlocks
- Translating business goals into a precise CV task (classification, detection, segmentation, keypoints, OCR).
- Designing unambiguous label definitions and hierarchy.
- Writing annotation guidelines with edge cases and examples.
- Defining data splits, quality metrics, and inter-annotator agreement checks.
- Piloting annotation and iterating schema to reduce noise.
Who this is for
- Computer Vision Engineers and Data Scientists planning datasets.
- Annotation leads and QA reviewers.
- Product managers scoping ML features and acceptance criteria.
Prerequisites
- Basic understanding of image data formats and datasets.
- Familiarity with CV task types: classification, detection, segmentation, keypoints, OCR.
- Basic model metrics: precision/recall, IoU, F1.
Concept explained simply
Defining the task and label schema means deciding exactly what the model must output and how humans will mark it in data. It converts a business question (e.g., "find dents on cars") into concrete labels (e.g., "bounding boxes for dents with severity: minor/major").
Mental model
Think of it like drafting a contract between the business, annotators, and the model:
- Inputs: what images/videos are in scope and what is out of scope.
- Outputs: exact label types and formats.
- Rules: how to handle tricky cases and what to do when uncertain.
- Quality: how success is measured and reviewed.
Picking the right task type
Quick guide
- Classification: one label per image (single-label) or several labels per image (multi-label). Use when presence/absence is enough.
- Object Detection: bounding boxes for each instance. Use for counting and localization.
- Instance Segmentation: precise pixel mask per object. Use when shape matters.
- Semantic Segmentation: pixel mask per class (no instance IDs). Use when object identity is less important.
- Keypoints/Pose: specific landmark coordinates. Use for pose, alignment, or measurements.
- OCR: text localization and transcription. Use for documents, signs, plates.
- Tracking: associate objects across frames. Use for video analytics.
Start with the minimum output needed to meet the business KPI. Simpler tasks cost less and annotate faster.
Designing the label schema
- Define classes: exhaustive list, mutually exclusive where applicable. Provide definitions with positive and negative examples.
- Attributes: optional or required properties (e.g., occluded: yes/no, severity: minor/major).
- Hierarchy: parent/child relationships (Vehicle → Car/Truck/Bus).
- Instance rules: when a new instance starts/ends, and how to merge or split overlapping items.
- Spatial representation: box, polygon, mask, keypoints, line, or text region with transcription.
- Uncertainty handling: allow "uncertain" or "ignore" regions to avoid forcing wrong labels.
- Metadata: scene conditions (lighting, weather), annotator flags, and versioning.
Example label definition template
{
  "class": "Car",
  "definition": "A road vehicle designed primarily for passenger transport with four wheels.",
  "include": ["sedans", "hatchbacks", "SUVs"],
  "exclude": ["pickup trucks", "vans", "golf carts"],
  "representation": "bounding_box",
  "attributes": {
    "occluded": {"type": "boolean", "required": true},
    "truncated": {"type": "boolean", "required": true}
  },
  "edge_cases": [
    "Car behind fence → label as Car with occluded=true",
    "Half outside frame → truncated=true"
  ]
}
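A schema like this can be enforced mechanically before annotations enter the dataset. Below is a minimal Python sketch of such a check; the field names mirror the template above, and validate_annotation is a hypothetical helper, not any specific tool's API.

REQUIRED_ATTRS = {"occluded", "truncated"}

def validate_annotation(ann):
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    if ann.get("class") != "Car":
        problems.append(f"unknown class: {ann.get('class')!r}")
    if ann.get("representation") != "bounding_box":
        problems.append("expected a bounding_box representation")
    attrs = ann.get("attributes", {})
    for name in REQUIRED_ATTRS:
        if not isinstance(attrs.get(name), bool):
            problems.append(f"attribute {name!r} missing or not boolean")
    return problems

# Example: a box missing the required 'truncated' attribute.
print(validate_annotation({
    "class": "Car",
    "representation": "bounding_box",
    "attributes": {"occluded": True},
}))  # -> ["attribute 'truncated' missing or not boolean"]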
Label instructions and edge cases
Write concise instructions people can follow consistently.
- Use must/should language and avoid ambiguity.
- Include visual examples of correct and incorrect labels.
- Call out common edge cases explicitly (small objects, occlusions, motion blur, reflections).
- Define minimum size thresholds (e.g., ignore objects smaller than 12x12 px); a filtering sketch follows this list.
- Define time-based rules for video (e.g., track identity persists across occlusion up to 10 frames).
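Size thresholds are one of the few rules you can enforce fully automatically. A minimal sketch, assuming boxes are stored as (x, y, width, height) tuples in pixels:

MIN_W, MIN_H = 12, 12  # matches the 12x12 px example rule above

def filter_small_boxes(boxes):
    """Drop boxes below the minimum annotatable size."""
    return [b for b in boxes if b[2] >= MIN_W and b[3] >= MIN_H]

boxes = [(10, 10, 50, 30), (200, 40, 8, 8)]  # second box is below threshold
print(filter_small_boxes(boxes))             # -> [(10, 10, 50, 30)]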
Quality metrics and agreement
- Detection: IoU thresholds (e.g., 0.5), precision/recall by class and size bucket; an IoU sketch follows this list.
- Segmentation: mean IoU or Dice; boundary quality if relevant.
- Classification: accuracy, F1, per-class confusion.
- Agreement: inter-annotator agreement (IAA) via overlap metrics or Cohen’s kappa for classification.
- Gold tasks: seed known examples for ongoing QA.
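IoU underpins both detection metrics and geometric agreement checks. A minimal sketch for axis-aligned boxes, assuming an (x1, y1, x2, y2) corner format (masks need a pixel-overlap version instead):

def iou(a, b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# Two annotators' boxes for the same object; IoU >= 0.5 counts as a match.
print(iou((10, 10, 60, 40), (12, 12, 58, 42)))  # ~0.81, well above 0.5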
Lightweight QA flow
- Pilot: annotate 50–200 samples with two annotators.
- Measure IAA and error types (missing, wrong class, bad geometry); a kappa sketch follows this list.
- Revise schema/instructions; repeat until stable.
- Scale with spot-checks and gold tasks.
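To make the IAA step concrete for classification labels, here is a minimal Cohen's kappa sketch in plain Python (in practice you might reach for scikit-learn's cohen_kappa_score instead):

from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels over the same samples."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[c] * cb[c] for c in ca) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["car", "car", "truck", "bus", "car"]
b = ["car", "truck", "truck", "bus", "car"]
print(round(cohen_kappa(a, b), 2))  # 0.69; low values signal schema revision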
Worked examples
Example 1: Retail shelf compliance
- Goal: Detect if each product is present and front-facing.
- Task: Object detection with attributes.
- Classes: {Product_A, Product_B, Product_C}.
- Attributes: facing={front, angled, back}, occluded={yes/no}.
- Ignore: objects under 1% of image area.
- Metric: mAP@0.5, per-class.
Example 2: Road lane segmentation
- Goal: Identify drivable area and lane markings.
- Task: Semantic segmentation.
- Classes: {drivable, lane_marking, curb, background}.
- Rules: lane_marking only on paint; exclude shadows/reflections.
- Metric: mIoU (a per-pixel sketch follows this example); boundary IoU for lane_marking.
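For intuition, mIoU can be computed per class from flattened label maps. A minimal sketch, assuming predictions and ground truth are flat lists of per-pixel class labels:

def mean_iou(pred, gt, classes):
    """Mean IoU over classes for two flat lists of per-pixel labels."""
    ious = []
    for c in classes:
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = ["drivable", "drivable", "lane_marking", "background"]
gt   = ["drivable", "lane_marking", "lane_marking", "background"]
print(mean_iou(pred, gt, ["drivable", "lane_marking", "background"]))  # ~0.67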
Example 3: Face landmarks
- Goal: Align faces for AR filters.
- Task: Keypoints (68 landmarks) + visibility attribute.
- Rules: if landmark occluded, set visibility=false and estimate location if reasonable; otherwise mark as missing.
- Metric: NME (normalized mean error) over visible points; see the sketch below.
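NME divides the mean landmark error by a normalizer (for faces, inter-ocular distance is a common, though dataset-specific, choice). A minimal sketch over visible points:

import math

def nme(pred, gt, visible, norm):
    """Normalized mean error over visible landmarks.
    pred/gt: (x, y) pairs; visible: bools; norm: e.g. inter-ocular distance."""
    errs = [math.dist(p, g) for p, g, v in zip(pred, gt, visible) if v]
    return sum(errs) / (len(errs) * norm)

pred = [(10.0, 10.0), (30.0, 12.0), (20.0, 25.0)]
gt   = [(11.0, 10.0), (30.0, 13.0), (21.0, 26.0)]
print(nme(pred, gt, [True, True, False], norm=20.0))  # -> 0.05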
Step-by-step: from problem to schema
- State the decision you want to support (e.g., "alert staff when shelf gap exists").
- Pick the simplest task that enables that decision.
- List classes and attributes. Make them mutually exclusive or clearly multi-label.
- Choose representation (box/polygon/mask/keypoints) and size thresholds.
- Write edge-case rules and uncertainty policy.
- Define metrics and acceptance thresholds.
- Run a pilot, measure agreement, and iterate.
Exercises
Complete these in your own notes or a doc; the quick test at the end checks the core concepts.
Exercise 1: Supermarket shelf monitoring
Design a label schema to detect and count three products on shelves and flag if any product is not front-facing.
- Deliver: task type, classes, attributes, ignore rules, and metrics.
- Consider: tiny products, reflections, occlusions, and similar packaging.
Exercise 2: Medical polyp detection
Endoscopy video: detect and localize polyps, and mark uncertainty when visibility is poor.
- Deliver: task type, representation, attributes, uncertainty handling, and QA plan.
- Consider: motion blur, specular highlights, and tiny lesions.
Self-check checklist
- Did you pick the simplest task that meets the goal?
- Are classes exhaustive and non-overlapping, or clearly multi-label?
- Do edge cases have explicit rules?
- Is there a defined ignore/uncertain policy?
- Do you have clear metrics and acceptance thresholds?
- Did you plan a small pilot and IAA measurement?
Common mistakes and fixes
- Too many classes: merge where decisions do not require the split.
- Ambiguous definitions: add include/exclude examples and edge-case rules.
- No uncertainty option: forces wrong labels; add uncertain/ignore.
- Skipping pilot: disagreements stay hidden; always pilot and measure IAA.
- Over-precise geometry: use boxes instead of polygons if shape is not needed.
- Poor size thresholds: define minimum size to avoid noise.
Practical projects
- Project 1: Create a 4-class traffic object detection schema with attributes (occluded, truncated). Pilot on 100 images and report IAA and error types.
- Project 2: Build a semantic segmentation schema for indoor rooms (wall, floor, ceiling, window, door). Define ignore regions (mirrors, reflections) and measure mIoU on a validation split.
- Project 3: OCR: define text region detection + transcription conventions (case, punctuation, illegible tag). Create 50 gold examples.
Learning path
- Review task types and when to use each.
- Draft label schema and instructions for a simple dataset.
- Run a 50–200 sample pilot with dual annotation.
- Measure IAA, revise schema, and document changes.
- Scale annotation with periodic QA and gold tasks.
Next steps
- Turn your exercise outputs into a one-page guideline doc.
- Set acceptance metrics and thresholds for go/no-go.
- Prepare a small gold set to monitor drift during scale-up.
Mini challenge
You are asked to detect road signs and also read the speed limit. Propose a two-stage labeling approach that balances cost and accuracy in 5 bullet points.
Take the Quick Test
Ready to check your understanding? Take the quick test below. Everyone can take it; only logged-in users will have results saved.