
Slice-Based Analysis by Conditions

Learn Slice-Based Analysis by Conditions for free with explanations, exercises, and a quick test (for Computer Vision Engineers).

Published: January 5, 2026 | Updated: January 5, 2026

Why this matters

Slice-based analysis checks model performance under specific conditions (slices) such as low light, rain, small objects, specific cameras, or heavy occlusion. This is how Computer Vision Engineers find hidden failure modes and turn them into concrete fixes.

  • Diagnose real-world incidents: Why did precision drop at night on Camera C?
  • Plan data collection: Which conditions (e.g., fog + backlight) need more labeled samples?
  • Decide mitigations: Threshold tweaks, augmentations, or specialized submodels for hard slices.
  • Communicate risk: Reliable metrics and confidence intervals per condition build stakeholder trust.

Concept explained simply

A slice is a subset of your evaluation data defined by a condition. Example: all images with brightness < 0.3 (low light). You compute metrics on that subset and compare to the overall dataset.

Key idea: Overall accuracy can look fine while certain conditions fail badly. Slice-based analysis exposes those blind spots.

Mental model

  • Think of your dataset as a layered stack. Each layer (slice) is a condition-based filter.
  • For each layer: filter → compute metrics → compute uncertainty (CI) → compare → take action.
  • Actionability drives the process: each slice should map to a possible fix (data, training, inference).
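A minimal sketch of that loop in Python (here dataset, the metadata attributes, and compute_metrics are hypothetical stand-ins for your own evaluation code):

    # Per-slice evaluation loop (sketch): filter -> metrics -> compare
    predicates = {
        "low_light": lambda s: s.brightness < 0.3,
        "rain": lambda s: s.weather == "rain",
        "heavy_occlusion": lambda s: s.occluded_area_ratio >= 0.3,
    }
    
    metrics_all = compute_metrics(dataset)  # your task-specific scorer
    for name, pred in predicates.items():
        subset = [s for s in dataset if pred(s)]
        if not subset:
            continue  # skip empty slices
        print(name, len(subset), compute_metrics(subset), metrics_all)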

How to do it step-by-step

  1. Choose target metrics. Classification: precision/recall/F1. Detection: mAP, recall@IoU, FP per image. Segmentation: mean IoU, pixel F1.
  2. List candidate conditions. Lighting, weather, motion blur, object size buckets, occlusion level, camera ID, scene type, time-of-day, geographic region, or other available metadata (ethically and with consent when using demographic attributes).
  3. Define precise predicates. Example: low_light = brightness < 0.3; small_object = bbox_short_side < 32 px; heavy_occlusion = occluded_area_ratio ≥ 0.3.
  4. Implement slicing.
    A minimal sketch in Python (compute_metrics stands in for your task-specific scorer):
    def slice_low_light(sample):
        # Predicate: image-level brightness below 0.3 marks low light
        return sample.brightness < 0.3
    
    def slice_small_obj(det):
        # Predicate: bbox short side under 32 px marks a small object
        return min(det.w, det.h) < 32
    
    # Apply a predicate to the dataset, then compare slice vs. overall
    low_light_samples = [s for s in dataset if slice_low_light(s)]
    metrics_low = compute_metrics(low_light_samples)
    metrics_all = compute_metrics(dataset)
  5. Compute uncertainty. Add 95% confidence intervals. For rates (e.g., recall), use the Wilson or normal approximation; a minimal Wilson sketch follows this list. Prefer n ≥ 30 positives per slice.
  6. Interpret and compare. Look for meaningful gaps beyond CI overlap. Beware multiple comparisons if you test many slices; focus on top gaps.
  7. Decide actions. Data collection targets, augmentation strategies, loss reweighting, resolution/anchor changes, calibration, or per-slice thresholds (only if the condition is known at inference time).
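For step 5, here is a minimal Wilson score interval in plain Python (the wilson_ci helper is our own sketch, not a library call):

    import math
    
    def wilson_ci(p_hat, n, z=1.96):
        # 95% Wilson score interval for a rate (e.g., recall), given
        # the observed rate p_hat and the number of trials n.
        if n == 0:
            return (0.0, 1.0)  # no data: maximally wide
        denom = 1 + z**2 / n
        center = (p_hat + z**2 / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
        return (max(0.0, center - half), min(1.0, center + half))
    
    # e.g., the backlight slice in Worked example 1 below: recall 0.69, n=95
    lo, hi = wilson_ci(0.69, 95)

If two slices' intervals barely overlap and the gap is several points, treat the weaker slice as a real candidate for action (step 6).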

Worked examples

Example 1 — PPE helmet detection in a factory
  • Slices: motion_blur (blur_score > 0.5), backlight (brightness < 0.25), small_objects (short_side < 24 px).
  • Detection metric: recall@IoU=0.5

Results (n_positives shown per slice):

  • All: recall 0.91 (n=1200)
  • motion_blur: recall 0.72 (n=140)
  • backlight: recall 0.69 (n=95)
  • small_objects: recall 0.63 (n=110)

Actions: add motion-blur augmentations, adjust exposure in data collection, increase input resolution or use scale-aware training for small objects.

Example 2 — Road sign detection
  • Slices: night (time 20:00–05:00), rain (weather=rain), occluded ≥ 30%.
  • Metric: mAP@0.5

Results:

  • All: 0.78
  • night: 0.58
  • rain: 0.60
  • occluded≥30%: 0.55

Actions: add synthetic rain/night augmentation; larger/stronger backbone; specialized occlusion-aware training; confirm the condition is identifiable at inference time if considering per-slice thresholds.

Example 3 — Face landmark localization
  • Slices: strong_pose (yaw>30°), high_blur (blur>0.6), dark_scenes (brightness<0.2).
  • Metric: NME (normalized mean error; lower is better; a minimal sketch follows this example)

Results:

  • All: NME 0.035
  • strong_pose: 0.051
  • high_blur: 0.057
  • dark_scenes: 0.049

Actions: pose-aware augmentation; blur augmentation or a deblurring step in the preprocessing pipeline; brightness/contrast augmentation; consider curriculum learning on hard slices.
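For reference, a minimal NME sketch (assuming inter-ocular normalization, one common convention; datasets differ in the exact normalizer):

    import numpy as np
    
    def nme(pred, gt, norm):
        # Normalized mean error for one face: mean point-to-point distance
        # divided by a normalizer such as the inter-ocular distance.
        # pred, gt: arrays of shape (num_landmarks, 2); norm: positive scalar.
        per_point = np.linalg.norm(pred - gt, axis=1)
        return float(per_point.mean() / norm)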

Practical heuristics and thresholds

  • Sample size: aim for ≥ 30 positives per slice for recall/precision; if fewer, report wide CIs and treat conclusions as tentative.
  • Confidence intervals: prefer Wilson for rates; for mAP/IoU, use bootstrapping over samples (see the bootstrap sketch after this list).
  • Meaningful gap: prioritize slices where the drop is at least 3–5 percentage points and the confidence intervals do not substantially overlap.
  • Overlapping slices: allowed. Avoid double-counting when aggregating; report per-slice metrics independently and consider exclusive bins when needed.
  • Per-slice thresholds: only if the condition is reliably known at inference time and checked for drift.
  • Multiple comparisons: if exploring many slices, expect some false alarms. Use domain knowledge to focus and validate with a holdout.
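A minimal percentile-bootstrap sketch for set-level metrics such as mAP or mean IoU (metric_fn is a hypothetical stand-in for your scorer):

    import random
    
    def bootstrap_ci(samples, metric_fn, n_boot=1000, alpha=0.05, seed=0):
        # Percentile bootstrap: resample images with replacement, rescore,
        # and read the CI off the empirical distribution of scores.
        rng = random.Random(seed)
        n = len(samples)
        scores = sorted(
            metric_fn([samples[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)
        )
        return scores[int(alpha / 2 * n_boot)], scores[int((1 - alpha / 2) * n_boot) - 1]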

Common mistakes and how to self-check

  • Using vague predicates (e.g., "images with some blur"). Self-check: are your slice rules deterministic and reproducible?
  • Too-small slices causing noisy metrics. Self-check: show CI; defer decisions if n is tiny.
  • Changing thresholds per slice without runtime access to the condition. Self-check: is the condition available and stable at inference?
  • Confounding factors (e.g., one camera has both night and rain). Self-check: stratify by camera or analyze intersecting slices.
  • Ignoring actionability. Self-check: each slice should suggest at least one concrete fix.

Hands-on exercises

These mirror the exercises in the practice section below. Try them now, then compare with the solutions.

  1. Exercise 1: Compute per-slice precision, recall, and F1 from counts. Write one action you would take.
  2. Exercise 2: Compute 95% CIs for recall in two slices and judge if the gap is meaningful.
  3. Exercise 3: Write clear slice predicates in pseudo-code for given conditions.
Self-check:
  • ☐ I used precise, reproducible predicates
  • ☐ I checked sample sizes and added CIs
  • ☐ I identified at least one actionable fix per weak slice

Practical projects

  • Build a slice dashboard: given predictions, compute and visualize metrics per condition with CIs and top-5 worst slices.
  • Targeted data collection: pick two worst slices, collect/label 500 more examples each, retrain, and re-evaluate the same slices.
  • Per-slice calibration: learn temperature scaling per known runtime condition and compare to a single global calibration.
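For the calibration project, a minimal temperature-scaling sketch (assuming NumPy and SciPy; fit one temperature per runtime-identifiable slice, then compare against a single global fit):

    import numpy as np
    from scipy.optimize import minimize_scalar
    
    def fit_temperature(logits, labels):
        # Fit scalar T minimizing NLL of softmax(logits / T) on held-out data.
        # logits: (N, C) array; labels: (N,) integer class ids.
        def nll(t):
            z = logits / t
            z = z - z.max(axis=1, keepdims=True)  # numerical stability
            log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
            return -log_probs[np.arange(len(labels)), labels].mean()
        return float(minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x)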

Who this is for

  • Computer Vision Engineers and ML practitioners who need reliable, production-ready models.
  • Data scientists performing model validation and planning data improvements.

Prerequisites

  • Basic CV metrics (precision/recall/F1, IoU, mAP).
  • Ability to parse dataset metadata and compute metrics.
  • Comfort with simple statistics (confidence intervals).

Learning path

  1. Review core metrics for your task (classification/detection/segmentation).
  2. List realistic conditions from your domain and define predicates.
  3. Compute per-slice metrics with CIs; identify top gaps.
  4. Plan and apply targeted fixes; re-run the same slices.

Next steps

  • Automate slice generation and reporting in your evaluation pipeline.
  • Add drift checks to catch changing slice distributions in production.
  • Document known failure slices and mitigation plans for your team.

Mini challenge

Pick three conditions relevant to your project. Define precise predicates, compute per-slice metrics with CIs on your last validation run, and identify the single most actionable fix you can deploy this week.

Try the quick test

The quick test below is available to everyone; only logged-in users get saved progress.

Practice Exercises

3 exercises to complete

Instructions

You evaluate a binary classifier. For two slices, you have:

  • low_light: TP=56, FP=12, FN=24
  • normal_light: TP=110, FP=20, FN=10

Tasks:

  1. Compute precision, recall, and F1 for each slice.
  2. Which slice is weaker and by how much on recall?
  3. Propose one concrete action to improve the weaker slice.
Expected Output
Numerical precision/recall/F1 per slice; identification of weaker slice; one actionable mitigation.
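To check your arithmetic, a tiny helper with the standard formulas (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = their harmonic mean):

    def prf1(tp, fp, fn):
        # Precision, recall, and F1 from raw counts; guards against zero division.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1
    
    # e.g., prf1(56, 12, 24) for the low_light slice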

Slice-Based Analysis by Conditions — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

