Why this matters
Computer Vision systems touch safety, privacy, and business decisions. Clear documentation and strong governance help you:
- Prove how data was collected, labeled, and used (compliance and trust).
- Explain model behavior and limits to stakeholders (transparency).
- Gate releases with approvals, audit trails, and rollback plans (risk control).
- Reproduce results, detect drift, and respond to incidents (reliability).
Who this is for
- Computer Vision Engineers shipping models to production.
- MLOps practitioners standardizing processes for vision pipelines.
- Team leads who need auditable change control and risk management.
Prerequisites
- Basic CV modeling knowledge (classification/detection/segmentation).
- Familiarity with version control and experiment tracking.
- Awareness of data privacy and security basics.
Concept explained simply
Documentation is the living story of your system: what it is, why it exists, what data it uses, how it performs, and where it can fail. Governance is the set of rules and checkpoints ensuring that story stays accurate, safe, and accountable.
Mental model
- Recipe: Templates (datasheets, model cards) make your work repeatable.
- Chain-of-custody: Every asset (data, code, model) is traceable by version.
- Gates: A release passes only if required evidence is present and approved (a minimal sketch follows).
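As a minimal sketch of the gate idea (the required-evidence keys are illustrative, not a prescribed schema):

```python
# Minimal release-gate sketch: a release proceeds only if every required
# piece of evidence is attached and approved. The evidence keys below are
# illustrative; use whatever your process actually requires.

REQUIRED_EVIDENCE = ["model_card", "test_report", "privacy_review", "safety_approval"]

def release_allowed(attached: dict[str, bool]) -> bool:
    """Return True only if every required item is present and approved."""
    missing = [key for key in REQUIRED_EVIDENCE if not attached.get(key, False)]
    if missing:
        print(f"Release blocked; missing or unapproved: {missing}")
        return False
    return True

# Example: safety approval is missing, so the release is blocked.
release_allowed({"model_card": True, "test_report": True, "privacy_review": True})
```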
Key artifacts
- Datasheet for Datasets: purpose, collection, consent, labeling, composition, known issues, license, retention.
- Model Card: intended use, limits, metrics by subgroup, risks, mitigations, training data summary, evaluation protocol, version (a machine-readable sketch follows this list).
- Decision/Change Record: rationale, risks, approvals, test evidence, rollout and rollback plan.
- Incident Report: what happened, impact, root cause, remediation, lessons learned.
- Policy Snippets: data retention, access control, PII handling, secure storage, dependency licenses.
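These artifacts are easier to keep current when they are machine-readable and versioned next to the code. A minimal sketch of a Model Card as a Python dataclass; the field names mirror the bullets above and are illustrative, not a formal standard:

```python
# Minimal machine-readable Model Card sketch. Fields and structure are
# illustrative, not a formal standard.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    name: str
    version: str
    intended_use: str
    out_of_scope: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)  # overall and per-subgroup
    risks: list[str] = field(default_factory=list)
    mitigations: list[str] = field(default_factory=list)
    training_data: str = ""
    evaluation_protocol: str = ""

card = ModelCard(
    name="HelmetDetector",
    version="1.3",
    intended_use="Detect hard hats on workers in industrial sites.",
    out_of_scope=["motorcycle helmets", "thermal imagery"],
    metrics={"mAP@0.5": 0.89, "recall_day": 0.92, "recall_night": 0.84},
)

# Serialize so the card can be versioned next to code and checked by CI.
print(json.dumps(asdict(card), indent=2))
```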
Worked examples
Example 1 — Dataset Datasheet snippet (Retail Shelf Images v2)
- Purpose: Train product detector for shelf auditing in supermarkets.
- Collection: Store cameras during off-hours. No customers present; employee faces blurred.
- Composition: 28k images, 640×640; regions: US (60%), EU (40%); lighting: day (70%), night (30%).
- Labeling: 42 classes annotated by 3 labelers; 10% adjudicated by a senior reviewer.
- Known Issues: Underrepresented night scenes in EU; small items (<16px) often missed.
- Consent & Privacy: No PII expected; policy enforces face auto-blur. Verified by script and manual spot checks (1% sample).
- License & Usage: Internal, non-commercial. Third-party logos appear; handled under fair use for internal QA.
- Retention: Raw video deleted after 30 days; derived images retained 2 years.
- Governance Checks: Privacy review OK; bias check flagged EU-night gap; mitigation plan: collect +2k EU-night images.
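The datasheet above mentions verification "by script and manual spot checks." A minimal sketch of such a spot check, assuming OpenCV and its bundled Haar face detector; any face found in a supposedly blurred image is flagged for manual review:

```python
# Spot-check sketch: run a face detector over a random ~1% sample of images
# that should already be blurred; any detection is flagged for human review.
# Assumes OpenCV (pip install opencv-python); the data path is illustrative.
import glob
import random
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

paths = glob.glob("data/retail_shelf_v2/*.jpg")
sample = random.sample(paths, max(1, len(paths) // 100)) if paths else []

flagged = []
for path in sample:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue  # unreadable file; worth logging in a real pipeline
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        flagged.append(path)  # blur may have failed; route to manual review

print(f"Checked {len(sample)} images; {len(flagged)} flagged for review.")
```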
Example 2 — Model Card snippet (HelmetDetector-v1)
- Intended Use: Detect hard hats on workers in industrial sites to trigger safety alerts.
- Out of Scope: Non-industrial environments, motorcycle helmets, thermal imagery.
- Inputs/Outputs: 1080p RGB frames; output: bounding boxes with class {helmet, no-helmet}, confidence.
- Metrics: mAP@0.5=0.89 overall; recall 0.92 day / 0.84 night. By skin tone (Fitzpatrick I–VI), recall ranges 0.90–0.93 by day and 0.80–0.86 at night.
- Risks: Night-time false negatives; occlusion by hoods; PPE variants without brim.
- Mitigations: Night-specific threshold, IR illumination where allowed, human verification for critical alerts.
- Training Data Summary: 120k labeled frames from 12 sites; synthetic augmentation for night.
- Evaluation Protocol: Site-stratified split; report per-site and per-lighting metrics (a subgroup-recall sketch follows this example).
- Version & Repro: v1.3; code tag: helm-det@1.3; data hash: ds_helmet_2025-11-15; seed fixed; training config attached.
- Deployment Gate: Requires per-site shadow run and safety sign-off.
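Subgroup metrics like the day/night recall above come from grouping evaluation counts before computing the metric. A minimal sketch with illustrative counts chosen to reproduce the 0.92 day / 0.84 night figures:

```python
# Subgroup-recall sketch: recall = TP / (TP + FN), grouped by lighting.
# The per-batch counts below are illustrative.
from collections import defaultdict

records = [  # (subgroup, true_positives, false_negatives)
    ("day", 460, 40),
    ("day", 450, 39),
    ("night", 168, 32),
    ("night", 170, 33),
]

tp = defaultdict(int)
fn = defaultdict(int)
for group, t, f in records:
    tp[group] += t
    fn[group] += f

for group in sorted(tp):
    recall = tp[group] / (tp[group] + fn[group])
    print(f"recall[{group}] = {recall:.2f}")  # day -> 0.92, night -> 0.84
```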
Example 3 — Change Request (CR-2026-014: YOLOv8→YOLOv9 upgrade)
- Rationale: +4% mAP on small objects in lab benchmarks; same latency on a T4 GPU.
- Risk: Medium — decoder changes may affect calibration in edge cases.
- Evidence: A/B on 5 sites; mAP +3.1%; FP rate +0.3% at threshold 0.5; meets SLOs.
- Checks: Privacy N/A; License OK; Security scan clear; Repro pack uploaded.
- Rollout: Canary on 10% of traffic for 48h; auto-rollback if alert FN rate exceeds baseline +1% (a trigger sketch follows this example).
- Approvals: Model owner, product lead, safety officer.
- Decision: Approved; scheduled for 2026-01-12.
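The rollout step predefines a numeric rollback trigger. A minimal sketch of how such a trigger might be evaluated during the canary window; the +1% threshold mirrors CR-2026-014, while the rates and function name are assumptions:

```python
# Canary rollback-trigger sketch: roll back if the canary's alert
# false-negative (FN) rate exceeds baseline by more than 1 percentage point.
# The rates below are illustrative.

def should_rollback(baseline_fn_rate: float, canary_fn_rate: float,
                    max_regression: float = 0.01) -> bool:
    """True if the canary regresses beyond the predefined trigger."""
    return canary_fn_rate > baseline_fn_rate + max_regression

baseline = 0.080  # FN rate measured before the upgrade
canary = 0.093    # FN rate measured on the 10% canary traffic

if should_rollback(baseline, canary):
    print("Trigger hit: auto-rollback to the previous model version.")
else:
    print("Canary within bounds: continue rollout.")
```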
Step-by-step workflow
- Prepare templates
  - Create lightweight templates for datasheets, model cards, change records, and incident reports.
- Version everything
  - Assign IDs to data snapshots, code commits, model artifacts, configs, and evaluation reports.
- Document the data lifecycle
  - Collection → labeling → QA → storage → access → retention/deletion. Note PII handling and licenses.
- Write the Model Card
  - State intended use, limits, metrics by subgroup, known risks, mitigations, and deployment gates.
- Run risk checks
  - Work through privacy, bias, safety, and security checklists; record results and actions.
- Seek approvals
  - Route for sign-off based on risk level (engineering, product, legal/compliance, domain expert).
- Release with gates
  - Block deployment unless required docs, tests, and approvals are attached. Include a rollback plan (a sketch combining versioning and gating follows this list).
- Monitor and log
  - Track SLOs and drift signals. Log changes and maintain on-call runbooks for incidents.
- Incident response
  - Capture impact, root cause, remediation, and updates to docs and processes.
- Retention and audits
  - Follow retention schedules; periodically review docs and access controls.
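A minimal sketch combining two steps above: pinning a data snapshot with a content hash ("Version everything") and blocking a release when required documents are missing ("Release with gates"). File names and layout are assumptions:

```python
# Sketch: content-hash a data snapshot for a reproducible ID, then block the
# release when required documents are missing. Paths and names are illustrative.
import hashlib
import pathlib
import sys

def snapshot_hash(data_dir: str) -> str:
    """Stable SHA-256 over file names and bytes; any change yields a new ID."""
    digest = hashlib.sha256()
    for path in sorted(pathlib.Path(data_dir).rglob("*")):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()[:16]

REQUIRED_DOCS = ["MODEL_CARD.md", "DATASHEET.md", "TEST_REPORT.md"]

def gate(release_dir: str) -> None:
    missing = [d for d in REQUIRED_DOCS if not pathlib.Path(release_dir, d).exists()]
    if missing:
        sys.exit(f"Release blocked; missing docs: {missing}")
    print(f"Gate passed; data snapshot id: ds_{snapshot_hash(release_dir + '/data')}")

# gate("releases/helmet-detector-v1.3")  # example invocation
```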
Exercises
Complete the tasks below. Solutions are available for self-checking.
Exercise 1 (ex1): Focused Model Card
Draft a one-page Model Card for an industrial surface defect detector (DefectSpotter-v3) used on a conveyor line. Include: intended use/out-of-scope, inputs/outputs, metrics (overall + by material type), risks and mitigations, training data summary, evaluation protocol, versioning, and deployment gate.
Hints
- Include at least one subgroup breakdown (e.g., matte vs glossy surfaces).
- Call out failure modes like glare or motion blur.
- Tie deployment gates to site-specific validation.
Solution
Model Card — DefectSpotter-v3
- Intended Use: Detect scratches and pits on metal panels on Line A for automated rejection.
- Out of Scope: Textiles, wood, and very dark anodized surfaces.
- Inputs/Outputs: 12MP RGB stills, overhead lighting; outputs: boxes with {scratch, pit}, score.
- Metrics: mAP@0.5=0.94 overall; Recall: matte 0.96, glossy 0.90; Precision: matte 0.95, glossy 0.92.
- Risks: Glare on glossy surfaces; motion blur at 2m/s; camera misalignment.
- Mitigations: Polarizing filter, shutter 1/2000s, weekly camera alignment checklist.
- Training Data Summary: 80k images from Lines A/B; 20% glossy; augment glare.
- Evaluation Protocol: Split by line and material; report per-material metrics; hardware-in-the-loop test.
- Version & Repro: v3.2; code: ds-spotter@3.2; data: ds_metal_2025-10-01; config hash attached.
- Deployment Gate: Must pass glossy recall ≥0.92 in on-site validation; maintenance sign-off required.
Exercise 2 (ex2): Governance Change Request
Create a Change Request entry for retraining the helmet detector with +2k EU-night images. Include rationale, risk rating, evidence, privacy/bias checks, rollout and rollback plan, and required approvals.
Hints
- Reference the subgroup metric you intend to improve (night recall).
- State quantitative rollback triggers.
- Note any PII considerations even if none are expected.
Solution
CR-2026-022 — Retrain with EU-night images
- Rationale: Address lower night recall in EU sites (0.84 → target ≥0.88).
- Risk: Low-Medium; same architecture and thresholds.
- Evidence: Offline eval: overall mAP +1.2%; EU-night recall +4.1%; FP rate stable (+0.1%).
- Privacy: No PII; faces are blurred by pipeline; spot check 1% confirmed.
- Bias: Report subgroup metrics (site, lighting, skin tone). No regressions.
- Rollout: Canary to 2 EU sites for 72h; trigger rollback if EU-night recall <0.86 or alerts exceed baseline +1%.
- Approvals: Model owner, product lead; safety officer informed.
- Decision: Approved pending canary success.
Common mistakes and self-check
- Mistake: One-time docs that never update. Fix: Update cards on every material change (data, model, thresholds).
- Mistake: Reporting metrics only overall. Fix: Include subgroup breakdowns tied to real-world risk.
- Mistake: Missing rollback criteria. Fix: Predefine numeric triggers before rollout.
- Mistake: Ignoring licenses and retention. Fix: Document licenses and deletion timelines; verify in audits.
- Mistake: Vague "works on my machine" evidence. Fix: Attach exact versions, seeds, configs, and environment details.
Self-check prompts
- Can a new teammate reproduce training with only the docs?
- Can you explain where the model fails and what you do about it?
- Is there a clear, measurable condition to roll back?
Practical projects
- Docs Pack: Pick an open CV dataset and a simple detector. Write a datasheet and a model card with subgroup metrics (e.g., lighting).
- Release Gate: Use a CI job to verify that model artifacts include a model card and test report before tagging a release.
- Incident Drill: Simulate drift (e.g., a new camera) and write an incident report plus an update to the model card and change log (a drift-check sketch follows this list).
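For the Incident Drill, a simple drift signal is a good starting point: compare a feature distribution (here, mean image brightness) between a reference window and the new camera using the population stability index (PSI). A minimal sketch assuming NumPy; the 0.2 alert threshold is a common rule of thumb, not a standard:

```python
# Drift-signal sketch: population stability index (PSI) over image
# brightness. PSI > 0.2 is a common rule-of-thumb alert threshold.
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI = sum((cur% - ref%) * ln(cur% / ref%)) over shared histogram bins."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference) + 1e-6
    cur_pct = np.histogram(current, bins=edges)[0] / len(current) + 1e-6
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
ref_brightness = rng.normal(120, 15, 5000)  # old camera (simulated)
new_brightness = rng.normal(95, 20, 5000)   # new, darker camera (simulated)

score = psi(ref_brightness, new_brightness)
print(f"PSI = {score:.2f} -> {'drift alert' if score > 0.2 else 'stable'}")
```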
Learning path
- Start: This subskill — build your templates and practice on a toy project.
- Then: Monitoring and drift detection — connect metrics to governance triggers.
- Next: Deployment strategies — canary, blue/green, rollback.
- Advanced: Privacy-preserving vision (blurring, federated learning, on-device processing).
Next steps
- Adopt the templates from this lesson in your current project.
- Schedule a lightweight review with cross-functional stakeholders.
- Integrate documentation checks into your build/release pipeline.
Mini challenge
Your detector will expand to nighttime operations at two new sites. Last audit showed night recall is borderline. What single document would you update first, and what new metric breakdown would you add? Write 3–5 sentences.