
Documenting Assumptions And Limitations

Learn Documenting Assumptions And Limitations for free, with explanations, exercises, and a quick test (for Applied Scientists).

Published: January 7, 2026 | Updated: January 7, 2026

Why this matters

As an Applied Scientist, your models drive product decisions, experiments, and risk assessments. Clear documentation of assumptions and limitations prevents misuse, supports reproducibility, and helps stakeholders choose the right solution.

  • Product decisions: Explain what conditions must hold for the model to work (e.g., traffic volume, user behavior).
  • Experimentation: State design assumptions (randomization quality, stationarity) and known threats to validity.
  • Risk and compliance: Surface fairness, privacy, and safety constraints, along with mitigations and monitoring.

Concept explained simply

Assumptions are things you expect to be true but have not fully proven in your current work (e.g., data distribution is stable next quarter). Limitations are known boundaries where your approach may underperform or fail (e.g., low recall for rare classes).

Mental model

Think of your model like a tool with a label:

  • Use for: The problem slice where it shines.
  • Works because: The assumptions that make it valid.
  • Do not use for: Conditions where performance degrades.
  • Handle with care: Risks and mitigations.

A practical template

Use this template to write one page that anyone can understand:

Context: One sentence on the goal and where the model is used.

Assumptions (A):
- A1: [Claim], evidence: [metric/test], scope: [data/time/domain].
- A2: ...

Limitations (L):
- L1: [Boundary/weakness], impact: [metric/user segment], severity: [low/med/high].
- L2: ...

Uncertainties (U):
- U1: [What we don’t yet know], plan to learn: [experiment/monitor], timeline: [date].

Mitigations (M):
- M1: [Control/guardrail], owner: [name/team], trigger: [threshold].

Monitoring & Triggers:
- Metric: [e.g., recall minority segment], threshold: [value], action: [rollback/fallback/alert].

Version & Date:
- Model vX.Y, documented on [YYYY-MM-DD].
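
If you keep documentation in version control, the same template can live next to the code in a machine-readable form. Below is a minimal Python sketch of that idea; the class and field names simply mirror the template sections and are an illustrative assumption, not a required schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Assumption:
    claim: str      # what we expect to be true
    evidence: str   # metric/test backing the claim
    scope: str      # data slice, time window, or domain

@dataclass
class Limitation:
    boundary: str   # where the approach underperforms
    impact: str     # affected metric or user segment
    severity: str   # low / med / high

@dataclass
class AlumDoc:
    context: str
    version: str
    documented_on: str                                       # YYYY-MM-DD
    assumptions: List[Assumption] = field(default_factory=list)
    limitations: List[Limitation] = field(default_factory=list)

# Hypothetical instance, loosely based on the fraud example below
doc = AlumDoc(
    context="Real-time fraud scoring for card transactions.",
    version="v2.1",
    documented_on="2026-01-07",
    assumptions=[Assumption(
        claim="Merchant category mix stays within ±10% of last quarter",
        evidence="weekly PSI check",
        scope="card-present traffic, last 90 days")],
    limitations=[Limitation(
        boundary="Users with ≤ 2 prior transactions",
        impact="recall 12% lower",
        severity="med")],
)
print(doc.assumptions[0].claim)
```
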
Example phrases you can reuse

  • Assumption: "We assume input text is predominantly English (>95%), based on language-ID on a 100k sample (last 30 days)."
  • Limitation: "Performance drops for cold-start users (AUC -0.07); avoid using scores for first 24 hours of activity."
  • Uncertainty: "Long-term decay of calibration beyond 90 days is unknown; schedule recalibration check quarterly."
  • Mitigation: "If drift (PSI > 0.25) is detected, switch to baseline heuristic and alert the on-call."
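
The drift mitigation above leans on PSI, so here is a minimal sketch of how such a check might be computed in Python. The binning scheme, the 1e-6 floor, and the simulated feature snapshots are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between a reference (training) sample and a
    recent (serving) sample of one numeric feature."""
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    # Clip the recent sample into the reference range so every value lands in a bin
    act_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0] / len(actual)
    # Floor the proportions to avoid log(0) and division by zero
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Hypothetical feature snapshots: training reference vs. last 7 days of serving data
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 100_000)
recent = rng.normal(0.6, 1.3, 50_000)   # simulated drift

value = psi(reference, recent)
print(f"PSI = {value:.2f}")
if value > 0.25:
    print("Drift detected: switch to baseline heuristic and alert the on-call")
```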

Worked examples

Example 1: Fraud detection model (tabular, imbalanced)

Context: Real-time fraud scoring for card transactions.

  • Assumptions:
    • A1: Merchant category distributions remain within ±10% of last quarter; monitored via PSI.
    • A2: Label delay ≤ 14 days; training uses labels up to 21 days old to reduce leakage.
  • Limitations:
    • L1: Sparse history users (≤ 2 prior transactions) have 12% lower recall; scores should be combined with rules for these users.
    • L2: Model not calibrated for cross-border transactions; expect higher false positives for foreign MCCs.
  • Uncertainties:
    • U1: Impact of new 3DS policy changes; A/B ramp planned next month.
  • Mitigations:
    • M1: If recall_weekly < 0.70 in the RU/CIS region, route to the manual review queue.
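
A trigger like M1 is easiest to defend when it is scripted. The sketch below assumes a weekly snapshot of binary fraud labels, thresholded predictions, and a region column; the data layout and helper names are hypothetical.

```python
import numpy as np

def recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Recall = TP / (TP + FN) on binary fraud labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return float(tp / (tp + fn)) if (tp + fn) else 0.0

def breached_regions(y_true, y_pred, regions, threshold=0.70):
    """Regions whose weekly recall falls below the M1 threshold."""
    return [r for r in np.unique(regions)
            if recall(y_true[regions == r], y_pred[regions == r]) < threshold]

# Hypothetical one-week scoring snapshot
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 5_000)
y_pred = rng.integers(0, 2, 5_000)
regions = rng.choice(np.array(["RU/CIS", "EU", "NA"]), size=5_000)

for region in breached_regions(y_true, y_pred, regions):
    print(f"M1 triggered for {region}: route transactions to the manual review queue")
```
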
Example 2: Forecasting daily demand (time series)

Context: 30-day horizon demand forecast for inventory planning.

  • Assumptions:
    • A1: Seasonality patterns are stable year-over-year after holiday normalization.
    • A2: Promotions calendar is accurate and finalized 2 weeks before launch.
  • Limitations:
    • L1: Cannot capture sudden black-swan events; uncertainty intervals widen during anomalies.
    • L2: Low-volume SKUs (weekly sales < 20) have MAPE > 35%; use safety stock buffers.
  • Uncertainties:
    • U1: Supplier lead-time variance increasing; assess impact via scenario stress tests.
  • Mitigations:
    • M1: If WAPE > 25% for 2 consecutive weeks, escalate to manual override for top 50 SKUs.
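
The M1 trigger above can be checked with a few lines of code. WAPE is the sum of absolute forecast errors divided by total actual demand; the weekly values and the two-consecutive-weeks logic below are an illustrative sketch.

```python
import numpy as np

def wape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Weighted absolute percentage error over one evaluation window."""
    return float(np.sum(np.abs(actual - forecast)) / np.sum(np.abs(actual)))

def should_escalate(weekly_wape, threshold=0.25):
    """M1: escalate to manual override when WAPE exceeds the threshold
    for two consecutive weeks."""
    breaches = [w > threshold for w in weekly_wape]
    return any(a and b for a, b in zip(breaches, breaches[1:]))

# Hypothetical last four weekly WAPE values for the top 50 SKUs
recent_weeks = [0.18, 0.22, 0.27, 0.31]
if should_escalate(recent_weeks):
    print("Escalate: manual override for top 50 SKUs")
```
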
Example 3: Toxicity classifier (NLP)

Context: Flag harmful comments for moderator review.

  • Assumptions:
    • A1: Training distribution covers major dialects currently seen (>90% coverage by language ID).
    • A2: Human labels are majority-vote from trained raters; inter-rater agreement κ ≥ 0.62.
  • Limitations:
    • L1: Higher false positives for reclaimed slurs in community contexts; moderators should rely on context view.
    • L2: Sarcasm detection is weak; expect under-flagging in sarcastic content.
  • Uncertainties:
    • U1: Shift due to new platform slang; monthly lexicon updates planned.
  • Mitigations:
    • M1: Fairness checks per demographic proxy; if FNR disparity > 10pp, trigger retraining with reweighting.
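
One way the M1 fairness check might be scripted is sketched below: compute the false negative rate per demographic proxy group and compare the widest gap against the 10 percentage point threshold. The proxy group names and synthetic data are assumptions for illustration.

```python
import numpy as np

def fnr(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """False negative rate: share of truly harmful comments the model missed."""
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return float(fn / (fn + tp)) if (fn + tp) else 0.0

def fnr_disparity(y_true, y_pred, groups) -> float:
    """Largest FNR gap across demographic proxy groups."""
    rates = [fnr(y_true[groups == g], y_pred[groups == g]) for g in np.unique(groups)]
    return max(rates) - min(rates)

# Hypothetical evaluation slice with proxy group labels
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 10_000)
y_pred = rng.integers(0, 2, 10_000)
groups = rng.choice(np.array(["group_a", "group_b", "group_c"]), size=10_000)

gap = fnr_disparity(y_true, y_pred, groups)
print(f"FNR disparity = {gap:.3f}")
if gap > 0.10:   # 10 percentage points
    print("M1 triggered: retrain with reweighting")
```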

How to write clear assumptions and limitations

  • Make them testable: include a metric or observable condition.
  • Bound them: specify time window, data slice, or scenario.
  • Trace them: cite the source (sample size, test name, date).
  • Quantify impact: describe effect on users or metrics.
  • Pair with a mitigation: what you will do if it breaks.

Good vs. vague statements

  • Vague: "Data probably stable."
  • Good: "Feature drift PSI < 0.1 for top 20 features (last 30 days vs. training)."
  • Vague: "Works on most users."
  • Good: "AUC ≥ 0.86 for users with ≥ 30 days history; drops to 0.79 for newer users."

Exercises

Complete the tasks below, then compare with the solutions. Tip: Write assumptions as "Condition + Evidence + Scope" and limitations as "Boundary + Impact + Severity".

Exercise 1 — Rewrite vague statements

Scenario: A recommendation model for news articles.

  • Vague assumption A: "Users like fresh content."
  • Vague limitation L: "Cold-start is an issue."

Task: Rewrite A and L into clear, testable statements including evidence and scope.

  • A includes measurable condition
  • A cites evidence or test
  • L states specific impact/segment
  • L includes severity or threshold

Sample answer

Assumption: "Clicks skew to articles < 24h old (CTR +18% vs. older), confirmed on a 500k-session sample (last 14 days)."

Limitation: "For new users (account age < 48h), recall@10 is 0.06 lower; severity: medium—add popularity fallback until 5 sessions observed."

Exercise 2 — Surface hidden assumptions

Scenario: You plan an uplift model to target a discount. The business wants to launch next week.

Task: List at least 4 hidden assumptions that must hold for results to be trustworthy, and propose one mitigation each.

  • Randomization or proper counterfactual design
  • Stable conversion tracking and label definitions
  • Sufficient sample size and variance assumptions
  • No interference or spillover across groups

Sample answer

  • A1: Randomization is unbiased; check via covariate balance before launch. Mitigation: re-randomize if SMD > 0.1 (see the sketch after this list).
  • A2: Tracking latency ≤ 1h; verify end-to-end event delivery. Mitigation: add backup batch ingestion.
  • A3: Power ≥ 80% for expected uplift 1.5pp; compute sample size. Mitigation: extend duration by 2 weeks if needed.
  • A4: No cross-group coupon sharing; Mitigation: per-user code restriction, monitor leakage rate.
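
For A1, a pre-launch balance check might look like the sketch below: compute the standardized mean difference (SMD) per covariate and flag anything above 0.1. The covariate names and synthetic samples are hypothetical.

```python
import numpy as np

def smd(treat: np.ndarray, control: np.ndarray) -> float:
    """Absolute standardized mean difference with a pooled-SD denominator."""
    pooled_sd = np.sqrt((np.var(treat, ddof=1) + np.var(control, ddof=1)) / 2)
    return float(abs(treat.mean() - control.mean()) / pooled_sd)

def imbalanced_covariates(covariates, threshold=0.1):
    """Covariates whose SMD exceeds the re-randomization threshold (A1)."""
    return [name for name, (treat, control) in covariates.items()
            if smd(treat, control) > threshold]

# Hypothetical pre-launch balance check on two covariates
rng = np.random.default_rng(3)
covariates = {
    "past_30d_spend": (rng.normal(55, 20, 4_000), rng.normal(50, 20, 4_000)),  # imbalanced on purpose
    "tenure_days":    (rng.normal(300, 90, 4_000), rng.normal(300, 90, 4_000)),
}

flagged = imbalanced_covariates(covariates)
if flagged:
    print(f"Re-randomize: covariate imbalance on {flagged}")
```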

Common mistakes and self-check

  • Listing generic risks without thresholds. Self-check: Do you have numbers, dates, or segments?
  • Confusing uncertainties with limitations. Self-check: Can this be tested soon? If yes, it is an uncertainty.
  • Not scoping to specific user segments. Self-check: Name the segment and metric direction.
  • Omitting monitoring hooks. Self-check: What alert triggers action?
  • Over-promising. Self-check: Can you defend each claim with current evidence?

Practical projects

  • Write a one-page ALUM doc (Assumptions, Limitations, Uncertainties, Mitigations) for a model you built (or a public dataset project). Include at least 3 assumptions, 3 limitations, and 2 mitigations.
  • Run a simple drift analysis on a dataset snapshot (e.g., compare two weeks). Document findings as assumptions/limitations with thresholds.
  • Peer review: Swap ALUM docs with a colleague; each adds one missing assumption and one clearer limitation.

Mini challenge

In 6 sentences or fewer, write assumptions and limitations for a sentiment model used to prioritize customer support tickets. Include one monitoring trigger.

Learning path

  1. Foundation: Review your model’s objective, key metrics, and data generation process.
  2. Evidence gathering: Run sanity checks (drift, calibration, segment performance).
  3. Draft ALUM: Fill the template; keep it to one page.
  4. Peer review: Ask a PM/engineer to challenge assumptions; refine.
  5. Operationalize: Add monitoring thresholds and owners; schedule review cadence.

Who this is for

  • Applied Scientists and ML Engineers shipping models to production.
  • Data Scientists preparing experiment or modeling reports.
  • PMs and Analysts who need clear, decision-ready documentation.

Prerequisites

  • Basic understanding of model evaluation (precision/recall/AUC/calibration).
  • Familiarity with dataset splits and drift concepts.
  • Comfort summarizing evidence from experiments or analyses.

Next steps

  • Convert one of your recent project notes into the ALUM template.
  • Set monitoring thresholds for two critical metrics and define an action per threshold.
  • Take the quick test to check your understanding.

Documenting Assumptions And Limitations — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.
