Topic 7 of 8

Defining Error Handling Rules

Learn how to define error handling rules for free, with explanations, exercises, and a quick test for ETL Developers.

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

As an ETL Developer, you will routinely face bad inputs, schema surprises, duplicates, and downstream load failures. Clear error handling rules let you detect issues early, protect data quality, and keep stakeholders informed without guesswork. You’ll use these rules to decide when to reject a record, when to fix it, when to retry, and when to stop the pipeline.

  • Real tasks: validating source feeds, quarantining corrupt records, deduplicating, handling late-arriving dimensions, retrying transient failures, and alerting owners.
  • Outcomes: consistent data quality, predictable operations, auditable decisions, and faster incident resolution.

Progress note

The quick test is available to everyone. If you are logged in, your progress will be saved automatically.

Concept explained simply

Error handling rules are if-then decisions for data and operations. They specify what to detect, how severe it is, what action to take, who to notify, and how to measure it.

Mental model

Think of a 5-step loop for every pipeline stage:

  1. Detect: Identify the condition (e.g., null in required field, duplicate key).
  2. Classify: Assign severity (info, warning, error, critical) and category (data quality, schema, operational, business rule).
  3. Decide: Choose action (skip, correct, default, route to quarantine, retry, fail).
  4. Act: Execute the action consistently and idempotently.
  5. Record: Log the event with enough context to trace, aggregate metrics, and alert if thresholds are exceeded.
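
In code, this loop can be as small as one function per pipeline stage. A minimal sketch in Python (the rule ID, field names, and quarantine list are illustrative, not tied to any specific framework):

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline")

    quarantine = []  # records set aside for later review

    def handle_record(record: dict):
        # 1. Detect: is a required field missing?
        if not record.get("customer_id"):
            # 2. Classify: data-quality error
            severity, category = "error", "data_quality"
            # 3. Decide + 4. Act: reject the record to quarantine
            quarantine.append(record)
            # 5. Record: log enough context to trace and aggregate later
            log.warning(
                "rule=ER-001 severity=%s category=%s order_id=%s action=quarantined",
                severity, category, record.get("order_id"),
            )
            return None
        return record  # record passes this rule, continue downstream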

Common error categories

  • Data quality: missing required values, invalid formats, out-of-range numbers, invalid enums.
  • Identity/deduplication: duplicate natural keys, conflicting updates.
  • Referential integrity: foreign keys pointing to non-existent dimension rows.
  • Schema mismatch: unexpected columns, missing required columns, type changes.
  • Business rule violations: order date in the future, negative quantity.
  • Operational: network timeouts, API rate limits, disk full.
  • Load conflicts: primary/unique key violations on target write.

Actions and severity

  • Record-level actions: correct/standardize, default, reject/quarantine, flag and continue.
  • Batch/pipeline actions: retry with backoff, partial continue, fail-fast, circuit-break (stop downstream steps).
  • Severities guide behavior: Warning (track, no stop), Error (quarantine/retry), Critical (fail job, immediate alert).
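
Retry with backoff is worth standardizing once and reusing across pipelines. A minimal sketch, assuming transient failures surface as exceptions (the attempt count and delay values are illustrative defaults):

    import time

    def retry_with_backoff(operation, max_attempts=3, base_delay=2.0):
        """Retry a transient operation with exponential backoff, then re-raise."""
        for attempt in range(1, max_attempts + 1):
            try:
                return operation()
            except (TimeoutError, ConnectionError):
                if attempt == max_attempts:
                    raise  # escalate: let the job fail and alert
                time.sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s, ...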

What to log for each rule
  • Rule ID and name
  • Timestamp and pipeline step
  • Sample of offending values (mask PII)
  • Record identifiers (e.g., order_id)
  • Action taken (rejected, defaulted, retried)
  • Correlation/run ID for traceability
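
As a concrete illustration, a single rule firing might be captured as a structured event like this (field names follow the checklist above; the values are made up):

    error_event = {
        "rule_id": "ER-001",
        "rule_name": "missing_customer_id",
        "timestamp": "2026-01-11T04:15:00Z",
        "pipeline_step": "staging.validate_orders",
        "record_id": {"order_id": "ORD-10293"},
        "sample_value": None,                # offending value, PII already masked
        "action": "rejected_to_quarantine",
        "run_id": "run-2026-01-11-001",      # correlation ID for traceability
    }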

Designing rules

Use a consistent template so your team can review, implement, and audit rules.

Rule template (copy/paste)
  • Rule ID: ER-###
  • Condition: Describe the exact predicate (e.g., email NOT matching regex).
  • Scope: record-level or batch-level
  • Severity: warning | error | critical
  • Action: default | correct | reject to quarantine | dedupe | suspense table | retry | fail
  • Retry policy: attempts, backoff, cutoff
  • Alerting: none | email/on-call | dashboard-only
  • Owner: team or role
  • Metrics: counters, error rate thresholds
  • Retention: how long to keep quarantined data
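
The template translates naturally into code or config. A minimal sketch as a Python dataclass, with field names mirroring the template (the defaults are illustrative):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ErrorRule:
        rule_id: str                       # e.g. "ER-001"
        condition: str                     # exact predicate, e.g. "customer_id IS NULL"
        scope: str                         # "record" or "batch"
        severity: str                      # "warning" | "error" | "critical"
        action: str                        # "quarantine", "default", "dedupe", "retry", ...
        retry_attempts: int = 0
        retry_backoff_seconds: float = 0.0
        alerting: str = "dashboard-only"   # "none" | "email/on-call" | "dashboard-only"
        owner: str = "data-engineering"
        error_rate_threshold: Optional[float] = None  # e.g. 0.02 -> fail job above 2%
        quarantine_retention_days: int = 30

    # Example instance matching worked example 1 below
    er_001 = ErrorRule(
        rule_id="ER-001",
        condition="customer_id IS NULL OR email does not match the agreed regex",
        scope="record",
        severity="error",
        action="quarantine",
        alerting="email/on-call",
        error_rate_threshold=0.02,
    )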

Thresholds you should define

  • Per-rule error rate thresholds (e.g., reject rate > 2% -> fail job)
  • Batch-level stop conditions (e.g., missing required column -> fail-fast)
  • Operational retry/backoff limits (e.g., 3 attempts, exponential backoff up to 10 minutes)
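
Once counters are in place, the batch-level check itself stays small. A sketch using the 2% example above (the threshold default and exception type are assumptions):

    def check_batch_thresholds(total_records: int, rejected_records: int,
                               max_reject_rate: float = 0.02) -> None:
        """Fail the job when the rejection rate exceeds the configured threshold."""
        if total_records == 0:
            raise RuntimeError("Empty batch: missing or truncated feed")  # fail-fast
        reject_rate = rejected_records / total_records
        if reject_rate > max_reject_rate:
            raise RuntimeError(
                f"Reject rate {reject_rate:.1%} exceeds {max_reject_rate:.0%}; failing job"
            )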

Where to place rules in the pipeline

  • Ingest/landing: schema validation, format checks, basic type casting
  • Staging: data quality checks, deduplication, business rule validation
  • Dimensional/serving: referential integrity resolution, upsert conflict handling
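
For the ingest layer, the schema gate can be a simple required-column check that fails fast before any rows are processed (the required set is an example for the orders feed):

    REQUIRED_COLUMNS = {"order_id", "customer_id", "order_date", "updated_at"}

    def check_schema(header: list) -> None:
        """Fail fast when required columns are missing from the incoming file."""
        missing = REQUIRED_COLUMNS - set(header)
        if missing:
            raise RuntimeError(f"Schema error: missing required columns {sorted(missing)}")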

Worked examples

Example 1: Invalid emails and missing customer_id

  • Condition: customer_id is null OR email invalid format
  • Scope: record-level
  • Severity: error
  • Action: reject to quarantine storage with masked email; increment counters
  • Batch rule: if rejected_records_rate > 2%, fail job and alert owner

Why this works

Bad identifiers break joins; invalid emails are PII-sensitive but typically non-critical unless required for downstream operations. The rate threshold prevents silently ingesting a broken feed.
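
A sketch of this rule in Python (the email regex is deliberately simple, and the masking scheme is an assumption; adapt both to your own standards):

    import re

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simple check, not RFC-complete

    def mask_email(email: str) -> str:
        """Keep the first character and the domain; mask the rest for logs."""
        local, _, domain = email.partition("@")
        return f"{local[:1]}***@{domain}" if domain else "***"

    def validate_order(record: dict, quarantine: list) -> bool:
        """ER-001: reject records with a missing customer_id or an invalid email."""
        email = record.get("email") or ""
        if not record.get("customer_id") or not EMAIL_RE.match(email):
            quarantine.append({**record, "email": mask_email(email)})
            return False
        return True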

Example 2: Duplicate order_id within a day

  • Condition: multiple records share the same order_id
  • Scope: record-level resolved via grouping
  • Action: deduplicate by keeping the record with the latest updated_at; log duplicates metric
  • Severity: warning unless duplicates > 0.5% of batch, then error with alert

Why this works

Many sources emit corrections. A last-write-wins policy plus metrics balances freshness and safety.
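
A sketch of last-write-wins deduplication on order_id (assumes updated_at values are directly comparable, e.g. ISO-8601 strings or datetimes):

    def dedupe_orders(records: list) -> tuple:
        """Keep the record with the latest updated_at per order_id; count duplicates."""
        latest = {}
        duplicates = 0
        for rec in records:
            key = rec["order_id"]
            if key in latest:
                duplicates += 1
                if rec["updated_at"] > latest[key]["updated_at"]:
                    latest[key] = rec  # last write wins
            else:
                latest[key] = rec
        return list(latest.values()), duplicates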

Example 3: Missing dimension row (late-arriving customer)

  • Condition: orders.customer_id not found in dim_customer
  • Scope: record-level
  • Action: route to suspense table; schedule retries for up to 72 hours
  • Escalation: after expiry, move to dead-letter with incident ticket
  • Severity: error

Why this works

Late-arriving dimensions are common. Suspense and retry preserve integrity without blocking all orders.
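
A sketch of the per-retry decision for suspended orders, assuming each suspense row records when it was first parked (table and column names are illustrative):

    from datetime import datetime, timedelta, timezone

    RETRY_WINDOW = timedelta(hours=72)

    def resolve_suspense(row: dict, known_customer_ids: set) -> str:
        """Decide what to do with a suspended order on each scheduled retry."""
        if row["customer_id"] in known_customer_ids:
            return "load"          # dimension has arrived: release the order downstream
        # first_suspended_at is assumed to be a timezone-aware datetime
        age = datetime.now(timezone.utc) - row["first_suspended_at"]
        if age > RETRY_WINDOW:
            return "dead_letter"   # expired: move to dead-letter and open a ticket
        return "retry_later"       # still inside the 72-hour window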

Example 4: Target load primary key conflict

  • Condition: insert violates PK on target
  • Scope: batch-level decision, record-level resolution
  • Action: switch to upsert/merge; if conflict persists for the same key within the same run, mark critical and fail to avoid loops
  • Severity: error -> critical on repeat

Why this works

Upserts handle legitimate updates; repeated conflicts often indicate mapping bugs that should stop the pipeline.
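
A sketch of the upsert path using SQLite's INSERT ... ON CONFLICT syntax, which PostgreSQL shares almost verbatim (column names come from the exercise dataset; the order dict is assumed to carry exactly the four bound columns, and repeat-conflict tracking is left out for brevity):

    import sqlite3

    def upsert_order(conn: sqlite3.Connection, order: dict) -> None:
        """Insert the order, or update the existing row when the primary key already exists."""
        conn.execute(
            """
            INSERT INTO orders (order_id, customer_id, total_amount, updated_at)
            VALUES (:order_id, :customer_id, :total_amount, :updated_at)
            ON CONFLICT(order_id) DO UPDATE SET
                customer_id  = excluded.customer_id,
                total_amount = excluded.total_amount,
                updated_at   = excluded.updated_at
            """,
            order,
        )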

Step-by-step: define your rules

  1. List source contracts: required fields, types, keys, expected volumes.
  2. Enumerate failure modes: nulls, bad formats, duplicates, missing refs, schema changes, load conflicts, timeouts.
  3. Choose actions per failure mode with severities and thresholds.
  4. Decide retry and backoff for transient errors; define cutoffs.
  5. Specify logging, metrics, and PII masking.
  6. Document owners and alert paths.
  7. Pilot on a historical sample; tune thresholds; finalize.

Exercises

Do the exercise below, then compare with the sample solution.

Exercise 1 (mirrors the test exercise ex1)
Dataset: orders(order_id, customer_id, email, order_date, updated_at, total_amount)
  • Define 6–8 error handling rules covering data quality, duplicates, ref integrity, and operational failures.
  • Include condition, severity, action, threshold (if any), and alerting.

Checklist before you move on
  • Every rule has a clear condition and action
  • You set at least one batch-level threshold
  • PII is masked in logs
  • Idempotent behavior is described (dedupe or upsert)
  • Owners and metrics are assigned

Common mistakes and how to self-check

  • Vague conditions: replace "bad email" with an explicit regex or validator.
  • Only record-level rules: add batch thresholds to catch feed-wide issues.
  • No retry cutoffs: infinite retries hide real failures; set max attempts and escalate.
  • Unmasked logs: ensure sensitive fields are hashed or partially masked.
  • Missing owners: every actionable alert should route to a person/team.
  • Non-idempotent retries: verify that reprocessing won’t duplicate records.

Self-check prompts
  • Can a second run of the same batch produce duplicate outputs?
  • What happens if 30% of records fail validation?
  • Would an on-call engineer know what to do from the alert?

Practical projects

  • Build a quarantine-and-report step for a sample CSV feed; generate a daily error summary with counts by rule and top offenders.
  • Implement a suspense table for missing dimension keys with a scheduled retry; auto-close items that resolve.
  • Create a rule configuration file (YAML/JSON) and a small validator that enforces those rules on a sample dataset.

Who this is for

  • ETL Developers and Data Engineers defining pipeline quality gates
  • Analysts/DBAs tasked with monitoring data quality

Prerequisites

  • Basic SQL (joins, constraints, upsert/merge)
  • Understanding of data types and common file formats (CSV/JSON/Parquet)

Learning path

  • Start: Source profiling and schema validation basics
  • Then: Define record-level rules and batch thresholds
  • Next: Implement quarantine, suspense, and retry patterns
  • Finally: Add metrics, dashboards, and alerts

Next steps

  • Finish the exercise and take the quick test
  • Add your rules to a config file and validate on a real sample
  • Review rules with stakeholders and refine thresholds

Mini challenge

You ingest a daily customers feed. Today, 15% of rows are missing last_name, and a new optional column, marketing_opt_in, has appeared. Draft two rules: one batch-level decision for the spike in missing last_name, and one schema rule for the new column. Specify severity, action, and alerts.

Practice Exercises

1 exercise to complete

Instructions

Given dataset orders(order_id, customer_id, email, order_date, updated_at, total_amount):

  • Write 6–8 rules with: Condition, Severity, Action, Threshold (if any), Alerting, Owner.
  • Cover data quality (nulls, formats), duplicates, referential integrity, schema change, load conflicts, and operational errors.
  • Assume PII masking for email in logs.

Expected Output

A clear set of rules, one per line. Example format:

  • ER-001 Missing customer_id -> error -> reject to quarantine; batch fail if >2% rejected; alert data-oncall; owner: Data Eng.
  • ER-002 Invalid email format -> warning -> reject; no fail unless >5% in batch.
  • ER-003 Duplicate order_id -> dedupe, keep latest updated_at; warning unless >0.5%, then error + alert.
  • ER-004 order_date in future >24h -> error -> reject; track metric.
  • ER-005 Missing customer in dim_customer -> suspense + retry 72h; escalate if unresolved.
  • ER-006 Target PK conflict -> use upsert; if same key conflicts twice in run -> critical fail.
  • ER-007 Schema missing required column order_id -> critical -> fail-fast.
  • ER-008 Ingest timeout -> retry 3x exponential backoff; if still failing -> error + alert.

Defining Error Handling Rules — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
