Why this matters
In real ETL pipelines, not every record is clean. Some violate schemas, fail validation rules, or arrive before the records they depend on. Reject handling and quarantine tables let you:
- Prevent bad data from contaminating production fact and dimension tables.
- Capture failed records with reasons for audit and reprocessing.
- Support SLAs: deliver what is valid on time, while triaging the rest.
- Enable targeted fixes: data stewards can correct bad records and replay them.
Typical ETL Developer tasks this enables: routing invalid rows, maintaining reason codes, building reprocessing jobs, and reporting on data quality trends.
Who this is for
- ETL Developers and Data Engineers building batch or streaming pipelines.
- Analytics engineers and DBAs responsible for data integrity.
- Data stewards who need traceable error capture and repair flows.
Prerequisites
- Comfort with SQL DDL/DML and basic data modeling.
- Understanding of ETL/ELT stages: landing, staging, transform, load.
- Familiarity with validation rules (nullability, referential integrity, type checks).
Concept explained simply
Think of your pipeline like airport security: valid passengers board; others are held for extra checks. A quarantine table is the holding area for records that fail validation. You store the record, the reason, and the context to fix it later.
- Reject: the act of withholding a record from target tables.
- Quarantine table: a table where rejected records are stored with metadata.
- Reprocess: after fixing either data or rules, move the record from quarantine to the target.
Mental model
Pipeline = conveyor belt. At each checkpoint, you stamp a pass or a reason_code. Passed items continue; failed items go into a labeled bin (quarantine) with a note. A later process reads the bins, fixes items, and puts them back on the belt at the right spot.
Core patterns you should know
- Row-level rejects vs. file-level quarantine: sometimes one row fails; sometimes the entire file is unreadable.
- Soft vs. hard validation: soft (e.g., unknown enum) may be fixable; hard (e.g., invalid JSON) requires remediation at source.
- Reason codes taxonomy: concise, stable codes like RI_MISSING_PARENT, TYPE_MISMATCH, BUSINESS_RULE_VIOLATION.
- Idempotent reprocessing: ensure a record loaded after a fix does not double-insert (see the MERGE sketch after this list).
- Traceability: always carry batch_id, load_ts, source_file, checksum, and a raw_payload snapshot.
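Example: idempotent reprocessing (generic SQL)
A minimal sketch of the idempotent-reprocessing pattern. The target fact_orders, the staging table stg_orders_fixed (holding re-validated rows parsed from quarantine), and the column names are assumptions; the dialect is assumed to support standard MERGE.
-- Natural-key match prevents double-inserts when fixed rows are replayed.
MERGE INTO fact_orders AS t
USING stg_orders_fixed AS s
  ON t.business_key = s.business_key
WHEN MATCHED THEN
  UPDATE SET order_amount = s.order_amount, load_ts = s.load_ts
WHEN NOT MATCHED THEN
  INSERT (business_key, order_amount, load_ts)
  VALUES (s.business_key, s.order_amount, s.load_ts);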
Designing quarantine tables (practical schema)
Include these columns to support audit, troubleshooting, and reprocessing:
- surrogate_id (auto), batch_id, load_ts
- source_system, source_entity, source_file, record_number
- reason_code, reason_details
- raw_payload (string/JSON) and/or normalized columns for key fields
- hash or natural keys (e.g., business_key, external_id)
- triage_status (NEW, FIXED, REPROCESSED, DISCARDED)
- triage_ts, triage_user
Example DDL (generic SQL)
CREATE TABLE dq_quarantine_orders (
surrogate_id BIGINT GENERATED ALWAYS AS IDENTITY,
batch_id VARCHAR(64) NOT NULL,
load_ts TIMESTAMP NOT NULL,
source_system VARCHAR(64),
source_entity VARCHAR(64),
source_file VARCHAR(512),
record_number BIGINT,
reason_code VARCHAR(64) NOT NULL,
reason_details VARCHAR(1000),
business_key VARCHAR(128),
external_id VARCHAR(128),
raw_payload TEXT,
triage_status VARCHAR(32) DEFAULT 'NEW',
triage_ts TIMESTAMP,
triage_user VARCHAR(128),
PRIMARY KEY (surrogate_id)
);
-- Recommended indexes
-- CREATE INDEX ix_q_orders_batch ON dq_quarantine_orders(batch_id);
-- CREATE INDEX ix_q_orders_reason ON dq_quarantine_orders(reason_code);
-- CREATE INDEX ix_q_orders_status ON dq_quarantine_orders(triage_status);
Partitioning tips
- Partition by load_date or load_ts for pruning.
- Optionally sub-partition by reason_code for fast triage queries.
- Keep raw_payload for full fidelity; compress if supported.
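Example: partitioned quarantine table (PostgreSQL-style)
A hedged sketch of range partitioning on load_ts using PostgreSQL-style declarative partitioning; the column list is abbreviated from the DDL above, and syntax varies by platform.
CREATE TABLE dq_quarantine_orders_p (
  surrogate_id BIGINT GENERATED ALWAYS AS IDENTITY,
  batch_id VARCHAR(64) NOT NULL,
  load_ts TIMESTAMP NOT NULL,
  reason_code VARCHAR(64) NOT NULL,
  raw_payload TEXT,
  triage_status VARCHAR(32) DEFAULT 'NEW'
) PARTITION BY RANGE (load_ts);
-- One partition per month; create ahead of time or automate creation.
CREATE TABLE dq_quarantine_orders_2024_06
  PARTITION OF dq_quarantine_orders_p
  FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');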
Worked examples
Example 1: Email validation for Customers
- Rule: email must match a simple pattern and be non-null if contact_opt_in = true.
- Process: SELECT valid rows into dim_customer; INSERT invalid rows into dq_quarantine_customers with reason_code EMAIL_INVALID or EMAIL_REQUIRED (routing SQL sketched below).
- Reprocess: corrected emails revalidated and moved into dim_customer.
What goes to quarantine?
- Null email with contact_opt_in = true → EMAIL_REQUIRED
- email without '@' or bad domain → EMAIL_INVALID
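Example routing SQL for Example 1 (illustrative)
A sketch of the split, assuming a staging table stg_customers, a boolean-style contact_opt_in column, and a deliberately simple LIKE pattern; real pipelines usually use a regex or a dedicated validation function.
-- Route failures to quarantine with a reason code.
INSERT INTO dq_quarantine_customers
  (batch_id, load_ts, source_entity, business_key, reason_code, reason_details, raw_payload)
SELECT
  s.batch_id,
  CURRENT_TIMESTAMP,
  'customer',
  s.customer_id,
  CASE
    WHEN s.email IS NULL THEN 'EMAIL_REQUIRED'
    ELSE 'EMAIL_INVALID'
  END,
  'email failed validation',
  s.raw_payload
FROM stg_customers s
WHERE (s.email IS NULL AND COALESCE(s.contact_opt_in, FALSE))
   OR (s.email IS NOT NULL AND s.email NOT LIKE '%_@_%._%');
-- Load only the rows that pass both rules.
INSERT INTO dim_customer (business_key, email, contact_opt_in)
SELECT s.customer_id, s.email, s.contact_opt_in
FROM stg_customers s
WHERE NOT ( (s.email IS NULL AND COALESCE(s.contact_opt_in, FALSE))
         OR (s.email IS NOT NULL AND s.email NOT LIKE '%_@_%._%') );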
Example 2: Orders missing Customer
- Rule: orders.customer_id must exist in dim_customer.business_key.
- Process: LEFT JOIN to dim_customer; where the match is null, route to dq_quarantine_orders with reason_code RI_MISSING_PARENT and include customer_id in raw_payload (sketched below).
- Reprocess: once the customer dimension is updated, retry those orders.
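Example routing SQL for Example 2 (illustrative)
A sketch of the referential-integrity split; stg_orders and the column names are assumptions.
-- Orders whose customer is not yet in the dimension go to quarantine.
INSERT INTO dq_quarantine_orders
  (batch_id, load_ts, source_entity, business_key, reason_code, reason_details, raw_payload)
SELECT
  o.batch_id,
  CURRENT_TIMESTAMP,
  'order',
  o.order_id,
  'RI_MISSING_PARENT',
  'customer_id not found in dim_customer',
  o.raw_payload
FROM stg_orders o
LEFT JOIN dim_customer c
  ON c.business_key = o.customer_id
WHERE c.business_key IS NULL;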
Example 3: Schema drift in event JSON
- Rule: the required field event_type is present and amount is numeric.
- Process: if the JSON fails to parse → JSON_INVALID; if a required key is missing → SCHEMA_REQUIRED_MISSING; if amount is not numeric → TYPE_MISMATCH.
- Reprocess: update parser or mapping; replay quarantined raw_payloads.
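Example routing SQL for Example 3 (illustrative)
A sketch using SQL Server-style JSON functions (ISJSON, JSON_VALUE, TRY_CAST); other platforms have equivalents, and raw_events plus its columns are assumptions. The CASE checks rules in order, so the first failure becomes the reason_code.
INSERT INTO dq_quarantine_events
  (batch_id, load_ts, source_entity, reason_code, raw_payload)
SELECT v.batch_id, CURRENT_TIMESTAMP, 'event', v.reason_code, v.payload
FROM (
  SELECT
    e.batch_id,
    e.payload,
    CASE
      WHEN ISJSON(e.payload) = 0 THEN 'JSON_INVALID'
      WHEN JSON_VALUE(e.payload, '$.event_type') IS NULL THEN 'SCHEMA_REQUIRED_MISSING'
      WHEN TRY_CAST(JSON_VALUE(e.payload, '$.amount') AS DECIMAL(18,2)) IS NULL THEN 'TYPE_MISMATCH'
    END AS reason_code
  FROM raw_events e
) v
WHERE v.reason_code IS NOT NULL;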
Step-by-step setup
- List validation rules: nullability, types, ranges, referential integrity, business constraints.
- Define reason codes: short codes + human-friendly reason_details.
- Create quarantine tables: include metadata and raw_payload.
- Route logic: in SQL, use CASE WHEN expressions or conditional inserts; in code, if/else branches; in streaming, dead-letter queues feeding quarantine sinks.
- Reprocessing job: reads NEW records, revalidates them, loads them to the target, and updates triage_status (see the sketch after this list).
- Monitoring: counts by reason_code and batch; alert on spikes.
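Example: closing out and monitoring quarantine (illustrative)
Two sketches, assuming stewards mark corrected rows FIXED and the reload itself reuses an idempotent MERGE like the one shown earlier.
-- After a successful reload, close out the reprocessed rows.
UPDATE dq_quarantine_orders
SET triage_status = 'REPROCESSED',
    triage_ts = CURRENT_TIMESTAMP
WHERE triage_status = 'FIXED';
-- Monitoring: quarantine volume by batch and reason for alerting.
SELECT batch_id, reason_code, COUNT(*) AS quarantined_rows
FROM dq_quarantine_orders
WHERE triage_status = 'NEW'
GROUP BY batch_id, reason_code
ORDER BY batch_id DESC, quarantined_rows DESC;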
Operations: triage and SLAs
- Daily review of NEW quarantine counts and top reason_codes.
- Document playbooks for each reason_code.
- Set auto-expiry/discard policy for records that cannot be fixed after X days (compliance permitting).
- Audit trail: keep who fixed what, when, and why.
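Example: auto-expiry job (illustrative)
A sketch of a 30-day expiry policy; the interval syntax and the service account name are assumptions, and the window should follow your retention and compliance rules.
-- Discard quarantine rows that have sat unfixed for more than 30 days.
UPDATE dq_quarantine_orders
SET triage_status = 'DISCARDED',
    triage_ts = CURRENT_TIMESTAMP,
    triage_user = 'auto_expiry_job'
WHERE triage_status = 'NEW'
  AND load_ts < CURRENT_TIMESTAMP - INTERVAL '30' DAY;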
Exercises
These are hands-on. You can complete them in any SQL environment or on paper.
Exercise 1: Design a quarantine table
See the Exercises section below for full instructions and a solution you can expand.
Exercise 2: Write routing logic
Implement how records are split between target and quarantine with reason codes.
Checklist
- I defined clear reason codes and mapped them to validation rules.
- My quarantine table captures batch_id, timestamps, and raw_payload.
- I planned a reprocessing workflow with idempotency.
Common mistakes and how to self-check
- Missing context: Only storing a couple of columns. Self-check: Can someone fix the record from quarantine alone?
- No reason taxonomy: Free-text reasons make reporting impossible. Self-check: Are reason codes standardized and documented?
- One-way street: No reprocessing path. Self-check: Is there a job to retry after fixes, marking records REPROCESSED?
- Duplicates on replay: Not using keys/hashes. Self-check: Are loads idempotent with natural keys or hashes?
- Over-quarantining: Failing records for soft rules that could be handled with defaults. Self-check: Could some rules apply a default value plus a warning flag instead of rejecting?
Practical projects
- Build a customer ingestion pipeline with email and country code validation, quarantine invalid rows, and a retry job.
- Create a dashboard showing daily quarantine volumes by reason_code and source_system.
- Implement a file-level quarantine for unreadable files with a manifest table and remediation notes.
Learning path
- Before: Data profiling and validation rules.
- Now: Reject handling and quarantine tables (this lesson).
- After: Automated reprocessing, alerting, and DQ SLAs. Then progress to data lineage and observability.
Next steps
- Finalize your reason code taxonomy and share with your team.
- Add partitioning and indexes to your quarantine tables.
- Schedule the reprocessing job and simulate a spike to test alerts.
Quick Test
Take the Quick Test below to check your understanding. Everyone can take it; only logged-in users will see saved progress.
Mini challenge
You ingest 2 million order lines daily. Today, 8% land in quarantine with reason_code TYPE_MISMATCH for amount. What do you do in the next 60 minutes? Write a short triage plan with: checks to run, who to notify, what to temporarily relax (if safe), and how to prevent duplicates on replay.