Who this is for
ETL Developers, Data Engineers, and Analytics Engineers who need to ship reliable data pipelines to dev, test, and production without breaking dashboards or downstream jobs.
Prerequisites
- Basic ETL/ELT knowledge (ingest, transform, load)
- Comfort with version control (e.g., Git)
- Familiarity with scheduling/orchestration concepts (e.g., cron, DAGs)
Why this matters
In real ETL work, changes can impact production data, SLAs, and business reporting. Good deployment practices prevent broken pipelines, bad data, and on-call pages. Typical tasks include:
- Promoting a new transformation from dev to production safely
- Deploying a schema change without breaking downstream consumers
- Running backfills with guardrails and rollbacks
- Configuring secrets and environment-specific settings securely
Concept explained simply
Deployment is how your ETL changes move from your laptop to production. You package your change, test it, roll it out in stages, watch it closely, and have a plan to roll it back if something goes wrong.
Mental model
Think of deployment like a controlled bridge crossing for data. Gates check your cargo (tests), speed limits keep you safe (phased rollout), and there is a turn-around lane (rollback) if you spot a hazard.
Core concepts to know
- Environments: dev, staging/test, production
- Configuration and secrets: environment variables, secret managers
- Packaging and versioning: artifacts, semantic versions, immutability
- CI/CD basics: automated build, test, deploy; approvals
- Safe rollout patterns: canary, blue/green, feature flags
- Idempotency and re-run safety (see the sketch after this list)
- Schema migration strategy: backward compatibility, deprecation windows
- Monitoring and alerting: smoke tests, data quality checks, rollbacks
- Runbooks and change management
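Idempotency is the concept that most often breaks on retries, so here is a minimal sketch of a re-run-safe partition load, assuming psycopg2, a Postgres-style warehouse, and a hypothetical events_staging table; the delete-then-insert pattern means rerunning yesterday's partition overwrites it instead of duplicating it.
# Hypothetical sketch: re-run-safe load of one date partition.
import psycopg2

def load_partition(dsn, run_date, rows):
    # rows is an iterable of (user_id, amount) tuples for run_date.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            # Remove whatever a previous (possibly failed) run wrote for this date.
            cur.execute("DELETE FROM events_staging WHERE event_date = %s", (run_date,))
            cur.executemany(
                "INSERT INTO events_staging (event_date, user_id, amount) VALUES (%s, %s, %s)",
                [(run_date, user_id, amount) for user_id, amount in rows],
            )
    # psycopg2 commits the transaction when the connection block exits without an error.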
Worked examples
Example 1: Config-driven job across dev, staging, prod
Goal: Same code, different configs per environment.
# config.dev.yaml
source_db: "postgres://dev-user@dev-db:5432/app"
warehouse_schema: "staging_dev"
write_mode: "append"
# config.prod.yaml
source_db: "postgres://etl-user@prod-db:5432/app"
warehouse_schema: "staging_prod"
write_mode: "merge"
Deploy steps:
- Commit code and configs (without secrets) to repo.
- CI builds artifact with version tag (e.g., 1.4.0).
- CD deploys the artifact to dev and injects CONFIG=config.dev.yaml.
- Run a smoke job; if it is green, promote the same artifact to staging, then prod.
Benefit: One artifact, many environments via config only.
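A minimal sketch of the config-selection step, assuming PyYAML and the CONFIG environment variable injected by CD as described above; the file names and keys mirror the example configs, everything else is illustrative.
# Hypothetical sketch: pick the environment config via CONFIG, never by editing code.
import os
import yaml  # PyYAML

def load_config():
    # CD injects CONFIG=config.dev.yaml or CONFIG=config.prod.yaml.
    path = os.environ.get("CONFIG", "config.dev.yaml")
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config()
print(config["warehouse_schema"], config["write_mode"])  # e.g. staging_dev append
Secrets such as database passwords would still come from the environment or a secret manager, not from the committed YAML files.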
Example 2: Safe Airflow DAG rollout with canary
- Create DAG v2 with a feature flag to write to a shadow table, not the main table.
- Deploy to staging, run a day of data, compare shadow vs expected counts and business metrics.
- In prod, run the DAG only for a small subset (e.g., one partition) as a canary.
- If the canary meets its thresholds, flip the flag to write to the main table and monitor for 24 hours.
- Remove flag and shadow artifacts after validation window.
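A minimal sketch of the feature-flag branch in Example 2, assuming Airflow 2.4+; the Variable name write_to_main_table, the table names, and the _load placeholder are illustrative assumptions, not a prescribed API for your pipeline.
# Hypothetical sketch: an Airflow task that writes to a shadow table until the flag is flipped.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

def _load(target_table, ds):
    # Placeholder for the real load logic.
    print(f"loading partition {ds} into {target_table}")

def load_orders(**context):
    # Flip this Airflow Variable from "false" to "true" once the canary passes its thresholds.
    write_to_main = Variable.get("write_to_main_table", default_var="false") == "true"
    target = "analytics.orders" if write_to_main else "analytics.orders_shadow"
    _load(target, context["ds"])

with DAG(
    dag_id="orders_v2",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="load_orders", python_callable=load_orders)
Keeping the flag in an Airflow Variable means writes can be pointed at the main table without redeploying code, and pointed back just as quickly if metrics degrade.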
Example 3: Backward-compatible schema change
Goal: Add column currency_code without breaking readers.
- Phase 1 (Add): Add the column as nullable and start writing it; old readers keep working.
- Phase 2 (Dual-write): Keep both the old and the new write logic populating data; communicate the migration timeline.
- Phase 3 (Deprecate): After consumers migrate, remove the old column or logic in a later release.
Rollback: If metrics degrade, revert to the previous artifact version and stop writing the new column.
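A minimal sketch of the Phase 1 migration, assuming PostgreSQL (which supports ADD COLUMN IF NOT EXISTS), psycopg2, a hypothetical orders table, and a WAREHOUSE_DSN environment variable; written this way the migration stays additive and re-runnable.
# Hypothetical sketch: additive, re-runnable Phase 1 migration for currency_code.
import os

import psycopg2

def add_currency_code_column():
    with psycopg2.connect(os.environ["WAREHOUSE_DSN"]) as conn:
        with conn.cursor() as cur:
            # Nullable column: existing readers and writers keep working unchanged.
            cur.execute("ALTER TABLE orders ADD COLUMN IF NOT EXISTS currency_code TEXT")

if __name__ == "__main__":
    add_currency_code_column()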
Step-by-step: a safe ETL deployment
- Prepare: Open a PR with description, risks, and rollback plan.
- Build: Package code into an immutable artifact and tag version (e.g., 1.5.2).
- Validate: Run unit tests, SQL linting, and data contracts checks.
- Deploy to staging: Apply config for staging, run smoke tests.
- Canary in prod: Run for a narrow scope (one partition/date).
- Promote: Expand to full prod if canary passes thresholds.
- Monitor: Track SLAs, row counts, null rates, business KPIs; keep a rollback window.
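As a sketch of the canary-then-promote gate in the steps above, the snippet below compares the canary partition's row count against a recent production baseline and exits non-zero so CI/CD can block promotion; the table names, partition column, and 5% tolerance are illustrative assumptions.
# Hypothetical sketch: block promotion if the canary partition drifts too far from baseline.
import os
import sys

import psycopg2

TOLERANCE = 0.05  # assumed 5% allowed deviation in row counts

def partition_count(cur, table, ds):
    cur.execute(f"SELECT count(*) FROM {table} WHERE event_date = %s", (ds,))
    return cur.fetchone()[0]

def main(canary_date, baseline_date):
    with psycopg2.connect(os.environ["WAREHOUSE_DSN"]) as conn:
        with conn.cursor() as cur:
            canary = partition_count(cur, "analytics.orders_canary", canary_date)
            baseline = partition_count(cur, "analytics.orders", baseline_date)
    if baseline == 0 or abs(canary - baseline) / baseline > TOLERANCE:
        print(f"canary {canary} vs baseline {baseline}: outside tolerance, blocking promotion")
        return 1
    print(f"canary {canary} vs baseline {baseline}: within tolerance")
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))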
What should a smoke test include?
- Row count is above a minimum and below a sanity maximum
- No unexpected nulls in key columns
- Uniqueness for primary keys or natural keys holds
- Job runtime under agreed threshold
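A minimal smoke-test sketch covering these checks, assuming psycopg2, a hypothetical staging_prod.dim_channel target, and placeholder thresholds you would agree with consumers; any failure should fail the deploy.
# Hypothetical sketch: smoke tests with measurable thresholds.
import os

import psycopg2

TABLE = "staging_prod.dim_channel"      # assumed target table
MIN_ROWS, MAX_ROWS = 1_000, 10_000_000  # assumed sanity bounds

def run_smoke_tests():
    failures = []
    with psycopg2.connect(os.environ["WAREHOUSE_DSN"]) as conn:
        with conn.cursor() as cur:
            cur.execute(f"SELECT count(*) FROM {TABLE}")
            rows = cur.fetchone()[0]
            if not MIN_ROWS <= rows <= MAX_ROWS:
                failures.append(f"row count {rows} outside [{MIN_ROWS}, {MAX_ROWS}]")
            cur.execute(f"SELECT count(*) FROM {TABLE} WHERE channel_id IS NULL")
            if cur.fetchone()[0] > 0:
                failures.append("unexpected NULLs in channel_id")
            cur.execute(f"SELECT count(*) - count(DISTINCT channel_id) FROM {TABLE}")
            if cur.fetchone()[0] > 0:
                failures.append("duplicate channel_id values")
    return failures

if __name__ == "__main__":
    problems = run_smoke_tests()
    for p in problems:
        print("SMOKE TEST FAILED:", p)
    raise SystemExit(1 if problems else 0)
The runtime threshold is typically enforced as a timeout on the job or CI step itself, so it is omitted here.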
What to log and alert on
- Start/end times, record counts in/out
- Error details with correlation IDs
- Data quality check results and thresholds
- Backfill progress and partial failures
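A minimal logging sketch, assuming Python's standard logging module; the correlation ID here is just a generated UUID attached to every log line of one run, which is one common way to tie start/end events, record counts, and errors together.
# Hypothetical sketch: structured run logging with a correlation ID and in/out record counts.
import logging
import time
import uuid

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("etl.orders")

def run_job(records_in):
    run_id = uuid.uuid4().hex  # correlation ID shared by all log lines of this run
    start = time.time()
    log.info("run_id=%s event=start records_in=%d", run_id, records_in)
    try:
        records_out = records_in  # placeholder for the actual transform/load
        log.info("run_id=%s event=end records_out=%d duration_s=%.1f",
                 run_id, records_out, time.time() - start)
    except Exception:
        # The correlation ID lets an alert be traced back to this exact run.
        log.exception("run_id=%s event=error", run_id)
        raise

run_job(records_in=42)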
Exercises
Do these to build muscle memory. You can take the quick test anytime.
Exercise 1: Draft a deployment plan for a new dimension table
Create a short, actionable plan to add a new dim_channel table.
- Environments and sequence
- Smoke tests and monitoring
- Rollback steps
Exercise 2: Write env-specific configs and a parameterized run command
Produce dev and prod configs and a command that selects the right config via an environment variable.
Checklist before you move on
- Plan includes canary or phased rollout
- Smoke tests have measurable thresholds
- Rollback references a specific artifact version
- No secrets committed to files; referenced via environment or a secret manager
Common mistakes and self-check
- Pushing directly to prod without staging validation. Self-check: Did this exact artifact pass a staging run?
- Missing idempotency causing duplicate rows on retries. Self-check: Can you safely rerun yesterday’s partition?
- Hard-coded configs inside code. Self-check: Can you switch environments by changing only variables?
- No rollback plan. Self-check: Which version will you revert to and how fast?
- Unmonitored backfills. Self-check: Are there progress logs and partial-failure alerts?
Practical projects
- Build a small pipeline that reads a CSV, transforms it, and loads to a warehouse schema using env-specific configs and a canary run.
- Implement a data quality check suite (row counts, null checks) and fail the deployment if checks exceed thresholds.
- Create a rollback script that reverts a target table to the last successful snapshot.
Mini challenge
You must add a new required column to a prod table used by multiple teams. How do you deploy it this week without breaking readers? Write a 5-step plan that includes backward compatibility, communication, and a removal timeline.
Learning path
- Start: Deployment basics (this page)
- Next: Data quality gates and contracts
- Then: Backfills and reprocessing strategies
- Later: Infra-as-code and containerization fundamentals
Next steps
- Complete the exercises and keep your notes
- Take the quick test to check understanding
- Apply these steps to your next real deployment with a peer review