
Deployment Practices Basics

Learn Deployment Practices Basics for free with explanations, exercises, and a quick test (for ETL Developers).

Published: January 11, 2026 | Updated: January 11, 2026

Who this is for

ETL Developers, Data Engineers, and Analytics Engineers who need to ship reliable data pipelines to dev, test, and production without breaking dashboards or downstream jobs.

Prerequisites

  • Basic ETL/ELT knowledge (ingest, transform, load)
  • Comfort with version control (e.g., Git)
  • Familiarity with scheduling/orchestration concepts (e.g., cron, DAGs)

Why this matters

In real ETL work, changes can impact production data, SLAs, and business reporting. Good deployment practices prevent broken pipelines, bad data, and on-call pages. Typical tasks include:

  • Promoting a new transformation from dev to production safely
  • Deploying a schema change without breaking downstream consumers
  • Running backfills with guardrails and rollbacks
  • Configuring secrets and environment-specific settings securely

Concept explained simply

Deployment is how your ETL changes move from your laptop to production. You package your change, test it, roll it out in stages, watch it closely, and have a plan to roll it back if something goes wrong.

Mental model

Think of deployment like a controlled bridge crossing for data. Gates check your cargo (tests), speed limits keep you safe (phased rollout), and there is a turn-around lane (rollback) if you spot a hazard.

Core concepts to know

  • Environments: dev, staging/test, production
  • Configuration and secrets: environment variables, secret managers
  • Packaging and versioning: artifacts, semantic versions, immutability
  • CI/CD basics: automated build, test, deploy; approvals
  • Safe rollout patterns: canary, blue/green, feature flags
  • Idempotency and re-run safety (see the sketch after this list)
  • Schema migration strategy: backward compatibility, deprecation windows
  • Monitoring and alerting: smoke tests, data quality checks, rollbacks
  • Runbooks and change management
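
Idempotency deserves a concrete picture. Here is a minimal sketch of a re-run-safe partition load in Python (table, column, and function names are hypothetical; SQLite stands in for the warehouse): the target partition is deleted and reinserted inside one transaction, so a retry can never duplicate rows.

# idempotent_load.py: re-run-safe partition load (sketch)
import sqlite3

def load_partition(conn: sqlite3.Connection, ds: str, rows: list) -> None:
    with conn:  # one transaction: a failed rerun leaves the partition intact
        conn.execute("DELETE FROM sales WHERE ds = ?", (ds,))  # clear the target partition
        conn.executemany("INSERT INTO sales (ds, order_id, amount) VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ds TEXT, order_id INTEGER, amount REAL)")
rows = [("2026-01-11", 1, 9.99), ("2026-01-11", 2, 4.50)]
load_partition(conn, "2026-01-11", rows)
load_partition(conn, "2026-01-11", rows)  # rerun is safe: still two rows
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # -> 2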

Worked examples

Example 1: Config-driven job across dev, staging, prod

Goal: Same code, different configs per environment.

# config.dev.yaml
source_db: "postgres://dev-user@dev-db:5432/app"
warehouse_schema: "staging_dev"
write_mode: "append"

# config.prod.yaml
source_db: "postgres://etl-user@prod-db:5432/app"
warehouse_schema: "staging_prod"
write_mode: "merge"

Deploy steps:

  1. Commit code and configs (without secrets) to repo.
  2. CI builds artifact with version tag (e.g., 1.4.0).
  3. CD deploys artifact to dev, injects CONFIG=config.dev.yaml.
  4. Run smoke job; if green, promote same artifact to staging, then prod.

Benefit: One artifact, many environments via config only.
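
A minimal sketch of step 3 in Python, assuming the CONFIG environment variable from the deploy steps and PyYAML for parsing (load_config is an illustrative name, not a library call):

# run_job.py: same code everywhere; the environment picks the config
import os
import yaml  # PyYAML, assumed installed

def load_config() -> dict:
    # CD injects CONFIG per environment, e.g. CONFIG=config.prod.yaml
    path = os.environ.get("CONFIG", "config.dev.yaml")
    with open(path) as f:
        return yaml.safe_load(f)

cfg = load_config()
# Secrets (e.g., the DB password) come from the environment or a secret
# manager at runtime, never from the committed config files.
print(f"{cfg['source_db']} -> {cfg['warehouse_schema']} ({cfg['write_mode']})")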

Example 2: Safe Airflow DAG rollout with canary

  1. Create DAG v2 with a feature flag to write to a shadow table, not the main table.
  2. Deploy to staging, run a day of data, compare shadow vs expected counts and business metrics.
  3. In prod, run the DAG only for a small subset (e.g., one partition) as a canary.
  4. If canary matches thresholds, flip the flag to write to main table, monitor for 24 hours.
  5. Remove flag and shadow artifacts after validation window.
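
A sketch of what the flag in steps 1 and 4 can look like, assuming Airflow 2.4+ and an Airflow Variable as the feature flag (DAG, Variable, and table names are illustrative):

# sales_load_v2.py: feature-flagged DAG, shadow table first, main table after canary
from datetime import datetime
from airflow.decorators import dag, task
from airflow.models import Variable

@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def sales_load_v2():
    @task
    def load(ds=None):
        # Flipping the flag (step 4) is a Variable change, not a redeploy
        use_shadow = Variable.get("sales_v2_use_shadow", default_var="true") == "true"
        target = "analytics.sales_shadow" if use_shadow else "analytics.sales"
        # ... run the v2 transformation and write partition `ds` to `target`
        print(f"writing partition {ds} to {target}")

    load()

sales_load_v2()

Because the flag lives outside the artifact, switching back to the shadow table is an instant rollback.
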
Example 3: Backward-compatible schema change

Goal: Add column currency_code without breaking readers.

  1. Phase 1 (Add): Add the column as nullable and start writing it; existing readers keep working.
  2. Phase 2 (Dual-write): Populate outputs with both the old and the new logic; communicate the migration timeline.
  3. Phase 3 (Deprecate): After consumers migrate, remove the old column or logic in a later release.

Rollback: If metrics degrade, revert to the previous artifact version and stop writing the new column.
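
The phases map to simple DDL. A sketch in Python against SQLite for illustration (warehouse syntax varies; the orders table is hypothetical):

# schema_phases.py: backward-compatible column rollout (sketch)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")

# Phase 1 (Add): nullable column; existing readers are unaffected
conn.execute("ALTER TABLE orders ADD COLUMN currency_code TEXT")

# Phase 2 (Dual-write): the loader now populates the new column too
conn.execute("INSERT INTO orders (order_id, amount, currency_code) VALUES (1, 9.99, 'USD')")

# Phase 3 (Deprecate): only in a later release, after all consumers have
# migrated, would the old column or logic be dropped.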

Step-by-step: a safe ETL deployment

  1. Prepare: Open a PR with description, risks, and rollback plan.
  2. Build: Package code into an immutable artifact and tag version (e.g., 1.5.2).
  3. Validate: Run unit tests, SQL linting, and data contract checks.
  4. Deploy to staging: Apply config for staging, run smoke tests.
  5. Canary in prod: Run for a narrow scope (one partition/date).
  6. Promote: Expand to full prod if the canary passes thresholds (gate sketched below).
  7. Monitor: Track SLAs, row counts, null rates, business KPIs; keep a rollback window.
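
A sketch of the gate between steps 5 and 6: compare the canary's output against the current prod baseline and promote only within an agreed tolerance (the row-count metric and 5% tolerance are illustrative):

# canary_gate.py: promote only if the canary stays within tolerance (sketch)
def canary_passes(canary_rows: int, baseline_rows: int, tolerance: float = 0.05) -> bool:
    if baseline_rows == 0:
        return canary_rows == 0
    drift = abs(canary_rows - baseline_rows) / baseline_rows
    return drift <= tolerance

print(canary_passes(10_250, 10_000))  # True: 2.5% drift, promote
print(canary_passes(12_000, 10_000))  # False: 20% drift, hold and investigate
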
What should a smoke test include?
  • Row count is above a minimum and below a sanity maximum
  • No unexpected nulls in key columns
  • Uniqueness for primary keys or natural keys holds
  • Job runtime under agreed threshold
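
Those four checks translate almost directly into code. A minimal sketch (thresholds and names are hypothetical and would normally come from per-environment config; conn is any DB-API connection):

# smoke_test.py: hard gates after a deploy (sketch)
def run_smoke_test(conn, table: str, key: str, min_rows: int, max_rows: int,
                   job_seconds: float, max_seconds: float) -> None:
    n = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    assert min_rows <= n <= max_rows, f"row count {n} outside [{min_rows}, {max_rows}]"

    nulls = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL").fetchone()[0]
    assert nulls == 0, f"{nulls} nulls in key column {key}"

    distinct = conn.execute(f"SELECT COUNT(DISTINCT {key}) FROM {table}").fetchone()[0]
    assert distinct == n, f"{n - distinct} duplicate values in key column {key}"

    assert job_seconds <= max_seconds, f"job took {job_seconds}s, threshold is {max_seconds}s"
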
What to log and alert on
  • Start/end times, record counts in/out
  • Error details with correlation IDs
  • Data quality check results and thresholds
  • Backfill progress and partial failures
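
A sketch of structured logging with a correlation ID, so every alert and error can be traced to a single run (field names are illustrative):

# job_logging.py: structured run logs with a correlation ID (sketch)
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
run_id = str(uuid.uuid4())  # correlation ID shared by every log line of this run

def log_event(event: str, **fields) -> None:
    logging.info(json.dumps({"run_id": run_id, "event": event, "ts": time.time(), **fields}))

log_event("job_start", job="load_sales", partition="2026-01-11")
log_event("quality_check", check="row_count", value=10421, passed=True)
log_event("job_end", rows_in=10421, rows_out=10421, status="success")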

Exercises

Do these to build muscle memory. You can take the quick test anytime. Note: everyone can take the test; only logged-in users get saved progress.

Exercise 1: Draft a deployment plan for a new dimension table

Create a short, actionable plan to add a new dim_channel table to your warehouse.

  • Environments and sequence (dev → staging → prod)
  • Smoke tests with measurable thresholds, and monitoring
  • Rollback steps (artifact version and steps)

Expected output: a step-by-step plan with a phased rollout, smoke tests (row counts, null checks), a monitoring plan, and a rollback to a specific previous artifact.

Exercise 2: Write env-specific configs and a parameterized run command

Produce dev and prod configs and a command that selects the right config via an environment variable.

Checklist before you move on
  • Plan includes canary or phased rollout
  • Smoke tests have measurable thresholds
  • Rollback references a specific artifact version
  • No secrets committed to files; referenced via environment or a secret manager

Common mistakes and self-check

  • Pushing directly to prod without staging validation. Self-check: Is there a staging pass artifact?
  • Missing idempotency causing duplicate rows on retries. Self-check: Can you safely rerun yesterday’s partition?
  • Hard-coded configs inside code. Self-check: Can you switch environments by changing only variables?
  • No rollback plan. Self-check: Which version will you revert to and how fast?
  • Unmonitored backfills. Self-check: Are there progress logs and partial-failure alerts?

Practical projects

  • Build a small pipeline that reads a CSV, transforms it, and loads to a warehouse schema using env-specific configs and a canary run.
  • Implement a data quality check suite (row counts, null checks) and fail the deployment if checks exceed thresholds.
  • Create a rollback script that reverts a target table to the last successful snapshot.

Mini challenge

You must add a new required column to a prod table used by multiple teams. How do you deploy it this week without breaking readers? Write a 5-step plan that includes backward compatibility, communication, and a removal timeline.

Learning path

  • Start: Deployment basics (this page)
  • Next: Data quality gates and contracts
  • Then: Backfills and reprocessing strategies
  • Later: Infra-as-code and containerization fundamentals

Next steps

  • Complete the exercises and keep your notes
  • Take the quick test to check understanding
  • Apply these steps to your next real deployment with a peer review

Deployment Practices Basics — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.
