Quick outline
- Why this matters
- Concept explained simply
- Mental model
- Worked examples
- Checklist before you ship
- Exercises
- Common mistakes
- Practical projects
- Who this is for
- Prerequisites
- Learning path
- Next steps
- Mini challenge
Why this matters
As a Data Engineer, you ship changes that can affect dashboards, machine learning features, and critical reports. CI/CD makes your data pipelines reliable by:
- Blocking schema-breaking changes before they reach production.
- Automatically testing transformations and DAGs on every pull request.
- Promoting artifacts safely across dev → stage → prod with data quality gates.
- Providing fast rollbacks if a deployment impacts SLAs.
Concept explained simply
CI (Continuous Integration) automatically checks your code every time you push: it installs dependencies, lints, runs unit tests, validates DAGs/SQL, and compiles your project. CD (Continuous Delivery/Deployment) packages your pipeline, deploys it to environments, runs smoke tests with real infrastructure, and promotes only if checks pass.
Mental model
Imagine a factory assembly line with gates:
- Gate 1 (CI): Is the part built correctly? (lint, unit tests, compile, DAG check)
- Gate 2 (CD - Dev): Does it run on real machines? (container build, deploy to dev)
- Gate 3 (CD - Stage): Does data look healthy on sample/limited scope? (smoke and data quality checks)
- Gate 4 (CD - Prod): Limited canary, observe metrics, then full rollout. Rollback ready.
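The gate sequence above can be sketched as a chain of checks where the first failure blocks promotion. This is a purely illustrative model — the gate names and pass/fail lambdas stand in for real CI/CD steps:

```python
# Illustrative sketch of the assembly-line mental model: each gate is a
# check that must pass before the next one runs.
def run_gates(gates):
    """Run gates in order; stop at the first failure."""
    for name, check in gates:
        if not check():
            return f"blocked at {name}"
    return "shipped"

gates = [
    ("ci: lint + unit tests", lambda: True),
    ("cd-dev: deploy + smoke", lambda: True),
    ("cd-stage: data quality", lambda: False),  # simulate a failing quality gate
    ("cd-prod: canary + rollout", lambda: True),
]
print(run_gates(gates))  # blocked at cd-stage: data quality
```

The point of the model: a failure at any gate means later gates never run, so bad changes cannot reach production by default.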
Worked examples
Example 1 — Minimal CI for a dbt + Python pipeline
This CI runs on every pull request:
```yaml
name: ci
on: [pull_request]
jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Lint
        run: ruff check .
      - name: Unit tests
        run: pytest -q tests/unit
      - name: Validate dbt project
        run: |
          dbt deps
          dbt compile
      - name: Validate Airflow DAGs
        run: python -m pyflakes dags
```
What this catches: style errors, failing unit tests, SQL compile errors, and obvious static errors in DAG files — all before merging.
Example 2 — CD with environment promotion and data gates
This CD builds a Docker image, deploys to dev, then stage with smoke tests, then promotes to prod after a canary:
```yaml
name: cd
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t registry/pipeline:${GITHUB_SHA} .
      - name: Push image
        run: docker push registry/pipeline:${GITHUB_SHA}
  deploy-dev:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to dev
        run: ./infra/deploy.sh dev registry/pipeline:${GITHUB_SHA}
      - name: Dev smoke tests
        run: python checks/smoke.py --env dev
  deploy-stage:
    needs: deploy-dev
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to stage
        run: ./infra/deploy.sh stage registry/pipeline:${GITHUB_SHA}
      - name: Data quality gate
        run: python checks/data_quality.py --env stage --threshold 0.98
  deploy-prod:
    needs: deploy-stage
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Canary to prod (10%)
        run: ./infra/deploy.sh prod registry/pipeline:${GITHUB_SHA} --scope canary
      - name: Observe metrics
        run: python checks/observe.py --env prod --minutes 15 --error-rate-threshold 0.01
      - name: Full rollout
        run: ./infra/deploy.sh prod registry/pipeline:${GITHUB_SHA} --scope full
```
Key idea: each job runs only if the job it `needs` has succeeded, with simple numeric thresholds serving as quality gates.
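The `checks/observe.py` gate above isn't shown in full; a minimal sketch of what such a canary check might do could look like this. The metric values and thresholds below are assumptions — in a real pipeline they would come from your metrics backend:

```python
def canary_healthy(error_rate, volume, baseline_volume,
                   error_rate_threshold=0.01, volume_tolerance=0.05):
    """Return True if canary metrics are within thresholds."""
    if error_rate > error_rate_threshold:
        return False
    # Data volume should stay within +/- volume_tolerance of baseline.
    drift = abs(volume - baseline_volume) / baseline_volume
    return drift <= volume_tolerance

# In CD, the script would sys.exit(0) when healthy and sys.exit(1) otherwise,
# so a threshold breach fails the pipeline step and blocks the full rollout.
ok = canary_healthy(error_rate=0.004, volume=1_020_000, baseline_volume=1_000_000)
print("healthy" if ok else "unhealthy")
```

A non-zero exit code is all the CD runner needs: the "Full rollout" step simply never executes if this gate fails.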
Example 3 — Safe schema change with backward compatibility
Scenario: adding a non-nullable column to a fact table.
- Step 1: Add column as nullable with default; write code to populate it; keep consumers reading old schema.
- Step 2: Backfill in batches; monitor null rate and row counts.
- Step 3: Switch consumers to new column behind a feature flag; keep dual-write temporarily.
- Step 4: After stability, enforce NOT NULL; remove old paths.
- Rollback plan: If metrics dip, revert consumers to old column, stop backfill, and remove feature flag.
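Step 2's batched backfill can be sketched as a loop that updates until no nulls remain. Everything here is a placeholder: the table and column names, the `compute_default` function, and the `execute` warehouse-client helper are hypothetical, and the `UPDATE ... LIMIT` form is dialect-dependent (e.g., MySQL-style):

```python
def backfill_in_batches(execute, batch_size=10_000):
    """Backfill new_col in small batches; returns total rows updated.

    `execute` is a placeholder for a warehouse client call that runs the
    statement and returns the number of rows it updated.
    """
    total = 0
    while True:
        updated = execute(
            """
            UPDATE fact_orders
            SET new_col = compute_default(order_id)
            WHERE new_col IS NULL
            LIMIT %(n)s
            """,
            {"n": batch_size},
        )
        total += updated
        if updated == 0:  # nothing left to backfill
            break
    return total
```

Small batches keep locks short and let you pause between iterations to monitor null rates and row counts, as Step 2 advises.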
Checklist before you ship
- [ ] Git branch uses a clear naming convention (e.g., feature/, fix/).
- [ ] Lint, unit tests, and project compilation pass in CI.
- [ ] Data quality checks exist for critical models (row counts, null rates, freshness).
- [ ] Deploy scripts are idempotent and can be re-run safely.
- [ ] Secrets are injected via environment variables or secret manager, not hardcoded.
- [ ] Rollback plan is documented and tested on a non-prod env.
- [ ] Observability: basic metrics and alerts are configured (failures, latency, data volume).
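The idempotency item in the checklist has a simple concrete shape: check current state first and skip work that is already done, so re-running the script is harmless. A minimal sketch, where `get_current_tag` and `apply` are hypothetical helpers standing in for your deploy tooling:

```python
def deploy(image_tag, get_current_tag, apply):
    """Idempotent deploy: a no-op if `image_tag` is already live."""
    if get_current_tag() == image_tag:
        return "already deployed, nothing to do"
    apply(image_tag)
    return f"deployed {image_tag}"

applied = []
print(deploy("v2", lambda: "v2", applied.append))  # already deployed, nothing to do
print(deploy("v3", lambda: "v2", applied.append))  # deployed v3
```

The same pattern applies to migrations and backfills: record what has been done, and make each run converge on the desired state instead of repeating side effects.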
Exercises
Complete these hands-on tasks. You can check the solution below each exercise.
Exercise 1 — Write a minimal CI workflow
Create a YAML CI workflow that:
- Runs on pull requests.
- Sets up Python 3.10 and installs requirements.txt.
- Runs a linter (ruff) and unit tests (pytest on tests/unit).
- Validates a dbt project (dbt deps + dbt compile).
Solution:
```yaml
name: ci
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - run: ruff check .
      - run: pytest -q tests/unit
      - run: |
          dbt deps
          dbt compile
```
Exercise 2 — Plan a safe prod rollout
Write a step-by-step promotion plan for a pipeline change that adds a new dimension column used by downstream dashboards. Include testing, data gates, canary scope, observation window, and rollback triggers.
Solution:
- Dev: deploy, run unit tests + dev smoke (sample run), verify logs.
- Stage: deploy, backfill small subset, validate row counts and null rate < 2%.
- Prod canary: enable on 10% partitions; monitor 15 minutes for failure rate < 1% and volume within ±5% baseline.
- Full rollout: expand to 100% after metrics are stable.
- Rollback: if thresholds breach, revert image/tag, disable feature flag, and restore previous DAG version.
Common mistakes and self-check
Skipping data checks in CI/CD
Fix: Add at least row count and null rate checks for critical tables. Treat thresholds as gates, not warnings.
Breaking consumers with schema changes
Fix: Use backward-compatible changes first (nullable + default), dual-write, then enforce constraints later.
No rollback path
Fix: Keep previous container tag and DAG version. Test rollback in stage before prod rollout.
Hardcoded secrets
Fix: Use environment variables injected by a secret manager. Never commit keys to the repo.
Practical projects
- Build-and-test: Create a repo with a small dbt model and a Python UDF. Add CI that lints, tests, and compiles.
- Env promotion: Package a simple pipeline in Docker, deploy to dev and stage with a smoke test script.
- Data gate: Write a small checker that fails if null rate exceeds 1% and wire it as a CD gate.
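A starting point for the data-gate project might look like the sketch below. The in-memory sample rows stand in for a real warehouse query; table access and the column name are assumptions:

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is None."""
    if not rows:
        return 1.0  # treat an empty table as failing
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows)

def gate(rows, column, threshold=0.01):
    """True if the null rate is within the threshold."""
    return null_rate(rows, column) <= threshold

# As a CD gate, the script would sys.exit(1) when gate() returns False,
# failing the pipeline step. Here, 5 of 1000 rows are null (0.5% <= 1%).
sample = [{"id": i, "email": None if i % 200 == 0 else "x"} for i in range(1000)]
print("pass" if gate(sample, "email") else "fail")  # pass
```

Extending it with a row-count floor and a freshness check gives you the three checks the checklist calls out for critical models.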
Who this is for
- Aspiring and junior Data Engineers who want reliable deployments.
- Analytics Engineers adding automation to transformations.
- Engineers moving from ad-hoc scripts to production-grade pipelines.
Prerequisites
- Basic Git (commit, branch, PR).
- Comfort with Python or SQL-based transformations.
- Familiarity with a workflow tool (e.g., Airflow) or a transformation tool (e.g., dbt).
Learning path
- Set up CI: lint + unit tests + compile checks.
- Add data checks: row counts, null rates, freshness for key tables.
- Introduce CD: build artifacts, deploy to dev, then stage.
- Add gates and canary release for prod.
- Document rollback and practice it in non-prod.
Next steps
- Implement at least one CI workflow in your repository this week.
- Add one data quality gate to your stage environment.
- Take the quick test below to check your understanding.
Mini challenge
Your team needs to add a new required column to a high-traffic table. Design a CI/CD plan that avoids downtime. Include test coverage, promotion steps, canary, metrics to watch, and rollback. Write it in 6–10 bullet points.