Who this is for
- Data engineers setting up reliable pipelines in orchestrators (Airflow, Prefect, Dagster, etc.).
- Platform/MLOps engineers who need safe promotion from dev to production.
- Analytics engineers who run jobs that touch sensitive or business-critical data.
Prerequisites
- Basic understanding of job orchestration (DAGs/flows, schedules, retries).
- Familiarity with configuration files (YAML/JSON), environment variables, and secrets.
- Git basics: branches, tags, pull requests, and CI/CD concepts.
Why this matters
In real teams, pipelines evolve constantly. A single bad change can break dashboards, alerts, or payments. Separating environments (dev, stage, prod) reduces blast radius, enables safe testing with realistic data, and creates a repeatable promotion path.
Typical professional tasks:
- Develop a new DAG in dev with small sample data and mocked services.
- Validate schema changes and data quality in stage with production-like schemas and masked data.
- Promote to prod with a controlled rollout, monitoring, and rollback plan.
Concept explained simply
Environment separation means you run the same pipeline in multiple places with different risk levels. Dev is for building and quick iteration. Stage (a.k.a. test/preprod) is for realistic checks. Prod is for real business data and users.
Mental model
Think of three lanes on a highway:
- Dev lane: slow, short distance, easy to pull over. Lots of debugging.
- Stage lane: medium speed, almost the same road conditions as the highway, but still safe to stop.
- Prod lane: fast, high traffic, crashes are costly—so you enter only after passing the first two lanes.
The car (your code artifact) should stay the same when you change lanes; you only change the road signs (configuration and connections).
Key principles and patterns
- Same artifact, different config: build once (e.g., container image, wheel) and promote; inject environment-specific config at runtime.
- Isolate resources: separate secrets, storage, queues, and databases per environment.
- Least privilege: service accounts and connections with minimal rights per environment.
- Data protection: stage uses production-like schemas with masked/anonymized data.
- Deterministic promotion: pass checks in dev, then stage, then prod with gates (tests, approvals).
- Observable rollouts: metrics, logs, and alerts per environment; tag runs with env labels.
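The "same artifact, different config" principle can be sketched in a few lines: the code is identical everywhere, and only a runtime environment variable selects which settings are injected. The variable name DEPLOY_ENV, the connection IDs, and the bucket names below are illustrative assumptions, not fixed conventions.

```python
# Minimal sketch: one codebase, per-environment settings injected at runtime.
import os

CONFIG = {
    "dev":   {"src_conn": "postgres_dev",   "bucket": "raw-data-dev",   "retries": 0},
    "stage": {"src_conn": "postgres_stage", "bucket": "raw-data-stage", "retries": 1},
    "prod":  {"src_conn": "postgres_prod",  "bucket": "raw-data-prod",  "retries": 3},
}

def load_settings(env=None):
    """Return per-environment settings; fail fast on an unknown environment."""
    env = env or os.getenv("DEPLOY_ENV", "dev")
    if env not in CONFIG:
        raise ValueError(f"Unknown DEPLOY_ENV: {env!r}")
    return CONFIG[env]
```

Failing fast on an unrecognized environment is itself a guardrail: a typo in DEPLOY_ENV should stop the run rather than silently fall back to prod resources.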
Worked examples
Example 1: Environment-aware Airflow DAG
Goal: One DAG definition runs in dev, stage, prod with different connections and parameters.
# Pseudocode (the concept applies to most orchestrators)
import os
from datetime import timedelta

ENV = os.getenv("DEPLOY_ENV", "dev")  # one of: dev | stage | prod

# Use env-specific connection IDs (created separately in each Airflow environment)
SRC_CONN = f"postgres_{ENV}"
DST_CONN = f"warehouse_{ENV}"

# Pick buckets/tables by suffix
RAW_BUCKET = f"raw-data-{ENV}"
TABLE_SUFFIX = f"_{ENV}"

# Guardrails: stricter settings in prod
if ENV == "prod":
    retries = 3
    sla = timedelta(minutes=30)  # Airflow SLAs are timedeltas, not strings
else:
    retries = 0
    sla = None

# DAG tasks then reference SRC_CONN/DST_CONN and the suffixes above
Outcome: one codebase; the differences live in connections, orchestrator variables, and runtime environment variables.
Example 2: dbt targets for dev/stage/prod
# profiles.yml excerpt
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: dev.db
      schema: analytics_dev
      user: svc_dev
    stage:
      type: postgres
      host: stage.db
      schema: analytics_stage
      user: svc_stage
    prod:
      type: postgres
      host: prod.db
      schema: analytics
      user: svc_prod
Promotion flow: run dbt in stage with masked prod-like data and full tests. Only promote to prod if tests pass.
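The promotion flow above can be sketched as a small script that builds the dbt CLI invocations per target and blocks prod if the stage checks fail. This assumes the dbt CLI is installed and that the targets match the profiles.yml excerpt; the function names are illustrative.

```python
# Sketch of a gated dbt promotion: stage must pass before prod runs.
import subprocess

def dbt_commands(target):
    # `--target` selects the matching output from profiles.yml
    return [
        ["dbt", "run", "--target", target],
        ["dbt", "test", "--target", target],
    ]

def promote(run=subprocess.run):
    for cmd in dbt_commands("stage"):
        if run(cmd).returncode != 0:
            raise SystemExit("stage checks failed; prod promotion blocked")
    for cmd in dbt_commands("prod"):
        run(cmd)
```

In a real setup this gate usually lives in CI/CD rather than a local script, but the logic is the same: prod commands only execute after every stage command exits cleanly.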
Example 3: Secrets, accounts, and naming
- Secrets store: separate paths per env, e.g., secret/data/dev/... vs secret/data/prod/...
- Service accounts: sa-etl-dev, sa-etl-stage, sa-etl-prod, each with least privilege.
- Naming strategy for buckets/tables: raw_dev, raw_stage, raw.
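These naming rules are easy to centralize in small helpers so no pipeline hand-builds a path. The exact secret paths and prefixes below are assumptions following Example 3, not a fixed standard.

```python
# Sketch of env-scoped naming helpers: secret paths, service accounts,
# and the "prod drops the suffix" table convention from Example 3.
def secret_path(env, name):
    return f"secret/data/{env}/{name}"

def service_account(env):
    return f"sa-etl-{env}"

def table_name(base, env):
    # prod uses the bare name; other environments get an explicit suffix
    return base if env == "prod" else f"{base}_{env}"
```

Centralizing these rules means a rename (or a new environment) is a one-line change instead of a hunt through every DAG.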
Deployment readiness checklist
- Config injection: environment variables or config files select connections and destinations.
- Secrets: stored outside code; rotated per environment.
- Data: stage has production-like schemas and masked records.
- Observability: logs, metrics, and alerts labeled by environment.
- Rollback: a clear path to revert to the previous artifact/tag in prod.
Exercises
Exercise 1 — Design environment-aware config
Create a minimal config that parameterizes connections, storage, and output tables for dev, stage, and prod. Use keys: src_conn, dst_conn, bucket, table_suffix, and retries.
When done, self-check:
- The artifact (code) remains identical across environments.
- No secret values appear in the file; only references or connection IDs.
- Prod has stricter settings (more retries, SLAs).
Exercise 2 — Plan a safe promotion
Draft a promotion plan from dev to stage to prod for a new pipeline that loads orders data. Include: gates (tests/approvals), observability checks, and rollback steps.
When done, self-check:
- Stage validation uses production-like schemas and masked data.
- Promotion uses a single built artifact (tag) through all envs.
- Rollback is specific and quick (revert tag, disable new DAG run).
Common mistakes and how to self-check
- Branching drift: building different binaries for each environment. Fix: build once; promote the same tag.
- Hard-coded secrets: credentials inside code. Fix: use secret manager or orchestrator connections.
- Stage unlike prod: toy schemas or volumes in stage. Fix: mirror schemas; mask or subset data carefully.
- Skipping gates: promoting without tests. Fix: enforce CI/CD checks and approvals.
- No rollback: unclear reversion steps. Fix: document revert-to-previous-tag and disable/stop procedures.
Practical projects
- Project 1: Convert a single-environment DAG into a dev/stage/prod setup with per-env connections and buckets.
- Project 2: Implement a canary release for a transformation job (run in prod on 5% of partitions before 100%).
- Project 3: Add data quality tests in stage and block promotion on failure; include rollback to previous artifact.
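For Project 2, one way to pick the canary subset is a deterministic hash of each partition key, so reruns always select the same ~5% before widening to 100%. The threshold and partition naming below are assumptions for illustration.

```python
# Sketch: deterministic canary selection over partitions via a stable hash.
import hashlib

def in_canary(partition_key, percent=5):
    digest = hashlib.sha256(partition_key.encode()).hexdigest()
    return int(digest, 16) % 100 < percent

# Example: daily partitions; only the canary subset runs first
partitions = [f"2024-01-{day:02d}" for day in range(1, 31)]
canary = [p for p in partitions if in_canary(p)]
```

Hashing beats random sampling here because the selection is reproducible: the same partitions are canaries on every run, which makes monitoring and rollback comparisons meaningful.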
Learning path
- Introduce runtime configuration (env vars, YAML) and secrets separation.
- Add per-environment connections and resource names.
- Create stage with production-like schemas and masked data.
- Set up CI checks: unit tests, linting, dbt/SQL tests, and dry-runs.
- Add promotion gates, approvals, and rollback playbooks.
- Introduce canary/blue-green rollouts and runtime monitoring.
Next steps
- Automate your promotion pipeline so that passing stage tests automatically opens a prod deployment that still requires approval.
- Standardize naming for resources across environments.
- Document your rollback procedures and test them quarterly.
Mini challenge
Your team wants to change a table schema (add a nullable column). Describe, in 5–7 bullet points, how you would roll this out across dev, stage, and prod with zero downtime and a rollback plan.