Who this is for
ETL Developers and Data Engineers who need reliable Dev/Stage/Prod environments to build, test, and release data pipelines without breaking production.
Prerequisites
- Basic ETL pipeline knowledge (sources, transforms, targets)
- Comfort using a code repo, environment variables, and a secrets store
- Familiarity with one orchestration tool (e.g., Airflow, ADF, Prefect) and one warehouse (e.g., Snowflake, BigQuery, Redshift)
Why this matters
Real tasks you will face:
- Spin up a Dev environment to safely test a new transformation.
- Promote a pipeline to Stage with realistic data volumes and credentials.
- Schedule in Prod with different windows, quotas, and access controls.
- Rotate secrets without code changes or outages.
- Roll back quickly if a release causes bad data.
Concept explained simply
Dev, Stage, Prod are separate lanes for the same car (your code artifact). The car stays the same; only the road signs change: credentials, endpoints, resource sizes, schedules, and feature flags. Your goal: same code, different configuration per lane.
Mental model
- Immutable code artifact: built once, promoted across environments.
- Mutable configuration: injected per environment at deploy/run time.
- Strict separation: no Dev secrets in Stage/Prod; no Prod data paths in Dev.
Pro tip: The 12-factor-style approach
- Store configuration in the environment or config files, not in code.
- Keep secrets in a managed secrets store.
- Build once, promote the same artifact.
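A minimal sketch of this pattern in Python. Names such as APP_ENV and get_secret are illustrative stand-ins, not a specific library:

import os

# The active environment is chosen at deploy/run time; the code never changes.
APP_ENV = os.environ.get("APP_ENV", "DEV")  # DEV | STAGE | PROD

# Non-secret settings come from a per-environment mapping or config file.
TARGET_SCHEMA = {
    "DEV": "analytics_dev",
    "STAGE": "analytics_stage",
    "PROD": "analytics",
}[APP_ENV]

# Secrets are referenced by key and resolved from a managed store at runtime.
# get_secret() stands in for your secrets-store client (Vault, AWS Secrets
# Manager, Azure Key Vault, ...); here it simply reads an injected env var.
def get_secret(key: str) -> str:
    return os.environ.get(key, "")  # empty if not injected (sketch only)

warehouse_password = get_secret("SNOWFLAKE_PASSWORD")
print(f"env={APP_ENV}, target schema={TARGET_SCHEMA}")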
Six core rules for reliable environments
- One codebase, many configs: parameterize everything that varies by environment.
- No shared credentials across environments: unique service principals/keys.
- Separate data paths: different schemas/buckets for each environment.
- Consistent DAG/job naming pattern with environment suffixes/prefixes (e.g., ingest_customers_dev).
- Feature flags for risky or expensive logic (e.g., enable_backfill=false in Dev).
- Automated promotion: deploy the same artifact to Stage, then Prod, with approvals and smoke tests.
What to parameterize per environment
- Connection strings and secrets (sources, warehouse, message queues)
- Data locations (buckets, containers, database schemas)
- Schedules and concurrency limits
- Resource sizes (cluster size, node type, worker count)
- Feature flags (backfills, alerts, optional transforms)
- Access control (roles, service accounts)
Worked examples
Example 1: Airflow DAG using env-specific Variables and Connections
- Store the active environment in an Airflow Variable (e.g., env = DEV/STAGE/PROD).
- Look up connection IDs and schema names from a config mapping keyed by that environment.
- Toggle a backfill flag to prevent heavy Dev runs (see the DAG sketch after the mapping below).
Sample configuration mapping:
{
  "DEV": {
    "src_postgres_conn_id": "pg_dev",
    "warehouse_conn_id": "snowflake_dev",
    "target_schema": "analytics_dev",
    "enable_backfill": false,
    "schedule": "@hourly"
  },
  "STAGE": {
    "src_postgres_conn_id": "pg_stage",
    "warehouse_conn_id": "snowflake_stage",
    "target_schema": "analytics_stage",
    "enable_backfill": true,
    "schedule": "0 * * * *"
  },
  "PROD": {
    "src_postgres_conn_id": "pg_prod",
    "warehouse_conn_id": "snowflake_prod",
    "target_schema": "analytics",
    "enable_backfill": true,
    "schedule": "*/15 * * * *"
  }
}
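A minimal DAG sketch in Python, assuming Airflow 2.4+ and that the mapping above is stored as a JSON Airflow Variable named pipeline_config, with the active environment in a Variable named env. Names and the task body are illustrative:

import pendulum
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

# Fetched at DAG-parse time; acceptable for a small, rarely changing config.
ENV = Variable.get("env", default_var="DEV")  # DEV | STAGE | PROD
CONFIG = Variable.get("pipeline_config", deserialize_json=True)[ENV]

def extract_and_load():
    # Placeholder: real code would use CONFIG["src_postgres_conn_id"] and
    # CONFIG["warehouse_conn_id"] via hooks to move data into the target schema.
    print(f"Loading into {CONFIG['target_schema']} via {CONFIG['warehouse_conn_id']}")

with DAG(
    dag_id=f"ingest_customers_{ENV.lower()}",
    schedule=CONFIG["schedule"],
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=CONFIG["enable_backfill"],  # backfill flag gates catch-up runs
    tags=[ENV.lower()],
) as dag:
    PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)

The same file deploys unchanged to every environment; only the two Variables differ per Airflow instance.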
Example 2: dbt profiles.yml with per-environment targets
- One profile with targets dev, stage, and prod.
- Use dbt's env_var() function to inject secrets from environment variables at runtime.
- Schema naming: analytics_dev, analytics_stage, and analytics (prod).
Sample profiles.yml:
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: ANALYTICS_DEV
      database: ANALYTICS
      warehouse: DEV_WH
      schema: analytics_dev
    stage:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: ANALYTICS_STAGE
      database: ANALYTICS
      warehouse: STAGE_WH
      schema: analytics_stage
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: ANALYTICS
      database: ANALYTICS
      warehouse: PROD_WH
      schema: analytics
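At run time, export the SNOWFLAKE_* variables and select the environment with dbt's --target flag (for example, dbt run --target stage); the same project code then builds into the stage schema with stage credentials, with no edits to the models.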
Example 3: Azure Data Factory with Key Vault per environment
- Create three ADF instances (one per environment), or a single factory with parameterized Linked Services.
- Use separate Key Vaults: kv-dev, kv-stage, kv-prod.
- Parameterize Linked Service JSON to pull secrets from the active vault.
Sample linked service parameterization:
{
  "name": "AzureSqlDatabase_ls",
  "properties": {
    "type": "AzureSqlDatabase",
    "parameters": {
      "db_name": { "type": "String" },
      "kv_name": { "type": "String" }
    },
    "typeProperties": {
      "connectionString": "...;Initial Catalog=@{linkedService().db_name};...",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "@{linkedService().kv_name}",
          "type": "LinkedServiceReference"
        },
        "secretName": "sql-password"
      }
    }
  }
}
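For promotion, a common pattern is to export the factory as an ARM template and supply environment-specific values (the Key Vault name, database name, and similar) through per-environment parameter files at deployment time, so the pipeline and linked service JSON stay identical from Stage to Prod.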
Design steps: from blank slate to reliable environments
- Inventory what varies: list all endpoints, credentials, paths, schedules, resource sizes.
- Choose a config mechanism: environment variables, config files (YAML/JSON), or orchestrator variables.
- Centralize secrets in a managed store; reference them in config by key.
- Define naming patterns for schemas/buckets and job IDs per environment.
- Add feature flags for risky or costly tasks (e.g., enable_backfill, enable_alerts).
- Create promotion checks: automated tests, data smoke tests, and rollbacks.
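A minimal smoke-test sketch in Python, assuming a DB-API style connection (e.g., snowflake-connector or psycopg2) and an illustrative loaded_at timestamp column stored in UTC; table names and thresholds are examples:

from datetime import datetime, timedelta, timezone

def smoke_test(conn, schema: str, table: str, min_rows: int = 1, max_lag_hours: int = 2) -> None:
    """Fail fast if a freshly deployed pipeline produced no data or stale data."""
    cur = conn.cursor()

    # 1) Row-count check: the table should not be empty after the first run.
    cur.execute(f"SELECT COUNT(*) FROM {schema}.{table}")
    row_count = cur.fetchone()[0]
    assert row_count >= min_rows, f"{schema}.{table}: expected >= {min_rows} rows, got {row_count}"

    # 2) Freshness check: the newest record should be recent enough.
    cur.execute(f"SELECT MAX(loaded_at) FROM {schema}.{table}")
    latest = cur.fetchone()[0]
    if latest.tzinfo is None:  # assume UTC if the column is stored without a timezone
        latest = latest.replace(tzinfo=timezone.utc)
    lag = datetime.now(timezone.utc) - latest
    assert lag <= timedelta(hours=max_lag_hours), f"{schema}.{table}: data is {lag} old"

    print(f"Smoke test passed for {schema}.{table}: {row_count} rows, lag {lag}")

Run it right after a deployment to Stage (for example, smoke_test(conn, "analytics_stage", "customers")) and block promotion to Prod until it passes.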
Checklist: Ready for Stage?
- Code artifact is the same as Dev build.
- Stage credentials and roles are distinct and valid.
- Data paths point to stage-specific storage/schema.
- Schedules adjusted to moderate load.
- Smoke tests defined: row counts, freshness, null checks.
- Rollback plan documented.
Common mistakes and self-check
- Hardcoding endpoints in code. Self-check: Can you switch environments without editing code?
- Reusing Prod credentials in Dev. Self-check: Do service principals and keys differ per environment?
- Shared data paths. Self-check: Do schemas/buckets include env suffix/prefix?
- Promotion by rebuilding code. Self-check: Is the artifact identical across Stage/Prod?
- No smoke tests. Self-check: Do you have 2–3 automatic data validations per job?
- Lack of rollback. Self-check: Can you disable a release and revert config in minutes?
Practical projects
- Project 1: Convert an existing pipeline to use environment-based config with a secrets store. Add at least two feature flags.
- Project 2: Implement a promote-to-stage workflow with a smoke test DAG/job that validates row counts and freshness.
- Project 3: Build a blue/green config toggle for a target schema and practice flipping between them safely.
Practice: Exercises
Do these now. Your answers can be simple text/YAML. A sample solution is available for each.
Exercise 1: Author a three-environment config
Create a single config file that defines Dev/Stage/Prod parameters for a pipeline that reads Postgres and writes to Snowflake. Include: connection names, target schema, schedule, and a backfill flag. Keep secrets referenced via keys, not hardcoded.
Tip: Include naming patterns
- Schemas: analytics_dev, analytics_stage, analytics
- Connections: pg_dev/pg_stage/pg_prod and snowflake_dev/snowflake_stage/snowflake_prod
Exercise 2: Promotion and rollback plan
Write a short, step-by-step plan for promoting the same artifact from Dev to Stage, then Prod. Include: approvals, smoke tests, feature flags to toggle, and a rollback path.
Tip: Keep it concise
- 3–7 steps per environment
- Clearly state success criteria to proceed
Exercise checklist
- Parameters vary by environment without code edits
- No secrets are stored in the file; only references/keys
- Promotion steps include smoke tests and rollback
Learning path
- Start: Solidify parameterization with config files/env vars.
- Then: Add secrets management and rotate a secret without code changes.
- Next: Implement automated smoke tests and feature flags.
- Finally: Create an approval-based promotion flow and rollback routine.
Next steps
- Apply the exercises to one real pipeline at work or in a lab project.
- Add one more environment-specific constraint (e.g., stricter concurrency in Prod).
- Document your environment matrix in your repo to help teammates.
Mini challenge
In your current or sample project, add a single toggle enable_backfill. Keep it off in Dev/Stage and on in Prod. Prove it works by showing the scheduled run configuration per environment.