Why this matters
As a Data Platform Engineer, you will routinely create and maintain three environments: development (dev), staging (stage), and production (prod). Provisioning these with Infrastructure as Code (IaC) ensures consistency, safety, and repeatability across your data platform: storage, compute, networking, security, orchestration, and observability.
- Reduce risk: experiment safely in dev, validate in stage, and deploy with confidence to prod.
- Deliver faster: automated, idempotent provisioning removes manual steps, configuration drift, and hidden differences between environments.
- Support compliance: clear separation of duties, access control, and auditable change history.
Concept explained simply
Think of dev, stage, and prod as three lanes of the same highway. The lanes must be parallel and consistent, but traffic rules (guardrails) tighten as you move from dev to prod. IaC encodes those lanes, their similarities (same modules) and their differences (parameters) so you can switch lanes safely.
- One set of reusable IaC modules
- Per-environment configuration (variables, tags, sizes, secrets)
- Per-environment state and credentials
- Promotion flow: dev → stage → prod with approvals
Key terms
- Idempotent: running the same apply multiple times yields the same result.
- Parity: environments are as similar as possible; differences are explicit and intentional.
- Drift: the real infrastructure no longer matches what your IaC code and state describe, usually because of manual changes made outside the pipeline.
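Idempotency and drift can be illustrated with a toy model. The sketch below is not a real IaC engine: `plan` and `apply` are hypothetical helpers that treat infrastructure as plain dictionaries, and the bucket name and settings are borrowed from the examples later in this lesson.

```python
from typing import Dict

Resource = Dict[str, object]


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> Dict[str, str]:
    """Return the action needed per resource name: create, update, or delete."""
    actions: Dict[str, str] = {}
    for name, spec in desired.items():
        if name not in actual:
            actions[name] = "create"
        elif actual[name] != spec:
            actions[name] = "update"
    for name in actual:
        if name not in desired:
            actions[name] = "delete"
    return actions


def apply(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> Dict[str, Resource]:
    """Return a new actual state that matches the desired state."""
    return {name: dict(spec) for name, spec in desired.items()}


desired = {"raw-dev-euw-001": {"versioning": False, "retention_days": 7}}
actual: Dict[str, Resource] = {}

actual = apply(desired, actual)       # first apply creates the bucket
assert plan(desired, actual) == {}    # planning again finds nothing to do: idempotent

actual["raw-dev-euw-001"]["retention_days"] = 30  # manual change outside IaC
drift = plan(desired, actual)         # a non-empty plan now signals drift
print(drift)
```

A second apply after the manual change would restore the declared retention, which is why drift should be investigated before it is silently overwritten.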
Core building blocks for environment provisioning
- Parameterization: define environment-specific settings (names, sizes, tags, retention) via variables or maps.
- State isolation: each environment keeps its own state backend and locking to avoid cross-environment collisions.
- Credentials and RBAC: separate service principals/roles per environment with least privilege; no shared root keys.
- Naming conventions: predictable, unique names, e.g., app-env-region-### (raw-dev-euw-001).
- Secrets management: store secrets outside code (secret manager); IaC references secret IDs, not secret values.
- Guardrails: policies, budgets/quotas, and mandatory approvals for stage/prod changes.
- Validation gates: format, lint, validate, plan, and test in dev; run integration checks in stage; gated apply in prod.
- Observability: per-environment logging, metrics, and alerts with stricter SLOs in prod.
- Drift detection: scheduled plan-only runs (no apply) to detect configuration drift; investigate any non-empty plan before promoting.
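The naming convention above (app-env-region-###) is easier to enforce when names are generated and validated in one place. The following is a minimal sketch; `resource_name` and `NAME_RE` are hypothetical names, and the allowed suffixes (dev, stg, prd) and the three-digit sequence are assumptions taken from the worked examples in this lesson.

```python
import re

# Assumed pattern: app-env-region-###, lower-case, three-digit sequence number.
NAME_RE = re.compile(r"[a-z][a-z0-9]*-(dev|stg|prd)-[a-z0-9]+-\d{3}")


def resource_name(app: str, env_suffix: str, region: str, seq: int) -> str:
    """Build a resource name and reject anything that breaks the convention."""
    name = f"{app}-{env_suffix}-{region}-{seq:03d}"
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"name does not follow app-env-region-###: {name!r}")
    return name


print(resource_name("raw", "dev", "euw", 1))
```

Calling the same helper from every module keeps names predictable, and an unknown suffix such as "qa" fails early instead of producing an off-convention resource.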
Worked examples
Example 1: Parameterized storage buckets across environments
Goal: Provision a raw data bucket per environment with consistent naming and different retention periods.
# variables.tf
variable "env" {
  type        = string
  description = "Target environment: dev, stage, or prod"
}

variable "region" {
  type    = string
  default = "euw"
}

# locals.tf
locals {
  # Per-environment settings; an env value outside this map fails at plan time.
  cfg = {
    dev   = { suffix = "dev", retention_days = 7, versioning = false, tags = { env = "dev" } }
    stage = { suffix = "stg", retention_days = 14, versioning = true, tags = { env = "stage" } }
    prod  = { suffix = "prd", retention_days = 90, versioning = true, tags = { env = "prod" } }
  }[var.env]

  bucket_name = "raw-${local.cfg.suffix}-${var.region}-001"
}

# main.tf
# resource pseudo-code (cloud-agnostic); "storage_bucket" stands in for your provider's bucket resource
resource "storage_bucket" "raw" {
  name           = local.bucket_name
  versioning     = local.cfg.versioning
  retention_days = local.cfg.retention_days
  tags           = local.cfg.tags
}
Run a plan per environment by passing the env variable (dev, stage, or prod); with Terraform, for example: terraform plan -var="env=dev".
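Before planning, it can help to check the environment map itself for parity: every environment should define the same keys, and the generated names should be distinct. The sketch below mirrors the map from Example 1 in Python; `parity_gaps` and `bucket_names` are hypothetical helpers, not part of any IaC tool.

```python
from typing import Dict, List

ENV_CONFIG: Dict[str, Dict[str, object]] = {
    "dev": {"suffix": "dev", "retention_days": 7, "versioning": False, "tags": {"env": "dev"}},
    "stage": {"suffix": "stg", "retention_days": 14, "versioning": True, "tags": {"env": "stage"}},
    "prod": {"suffix": "prd", "retention_days": 90, "versioning": True, "tags": {"env": "prod"}},
}


def parity_gaps(config: Dict[str, Dict[str, object]]) -> Dict[str, List[str]]:
    """Return, per environment, the keys it lacks compared with the other environments."""
    all_keys = set()
    for cfg in config.values():
        all_keys |= set(cfg)
    gaps = {}
    for env, cfg in config.items():
        missing = sorted(all_keys - set(cfg))
        if missing:
            gaps[env] = missing
    return gaps


def bucket_names(config: Dict[str, Dict[str, object]], region: str = "euw") -> Dict[str, str]:
    """Build the raw bucket name for each environment, as in Example 1."""
    return {env: f"raw-{cfg['suffix']}-{region}-001" for env, cfg in config.items()}


print(parity_gaps(ENV_CONFIG))
print(bucket_names(ENV_CONFIG))
```

An empty result from `parity_gaps` means the environments differ only in values, not in which settings exist, which is the kind of explicit, intentional difference the lesson aims for.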
Example 2: Separate remote state per environment
Goal: Isolate states to avoid accidental cross-env changes.
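# backend config (conceptual; attribute names are illustrative)
# Keep one backend definition per environment, each in its own file or
# pipeline configuration: a Terraform-style configuration uses only one backend at a time.

# dev backend
backend "remote" {
  namespace = "platform/dev"
  workdir   = "raw"
}

# stage backend
backend "remote" {
  namespace = "platform/stage"
  workdir   = "raw"
}

# prod backend
backend "remote" {
  namespace = "platform/prod"
  workdir   = "raw"
}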
Use separate state containers/buckets/namespaces, each with its own locking and access policy.
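State isolation can also be checked mechanically, for example in a pipeline step that runs before any plan. This is a small sketch under the assumption that state locations follow the namespace/workdir layout shown above; `backend_for` and `find_shared_state` are hypothetical helpers.

```python
from typing import Dict, List

ENVIRONMENTS = ("dev", "stage", "prod")


def backend_for(env: str, component: str = "raw", root: str = "platform") -> Dict[str, str]:
    """Derive the state location for one environment and component."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return {"namespace": f"{root}/{env}", "workdir": component}


def find_shared_state(backends: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Return groups of environments that point at the same state location."""
    by_location: Dict[tuple, List[str]] = {}
    for env, cfg in backends.items():
        by_location.setdefault((cfg["namespace"], cfg["workdir"]), []).append(env)
    return [sorted(envs) for envs in by_location.values() if len(envs) > 1]


backends = {env: backend_for(env) for env in ENVIRONMENTS}
print(find_shared_state(backends))
```

Any non-empty result means two environments would read and write the same state, which is exactly the collision this example is meant to prevent.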
Example 3: CI/CD promotion with approval to prod
- On pull request: fmt, validate, lint, plan (dev).
- On merge to main: apply (dev), run smoke tests.
- If green: plan (stage), auto-apply (stage) and run integration tests.
- Require manual approval: plan (prod) must be reviewed; only then apply (prod).
This ensures the same code moves through environments with increasing safety checks.
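The promotion rules above can be written down as a small gate function, which makes them easy to review and test independently of any CI/CD product. The sketch below is an assumption-laden model: `PipelineState` and `can_apply` are hypothetical names, and the two-approver minimum for prod follows the self-check later in this lesson.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PipelineState:
    merged_to_main: bool = False
    dev_smoke_passed: bool = False
    stage_integration_passed: bool = False
    prod_plan_approvers: FrozenSet[str] = frozenset()


def can_apply(env: str, state: PipelineState, min_approvers: int = 2) -> bool:
    """Decide whether an apply to the given environment is allowed."""
    if env == "dev":
        return state.merged_to_main
    if env == "stage":
        return state.merged_to_main and state.dev_smoke_passed
    if env == "prod":
        return (
            state.merged_to_main
            and state.dev_smoke_passed
            and state.stage_integration_passed
            and len(state.prod_plan_approvers) >= min_approvers
        )
    raise ValueError(f"unknown environment: {env!r}")


state = PipelineState(merged_to_main=True, dev_smoke_passed=True,
                      stage_integration_passed=True,
                      prod_plan_approvers=frozenset({"alice"}))
print(can_apply("stage", state), can_apply("prod", state))
```

With a single approver the prod apply stays blocked even though every automated check is green, which reflects the increasing strictness from dev to prod.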
Step-by-step: Provision dev, stage, prod
- Define a naming convention and tagging standard.
- Create per-environment variable maps for size, retention, SKUs, and toggles (e.g., versioning).
- Configure separate state backends and credentials for each environment.
- Build reusable modules for common platform components (storage, compute, networking, observability).
- Implement pipelines: validate → plan → apply, with manual approval for prod.
- Add drift checks (scheduled plan) and cost/budget alerts per environment.
Pre-flight checklist
- Unique, deterministic names for all resources per environment
- State isolation and locking configured
- Credentials and RBAC separated by environment
- Secrets retrieved from a secret manager (no secrets in code)
- Validation and tests defined for dev and stage
- Manual approval step required for prod applies
Common mistakes and self-check
- Mixing states: Self-check by listing state backends; each environment should have its own.
- Hard-coded names: Search code for literal env strings; replace with variables and maps.
- Shared credentials: Verify pipeline secrets; use distinct service principals/roles.
- No parity: Compare plans between dev and stage; unintended diffs indicate missing parameters.
- Skipping approvals: Ensure prod requires a manual gate with at least two reviewers.
- Secrets in code: Scan repos; rotate any exposed keys immediately and move to secret manager.
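The hard-coded names self-check can be partly automated with a simple text scan. The sketch below flags quoted strings that contain an environment token; `find_hardcoded_envs` is a hypothetical helper, the token list is an assumption based on this lesson's suffixes, and the "env-map" comment marker used to skip the intentional environment map is an invented convention.

```python
import re
from typing import List, Tuple

# Quoted string containing an environment token as a whole word.
ENV_IN_STRING = re.compile(r'"[^"\n]*\b(?:dev|stage|stg|prod|prd)\b[^"\n]*"')


def find_hardcoded_envs(source: str, allow_marker: str = "env-map") -> List[Tuple[int, str]]:
    """Return (line number, line) pairs whose string literals mention an environment."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if allow_marker in line:
            continue  # lines explicitly marked as part of the environment map
        if ENV_IN_STRING.search(line):
            hits.append((lineno, line.strip()))
    return hits


sample = (
    'name = "raw-dev-euw-001"\n'
    'name = local.bucket_name\n'
    'dev = { suffix = "dev" }  # env-map\n'
    'owner = "developers"\n'
)
print(find_hardcoded_envs(sample))
```

A scan like this produces false positives and misses unquoted values, so treat its output as a list of places to review rather than a definitive verdict.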
Practical projects
- Project 1: Provision raw/bronze/silver storage buckets, a compute cluster, and monitoring in dev, stage, prod using one module and per-environment variables.
- Project 2: Add feature toggles (e.g., versioning, lifecycle rules) controlled via env maps and verify plans reflect intended differences only.
- Project 3: Implement drift detection with a scheduled plan job and alert on non-empty diffs; practice resolving drift safely.
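For Project 3, one possible shape of the scheduled drift job is sketched below. It assumes Terraform is the IaC tool and relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 for an empty diff, 1 for an error, and 2 for a non-empty diff. The function names, the working-directory layout, and the `env` variable are assumptions for illustration.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a drift status."""
    if code == 0:
        return "in-sync"   # plan succeeded, no changes
    if code == 2:
        return "drift"     # plan succeeded, changes present
    return "error"         # plan itself failed


def run_drift_check(workdir: str, env: str) -> str:
    """Run a plan-only check for one environment (requires terraform on PATH)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", f"-var=env={env}"],
        cwd=workdir,
    )
    return classify_plan_exit(result.returncode)


print(classify_plan_exit(2))
```

A scheduler could call `run_drift_check` once per environment and raise an alert whenever the result is anything other than "in-sync".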
Exercises
- Exercise 1: Design an environment map and naming convention. Create a variable map for dev, stage, prod including suffix, retention_days, versioning, and tags. Define a naming pattern like app-env-region-###. Confirm that your plan generates distinct names and settings per environment.
- Exercise 2: Build a minimal IaC skeleton with separate state per environment. Add a simple resource (e.g., storage bucket) that reads env-specific values and stores state in distinct backends/namespaces. Run validate and plan for each env.
- Checklist: variables map created; names are unique; states isolated; prod requires approval; no secrets in code.
Mini challenge
Extend your provisioning to support a short-lived preview environment on each pull request that mirrors stage at a smaller size, and auto-destroys on merge/close. Keep state isolated and tag all preview resources with an expiration timestamp.
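As a starting point for the mini challenge, the settings of a preview environment can be derived from the pull request number. Everything in this sketch is an assumption: the `pr-<number>` naming, the `platform/preview/...` state namespace, the 48-hour lifetime, and the `expires_at` tag name are illustrative choices, not requirements.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def preview_env(pr_number: int, now: datetime, ttl_hours: int = 48) -> Dict[str, object]:
    """Derive an isolated name, state namespace, and expiry tag for a PR preview.

    `now` must be timezone-aware; it is converted to UTC before formatting.
    """
    expires = now.astimezone(timezone.utc) + timedelta(hours=ttl_hours)
    return {
        "env": f"pr-{pr_number}",
        "state_namespace": f"platform/preview/pr-{pr_number}",
        "tags": {"env": "preview", "expires_at": expires.strftime(TIME_FORMAT)},
    }


def is_expired(tags: Dict[str, str], now: datetime) -> bool:
    """Return True once the expiry timestamp in the tags has passed."""
    expires = datetime.strptime(tags["expires_at"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    return now >= expires


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
preview = preview_env(42, created)
print(preview["tags"]["expires_at"])
```

A cleanup job can then destroy any preview whose tags report `is_expired`, which covers pull requests that are abandoned without being merged or closed.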
Who this is for
- Data Platform Engineers setting up consistent, safe environments
- Data Engineers contributing IaC to shared platform modules
- Ops/SRE working on platform reliability and compliance
Prerequisites
- Basic IaC knowledge (variables, modules, state)
- Familiarity with your cloud provider's storage, networking, and IAM
- Comfort with a CI/CD system (pipelines, approvals)
Learning path
- Start: IaC fundamentals (format, validate, plan/apply)
- Then: Modules and parameterization
- Next: Environment state isolation and credentials
- Finally: CI/CD promotions with approvals and drift detection
Next steps
- Generalize your modules and publish a versioned module catalog
- Add cost controls and budgets per environment
- Introduce canary/blue-green strategies for high-risk changes in prod