
Environment Configuration Management

Learn Environment Configuration Management for free with explanations, exercises, and a quick test (for Data Engineers).

Published: January 8, 2026 | Updated: January 8, 2026

Who this is for

  • Aspiring and junior Data Engineers who need reliable dev/stage/prod setups.
  • Analysts or ML engineers turning notebooks into repeatable jobs.
  • Anyone who has heard “works on my machine” and wants it to work everywhere.

Prerequisites

  • Basic command line comfort (cd, mkdir, running scripts).
  • Python basics (running a script, installing packages).
  • Optional but helpful: Docker installed locally.

Why this matters

Real Data Engineering tasks rely on consistent environments to avoid silent failures and costly re-runs. You will:

  • Deploy batch jobs (e.g., Spark/SQL/ETL) across dev, staging, and production.
  • Run Airflow/dbt pipelines that require the same versions and credentials in each environment.
  • Share reproducible projects with teammates and CI systems.
  • Rotate secrets safely and parameterize jobs per environment without code changes.

Concept explained simply

Environment Configuration Management is how you make your code run the same way everywhere by:

  • Pinning software versions.
  • Separating configuration from code (using environment variables and config files); see the snippet after this list.
  • Keeping secrets out of source code.
  • Automating setup so it’s repeatable and testable.
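
The smallest version of "config, not code" is replacing a hardcoded value with an environment lookup. A minimal sketch (DB_URL is an illustrative variable name, not one used elsewhere in this lesson):

# config_not_code.py
import os

# Hardcoded: breaks as soon as dev and prod differ
# db_url = "postgresql://dev_user@localhost:5432/devdb"

# Configured: the same code runs everywhere; only the environment changes
db_url = os.environ["DB_URL"]
print(f"Connecting to {db_url}")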

Mental model

Think: Recipe + Pantry + Labels.

  • Recipe: code and dependency list (requirements.txt).
  • Pantry: base runtime image or virtual environment with the tools you need.
  • Labels: environment variables/config files that tell the same code how to behave in dev vs prod.

Core principles

  • Idempotency: running setup twice should produce the same state (sketched below).
  • Pin versions: specify exact versions to avoid breaking changes.
  • Config, not code: swap environments by changing variables, not code.
  • Secrets management: never commit secrets; use env vars or a secret manager.
  • Environment parity: dev should mirror prod as closely as practical.
  • Documentation as code: .env.example, Makefile, and READMEs reduce guesswork.
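
To make the idempotency principle concrete, here is a minimal sketch of a setup script that is safe to run repeatedly (folder and file names are illustrative):

# setup_repeatable.py
from pathlib import Path

# exist_ok=True makes a second run a no-op instead of an error
for d in ("data", "config"):
    Path(d).mkdir(exist_ok=True)

# Write the template only if it is missing, so re-runs never clobber edits
example = Path(".env.example")
if not example.exists():
    example.write_text("INPUT_PATH=path/to/input.csv\n")

print("Setup complete (safe to run again)")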

Worked examples

Example 1: A small ETL with .env configuration

Goal: Parameterize input/output paths and processing batch size via environment variables.

# .env (do not commit real secrets)
INPUT_PATH=data/input.csv
OUTPUT_PATH=data/output.parquet
CHUNK_SIZE=5000
  
# .env.example (safe to commit)
INPUT_PATH=path/to/input.csv
OUTPUT_PATH=path/to/output.parquet
CHUNK_SIZE=5000
  
# job.py
import os

import pandas as pd
from dotenv import load_dotenv

load_dotenv()
input_path = os.getenv("INPUT_PATH", "data/input.csv")
output_path = os.getenv("OUTPUT_PATH", "data/output.parquet")
chunk_size = int(os.getenv("CHUNK_SIZE", "10000"))

# Read in chunks to bound memory, then write a single Parquet file
chunks = pd.read_csv(input_path, chunksize=chunk_size)
df = pd.concat(chunks, ignore_index=True)
df.to_parquet(output_path, index=False)
print(f"Wrote {len(df)} rows to {output_path}")
  

Run: create a venv, install pandas, pyarrow, and python-dotenv, then run python job.py. Change values in .env to switch behavior without editing code.
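
One refinement worth knowing: defaults like those above are fine for tuning knobs, but for critical settings you usually want the job to fail fast when a variable is missing. A minimal sketch (the require helper is hypothetical, not part of python-dotenv):

import os

def require(name: str) -> str:
    # Fail at startup with a clear message instead of mid-run with a vague one
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

input_path = require("INPUT_PATH")                  # critical: no silent default
chunk_size = int(os.getenv("CHUNK_SIZE", "10000"))  # tuning knob: default is fine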

Example 2: Reproducible environment with Docker
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PYTHONUNBUFFERED=1
CMD ["python", "job.py"]
  
# requirements.txt
pandas==2.2.0
pyarrow==14.0.1
python-dotenv==1.0.0
  

Build and run:

docker build -t etl-job:latest .
# Use --env-file to pass configuration without baking it into the image
docker run --rm --env-file .env -v "$PWD/data":/app/data etl-job:latest
  

Result: consistent runtime independent of host machine.
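
To confirm the image really contains the pinned versions (rather than whatever the host happens to have), you can override the default command for a one-off check:

docker run --rm etl-job:latest python -c "import pandas, pyarrow; print(pandas.__version__, pyarrow.__version__)"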

Example 3: Multi-environment configs (dev/stage/prod)
# config/dev.json
{
  "db_url": "postgresql://dev_user@localhost:5432/devdb",
  "bucket": "local-bucket",
  "parallelism": 2
}
# config/prod.json
{
  "db_url": "postgresql://service@prod:5432/warehouse",
  "bucket": "analytics-bucket",
  "parallelism": 8
}
  
# job_configured.py
import json, os
from dotenv import load_dotenv
load_dotenv()
env = os.getenv("APP_ENV", "dev")
with open(f"config/{env}.json") as f:
    cfg = json.load(f)
print(f"Running with {env} config: parallelism={cfg['parallelism']}")
  

Switch environments by setting APP_ENV to dev, stage, or prod (add a config/stage.json alongside the two files shown).
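
A common extension, sketched here as a suggestion rather than part of the example above, is to let a single environment variable override one value from the chosen config file, so operators can tune a setting without editing JSON:

# job_configured_override.py (hypothetical variant of job_configured.py)
import json, os

env = os.getenv("APP_ENV", "dev")
with open(f"config/{env}.json") as f:
    cfg = json.load(f)

# Illustrative convention: APP_PARALLELISM, if set, wins over the file value
override = os.getenv("APP_PARALLELISM")
if override is not None:
    cfg["parallelism"] = int(override)

print(f"parallelism={cfg['parallelism']}")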

Step-by-step: make your project reproducible

  1. Create a project folder: env-demo. Add folders: data, config.
  2. Initialize a virtual environment and pin dependencies.
    python -m venv .venv
    source .venv/bin/activate  # Windows: .venv\Scripts\activate
    pip install --upgrade pip
    pip install pandas==2.2.0 pyarrow==14.0.1 python-dotenv==1.0.0
    pip freeze > requirements.txt
        
  3. Add .env and .env.example; keep secrets only in .env.
  4. Add a simple Makefile to standardize commands.
    run: 
    	python job.py
    setup:
    	python -m venv .venv && . .venv/bin/activate && pip install -r requirements.txt
        
  5. Smoke test: run the job locally; change .env to simulate staging.
  6. Optional: build a Docker image and run with --env-file to ensure parity.

Tip: Keep secrets safe
  • Never commit .env files with real secrets; enforce this with .gitignore (see the snippet below).
  • Store example keys in .env.example (no secrets).
  • Rotate credentials periodically and prefer short-lived tokens where available.
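
A concrete guard for the first point: keep .env out of version control with a .gitignore entry (minimal example; extend for your project):

# .gitignore
.env
.venv/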

Exercises

Complete these in order; they mirror the Practice Exercises section below so your work can be checked.

  1. Exercise 1: Parameterize a small ETL with .env (see Exercises list below).
  2. Exercise 2: Lock reproducible dependencies with a venv and pinned versions.
  3. Exercise 3: Containerize and run the job with environment variables.

Self-check checklist

  • [ ] I can run the same script by only changing .env or APP_ENV.
  • [ ] My requirements.txt has exact versions.
  • [ ] My .env is ignored by git and .env.example documents required keys.
  • [ ] Docker run with --env-file produces the same output as local run.

Common mistakes and how to self-check

  • Forgetting to pin versions: Run pip freeze > requirements.txt; re-install to verify consistent versions.
  • Committing secrets: Ensure .gitignore includes .env; check git history if something slipped (see the commands after this list).
  • Hardcoding paths: Replace paths with env variables (INPUT_PATH, OUTPUT_PATH).
  • Config drift across environments: Keep dev/stage/prod config files in one place; document differences.
  • Docker image missing dependencies: Build after updating requirements.txt and verify with docker run.
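
If a .env file did slip into version control, these standard git commands help you check for it and stop tracking it (purging it from old commits is a separate, heavier history rewrite):

git log --all --oneline -- .env   # did .env ever get committed?
git rm --cached .env              # untrack it while keeping the local file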

Practical projects

  • Project 1: CSV to Parquet batch job with .env-driven input/output and chunk size. Provide .env.example.
  • Project 2: Dockerized data quality checker that reads a config JSON per environment and prints validation results.
  • Project 3: Simple orchestration (cron or a tiny scheduler script) that runs the same container with different .env files for dev and prod.

Learning path

  • Today: Master .env files, pinned dependencies, and Docker basics.
  • Next: Introduce Makefile or task runners to codify setup and run commands.
  • Then: Learn secrets managers and CI variables; template configs per environment.
  • Later: Infrastructure-as-Code and container orchestration for scalable, repeatable deployments.

Next steps

  • Finish the exercises below and ensure your checklist is all marked.
  • Convert one of your existing scripts into a parameterized, dockerized job.
  • Take the Quick Test to confirm your understanding.

Mini challenge

Given a script that reads from one table and writes a daily partitioned Parquet to a folder, make it environment-agnostic by moving connection strings, output paths, and partition size into .env or config files. Prove it by running once with APP_ENV=dev and once with APP_ENV=prod without changing code.

Quick Test

Everyone can take the test; only logged-in users get saved progress.

Practice Exercises

3 exercises to complete

Instructions

  1. Create data/input.csv with a few rows (e.g., id,name); a sample file appears below.
  2. Create a .env with INPUT_PATH, OUTPUT_PATH, CHUNK_SIZE and a matching .env.example without secrets.
  3. Install packages in a venv: pandas==2.2.0, pyarrow==14.0.1, python-dotenv==1.0.0.
  4. Write job.py that loads .env, reads CSV, writes Parquet, and prints row count.
  5. Run: python job.py. Change OUTPUT_PATH in .env and run again without touching code.
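
For step 1, a minimal data/input.csv could look like this (contents are illustrative):

id,name
1,ada
2,grace
3,alan
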
Expected Output

Console prints 'Wrote N rows to data/output.parquet' and the file exists at the path set in .env.

Environment Configuration Management — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

