Who this is for
You build and support data platforms and want fast, reliable local feedback loops. If you touch ingestion, transformation, orchestration, or platform tooling, this lesson is for you.
Prerequisites
- Basic command-line familiarity
- Docker installed (or ready to use an alternative like Podman)
- Beginner knowledge of SQL and Python (helpful, not mandatory)
Why this matters
In real data platform work, you will:
- Reproduce pipeline issues locally without burning cloud credits
- Validate schema/contract changes before they break downstream teams
- Run transformations and tests quickly on your laptop
- Mock cloud services (S3, Kinesis, IAM) to test integrations
- Ship platform templates and dev containers so everyone can get productive in minutes
Concept explained simply
Local dev and testing tooling is a set of small, reliable tools and patterns that let you run a mini data platform on your laptop. You spin up lightweight versions of storage, compute, orchestration, and testing so you can iterate fast and confidently.
Mental model
Think in layers:
- Environment: containers/dev containers, reproducible shells
- Data services: local S3, warehouse/DB, message queue
- Transform + Orchestration: dbt/Spark + Airflow/Prefect locally
- Testing: unit, data quality, contract, integration
- Developer UX: make/tasks, pre-commit, seed fixtures, example datasets
What “good” looks like
- One command to start the stack
- One command to run tests
- Sample data included; easy to reset
- Docs in README; environment parity with CI
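In practice, the first two items usually collapse into a Makefile or Taskfile. A minimal sketch (target names and exact commands are assumptions that mirror the demo stack used later in this lesson; recipes must be indented with tabs):

# Makefile — example targets; adapt the commands to your own stack
up:        ## start the local containers
	docker compose up -d

seed:      ## load sample data (assumes data/orders.csv and an existing orders_raw table)
	psql postgresql://demo:demo@localhost:5432/warehouse \
		-c "\copy orders_raw FROM 'data/orders.csv' WITH (FORMAT csv, HEADER true);"

test:      ## run the test suite
	pytest -q

down:      ## stop containers and remove volumes
	docker compose down -v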
Core components of a local data sandbox
- Container runtime: Docker or Podman
- Object storage: MinIO or LocalStack (S3 API)
- Warehouse/DB: Postgres, DuckDB, or SQLite
- Compute: Spark local or DuckDB; optional Flink
- Orchestration: Airflow or Prefect in local mode
- Testing: pytest, dbt tests, Great Expectations, data contract checks
- Dev productivity: Makefile/Taskfile, pre-commit, .env files, seed datasets
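For the .env piece, a local-only example (the values simply mirror the demo credentials in the compose file below; never reuse them outside your laptop):

# .env — local-only settings matching the docker-compose example
POSTGRES_USER=demo
POSTGRES_PASSWORD=demo
POSTGRES_DB=warehouse
DSN=postgresql://demo:demo@localhost:5432/warehouse
MINIO_ROOT_USER=minio
MINIO_ROOT_PASSWORD=minio123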
Minimal setup steps (recommended)
Example docker-compose.yml (trimmed)
version: '3.9'
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: demo
      POSTGRES_PASSWORD: demo
      POSTGRES_DB: warehouse
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U demo -d warehouse"]
      interval: 5s
      timeout: 3s
      retries: 10
  minio:
    image: quay.io/minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minio
      MINIO_ROOT_PASSWORD: minio123
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - ./local_artifacts/minio:/data
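Once the stack is up, you can exercise the S3 API against MinIO with the regular AWS CLI by overriding the endpoint. A sketch (the bucket name raw and the data/orders.csv path are just examples; credentials match the compose file above):

# point the AWS CLI at local MinIO instead of AWS
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
export AWS_DEFAULT_REGION=us-east-1

aws --endpoint-url http://localhost:9000 s3 mb s3://raw
aws --endpoint-url http://localhost:9000 s3 cp data/orders.csv s3://raw/orders.csv
aws --endpoint-url http://localhost:9000 s3 ls s3://raw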
Worked examples
Example 1: Seed data and run a local transform
Walkthrough
- Start services:
  docker compose up -d
- Seed Postgres with a CSV:
  psql postgresql://demo:demo@localhost:5432/warehouse \
    -c "CREATE TABLE IF NOT EXISTS orders_raw (order_id int, user_id int, amount numeric, created_at timestamp);" \
    -c "\copy orders_raw FROM 'data/orders.csv' WITH (FORMAT csv, HEADER true);"
- Run a transform (example SQL):
  psql postgresql://demo:demo@localhost:5432/warehouse -c "
    CREATE TABLE IF NOT EXISTS orders_clean AS
    SELECT
      order_id,
      user_id,
      amount::numeric(12,2) AS amount,
      created_at::timestamp AS created_at
    FROM orders_raw
    WHERE amount IS NOT NULL;"
- Quick data check:
  psql postgresql://demo:demo@localhost:5432/warehouse -c "SELECT COUNT(*) FROM orders_clean;"
Example 2: Data quality tests with pytest + SQL
Walkthrough
Create tests/test_quality.py
import os

import psycopg

DSN = os.getenv("DSN", "postgresql://demo:demo@localhost:5432/warehouse")


def test_no_negative_amounts():
    with psycopg.connect(DSN) as conn:
        cur = conn.execute("SELECT COUNT(*) FROM orders_clean WHERE amount < 0")
        cnt = cur.fetchone()[0]
        assert cnt == 0, f"Found {cnt} negative amounts"


def test_recent_data_exists():
    with psycopg.connect(DSN) as conn:
        cur = conn.execute(
            "SELECT COUNT(*) FROM orders_clean WHERE created_at > now() - interval '90 days'"
        )
        cnt = cur.fetchone()[0]
        assert cnt > 0, "No recent data found"
Run: pytest -q
Example 3: Contract test for schema changes
Walkthrough
Define a simple schema contract using a SQL assertion file (contracts/orders_clean.sql):
-- Expect required columns and types
SELECT
  (SELECT COUNT(*)
   FROM information_schema.columns
   WHERE table_name = 'orders_clean'
     AND column_name IN ('order_id', 'user_id', 'amount', 'created_at')) = 4 AS has_columns;

-- Type spot-check (numeric)
SELECT pg_typeof(amount) = 'numeric'::regtype AS amount_is_numeric
FROM orders_clean
LIMIT 1;
Run checks:
psql postgresql://demo:demo@localhost:5432/warehouse -f contracts/orders_clean.sql
Interpretation: any row that returns false means the contract is broken.
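If you would rather enforce the contract in the same pytest run as the quality tests, a small wrapper can execute the SQL file and assert every returned boolean is true. A sketch (the file path and DSN default carry over from the examples above, and it assumes each statement in the contract returns only boolean columns):

import os
from pathlib import Path

import psycopg

DSN = os.getenv("DSN", "postgresql://demo:demo@localhost:5432/warehouse")


def test_orders_clean_contract():
    sql = Path("contracts/orders_clean.sql").read_text()
    with psycopg.connect(DSN) as conn:
        # Split the file into statements; each one should return booleans only
        for statement in filter(None, (s.strip() for s in sql.split(";"))):
            row = conn.execute(statement).fetchone()
            assert row is None or all(row), f"Contract violated by: {statement[:60]}..."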
Exercises (hands-on)
These mirror the exercises below. Run them now and mark your checklist.
Exercise 1: One-command local stack
Goal: Start Postgres + MinIO with Docker Compose, load a CSV, and verify a transform table appears.
- Create docker-compose.yml with services (see example).
- Add a Makefile with targets: up, seed, transform, test, down.
- Seed the CSV into orders_raw, create orders_clean, and SELECT COUNT(*).
Exercise 2: Write tests
Goal: Add two tests—one quality test (no negative amounts) and one contract test (required columns exist).
- Add tests/test_quality.py or equivalent SQL checks.
- Ensure tests fail if you temporarily insert bad data, then pass after fix.
Checklist
- [ ] docker compose up starts all services and they report healthy
- [ ] Makefile target up works
- [ ] Seeded data visible in orders_raw
- [ ] orders_clean created with transformed data
- [ ] Tests run with a single command
- [ ] Negative amount test passes
- [ ] Contract check passes
- [ ] One-command cleanup resets state
Common mistakes and self-check
- Missing parity with CI: Self-check—run identical commands locally and in CI (using Makefile/Taskfile) to ensure parity.
- Brittle paths: Self-check—use env vars and relative project paths; verify on a clean clone.
- Forgotten test data reset: Self-check—add a clean target that drops/re-creates tables/buckets.
- Overcomplicated stacks: Self-check—start minimal (DB + storage); add only what’s needed for the task.
- No healthchecks: Self-check—add healthchecks so tests wait until services are ready.
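Building on the last point, a tiny wait loop that reuses the pg_isready healthcheck from the compose file keeps tests from racing the database. A sketch (adjust the user and database names to your setup):

# wait-for-postgres.sh — poll until Postgres accepts connections
until docker compose exec -T postgres pg_isready -U demo -d warehouse; do
  echo "waiting for postgres..."
  sleep 2
done
echo "postgres is ready"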
Practical projects
- Bootstrap a team template repo with docker-compose, Makefile, and a sample dataset
- Add data quality tests and a contract check for two critical tables
- Introduce pre-commit hooks to validate SQL formatting and run a fast smoke test (see the config sketch after this list)
- Wrap common flows in make targets: up, seed, transform, test, down, clean
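A starting point for the pre-commit idea (the hook choices here, sqlfluff plus a local pytest smoke hook, are suggestions rather than requirements):

# .pre-commit-config.yaml — example hooks; swap in your team's linters
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: "3.0.7"  # pin to a current sqlfluff release
    hooks:
      - id: sqlfluff-lint
  - repo: local
    hooks:
      - id: smoke-test
        name: fast smoke test
        entry: pytest -q -m smoke  # assumes fast tests are marked with @pytest.mark.smoke
        language: system
        pass_filenames: false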
Learning path
- Start: Minimal local stack with Postgres + MinIO
- Next: Add automated tests (pytest/dbt tests) and pre-commit
- Then: Add orchestration locally (Airflow or Prefect) for end-to-end runs (see the flow sketch after this list)
- Finally: Mock cloud dependencies (LocalStack) and add contract testing in CI
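When you reach the orchestration step, a local run can be as small as a single flow. A sketch using Prefect's @flow/@task decorators and the transform SQL from Example 1 (Airflow would work equally well; the DSN default is an assumption carried over from earlier):

import os

import psycopg
from prefect import flow, task

DSN = os.getenv("DSN", "postgresql://demo:demo@localhost:5432/warehouse")


@task
def transform():
    # Rebuild orders_clean from orders_raw, mirroring Example 1
    with psycopg.connect(DSN, autocommit=True) as conn:
        conn.execute("DROP TABLE IF EXISTS orders_clean")
        conn.execute(
            "CREATE TABLE orders_clean AS "
            "SELECT order_id, user_id, amount::numeric(12,2) AS amount, "
            "created_at::timestamp AS created_at "
            "FROM orders_raw WHERE amount IS NOT NULL"
        )


@task
def check():
    with psycopg.connect(DSN) as conn:
        count = conn.execute("SELECT COUNT(*) FROM orders_clean").fetchone()[0]
        print(f"orders_clean rows: {count}")


@flow
def local_pipeline():
    transform()
    check()


if __name__ == "__main__":
    local_pipeline()  # runs end-to-end on your laptop, no scheduler required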
Mini challenge
Create a single make e2e command that: brings up services, seeds data, runs transforms, executes tests, prints a short summary, then exits with the correct status code.
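One possible shape for that target, assuming the up, seed, transform, and test targets from Exercise 1 exist (make already propagates a failing prerequisite's exit code, so the status code comes for free):

# e2e: full local run; fails fast if any step fails
e2e: up seed transform test
	@echo "e2e run finished: stack up, data seeded, transforms built, tests green"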
Next steps
- Polish your template repo and share it with your team
- Add sample data generators (e.g., synthetic orders) for richer tests
- Expand tests to cover edge cases (empty files, schema drift, timezones)
About saving your progress
The quick test and exercises are available to everyone. If you log in, your progress will be saved automatically.