Local Dev And Testing Tooling

Learn Local Dev And Testing Tooling for free with explanations, exercises, and a quick test (for Data Platform Engineers).

Published: January 11, 2026 | Updated: January 11, 2026

Who this is for

You build and support data platforms and want fast, reliable local feedback loops. If you touch ingestion, transformation, orchestration, or platform tooling, this lesson is for you.

Prerequisites

  • Basic command-line familiarity
  • Docker installed (or ready to use an alternative like Podman)
  • Beginner knowledge of SQL and Python (helpful, not mandatory)

Why this matters

In real data platform work, you will:

  • Reproduce pipeline issues locally without burning cloud credits
  • Validate schema/contract changes before they break downstream teams
  • Run transformations and tests quickly on your laptop
  • Mock cloud services (S3, Kinesis, IAM) to test integrations
  • Ship platform templates and dev containers so everyone can get productive in minutes

Concept explained simply

Local dev and testing tooling is a set of small, reliable tools and patterns that let you run a mini data platform on your laptop. You spin up lightweight versions of storage, compute, orchestration, and testing so you can iterate fast and confidently.

Mental model

Think in layers:

  • Environment: containers/dev containers, reproducible shells
  • Data services: local S3, warehouse/DB, message queue
  • Transform + Orchestration: dbt/Spark + Airflow/Prefect locally
  • Testing: unit, data quality, contract, integration
  • Developer UX: make/tasks, pre-commit, seed fixtures, example datasets

What “good” looks like

  • One command to start the stack
  • One command to run tests
  • Sample data included; easy to reset
  • Docs in README; environment parity with CI

Core components of a local data sandbox

  • Container runtime: Docker or Podman
  • Object storage: MinIO or LocalStack (S3 API)
  • Warehouse/DB: Postgres, DuckDB, or SQLite
  • Compute: Spark local or DuckDB; optional Flink
  • Orchestration: Airflow or Prefect in local mode
  • Testing: pytest, dbt tests, Great Expectations, data contract checks
  • Dev productivity: Makefile/Taskfile, pre-commit, .env files, seed datasets

Minimal setup steps (recommended)

Step 1: Create a project folder with a README, .env, and Makefile. Add a docker-compose.yml describing services.
Step 2: Choose Postgres (warehouse) + MinIO (S3). Add a small sample dataset (CSV or Parquet) under ./data.
Step 3: Provide make targets: up, down, test, seed, clean (a Makefile sketch follows these steps). Keep outputs under ./local_artifacts for easy reset.
Step 4: Add tests: a simple data quality test (row count, null checks) and a unit test for a transform.
Step 5: Document how to run everything in under 2 minutes.
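To make Steps 1-3 concrete, here is a minimal Makefile sketch. It assumes the demo credentials from the compose file below and a data/orders.csv seed file; adjust names to your layout.

# Minimal sketch; targets match Step 3, plus the transform from Example 1.
DSN := postgresql://demo:demo@localhost:5432/warehouse

.PHONY: up down seed transform test clean

up:
	docker compose up -d --wait   # --wait blocks until healthchecks pass (Compose v2)

down:
	docker compose down

seed:
	psql $(DSN) \
	  -c "CREATE TABLE IF NOT EXISTS orders_raw (order_id int, user_id int, amount numeric, created_at timestamp);" \
	  -c "\copy orders_raw FROM 'data/orders.csv' WITH (FORMAT csv, HEADER true);"

transform:
	psql $(DSN) -c "CREATE TABLE IF NOT EXISTS orders_clean AS SELECT order_id, user_id, amount::numeric(12,2) AS amount, created_at::timestamp AS created_at FROM orders_raw WHERE amount IS NOT NULL;"

test:
	pytest -q

clean:
	docker compose down -v
	rm -rf ./local_artifacts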

Example docker-compose.yml (trimmed)

services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: demo
      POSTGRES_PASSWORD: demo
      POSTGRES_DB: warehouse
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U demo -d warehouse"]
      interval: 5s
      timeout: 3s
      retries: 10
  minio:
    image: quay.io/minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minio
      MINIO_ROOT_PASSWORD: minio123
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - ./local_artifacts/minio:/data
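    # Hedged addition: give MinIO a healthcheck too, so tests and `--wait`
    # don't race it. This assumes the image ships curl (many releases do);
    # /minio/health/live is MinIO's documented liveness endpoint.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 5s
      timeout: 3s
      retries: 10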

Worked examples

Example 1: Seed data and run a local transform

Walkthrough
  1. Start services: docker compose up -d
  2. Seed Postgres with a CSV:
    psql postgresql://demo:demo@localhost:5432/warehouse \
      -c "CREATE TABLE IF NOT EXISTS orders_raw (order_id int, user_id int, amount numeric, created_at timestamp);" \
      -c "\copy orders_raw FROM 'data/orders.csv' WITH (FORMAT csv, HEADER true);"
  3. Run a transform (example SQL):
    psql postgresql://demo:demo@localhost:5432/warehouse -c "
    CREATE TABLE IF NOT EXISTS orders_clean AS
    SELECT order_id, user_id, amount::numeric(12,2) AS amount, created_at::timestamp AS created_at
    FROM orders_raw
    WHERE amount IS NOT NULL;"
  4. Quick data check:
    psql postgresql://demo:demo@localhost:5432/warehouse -c "SELECT COUNT(*) FROM orders_clean;"

Example 2: Data quality tests with pytest + SQL

Walkthrough

Create tests/test_quality.py

import os
import psycopg

DSN = os.getenv("DSN", "postgresql://demo:demo@localhost:5432/warehouse")

def test_no_negative_amounts():
    with psycopg.connect(DSN) as conn:
        cur = conn.execute("SELECT COUNT(*) FROM orders_clean WHERE amount < 0")
        cnt = cur.fetchone()[0]
        assert cnt == 0, f"Found {cnt} negative amounts"

def test_recent_data_exists():
    with psycopg.connect(DSN) as conn:
        cur = conn.execute("SELECT COUNT(*) FROM orders_clean WHERE created_at > now() - interval '90 days'")
        cnt = cur.fetchone()[0]
        assert cnt > 0, "No recent data found"

Run: pytest -q
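
If the dependencies are missing, install them first (psycopg here is the version-3 driver imported in the test file):

pip install "psycopg[binary]" pytest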

Example 3: Contract test for schema changes

Walkthrough

Define a simple schema contract using a SQL assertion file (contracts/orders_clean.sql):

-- Expect required columns and types
SELECT
  (SELECT COUNT(*) FROM information_schema.columns
    WHERE table_name = 'orders_clean' AND column_name IN ('order_id','user_id','amount','created_at')) = 4 AS has_columns;

-- Type spot-check (numeric scale)
SELECT pg_typeof(amount) = 'numeric'::regtype AS amount_is_numeric FROM orders_clean LIMIT 1;

Run checks:

psql postgresql://demo:demo@localhost:5432/warehouse -f contracts/orders_clean.sql

Interpret: any row that prints f (false) means the contract is broken.
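
To turn that eyeball check into a pass/fail command for scripts or CI, a small shell sketch (assumes psql's default t/f boolean output):

# Exit non-zero if any contract assertion prints 'f' (false).
if psql postgresql://demo:demo@localhost:5432/warehouse -tA -f contracts/orders_clean.sql | grep -qx f; then
  echo "Contract broken" >&2
  exit 1
fi
echo "Contract OK"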

Exercises (hands-on)

These mirror the exercises below. Run them now and mark your checklist.

Exercise 1: One-command local stack

Goal: Start Postgres + MinIO with Docker Compose, load a CSV, and verify a transform table appears.

  • Create docker-compose.yml with services (see example).
  • Add a Makefile with targets: up, seed, transform, test, down.
  • Seed the CSV into orders_raw, create orders_clean, and SELECT COUNT(*).
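
A plausible data/orders.csv to seed with (values are illustrative; the header must match the \copy column order):

order_id,user_id,amount,created_at
1,101,19.99,2026-01-05 10:15:00
2,102,5.00,2026-01-06 12:30:00
3,101,42.50,2026-01-07 09:00:00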

Exercise 2: Write tests

Goal: Add two tests—one quality test (no negative amounts) and one contract test (required columns exist).

  • Add tests/test_quality.py or equivalent SQL checks.
  • Ensure tests fail if you temporarily insert bad data, then pass again after the fix (see the snippet below).
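
One way to run that check (order_id 9999 is an arbitrary sentinel value):

-- Insert a deliberately bad row; pytest should now fail.
INSERT INTO orders_clean (order_id, user_id, amount, created_at)
VALUES (9999, 1, -5.00, now());
-- After confirming the failure, remove it and re-run the tests.
DELETE FROM orders_clean WHERE order_id = 9999;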

Checklist

  • [ ] docker compose up brings services healthy
  • [ ] Makefile target up works
  • [ ] Seeded data visible in orders_raw
  • [ ] orders_clean created with transformed data
  • [ ] Tests run with a single command
  • [ ] Negative amount test passes
  • [ ] Contract check passes
  • [ ] One-command cleanup resets state

Common mistakes and self-check

  • Missing parity with CI: Self-check—run identical commands locally and in CI (using Makefile/Taskfile) to ensure parity.
  • Brittle paths: Self-check—use env vars and relative project paths; verify on a clean clone.
  • Forgotten test data reset: Self-check—add a clean target that drops/re-creates tables/buckets.
  • Overcomplicated stacks: Self-check—start minimal (DB + storage); add only what’s needed for the task.
  • No healthchecks: Self-check—add healthchecks so tests wait until services are ready.

Practical projects

  • Bootstrap a team template repo with docker-compose, Makefile, and a sample dataset
  • Add data quality tests and a contract check for two critical tables
  • Introduce pre-commit hooks to validate SQL formatting and run a fast smoke test (a config sketch follows this list)
  • Wrap common flows in make targets: up, seed, transform, test, down, clean
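
For the pre-commit item above, one plausible .pre-commit-config.yaml; sqlfluff publishes the lint hook, while the local smoke-test hook and its make target are assumptions from this lesson's setup:

repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.0.7  # pin to a current release
    hooks:
      - id: sqlfluff-lint
  - repo: local
    hooks:
      - id: smoke-test
        name: fast smoke test
        entry: make test
        language: system
        pass_filenames: false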

Learning path

  • Start: Minimal local stack with Postgres + MinIO
  • Next: Add automated tests (pytest/dbt tests) and pre-commit
  • Then: Add orchestration locally (Airflow or Prefect) for end-to-end runs
  • Finally: Mock cloud dependencies (LocalStack) and add contract testing in CI

Mini challenge

Create a single make e2e command that: brings up services, seeds data, runs transforms, executes tests, prints a short summary, then exits with the correct status code.
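
One possible shape, sketched against the targets from the setup steps; it always tears down and propagates the test exit status:

.PHONY: e2e
e2e:
	@$(MAKE) up seed transform
	@$(MAKE) test; status=$$?; $(MAKE) down; \
	echo "e2e finished with status $$status"; exit $$status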

Next steps

  • Polish your template repo and share it with your team
  • Add sample data generators (e.g., synthetic orders) for richer tests
  • Expand tests to cover edge cases (empty files, schema drift, timezones)

Practice Exercises

2 exercises to complete

Instructions

Build a minimal local stack with Docker Compose, seed data, and run a transform.

  1. Create docker-compose.yml with Postgres (user=demo, pass=demo, db=warehouse) and MinIO (optional).
  2. Add a Makefile with targets: up, seed, transform, test, down, clean.
  3. Place data/orders.csv with columns: order_id,user_id,amount,created_at.
  4. Seed into orders_raw, then create orders_clean casting amount and created_at.
  5. Verify with SELECT COUNT(*) FROM orders_clean; and ensure it is > 0.

Expected Output

Services healthy; orders_raw and orders_clean exist; COUNT(*) on orders_clean returns > 0.

Local Dev And Testing Tooling — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
