Why this matters
As a Data Platform Engineer, you repeatedly provision similar building blocks: VPCs, IAM roles, S3 data lakes, Kafka topics, Databricks workspaces, Airflow clusters, and monitoring. Without reusable modules and standards, each team does it differently—leading to drift, security gaps, and slow delivery. Reusable modules let you ship secure, consistent infrastructure quickly across environments (dev, stage, prod) and projects.
- Real tasks: create a secure S3 data lake with encryption and lifecycle; standardize Kafka topics; enforce tags for cost and lineage; roll out a new data platform to multiple regions.
- Outcome: faster provisioning, fewer mistakes, easier audits, and predictable upgrades.
Concept explained simply
Think of IaC modules as Lego blocks. Each block does one thing well (e.g., a secure S3 bucket). Standards are the rules that make blocks compatible: naming, tags, variables, outputs, and versioning. With consistent blocks and rules, anyone can assemble a reliable platform.
Mental model
- Interface: clear inputs (variables) and outputs (references) like a function signature.
- Contract: documented behavior, defaults, and constraints.
- Versioned: changes are tracked using semantic versioning (MAJOR.MINOR.PATCH).
- Portable: environment differences handled via inputs, not copy-paste.
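A minimal sketch of the interface idea in Terraform (variable and output names here are illustrative, not tied to the examples below):
# variables.tf — typed inputs with documented defaults act as the "function signature"
variable "env" {
  description = "Deployment environment (dev, stage, prod)"
  type        = string
}
variable "retention_days" {
  description = "Days to keep data before expiry"
  type        = number
  default     = 30 # a documented default is part of the contract
}
# outputs.tf — the only values callers should depend on
output "bucket_arn" {
  description = "ARN of the bucket this module manages"
  value       = aws_s3_bucket.this.arn # assumes an aws_s3_bucket.this resource elsewhere in the module
}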
Standards to adopt
- Naming: predictable resource names, e.g., <org>-<platform>-<env>-<component>. Use lower case and hyphens (see the naming sketch after this list).
- Tags/labels: enforce in modules (owner, cost_center, env, data_classification, system).
- Security defaults: encryption at rest, least-privilege IAM, private networking where possible.
- Inputs/outputs: minimal, explicit variables; sensible secure defaults; clear outputs.
- Versioning: pin module and provider versions; use semantic versioning.
- Structure: keep modules/ (reusable) separate from stacks/ or envs/ (instantiations).
- Testing: validate, lint, and plan examples before release.
- Docs: README with inputs, outputs, examples, and change log.
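As a minimal sketch, the naming standard can be computed in code so callers cannot drift from it (the file placement and example values are illustrative):
# naming.tf — standalone sketch of the naming rule
variable "org" { type = string }       # e.g. "acme"
variable "platform" { type = string }  # e.g. "data"
variable "env" { type = string }       # e.g. "dev"
variable "component" { type = string } # e.g. "raw"
locals {
  # <org>-<platform>-<env>-<component>, lower case with hyphens
  resource_name = lower(join("-", [var.org, var.platform, var.env, var.component]))
}
output "name" { value = local.resource_name }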
Worked examples
Example 1: Terraform module for a secure S3 data bucket
# modules/s3_data_bucket/variables.tf
variable "name" { type = string }
variable "tags" { type = map(string) }
variable "versioning" { type = bool default = true }
variable "lifecycle_days_to_glacier" { type = number default = 90 }
# modules/s3_data_bucket/main.tf
resource "aws_s3_bucket" "this" {
bucket = var.name
tags = var.tags
}
resource "aws_s3_bucket_versioning" "this" {
bucket = aws_s3_bucket.this.id
versioning_configuration { status = var.versioning ? "Enabled" : "Suspended" }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
bucket = aws_s3_bucket.this.id
rule {
  apply_server_side_encryption_by_default {
    sse_algorithm = "AES256"
  }
}
}
resource "aws_s3_bucket_lifecycle_configuration" "this" {
bucket = aws_s3_bucket.this.id
rule {
  id     = "transition-to-glacier"
  status = "Enabled"

  # Apply the rule to all objects in the bucket
  filter {}

  transition {
    days          = var.lifecycle_days_to_glacier
    storage_class = "GLACIER"
  }
}
}
# modules/s3_data_bucket/outputs.tf
output "bucket_id" { value = aws_s3_bucket.this.id }
# envs/dev/data_bucket.tf
module "data_bucket" {
source = "../modules/s3_data_bucket"
name = "acme-data-dev-raw"
tags = {
owner = "data-platform"
env = "dev"
system = "data-lake"
cost_center = "dwh"
data_classification = "internal"
}
}
# envs/prod/data_bucket.tf
module "data_bucket" {
source = "../modules/s3_data_bucket"
name = "acme-data-prod-raw"
tags = {
owner = "data-platform"
env = "prod"
system = "data-lake"
cost_center = "dwh"
data_classification = "confidential"
}
lifecycle_days_to_glacier = 30
}
Example 2: Enforcing standard tags via locals
# modules/_standards/tags.tf
variable "env" { type = string }
variable "owner" { type = string default = "data-platform" }
variable "system" { type = string }
variable "extra_tags" { type = map(string) default = {} }
locals {
required = {
owner = var.owner
env = var.env
system = var.system
}
tags = merge(local.required, var.extra_tags)
}
output "tags" { value = local.tags }
# usage inside another module
module "std_tags" {
source = "../_standards"
env = var.env
system = "data-lake"
extra_tags = { cost_center = "dwh", data_classification = "internal" }
}
resource "aws_kms_key" "lake" {
description = "Data lake key"
tags = module.std_tags.tags
}
Example 3: Reusable Kafka topic module
# modules/kafka_topic/variables.tf
variable "name" { type = string }
variable "partitions" { type = number default = 3 }
variable "replication" { type = number default = 3 }
variable "retention_ms" { type = number default = 604800000 } # 7 days
# modules/kafka_topic/main.tf
resource "kafka_topic" "this" {
name = var.name
partitions = var.partitions
replication_factor = var.replication
config = {
"retention.ms" = tostring(var.retention_ms)
"cleanup.policy" = "delete"
}
}
output "topic_name" { value = kafka_topic.this.name }
# envs/prod/streaming.tf
module "orders_topic" {
source = "../modules/kafka_topic"
name = "acme-orders-prod"
partitions = 12
retention_ms = 2592000000 # 30 days
}
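The kafka_topic resource needs a Kafka provider configured at the stack level. A minimal sketch, assuming the community Mongey/kafka provider (version constraint and broker address are illustrative; authentication and TLS settings omitted):
# envs/prod/providers.tf
terraform {
  required_providers {
    kafka = {
      source  = "Mongey/kafka" # assumption: community Kafka provider
      version = "~> 0.7"       # pin a version you have tested
    }
  }
}
provider "kafka" {
  bootstrap_servers = ["broker-1.internal:9092"] # illustrative address
}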
How to structure your repo
iac/
modules/
s3_data_bucket/
main.tf
variables.tf
outputs.tf
README.md
kafka_topic/
main.tf
variables.tf
outputs.tf
README.md
_standards/
tags.tf
envs/
dev/
main.tf
data_bucket.tf
prod/
main.tf
data_bucket.tf
providers.tf
versions.tf
- modules/: reusable building blocks
- envs/: instantiations per environment
- Pin provider/module versions in versions.tf to ensure repeatable builds
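A minimal versions.tf sketch for an env stack (the constraints are illustrative; pin what you have tested):
# envs/prod/versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allow patches and minors within a tested major
    }
  }
}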
Versioning and compatibility
- MAJOR: breaking changes (rename variables, remove outputs, different default behavior that breaks plans)
- MINOR: backward-compatible features (new optional vars, new outputs)
- PATCH: fixes with no interface change
Rules: avoid breaking outputs/variables; when unavoidable, release a new major and provide a migration note. Pin versions in envs to avoid surprise upgrades.
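When modules live in a shared Git repository or a module registry instead of a local path, the pin goes in the env stack. A sketch with an illustrative repository URL and tag, replacing the relative source shown in Example 1:
# envs/prod/data_bucket.tf — pinned module source (repository URL and tag are illustrative)
module "data_bucket" {
  source = "git::https://git.example.com/iac/modules.git//s3_data_bucket?ref=v1.2.0"
  # Registry-hosted modules take a separate version argument instead, e.g. version = "~> 1.2"

  name = "acme-data-prod-raw"
  tags = {
    owner  = "data-platform"
    env    = "prod"
    system = "data-lake"
  }
}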
Testing and validation of modules
- Validate: run terraform validate on modules and examples
- Lint: static checks for naming, deprecated fields, and style
- Plan examples: keep an examples/ folder per module; run terraform plan before releasing (see the sketch after this list)
- Smoke deploy: for critical modules, deploy to a sandbox and destroy
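A minimal examples/ sketch for the S3 module from Example 1 (values are illustrative); planning this folder before each release exercises the module exactly as callers will:
# modules/s3_data_bucket/examples/basic/main.tf
module "example" {
  source = "../.." # the module under test

  name = "acme-data-dev-example"
  tags = {
    owner               = "data-platform"
    env                 = "dev"
    system              = "data-lake"
    cost_center         = "dwh"
    data_classification = "internal"
  }
}
# Release gate: terraform init, terraform validate, terraform plan in this folder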
Release checklist
- Update README with inputs/outputs/examples
- Run validate and lint
- Plan examples with pinned providers
- Tag version (e.g., v0.3.0)
Security and policy integration
- Make secure the default: encryption, private networking, least privilege
- Policy-as-code: design modules to pass organization policies (e.g., required tags, blocked public buckets)
- Expose safe toggles only; avoid exposing raw, risky flags by default
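For example, a secure default can be baked into the module itself rather than exposed as a flag. A minimal sketch extending the S3 module from Example 1 to block public access unconditionally:
# modules/s3_data_bucket/public_access.tf — no input toggles this off
resource "aws_s3_bucket_public_access_block" "this" {
  bucket = aws_s3_bucket.this.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}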
Who this is for
- Data Platform Engineers building repeatable data infrastructure
- Data Engineers owning pipelines but needing consistent infra patterns
- Platform/SRE partners standardizing cloud resources
Prerequisites
- Basic Terraform or equivalent IaC knowledge (resources, variables, outputs)
- Familiarity with your cloud provider’s core services (networking, storage, IAM)
- CLI access to a sandbox account
Learning path
- Wrap a single resource into a minimal module
- Add standards (naming, tags, security defaults)
- Introduce versioning and examples
- Create environment stacks and pin versions
- Add tests and validation to your workflow
Exercises
Complete these in a sandbox account and keep your code under version control.
Exercise 1 — Secure S3 data bucket module
Goal: Create a reusable module that provisions a secure S3 bucket with versioning, encryption, lifecycle, and standard tags. Instantiate it for dev and prod.
- Requirements:
- Inputs: name, env, data_classification, extra_tags (map)
- Defaults: versioning on, encryption on (AES256), lifecycle to GLACIER after 90 days
- Outputs: bucket_id
- Enforce tags: owner=data-platform, env, system=data-lake, plus extra_tags
- Deliverables:
- Module code
- Two env instantiations (dev, prod) with different names and data_classification
Starter checklist
- Create modules/s3_data_bucket with variables.tf, main.tf, outputs.tf
- Add a small README listing inputs/outputs
- Run terraform validate and plan
Exercise 2 — Standard interface and version pinning
Goal: Publish a simple standards module that returns required tags and show how to pin and use a module version from envs.
- Requirements:
- Create modules/_standards that composes required and extra tags
- Expose inputs: env, system, owner (default), extra_tags
- Tag a version (e.g., v0.1.0); reference the module locally via a source path and note the intended tag in a comment
- Deliverables:
- One env stack that calls _standards and applies tags to a resource
- versions.tf pinning provider version
Exercise tips
- Keep module inputs minimal and clear
- Provide sensible defaults for security
- Use locals to merge tags
Common mistakes
- Too many inputs: leads to confusion. Self-check: can I remove or default any input?
- Leaking provider details: keep interfaces cloud-agnostic when possible
- No version pinning: unexpected upgrades. Self-check: do envs pin both provider and module versions?
- Copy-paste per environment: prefer inputs and small deltas via variables
- Missing required tags: enforce in modules, not in envs
Practical projects
- Data lake foundation: modules for raw/curated buckets, KMS keys, access roles, and Athena/Glue configuration
- Streaming backbone: modules for Kafka topics with standardized retention and compaction policies
- Workspace bootstrap: module for a Databricks/Airflow environment with IAM roles, logs, metrics, and tags
Mini challenge
Create a module bundle that provisions a data ingestion path: source bucket + KMS + IAM role with restricted access. Expose only three inputs: env, system, and data_classification. Ensure all resources inherit your standard tags and security defaults. Run validate and plan for dev and prod.
Next steps
- Harden modules by adding validation rules and preconditions
- Add example folders and run plans before every release
- Extend standards to include logging, metrics, and backup policies
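A minimal sketch of the first step, adding input validation and a resource precondition to the S3 module from Example 1 (the allowed values and length rule are illustrative):
# modules/s3_data_bucket/variables.tf — validate inputs at the module boundary
variable "data_classification" {
  type = string

  validation {
    condition     = contains(["public", "internal", "confidential"], var.data_classification)
    error_message = "The data_classification value must be one of: public, internal, confidential."
  }
}
# modules/s3_data_bucket/main.tf — precondition on the bucket resource from Example 1
resource "aws_s3_bucket" "this" {
  bucket = var.name
  tags   = var.tags

  lifecycle {
    precondition {
      condition     = length(var.name) <= 63
      error_message = "S3 bucket names must be 63 characters or fewer."
    }
  }
}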