Why this matters
As a Data Platform Engineer, you repeatedly provision similar building blocks: VPCs, IAM roles, S3 data lakes, Kafka topics, Databricks workspaces, Airflow clusters, and monitoring. Without reusable modules and standards, each team does it differently—leading to drift, security gaps, and slow delivery. Reusable modules let you ship secure, consistent infrastructure quickly across environments (dev, stage, prod) and projects.
- Real tasks: create a secure S3 data lake with encryption and lifecycle; standardize Kafka topics; enforce tags for cost and lineage; roll out a new data platform to multiple regions.
- Outcome: faster provisioning, fewer mistakes, easier audits, and predictable upgrades.
Concept explained simply
Think of IaC modules as Lego blocks. Each block does one thing well (e.g., a secure S3 bucket). Standards are the rules that make blocks compatible: naming, tags, variables, outputs, and versioning. With consistent blocks and rules, anyone can assemble a reliable platform.
Mental model
- Interface: clear inputs (variables) and outputs (references) like a function signature.
- Contract: documented behavior, defaults, and constraints.
- Versioned: changes are tracked using semantic versioning (MAJOR.MINOR.PATCH).
- Portable: environment differences handled via inputs, not copy-paste.
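A minimal sketch of the interface idea in Terraform (variable and output names here are illustrative, not tied to the examples below):
# variables.tf — typed inputs with documented defaults act as the "function signature"
variable "env" {
  description = "Deployment environment (dev, stage, prod)"
  type        = string
}
variable "retention_days" {
  description = "Days to keep data before expiry"
  type        = number
  default     = 30 # a documented default is part of the contract
}
# outputs.tf — the only values callers should depend on
output "bucket_arn" {
  description = "ARN of the bucket this module manages"
  value       = aws_s3_bucket.this.arn # assumes an aws_s3_bucket.this resource elsewhere in the module
}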
Standards to adopt
- Naming: predictable resource names, e.g., <org>-<platform>-<env>-<component>. Use lower case and hyphens (see the naming sketch after this list).
- Tags/labels: enforce in modules (owner, cost_center, env, data_classification, system).
- Security defaults: encryption at rest, least-privilege IAM, private networking where possible.
- Inputs/outputs: minimal, explicit variables; sensible secure defaults; clear outputs.
- Versioning: pin module and provider versions; use semantic versioning.
- Structure: keep modules/ (reusable) separate from stacks/ or envs/ (instantiations).
- Testing: validate, lint, and plan examples before release.
- Docs: README with inputs, outputs, examples, and change log.
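As a minimal sketch, the naming standard can be computed in code so callers cannot drift from it (the file placement and example values are illustrative):
# naming.tf — standalone sketch of the naming rule
variable "org" { type = string }       # e.g. "acme"
variable "platform" { type = string }  # e.g. "data"
variable "env" { type = string }       # e.g. "dev"
variable "component" { type = string } # e.g. "raw"
locals {
  # <org>-<platform>-<env>-<component>, lower case with hyphens
  resource_name = lower(join("-", [var.org, var.platform, var.env, var.component]))
}
output "name" { value = local.resource_name }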
Worked examples
Example 1: Terraform module for a secure S3 data bucket
# modules/s3_data_bucket/variables.tf
variable "name" { type = string }
variable "tags" { type = map(string) }
variable "versioning" { type = bool default = true }
variable "lifecycle_days_to_glacier" { type = number default = 90 }
# modules/s3_data_bucket/main.tf
resource "aws_s3_bucket" "this" {
bucket = var.name
tags = var.tags
}
resource "aws_s3_bucket_versioning" "this" {
bucket = aws_s3_bucket.this.id
versioning_configuration { status = var.versioning ? "Enabled" : "Suspended" }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
bucket = aws_s3_bucket.this.id
rule {
  apply_server_side_encryption_by_default {
    sse_algorithm = "AES256"
  }
}
}
resource "aws_s3_bucket_lifecycle_configuration" "this" {
bucket = aws_s3_bucket.this.id
rule {
  id     = "transition-to-glacier"
  status = "Enabled"

  # Apply the rule to all objects in the bucket
  filter {}

  transition {
    days          = var.lifecycle_days_to_glacier
    storage_class = "GLACIER"
  }
}
}
# modules/s3_data_bucket/outputs.tf
output "bucket_id" { value = aws_s3_bucket.this.id }
# envs/dev/data_bucket.tf
module "data_bucket" {
source = "../modules/s3_data_bucket"
name = "acme-data-dev-raw"
tags = {
owner = "data-platform"
env = "dev"
system = "data-lake"
cost_center = "dwh"
data_classification = "internal"
}
}
# envs/prod/data_bucket.tf
module "data_bucket" {
source = "../modules/s3_data_bucket"
name = "acme-data-prod-raw"
tags = {
owner = "data-platform"
env = "prod"
system = "data-lake"
cost_center = "dwh"
data_classification = "confidential"
}
lifecycle_days_to_glacier = 30
}
Example 2: Enforcing standard tags via locals
# modules/_standards/tags.tf
variable "env" { type = string }
variable "owner" { type = string default = "data-platform" }
variable "system" { type = string }
variable "extra_tags" { type = map(string) default = {} }
locals {
required = {
owner = var.owner
env = var.env
system = var.system
}
tags = merge(local.required, var.extra_tags)
}
output "tags" { value = local.tags }
# usage inside another module
module "std_tags" {
source = "../_standards"
env = var.env
system = "data-lake"
extra_tags = { cost_center = "dwh", data_classification = "internal" }
}
resource "aws_kms_key" "lake" {
description = "Data lake key"
tags = module.std_tags.tags
}
Example 3: Reusable Kafka topic module
# modules/kafka_topic/variables.tf
variable "name" { type = string }
variable "partitions" { type = number default = 3 }
variable "replication" { type = number default = 3 }
variable "retention_ms" { type = number default = 604800000 } # 7 days
# modules/kafka_topic/main.tf
resource "kafka_topic" "this" {
name = var.name
partitions = var.partitions
replication_factor = var.replication
config = {
"retention.ms" = tostring(var.retention_ms)
"cleanup.policy" = "delete"
}
}
output "topic_name" { value = kafka_topic.this.name }
# envs/prod/streaming.tf
module "orders_topic" {
source = "../modules/kafka_topic"
name = "acme-orders-prod"
partitions = 12
retention_ms = 2592000000 # 30 days
}
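The kafka_topic resource needs a Kafka provider configured at the stack level. A minimal sketch, assuming the community Mongey/kafka provider (version constraint and broker address are illustrative; authentication and TLS settings omitted):
# envs/prod/providers.tf
terraform {
  required_providers {
    kafka = {
      source  = "Mongey/kafka" # assumption: community Kafka provider
      version = "~> 0.7"       # pin a version you have tested
    }
  }
}
provider "kafka" {
  bootstrap_servers = ["broker-1.internal:9092"] # illustrative address
}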
How to structure your repo
iac/
modules/
s3_data_bucket/
main.tf
variables.tf
outputs.tf
README.md
kafka_topic/
main.tf
variables.tf
outputs.tf
README.md
_standards/
tags.tf
envs/
dev/
main.tf
data_bucket.tf
prod/
main.tf
data_bucket.tf
providers.tf
versions.tf
- modules/: reusable building blocks
- envs/: instantiations per environment
- Pin provider/module versions in versions.tf to ensure repeatable builds
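A minimal versions.tf sketch for an env stack (the constraints are illustrative; pin what you have tested):
# envs/prod/versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allow patches and minors within a tested major
    }
  }
}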
Versioning and compatibility
- MAJOR: breaking changes (rename variables, remove outputs, different default behavior that breaks plans)
- MINOR: backward-compatible features (new optional vars, new outputs)
- PATCH: fixes with no interface change
Rules: avoid breaking outputs/variables; when unavoidable, release a new major and provide a migration note. Pin versions in envs to avoid surprise upgrades.
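When modules live in a shared Git repository or a module registry instead of a local path, the pin goes in the env stack. A sketch with an illustrative repository URL and tag, replacing the relative source shown in Example 1:
# envs/prod/data_bucket.tf — pinned module source (repository URL and tag are illustrative)
module "data_bucket" {
  source = "git::https://git.example.com/iac/modules.git//s3_data_bucket?ref=v1.2.0"
  # Registry-hosted modules take a separate version argument instead, e.g. version = "~> 1.2"

  name = "acme-data-prod-raw"
  tags = {
    owner  = "data-platform"
    env    = "prod"
    system = "data-lake"
  }
}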
Testing and validation of modules
- Validate: run terraform validate on modules and examples
- Lint: static checks for naming, deprecated fields, and style
- Plan examples: keep an examples/ folder per module; run terraform plan before releasing (see the sketch after this list)
- Smoke deploy: for critical modules, deploy to a sandbox and destroy
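A minimal examples/ sketch for the S3 module from Example 1 (values are illustrative); planning this folder before each release exercises the module exactly as callers will:
# modules/s3_data_bucket/examples/basic/main.tf
module "example" {
  source = "../.." # the module under test

  name = "acme-data-dev-example"
  tags = {
    owner               = "data-platform"
    env                 = "dev"
    system              = "data-lake"
    cost_center         = "dwh"
    data_classification = "internal"
  }
}
# Release gate: terraform init, terraform validate, terraform plan in this folder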
Release checklist
- Update README with inputs/outputs/examples
- Run validate and lint
- Plan examples with pinned providers
- Tag version (e.g., v0.3.0)
Security and policy integration
- Make secure the default: encryption, private networking, least privilege
- Policy-as-code: design modules to pass organization policies (e.g., required tags, blocked public buckets)
- Expose safe toggles only; avoid exposing raw, risky flags by default
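For example, a secure default can be baked into the module itself rather than exposed as a flag. A minimal sketch extending the S3 module from Example 1 to block public access unconditionally:
# modules/s3_data_bucket/public_access.tf — no input toggles this off
resource "aws_s3_bucket_public_access_block" "this" {
  bucket = aws_s3_bucket.this.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}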
Who this is for
- Data Platform Engineers building repeatable data infrastructure
- Data Engineers owning pipelines but needing consistent infra patterns
- Platform/SRE partners standardizing cloud resources
Prerequisites
- Basic Terraform or equivalent IaC knowledge (resources, variables, outputs)
- Familiarity with your cloud provider’s core services (networking, storage, IAM)
- CLI access to a sandbox account
Learning path
- Wrap a single resource into a minimal module
- Add standards (naming, tags, security defaults)
- Introduce versioning and examples
- Create environment stacks and pin versions
- Add tests and validation to your workflow
Exercises
Complete these in a sandbox account and keep your code under version control.
Exercise 1 — Secure S3 data bucket module
Goal: Create a reusable module that provisions a secure S3 bucket with versioning, encryption, lifecycle, and standard tags. Instantiate it for dev and prod.
- Requirements:
- Inputs: name, env, data_classification, extra_tags (map)
- Defaults: versioning on, encryption on (AES256), lifecycle to GLACIER after 90 days
- Outputs: bucket_id
- Enforce tags: owner=data-platform, env, system=data-lake, plus extra_tags
- Deliverables:
- Module code
- Two env instantiations (dev, prod) with different names and data_classification
Starter checklist
- Create modules/s3_data_bucket with variables.tf, main.tf, outputs.tf
- Add a small README listing inputs/outputs
- Run terraform validate and plan
Exercise 2 — Standard interface and version pinning
Goal: Publish a simple standards module that returns required tags and show how to pin and use a module version from envs.
- Requirements:
- Create modules/_standards that composes required and extra tags
- Expose inputs: env, system, owner (default), extra_tags
- Tag a version (e.g., v0.1.0); reference the module locally via a source path and note the intended tag in a comment
- Deliverables:
- One env stack that calls _standards and applies tags to a resource
- versions.tf pinning provider version
Exercise tips
- Keep module inputs minimal and clear
- Provide sensible defaults for security
- Use locals to merge tags
Common mistakes
- Too many inputs: leads to confusion. Self-check: can I remove or default any input?
- Leaking provider details: keep interfaces cloud-agnostic when possible
- No version pinning: unexpected upgrades. Self-check: do envs pin both provider and module versions?
- Copy-paste per environment: prefer inputs and small deltas via variables
- Missing required tags: enforce in modules, not in envs
Practical projects
- Data lake foundation: modules for raw/curated buckets, KMS keys, access roles, and Athena/Glue configuration
- Streaming backbone: modules for Kafka topics with standardized retention and compaction policies
- Workspace bootstrap: module for a Databricks/Airflow environment with IAM roles, logs, metrics, and tags
Mini challenge
Create a module bundle that provisions a data ingestion path: source bucket + KMS + IAM role with restricted access. Expose only three inputs: env, system, and data_classification. Ensure all resources inherit your standard tags and security defaults. Run validate and plan for dev and prod.
Next steps
- Harden modules by adding validation rules and preconditions
- Add example folders and run plans before every release
- Extend standards to include logging, metrics, and backup policies
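A minimal sketch of the first step, adding input validation and a resource precondition to the S3 module from Example 1 (the allowed values and length rule are illustrative):
# modules/s3_data_bucket/variables.tf — validate inputs at the module boundary
variable "data_classification" {
  type = string

  validation {
    condition     = contains(["public", "internal", "confidential"], var.data_classification)
    error_message = "The data_classification value must be one of: public, internal, confidential."
  }
}
# modules/s3_data_bucket/main.tf — precondition on the bucket resource from Example 1
resource "aws_s3_bucket" "this" {
  bucket = var.name
  tags   = var.tags

  lifecycle {
    precondition {
      condition     = length(var.name) <= 63
      error_message = "S3 bucket names must be 63 characters or fewer."
    }
  }
}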