Why this matters
As a Data Platform Engineer, you provision and evolve cloud platforms for ingestion, storage, processing, and analytics. Terraform lets you:
- Spin up and version-control data infrastructure (data lake buckets, IAM, VPCs, warehouses) safely and repeatably.
- Review changes before they happen (plans), catch drift, and roll forward confidently.
- Share reusable modules so teams get consistent, secure foundations.
Who this is for
- Engineers building or maintaining cloud data platforms.
- Anyone moving manual cloud setup into reliable, auditable code.
Prerequisites
- Basic command-line skills.
- Familiarity with at least one cloud provider (AWS, Azure, or GCP). Examples here use AWS.
- Terraform CLI installed (v1.4+).
Concept explained simply
Terraform turns cloud infrastructure into code files you commit to git. You write the desired end-state, then Terraform computes what to create, change, or delete, and applies it.
Mental model
- Blueprints: .tf files describe your desired resources (e.g., buckets, roles).
- State: a mapping that remembers what already exists in the cloud.
- Plan then apply: preview changes, then execute them.
- Modules: reusable building blocks for common patterns (e.g., a secure S3 bucket).
Jargon buster
- Provider: Plugin that talks to a platform (e.g., AWS, Azure).
- Resource: A thing to create (e.g., aws_s3_bucket).
- Data source: Read-only lookup (e.g., current AWS account ID).
- Variables: Inputs you can set per environment.
- Outputs: Values Terraform prints after apply to share with other stacks.
- State: A file that tracks real resources Terraform manages.
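To make "data source" concrete, here is a minimal read-only sketch: it looks up the current AWS account ID and creates nothing. The file name is illustrative, and it assumes the AWS provider from the templates below is configured.
# lookup.tf
data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id # the account Terraform is talking to
}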
Terraform workflow in 5 steps
- Write: Add/modify .tf files (resources, variables, providers).
- Init: terraform init to download providers and set up backends.
- Validate & format: terraform fmt, terraform validate.
- Plan: terraform plan to preview changes.
- Apply: terraform apply to make changes for real.
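Put together, the typical loop from the project directory looks like this:
terraform init      # download providers and configure the backend
terraform fmt       # format .tf files consistently
terraform validate  # check syntax and references
terraform plan      # preview what would change
terraform apply     # make the changes (asks for confirmation first)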
Core files and structure
project/
  main.tf          # resources
  variables.tf     # input variables
  outputs.tf       # outputs
  providers.tf     # provider and auth config
  versions.tf      # pin Terraform and provider versions
  terraform.tfvars # values for variables (do not store secrets in git)
Template snippets
# versions.tf
terraform {
  required_version = ">= 1.4.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
# providers.tf
variable "aws_region" {
type = string
default = "us-east-1"
}
provider "aws" {
region = var.aws_region
# Use environment variables for credentials: AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY or SSO
}
# variables.tf
variable "environment" { type = string }
# outputs.tf
output "example" {
value = "ok"
}
Worked examples
Example 1 — Initialize a new Terraform project
- Create files: versions.tf, providers.tf, variables.tf as shown above.
- Run terraform init.
- Run terraform fmt and terraform validate.
Outcome: Terraform downloads the AWS provider and the configuration validates.
Example 2 — Secure S3 bucket for a data lake
Create a versioned, encrypted bucket with public access blocked.
# main.tf
resource "aws_s3_bucket" "raw" {
bucket = "acme-data-raw-${var.environment}"
tags = {
Environment = var.environment
Owner = "data-platform"
}
}
resource "aws_s3_bucket_versioning" "raw" {
bucket = aws_s3_bucket.raw.id
versioning_configuration { status = "Enabled" }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "raw" {
bucket = aws_s3_bucket.raw.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
resource "aws_s3_bucket_public_access_block" "raw" {
bucket = aws_s3_bucket.raw.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
output "raw_bucket_name" { value = aws_s3_bucket.raw.bucket }
Run:
terraform plan -var environment=dev
terraform apply -var environment=dev
Outcome: Terraform creates a secure bucket with versioning and encryption.
Example 3 — Variables and outputs for reuse
Make bucket name and tags configurable. Add a terraform.tfvars:
# variables.tf
variable "bucket_prefix" { type = string }
variable "common_tags" { type = map(string) }
# main.tf
resource "aws_s3_bucket" "bronze" {
bucket = "${var.bucket_prefix}-bronze-${var.environment}"
tags = var.common_tags
}
# terraform.tfvars (example values)
bucket_prefix = "acme-data"
common_tags = {
  Environment = "dev"
  Owner       = "dp"
}
Outcome: You can swap prefixes and tags per environment without editing code.
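terraform.tfvars is loaded automatically. If you keep one value file per environment, you can pass it explicitly instead; the file names below are illustrative:
terraform plan -var-file=dev.tfvars   # preview with dev values
terraform plan -var-file=prod.tfvars  # preview with prod values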
Common mistakes (and how to self-check)
- Forgetting to review plans: Always run terraform plan and read the diff before apply.
- Committing state: Add *.tfstate and *.tfstate.backup to .gitignore.
- Mixing environments: Keep separate workspaces or directories for dev/stage/prod; avoid shared state.
- Not pinning versions: Use versions.tf to pin Terraform and providers.
- Inline secrets: Use environment variables or secret managers, not plain-text in .tf files.
- Manual console edits: Causes drift. Prefer Terraform changes and re-apply.
Self-check
- Does terraform validate pass?
- Does the plan show only expected resources and attributes?
- Is state excluded from version control?
- Are provider and Terraform versions pinned?
Exercises
Complete the exercises below. You don't need to run them in a real cloud; writing correct Terraform code is enough for practice.
- Exercise 1: Create a secure, versioned S3 bucket with variables and outputs.
- Exercise 2: Build a small module to create two differently named buckets by calling it twice.
- Checklist before you consider an exercise done:
- Configuration passes terraform fmt and terraform validate.
- Plan shows only expected creations and no destroys.
- Variables and outputs are named clearly.
Practical projects
- Data lake foundation: Buckets for raw/bronze/silver with encryption, versioning, lifecycle policies, and outputs (a lifecycle sketch follows this list).
- Analytics warehouse bootstrap: Parameterize a warehouse cluster/database and attach IAM roles or service accounts.
- Networking baseline: VPC, subnets, and security groups to isolate ETL jobs.
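If lifecycle rules are new to you, here is a minimal sketch that extends the raw bucket from Example 2; the rule name, transition window, and expiration days are assumptions to adapt:
# main.tf (addition)
resource "aws_s3_bucket_lifecycle_configuration" "raw" {
  bucket = aws_s3_bucket.raw.id

  rule {
    id     = "tier-and-expire-raw"
    status = "Enabled"
    filter {} # apply the rule to all objects

    transition {
      days          = 30
      storage_class = "STANDARD_IA" # move older objects to infrequent access
    }

    expiration {
      days = 365 # remove raw objects after a year
    }
  }
}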
Learning path
- Start with Terraform CLI, resources, variables, and outputs.
- Learn state basics and remote backends (S3 + DynamoDB, or your cloud equivalent).
- Create and consume modules; standardize tagging and encryption.
- Use workspaces or directories to separate environments (a workspace example follows this list).
- Add validation: terraform validate, pre-commit hooks, and plans in CI.
- Later: policy-as-code and drift detection in pipelines.
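Workspaces keep a separate state file per environment for the same configuration; inside the configuration, terraform.workspace holds the current workspace name. A minimal CLI sketch (workspace names are assumptions):
terraform workspace new dev      # create and switch to a dev workspace
terraform workspace new prod     # create a prod workspace
terraform workspace select dev   # switch back to dev
terraform plan -var environment=dev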
Mini challenge
Create a Terraform configuration that provisions two buckets (raw and bronze) with a shared tag map and distinct names, and outputs both names. Add a variable to toggle versioning on/off.
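If you want a hint for the versioning toggle, one option is a bool variable with a conditional expression; the variable name is an assumption:
variable "enable_versioning" {
  type    = bool
  default = true
}

resource "aws_s3_bucket_versioning" "raw" {
  bucket = aws_s3_bucket.raw.id

  versioning_configuration {
    status = var.enable_versioning ? "Enabled" : "Suspended"
  }
}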
Next steps
- Refactor repeated code into a module.
- Try a remote backend (S3 or your cloud equivalent) after you understand local state; a minimal backend sketch follows this list. Note: Remote backend buckets/tables must exist before enabling the backend.
- Integrate terraform plan into your PR checks.
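Here is a minimal S3 backend sketch with DynamoDB state locking, assuming the bucket and table already exist; every name below is a placeholder:
# backend.tf
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"          # placeholder: pre-existing state bucket
    key            = "data-platform/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"               # placeholder: pre-existing lock table
    encrypt        = true
  }
}
After adding the block, run terraform init again so Terraform can migrate your local state to the backend.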