Why this matters
As a Data Platform Engineer, you provision and evolve cloud platforms for ingestion, storage, processing, and analytics. Terraform lets you:
- Spin up and version-control data infrastructure (data lake buckets, IAM, VPCs, warehouses) safely and repeatably.
- Review changes before they happen (plans), catch drift, and roll forward confidently.
- Share reusable modules so teams get consistent, secure foundations.
Who this is for
- Engineers building or maintaining cloud data platforms.
- Anyone moving manual cloud setup into reliable, auditable code.
Prerequisites
- Basic command-line skills.
- Familiarity with at least one cloud provider (AWS, Azure, or GCP). Examples here use AWS.
- Terraform CLI installed (v1.4+).
Concept explained simply
Terraform turns cloud infrastructure into code files you commit to git. You write the desired end-state, then Terraform computes what to create, change, or delete, and applies it.
Mental model
- Blueprints: .tf files describe your desired resources (e.g., buckets, roles).
- State: a mapping that remembers what already exists in the cloud.
- Plan then apply: preview changes, then execute them.
- Modules: reusable building blocks for common patterns (e.g., a secure S3 bucket).
Jargon buster
- Provider: Plugin that talks to a platform (e.g., AWS, Azure).
- Resource: A thing to create (e.g., aws_s3_bucket).
- Data source: Read-only lookup (e.g., current AWS account ID).
- Variables: Inputs you can set per environment.
- Outputs: Values Terraform prints after apply to share with other stacks.
- State: A file that tracks real resources Terraform manages.
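To make "data source" concrete, here is a minimal read-only sketch: it looks up the current AWS account ID and creates nothing. The file name is illustrative, and it assumes the AWS provider from the templates below is configured.
# lookup.tf
data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id # the account Terraform is talking to
}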
Terraform workflow in 5 steps
- Write: Add/modify .tf files (resources, variables, providers).
- Init: terraform init to download providers and set up backends.
- Validate & format: terraform fmt, terraform validate.
- Plan: terraform plan to preview changes.
- Apply: terraform apply to make changes for real.
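Put together, the typical loop from the project directory looks like this:
terraform init      # download providers and configure the backend
terraform fmt       # format .tf files consistently
terraform validate  # check syntax and references
terraform plan      # preview what would change
terraform apply     # make the changes (asks for confirmation first)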
Core files and structure
project/
  main.tf          # resources
  variables.tf     # input variables
  outputs.tf       # outputs
  providers.tf     # provider and auth config
  versions.tf      # pin Terraform and provider versions
  terraform.tfvars # values for variables (do not store secrets in git)
Template snippets
# versions.tf
terraform {
  required_version = ">= 1.4.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
# providers.tf
variable "aws_region" {
type = string
default = "us-east-1"
}
provider "aws" {
region = var.aws_region
# Use environment variables for credentials: AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY or SSO
}
# variables.tf
variable "environment" { type = string }
# outputs.tf
output "example" {
value = "ok"
}
Worked examples
Example 1 — Initialize a new Terraform project
- Create files: versions.tf, providers.tf, variables.tf as shown above.
- Run terraform init.
- Run terraform fmt and terraform validate.
Outcome: Terraform downloads the AWS provider and the configuration validates.
Example 2 — Secure S3 bucket for a data lake
Create a versioned, encrypted bucket with public access blocked.
# main.tf
resource "aws_s3_bucket" "raw" {
bucket = "acme-data-raw-${var.environment}"
tags = {
Environment = var.environment
Owner = "data-platform"
}
}
resource "aws_s3_bucket_versioning" "raw" {
bucket = aws_s3_bucket.raw.id
versioning_configuration { status = "Enabled" }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "raw" {
bucket = aws_s3_bucket.raw.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
resource "aws_s3_bucket_public_access_block" "raw" {
bucket = aws_s3_bucket.raw.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
output "raw_bucket_name" { value = aws_s3_bucket.raw.bucket }
Run:
terraform plan -var environment=dev
terraform apply -var environment=dev
Outcome: Terraform creates a secure bucket with versioning and encryption.
Example 3 — Variables and outputs for reuse
Make bucket name and tags configurable. Add a terraform.tfvars:
# variables.tf
variable "bucket_prefix" { type = string }
variable "common_tags" { type = map(string) }
# main.tf
resource "aws_s3_bucket" "bronze" {
bucket = "${var.bucket_prefix}-bronze-${var.environment}"
tags = var.common_tags
}
# terraform.tfvars (example values)
bucket_prefix = "acme-data"
common_tags = {
  Environment = "dev"
  Owner       = "dp"
}
Outcome: You can swap prefixes and tags per environment without editing code.
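terraform.tfvars is loaded automatically. If you keep one value file per environment, you can pass it explicitly instead; the file names below are illustrative:
terraform plan -var-file=dev.tfvars   # preview with dev values
terraform plan -var-file=prod.tfvars  # preview with prod values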
Common mistakes (and how to self-check)
- Forgetting to review plans: Always run terraform plan and read the diff before apply.
- Committing state: Add *.tfstate and *.tfstate.backup to .gitignore.
- Mixing environments: Keep separate workspaces or directories for dev/stage/prod; avoid shared state.
- Not pinning versions: Use versions.tf to pin Terraform and providers.
- Inline secrets: Use environment variables or secret managers, not plain-text in .tf files.
- Manual console edits: Causes drift. Prefer Terraform changes and re-apply.
Self-check
- Does terraform validate pass?
- Does the plan show only expected resources and attributes?
- Is state excluded from version control?
- Are provider and Terraform versions pinned?
Exercises
Complete the exercises below. You don't need to run them in a real cloud; writing correct Terraform code is enough for practice.
- Exercise 1: Create a secure, versioned S3 bucket with variables and outputs.
- Exercise 2: Build a small module to create two differently named buckets by calling it twice.
- Checklist before you consider an exercise done:
- Configuration passes terraform fmt and terraform validate.
- Plan shows only expected creations and no destroys.
- Variables and outputs are named clearly.
Practical projects
- Data lake foundation: Buckets for raw/bronze/silver with encryption, versioning, lifecycle policies, and outputs (a lifecycle sketch follows this list).
- Analytics warehouse bootstrap: Parameterize a warehouse cluster/database and attach IAM roles or service accounts.
- Networking baseline: VPC, subnets, and security groups to isolate ETL jobs.
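If lifecycle rules are new to you, here is a minimal sketch that extends the raw bucket from Example 2; the rule name, transition window, and expiration days are assumptions to adapt:
# main.tf (addition)
resource "aws_s3_bucket_lifecycle_configuration" "raw" {
  bucket = aws_s3_bucket.raw.id

  rule {
    id     = "tier-and-expire-raw"
    status = "Enabled"
    filter {} # apply the rule to all objects

    transition {
      days          = 30
      storage_class = "STANDARD_IA" # move older objects to infrequent access
    }

    expiration {
      days = 365 # remove raw objects after a year
    }
  }
}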
Learning path
- Start with Terraform CLI, resources, variables, and outputs.
- Learn state basics and remote backends (S3 + DynamoDB, or your cloud equivalent).
- Create and consume modules; standardize tagging and encryption.
- Use workspaces or directories to separate environments (a workspace example follows this list).
- Add validation: terraform validate, pre-commit hooks, and plans in CI.
- Later: policy-as-code and drift detection in pipelines.
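Workspaces keep a separate state file per environment for the same configuration; inside the configuration, terraform.workspace holds the current workspace name. A minimal CLI sketch (workspace names are assumptions):
terraform workspace new dev      # create and switch to a dev workspace
terraform workspace new prod     # create a prod workspace
terraform workspace select dev   # switch back to dev
terraform plan -var environment=dev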
Mini challenge
Create a Terraform configuration that provisions two buckets (raw and bronze) with a shared tag map and distinct names, and outputs both names. Add a variable to toggle versioning on/off.
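If you want a hint for the versioning toggle, one option is a bool variable with a conditional expression; the variable name is an assumption:
variable "enable_versioning" {
  type    = bool
  default = true
}

resource "aws_s3_bucket_versioning" "raw" {
  bucket = aws_s3_bucket.raw.id

  versioning_configuration {
    status = var.enable_versioning ? "Enabled" : "Suspended"
  }
}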
Next steps
- Refactor repeated code into a module.
- Try a remote backend (S3 or your cloud equivalent) after you understand local state; a minimal backend sketch follows this list. Note: Remote backend buckets/tables must exist before enabling the backend.
- Integrate terraform plan into your PR checks.
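Here is a minimal S3 backend sketch with DynamoDB state locking, assuming the bucket and table already exist; every name below is a placeholder:
# backend.tf
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"          # placeholder: pre-existing state bucket
    key            = "data-platform/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"               # placeholder: pre-existing lock table
    encrypt        = true
  }
}
After adding the block, run terraform init again so Terraform can migrate your local state to the backend.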