How to learn IAM and permissions basics (Cloud Basics for Machine Learning Engineers) for free

Who this is for

Machine Learning Engineers deploying models or training jobs on cloud platforms.
MLOps engineers wiring CI/CD, data access, and monitoring for ML systems.
Data scientists who need predictable, secure access to cloud data and compute.

Prerequisites

Basic understanding of cloud resources (compute, storage, projects/accounts).
Comfort with JSON/YAML configuration files.
Familiarity with your cloud provider's console or CLI (any provider is fine).

Why this matters

As an ML Engineer, you will: grant training jobs access to datasets, restrict who can read model artifacts, rotate secrets for pipelines, and prove compliance with audit logs. Getting IAM and permissions right prevents data leaks, avoids outage-causing denials, and keeps costs under control.

Real task: Let a training job read from a specific bucket prefix and write only to a model-artifacts location.
Real task: Allow CI to push images to a private registry but block it from deleting tags.
Real task: Give a contractor time-limited, read-only access to a dataset and nothing else.

Core concepts explained simply

Identity and Access Management (IAM) controls who can do what on which resource, and under which conditions.

Identity: a person (user), machine (service account or managed identity), or group/role.
Permission: an allowed action (e.g., read object, write log, start job).
Policy: a document that attaches permissions to identities on resources. It has an effect (allow/deny) and optional conditions (time, IP, resource prefix).
Scope: where the policy applies (account/project, resource group, bucket, container, specific path/prefix).
Session: temporary credentials a job uses; should be short-lived.

Mental model: Access = Identity + Permission + Resource + Condition. Start narrow and expand only when a job fails for a legitimate reason.

Least privilege: Give the minimum permissions needed, scoped to the smallest resource possible, for the shortest time.

Guardrails: Deny policies, organization constraints, naming conventions, and logging that make it hard to do risky things by accident.
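The mental model and the guardrail idea above can be sketched as a toy evaluator. This is a simplified illustration, not how any specific provider evaluates policies: the Statement class, the is_allowed function, and the identity and resource names are all hypothetical. It assumes default deny and that an explicit Deny overrides any Allow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Statement:
    effect: str                 # "Allow" or "Deny"
    identity: str               # who the statement applies to
    action: str                 # e.g. "read" or "write"
    resource_prefix: str        # scope: resource paths starting with this
    condition: Optional[Callable[[dict], bool]] = None  # optional extra check


def is_allowed(statements, identity, action, resource, context=None):
    """Default deny; a matching Deny beats any matching Allow."""
    context = context or {}
    allowed = False
    for s in statements:
        matches = (
            s.identity == identity
            and s.action == action
            and resource.startswith(s.resource_prefix)
            and (s.condition is None or s.condition(context))
        )
        if not matches:
            continue
        if s.effect == "Deny":
            return False        # guardrail: an explicit deny always wins
        allowed = True
    return allowed


policy = [
    Statement("Allow", "trainer", "read", "ml-data/projects/churn/train/"),
    Statement("Allow", "trainer", "write", "ml-artifacts/churn/"),
    # Guardrail: the trainer may never write anywhere in the data bucket.
    Statement("Deny", "trainer", "write", "ml-data/"),
]

print(is_allowed(policy, "trainer", "read", "ml-data/projects/churn/train/part-0.csv"))
print(is_allowed(policy, "trainer", "write", "ml-data/projects/churn/train/part-0.csv"))
```

The first call is allowed by the narrow read statement; the second is blocked by the deny guardrail even if someone later adds a broad write allow.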

Cloud translation (mind the vocabulary differences):

AWS: IAM users/roles, policies (JSON), resource ARNs, SCPs for guardrails.
GCP: Principals (users/service accounts), roles/bindings, resource hierarchy (org > folder > project), IAM Conditions.
Azure: Entra ID principals, role assignments, scopes (subscription/resource group/resource), managed identities.

Worked examples

Example 1 — AWS: Training job reads data and writes artifacts

Goal: A training role can read only s3://ml-data/projects/churn/train/* and write only s3://ml-artifacts/churn/*.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadTrainingData",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::ml-data/projects/churn/train/*"
      ]
    },
    {
      "Sid": "WriteArtifacts",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::ml-artifacts/churn/*"
      ]
    }
  ]
}

Mental check: Identity (training role), Permissions (GetObject/PutObject), Resources (specific prefixes), Conditions (none yet). Least privilege: satisfied.
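One detail worth checking: s3:GetObject on a prefix lets the job fetch objects whose keys it already knows, but not list them. If the training code enumerates files under the prefix, it would also need s3:ListBucket on the bucket ARN, usually restricted with the s3:prefix condition key. The sketch below generates the Example 1 policy with that optional statement; the helper name build_training_policy and its parameters are hypothetical.

```python
import json


def build_training_policy(data_bucket, data_prefix, artifact_bucket,
                          artifact_prefix, allow_list=False):
    """Return an IAM policy dict mirroring Example 1 (hypothetical helper)."""
    statements = [
        {
            "Sid": "ReadTrainingData",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{data_bucket}/{data_prefix}*"],
        },
        {
            "Sid": "WriteArtifacts",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [f"arn:aws:s3:::{artifact_bucket}/{artifact_prefix}*"],
        },
    ]
    if allow_list:
        # ListBucket is a bucket-level action, so the resource is the bucket
        # ARN and the prefix restriction goes into a condition.
        statements.append({
            "Sid": "ListTrainingPrefix",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{data_bucket}"],
            "Condition": {"StringLike": {"s3:prefix": [f"{data_prefix}*"]}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_training_policy(
    "ml-data", "projects/churn/train/", "ml-artifacts", "churn/", allow_list=True
)
print(json.dumps(policy, indent=2))
```

Generating policies from a small function like this keeps the prefix in one place, which makes it harder for the read and list scopes to drift apart.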

Example 2 — GCP: Vertex AI training reads dataset prefix

Goal: A Vertex AI custom job's service account can read only gs://ml-data/churn/train/*.

# Binding the predefined role roles/storage.objectViewer, limited to a prefix via IAM Conditions
# (conditions on bucket access require uniform bucket-level access on the bucket)
bindings:
- role: roles/storage.objectViewer
  members:
  - serviceAccount:vertex-train@PROJECT_ID.iam.gserviceaccount.com
  condition:
    title: ReadTrainPrefixOnly
    expression: resource.name.startsWith("projects/_/buckets/ml-data/objects/churn/train/")
    description: Limit to training prefix

Mental check: Condition narrows scope to the exact object path prefix.
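To see what the condition in Example 2 does and does not match, here is a minimal Python stand-in for the startsWith check. It only mimics the string comparison; the real evaluation happens inside Google Cloud IAM. The function names are hypothetical.

```python
def object_resource_name(bucket: str, object_path: str) -> str:
    """Build the resource name format used for Cloud Storage objects in conditions."""
    return f"projects/_/buckets/{bucket}/objects/{object_path}"


# Same prefix as the expression in Example 2, including the trailing slash.
TRAIN_PREFIX = "projects/_/buckets/ml-data/objects/churn/train/"


def condition_matches(bucket: str, object_path: str) -> bool:
    """Python equivalent of resource.name.startsWith(TRAIN_PREFIX)."""
    return object_resource_name(bucket, object_path).startswith(TRAIN_PREFIX)


print(condition_matches("ml-data", "churn/train/part-0.parquet"))
print(condition_matches("ml-data", "churn/train-backup/part-0.parquet"))
```

The second call shows why the trailing slash matters: without it, a sibling prefix such as churn/train-backup/ would also match.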

Example 3 — Azure: AML compute with managed identity and scoped storage access

Goal: Azure ML compute's system-assigned managed identity can read from a specific container and write to an artifacts container.

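Assign Storage Blob Data Reader at container scope: Storage Account > Container ml-data. Azure RBAC scopes stop at the container, so to limit reads to the churn/train/ path, add a role assignment condition (Azure ABAC) on the blob path.
Assign Storage Blob Data Contributor at container scope: Storage Account > Container ml-artifacts, with a similar condition for the churn/ path if needed. Note that this role also permits reading and deleting blobs, not only writing.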
Use the managed identity in the AML job so tokens are short-lived.

Mental check: Right identity, right roles, minimum scopes.
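Azure role assignments take their scope as a resource ID string. The sketch below builds container-level scopes for the two assignments in Example 3. The subscription ID, resource group, and storage account name are placeholders invented for illustration, and the helper name container_scope is hypothetical.

```python
def container_scope(subscription_id, resource_group, storage_account, container):
    """Return the resource ID of a blob container, usable as an RBAC scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
        f"/blobServices/default/containers/{container}"
    )


# Placeholder values (assumptions, not from a real environment).
SUB = "00000000-0000-0000-0000-000000000000"
RG = "ml-rg"
ACCOUNT = "mlstorage"

assignments = [
    ("Storage Blob Data Reader", container_scope(SUB, RG, ACCOUNT, "ml-data")),
    ("Storage Blob Data Contributor", container_scope(SUB, RG, ACCOUNT, "ml-artifacts")),
]

for role, scope in assignments:
    print(f"{role} -> {scope}")
```

Keeping the role and scope pairs in a list like this makes them easy to review and to feed into whatever tool creates the assignments.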

Hands-on exercises

Do these locally as design tasks. They mirror the graded exercises below.

Exercise 1 — Design a least-privilege plan for a training pipeline

Identify all identities: CI pipeline, training job, model registry writer.
List needed resources: data prefix, artifacts location, container registry, logs.
Assign minimal permissions per identity with the narrowest scope (prefix-level where possible).
Add one guardrail (a deny or org policy) to block wildcard writes to data buckets.

Expected result: A short plan listing identities, roles/permissions, scopes, conditions, and one deny-style guardrail.

Exercise 2 — Write a read-only policy for a data prefix

Create a minimal policy that allows reading only from a single dataset prefix and nothing else. Use pseudo-JSON if needed. Include:

Effect: Allow
Action: only read/list
Resource: the exact prefix path
Optional condition: restrict to that prefix

Expected result: A small JSON-like document granting read to a specific prefix only.

Preflight checklist for ML jobs

Identity is a service account/managed identity (not a personal user).
Permissions include only required actions (read data, write artifacts, write logs).
Scope is the smallest resource (specific bucket/container or prefix).
Sessions are temporary; no long-lived keys checked into code.
Audit logs enabled; you can trace who accessed what.
Guardrails exist: deny wildcards on data buckets, restricted public access.

Common mistakes and how to self-check

Using broad roles (e.g., admin or owner). Self-check: Can this identity delete resources it never needs to delete? If yes, it is overscoped.
Wildcarding resources (e.g., bucket/* when only a prefix is needed). Self-check: Try listing outside the prefix; if it works, tighten scope.
Permanent secrets in code. Self-check: Search repos for keys/tokens; replace with managed identities or secret managers.
No separation of duties (CI has prod write/delete). Self-check: Review role boundaries; ensure CI can push but not delete or deploy to prod without approval.
Missing logging. Self-check: Trigger a read and confirm it appears in audit logs within minutes.
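The self-check for permanent secrets in code can be partly automated. This is a minimal sketch, assuming two common patterns are enough for a first pass: AWS long-term access key IDs (which start with AKIA) and PEM private key headers. Dedicated secret scanners cover far more cases; the function name find_secrets is hypothetical.

```python
import re

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, name))
    return hits


sample = "\n".join([
    "bucket = 'ml-data'",
    "key_id = 'AKIAIOSFODNN7EXAMPLE'",  # the example key ID from AWS documentation
    "pem = '-----BEGIN PRIVATE KEY-----'",
])
print(find_secrets(sample))
```

Any hit is a prompt to move that credential to a secret manager or replace it with a managed identity, then rotate it.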

Mini challenge

You must give a contractor read-only access to the dataset prefix for 7 days and nothing else. Describe:

The identity you will create or use.
The exact permissions and scope.
How you will make access expire automatically.
One monitoring step to verify correct use.

Learning path

Step 1: Identities and sessions — service accounts, managed identities, short-lived credentials.
Step 2: Policies — allow vs deny, resource scoping, conditions.
Step 3: Data access patterns — prefix-level access, read vs write split.
Step 4: Guardrails — org policies, deny statements, private endpoints.
Step 5: Audit and review — enable logs, periodic access reviews, least-privilege drift checks.
Step 6: Automation — codify IAM in IaC and add policy tests in CI.

Practical projects

Project 1: Build a minimal ML sandbox with two identities: trainer (read data, write artifacts) and registry-writer (write to model registry only). Prove it with a small training run.
Project 2: Add a deny guardrail that blocks wildcard writes to data buckets/containers. Attempt a write outside the allowed prefix to confirm the deny triggers.
Project 3: Create IAM policy unit tests in CI to fail PRs that add broad permissions or wildcards.
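As a starting point for Project 3, here is a minimal sketch of a policy lint that a CI job could run against AWS-style JSON policies. The rules are deliberately simple assumptions: flag wildcard actions, and flag resources that are "*" or cover a whole bucket. The function name lint_policy is hypothetical.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def lint_policy(policy):
    """Return human-readable findings for overly broad Allow statements."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        sid = stmt.get("Sid", "<no Sid>")
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{sid}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            # Flag "*" and whole-bucket grants such as arn:aws:s3:::bucket/*
            whole_bucket = resource.count("/") == 1 and resource.endswith("/*")
            if resource == "*" or whole_bucket:
                findings.append(f"{sid}: broad resource {resource!r}")
    return findings


broad = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "TooBroad",
        "Effect": "Allow",
        "Action": "s3:*",
        "Resource": "arn:aws:s3:::ml-data/*",
    }],
}
for finding in lint_policy(broad):
    print(finding)
```

In CI, a test could load each policy file and assert that lint_policy returns an empty list, so a pull request that adds a wildcard fails the build.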

Next steps

Refine your policies from the exercises into reusable templates.
Pair with a teammate to do a 20-minute access review on your current ML projects.
When ready, take the Quick Test below. Everyone can take it for free; only logged-in users get saved progress.

Ready? Take the Quick Test

Target score: 70% or higher. If you miss the mark, revisit the exercises and the common mistakes section, then try again.
