
Security And Compliance For ML

Learn Security And Compliance For ML for MLOps Engineers for free: roadmap, examples, subskills, and a skill exam.

Published: January 4, 2026 | Updated: January 4, 2026

Why this skill matters for MLOps Engineers

Security and compliance for ML ensures your data, models, and pipelines are protected end-to-end. As an MLOps Engineer, you enable safe experimentation and reliable delivery by enforcing least-privilege access, protecting PII, isolating networks, logging everything important, and proving that releases meet policy.

  • Protect sensitive training data and model outputs.
  • Prevent credential leaks and code-to-production supply chain risks.
  • Enable regulated workloads (e.g., those subject to privacy laws) without blocking delivery.
  • Reduce incident impact with strong auditability and recovery paths.

Who this is for

  • MLOps and ML platform engineers shipping models to production.
  • Data/ML engineers handling pipelines with sensitive data.
  • Team leads who need practical, auditable controls.

Prerequisites

  • Basic Python and CLI skills.
  • Familiarity with containers and CI/CD.
  • Basic understanding of cloud IAM and Kubernetes is helpful.

Learning path

1) Foundations: identities, access, and secrets
  1. Map system components: users, services, data stores, model registry, CI/CD, inference/training clusters.
  2. Apply least-privilege IAM roles to pipelines and services.
  3. Move credentials to a vault or secrets manager; remove from code and config.
2) Data protection: PII handling and encryption
  1. Identify PII fields and set redaction/masking rules at ingestion.
  2. Encrypt data at rest and in transit; enforce TLS everywhere.
  3. Set retention and deletion policies for training artifacts and logs (see the retention sketch after this list).
3) Network isolation and safe connectivity
  1. Place workloads in private networks; restrict egress by allow-lists.
  2. Use service-to-service authentication (mTLS or workload identity).
  3. Expose only required endpoints, with rate limits and WAF where possible.
4) Auditability and governance
  1. Emit structured audit logs for data access, model pushes, and inference calls.
  2. Protect logs with append-only storage and lifecycle policies.
  3. Create runbooks: access reviews, key rotation, incident response.
5) Secure deployment and model risk management
  1. Sign and verify images/artifacts; scan dependencies.
  2. Gate releases with approvals and automated checks.
  3. Document model risks, monitoring plans, and rollback steps.
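
Retention (step 2.3) is the control teams most often postpone. As a minimal sketch, assuming training artifacts and logs live under a local directory such as /data/artifacts and a 30-day retention window (both hypothetical values), a scheduled cleanup job might look like this:

import time
from pathlib import Path

RETENTION_DAYS = 30                      # assumed policy; take this from your data retention standard
ARTIFACT_ROOT = Path("/data/artifacts")  # hypothetical location of training artifacts and logs

def purge_expired(root: Path, retention_days: int) -> list[Path]:
    """Delete files older than the retention window and return what was removed."""
    cutoff = time.time() - retention_days * 86_400
    removed = []
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed

if __name__ == "__main__":
    for path in purge_expired(ARTIFACT_ROOT, RETENTION_DAYS):
        print(f"deleted expired artifact: {path}")

For cloud object storage, prefer native lifecycle rules over custom scripts; the goal is that retention is enforced automatically rather than by hand.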

Worked examples

Example 1 — Least-privilege IAM for training to read a specific dataset

Grant a training job read-only access to a single bucket/prefix, nothing else.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListDatasetPrefix",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::ml-datasets"],
      "Condition": {
        "StringLike": {"s3:prefix": ["customer-churn/*"]}
      }
    },
    {
      "Sid": "ReadDatasetObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::ml-datasets/customer-churn/*"]
    }
  ]
}

Attach this role to the training job’s compute. Note that the s3:prefix condition applies only to ListBucket; object reads are scoped by the resource ARN instead. Avoid wildcard actions and unrestricted resources.
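
To confirm the scoping actually holds, try an allowed read and a disallowed read from the training environment. A minimal sketch assuming the boto3 SDK and the bucket and prefix from the policy above; the object keys are hypothetical:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def can_read(bucket: str, key: str) -> bool:
    """Return True if the current role can read the object, False on AccessDenied."""
    try:
        s3.get_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "AccessDenied":
            return False
        raise

# Expected: True inside the granted prefix, False anywhere else
print(can_read("ml-datasets", "customer-churn/train.csv"))  # hypothetical key
print(can_read("ml-datasets", "other-project/train.csv"))   # should be denied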

Example 2 — PII redaction in a Python preprocessing step

Remove emails and phone numbers before writing to disk or sending to downstream stages.

import re

def redact_pii(text: str) -> str:
    email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
    phone_pattern = r"(\+?\d{1,3}[\s-]?)?(\(?\d{3}\)?[\s-]?)\d{3}[\s-]?\d{4}"
    text = re.sub(email_pattern, "[REDACTED_EMAIL]", text)
    text = re.sub(phone_pattern, "[REDACTED_PHONE]", text)
    return text

records = [
    {"id": 1, "note": "Contact jane.doe@example.com or +1-212-555-1212"},
]

clean = [{**r, "note": redact_pii(r["note"])} for r in records]
print(clean)

Redaction should happen as early as possible. Keep a tested ruleset and unit tests for PII patterns.
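
A minimal pytest sketch for the redact_pii function above; the import path is hypothetical, and the cases are illustrative rather than exhaustive, so extend them with formats you actually see in your data:

import pytest

from preprocessing import redact_pii  # hypothetical module containing the function above

@pytest.mark.parametrize(
    "raw, expected",
    [
        ("reach me at jane.doe@example.com", "reach me at [REDACTED_EMAIL]"),
        ("call +1-212-555-1212 today", "call [REDACTED_PHONE] today"),
        ("(212) 555-1212", "[REDACTED_PHONE]"),
        ("no pii here", "no pii here"),
    ],
)
def test_redact_pii(raw, expected):
    assert redact_pii(raw) == expected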

Example 3 — Store secrets outside code (Kubernetes + env)

Create a Kubernetes Secret and expose it to the container as an environment variable. Do not commit the values to Git.

# k8s secret (values are base64-encoded)
apiVersion: v1
kind: Secret
metadata:
  name: model-api-secrets
  namespace: prod
type: Opaque
data:
  TOKEN: c3VwZXJfc2VjcmV0X3Rva2Vu
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
  namespace: prod
spec:
  replicas: 2
  selector: {matchLabels: {app: inference}}
  template:
    metadata: {labels: {app: inference}}
    spec:
      containers:
        - name: app
          image: ghcr.io/org/inference:1.2.3
          env:
            - name: THIRD_PARTY_TOKEN
              valueFrom:
                secretKeyRef:
                  name: model-api-secrets
                  key: TOKEN

Use a secrets manager and automated rotation. Limit who can read the Secret and audit all access.
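
On the application side, read the token from the environment at startup and fail fast if it is missing; never echo the value. A minimal sketch in which the variable name matches the Deployment above and the rest of the client setup is left out:

import os
import sys

def load_token(name: str = "THIRD_PARTY_TOKEN") -> str:
    """Fetch a secret injected via the Kubernetes Secret; never log its value."""
    token = os.environ.get(name)
    if not token:
        # Fail fast with a message that reveals nothing sensitive
        print(f"missing required secret: {name}", file=sys.stderr)
        sys.exit(1)
    return token

token = load_token()
# pass `token` to your HTTP client or SDK here; do not write it to logs or metrics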

Example 4 — NetworkPolicy to restrict egress

Deny all egress by default and allow only the model registry and metrics endpoint.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inference-egress
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: inference
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              ns: platform
          podSelector:
            matchLabels:
              app: model-registry
      ports:
        - protocol: TCP
          port: 8443
    - to:
        - ipBlock:
            cidr: 10.10.20.0/24
      ports:
        - protocol: TCP
          port: 9090

Note that this policy also blocks DNS, so most workloads need an additional egress rule to the cluster DNS service on port 53. Pair it with DNS allow-lists or private endpoints to avoid accidental data exfiltration.
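
To check the policy from inside the namespace, run a quick connectivity probe from a test pod. A minimal standard-library sketch; the hosts and ports are hypothetical placeholders for your registry, metrics endpoint, and an external address that should be blocked:

import socket

# (host, port, should_connect): adjust to your allowed and denied endpoints
CHECKS = [
    ("model-registry.platform.svc.cluster.local", 8443, True),  # allowed by the policy
    ("10.10.20.15", 9090, True),                                # metrics endpoint, allowed
    ("example.com", 443, False),                                # arbitrary egress, should be blocked
]

for host, port, should_connect in CHECKS:
    try:
        socket.create_connection((host, port), timeout=3).close()
        ok = True
    except OSError:
        ok = False
    status = "OK" if ok == should_connect else "UNEXPECTED"
    print(f"{status}: {host}:{port} connect={ok} expected={should_connect}")

If DNS egress is blocked, name resolution itself fails, and the probe reports that as a failed connection.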

Example 5 — Structured audit logging in Python

Write structured, rotating logs for traceability.

import json
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("audit")
logger.setLevel(logging.INFO)
handler = RotatingFileHandler("/var/log/inference_audit.log", maxBytes=10_000_000, backupCount=5)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

def audit(event_type, **kwargs):
    logger.info(json.dumps({"event": event_type, **kwargs}))

# Example usage
audit("model_loaded", model="churn-v3", sha256="abc123")
audit("predict", user="svc-inference", request_id="r-789", status=200)

Ship logs to an append-only store. Include request IDs, actor, model version, and hashes.
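
To include a model hash in those events, compute it when the artifact is loaded. A minimal sketch that builds on the audit helper above; the model path is hypothetical:

import hashlib

def file_sha256(path: str) -> str:
    """Stream the file so large model artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

model_path = "/models/churn-v3.bin"  # hypothetical artifact location
audit("model_loaded", model="churn-v3", sha256=file_sha256(model_path), path=model_path)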

Example 6 — Verify signed model image in CI

Block deployments if signature verification fails.

# Shell steps in CI: fail the job on any error
set -euo pipefail

# Verify the image signature before anything else runs
if ! cosign verify --key cosign.pub ghcr.io/org/inference:1.2.3; then
  echo "Signature verification failed" >&2
  exit 1
fi

# Continue only after verification: scan the image and fail on findings
trivy image --exit-code 1 ghcr.io/org/inference:1.2.3

Combine signature verification with image scanning to reduce supply chain risk.
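
If your pipeline steps are driven from Python rather than shell, the same gate can be a small wrapper. A minimal sketch assuming cosign and trivy are on the runner's PATH, reusing the image reference from the example:

import subprocess
import sys

IMAGE = "ghcr.io/org/inference:1.2.3"

def passes(cmd: list[str]) -> bool:
    """Run a verification tool and report pass/fail via its exit code."""
    return subprocess.run(cmd).returncode == 0

checks = [
    ["cosign", "verify", "--key", "cosign.pub", IMAGE],
    ["trivy", "image", "--exit-code", "1", IMAGE],
]

for cmd in checks:
    if not passes(cmd):
        print(f"security gate failed: {' '.join(cmd)}", file=sys.stderr)
        sys.exit(1)

print("signature and scan checks passed")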

Drills and quick exercises

  • Create a role with read access to one dataset path and verify it cannot read others.
  • Write a unit test suite for your PII redaction function with 10+ edge cases.
  • Rotate one API token and prove zero downtime during rotation.
  • Enable deny-by-default egress for a test namespace and allowlist only two endpoints.
  • Emit structured audit logs for model load and predict; include request IDs.
  • Sign a container image and enforce verification in CI before deploy.

Common mistakes and debugging tips

  • Overly broad IAM policies: Start with read-only and specific resources; use access advisor or logs to refine.
  • Secrets in env for long periods: Rotate regularly; prefer short-lived credentials.
  • Redaction too late in the pipeline: Redact before writing to disk or emitting logs.
  • Open egress: Deny by default and explicitly allow required hosts/ports.
  • Missing context in logs: Always log who/what/when/version/request-id.
  • Skipping verification on hotfixes: Automate signature checks so they cannot be bypassed.

Debugging tips
  • Permissions: Use cloud access logs to identify the exact denied action and resource.
  • Secrets: Confirm mount paths/keys in pod; test with a minimal pod and env print.
  • Network: Use a test pod to run curl/dig from inside the namespace to verify policies.
  • Logging: Validate the JSON log format with a linter before shipping, as in the sketch below; ensure time is synchronized (NTP).
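
For the logging tip, a quick pre-ship check can parse each audit line and confirm required fields are present. A minimal sketch; the required field set is an assumption to extend with actor, request ID, and model version, and the log path matches the handler configuration above:

import json
import sys

REQUIRED_FIELDS = {"event"}  # assumed minimum; extend per your audit schema

def count_bad_lines(path: str) -> int:
    """Return the number of malformed audit lines; 0 means the file is clean."""
    bad = 0
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                print(f"line {lineno}: not valid JSON", file=sys.stderr)
                bad += 1
                continue
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                print(f"line {lineno}: missing fields {sorted(missing)}", file=sys.stderr)
                bad += 1
    return bad

if __name__ == "__main__":
    sys.exit(1 if count_bad_lines("/var/log/inference_audit.log") else 0)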

Mini project: Secure ML inference service

Goal: Deploy a small inference API with end-to-end controls.

  1. Create a minimal model server (mock predict OK).
  2. Store one external API key in a secrets manager; mount it at runtime.
  3. Configure IAM so the service can only read its model from a single bucket/path.
  4. Apply a NetworkPolicy to allow registry and metrics only.
  5. Emit structured audit logs for startup, model load, and predict.
  6. Sign the image and enforce signature verification in CI.
  7. Document risks (e.g., data exfiltration, PII in logs) and controls; require one approval to deploy.

Acceptance checklist
  • Secrets never appear in code, logs, or images.
  • Denied egress traffic is visible in network logs.
  • Audit logs include model hash and request IDs.
  • CI fails if signature or scan fails.
  • A reviewer can verify IAM and network constraints from manifests/policies.

Subskills

  • Access Control And IAM Basics — Design least-privilege roles for ML pipelines; scope permissions to exact resources; plan rotation.
  • PII Handling And Redaction — Identify PII and redact/mask early; set retention and deletion policies.
  • Secure Secrets Storage — Keep secrets in a vault or secrets manager; automate rotation and limited blast radius.
  • Network Isolation Basics — Private networking, deny-by-default egress, allow-lists, and service identity.
  • Audit Logs And Governance — Structured, append-only logs; reviews, runbooks, and lifecycle policies.
  • Model Risk Management Basics — Document risks/controls, define approval gates, monitor for drift/incidents.
  • Secure Deployment Practices — Sign and verify artifacts, scan images, and use progressive rollouts.

Next steps

  • Automate policy checks in CI to block insecure changes.
  • Add runtime security (e.g., minimal base images, read-only root FS).
  • Expand monitoring to include data drift, security events, and anomaly alerts.

Security And Compliance For ML — Skill Exam

This exam checks your practical understanding of Security and Compliance for ML. No time limit. You can retake it anytime. Everyone can take the exam; if you are logged in, your progress and best score are saved. Score 70% or higher to pass.

12 questions, 70% to pass
