Why this matters
For a Machine Learning Engineer, broken workflows, misconfigured Dockerfiles, and subtle Python issues waste GPU time and delay releases. Pipeline linting and static checks catch these problems before they reach runners or clusters, keeping builds fast, reliable, and secure.
- Prevent failed runs due to YAML typos or bad conditionals.
- Enforce consistent Python style, imports, and types in Airflow/Kubeflow/TFX code.
- Harden Docker images and avoid insecure code patterns.
- Gate pull requests so only healthy pipelines reach main.
Real tasks you will do
- Add pre-commit hooks for ruff, black, isort, mypy, bandit, yamllint, actionlint, hadolint.
- Set CI jobs to fail on linter errors and on any diff an auto-formatter would produce.
- Validate Kubeflow/Airflow DAG structure and Python types.
- Block secrets and unsafe shell in code and workflows.
Concept explained simply
Linting and static checks are automated reviews of your code and config without running the pipeline. They flag style problems, errors, and risky patterns early.
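To make "without running the pipeline" concrete, here is a minimal sketch of a static check built on Python's standard ast module. It parses source text and reports calls that pass shell=True, roughly the kind of pattern bandit looks for. The function name find_shell_true and the SOURCE sample are illustrative assumptions, not part of any linter's API.

```python
import ast

# Sample source to inspect; it is parsed, never executed.
SOURCE = '''
import subprocess

def run_step(cmd):
    return subprocess.check_output(cmd, shell=True)
'''


def find_shell_true(source: str) -> list[int]:
    """Return line numbers of calls that pass shell=True, without running the code."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if (
                keyword.arg == "shell"
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is True
            ):
                hits.append(node.lineno)
    return sorted(hits)


print(find_shell_true(SOURCE))  # [5]
```

Real linters apply many such rules over the same parsed tree, which is why they can run in seconds on code that would take hours to execute.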
Mental model
Think of your repo as a factory. Linters are the gatekeepers at each door:
- Door 1 (YAML): yamllint/actionlint ensure your CI plan is valid and safe.
- Door 2 (Python): ruff/black/isort/mypy/pylint keep DAGs and pipeline code clean and correct.
- Door 3 (Security): bandit/secret scanners warn about unsafe practices.
- Door 4 (Containers): hadolint keeps Docker images reproducible and small.
Pass all doors locally (pre-commit) and again in CI so nothing slips through.
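The door model can be sketched as a lookup from file path to the linters that guard it. This is only an illustration of the idea: the DOORS table and the linters_for helper are hypothetical, and in practice pre-commit does this routing for you through each hook's file filters.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping of file patterns to the "doors" described above.
DOORS = [
    (".github/workflows/*.yml", ["yamllint", "actionlint"]),
    ("*.yml", ["yamllint"]),
    ("*.yaml", ["yamllint"]),
    ("*.py", ["ruff", "black", "isort", "mypy", "bandit"]),
    ("*Dockerfile*", ["hadolint"]),
]


def linters_for(path: str) -> list[str]:
    """Return the linters that should inspect a given file, in door order."""
    selected: list[str] = []
    for pattern, tools in DOORS:
        if fnmatchcase(path, pattern):
            for tool in tools:
                if tool not in selected:
                    selected.append(tool)
    return selected


print(linters_for(".github/workflows/ci.yml"))  # ['yamllint', 'actionlint']
print(linters_for("pipeline.py"))
print(linters_for("Dockerfile"))  # ['hadolint']
```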
Common tools you can use
- Python: ruff, black, isort, mypy, pylint, bandit, nbQA (for notebooks)
- YAML/CI: yamllint, actionlint (GitHub Actions), kubeval/kubeconform (Kubernetes YAML)
- Containers: hadolint (Dockerfile)
- General: pre-commit (runs linters locally and in CI), git-secrets or similar secret scanners
Worked examples
Example 1: GitHub Actions workflow linting (yamllint + actionlint)
Problem: A workflow fails to start because of indentation and invalid if syntax.
# .github/workflows/ci.yml (broken)
name: CI
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
if: ${{ github.event_name == 'pull_request' && startsWith(github.head_ref, 'feat/') }}
steps:
- uses: actions/checkout@v4
- run: echo "Testing"
What happens:
- yamllint flags indentation (jobs.test, steps).
- actionlint flags the if expression syntax/contexts.
Fix:
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    if: ${{ github.event_name == 'pull_request' && startsWith(github.head_ref, 'feat/') }}
    steps:
      - uses: actions/checkout@v4
      - run: echo "Testing"
Example 2: Airflow/Kubeflow Python linting (ruff + mypy + bandit)
Problem: Unused imports, dynamic types, and unsafe shell usage.
# pipeline.py (before)
import os, sys
from datetime import datetime
import subprocess
START = datetime.now()
def run_step(cmd):
    return subprocess.check_output(cmd, shell=True)
result = run_step(["python", "train.py"]) # type is unknown
- ruff: unused imports (os, sys), import order, formatting.
- mypy: with disallow_untyped_defs (or --strict) it flags the unannotated run_step; without annotations the type of result is silently treated as Any.
- bandit: shell=True is unsafe.
Fix:
from __future__ import annotations
import subprocess
from datetime import datetime
START = datetime.now()
def run_step(cmd: list[str]) -> bytes:
    # Avoid shell=True; pass the command as a list of arguments
    return subprocess.check_output(cmd)
result: bytes = run_step(["python", "train.py"])
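The fix matters for correctness as well as security: on POSIX, combining shell=True with a list passes only the first item to the shell as the command, so train.py would never run. The sketch below exercises the list form end to end. It uses sys.executable as a stand-in for "python" so it runs anywhere, and the --epochs flag is a hypothetical argument used only to show shlex.split.

```python
import shlex
import subprocess
import sys


def run_step(cmd: list[str]) -> bytes:
    """Run a command given as an argument list; no shell is involved."""
    return subprocess.check_output(cmd)


# sys.executable stands in for "python" so the example is portable.
output = run_step([sys.executable, "-c", "print('ok')"])
print(output.strip())  # b'ok'

# If a command arrives as one string, split it instead of enabling the shell.
args = shlex.split("python train.py --epochs 3")
print(args)  # ['python', 'train.py', '--epochs', '3']
```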
Example 3: Dockerfile hardening (hadolint)
Problem: Large, non-reproducible image.
# Dockerfile (before)
FROM python:3.11
RUN pip install -U pip
RUN pip install numpy pandas scikit-learn
COPY . /app
WORKDIR /app
CMD python train.py
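- hadolint: pin pip package versions (DL3013), pass --no-cache-dir (DL3042), combine consecutive RUN instructions (DL3059), and use the JSON-array form of CMD (DL3025). Switching to a slim base image and a non-root user is good practice that hadolint does not enforce by default.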
Fix:
FROM python:3.11-slim
ENV PIP_NO_CACHE_DIR=1
RUN adduser --disabled-password --gecos '' appuser \
    && pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir numpy==1.26.4 pandas==2.1.4 scikit-learn==1.3.2
WORKDIR /app
COPY . /app
USER appuser
CMD ["python", "train.py"]
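To show what a pinning rule looks for, here is a toy approximation of that check in Python. It is a sketch under simplifying assumptions, not how hadolint is implemented: the unpinned_packages helper is hypothetical, it only understands plain "pip install" commands, and it would misreport options that take a value (such as -r requirements.txt).

```python
import re

# Matches a requirement with an exact pin, e.g. numpy==1.26.4 or pkg[extra]==1.0.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==\S+$")


def unpinned_packages(run_line: str) -> list[str]:
    """Return pip packages in a Dockerfile RUN line that lack an exact '==' pin."""
    unpinned = []
    # Drop line-continuation backslashes, then look at each chained command.
    for command in run_line.replace("\\", " ").split("&&"):
        tokens = command.split()
        if "pip" not in tokens or "install" not in tokens:
            continue
        # Everything after "install" that is not an option is treated as a package.
        for token in tokens[tokens.index("install") + 1:]:
            if token.startswith("-"):
                continue
            if not PINNED.match(token):
                unpinned.append(token)
    return unpinned


print(unpinned_packages("RUN pip install numpy pandas scikit-learn"))
# ['numpy', 'pandas', 'scikit-learn']
print(unpinned_packages("RUN pip install --no-cache-dir numpy==1.26.4 pandas==2.1.4"))
# []
```

Note that an unpinned "pip install --upgrade pip" is reported as well, so decide whether to pin pip itself or accept that warning explicitly.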
Setup in minutes (pre-commit + CI)
Add a pre-commit configuration at the repo root:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      # Lint with autofix first; black (below) is the single formatter
      - id: ruff
        args: ["--fix"]
  - repo: https://github.com/psf/black
    rev: 24.10.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
        # Keep isort's output compatible with black
        args: ["--profile", "black"]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy
        additional_dependencies: ["types-requests"]
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.8
    hooks:
      - id: bandit
        args: ["-ll", "-r", "."]
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint
  - repo: https://github.com/rhysd/actionlint
    rev: v1.7.1
    hooks:
      - id: actionlint
  - repo: https://github.com/hadolint/hadolint
    rev: v2.12.0
    hooks:
      - id: hadolint
Initialize locally:
pip install pre-commit
pre-commit install
pre-commit run --all-files
Tip: run in CI
Add a job that installs and runs pre-commit on all files. Fail the build if any hook fails. This ensures pull requests are gated by the same checks as on your laptop.
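The gating logic comes down to propagating an exit code. The sketch below shows that idea with a small hypothetical helper, run_gate. In a real CI step you would pass it PRE_COMMIT_CMD and hand the result to sys.exit; here two stand-in commands are used so the example runs without pre-commit installed.

```python
import subprocess
import sys

# The usual CI invocation; --show-diff-on-failure prints what the hooks changed.
PRE_COMMIT_CMD = ["pre-commit", "run", "--all-files", "--show-diff-on-failure"]


def run_gate(cmd: list[str]) -> int:
    """Run a check command and return its exit code (0 means every check passed)."""
    completed = subprocess.run(cmd, check=False)
    return completed.returncode


# Stand-in commands that simulate a passing and a failing hook run.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(run_gate(passing), run_gate(failing))  # 0 1

# In CI the script would end with: sys.exit(run_gate(PRE_COMMIT_CMD))
# A non-zero exit code fails the job and therefore blocks the pull request.
```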
Exercises
Do these hands-on tasks, then use the checklist that follows to check your work.
Exercise 1 — Wire up pre-commit linters for an ML pipeline repo
Create a .pre-commit-config.yaml with Python, YAML, CI, and Docker linters; run them; fix issues until all pass.
What to include
- ruff (with --fix), black, isort
- mypy (with at least one typed function)
- bandit (security)
- yamllint + actionlint (CI YAML)
- hadolint (Dockerfile)
Exercise 2 — Fix a broken workflow and unsafe Python step
Given the snippets below, list the linters that would fail and fix them.
Snippets
# .github/workflows/build.yml (broken)
name: Build
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo ${{ secrets.MY_KEY }}
# step.py (broken)
import os
import subprocess
def run(cmd):
    return subprocess.check_output(cmd, shell=True)
run(["python", "script.py"])
- Which rules fire (by tool)?
- Provide corrected files.
Checklist
- [ ] All hooks added to .pre-commit-config.yaml
- [ ] pre-commit run --all-files returns no failures
- [ ] mypy runs on at least one module and enforces types
- [ ] actionlint passes for all workflows
- [ ] hadolint passes for Dockerfile
- [ ] bandit reports no high-severity issues
Common mistakes and self-check
- Only running linters in CI: install pre-commit locally so issues are fixed before commits.
- Letting formatters and linters disagree: run ruff --fix before the formatters (black/isort), because autofixes can leave code that needs reformatting; use a single formatter and set isort's profile to black.
- Ignoring type errors: add minimal type hints to key functions (data loaders, preprocessors, trainers) and adopt mypy incrementally, one module at a time.
- Using shell=True carelessly: prefer a list of arguments; if a shell is truly required, validate and quote every input.
- Unpinned dependencies in Dockerfile: pin versions to keep builds reproducible.
- Echoing secrets in workflows: never print secrets; pass them only as inputs or environment variables to the steps that need them.
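The last mistake in the list is easy to check for mechanically. As a rough sketch, the hypothetical secret_echo_lines helper below scans workflow text for single-line run steps that interpolate a secret directly into a shell command. It is a simple regex heuristic, not a replacement for a real secret scanner, and ./deploy.sh is a made-up script name.

```python
import re

# Flags run steps that interpolate a secret directly into a shell command.
SECRET_IN_RUN = re.compile(r"^\s*-?\s*run:.*\$\{\{\s*secrets\.[A-Za-z0-9_]+\s*\}\}")

WORKFLOW = """\
steps:
  - uses: actions/checkout@v4
  - run: echo ${{ secrets.MY_KEY }}
  - run: ./deploy.sh
    env:
      API_KEY: ${{ secrets.MY_KEY }}
"""


def secret_echo_lines(workflow_text: str) -> list[int]:
    """Return 1-based line numbers of single-line run steps that reference secrets."""
    return [
        number
        for number, line in enumerate(workflow_text.splitlines(), start=1)
        if SECRET_IN_RUN.search(line)
    ]


print(secret_echo_lines(WORKFLOW))  # [3]
```

Only the echo step is flagged; the second step passes the secret through env, which keeps it out of the command line.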
Self-check routine (5 minutes)
- Run pre-commit on all files.
- Scan the diff to ensure formatters changed only whitespace/style.
- Fix the first error class across the repo; repeat.
- Re-run until green, then push and verify CI matches local results.
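To decide which error class to fix first, it can help to tally linter output by rule code. The sketch below assumes output lines shaped like "path:line:col: CODE message", which is a common format for Python linters. The sample lines and the count_by_rule helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical linter output in the "path:line:col: CODE message" shape.
SAMPLE_OUTPUT = """\
pipeline.py:1:8: F401 `os` imported but unused
pipeline.py:1:12: F401 `sys` imported but unused
dags/train.py:3:1: I001 Import block is un-sorted or un-formatted
steps/load.py:2:8: F401 `json` imported but unused
"""

RULE = re.compile(r"^[^:]+:\d+:\d+: ([A-Z]+\d+) ")


def count_by_rule(output: str) -> list[tuple[str, int]]:
    """Return (rule code, count) pairs, most frequent first."""
    codes = []
    for line in output.splitlines():
        match = RULE.match(line)
        if match:
            codes.append(match.group(1))
    return Counter(codes).most_common()


print(count_by_rule(SAMPLE_OUTPUT))  # [('F401', 3), ('I001', 1)]
```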
Who this is for
- Machine Learning Engineers owning training/inference pipelines.
- Data Scientists contributing code to productionized workflows.
- MLOps engineers maintaining CI/CD and cluster deployments.
Prerequisites
- Basic Git and pull request workflow.
- Python project structure familiarity.
- Comfort with YAML and Dockerfiles.
Learning path
- Set up pre-commit and run ruff/black/isort locally.
- Add mypy and type the top-level pipeline functions.
- Enable yamllint and actionlint for workflows.
- Harden Dockerfile with hadolint; pin versions.
- Add bandit and a secret-scanning hook.
- Run the same pre-commit in CI on pull requests.
Practical projects
- Project 1: Retrofit an existing ML repo with full linting, making CI fail on violations.
- Project 2: Build a minimal Kubeflow or Airflow pipeline template with typed components and enforced style.
- Project 3: Create a secure training Docker image with pinned versions, non-root user, and passing hadolint.
Next steps
- Expand rules over time (stricter ruff and mypy settings).
- Introduce notebook checks via nbQA for tutorial notebooks.
- Add Kubernetes manifest validation if deploying jobs to a cluster.
Mini challenge
Take a small repo with one workflow, one Dockerfile, and a simple pipeline.py. In under 30 minutes, integrate pre-commit hooks, make them pass locally, and add a CI job that runs pre-commit on all files. Aim for zero warnings.
Hint
Start with formatters (black/isort), then ruff --fix, then mypy on one module, then YAML and Docker linters, and finally bandit.