
Parameterized Runs

Learn Parameterized Runs for free with explanations, exercises, and a quick test (for MLOps Engineers).

Published: January 4, 2026 | Updated: January 4, 2026

Who this is for

  • MLOps Engineers who deploy the same pipeline across datasets, environments, or model versions.
  • Data Scientists who want reproducible experiments with different hyperparameters.
  • Data/Platform Engineers standardizing pipelines across teams.

Prerequisites

  • Basic Python and familiarity with at least one workflow engine (e.g., Airflow, Prefect, Dagster, Kubeflow).
  • Comfort with YAML/JSON for configs.
  • Understanding of environment separation (dev/stage/prod) and basic secrets handling.

Why this matters

In real MLOps work, you rarely write a brand-new pipeline for each case. Instead, you reuse one pipeline and pass parameters to change inputs, dates, hyperparameters, environments, or output locations. Parameterized runs:

  • Eliminate copy-paste pipelines and reduce maintenance.
  • Enable safe promotion from dev to prod by flipping parameters (e.g., connections, buckets, feature toggles).
  • Power backfills, A/B evaluations, and hyperparameter sweeps.
  • Improve reproducibility: every run records the exact parameters that produced the result.

Concept explained simply

A parameterized run is a single pipeline that behaves differently based on inputs you pass at runtime, like dataset=orders, date=2025-12-01, or n_estimators=200. The code stays the same; the behavior changes.
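
As a minimal illustration in plain Python (no particular engine assumed; the run_pipeline name and paths are illustrative), the same function drives very different runs purely through its arguments:

def run_pipeline(dataset: str, as_of: str, n_estimators: int = 100) -> str:
    # Same code path every time; only the inputs change the behavior.
    print(f"Training on {dataset} as of {as_of} with {n_estimators} trees")
    return f"s3://ml/artifacts/dataset={dataset}/date={as_of}/"

# Two runs of the same pipeline, parameterized differently:
run_pipeline(dataset="orders", as_of="2025-12-01", n_estimators=200)
run_pipeline(dataset="customers", as_of="2025-12-02")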

Mental model

  • Inputs: parameters come from CLI, UI, API, schedules, or files.
  • Defaults: reasonable defaults make ad-hoc runs easy.
  • Validation: reject bad parameters early (types, ranges, enums).
  • Idempotency: same parameters → same outputs. Include parameters in artifact paths and run IDs.
  • Lineage: log parameters with the run so results are traceable.
  • Security: secrets are not regular parameters—use secret stores/engine-native secret management.

Core patterns you should know

Pattern 1 — Typed schema and validation

Define expected types, enums, and ranges. Fail fast with meaningful errors if parameters are invalid.
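
A minimal sketch of this pattern using only the standard library (the RunParams name and the specific rules are illustrative, not tied to any engine):

from dataclasses import dataclass
from datetime import date

ALLOWED_DATASETS = {"orders", "customers"}

@dataclass(frozen=True)
class RunParams:
    dataset: str
    as_of: str
    model_version: str = "v1"

    def __post_init__(self):
        # Fail fast with actionable messages before any work starts.
        if self.dataset not in ALLOWED_DATASETS:
            raise ValueError(f"dataset must be one of {sorted(ALLOWED_DATASETS)}, got {self.dataset!r}")
        date.fromisoformat(self.as_of)  # raises ValueError if not YYYY-MM-DD
        if not self.model_version:
            raise ValueError("model_version must not be empty")

params = RunParams(dataset="orders", as_of="2025-12-01")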

Pattern 2 — Idempotent run IDs and artifact names

Derive a run_id from a stable hash of parameters. Include parameters in output paths like s3://bucket/model={model}/date={date}/.
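
A sketch of deriving a stable run_id and artifact path from the parameters (the hashing and path layout mirror the examples below; the bucket name is a placeholder):

import hashlib
import json

def run_id_for(params: dict) -> str:
    # json.dumps with sort_keys gives a canonical string, so the same
    # parameters always hash to the same run_id.
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha1(canonical.encode()).hexdigest()[:10]

params = {"dataset": "orders", "as_of": "2025-12-01", "model_version": "v1"}
rid = run_id_for(params)
artifact_path = (
    f"s3://ml/artifacts/{params['model_version']}"
    f"/dataset={params['dataset']}/date={params['as_of']}/run={rid}/"
)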

Pattern 3 — Fan-out (matrix) runs

Generate multiple child runs for a grid of parameters (e.g., hyperparameter sweep). Keep per-child run IDs and aggregate results at the end.
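
A plain-Python sketch of generating the child parameter sets for a grid; in practice each dict would be submitted as its own child run via your engine's API (the trigger call is engine-specific and omitted here):

from itertools import product

grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [4, 8],
}

# One child run per combination, each carrying its own parameters and run key.
child_runs = [dict(zip(grid.keys(), values)) for values in product(*grid.values())]
for child in child_runs:
    child_key = "-".join(f"{k}={v}" for k, v in sorted(child.items()))
    print(f"would trigger child run {child_key} with params {child}")
# Aggregate results once all child runs finish (e.g., build a leaderboard artifact).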

Pattern 4 — Parameter sources
  • CLI flags (local dev), UI forms (ad-hoc runs), APIs (automation), schedules (recurring runs with default parameters), and config files.
  • Never hardcode environment-specific values in code; pass them as parameters or resolve them via environment configs. A precedence sketch follows this list.
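
A minimal sketch of merging these sources with an explicit precedence (CLI flag over config file over defaults); argparse and the JSON config file are illustrative choices, not tied to any engine:

import argparse
import json
from pathlib import Path

DEFAULTS = {"dataset": "orders", "as_of": "2025-12-01", "model_version": "v1"}

def resolve_params(cli_args=None) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset")
    parser.add_argument("--as-of", dest="as_of")
    parser.add_argument("--model-version", dest="model_version")
    parser.add_argument("--config", help="optional JSON config file")
    args = parser.parse_args(cli_args)

    # Precedence: CLI flag > config file > hardcoded default.
    params = dict(DEFAULTS)
    if args.config:
        params.update(json.loads(Path(args.config).read_text()))
    params.update({k: v for k, v in vars(args).items() if k != "config" and v is not None})
    return params

print(resolve_params(["--dataset", "customers"]))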

Worked examples

Example 1 — Airflow DAG with params and dag_run.conf

from airflow.decorators import dag, task
from airflow.models.param import Param
from datetime import datetime
import hashlib

@dag(
    schedule=None,
    start_date=datetime(2024,1,1),
    catchup=False,
    params={
        "dataset": Param("orders", enum=["orders", "customers"]),
        "as_of": Param("2025-12-01", type="string"),
        "model_version": Param("v1", type="string")
    }
)
def train_pipeline():
    @task
    def resolve_params(**context):
        # Overrides from dag_run.conf win; otherwise fall back to the Param defaults on the DAG.
        conf = (context["dag_run"].conf or {})
        p = {k: conf.get(k, context["params"][k]) for k in ["dataset", "as_of", "model_version"]}
        # Short, stable hash of the sorted parameters gives an idempotent run_id.
        rid = hashlib.sha1(str(sorted(p.items())).encode()).hexdigest()[:10]
        return {**p, "run_id": rid}

    @task
    def load_data(p):
        print(f"Loading {p['dataset']} as_of={p['as_of']}")
        return {"rows": 123}

    @task
    def train(p, data):
        path = f"s3://ml/artifacts/{p['model_version']}/dataset={p['dataset']}/date={p['as_of']}/run={p['run_id']}" 
        print(f"Training and saving to {path}")
        return {"model_path": path}

    p = resolve_params()
    d = load_data(p)
    _ = train(p, d)

pipeline = train_pipeline()

Trigger via UI or API; pass overrides in dag_run.conf to reuse the same DAG.
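
For example, assuming the Airflow stable REST API is enabled and reachable with basic auth (the URL and credentials below are placeholders), an automated trigger with conf overrides might look like:

import requests

# Placeholders: point this at your Airflow webserver and use your real auth mechanism.
resp = requests.post(
    "http://localhost:8080/api/v1/dags/train_pipeline/dagRuns",
    auth=("admin", "admin"),
    json={"conf": {"dataset": "customers", "as_of": "2025-12-02", "model_version": "v2"}},
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])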

Example 2 — Prefect flow with deployment parameters

from prefect import flow, task
from datetime import date
import hashlib

@task
def load(dataset: str, as_of: str):
    print(f"Load {dataset} as_of={as_of}")
    return [1,2,3]

@task
def train(model_version: str, data):
    print(f"Train {model_version} on {len(data)} rows")

@flow
def ml_pipeline(dataset: str = "orders", as_of: str | None = None, model_version: str = "v1"):
    as_of = as_of or date.today().isoformat()
    rid = hashlib.sha1(f"{dataset}-{as_of}-{model_version}".encode()).hexdigest()[:8]
    data = load(dataset, as_of)
    train(model_version, data)
    print(f"run_id={rid}")

if __name__ == "__main__":
    ml_pipeline(dataset="customers", as_of="2025-12-01", model_version="v2")

In a Prefect Deployment, set default parameters per environment; override at run time via UI/API.
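
As a hedged sketch (Prefect 2.x-style APIs; the deployment name and parameter values are illustrative), you might serve the flow with staging defaults and override them per run:

# Run this instead of calling the flow directly; .serve() registers a deployment
# with staging defaults and blocks while listening for scheduled/triggered runs.
ml_pipeline.serve(
    name="ml-pipeline-staging",
    parameters={"dataset": "orders", "model_version": "v1"},
)

# From another process, trigger an ad-hoc run of that deployment with overrides:
# from prefect.deployments import run_deployment
# run_deployment(
#     name="ml-pipeline/ml-pipeline-staging",
#     parameters={"dataset": "customers", "as_of": "2025-12-01", "model_version": "v2"},
# )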

Example 3 — Dagster job with config and partitioned runs

from dagster import Config, job, op

class TrainConfig(Config):
    dataset: str = "orders"
    as_of: str = "2025-12-01"
    model_version: str = "v1"

@op
def load_op(config: TrainConfig):
    print(f"Load {config.dataset} @ {config.as_of}")
    return {"rows": 111}

@op
def train_op(config: TrainConfig, data):
    print(f"Train {config.model_version} on {data['rows']} rows")

@job
def train_job():
    d = load_op()
    train_op(d)

# Provide run config at execution time; each op gets its own config block:
# run_config = {
#     "ops": {
#         "load_op": {"config": {"dataset": "customers", "as_of": "2025-12-02", "model_version": "v2"}},
#         "train_op": {"config": {"dataset": "customers", "as_of": "2025-12-02", "model_version": "v2"}},
#     }
# }
# train_job.execute_in_process(run_config=run_config)  # or pass the same run_config when launching from the UI/CLI

Use partitioned configs for backfills (e.g., per-date partitions) to fan out runs automatically.
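
A hedged sketch of that idea, reusing load_op and train_op from Example 3 and assuming Dagster's daily_partitioned_config helper (mapping each partition's start date onto as_of is an illustrative choice):

from datetime import datetime
from dagster import daily_partitioned_config, job

@daily_partitioned_config(start_date=datetime(2025, 12, 1))
def daily_train_config(start: datetime, _end: datetime):
    # One config per date partition; as_of is derived from the partition window.
    cfg = {"dataset": "orders", "as_of": start.strftime("%Y-%m-%d"), "model_version": "v1"}
    return {"ops": {"load_op": {"config": cfg}, "train_op": {"config": cfg}}}

@job(config=daily_train_config)
def partitioned_train_job():
    train_op(load_op())

# Launching a backfill over a date range then creates one run per partition,
# each with its own as_of baked into its run config.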

Set up your first parameterized run (step-by-step)

  1. List your variable inputs: data source, date, environment, hyperparameters, output paths.
  2. Define defaults and a validation schema (types, ranges, enums).
  3. Propagate parameters to every task that needs them—avoid global state.
  4. Derive an idempotent run_id from parameters; include it in artifact paths.
  5. Log parameters in your run metadata and in model cards/metrics (a helper covering steps 4 and 5 is sketched after this list).
  6. Test with both defaults and overrides (CLI/UI/API).
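
A minimal stdlib-only sketch of such a helper for steps 4 and 5 (the start_run name and the params.json sidecar convention are illustrative, not from any particular engine):

import hashlib
import json
from pathlib import Path

def start_run(params: dict, artifact_root: str) -> dict:
    # Step 4: idempotent run_id derived from the canonical parameter string.
    rid = hashlib.sha1(json.dumps(params, sort_keys=True).encode()).hexdigest()[:10]
    run_dir = Path(artifact_root) / f"run={rid}"
    run_dir.mkdir(parents=True, exist_ok=True)
    # Step 5: persist the exact parameter set next to the artifacts for lineage.
    (run_dir / "params.json").write_text(json.dumps({**params, "run_id": rid}, indent=2))
    return {**params, "run_id": rid, "run_dir": str(run_dir)}

print(start_run({"dataset": "orders", "as_of": "2025-12-01"}, artifact_root="/tmp/ml-artifacts"))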

Common mistakes and how to self-check

  • Missing validation: Add clear errors when values are out of range or wrong type.
  • Implicit globals: Ensure tasks read from passed parameters, not module-level variables.
  • Non-idempotent outputs: Include parameters in output paths; rerunning should not overwrite unrelated artifacts.
  • Leaking secrets: Pass secret references via your engine's secret store; never as plain-text parameters.
  • Poor lineage: Always log the full parameter set with the run results.

Self-check: Given the same parameters, do you get the same outputs and paths? Can you reconstruct which parameters produced a model from your metadata alone?

Hands-on exercises

These mirror the practice exercises at the end of this page; do them in your preferred orchestration engine.

Exercise 1: Single-run parameterization

  • Create a pipeline with parameters: dataset (orders/customers), as_of (YYYY-MM-DD), model_version (v1/v2).
  • Print all parameters and create a run_id from them (e.g., short hash).
  • Write outputs to a path that includes model_version, dataset, as_of, and run_id.

Exercise 2: Validation and idempotency

  • Add validation: dataset must be one of [orders, customers], as_of must parse as a date, model_version not empty.
  • Fail fast with a clear message if invalid.
  • Prove idempotency: running the same parameters twice yields the same run_id and same artifact path.

Checklist

  • Parameters have defaults and validation.
  • Artifacts include parameters and run_id in their paths.
  • Runs log full parameter sets in metadata.
  • No secrets are passed as plain parameters.

Practical projects

  • Backfill project: Fan-out runs for the last 14 days using a date parameter; aggregate success metrics at the end.
  • Hyperparameter sweep: Run grid search over two parameters and produce a leaderboard artifact.
  • Blue/green deploy: Use an environment parameter to write to separate buckets and compare model KPIs.

Learning path

  1. Define a simple pipeline with hardcoded values.
  2. Introduce parameters with defaults and validation.
  3. Add idempotent run IDs and lineage logging.
  4. Implement fan-out (matrix) runs and result aggregation.
  5. Parameterize environment-specific configs (dev/stage/prod).

Next steps

  • Integrate parameters with your experiment tracker to log configs alongside metrics.
  • Create a library module to standardize parameter parsing, validation, and run_id generation across pipelines.
  • Automate backfills via schedules that set default parameter windows.

Mini challenge

Design a parameter set to safely promote a model from staging to production without code changes. Include parameters for input source, output path, model_version, and any feature flag needed. Describe how you would validate, log, and derive run_id.


Practice Exercises

2 exercises to complete

Instructions

Create a pipeline with parameters: dataset in [orders, customers], as_of (YYYY-MM-DD), model_version (string). Print the parameters, compute a short hash run_id from them, and save a dummy artifact to a path like .../model_version=.../dataset=.../as_of=.../run=....

  • Provide sensible defaults.
  • Allow overriding via your engine's UI/CLI/API.

Expected Output
A successful run that logs parameters, prints a stable run_id for the same inputs, and writes an artifact to a parameterized path.

Parameterized Runs — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

