
Job Scheduling Basics

Learn Job Scheduling Basics for free with explanations, exercises, and a quick test (for ETL Developers).

Published: January 11, 2026 | Updated: January 11, 2026

Who this is for

  • ETL Developers and Data Engineers who need reliable, predictable pipelines.
  • Analysts transitioning to building scheduled data jobs.
  • Anyone working with cron, orchestrators (e.g., Airflow, Prefect), or cloud schedulers.

Prerequisites

  • Basic command-line familiarity.
  • Understanding of ETL/ELT steps and batch vs. streaming.
  • Comfort with reading logs and simple error messages.

Why this matters

In real teams, data must arrive on time. Finance needs daily revenue by 7:00, marketing wants hourly user events, and machine learning models retrain weekly during low-traffic hours. Scheduling turns your code into dependable operations: it defines when jobs run, what runs first, how failures are retried, and how overlaps are avoided.

  • You’ll coordinate dependencies (e.g., load sales only after raw files arrive).
  • You’ll handle time zones and daylight saving changes without surprises.
  • You’ll control retries, timeouts, concurrency, and alerts to keep SLAs green.

Concept explained simply

A scheduler decides when a job should start. An orchestrator coordinates many jobs with dependencies and rules. Think of the scheduler as the clock and the orchestrator as the conductor.

Mental model

Imagine a train schedule:

  • Time tables = cron expressions or intervals (when).
  • Switches = dependencies (only go if the previous train arrived).
  • Signals = concurrency limits (prevent two trains on the same track).
  • Delays and re-routing = retries and backoff.
  • Control room = monitoring, SLAs, and alerts.

Core concepts and terminology

  • Time-based scheduling: cron expressions with five fields (minute hour day-of-month month day-of-week). Example: 15 2 * * * means 02:15 every day (see the crontab sketch after this list).
  • Intervals: run every N minutes/hours (e.g., every 15 minutes).
  • Time zones & DST: prefer UTC for schedules; convert timestamps at the edges. Daylight saving can cause skips or double-runs.
  • Dependencies: ensure upstream jobs or data partitions exist before running. DAGs represent these relationships.
  • Retries & backoff: automatic re-attempts with delays (e.g., 3 retries, exponential backoff).
  • Concurrency & locking: limit parallel runs (e.g., only 1 active run) to avoid overlaps.
  • Timeouts & SLAs: fail or alert if a job exceeds a runtime limit; track if data is late.
  • Idempotency: running the same task twice yields the same final state (safe retries and backfills).
  • Calendars & holidays: pause/skip on defined calendars (e.g., business days only).
  • Event-driven triggers: start when an event happens (file arrival, message), often combined with time-based safety windows.
  • Monitoring & alerting: logs, metrics, and notifications on failure/latency.
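
As a quick reference for the cron notation above, here are a few crontab-style expressions and their meanings (the comments are explanatory, not part of the syntax):

```
# minute hour day-of-month month day-of-week
15 2 * * *      # 02:15 every day
*/15 * * * *    # every 15 minutes
10 * * * *      # minute 10 of every hour
0 4 * * 0       # 04:00 every Sunday (0 = Sunday)
5 6 * * 1-5     # 06:05 Monday through Friday
```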

Worked examples

Example 1: Nightly sales ETL with file arrival

  • Goal: Load sales daily at 02:15 after the vendor file arrives (typically around 01:30).
  • Schedule: 15 2 * * * (use UTC if possible, e.g., 02:15 UTC).
  • Dependency: A file sensor waits for the vendor file, checking every 5 minutes and giving up (with an alert) after 1 hour.
  • Timeout: Total job timeout 90 minutes.
  • Alert: Page if not finished by 04:00 (SLA breach).
Why it works

Fixed time + sensor ensures you don’t load partial data. SLA gives a clear “late” signal.
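
If you express Example 1 in an orchestrator such as Airflow, the settings map roughly to the sketch below. This is a minimal Airflow 2.x-style sketch, not a drop-in implementation: the DAG id, file path pattern, and load_sales body are illustrative assumptions.

```python
# A minimal Airflow-style sketch of Example 1 (assuming a recent Airflow 2.x).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor


def load_sales(**context):
    """Placeholder for the actual sales load (assumed helper)."""


with DAG(
    dag_id="nightly_sales_etl",
    schedule="15 2 * * *",                      # 02:15 UTC daily
    start_date=datetime(2026, 1, 1),
    catchup=False,
    max_active_runs=1,                          # no overlapping runs
    dagrun_timeout=timedelta(minutes=90),       # total job timeout
) as dag:
    wait_for_file = FileSensor(
        task_id="wait_for_vendor_file",
        filepath="/data/inbound/sales_{{ ds }}.csv",  # assumed path pattern
        poke_interval=300,                      # check every 5 minutes
        timeout=60 * 60,                        # give up after 1 hour of waiting
        mode="reschedule",                      # free the worker slot between checks
    )

    load = PythonOperator(
        task_id="load_sales",
        python_callable=load_sales,
        sla=timedelta(hours=1, minutes=45),     # 02:15 + 1h45m = "late" after 04:00
    )

    wait_for_file >> load
```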

Example 2: Hourly incremental pipeline with backfill safety

  • Goal: Ingest user events hourly at minute 10, processing the last closed hour.
  • Schedule: 10 * * * *.
  • Windowing: Each run reads [H-1, H) based on the run’s logical time, not the current system time.
  • Concurrency: Limit to 1 active run to avoid overlapping windows.
  • Retries: 5 attempts, exponential backoff starting at 2 minutes.
  • Idempotency: Upsert into target partition for hour H-1; reruns are safe.
Why it works

Aligning processing to closed time windows prevents double-counting and enables safe retries/backfills.
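
Here is a plain-Python sketch of the windowing and idempotency ideas, independent of any particular orchestrator; read_events and upsert_partition are assumed placeholders for your source read and partition-level merge/replace write.

```python
# Process the last closed hour [H-1, H) based on the run's logical time,
# not the wall clock. read_events and upsert_partition are assumed placeholders.
from datetime import datetime, timedelta, timezone


def closed_hour_window(logical_run_time: datetime) -> tuple[datetime, datetime]:
    """Return [H-1, H) for the hour that closed before this run."""
    window_end = logical_run_time.replace(minute=0, second=0, microsecond=0)
    return window_end - timedelta(hours=1), window_end


def read_events(start: datetime, end: datetime) -> list[dict]:
    """Assumed placeholder: fetch events with start <= ts < end from the source."""
    return []


def upsert_partition(rows: list[dict], partition: datetime) -> None:
    """Assumed placeholder: replace/merge rows into the target hour partition."""


def run_hourly_ingest(logical_run_time: datetime) -> None:
    start, end = closed_hour_window(logical_run_time)
    upsert_partition(read_events(start, end), partition=start)
    # Rerunning with the same logical_run_time touches the same partition,
    # so retries and backfills cannot double-count.


if __name__ == "__main__":
    # A run triggered at 09:10 UTC processes the 08:00-09:00 window.
    print(closed_hour_window(datetime(2026, 1, 11, 9, 10, tzinfo=timezone.utc)))
```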

Example 3: Weekly full refresh skipping maintenance

  • Goal: Full refresh every Sunday at 04:00, but skip the planned maintenance Sundays in months 1 and 7 (January and July).
  • Schedule: 0 4 * * 0 (every Sunday 04:00), plus a calendar that excludes maintenance days; or move to Monday 04:00 on those dates.
  • Timeout: 4 hours; alerts on overrun.
  • DST: Keep the schedule in UTC to avoid a shifting window.
Why it works

Explicit calendar logic prevents conflicts with maintenance. UTC avoids DST surprises.
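
A minimal pre-check sketch for the maintenance-skip logic, assuming "months 1 and 7" means January and July; the calendar mechanism itself is an assumption, not a prescribed implementation.

```python
# Skip the Sunday full refresh during assumed maintenance months (January, July).
from datetime import date

MAINTENANCE_MONTHS = {1, 7}  # assumption: planned maintenance in Jan and Jul


def should_skip_full_refresh(run_date: date) -> bool:
    """True when the run falls on a maintenance Sunday and should exit cleanly."""
    return run_date.weekday() == 6 and run_date.month in MAINTENANCE_MONTHS


if __name__ == "__main__":
    print(should_skip_full_refresh(date(2026, 1, 4)))  # True: Sunday in January
    print(should_skip_full_refresh(date(2026, 3, 1)))  # False: Sunday in March
```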

Practical steps: design a reliable schedule

  1. Define freshness: when must data be ready? Set SLA (e.g., by 07:00 daily).
  2. Choose trigger: time-based (cron/interval) and/or event-based (file arrived). Add a max wait.
  3. Pick time zone: default to UTC. If business-hour specific, convert at the edges.
  4. Define windowing: process a closed interval (e.g., previous hour/day).
  5. Set safety controls: retries with backoff, timeouts, concurrency=1 for non-idempotent steps.
  6. Plan failures: clear alerts, rerun strategy, and idempotent writes.
  7. Document: cron, dependencies, retry policy, SLA, and runbook for on-call.
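
One lightweight way to capture the outcome of these steps is a small, documented spec that lives next to the job; every field name and value below is an illustrative assumption, not a required format.

```python
# An assumed "schedule spec" for step 7: one place that records the decisions above.
nightly_kpi_schedule = {
    "trigger": {"cron": "15 2 * * *", "timezone": "UTC"},            # steps 2-3
    "dependency": {"sensor": "vendor_file", "max_wait_minutes": 60},
    "window": "previous_day",                                        # step 4
    "retries": {"max_attempts": 3, "backoff": "exponential"},        # step 5
    "timeout_minutes": 90,
    "max_active_runs": 1,
    "sla": "data ready by 07:00 UTC",                                # step 1
    "alerting": "notify on failure and on SLA breach",               # step 6
    "runbook": "docs/runbooks/nightly_kpis.md",                      # assumed path, step 7
}
```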

Exercises

These mirror the exercises in the Exercises panel below.

Exercise 1: Turn a requirement into a schedule

Requirement: “Load finance KPIs at 06:05 Monday–Friday. Skip public holidays. If a run fails, retry up to 3 times with 10-minute gaps. Alert if runtime exceeds 45 minutes. Avoid overlapping runs.”

  • Deliverables:
    • Cron expression (assume UTC).
    • Concurrency and retry policy.
    • SLA/timeout settings.
    • Holiday handling approach.
Hints
  • Cron fields order: minute hour day-of-month month day-of-week.
  • Concurrency 1 prevents overlaps.
  • Use a business-day calendar or skip logic.
Suggested solution

Cron: 5 6 * * 1-5 (06:05 Mon–Fri). Concurrency: 1. Retries: 3 with a 10-minute delay. Timeout: 45 minutes; SLA: ready by 07:00. Holiday skip: maintain a holiday calendar or a pre-check step that exits cleanly on holidays.
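
One way to express this solution in an orchestrator is sketched below (a hedged Airflow 2.x-style example); the DAG id, task names, and HOLIDAYS set are illustrative assumptions, and in practice the holiday calendar would come from a maintained source.

```python
# A hedged Airflow-style sketch of the suggested solution (assuming Airflow 2.x).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator, ShortCircuitOperator

HOLIDAYS = {"2026-01-01", "2026-12-25"}  # assumed stand-in for a real holiday calendar


def is_business_day(ds: str) -> bool:
    """ShortCircuitOperator skips downstream tasks when this returns False."""
    return ds not in HOLIDAYS


def load_finance_kpis(ds: str) -> None:
    """Placeholder for the actual KPI load (assumed helper)."""


with DAG(
    dag_id="finance_kpis",
    schedule="5 6 * * 1-5",                          # 06:05 UTC, Monday-Friday
    start_date=datetime(2026, 1, 1),
    catchup=False,
    max_active_runs=1,                               # avoid overlapping runs
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=10),        # 10-minute gaps
        "execution_timeout": timedelta(minutes=45),  # fail/alert on overrun
    },
) as dag:
    holiday_gate = ShortCircuitOperator(
        task_id="skip_public_holidays",
        python_callable=is_business_day,
        op_kwargs={"ds": "{{ ds }}"},
    )
    load = PythonOperator(
        task_id="load_finance_kpis",
        python_callable=load_finance_kpis,
        op_kwargs={"ds": "{{ ds }}"},
        sla=timedelta(minutes=55),                   # 06:05 + 55m = ready by 07:00
    )
    holiday_gate >> load
```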

Exercise 2: Fix a double-run issue

Symptom: Your job “daily_orders” ran twice around DST fall-back. The second run overwrote data.

  • Task: Propose changes to prevent double-runs and protect data.
Hints
  • Consider UTC schedule and idempotency.
  • Set single active run and partitioned writes.
Suggested solution

Move the schedule to UTC. Enforce concurrency=1. Write to date-partitioned targets with upsert/replace, keyed by the logical execution date rather than the wall clock, so a repeated run updates the same partition deterministically. Add a uniqueness lock or job run key to avoid duplicate triggers.
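
A minimal sketch of the partition-keyed, idempotent write described above, using a DB-API-style cursor with %s placeholders; the table, columns, and SQL dialect are illustrative assumptions.

```python
# Replace exactly one date partition keyed by the logical execution date, so a
# duplicate run (e.g., around DST fall-back) converges on the same final state.
from datetime import date


def write_daily_orders(rows: list[dict], logical_date: date, cursor) -> None:
    partition = logical_date.isoformat()
    # 1. Clear the partition this run owns (keyed by logical date, not wall clock).
    cursor.execute("DELETE FROM daily_orders WHERE order_date = %s", (partition,))
    # 2. Reload it in full; rerunning with the same logical_date repeats the same
    #    delete-and-insert and ends in the same state (no duplicates, no drift).
    cursor.executemany(
        "INSERT INTO daily_orders (order_date, order_id, amount) VALUES (%s, %s, %s)",
        [(partition, r["order_id"], r["amount"]) for r in rows],
    )
```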

Checklist: ready to schedule

  • Schedule defined in UTC (or documented local with DST plan).
  • Dependency checks (file/table sensors) with max wait and alerts.
  • Windowing logic tied to run time (execution date).
  • Retries with backoff and clear max attempts.
  • Timeout per task and overall job SLA.
  • Concurrency/locking to avoid overlaps.
  • Idempotent writes and backfill plan.
  • Monitoring: alerts on failure and lateness.

Common mistakes and how to self-check

  • DST surprises: Local time schedules cause skips/doubles. Self-check: Does your job ever run twice or not at all on DST change? Fix: Use UTC or explicit DST handling.
  • Overlapping runs: No locking. Self-check: Any two runs of the same job at once? Fix: Set concurrency=1 and use run keys.
  • Non-idempotent writes: Appends duplicate rows. Self-check: Rerun the same execution date; do results change? Fix: Upsert/replace by partition/key.
  • Missing dependencies: Processing starts before data lands. Self-check: Has a run ever read an empty or partial source? Fix: Add sensors with a max wait and alerts.
  • No timeouts: Jobs can hang forever. Self-check: Do any runs exceed the historical p95 runtime without failing? Fix: Add per-task timeouts.

Practical projects

  • Project A: Build an hourly ingestion job that processes [H-1, H) with retries and concurrency=1; validate idempotency by rerunning the same hour.
  • Project B: Create a daily DAG with a file sensor, a transform, and a load step, with an SLA alert if not done by 08:00.
  • Project C: Implement a holiday-aware weekly job that skips on a given calendar but backfills the next business day.

Mini challenge

You inherit a pipeline that runs “0 0 * * *” local time and often double-loads on DST. In one paragraph, describe a migration plan to UTC with minimal downtime, including validation steps and a rollback plan.

Learning path

  • Start: Job Scheduling Basics (this page).
  • Next: Dependencies and Sensors in Orchestrators.
  • Then: Retries, Backoff, and Idempotency Patterns.
  • Advanced: SLAs, Observability, and On-call Runbooks.

Next steps

  • Apply the checklist to one of your existing jobs.
  • Configure alerts and timeouts for a critical pipeline.
  • Run a controlled backfill to validate idempotency.

Quick Test and progress

Take the Quick Test below to check your understanding. It is available to everyone; test progress is saved automatically only for logged-in users.

Practice Exercises

2 exercises to complete

Instructions

Requirement: “Load finance KPIs at 06:05 Monday–Friday. Skip public holidays. If a run fails, retry up to 3 times with 10-minute gaps. Alert if runtime exceeds 45 minutes. Avoid overlapping runs.” Assume UTC.

  • Provide a cron expression.
  • Define concurrency and retry policy.
  • Set timeout/SLA values.
  • Describe how to implement holiday skipping.
Expected Output
A cron string, a clear concurrency=1 policy, retries=3 with 10m delay, runtime timeout 45m, SLA by 07:00, and a holiday skip mechanism (calendar or pre-check).

Job Scheduling Basics — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

