Why this matters
Analytics engineering touches live dashboards, executive metrics, and ML features. A safe branching strategy reduces the risk of broken dashboards, protects production data, and makes reviews faster. You will use branches to ship dbt models, SQL transforms, Airflow DAG updates, and backfills with low risk.
- Real tasks: add a new metric model; fix a failing transformation; rename a column used by BI; run a backfill safely.
- Goal: short-lived branches, reliable promotion to staging and production, and quick hotfixes when metrics break.
Concept explained simply
A branching strategy is a small set of rules for how you create, name, test, review, and merge branches. For data work, we pair branches with environments (dev → staging → prod) and data-safety steps (tests, backfills, rollbacks).
Mental model
- Main (prod code) is always releasable. Think of it as the recipe your production warehouse follows.
- Feature branches are playgrounds. You run models in your dev schema, test, and open a PR.
- Staging is a dress rehearsal. On each PR, CI builds and tests the changes in a staging schema that mirrors prod, so surprises show up before merge.
- Merge to main triggers prod deployment. Tag a release after success.
Recommended branching model for data teams
Use trunk-based development with short-lived feature branches, plus an emergency hotfix path.
- main: production-ready code only.
- feat/*: new models, metrics, or DAGs.
- fix/*: non-urgent bug fixes that go through the normal review flow.
- hotfix/*: urgent production issues branched from main.
- ops/*: operational tasks like backfills/migrations.
Alternative note: GitFlow vs trunk-based
GitFlow adds long-lived develop and release branches, which can slow small teams down. Many data teams move faster with trunk-based development plus a staging environment and CI tests. If you have strict change windows, add temporary release/* branches for risky migrations.
Branch naming conventions
- feat/model-new_revenue_metric
- fix/test-failing_unique_order_ids
- hotfix/model-dup_rows_in_customers
- ops/backfill-orders_2023_q4
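Use lowercase and the pattern <prefix>/<scope>-<description>: the prefix signals intent, the scope names the kind of change (model, test, backfill, migration), and underscores in the description can mirror model or column names.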
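A naming policy is easiest to keep when CI checks it. The sketch below is a hypothetical check, not part of any tool: it assumes branch names follow the <prefix>/<scope>-<description> shape of the example names, with the four prefixes from the branching model and a lowercase description that may contain digits and underscores.

```python
import re
from typing import Dict

# Hypothetical policy: <prefix>/<scope>-<description>, all lowercase.
BRANCH_PATTERN = re.compile(
    r"(?P<prefix>feat|fix|hotfix|ops)/"  # prefix signals intent
    r"(?P<scope>[a-z]+)-"                # kind of change: model, test, backfill, migration, ...
    r"(?P<description>[a-z0-9_]+)"       # snake_case so it can mirror model or column names
)


def check_branch_name(name: str) -> Dict[str, str]:
    """Return the parts of a valid branch name; raise ValueError otherwise."""
    match = BRANCH_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not match <prefix>/<scope>-<description>")
    return match.groupdict()


if __name__ == "__main__":
    for branch in ("feat/model-new_revenue_metric", "ops/backfill-orders_2023_q4", "Feature/NewMetric"):
        try:
            print(branch, "->", check_branch_name(branch))
        except ValueError as err:
            print("rejected:", err)
```

Running a check like this as the first CI step fails fast before any dbt job starts; loosen the regex if your team allows hyphens inside descriptions.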
Environment mapping for data
- Dev: run changes in your personal schema (e.g., schema: username_dev). Example: dbt run --target dev
- Staging: shared schema mirroring prod sources. CI runs here on PRs.
- Prod: only from main after PR approval and checks passing.
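The environment mapping can be expressed as a small decision function in a CI script. This is a sketch under stated assumptions: the dbt profile defines targets named dev, staging, and prod, and the CI system tells the script whether the job is a pull request. choose_dbt_target and dev_schema are hypothetical helpers.

```python
def dev_schema(username: str) -> str:
    """Personal dev schema following the username_dev pattern (lowercasing is an assumption)."""
    return f"{username.lower()}_dev"


def choose_dbt_target(branch: str, is_pull_request: bool, on_ci: bool) -> str:
    """Map where a job runs to a dbt target name (assumed names: dev, staging, prod)."""
    if not on_ci:
        return "dev"      # local work always builds into your personal dev schema
    if is_pull_request:
        return "staging"  # PR checks run against the shared staging schema
    if branch == "main":
        return "prod"     # only main deploys to production
    # CI pushes to feature branches without a PR get no deploy target at all.
    raise ValueError(f"no deploy target for a CI push to {branch!r}; open a PR instead")


if __name__ == "__main__":
    target = choose_dbt_target("feat/model-new_revenue_metric", is_pull_request=True, on_ci=True)
    print(["dbt", "build", "--target", target])
```

A CI step could then pass the result to dbt build --target; refusing non-PR pushes from feature branches keeps prod reachable only from main.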
Data-specific branching rules
- Migrations (renames, type changes): use an ops/* branch, make each step reversible, and keep the old and new forms available during a compatibility window.
- Backfills: isolate in ops/backfill-*; run with guardrails (date filters, dry-run, row counts).
- Snapshots and seeds: call them out in PR description; list impacts and rollback plan.
- Tests: add schema + data tests before merge (unique, not_null, accepted_values).
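For the backfill guardrails, here is a minimal sketch that splits a date range into batches and either prints (dry run) or executes one dbt run per batch. It covers the date-filter and dry-run guardrails only. It assumes a hypothetical model named orders that filters its source on dbt vars called start_date and end_date; --select, --target, and --vars are standard dbt flags, but the var names and batching logic are illustrations.

```python
import json
import shlex
import subprocess
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def backfill_batches(start: date, end: date, batch_days: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (batch_start, batch_end) windows that cover start..end exactly once."""
    if batch_days < 1:
        raise ValueError("batch_days must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    current = start
    while current <= end:
        batch_end = min(current + timedelta(days=batch_days - 1), end)
        yield current, batch_end
        current = batch_end + timedelta(days=1)


def dbt_backfill_command(model: str, target: str, batch_start: date, batch_end: date) -> List[str]:
    """Build a dbt command that passes the batch window as vars (the model must read them)."""
    window = {"start_date": batch_start.isoformat(), "end_date": batch_end.isoformat()}
    return ["dbt", "run", "--select", model, "--target", target, "--vars", json.dumps(window)]


def run_backfill(model: str, target: str, start: date, end: date,
                 batch_days: int, dry_run: bool = True) -> List[List[str]]:
    """Plan (dry_run=True) or execute one dbt run per batch; stops at the first failing batch."""
    commands = [dbt_backfill_command(model, target, lo, hi)
                for lo, hi in backfill_batches(start, end, batch_days)]
    for cmd in commands:
        if dry_run:
            print("[dry-run]", shlex.join(cmd))  # shell-quoted so it can be copy-pasted
        else:
            subprocess.run(cmd, check=True)      # raises CalledProcessError if a batch fails
    return commands


if __name__ == "__main__":
    run_backfill("orders", "staging", date(2023, 10, 1), date(2023, 12, 31), batch_days=7)
```

Running once with dry_run=True gives a reviewable plan of batches to paste into the PR before anything touches data.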
Worked examples
Example 1: Add a new metric model
- Create branch: git checkout -b feat/model-new_revenue_metric
- Build in dev: add models/mart_revenue.sql and tests. Run: dbt run --select mart_revenue --target dev; dbt test --select mart_revenue --target dev
- Open PR: describe inputs, outputs, tests, and impact on dashboards.
- Staging checks: CI runs models + tests in staging schema.
- Merge and deploy: after approval, merge to main; CI deploys to prod; once the deploy succeeds, tag the release (e.g., v2025.01.10).
Command bundle
git checkout -b feat/model-new_revenue_metric
dbt run --select mart_revenue --target dev
dbt test --select mart_revenue --target dev
git add . && git commit -m "feat: add mart_revenue with tests"
git push -u origin feat/model-new_revenue_metric
Example 2: Hotfix a failing dashboard
- Start from main: git checkout main && git pull
- Create hotfix branch: git checkout -b hotfix/model-dup_rows_in_customers
- Fix in dev: adjust the join keys, then build only what you need: dbt run --select +customers --target dev (customers plus its upstream parents); dbt test --select customers --target dev
- Open a small PR that shows before/after row counts.
- Merge with priority review; CI deploys to prod; after the deploy succeeds, tag v2025.01.10-hotfix1.
Safety checklist
- Row counts stable within expected tolerance
- Duplicates eliminated (unique test passes)
- Downstream exposures checked
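The first two checklist items can be backed by a quick comparison of the key column pulled from the table before and after the fix. A minimal sketch, assuming you can fetch those keys (for example customer ids) into Python; hotfix_checks and its tolerance default are hypothetical. Downstream exposures still need a manual look.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def duplicate_keys(keys: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Keys that appear more than once, i.e. what a dbt unique test would flag."""
    return {key: count for key, count in Counter(keys).items() if count > 1}


def within_tolerance(expected: int, actual: int, tolerance: float) -> bool:
    """True if actual differs from expected by at most `tolerance` as a fraction (0.01 = 1%)."""
    if expected == 0:
        return actual == 0
    return abs(actual - expected) / expected <= tolerance


def hotfix_checks(before_keys: Iterable[Hashable], after_keys: Iterable[Hashable],
                  tolerance: float = 0.0) -> dict:
    """Summarize duplicate removal, row-count stability, and lost keys for the PR description."""
    before = list(before_keys)
    after = list(after_keys)
    # After deduplication we expect one row per key that existed before the fix.
    expected_rows = len(set(before))
    return {
        "duplicates_before": duplicate_keys(before),
        "duplicates_after": duplicate_keys(after),
        "row_count_ok": within_tolerance(expected_rows, len(after), tolerance),
        "keys_lost": set(before) - set(after),
    }


if __name__ == "__main__":
    print(hotfix_checks(["c1", "c2", "c2", "c3"], ["c1", "c2", "c3"]))
```

Comparing against the distinct key count, rather than the old row count, separates the expected drop from deduplication from genuinely lost rows.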
Example 3: Column rename with backfill
- Plan: introduce the new column alongside the old one and populate both temporarily.
- Branch: git checkout -b ops/migration-rename_gmv_to_gross_revenue
- Dev: write transform to compute gross_revenue; keep gmv for compatibility.
- Staging: CI runs tests on the PR; BI owners repoint dashboards to gross_revenue.
- Backfill: from an ops/backfill-* branch, populate gross_revenue for historical rows in batches with row-count checks.
- Clean-up: remove gmv in a later PR after consumers switch.
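The add, backfill, and verify sequence can be rehearsed end to end on a throwaway database. This sketch uses Python's built-in sqlite3 with an in-memory table as a stand-in for the warehouse; the orders table and the order_id batching key are assumptions. In a dbt project the dual-write would more likely be an extra column expression in the model (gmv as gross_revenue) than an ALTER and UPDATE, and dropping gmv stays a separate, later change.

```python
import sqlite3


def add_and_backfill_gross_revenue(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Add gross_revenue next to gmv, copy values in id-range batches, return remaining mismatches."""
    cur = conn.cursor()
    # Step 1: add the new column alongside the old one; gmv stays for existing consumers.
    cur.execute("ALTER TABLE orders ADD COLUMN gross_revenue REAL")
    conn.commit()
    (max_id,) = cur.execute("SELECT COALESCE(MAX(order_id), 0) FROM orders").fetchone()
    # Step 2: backfill in batches, checking each UPDATE touched exactly the rows in its window.
    for batch_start in range(1, max_id + 1, batch_size):
        batch_end = batch_start + batch_size - 1
        (expected,) = cur.execute(
            "SELECT COUNT(*) FROM orders WHERE order_id BETWEEN ? AND ?", (batch_start, batch_end)
        ).fetchone()
        cur.execute(
            "UPDATE orders SET gross_revenue = gmv WHERE order_id BETWEEN ? AND ?",
            (batch_start, batch_end),
        )
        updated = cur.rowcount
        if updated != expected:
            conn.rollback()
            raise RuntimeError(f"batch {batch_start}-{batch_end}: updated {updated}, expected {expected}")
        conn.commit()
    # Step 3: both columns must agree before BI switches; IS NOT treats two NULLs as equal.
    (mismatches,) = cur.execute("SELECT COUNT(*) FROM orders WHERE gross_revenue IS NOT gmv").fetchone()
    return mismatches


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, gmv REAL)")
    demo.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.5), (3, None)])
    demo.commit()
    print("mismatches:", add_and_backfill_gross_revenue(demo, batch_size=2))
```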
Releases and tagging
- Tag production-stable points: vYYYY.MM.DD or semver (v1.4.0).
- Tag after successful prod deployment to enable quick rollback.
- Use annotated tags and note migrations/backfills in the tag message.
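Working out the next tag by hand is error-prone during an incident. A sketch that follows the date-based scheme used in the examples (v2025.01.10, then v2025.01.10-hotfix1); semver tags are ignored, and the one-release-per-day rule is an assumption of this sketch. git tag -a creates an annotated tag, which is the kind that carries a message.

```python
import re
from datetime import date
from typing import List, Optional, Tuple

# Date-based tags as used above: vYYYY.MM.DD, optionally followed by -hotfixN.
TAG_RE = re.compile(r"(v\d{4}\.\d{2}\.\d{2})(?:-hotfix(\d+))?")


def parse_tag(tag: str) -> Optional[Tuple[str, int]]:
    """Split a tag into (base release, hotfix number); other tags such as semver return None."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        return None
    return match.group(1), int(match.group(2) or 0)


def next_release_tag(existing: List[str], today: date, hotfix: bool = False) -> str:
    parsed = [p for p in map(parse_tag, existing) if p is not None]
    if not hotfix:
        tag = f"v{today:%Y.%m.%d}"
        if any(base == tag for base, _ in parsed):
            raise ValueError(f"{tag} already exists; tag a hotfix instead")
        return tag
    if not parsed:
        raise ValueError("no release tag to hotfix")
    latest = max(base for base, _ in parsed)  # zero-padded dates sort correctly as strings
    count = max(n for base, n in parsed if base == latest)
    return f"{latest}-hotfix{count + 1}"


def tag_commands(tag: str, message: str) -> List[List[str]]:
    """Annotated tag (so it can carry a message) plus a push of just that tag."""
    return [["git", "tag", "-a", tag, "-m", message], ["git", "push", "origin", tag]]


if __name__ == "__main__":
    tag = next_release_tag(["v2025.01.10"], date(2025, 1, 10), hotfix=True)
    print(tag_commands(tag, "hotfix: dedupe customers; no backfill"))
```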
PR and code review checklist
- Branch prefix matches intent (feat/fix/hotfix/ops)
- Tests added/updated; all green in CI
- Impacted models listed (upstream/downstream)
- Data validation: row counts, uniqueness, null rates, sample records
- Roll-forward and rollback plan described
- Performance considered (incremental where possible)
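Parts of this checklist can be automated against the PR itself. A rough sketch, assuming your PR template uses section titles like the hypothetical ones in REQUIRED_SECTIONS. It only checks that the text is present, so reviewers still judge the content, and CI status is checked elsewhere.

```python
from typing import List

ALLOWED_PREFIXES = ("feat/", "fix/", "hotfix/", "ops/")
# Hypothetical section titles a PR template might require; adapt to your template.
REQUIRED_SECTIONS = ("impacted models", "tests", "data validation", "rollback plan")


def review_pr(branch: str, description: str) -> List[str]:
    """Return checklist problems found in a PR; an empty list means the basics are covered."""
    problems = []
    if not branch.startswith(ALLOWED_PREFIXES):
        problems.append(f"branch {branch!r} does not start with one of {ALLOWED_PREFIXES}")
    text = description.lower()
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"missing section: {section}")
    return problems


if __name__ == "__main__":
    print(review_pr("feature/mau", "Tests added"))
```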
Common mistakes and self-check
- Long-lived branches drifting from main → Keep branches small; rebase or merge main often.
- Testing only in dev → Require staging CI run before merge.
- Breaking schema changes without compatibility window → Add dual-write/dual-read period.
- Backfills from feature branches → Use ops/backfill-* with clear scopes.
- No tagging → Tag releases to enable quick rollbacks.
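Drift from main is easy to measure. A sketch: git rev-list --count HEAD..origin/main counts commits on main that the current branch lacks (run git fetch first), and drift_warning applies hypothetical thresholds your team would tune.

```python
import subprocess
from typing import Optional


def commits_behind(base: str = "origin/main") -> int:
    """Count commits on `base` that the current branch does not contain (run `git fetch` first)."""
    result = subprocess.run(
        ["git", "rev-list", "--count", f"HEAD..{base}"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def drift_warning(behind: int, age_days: int,
                  max_behind: int = 20, max_age_days: int = 3) -> Optional[str]:
    """Return a warning if the branch is too old or too far behind main; thresholds are team choices."""
    reasons = []
    if behind > max_behind:
        reasons.append(f"{behind} commits behind main")
    if age_days > max_age_days:
        reasons.append(f"open for {age_days} days")
    if not reasons:
        return None
    return "rebase or merge main soon: " + ", ".join(reasons)
```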
Self-check prompts
- Can you deploy this change safely at 4pm on a weekday?
- What is your rollback plan in 5 minutes or less?
- Who depends on this model, and have they been notified?
Exercises
Do these now.
Exercise 1 — Plan a safe feature branch
- Create a branch to add a new dbt model for monthly active users (MAU).
- List the exact steps from dev run to tagging a release.
- Write 3 PR checks you will perform.
Hints
- Prefix your branch with feat/
- Think: dev → staging (CI) → main (prod)
- Include tests and downstream impact
Exercise 2 — Hotfix procedure
- A dashboard broke due to duplicated orders. Draft the hotfix branch name.
- List 5 steps you’ll take from branch creation to merge and tag.
- Include two validation checks.
Hints
- Start hotfix from main
- Run only affected models
- Add uniqueness tests
Practical projects
- Set up CI to run dbt build against a staging schema on every PR; block merge on failing tests.
- Create a sample migration: add new column with dual-write; backfill in batches; remove old column later.
- Write a branch-naming policy and PR template for your team.
Who this is for
- Analytics Engineers and BI Developers shipping SQL/dbt changes
- Data Engineers managing DAGs and transformations
- Anyone integrating data changes with dashboards
Prerequisites
- Basic Git (clone, branch, commit, merge)
- Familiarity with SQL/dbt or similar transformation tools
- Understanding of dev/staging/prod environments
Learning path
- Before: Git basics; dbt run/test; CI fundamentals
- Now: Branching strategy for safe data deployments
- Next: Pull request best practices; CI/CD for data; Data quality gates
Mini challenge
You need to change the grain of a model from daily to weekly, and three dashboards depend on it. Outline your approach in 6 steps using branches, compatibility windows, CI checks, and tagging. Keep prod stable at all times.
Next steps
- Adopt the naming conventions and PR checklist on your next change
- Automate staging builds on PRs
- Run the Quick Test below to confirm you’ve got it