Why this matters
In Analytics Engineering, pull requests (PRs) and code reviews protect data quality, reduce incidents, and spread knowledge across the team. You use PRs to propose changes to SQL models, dbt projects, metrics logic, documentation, and orchestration files. Reviews catch breaking changes early, verify performance, and ensure naming, testing, and lineage are consistent.
- Real task: propose a new dim_customers model and get feedback before it reaches production.
- Real task: review a teammate’s PR to ensure joins won’t duplicate rows and tests cover edge cases.
- Real task: fix a failing CI check by updating schema tests and rerunning pipelines before merge.
Who this is for
- Analytics Engineers working with Git-based workflows (dbt, SQL, Python transformations).
- BI Developers and Data Analysts contributing to shared analytics repositories.
- Data Engineers collaborating on warehouse models and tests.
Prerequisites
- Basic Git: clone, branch, commit, push, pull.
- Comfort with SQL and your analytics stack (e.g., dbt, warehouse).
- Understanding of your team’s branching and deployment strategy.
Concept explained simply
A pull request is a proposal to merge your branch into a target branch (usually main). It packages your changes with context so others can review, run checks, and decide when it’s safe to merge.
Mental model: the change package conveyor
Think of a PR as a clearly labeled box on a conveyor:
- Pack it well: small, focused commits; descriptive title; thorough description.
- Quality gate: automated checks run (lint, tests, build).
- Human inspection: reviewers open the box, inspect items, and request fixes.
- Ship: when it passes, merge to main cleanly and safely.
Typical PR flow (commands)
git checkout -b feat/dim-customers-normalized
# make changes
git add .
git commit -m "feat(dbt): add normalized email to dim_customers and tests"
git push -u origin feat/dim-customers-normalized
# open a PR on your platform (Draft if not ready)
States and gates
- Draft: early feedback, not ready to merge.
- Open: ready for full review.
- Checks: CI status (pass/fail).
- Reviews: approvals, changes requested.
- Merge: squash, rebase, or merge commit.
Merge strategies (when to use)
- Squash and merge: preferred for analytics; one clean commit per PR keeps history tidy and makes rollbacks easy (see the CLI sketch after this list).
- Rebase and merge: linear history; requires a clean rebase, so best for experienced Git users.
- Merge commit: preserves branch history; can be noisy.
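If your team hosts on GitHub and uses the gh CLI (an assumption; the same choices sit behind the platform's merge button), the three strategies map to:
gh pr merge --squash --delete-branch   # one clean commit on main; easiest to revert
gh pr merge --rebase --delete-branch   # replays commits onto main for a linear history
gh pr merge --merge --delete-branch    # creates a merge commit that preserves branch history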
Worked examples
Example 1: Authoring a PR for a dbt model change
- Create a small branch: one logical change only.
- Write good commits: imperative, scoped messages.
- Open a PR with a template covering context, impact, test plan, risks, and rollout.
- Mark as Draft if CI or validation isn’t done yet (a CLI sketch follows this list).
- Address feedback promptly; push updates; re-request review.
- Squash-merge after approvals and green checks; delete the branch.
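On GitHub with the gh CLI (an assumption; every platform has equivalent UI actions), this authoring flow might look like:
gh pr create --draft --title "feat(dbt): normalize customer emails in dim_customers" --body-file pr_body.md   # pr_body.md holds the description (filename assumed)
# ...address feedback and push updates, then:
gh pr ready   # convert the Draft to Ready for review; squash-merge as shown above once checks are green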
Sample PR description
Title: feat(dbt): normalize customer emails in dim_customers
Context
- Adds email_normalized using lower(trim(email)) to improve deduping.
- Updates schema tests: unique + not_null on customer_id and email_normalized.
Impact
- Affects downstream Customer 360 dashboard metrics that group by email.
- Expect minor shift in counts due to normalization; documented below.
Test plan
- Ran `dbt build -s dim_customers` locally.
- Compared row counts before/after: delta -0.2% (expected from deduplication).
- Verified no dupe groups by customer_id/email_normalized.
Performance
- Added index suggestion in model config (warehouse-dependent).
Risks & rollback
- If counts drop unexpectedly by more than 1%, roll back by reverting the single squashed merge commit.
Release notes
- New column: email_normalized (string). Update queries to use it.
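The schema tests referenced in this description might be declared roughly like this in the model's schema.yml (a minimal sketch; the file path is an assumption):
# models/marts/schema.yml (path assumed)
version: 2
models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email_normalized
        tests:
          - unique
          - not_null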
Example 2: Reviewing SQL changes
- Scan the diff for SELECT *, cartesian joins, and non-deterministic logic.
- Check tests: unique, not_null, and relationship tests updated?
- Validate performance: filters pushed down? unnecessary CTE expansion?
- Think lineage: will this break downstream dashboards or contracts?
- Leave specific, actionable comments with suggestions.
Review comment examples
- “Possible duplicate rows due to join on customer_id without date boundary. Consider joining on the latest record only.”
- “Please replace SELECT * with explicit columns to stabilize schema and improve performance.”
- “Add a not_null test for email_normalized and update unique test accordingly.”
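To act on the first comment above, one common pattern is to keep only each customer's most recent record before joining; a minimal sketch (the updated_at column is an assumption; use whatever versioning column your source has):
with latest_customers as (
    select
        customer_id,
        email,
        row_number() over (
            partition by customer_id
            order by updated_at desc
        ) as recency_rank
    from raw.customers
)
select
    o.order_id,
    o.customer_id,
    o.amount,
    c.email
from raw.orders o
left join latest_customers c
    on o.customer_id = c.customer_id
    and c.recency_rank = 1   -- at most one customer row per order, so the join cannot fan out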
Example 3: Fixing failing checks and conflicts
- Pull the latest main and rebase or merge to resolve conflicts (see the command sketch after this list).
- Run lint/tests locally. Fix style and schema test failures.
- Push updates; ensure CI passes and summarize fixes in a short comment.
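A typical local loop for these steps (assuming origin/main is the target branch):
git fetch origin
git rebase origin/main             # or: git merge origin/main, per team convention
# resolve any conflicts, then re-run checks locally, e.g. for the example model:
dbt build -s dim_customers
git push --force-with-lease        # needed after a rebase; refuses to clobber others' pushes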
Conflict tip
Resolve conflicts in the model SQL and schema.yml together; mismatches often cause CI test failures even if SQL compiles.
Author checklist
- Is the PR focused on one logical change and under ~300 lines of diff?
- Clear title and description: context, impact, test plan, risks, rollback.
- Tests updated: not_null, unique, relationships, freshness (if relevant).
- No SELECT *; explicit columns and stable schema.
- Performance considered: filters, joins, incremental strategy as needed.
- Documentation updated (model docs, column descriptions).
Reviewer checklist
- Does the change preserve correctness and avoid duplication?
- Are edge cases handled (nulls, late-arriving data, time zones)?
- Do tests cover the new logic and guard against regressions?
- Is the change minimal and readable? Any dead code or unused columns?
- Any data contract impacts communicated (column add/remove/rename)?
- CI green and local validation steps reproduced when risky?
Exercises
Do these before the quick test at the end of the lesson.
Exercise 1 — Write a strong PR description
Scenario: You added email_normalized to dim_customers, updated tests, and expect minor changes in the Customer 360 dashboard. Draft a PR description using the structure from this lesson.
- Include: context, impact, test plan, performance notes, risks, rollback, and release notes.
Hints
- State why the change is needed and what changes downstream.
- Include exact tests run and observed deltas.
- Mention how to roll back quickly if needed.
Exercise 2 — Review a diff
Read the diff and list at least five specific review comments.
--- a/models/stg_orders.sql
+++ b/models/stg_orders.sql
@@
- select *
- from raw.orders o
- left join raw.customers c on o.customer_id = c.customer_id
- where order_date >= current_date - 90
+ select
+ o.order_id,
+ o.customer_id,
+ o.amount,
+ o.order_date,
+ c.email,
+ c.signup_date
+ from raw.orders o
+ left join raw.customers c on o.customer_id = c.customer_id and c.is_active = true
+ where date(o.order_date) >= current_date - 90
Hints
- Check for SELECT * removal—did schema stability improve fully?
- Any performance pitfalls or non-deterministic date handling?
- Any potential row duplication or missing tests?
Common mistakes and self-check
- Oversized PRs: hard to review and risky to merge. Self-check: can you split it into smaller, independent PRs?
- Poor descriptions: reviewers lack context. Self-check: would a new teammate understand the why, what, and how to validate?
- No tests updated: logic changes without guards. Self-check: list tests that should fail if a regression occurs.
- SELECT * and unstable schemas: downstream breakages. Self-check: enumerate exact output columns.
- Ignoring performance: slow joins or missing filters. Self-check: estimate row counts and filter selectivity.
- Approving risky changes without reproducing checks locally: fragile merges. Self-check: run the critical model and compare key metrics.
Practical projects
- Set up a PR template with sections: Context, Impact, Test Plan, Risks, Rollback, Release Notes.
- Create CODEOWNERS or review rules for critical models to enforce expert reviews.
- Add CI steps to run dbt build on changed models and surface test results in the PR (see the workflow sketch after this list).
- Simulate an incident: introduce a failing test in a branch, catch it via PR checks, and fix before merge.
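For the CI project, a GitHub Actions workflow is one option (a hedged sketch: the file name, Python version, adapter, and state-comparison setup are assumptions; warehouse credentials and profiles.yml handling are omitted):
# .github/workflows/dbt_pr.yml (name assumed)
name: dbt PR checks
on: pull_request
jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-snowflake   # adapter assumed; install the one for your warehouse
      - run: dbt deps
      # state:modified+ selects changed models and their children; it needs a production
      # manifest available at ./prod-artifacts (how you fetch it is team-specific)
      - run: dbt build --select state:modified+ --state ./prod-artifacts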
Learning path
- Branching and committing effectively.
- Creating PRs with strong descriptions and small scope.
- Code review heuristics for analytics (joins, tests, performance, lineage).
- CI checks for dbt/SQL and reading test outputs.
- Merge strategies and rollback tactics.
- Advanced: review ownership, templates, and automated validations.
Mini challenge
Open a Draft PR for an in-progress model and ask one targeted question in the description (e.g., “Is this join key correct for late-arriving updates?”). Convert to Ready for review only after tests pass.
Next steps
- Integrate data quality checks into PR CI for changed models.
- Adopt squash merges for clean history and easier rollbacks.
- Generate lightweight release notes from merged PRs to inform BI stakeholders.
Quick Test