Why this matters
As a BI Analyst, you translate raw operational data into trustworthy metrics. Metric reconciliation ensures the numbers in your reports match the source systems your business runs on (ERP, CRM, payments, event logs). You will be asked to explain why yesterday's revenue changed, why counts differ from finance reports, or whether refunds were included. Reconciliation makes your dashboards credible and prevents bad decisions.
- Real task: Verify that daily sales in the BI warehouse equal the e-commerce platform's sales report.
- Real task: Explain a gap between Active Users in your dashboard and the product analytics export.
- Real task: Confirm finance revenue after refunds matches your metric logic before a board meeting.
Note: The quick test is free for everyone; only logged-in users will have their progress saved.
Who this is for
- BI Analysts and Data Analysts responsible for dashboards and KPI reporting.
- Analytics Engineers verifying ELT/ETL outputs.
- Anyone tasked with explaining metric differences to stakeholders.
Prerequisites
- Comfort with SQL (aggregations, joins, filters, window functions).
- Basic understanding of your company's data model (fact vs dimension tables).
- Awareness of time zones, currencies, and soft-delete patterns.
Concept explained simply
Metric reconciliation compares a metric computed from the BI warehouse with the same metric computed directly from the source system. If they differ, you isolate why and fix either the logic, the data pipeline, or expectations.
Mental model
- Define: Write a one-line formula for the metric (filters, time zone, currency, inclusion/exclusion rules).
- Compute twice: Once in source, once in warehouse, using equivalent logic.
- Align contexts: Same date boundaries, statuses, deduping, and joins.
- Compare: Summaries first (totals), then drill into segments (by day, channel, status).
- Explain: Classify differences by cause (timing/latency, logic/rules, data quality, currency/FX, time zone).
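The compare step of this mental model can be sketched as a small helper: totals first, then per-segment deltas only when the totals disagree. This is a minimal sketch, assuming each side has already been summarized as a mapping of segment name to metric value (the names below are illustrative):

```python
def reconcile(source: dict, warehouse: dict, tol: float = 0.0):
    """Compare totals first; if they differ, return per-segment deltas.

    `source` and `warehouse` map segment name -> metric value,
    e.g. {"email": 120.0, "paid_search": 340.0}.
    """
    src_total = sum(source.values())
    wh_total = sum(warehouse.values())
    if abs(src_total - wh_total) <= tol:
        return {"match": True, "delta": 0.0, "segments": {}}
    # Totals disagree: drill into segments and keep only the ones that differ.
    keys = set(source) | set(warehouse)
    deltas = {k: warehouse.get(k, 0.0) - source.get(k, 0.0) for k in keys}
    deltas = {k: d for k, d in deltas.items() if abs(d) > tol}
    return {"match": False, "delta": wh_total - src_total, "segments": deltas}
```

For example, `reconcile({"web": 100.0, "app": 50.0}, {"web": 100.0, "app": 48.0})` reports an overall delta of -2.0 driven entirely by the app segment, which is exactly the "summaries first, then drill in" order described above.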
Core reconciliation workflow
- Write an explicit metric spec: metric name, calculation, filters, time window, time zone, identifiers, currency, and inclusion/exclusion rules.
- Choose a time slice: pick a closed period (e.g., a full UTC day) to avoid partial loads.
- Build comparable extracts: simple, idempotent SQL in both source and warehouse.
- Start broad: compare totals. If off, compare by dimensions such as status, channel, country.
- Pinpoint causes: reconcile stepwise (counts, then sums, then derived rates); investigate timestamps, dedupe rules, statuses.
- Document the outcome: capture SQL, assumptions, deltas by reason, and owner actions.
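A metric spec can live in code as well as in a doc, which keeps both sides of the reconciliation honest about what they are computing. A minimal sketch in Python (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricSpec:
    """One explicit spec per metric; every field is part of the contract."""
    name: str
    calculation: str   # e.g. "SUM(item_price * qty)"
    filters: str       # e.g. "status IN ('paid','shipped','completed')"
    time_window: str   # a closed period, e.g. "2025-03-01 (full UTC day)"
    time_zone: str     # e.g. "UTC"
    currency: str      # e.g. "USD"
    exclusions: str    # e.g. "test orders, internal accounts"

spec = MetricSpec(
    name="gross_revenue",
    calculation="SUM(item_price * qty)",
    filters="status IN ('paid','shipped','completed')",
    time_window="2025-03-01 (full UTC day)",
    time_zone="UTC",
    currency="USD",
    exclusions="test orders",
)
```

Freezing the dataclass is deliberate: a spec should be agreed once per reconciliation run, not mutated mid-comparison.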
Advanced tips
- Use a fixed as-of timestamp for both sides to neutralize late-arriving data.
- Apply the same status mapping layer in both systems (e.g., map raw statuses to Paid/Refunded).
- Segment by load source or ingestion batch to detect pipeline gaps.
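To see why a fixed as-of timestamp matters, here is a toy sqlite3 sketch (table name, column names, and timestamps are all made up). A record that arrives after the agreed as-of time is excluded consistently, so it cannot skew one side of the comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL, ingested_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("o1", 100.0, "2025-03-02 01:00:00"),
        ("o2", 50.0, "2025-03-02 02:00:00"),
        # Late-arriving record, ingested after the agreed as-of time:
        ("o3", 25.0, "2025-03-02 09:00:00"),
    ],
)
# Both sides of the reconciliation filter on the same fixed as-of timestamp,
# so the late record is excluded from both computations alike.
as_of = "2025-03-02 06:00:00"
total, = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE ingested_at <= ?", (as_of,)
).fetchone()
print(total)  # 150.0 -- o3 is out of scope on both sides
```

Running the same query later, after o3 lands, still returns 150.0 for this as-of time, which is the property you want during a comparison.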
Worked examples
Example 1: Daily orders and gross revenue
Spec: Orders where status in (paid, shipped, completed); gross revenue = sum(item_price * qty) before refunds; time zone = UTC; date = 2025-03-01.
Source-side SQL sketch
-- Source (operational DB)
SELECT CAST(created_at AT TIME ZONE 'UTC' AS DATE) AS order_date,
       COUNT(DISTINCT order_id) AS orders,
       SUM(amount) AS gross_revenue
FROM source.orders
WHERE status IN ('paid','shipped','completed')
  AND CAST(created_at AT TIME ZONE 'UTC' AS DATE) = DATE '2025-03-01'
GROUP BY 1;
Warehouse-side SQL sketch
-- Warehouse (modeled)
SELECT order_date_utc AS order_date,
       COUNT(DISTINCT order_id) AS orders,
       SUM(gross_amount) AS gross_revenue
FROM mart.fact_orders
WHERE order_date_utc = DATE '2025-03-01'
  AND order_status_group = 'Paid'
GROUP BY 1;
Compare totals, then by status/channel. If counts match but revenue differs, check item-level rounding, currency conversion, or missing line items.
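Item-level rounding alone can move revenue between two systems even when every order matches. A toy illustration of the difference between rounding per line item and rounding once at the end:

```python
# Three line items priced so that 2-decimal rounding matters.
items = [10.004, 10.004, 10.004]

# System A rounds each line item before summing (common in operational DBs).
rounded_then_summed = sum(round(x, 2) for x in items)   # 30.0

# System B sums raw amounts and rounds once at the end.
summed_then_rounded = round(sum(items), 2)              # 30.01

print(rounded_then_summed, summed_then_rounded)
```

A one-cent gap per order is invisible in counts but compounds across millions of line items, which is why the comparison above separates counts from sums.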
Example 2: Daily active users (DAU)
Spec: Distinct users with at least one qualifying event; time zone = product time zone (e.g., America/Los_Angeles).
Source-side SQL sketch (events)
SELECT CAST(event_time AT TIME ZONE 'America/Los_Angeles' AS DATE) AS d,
       COUNT(DISTINCT user_id) AS dau
FROM source.events
WHERE is_bot = false
  AND CAST(event_time AT TIME ZONE 'America/Los_Angeles' AS DATE) = DATE '2025-03-01'
GROUP BY 1;
Warehouse-side SQL sketch
SELECT event_date_pst AS d,
       COUNT(DISTINCT user_id) AS dau
FROM mart.fact_events
WHERE event_date_pst = DATE '2025-03-01'
  AND bot_flag = 0
GROUP BY 1;
If warehouse DAU is higher, verify bot filters and late-arriving events. If lower, check dedupe logic and unioned event sources.
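Time-zone handling alone can shift users across the date boundary. A small sketch with Python's zoneinfo (the event time is made up) shows the same event landing on different dates in UTC and product time:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("America/Los_Angeles")

# An event at 01:30 UTC on March 2 ...
event_utc = datetime(2025, 3, 2, 1, 30, tzinfo=timezone.utc)

# ... falls on March 1 in product time (UTC-8 in early March, before DST).
event_local = event_utc.astimezone(pacific)
print(event_utc.date(), event_local.date())
```

If one side buckets events by UTC date and the other by product-time date, every user whose only events fall in that boundary window is counted on different days, which shows up as offsetting DAU gaps on adjacent dates.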
Example 3: Net revenue including refunds
Spec: Net revenue = gross revenue minus refunds (posted date basis; currency = USD; FX rate at posting date). Compare the finance system vs BI.
Finance-side SQL sketch
SELECT posted_date,
       SUM(gross_usd) - SUM(refund_usd) AS net_revenue
FROM finance.gl_daily
WHERE posted_date = DATE '2025-03-01'
GROUP BY 1;
Warehouse-side SQL sketch
SELECT revenue_date_utc AS posted_date,
       SUM(gross_usd) - SUM(refund_usd) AS net_revenue
FROM mart.fin_revenue
WHERE revenue_date_utc = DATE '2025-03-01'
GROUP BY 1;
Common causes of mismatch: an FX rate taken from the wrong date, refunds dated by order date instead of refund posted date, and missing partial refunds.
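The FX-date cause can be made concrete with toy numbers (the rates and dates below are hypothetical). If finance converts a refund at its posting-date rate while the warehouse reuses the original order-date rate, the two nets drift apart by the rate movement alone:

```python
# A 100 EUR refund posted two weeks after the order.
refund_eur = 100.0
fx = {"2025-02-15": 1.08, "2025-03-01": 1.05}  # EUR->USD, hypothetical rates

# Finance converts at the refund's posting date ...
finance_usd = refund_eur * fx["2025-03-01"]
# ... while the warehouse reuses the original order-date rate.
warehouse_usd = refund_eur * fx["2025-02-15"]

# The gap comes purely from which date's rate was applied.
print(round(warehouse_usd - finance_usd, 2))
```

The self-check in this case is to print the FX rate and rate date used per refund on both sides, not just the converted totals.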
Choosing scope: sampling vs full
- Start with full-day totals if data volume is manageable.
- For very large data, sample a stable slice (e.g., a single channel or region) and then expand.
- Always lock an as-of time to avoid fresh late-arriving records during comparison.
Exercises
Do these in order. Check your answers against the solutions. Keep your SQL idempotent and write down your reasoning.
Exercise 1: Reconcile daily order count and gross revenue
Context: Source table source.orders(order_id, created_at, status, amount, currency). Warehouse table mart.fact_orders(order_id, order_date_utc, order_status_group, gross_amount_usd). Target date: 2025-03-01 (UTC). Currency should be USD in both computations.
- Write two SQL queries (source and warehouse) to compute order_count and gross_revenue_usd for the date.
- Compare totals, then segment by status to find differences.
- Document at least one plausible root cause if there is a difference.
Checklist
- Used the same UTC date boundary on both sides
- Aligned statuses into a comparable mapping
- Ensured currency conversion to USD is consistent
- Produced a 1-row total and a by-status breakdown
Exercise 2: Reconcile DAU with bot filtering and time zone
Context: Source events in source.events(user_id, event_time, is_bot). Warehouse events in mart.fact_events(user_id, event_date_pst, bot_flag). Product time zone = America/Los_Angeles. Target date: 2025-03-01 in product time.
- Compute DAU on both sides with the same time zone and bot filter.
- If there is a mismatch, segment by event_type or platform if available.
- List two likely causes and how to validate them.
Checklist
- Time zone conversion is applied consistently
- Bot filter is aligned
- Distinct user logic is identical
- Segmentation used to isolate the gap
Common mistakes and self-check
- Different time zones. Self-check: Print min/max timestamps for the chosen date on both sides.
- Mismatched status logic. Self-check: Build a mapping table and join it in both queries.
- Double counting. Self-check: COUNT(DISTINCT id) vs COUNT(*); inspect duplicates.
- Currency/FX mismatch. Self-check: Show the FX rate per day used in both computations.
- Partial loads/latency. Self-check: Compare as-of timestamps and ingestion batch ids.
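The double-counting self-check above is a single query. A sqlite3 sketch with toy data (table and values are made up) shows the COUNT(*) vs COUNT(DISTINCT) comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_id TEXT)")
# 'o2' was loaded twice, e.g. by an ingestion retry.
conn.executemany(
    "INSERT INTO fact_orders VALUES (?)", [("o1",), ("o2",), ("o2",), ("o3",)]
)
rows, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT order_id) FROM fact_orders"
).fetchone()
print(rows, distinct)  # a gap between the two means duplicates to inspect
```

When the two counts differ, the follow-up is a GROUP BY order_id HAVING COUNT(*) > 1 to list the duplicated keys and trace them back to a load batch.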
How to present reconciliation results
- One-line metric spec and as-of time.
- Totals table with warehouse vs source and diff (%).
- Example: metric, source_value, warehouse_value, diff, diff_pct
- Top-3 difference drivers with numbers.
- Owner + next action (pipeline fix, metric definition change, or accept variance).
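The totals table can be produced by a small helper, one row per metric. This is a sketch; the column names simply follow the example row above:

```python
def diff_row(metric: str, source_value: float, warehouse_value: float) -> dict:
    """One comparison row: absolute diff and percent diff vs the source."""
    diff = warehouse_value - source_value
    diff_pct = (diff / source_value * 100) if source_value else None
    return {
        "metric": metric,
        "source_value": source_value,
        "warehouse_value": warehouse_value,
        "diff": round(diff, 2),
        "diff_pct": round(diff_pct, 2) if diff_pct is not None else None,
    }

print(diff_row("net_revenue", 100_000.0, 99_200.0))
```

Guarding against a zero source value keeps the helper from dividing by zero on metrics that were legitimately empty for the period.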
Practical projects
- Build a reusable reconciliation template that takes metric name, date, and outputs warehouse vs source totals and by-dimension breakdowns.
- Create a daily job that snapshots reconciliation summaries for the top 5 KPIs and flags variance > 1%.
- Document a metric playbook: spec, sample SQL, common pitfalls, and sign-off criteria for each core KPI.
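The variance-flagging pass in the daily job project could look like the sketch below (the threshold, KPI names, and numbers are illustrative):

```python
THRESHOLD_PCT = 1.0  # flag any KPI off by more than 1% vs its source

snapshots = {
    # kpi: (source_value, warehouse_value) -- toy numbers
    "orders": (10_000, 10_003),
    "gross_revenue": (500_000.0, 493_500.0),
    "dau": (42_000, 42_100),
}

# Keep only KPIs whose absolute percent variance breaches the threshold.
flagged = {
    kpi: round((wh - src) / src * 100, 2)
    for kpi, (src, wh) in snapshots.items()
    if abs(wh - src) / src * 100 > THRESHOLD_PCT
}
print(flagged)  # only gross_revenue breaches the 1% threshold here
```

In a real job the snapshot dict would be loaded from the warehouse and the flagged KPIs routed to an alert channel; the comprehension itself is the whole decision rule.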
Learning path
- Start: Reconcile a single-day metric (orders or DAU).
- Next: Add refunds/returns and FX to handle financial parity.
- Then: Automate with scheduled checks and alerting on variance thresholds.
Mini challenge
Your finance team reports net revenue is 0.8% lower than your dashboard for the last closed day. Propose a 5-step plan to isolate the cause. Include the exact cuts you will run and the order you'll run them in.
Next steps
- Finish the exercises and save your SQL sketches.
- Take the quick test to confirm you can spot common traps.
- Apply this to one production KPI and ship a short reconciliation report.