Why this matters
As a BI Analyst, you translate raw operational data into trustworthy metrics. Metric reconciliation ensures the numbers in your reports match the source systems your business runs on (ERP, CRM, payments, event logs). You will be asked to explain why yesterday's revenue changed, why counts differ from finance reports, or whether refunds were included. Reconciliation makes your dashboards credible and prevents bad decisions.
- Real task: Verify that daily sales in the BI warehouse equal the e-commerce platform's sales report.
- Real task: Explain a gap between Active Users in your dashboard and the product analytics export.
- Real task: Confirm finance revenue after refunds matches your metric logic before a board meeting.
Note: The quick test is free for everyone; only logged-in users will have their progress saved.
Who this is for
- BI Analysts and Data Analysts responsible for dashboards and KPI reporting.
- Analytics Engineers verifying ELT/ETL outputs.
- Anyone tasked with explaining metric differences to stakeholders.
Prerequisites
- Comfort with SQL (aggregations, joins, filters, window functions).
- Basic understanding of your company's data model (fact vs dimension tables).
- Awareness of time zones, currencies, and soft-delete patterns.
Concept explained simply
Metric reconciliation compares a metric computed from the BI warehouse with the same metric computed directly from the source system. If they differ, you isolate why and fix either the logic, the data pipeline, or expectations.
Mental model
- Define: Write a one-line formula for the metric (filters, time zone, currency, inclusion/exclusion rules).
- Compute twice: Once in source, once in warehouse, using equivalent logic.
- Align contexts: Same date boundaries, statuses, deduping, and joins.
- Compare: Summaries first (totals), then drill into segments (by day, channel, status).
- Explain: Classify differences by cause (timing/latency, logic/rules, data quality, currency/FX, time zone).
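The compare step of this mental model can be sketched as a small helper: totals first, then per-segment deltas only when the totals disagree. This is a minimal sketch, assuming each side has already been summarized as a mapping of segment name to metric value (the names below are illustrative):

```python
def reconcile(source: dict, warehouse: dict, tol: float = 0.0):
    """Compare totals first; if they differ, return per-segment deltas.

    `source` and `warehouse` map segment name -> metric value,
    e.g. {"email": 120.0, "paid_search": 340.0}.
    """
    src_total = sum(source.values())
    wh_total = sum(warehouse.values())
    if abs(src_total - wh_total) <= tol:
        return {"match": True, "delta": 0.0, "segments": {}}
    # Totals disagree: drill into segments and keep only the ones that differ.
    keys = set(source) | set(warehouse)
    deltas = {k: warehouse.get(k, 0.0) - source.get(k, 0.0) for k in keys}
    deltas = {k: d for k, d in deltas.items() if abs(d) > tol}
    return {"match": False, "delta": wh_total - src_total, "segments": deltas}
```

For example, `reconcile({"web": 100.0, "app": 50.0}, {"web": 100.0, "app": 48.0})` reports an overall delta of -2.0 driven entirely by the app segment, which is exactly the "summaries first, then drill in" order described above.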
Core reconciliation workflow
- Write an explicit metric spec: metric name, calculation, filters, time window, time zone, identifiers, currency, and inclusion/exclusion rules.
- Choose a time slice: pick a closed period (e.g., a full UTC day) to avoid partial loads.
- Build comparable extracts: simple, idempotent SQL in both source and warehouse.
- Start broad: compare totals. If off, compare by dimensions such as status, channel, country.
- Pinpoint causes: reconcile stepwise (counts, then sums, then derived rates); investigate timestamps, dedupe rules, statuses.
- Document the outcome: capture SQL, assumptions, deltas by reason, and owner actions.
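A metric spec can live in code as well as in a doc, which keeps both sides of the reconciliation honest about what they are computing. A minimal sketch in Python (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricSpec:
    """One explicit spec per metric; every field is part of the contract."""
    name: str
    calculation: str   # e.g. "SUM(item_price * qty)"
    filters: str       # e.g. "status IN ('paid','shipped','completed')"
    time_window: str   # a closed period, e.g. "2025-03-01 (full UTC day)"
    time_zone: str     # e.g. "UTC"
    currency: str      # e.g. "USD"
    exclusions: str    # e.g. "test orders, internal accounts"

spec = MetricSpec(
    name="gross_revenue",
    calculation="SUM(item_price * qty)",
    filters="status IN ('paid','shipped','completed')",
    time_window="2025-03-01 (full UTC day)",
    time_zone="UTC",
    currency="USD",
    exclusions="test orders",
)
```

Freezing the dataclass is deliberate: a spec should be agreed once per reconciliation run, not mutated mid-comparison.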
Advanced tips
- Use a fixed as-of timestamp for both sides to neutralize late-arriving data.
- Apply the same status mapping layer in both systems (e.g., map raw statuses to Paid/Refunded).
- Segment by load source or ingestion batch to detect pipeline gaps.
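To see why a fixed as-of timestamp matters, here is a toy sqlite3 sketch (table name, column names, and timestamps are all made up). A record that arrives after the agreed as-of time is excluded consistently, so it cannot skew one side of the comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL, ingested_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("o1", 100.0, "2025-03-02 01:00:00"),
        ("o2", 50.0, "2025-03-02 02:00:00"),
        # Late-arriving record, ingested after the agreed as-of time:
        ("o3", 25.0, "2025-03-02 09:00:00"),
    ],
)
# Both sides of the reconciliation filter on the same fixed as-of timestamp,
# so the late record is excluded from both computations alike.
as_of = "2025-03-02 06:00:00"
total, = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE ingested_at <= ?", (as_of,)
).fetchone()
print(total)  # 150.0 -- o3 is out of scope on both sides
```

Running the same query later, after o3 lands, still returns 150.0 for this as-of time, which is the property you want during a comparison.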
Worked examples
Example 1: Daily orders and gross revenue
Spec: Orders where status in (paid, shipped, completed); gross revenue = sum(item_price * qty) before refunds; time zone = UTC; date = 2025-03-01.
Source-side SQL sketch
-- Source (operational DB)
SELECT CAST(created_at AT TIME ZONE 'UTC' AS DATE) AS order_date,
       COUNT(DISTINCT order_id) AS orders,
       SUM(amount) AS gross_revenue
FROM source.orders
WHERE status IN ('paid','shipped','completed')
  AND CAST(created_at AT TIME ZONE 'UTC' AS DATE) = DATE '2025-03-01'
GROUP BY 1;
Warehouse-side SQL sketch
-- Warehouse (modeled)
SELECT order_date_utc AS order_date,
       COUNT(DISTINCT order_id) AS orders,
       SUM(gross_amount) AS gross_revenue
FROM mart.fact_orders
WHERE order_date_utc = DATE '2025-03-01'
  AND order_status_group = 'Paid'
GROUP BY 1;
Compare totals, then by status/channel. If counts match but revenue differs, check item-level rounding, currency conversion, or missing line items.
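Item-level rounding alone can move revenue between two systems even when every order matches. A toy illustration of the difference between rounding per line item and rounding once at the end:

```python
# Three line items priced so that 2-decimal rounding matters.
items = [10.004, 10.004, 10.004]

# System A rounds each line item before summing (common in operational DBs).
rounded_then_summed = sum(round(x, 2) for x in items)   # 30.0

# System B sums raw amounts and rounds once at the end.
summed_then_rounded = round(sum(items), 2)              # 30.01

print(rounded_then_summed, summed_then_rounded)
```

A one-cent gap per order is invisible in counts but compounds across millions of line items, which is why the comparison above separates counts from sums.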
Example 2: Daily active users (DAU)
Spec: Distinct users with at least one qualifying event; time zone = product time zone (e.g., America/Los_Angeles).
Source-side SQL sketch (events)
SELECT CAST(event_time AT TIME ZONE 'America/Los_Angeles' AS DATE) AS d,
       COUNT(DISTINCT user_id) AS dau
FROM source.events
WHERE is_bot = false
  AND CAST(event_time AT TIME ZONE 'America/Los_Angeles' AS DATE) = DATE '2025-03-01'
GROUP BY 1;
Warehouse-side SQL sketch
SELECT event_date_pst AS d,
       COUNT(DISTINCT user_id) AS dau
FROM mart.fact_events
WHERE event_date_pst = DATE '2025-03-01'
  AND bot_flag = 0
GROUP BY 1;
If warehouse DAU is higher, verify bot filters and late-arriving events. If lower, check dedupe logic and unioned event sources.
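Time-zone handling alone can shift users across the date boundary. A small sketch with Python's zoneinfo (the event time is made up) shows the same event landing on different dates in UTC and product time:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("America/Los_Angeles")

# An event at 01:30 UTC on March 2 ...
event_utc = datetime(2025, 3, 2, 1, 30, tzinfo=timezone.utc)

# ... falls on March 1 in product time (UTC-8 in early March, before DST).
event_local = event_utc.astimezone(pacific)
print(event_utc.date(), event_local.date())
```

If one side buckets events by UTC date and the other by product-time date, every user whose only events fall in that boundary window is counted on different days, which shows up as offsetting DAU gaps on adjacent dates.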
Example 3: Net revenue including refunds
Spec: Net revenue = gross revenue minus refunds (posted date basis; currency = USD; FX rate at posting date). Compare the finance system vs BI.
Finance-side SQL sketch
SELECT posted_date,
       SUM(gross_usd) - SUM(refund_usd) AS net_revenue
FROM finance.gl_daily
WHERE posted_date = DATE '2025-03-01'
GROUP BY 1;
Warehouse-side SQL sketch
SELECT revenue_date_utc AS posted_date,
       SUM(gross_usd) - SUM(refund_usd) AS net_revenue
FROM mart.fin_revenue
WHERE revenue_date_utc = DATE '2025-03-01'
GROUP BY 1;
Common causes of mismatch: an FX rate taken from the wrong date, refunds dated by order date instead of refund posted date, and missing partial refunds.
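The FX-date cause can be made concrete with toy numbers (the rates and dates below are hypothetical). If finance converts a refund at its posting-date rate while the warehouse reuses the original order-date rate, the two nets drift apart by the rate movement alone:

```python
# A 100 EUR refund posted two weeks after the order.
refund_eur = 100.0
fx = {"2025-02-15": 1.08, "2025-03-01": 1.05}  # EUR->USD, hypothetical rates

# Finance converts at the refund's posting date ...
finance_usd = refund_eur * fx["2025-03-01"]
# ... while the warehouse reuses the original order-date rate.
warehouse_usd = refund_eur * fx["2025-02-15"]

# The gap comes purely from which date's rate was applied.
print(round(warehouse_usd - finance_usd, 2))
```

The self-check in this case is to print the FX rate and rate date used per refund on both sides, not just the converted totals.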
Choosing scope: sampling vs full
- Start with full-day totals if data volume is manageable.
- For very large data, sample a stable slice (e.g., a single channel or region) and then expand.
- Always lock an as-of time to avoid fresh late-arriving records during comparison.
Exercises
Do these in order. Check your answers against the solutions. Keep your SQL idempotent and write down your reasoning.
Exercise 1: Reconcile daily order count and gross revenue
Context: Source table source.orders(order_id, created_at, status, amount, currency). Warehouse table mart.fact_orders(order_id, order_date_utc, order_status_group, gross_amount_usd). Target date: 2025-03-01 (UTC). Currency should be USD in both computations.
- Write two SQL queries (source and warehouse) to compute order_count and gross_revenue_usd for the date.
- Compare totals, then segment by status to find differences.
- Document at least one plausible root cause if there is a difference.
Checklist
- Used the same UTC date boundary on both sides
- Aligned statuses into a comparable mapping
- Ensured currency conversion to USD is consistent
- Produced a 1-row total and a by-status breakdown
Exercise 2: Reconcile DAU with bot filtering and time zone
Context: Source events in source.events(user_id, event_time, is_bot). Warehouse events in mart.fact_events(user_id, event_date_pst, bot_flag). Product time zone = America/Los_Angeles. Target date: 2025-03-01 in product time.
- Compute DAU on both sides with the same time zone and bot filter.
- If there is a mismatch, segment by event_type or platform if available.
- List two likely causes and how to validate them.
Checklist
- Time zone conversion is applied consistently
- Bot filter is aligned
- Distinct user logic is identical
- Segmentation used to isolate the gap
Common mistakes and self-check
- Different time zones. Self-check: Print min/max timestamps for the chosen date on both sides.
- Mismatched status logic. Self-check: Build a mapping table and join it in both queries.
- Double counting. Self-check: COUNT(DISTINCT id) vs COUNT(*); inspect duplicates.
- Currency/FX mismatch. Self-check: Show the FX rate per day used in both computations.
- Partial loads/latency. Self-check: Compare as-of timestamps and ingestion batch ids.
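The double-counting self-check above is a single query. A sqlite3 sketch with toy data (table and values are made up) shows the COUNT(*) vs COUNT(DISTINCT) comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_id TEXT)")
# 'o2' was loaded twice, e.g. by an ingestion retry.
conn.executemany(
    "INSERT INTO fact_orders VALUES (?)", [("o1",), ("o2",), ("o2",), ("o3",)]
)
rows, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT order_id) FROM fact_orders"
).fetchone()
print(rows, distinct)  # a gap between the two means duplicates to inspect
```

When the two counts differ, the follow-up is a GROUP BY order_id HAVING COUNT(*) > 1 to list the duplicated keys and trace them back to a load batch.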
How to present reconciliation results
- One-line metric spec and as-of time.
- Totals table with warehouse vs source and diff (%).
- Example: metric, source_value, warehouse_value, diff, diff_pct
- Top-3 difference drivers with numbers.
- Owner + next action (pipeline fix, metric definition change, or accept variance).
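The totals table can be produced by a small helper, one row per metric. This is a sketch; the column names simply follow the example row above:

```python
def diff_row(metric: str, source_value: float, warehouse_value: float) -> dict:
    """One comparison row: absolute diff and percent diff vs the source."""
    diff = warehouse_value - source_value
    diff_pct = (diff / source_value * 100) if source_value else None
    return {
        "metric": metric,
        "source_value": source_value,
        "warehouse_value": warehouse_value,
        "diff": round(diff, 2),
        "diff_pct": round(diff_pct, 2) if diff_pct is not None else None,
    }

print(diff_row("net_revenue", 100_000.0, 99_200.0))
```

Guarding against a zero source value keeps the helper from dividing by zero on metrics that were legitimately empty for the period.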
Practical projects
- Build a reusable reconciliation template that takes metric name, date, and outputs warehouse vs source totals and by-dimension breakdowns.
- Create a daily job that snapshots reconciliation summaries for the top 5 KPIs and flags variance > 1%.
- Document a metric playbook: spec, sample SQL, common pitfalls, and sign-off criteria for each core KPI.
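The variance-flagging pass in the daily job project could look like the sketch below (the threshold, KPI names, and numbers are illustrative):

```python
THRESHOLD_PCT = 1.0  # flag any KPI off by more than 1% vs its source

snapshots = {
    # kpi: (source_value, warehouse_value) -- toy numbers
    "orders": (10_000, 10_003),
    "gross_revenue": (500_000.0, 493_500.0),
    "dau": (42_000, 42_100),
}

# Keep only KPIs whose absolute percent variance breaches the threshold.
flagged = {
    kpi: round((wh - src) / src * 100, 2)
    for kpi, (src, wh) in snapshots.items()
    if abs(wh - src) / src * 100 > THRESHOLD_PCT
}
print(flagged)  # only gross_revenue breaches the 1% threshold here
```

In a real job the snapshot dict would be loaded from the warehouse and the flagged KPIs routed to an alert channel; the comprehension itself is the whole decision rule.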
Learning path
- Start: Reconcile a single-day metric (orders or DAU).
- Next: Add refunds/returns and FX to handle financial parity.
- Then: Automate with scheduled checks and alerting on variance thresholds.
Mini challenge
Your finance team reports net revenue is 0.8% lower than your dashboard for the last closed day. Propose a 5-step plan to isolate the cause. Include the exact cuts you will run and the order you'll run them in.
Next steps
- Finish the exercises and save your SQL sketches.
- Take the quick test to confirm you can spot common traps.
- Apply this to one production KPI and ship a short reconciliation report.