luvv to helpDiscover the Best Free Online Tools
Topic 6 of 8

Identifying Data Sources And Gaps

Learn Identifying Data Sources And Gaps for free with explanations, exercises, and a quick test (for BI Developer).

Published: December 24, 2025 | Updated: December 24, 2025

Why this matters

As a BI Developer, you turn business questions into trusted dashboards. That only works if you can quickly find the right data, understand its limits, and call out what is missing. This skill helps you avoid stalled projects, wrong metrics, and last-minute surprises.

  • Real tasks: map KPIs to systems/tables, confirm fields and grain, check freshness, and document gaps with clear mitigation options.
  • Common scenarios: marketing ROI needing ad spend + revenue, churn needing lifecycle events, supply metrics needing inventory and sales.

Concept explained simply

Identifying data sources and gaps means: list all possible places your needed fields could come from, evaluate if they are fit for purpose, and write down anything that prevents you from building the metric today.

Mental model: GRAIN+Q

  • G = Grain: what one row represents (order, order line, session, event).
  • R = Range (history): how far back the data goes.
  • A = Availability: access, permissions, and where it lives (DW, SaaS, API).
  • I = Integration: how you join it (keys, IDs, mapping tables).
  • N = Naming/definitions: do fields match your metric definition.
  • +Q = Quality: freshness, completeness, nulls, duplicates.
What to capture for each candidate source
  • Owner/contact and system (e.g., Warehouse, ERP, CRM, Ads platform)
  • Table/view name and last refresh time
  • Grain (row meaning) and primary keys
  • Critical fields present/absent
  • History depth and retention policies
  • Freshness/cadence (real-time, hourly, daily)
  • Access/privacy constraints (PII, restricted scopes)
  • Known data quality notes (nulls, duplicates, outliers)

A repeatable 5-step method

  1. Clarify the metric question and decision. Write the exact definition, filters, and time window.
  2. Derive the data needs. List entities, fields, and required joins.
  3. Inventory candidate sources. For each, fill the GRAIN+Q checklist.
  4. Identify and categorize gaps. Missing field, grain mismatch, freshness/latency, coverage, history, definition misalignment, access/privacy.
  5. Recommend mitigations. Alternate sources, derive/compute, adjust definition, add tracking, use proxies, or accept latency. Document trade-offs.
Simple templates you can copy

Source inventory entry:

  • System/Table:
  • Owner:
  • Grain/PK:
  • Key fields present/missing:
  • History depth:
  • Freshness:
  • Access/privacy:
  • Quality notes:

Gap log entry:

  • Gap:
  • Type (missing field / grain / freshness / history / coverage / definition / access):
  • Impact:
  • Mitigation options:
  • Decision/next step:

Worked examples

Example 1: Weekly revenue by campaign

Requirement: Weekly net revenue by campaign for last 26 weeks, only paid orders, attributed to last click from web.

  • Data needs: order_id, net_revenue, paid status, order_date, campaign_id/utm, last-click logic, channel filters.
  • Candidate sources: DW fact_orders (paid flag, revenue), dim_campaigns, web_analytics_sessions (utm, last_click).
  • Gaps found:
    • Freshness: orders updated daily 02:00; stakeholders expect near real-time. Type: freshness.
    • Attribution: last-click field not materialized; needs logic. Type: definition/transform.
    • Coverage: direct traffic has no campaign. Type: coverage.
  • Mitigations: set expectation to daily at 09:00; implement deterministic last-click in model; display "Unattributed" bucket.

Example 2: Churn rate by monthly cohort

Requirement: Churn rate for subscription users by signup cohort, last 12 months.

  • Data needs: user_id, signup_date, cancel_date, reactivation, plan_status timeline.
  • Candidate sources: billing_events (subscription_started/canceled), users_dim (signup), support_tickets (cancellations via support).
  • Gaps found:
    • History: billing_events only retained 9 months. Type: history.
    • Integration: some cancellations only in support_tickets, no reliable user_id; email used. Type: integration.
  • Mitigations: pull archived billing data from cold storage to cover older months; create email-to-user_id mapping table and measure residual unmatched rate.

Example 3: Out-of-stock rate by store

Requirement: Daily OOS rate by store and product category.

  • Data needs: on_hand_qty by day, sales by day, store_id, product_category.
  • Candidate sources: inventory_snapshot_daily, sales_fact, product_dim.
  • Gaps found:
    • Grain mismatch: inventory at store-product-day; sales at order line with timestamps. Type: grain.
    • Quality: negative on_hand values on weekends. Type: quality.
  • Mitigations: aggregate sales to day and align timezone; cap negative inventory at zero and flag days with anomalies for exclusion.

Exercises

Complete these before checking solutions. Use the checklist below to stay consistent.

Exercise 1 — Source inventory and gap log: Monthly Active Users by country

Goal: Identify sources and gaps to report Monthly Active Users (MAU) by country for the last 12 months, web only, excluding known bots.

  1. Define MAU precisely.
  2. List candidate sources and fill GRAIN+Q for each.
  3. List gaps and propose mitigations.

Deliverable: a short source inventory and a gap log with decisions.

Exercise 2 — Source mapping: Gross margin weekly by product category

Goal: Identify sources and gaps to report gross margin by product category, weekly, last 52 weeks.

  1. Define gross margin in fields.
  2. List candidate sources and joins.
  3. Flag gaps and mitigation options.

Deliverable: a brief mapping with gaps and suggested path.

Exercise checklist

  • Exact metric definition with filters/time window.
  • Source inventory includes owner, grain, history, freshness, key fields.
  • Join keys identified; fallbacks if missing.
  • Gaps categorized and impact stated.
  • Mitigations are realistic with trade-offs.

Common mistakes and self-check

  • Mistake: Assuming field names equal definitions. Self-check: compare business definition with source documentation and sample values.
  • Mistake: Ignoring grain. Self-check: write row meaning for every table you use.
  • Mistake: Overlooking history limits. Self-check: query min/max dates and retention policies.
  • Mistake: Missing join key quality. Self-check: compute join coverage and duplicate rates.
  • Mistake: Promising real-time on daily sources. Self-check: write the refresh cadence and share it with stakeholders.

Mini challenge

Stakeholder asks: "Show average resolution time for high-priority tickets by customer tier, last quarter." In 5 minutes, jot down: metric definition, candidate sources, expected gaps, and one mitigation per gap. Keep it concise.

Who this is for

  • Aspiring and junior BI Developers who translate stakeholder needs into datasets and dashboards.
  • Data analysts moving toward BI/analytics engineering responsibilities.

Prerequisites

  • Comfort with SQL joins, aggregations, and date logic.
  • Basic understanding of data warehousing concepts (facts, dimensions, grain).
  • Familiarity with your companys data sources and access policies.

Learning path

  1. Requirement clarification and metric definitions.
  2. Identifying data sources and gaps (this lesson).
  3. Proposing mitigations and scoping transformations.
  4. Modeling datasets with tested joins and grain.
  5. Validating metrics and documenting assumptions.

Practical projects

  • Build a source inventory for one domain (e.g., marketing). Include grain, history, freshness, and owner for 81 key tables.
  • Create a gap log for two KPIs and propose concrete mitigations with timelines.
  • Publish a one-page data readiness summary to align stakeholder expectations.

Next steps

  • Apply the GRAIN+Q model to your current top KPI.
  • Finish the exercises and then take the Quick Test.
  • Schedule a brief review with a domain owner to confirm assumptions.

Progress and test

The Quick Test below is available to everyone. Only logged-in users have their progress saved.

Practice Exercises

2 exercises to complete

Instructions

Define Monthly Active Users (MAU) as unique users with at least one web session in a calendar month. Report MAU by country for the last 12 months, excluding known bots.

  1. Write the exact definition you will implement, including filters.
  2. List candidate sources and fill a brief GRAIN+Q entry for each.
  3. Document gaps (type, impact) and propose at least one mitigation per gap.
Expected Output
A concise source inventory (24 entries) and a gap log with clear mitigation options and trade-offs.

Identifying Data Sources And Gaps — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Identifying Data Sources And Gaps?

AI Assistant

Ask questions about this tool