Why this matters
The storage mode you pick shapes refresh schedules, source query costs, where security is enforced, and the speed users perceive, and it is hard to change once a report ships.
Mental model
Think of Import as pre-cooked meals (fast to serve, prep in batches) and Direct Query as a made-to-order kitchen (always fresh, but each dish takes time and uses kitchen resources). Composite models let you serve pre-cooked for most, and made-to-order for special requests.
Core decision factors
- Data volume and grain: Big wide tables can exceed memory or refresh windows (favor Direct Query or Aggregations).
- Freshness requirements: Minute-level freshness (favor Direct Query or Hybrid tables). Hourly/daily is fine (favor Import with incremental refresh).
- User concurrency and SLAs: Many users needing sub-second response (favor Import, or Aggregations + Direct Query fallback).
- Source system capacity/cost: Expensive warehouse queries or strict limits (favor Import to offload and cache).
- Security location: Complex row-level security in source (favor Direct Query or Live to keep rules enforced centrally).
- Transformation complexity: Heavy shaping needed (favor Import with ELT + incremental refresh; ensure query folding for performance).
- Governance/lineage: Central semantic models via live connections reduce duplicated modeling.
Tip: query folding check
When applying transformations for Import or Direct Query, prefer steps that fold to the source. Non-folding steps force full data pulls and push processing onto the gateway or mashup engine, slowing refreshes.
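To make the folding idea concrete, here is a toy Python sketch (not Power Query M; the function name and SQL shape are illustrative assumptions). A folded filter becomes part of the query sent to the source, so only matching rows cross the wire:

```python
def folded_filter_sql(table: str, filters: dict[str, str]) -> str:
    # A folded filter step is translated into the source query itself,
    # so only matching rows leave the warehouse. A non-folding step
    # would instead pull the whole table and filter it locally.
    where = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
    return f"SELECT * FROM {table} WHERE {where}"
```

The practical habit: check each transformation step's native query; if a step breaks folding, push that logic into the warehouse instead.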
Worked examples
Example 1 — Executive KPIs
Need: sub-second KPI cards and simple trend charts for last 3 years; refresh every 2 hours; 5k daily active users.
- Strategy: Import mode with incremental refresh (partitioned by date).
- Why: Fast in-memory aggregations, manageable refresh windows, offloads source.
- Extras: Pre-calc common measures; apply aggregations if the fact is very large.
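The incremental-refresh idea in this example can be sketched as partition selection: only partitions touching the recent window are reprocessed, while historical partitions stay untouched. This is a minimal Python illustration (the monthly partition key format and 7-day window are assumptions, not a Power BI API):

```python
from datetime import date, timedelta

def partitions_to_refresh(today: date, refresh_days: int = 7) -> list[str]:
    """Return monthly partition keys covering the last `refresh_days` days.

    Older partitions are left alone, so each scheduled refresh reprocesses
    only recent data instead of the full multi-year history.
    """
    months = set()
    for offset in range(refresh_days + 1):
        d = today - timedelta(days=offset)
        months.add(f"{d.year}-{d.month:02d}")
    return sorted(months)
```

For the 2-hour schedule above, only one or two monthly partitions ever need reprocessing per run, which keeps the refresh window short and predictable.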
Example 2 — Near real-time operations
Need: warehouse events within 2–5 minutes; detailed drilldowns by device and status; 300 concurrent users during shift changes.
- Strategy: Composite model: Import aggregated table at 15-min grain + Direct Query detailed events table.
- Why: Most visuals hit fast Import aggregates; deep drill hits Direct Query only when needed.
- Extras: Configure aggregation mappings so the engine routes to Import first; add simple filters to keep Direct Query result sets small.
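The routing behavior of aggregation mappings can be sketched as a coverage check: if every column a visual requests exists in the Import aggregate, the engine answers from memory; otherwise it falls through to the Direct Query detail table. A minimal Python illustration (the column names are hypothetical, and the real engine also matches on grain and aggregation function):

```python
# Columns available in the hypothetical Import aggregate at 15-min grain.
AGG_COLUMNS = {"event_15min", "device_type", "status", "event_count"}

def route_query(requested_columns: set[str]) -> str:
    """Answer from the Import aggregate when it covers the request;
    otherwise fall through to the Direct Query detail table."""
    if requested_columns <= AGG_COLUMNS:
        return "import_aggregate"
    return "direct_query_detail"
```

The design goal is to choose the aggregate's grain and columns so that the dashboard's common visuals all pass this coverage check, leaving Direct Query for genuine drilldowns only.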
Example 3 — Massive dataset with strict security
Need: 5B-row fact with customer-level RLS defined in the data warehouse; strict data residency; queries must respect source policies.
- Strategy: Direct Query (or a live connection) to a curated warehouse model; optional Import of non-sensitive dimensions.
- Why: Centralized RLS, governance, and no huge extracts; dimensions cached for speed.
- Extras: Use source-side aggregates and materialized views; limit high-cardinality columns in visuals.
Decision flow
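The decision factors above can be sketched as a rule-of-thumb routine. This is a simplification under assumed thresholds (the 60-minute and 1B-row cutoffs are illustrative, not official guidance); real decisions need the benchmarks below:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    freshness_minutes: int    # max acceptable data staleness
    fact_rows_millions: int   # approximate fact table size
    rls_in_source: bool       # row-level security enforced in the warehouse
    heavy_concurrency: bool   # many users needing sub-second response

def recommend_mode(w: Workload) -> str:
    """Rule-of-thumb routing over the core decision factors."""
    if w.rls_in_source:
        # Keep security enforcement central: query the source directly,
        # optionally importing non-sensitive dimensions (composite).
        return "direct_query_or_composite"
    if w.freshness_minutes < 60:
        # Minute-level freshness: Direct Query detail + Import aggregates.
        return "composite"
    if w.fact_rows_millions > 1000 and w.heavy_concurrency:
        # Very large fact under strict SLAs: aggregates over DQ detail.
        return "composite"
    # Hourly/daily freshness at moderate size: Import + incremental refresh.
    return "import_incremental"
```

For instance, the executive-KPI scenario in Example 1 (2-hour freshness, 120M-row fact, no source-side RLS) routes to Import with incremental refresh, matching the worked answer above.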
Benchmarks you can run
- Cold vs warm load time: open report pages twice and compare.
- Cross-filter time: measure slicer-to-visual response.
- Query count per interaction: confirm Direct Query isn’t issuing excessive queries.
- Concurrency test: simulate 20–50 users opening the same page (staggered) and record max latency.
- Refresh SLA (Import): measure full vs. incremental refresh duration; ensure window fits your schedule.
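The cold-vs-warm benchmark can be run with a small timing harness. This Python sketch times any callable that triggers a page load or query (how you drive the report, e.g. via an API or browser automation, is up to you; the harness itself is generic):

```python
import statistics
import time

def timed(fn, runs: int = 5) -> dict:
    """Time a page-load or query callable.

    The first run approximates a cold load; subsequent runs approximate
    warm (cached) loads. Returns both so you can compare them.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "cold_s": samples[0],
        "warm_median_s": statistics.median(samples[1:]),
    }
```

Record these numbers per page before and after any change (adding an aggregate, trimming visuals) so you compare like with like.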
Result interpretation tips
- If Import cold loads are slow, pre-warm via scheduled caching or landing pages.
- If Direct Query chokes on wide visuals, reduce columns, add source indexes, or use aggregate tables.
Exercises
Work through these scenarios and note your reasoning, then compare your answers with the sample solutions.
Exercise 1 — Choose the mode
Scenario: 120M-row sales fact, daily refresh; pages show last 24 months by month; need sub-1.5s loads; 800 daily users; warehouse costs surge under heavy interactive use.
- Decide: Import, Direct Query, or Composite?
- List top 3 reasons for your choice.
- Add one mitigation for your main risk.
Exercise 2 — Design a composite model
Scenario: Events table (2B rows) needs up-to-5-minute freshness for drilldowns; dashboards mostly show hourly counts.
- Propose an Import aggregate table (grain, columns).
- Define Direct Query detail table, filters to keep queries small.
- Explain how your aggregation mapping will route most visuals to Import.
Self-check checklist
- [ ] You justified freshness vs. speed trade-offs.
- [ ] You proposed incremental refresh for Import where applicable.
- [ ] You reduced Direct Query cardinality via filters, parameters, or simpler visuals.
- [ ] You considered source-side indexes/clustered tables or materialized views.
- [ ] You planned for concurrency (pre-aggregations, caching, smaller visuals).
Common mistakes and fixes
- Mistake: Using Import for huge, highly volatile facts without incremental refresh. Fix: Partition by date and refresh only recent partitions.
- Mistake: Direct Query on complex visuals with many columns. Fix: Reduce columns, add source-side aggregates, limit visuals per page.
- Mistake: Ignoring RLS placement. Fix: If rules must live in the warehouse, avoid extracting sensitive facts; consider Composite.
- Mistake: Skipping concurrent-user tests. Fix: Test with realistic page opens, not just single-user timings.
- Mistake: Non-folding transformations. Fix: Use transformations that fold; push heavy logic into the warehouse.
Quick self-audit
- Do you know the required freshness window?
- Have you sized memory/refresh time for Import?
- Do your Direct Query pages avoid high-cardinality fields?
- Is there an aggregation strategy to catch 80–90% of queries?
Practical projects
- Project A: Convert an Import-only report to Composite with a daily Import aggregate and Direct Query detail. Measure latency delta.
- Project B: Implement incremental refresh on a 200M-row fact. Document refresh time before vs. after.
- Project C: Create an aggregation table and mapping that answers at least three existing visuals without hitting the detail table.
Who this is for
- BI Developers and Analytics Engineers deciding data access patterns for dashboards
- Data Modelers optimizing semantic models for speed and governance
Prerequisites
- Basic data modeling (facts, dimensions, star schemas)
- Understanding of refresh, partitions, and query performance basics
- Familiarity with your data warehouse behavior under load
Learning path
- Start: Import vs Direct Query fundamentals (this page)
- Then: Incremental refresh and partitioning
- Then: Aggregations and composite models
- Then: RLS design and evaluation
- Finally: Concurrency testing and capacity planning
Next steps
- Run one benchmark on an existing report (cold vs warm load).
- Identify one page to simplify visuals for Direct Query.
- Add an Import aggregate table for your top slow page.
Mini challenge
Pick a slow Direct Query page. Create a single Import aggregate that answers at least two visuals on that page. Measure before/after response times and write a 3-bullet summary of results.