Why this matters
The storage mode you pick shapes refresh schedules, source query costs, where security is enforced, and the speed users perceive, and it is hard to change once a report ships.
Mental model
Think of Import as pre-cooked meals (fast to serve, prep in batches) and Direct Query as a made-to-order kitchen (always fresh, but each dish takes time and uses kitchen resources). Composite models let you serve pre-cooked for most, and made-to-order for special requests.
Core decision factors
- Data volume and grain: Big wide tables can exceed memory or refresh windows (favor Direct Query or Aggregations).
- Freshness requirements: Minute-level freshness (favor Direct Query or Hybrid tables). Hourly/daily is fine (favor Import with incremental refresh).
- User concurrency and SLAs: Many users needing sub-second response (favor Import, or Aggregations + Direct Query fallback).
- Source system capacity/cost: Expensive warehouse queries or strict limits (favor Import to offload and cache).
- Security location: Complex row-level security in source (favor Direct Query or Live to keep rules enforced centrally).
- Transformation complexity: Heavy shaping needed (favor Import with ELT + incremental refresh; ensure query folding for performance).
- Governance/lineage: Central semantic models via live connections reduce duplicated modeling.
Tip: query folding check
When applying transformations for Import or Direct Query, prefer steps that fold to the source. Non-folding steps force full data pulls and push processing onto the gateway or mashup engine, slowing refreshes.
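To make the folding idea concrete, here is a toy Python sketch (not Power Query M; the function name and SQL shape are illustrative assumptions). A folded filter becomes part of the query sent to the source, so only matching rows cross the wire:

```python
def folded_filter_sql(table: str, filters: dict[str, str]) -> str:
    # A folded filter step is translated into the source query itself,
    # so only matching rows leave the warehouse. A non-folding step
    # would instead pull the whole table and filter it locally.
    where = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
    return f"SELECT * FROM {table} WHERE {where}"
```

The practical habit: check each transformation step's native query; if a step breaks folding, push that logic into the warehouse instead.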
Worked examples
Example 1 — Executive KPIs
Need: sub-second KPI cards and simple trend charts for last 3 years; refresh every 2 hours; 5k daily active users.
- Strategy: Import mode with incremental refresh (partitioned by date).
- Why: Fast in-memory aggregations, manageable refresh windows, offloads source.
- Extras: Pre-calc common measures; apply aggregations if the fact is very large.
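The incremental-refresh idea in this example can be sketched as partition selection: only partitions touching the recent window are reprocessed, while historical partitions stay untouched. This is a minimal Python illustration (the monthly partition key format and 7-day window are assumptions, not a Power BI API):

```python
from datetime import date, timedelta

def partitions_to_refresh(today: date, refresh_days: int = 7) -> list[str]:
    """Return monthly partition keys covering the last `refresh_days` days.

    Older partitions are left alone, so each scheduled refresh reprocesses
    only recent data instead of the full multi-year history.
    """
    months = set()
    for offset in range(refresh_days + 1):
        d = today - timedelta(days=offset)
        months.add(f"{d.year}-{d.month:02d}")
    return sorted(months)
```

For the 2-hour schedule above, only one or two monthly partitions ever need reprocessing per run, which keeps the refresh window short and predictable.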
Example 2 — Near real-time operations
Need: warehouse events within 2–5 minutes; detailed drilldowns by device and status; 300 concurrent users during shift changes.
- Strategy: Composite model: Import aggregated table at 15-min grain + Direct Query detailed events table.
- Why: Most visuals hit fast Import aggregates; deep drill hits Direct Query only when needed.
- Extras: Configure aggregation mappings so the engine routes to Import first; add simple filters to keep Direct Query result sets small.
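The routing behavior of aggregation mappings can be sketched as a coverage check: if every column a visual requests exists in the Import aggregate, the engine answers from memory; otherwise it falls through to the Direct Query detail table. A minimal Python illustration (the column names are hypothetical, and the real engine also matches on grain and aggregation function):

```python
# Columns available in the hypothetical Import aggregate at 15-min grain.
AGG_COLUMNS = {"event_15min", "device_type", "status", "event_count"}

def route_query(requested_columns: set[str]) -> str:
    """Answer from the Import aggregate when it covers the request;
    otherwise fall through to the Direct Query detail table."""
    if requested_columns <= AGG_COLUMNS:
        return "import_aggregate"
    return "direct_query_detail"
```

The design goal is to choose the aggregate's grain and columns so that the dashboard's common visuals all pass this coverage check, leaving Direct Query for genuine drilldowns only.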
Example 3 — Massive dataset with strict security
Need: 5B-row fact with customer-level RLS defined in the data warehouse; strict data residency; queries must respect source policies.
- Strategy: Direct Query (or a live connection) to a curated warehouse model; optional Import of non-sensitive dimensions.
- Why: Centralized RLS, governance, and no huge extracts; dimensions cached for speed.
- Extras: Use source-side aggregates and materialized views; limit high-cardinality columns in visuals.
Decision flow
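The decision factors above can be sketched as a rule-of-thumb routine. This is a simplification under assumed thresholds (the 60-minute and 1B-row cutoffs are illustrative, not official guidance); real decisions need the benchmarks below:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    freshness_minutes: int    # max acceptable data staleness
    fact_rows_millions: int   # approximate fact table size
    rls_in_source: bool       # row-level security enforced in the warehouse
    heavy_concurrency: bool   # many users needing sub-second response

def recommend_mode(w: Workload) -> str:
    """Rule-of-thumb routing over the core decision factors."""
    if w.rls_in_source:
        # Keep security enforcement central: query the source directly,
        # optionally importing non-sensitive dimensions (composite).
        return "direct_query_or_composite"
    if w.freshness_minutes < 60:
        # Minute-level freshness: Direct Query detail + Import aggregates.
        return "composite"
    if w.fact_rows_millions > 1000 and w.heavy_concurrency:
        # Very large fact under strict SLAs: aggregates over DQ detail.
        return "composite"
    # Hourly/daily freshness at moderate size: Import + incremental refresh.
    return "import_incremental"
```

For instance, the executive-KPI scenario in Example 1 (2-hour freshness, 120M-row fact, no source-side RLS) routes to Import with incremental refresh, matching the worked answer above.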
Benchmarks you can run
- Cold vs warm load time: open report pages twice and compare.
- Cross-filter time: measure slicer-to-visual response.
- Query count per interaction: confirm Direct Query isn’t issuing excessive queries.
- Concurrency test: simulate 20–50 users opening the same page (staggered) and record max latency.
- Refresh SLA (Import): measure full vs. incremental refresh duration; ensure window fits your schedule.
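The cold-vs-warm benchmark can be run with a small timing harness. This Python sketch times any callable that triggers a page load or query (how you drive the report, e.g. via an API or browser automation, is up to you; the harness itself is generic):

```python
import statistics
import time

def timed(fn, runs: int = 5) -> dict:
    """Time a page-load or query callable.

    The first run approximates a cold load; subsequent runs approximate
    warm (cached) loads. Returns both so you can compare them.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "cold_s": samples[0],
        "warm_median_s": statistics.median(samples[1:]),
    }
```

Record these numbers per page before and after any change (adding an aggregate, trimming visuals) so you compare like with like.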
Result interpretation tips
- If Import cold loads are slow, pre-warm via scheduled caching or landing pages.
- If Direct Query chokes on wide visuals, reduce columns, add source indexes, or use aggregate tables.
Exercises
Work through these scenarios and note your reasoning, then compare your answers with the sample solutions.
Exercise 1 — Choose the mode
Scenario: 120M-row sales fact, daily refresh; pages show last 24 months by month; need sub-1.5s loads; 800 daily users; warehouse costs surge under heavy interactive use.
- Decide: Import, Direct Query, or Composite?
- List top 3 reasons for your choice.
- Add one mitigation for your main risk.
Exercise 2 — Design a composite model
Scenario: Events table (2B rows) needs up-to-5-minute freshness for drilldowns; dashboards mostly show hourly counts.
- Propose an Import aggregate table (grain, columns).
- Define Direct Query detail table, filters to keep queries small.
- Explain how your aggregation mapping will route most visuals to Import.
Self-check checklist
- [ ] You justified freshness vs. speed trade-offs.
- [ ] You proposed incremental refresh for Import where applicable.
- [ ] You reduced Direct Query cardinality via filters, parameters, or simpler visuals.
- [ ] You considered source-side indexes/clustered tables or materialized views.
- [ ] You planned for concurrency (pre-aggregations, caching, smaller visuals).
Common mistakes and fixes
- Mistake: Using Import for huge, highly volatile facts without incremental refresh. Fix: Partition by date and refresh only recent partitions.
- Mistake: Direct Query on complex visuals with many columns. Fix: Reduce columns, add source-side aggregates, limit visuals per page.
- Mistake: Ignoring RLS placement. Fix: If rules must live in the warehouse, avoid extracting sensitive facts; consider Composite.
- Mistake: Skipping concurrent-user tests. Fix: Test with realistic page opens, not just single-user timings.
- Mistake: Non-folding transformations. Fix: Use transformations that fold; push heavy logic into the warehouse.
Quick self-audit
- Do you know the required freshness window?
- Have you sized memory/refresh time for Import?
- Do your Direct Query pages avoid high-cardinality fields?
- Is there an aggregation strategy to catch 80–90% of queries?
Practical projects
- Project A: Convert an Import-only report to Composite with a daily Import aggregate and Direct Query detail. Measure latency delta.
- Project B: Implement incremental refresh on a 200M-row fact. Document refresh time before vs. after.
- Project C: Create an aggregation table and mapping that answers at least three existing visuals without hitting the detail table.
Who this is for
- BI Developers and Analytics Engineers deciding data access patterns for dashboards
- Data Modelers optimizing semantic models for speed and governance
Prerequisites
- Basic data modeling (facts, dimensions, star schemas)
- Understanding of refresh, partitions, and query performance basics
- Familiarity with your data warehouse behavior under load
Learning path
- Start: Import vs Direct Query fundamentals (this page)
- Then: Incremental refresh and partitioning
- Then: Aggregations and composite models
- Then: RLS design and evaluation
- Finally: Concurrency testing and capacity planning
Next steps
- Run one benchmark on an existing report (cold vs warm load).
- Identify one page to simplify visuals for Direct Query.
- Add an Import aggregate table for your top slow page.
Mini challenge
Pick a slow Direct Query page. Create a single Import aggregate that answers at least two visuals on that page. Measure before/after response times and write a 3-bullet summary of results.