5 min read
Direct Lake Mode: Power BI Performance Without Import Overhead
Direct Lake is a revolutionary connection mode in Power BI that combines the performance of Import mode with the freshness of DirectQuery. Today, I will explain how Direct Lake works and when to use it.
Understanding Connection Modes
┌─────────────────────────────────────────────────────┐
│ Power BI Connection Modes │
├─────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Import Mode ││
│ │ Data copied into Power BI memory ││
│ │ + Fastest query performance ││
│ │ - Data refresh required ││
│ │ - Memory limits ││
│ └─────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ DirectQuery ││
│ │ Queries sent to source database ││
│ │ + Always fresh data ││
│ │ - Slower query performance ││
│ │ - Source must be available ││
│ └─────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Direct Lake (NEW) ││
│ │ Reads directly from Delta/Parquet files ││
│ │ + Import-like performance ││
│ │ + Near real-time freshness ││
│ │ + No scheduled refresh needed ││
│ │ - Fabric/OneLake only ││
│ └─────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────┘
How Direct Lake Works
┌─────────────────────────────────────────────────────┐
│ Direct Lake Architecture │
├─────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Power BI Report ││
│ │ (DAX Query) ││
│ └─────────────────────┬───────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Analysis Services Engine ││
│ │ (VertiPaq) - In-memory columnstore ││
│ │ ││
│ │ [Transcodes Parquet to VertiPaq on demand] ││
│ └─────────────────────┬───────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐│
│ │ OneLake ││
│ │ Delta Tables (Parquet files) ││
│ │ ││
│ │ [No data copy - reads directly] ││
│ └─────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────┘
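Because transcoding happens on demand, the first query against a column pays the cost of paging it into memory, and later queries are served from the in-memory copy. If you want to observe this from a Fabric notebook, here is a rough sketch using Semantic Link (sempy); the model name "Sales Analytics" is an assumption, replace it with your own:
import time
import sempy.fabric as fabric

DATASET = "Sales Analytics"  # assumed model name
query = 'EVALUATE ROW("TotalSales", SUM(Sales[Amount]))'

start = time.time()
fabric.evaluate_dax(dataset=DATASET, dax_string=query)  # cold: column data is paged/transcoded in
print(f"cold: {time.time() - start:.2f}s")

start = time.time()
fabric.evaluate_dax(dataset=DATASET, dax_string=query)  # warm: served from VertiPaq memory
print(f"warm: {time.time() - start:.2f}s")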
Creating Direct Lake Semantic Models
From Lakehouse
# Step 1: Prepare data in the Lakehouse
# Create gold-layer tables optimized for analytics
sales_df = spark.read.table("silver_sales")
customers_df = spark.read.table("silver_customers")

# Write to the gold layer as plain Delta tables
# (stick to standard Delta features; options like column mapping
# can prevent Direct Lake from reading the table)
sales_df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("gold_fact_sales")

customers_df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("gold_dim_customer")

# Step 2: Create the semantic model in the Fabric UI
# Lakehouse > SQL analytics endpoint > New semantic model
# Select the tables to include in the model
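Before creating the model, it is worth sanity-checking that the gold tables landed as expected. A quick check from the same notebook:
# Verify the gold tables before adding them to a semantic model
for table_name in ["gold_fact_sales", "gold_dim_customer"]:
    df = spark.table(table_name)
    print(table_name, df.count(), "rows")
    df.printSchema()  # watch for complex types (arrays, structs, maps)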
Semantic Model Configuration
A simplified, TMSL-style definition of the model looks like this (a full definition also declares the shared expression that points at the lakehouse's SQL analytics endpoint):
{
  "model": {
    "name": "Sales Analytics",
    "defaultMode": "directLake",
    "tables": [
      {
        "name": "Sales",
        "columns": [
          {"name": "SalesKey", "dataType": "int64"},
          {"name": "CustomerKey", "dataType": "int64"},
          {"name": "ProductKey", "dataType": "int64"},
          {"name": "SaleDate", "dataType": "dateTime"},
          {"name": "Quantity", "dataType": "int64"},
          {"name": "Amount", "dataType": "decimal"}
        ],
        "partitions": [
          {
            "name": "DirectLakePartition",
            "mode": "directLake",
            "source": {
              "type": "entity",
              "entityName": "gold_fact_sales"
            }
          }
        ]
      },
      {
        "name": "Customer",
        "columns": [
          {"name": "CustomerKey", "dataType": "int64"},
          {"name": "CustomerName", "dataType": "string"},
          {"name": "Segment", "dataType": "string"},
          {"name": "Country", "dataType": "string"}
        ],
        "partitions": [
          {
            "name": "DirectLakePartition",
            "mode": "directLake",
            "source": {
              "type": "entity",
              "entityName": "gold_dim_customer"
            }
          }
        ]
      }
    ],
    "relationships": [
      {
        "name": "Sales_Customer",
        "fromTable": "Sales",
        "fromColumn": "CustomerKey",
        "toTable": "Customer",
        "toColumn": "CustomerKey"
      }
    ]
  }
}
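Once the model exists, you can verify its tables and relationships from a notebook with Semantic Link; as before, the model name is an assumption:
import sempy.fabric as fabric

print(fabric.list_tables(dataset="Sales Analytics"))
print(fabric.list_relationships(dataset="Sales Analytics"))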
DAX Measures
DAX works the same way with Direct Lake as it does in Import mode:
// Measures work identically to Import mode
Total Sales = SUM(Sales[Amount])
Total Quantity = SUM(Sales[Quantity])
Average Order Value = DIVIDE([Total Sales], DISTINCTCOUNT(Sales[SalesKey]))
YTD Sales = TOTALYTD([Total Sales], 'Date'[Date])
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date]))
YoY Growth = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
// Time intelligence
Sales MTD = TOTALMTD([Total Sales], 'Date'[Date])
Sales QTD = TOTALQTD([Total Sales], 'Date'[Date])
Rolling 12M Sales =
CALCULATE(
[Total Sales],
DATESINPERIOD('Date'[Date], MAX('Date'[Date]), -12, MONTH)
)
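The time-intelligence measures above assume a 'Date' dimension table. Since Direct Lake models read tables from the lakehouse, one way to provide it is to generate the dimension in Spark and add it to the model as a marked date table. The table name, column names, and date range below are illustrative:
# Generate a simple date dimension in the lakehouse for time intelligence
from pyspark.sql import functions as F

dates = (
    spark.sql("SELECT explode(sequence(to_date('2020-01-01'), to_date('2030-12-31'))) AS Date")
    .withColumn("Year", F.year("Date"))
    .withColumn("Month", F.month("Date"))
    .withColumn("MonthName", F.date_format("Date", "MMMM"))
)
dates.write.format("delta").mode("overwrite").saveAsTable("gold_dim_date")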
Framing and Fallback
Direct Lake uses “framing” to handle data updates:
# Framing: a point-in-time snapshot of the Delta table
# When data changes in the Lakehouse:
# 1. Power BI detects the new Delta table version
# 2. A refresh "reframes" the model to point at that version
#    (metadata only - no data is copied)
# 3. Subsequent queries use the new frame

# Fallback to DirectQuery
# If Direct Lake can't serve a query, it transparently falls back to DirectQuery
# Typical reasons for fallback:
fallback_reasons = [
    "Model is built on SQL analytics endpoint views",
    "Row counts or memory needs exceed the capacity's guardrails",
    "Table features or data types Direct Lake doesn't support",
]
# Monitor fallback in the semantic model settings; setting DirectLakeBehavior
# to DirectLakeOnly disables fallback so problems surface immediately
# Reduce fallback by keeping the data model within the guardrails
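Each frame pins the model to one Delta table version, so the table history shows exactly what a frame can point to:
# Each write creates a new Delta version; reframing moves the model's
# snapshot pointer to the latest version without copying data
spark.sql("DESCRIBE HISTORY gold_fact_sales") \
    .select("version", "timestamp", "operation") \
    .show(5)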
Optimization for Direct Lake
Table Design
-- Optimize table structure for Direct Lake
-- Use appropriate data types
CREATE TABLE gold_fact_sales (
sales_key BIGINT,
customer_key INT, -- Use INT for keys (smaller)
product_key INT,
date_key INT,
quantity INT,
unit_price DECIMAL(10,2), -- Appropriate precision
amount DECIMAL(15,2),
discount_pct DECIMAL(5,4)
)
USING DELTA;
-- Avoid:
-- - Large string columns
-- - Complex nested types
-- - Too many columns
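A crude but useful way to spot these patterns in an existing table is to scan its schema from a notebook:
# Flag schema patterns that hurt Direct Lake: complex types and wide strings
from pyspark.sql.types import ArrayType, MapType, StringType, StructType

schema = spark.table("gold_fact_sales").schema
print(f"{len(schema.fields)} columns")
for field in schema.fields:
    if isinstance(field.dataType, (ArrayType, MapType, StructType)):
        print(f"complex type, consider flattening: {field.name}")
    elif isinstance(field.dataType, StringType):
        print(f"string column, check length/cardinality: {field.name}")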
V-Order Optimization
# V-Order applies extra Parquet optimizations for Power BI reads
# Enable V-Order when writing tables
df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("parquet.vorder.enabled", "true") \
    .saveAsTable("optimized_sales")

# Or rewrite an existing table with V-Order applied
spark.sql("OPTIMIZE gold_fact_sales VORDER")
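V-Order can also be switched on for the whole Spark session so every write picks it up. The config name has shifted across Fabric runtimes (older runtimes used spark.sql.parquet.vorder.enabled, newer ones spark.sql.parquet.vorder.default), so treat this as indicative and check your runtime's docs:
# Enable V-Order for all writes in this session (config name varies by runtime)
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")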
Partitioning Strategy
# Partition large tables appropriately
# Direct Lake handles partitions efficiently
df.write \
.format("delta") \
.mode("overwrite") \
.partitionBy("year", "month") \
.saveAsTable("sales_partitioned")
# For Direct Lake:
# - Partition by low-cardinality columns
# - Avoid too many partitions
# - Year/Month is often good for time-series
When to Use Direct Lake
direct_lake_scenarios = {
"ideal_for": [
"Fabric Lakehouses and Warehouses",
"Large datasets (billions of rows)",
"Near real-time freshness requirements",
"Scenarios where import refresh is too slow"
],
"not_suitable_for": [
"External data sources (use Import or DirectQuery)",
"Complex row-level security scenarios",
"Highly complex DAX requiring DirectQuery fallback",
"Very small datasets (Import may be simpler)"
],
"hybrid_scenarios": [
"Mix Direct Lake with Import for reference tables",
"Use composite models when needed"
]
}
Monitoring Performance
# Check Direct Lake performance in Fabric
# Capacity Metrics app shows:
# - Query duration
# - Memory usage
# - Fallback frequency
# DAX Studio queries for analysis
# EVALUATE
# VAR _tables = INFO.TABLES()
# RETURN _tables
# Monitor in Power BI
# Performance analyzer shows query times
# Check for DirectQuery fallback warnings
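You can also run that INFO query from a Fabric notebook with Semantic Link instead of DAX Studio:
import sempy.fabric as fabric

tables = fabric.evaluate_dax(
    dataset="Sales Analytics",  # assumed model name
    dax_string="EVALUATE INFO.TABLES()",
)
print(tables.head())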
Direct Lake is a game-changer for Power BI performance on Fabric data. Tomorrow, I will cover Semantic Models in more detail.