5 min read
Direct Lake Mode: Power BI Performance Without Import Overhead
Direct Lake is a revolutionary connection mode in Power BI that combines the performance of Import mode with the freshness of DirectQuery. Today, I will explain how Direct Lake works and when to use it.
Understanding Connection Modes
┌─────────────────────────────────────────────────────┐
│ Power BI Connection Modes │
├─────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Import Mode ││
│ │ Data copied into Power BI memory ││
│ │ + Fastest query performance ││
│ │ - Data refresh required ││
│ │ - Memory limits ││
│ └─────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ DirectQuery ││
│ │ Queries sent to source database ││
│ │ + Always fresh data ││
│ │ - Slower query performance ││
│ │ - Source must be available ││
│ └─────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Direct Lake (NEW) ││
│ │ Reads directly from Delta/Parquet files ││
│ │ + Import-like performance ││
│ │ + Near real-time freshness ││
│ │ + No scheduled refresh needed ││
│ │ - Fabric/OneLake only ││
│ └─────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────┘
How Direct Lake Works
┌─────────────────────────────────────────────────────┐
│ Direct Lake Architecture │
├─────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Power BI Report ││
│ │ (DAX Query) ││
│ └─────────────────────┬───────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐│
│ │ Analysis Services Engine ││
│ │ (VertiPaq) - In-memory columnstore ││
│ │ ││
│ │ [Transcodes Parquet to VertiPaq on demand] ││
│ └─────────────────────┬───────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐│
│ │ OneLake ││
│ │ Delta Tables (Parquet files) ││
│ │ ││
│ │ [No data copy - reads directly] ││
│ └─────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────┘
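Because transcoding happens on demand, the first query against a column pays the cost of paging it into memory, and later queries are served from the in-memory copy. If you want to observe this from a Fabric notebook, here is a rough sketch using Semantic Link (sempy); the model name "Sales Analytics" is an assumption, replace it with your own:
import time
import sempy.fabric as fabric

DATASET = "Sales Analytics"  # assumed model name
query = 'EVALUATE ROW("TotalSales", SUM(Sales[Amount]))'

start = time.time()
fabric.evaluate_dax(dataset=DATASET, dax_string=query)  # cold: column data is paged/transcoded in
print(f"cold: {time.time() - start:.2f}s")

start = time.time()
fabric.evaluate_dax(dataset=DATASET, dax_string=query)  # warm: served from VertiPaq memory
print(f"warm: {time.time() - start:.2f}s")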
Creating Direct Lake Semantic Models
From Lakehouse
# Step 1: Prepare data in the Lakehouse
# Create gold-layer tables optimized for analytics
sales_df = spark.read.table("silver_sales")
customers_df = spark.read.table("silver_customers")

# Write to the gold layer as plain Delta tables
# (stick to standard Delta features; options like column mapping
# can prevent Direct Lake from reading the table)
sales_df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("gold_fact_sales")

customers_df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("gold_dim_customer")

# Step 2: Create the semantic model in the Fabric UI
# Lakehouse > SQL analytics endpoint > New semantic model
# Select the tables to include in the model
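Before creating the model, it is worth sanity-checking that the gold tables landed as expected. A quick check from the same notebook:
# Verify the gold tables before adding them to a semantic model
for table_name in ["gold_fact_sales", "gold_dim_customer"]:
    df = spark.table(table_name)
    print(table_name, df.count(), "rows")
    df.printSchema()  # watch for complex types (arrays, structs, maps)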
Semantic Model Configuration
A simplified, TMSL-style definition of the model looks like this (a full definition also declares the shared expression that points at the lakehouse's SQL analytics endpoint):
{
  "model": {
    "name": "Sales Analytics",
    "defaultMode": "directLake",
    "tables": [
      {
        "name": "Sales",
        "columns": [
          {"name": "SalesKey", "dataType": "int64"},
          {"name": "CustomerKey", "dataType": "int64"},
          {"name": "ProductKey", "dataType": "int64"},
          {"name": "SaleDate", "dataType": "dateTime"},
          {"name": "Quantity", "dataType": "int64"},
          {"name": "Amount", "dataType": "decimal"}
        ],
        "partitions": [
          {
            "name": "DirectLakePartition",
            "mode": "directLake",
            "source": {
              "type": "entity",
              "entityName": "gold_fact_sales"
            }
          }
        ]
      },
      {
        "name": "Customer",
        "columns": [
          {"name": "CustomerKey", "dataType": "int64"},
          {"name": "CustomerName", "dataType": "string"},
          {"name": "Segment", "dataType": "string"},
          {"name": "Country", "dataType": "string"}
        ],
        "partitions": [
          {
            "name": "DirectLakePartition",
            "mode": "directLake",
            "source": {
              "type": "entity",
              "entityName": "gold_dim_customer"
            }
          }
        ]
      }
    ],
    "relationships": [
      {
        "name": "Sales_Customer",
        "fromTable": "Sales",
        "fromColumn": "CustomerKey",
        "toTable": "Customer",
        "toColumn": "CustomerKey"
      }
    ]
  }
}
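Once the model exists, you can verify its tables and relationships from a notebook with Semantic Link; as before, the model name is an assumption:
import sempy.fabric as fabric

print(fabric.list_tables(dataset="Sales Analytics"))
print(fabric.list_relationships(dataset="Sales Analytics"))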
DAX Measures
DAX works the same way with Direct Lake as it does in Import mode:
// Measures work identically to Import mode
Total Sales = SUM(Sales[Amount])
Total Quantity = SUM(Sales[Quantity])
Average Order Value = DIVIDE([Total Sales], DISTINCTCOUNT(Sales[SalesKey]))
YTD Sales = TOTALYTD([Total Sales], 'Date'[Date])
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date]))
YoY Growth = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
// Time intelligence
Sales MTD = TOTALMTD([Total Sales], 'Date'[Date])
Sales QTD = TOTALQTD([Total Sales], 'Date'[Date])
Rolling 12M Sales =
CALCULATE(
[Total Sales],
DATESINPERIOD('Date'[Date], MAX('Date'[Date]), -12, MONTH)
)
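The time-intelligence measures above assume a 'Date' dimension table. Since Direct Lake models read tables from the lakehouse, one way to provide it is to generate the dimension in Spark and add it to the model as a marked date table. The table name, column names, and date range below are illustrative:
# Generate a simple date dimension in the lakehouse for time intelligence
from pyspark.sql import functions as F

dates = (
    spark.sql("SELECT explode(sequence(to_date('2020-01-01'), to_date('2030-12-31'))) AS Date")
    .withColumn("Year", F.year("Date"))
    .withColumn("Month", F.month("Date"))
    .withColumn("MonthName", F.date_format("Date", "MMMM"))
)
dates.write.format("delta").mode("overwrite").saveAsTable("gold_dim_date")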
Framing and Fallback
Direct Lake uses “framing” to handle data updates:
# Framing: a point-in-time snapshot of the Delta table
# When data changes in the Lakehouse:
# 1. Power BI detects the new Delta table version
# 2. A refresh "reframes" the model to point at that version
#    (metadata only - no data is copied)
# 3. Subsequent queries use the new frame

# Fallback to DirectQuery
# If Direct Lake can't serve a query, it transparently falls back to DirectQuery
# Typical reasons for fallback:
fallback_reasons = [
    "Model is built on SQL analytics endpoint views",
    "Row counts or memory needs exceed the capacity's guardrails",
    "Table features or data types Direct Lake doesn't support",
]
# Monitor fallback in the semantic model settings; setting DirectLakeBehavior
# to DirectLakeOnly disables fallback so problems surface immediately
# Reduce fallback by keeping the data model within the guardrails
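Each frame pins the model to one Delta table version, so the table history shows exactly what a frame can point to:
# Each write creates a new Delta version; reframing moves the model's
# snapshot pointer to the latest version without copying data
spark.sql("DESCRIBE HISTORY gold_fact_sales") \
    .select("version", "timestamp", "operation") \
    .show(5)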
Optimization for Direct Lake
Table Design
-- Optimize table structure for Direct Lake
-- Use appropriate data types
CREATE TABLE gold_fact_sales (
sales_key BIGINT,
customer_key INT, -- Use INT for keys (smaller)
product_key INT,
date_key INT,
quantity INT,
unit_price DECIMAL(10,2), -- Appropriate precision
amount DECIMAL(15,2),
discount_pct DECIMAL(5,4)
)
USING DELTA;
-- Avoid:
-- - Large string columns
-- - Complex nested types
-- - Too many columns
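A crude but useful way to spot these patterns in an existing table is to scan its schema from a notebook:
# Flag schema patterns that hurt Direct Lake: complex types and wide strings
from pyspark.sql.types import ArrayType, MapType, StringType, StructType

schema = spark.table("gold_fact_sales").schema
print(f"{len(schema.fields)} columns")
for field in schema.fields:
    if isinstance(field.dataType, (ArrayType, MapType, StructType)):
        print(f"complex type, consider flattening: {field.name}")
    elif isinstance(field.dataType, StringType):
        print(f"string column, check length/cardinality: {field.name}")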
V-Order Optimization
# V-Order applies extra Parquet optimizations for Power BI reads
# Enable V-Order when writing tables
df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("parquet.vorder.enabled", "true") \
    .saveAsTable("optimized_sales")

# Or rewrite an existing table with V-Order applied
spark.sql("OPTIMIZE gold_fact_sales VORDER")
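V-Order can also be switched on for the whole Spark session so every write picks it up. The config name has shifted across Fabric runtimes (older runtimes used spark.sql.parquet.vorder.enabled, newer ones spark.sql.parquet.vorder.default), so treat this as indicative and check your runtime's docs:
# Enable V-Order for all writes in this session (config name varies by runtime)
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")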
Partitioning Strategy
# Partition large tables appropriately
# Direct Lake handles partitions efficiently
df.write \
.format("delta") \
.mode("overwrite") \
.partitionBy("year", "month") \
.saveAsTable("sales_partitioned")
# For Direct Lake:
# - Partition by low-cardinality columns
# - Avoid too many partitions
# - Year/Month is often good for time-series
When to Use Direct Lake
direct_lake_scenarios = {
"ideal_for": [
"Fabric Lakehouses and Warehouses",
"Large datasets (billions of rows)",
"Near real-time freshness requirements",
"Scenarios where import refresh is too slow"
],
"not_suitable_for": [
"External data sources (use Import or DirectQuery)",
"Complex row-level security scenarios",
"Highly complex DAX requiring DirectQuery fallback",
"Very small datasets (Import may be simpler)"
],
"hybrid_scenarios": [
"Mix Direct Lake with Import for reference tables",
"Use composite models when needed"
]
}
Monitoring Performance
# Check Direct Lake performance in Fabric
# Capacity Metrics app shows:
# - Query duration
# - Memory usage
# - Fallback frequency
# DAX Studio queries for analysis
# EVALUATE
# VAR _tables = INFO.TABLES()
# RETURN _tables
# Monitor in Power BI
# Performance analyzer shows query times
# Check for DirectQuery fallback warnings
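You can also run that INFO query from a Fabric notebook with Semantic Link instead of DAX Studio:
import sempy.fabric as fabric

tables = fabric.evaluate_dax(
    dataset="Sales Analytics",  # assumed model name
    dax_string="EVALUATE INFO.TABLES()",
)
print(tables.head())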
Direct Lake is a game-changer for Power BI performance on Fabric data. Tomorrow, I will cover Semantic Models in more detail.