Microsoft Fabric January 2024 Updates: What's New and What It Means
Microsoft Fabric has continued to evolve rapidly since reaching general availability in November 2023, and January 2024 brings significant updates across the platform. Here's what's new and how to take advantage of these features.
Key Updates Overview
1. Copilot for Fabric Enhancements
Copilot capabilities have expanded across workloads:
# New: Natural language to Spark code
# In Fabric notebooks, use the Copilot sidebar
# Example prompt: "Load the sales data from OneLake and calculate year-over-year growth by region"
# Copilot generates:
from pyspark.sql.functions import col, sum, lag, year
from pyspark.sql.window import Window
# Load sales data
sales_df = spark.read.format("delta").load("Tables/sales_fact")
# Calculate YoY growth
window_spec = Window.partitionBy("region").orderBy("year")
growth_df = sales_df \
.groupBy("region", year("sale_date").alias("year")) \
.agg(sum("amount").alias("total_sales")) \
.withColumn("prev_year_sales", lag("total_sales").over(window_spec)) \
.withColumn(
"yoy_growth",
(col("total_sales") - col("prev_year_sales")) / col("prev_year_sales") * 100
)
display(growth_df)
2. Direct Lake Performance Improvements
Direct Lake mode now supports:
- Larger tables (up to 50GB per table in preview)
- Improved fallback handling
- Better query performance
# Monitor Direct Lake performance (illustrative helper calls; confirm which helpers your sempy version exposes)
from sempy import fabric
# Get Direct Lake metrics
metrics = fabric.get_direct_lake_metrics(
workspace="your-workspace",
dataset="your-dataset"
)
# Check for fallback events
fallback_events = [m for m in metrics if m["type"] == "fallback"]
if fallback_events:
print("Warning: Direct Lake fell back to import mode")
for event in fallback_events:
print(f" Table: {event['table']}, Reason: {event['reason']}")
3. OneLake File Explorer Updates
The OneLake File Explorer now supports:
- Sync status indicators
- Selective sync
- Conflict resolution
# Example command-line operations for OneLake management (illustrative syntax)
# Sync specific folders
onelake sync --workspace "production" --lakehouse "main" --path "Files/reports"
# Check sync status
onelake status --workspace "production"
# Resolve conflicts
onelake resolve --workspace "production" --strategy "remote-wins"
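Beyond the File Explorer and CLI, OneLake also exposes an ADLS Gen2-compatible endpoint, so the same folders can be scripted with the standard Azure SDK. A minimal sketch, assuming the azure-identity and azure-storage-file-datalake packages, with the "production" workspace and "main" lakehouse names reused from above as placeholders:
# List files in a lakehouse folder over OneLake's ADLS Gen2-compatible endpoint.
# "production" (workspace) and "main" (lakehouse) are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
# The file system maps to the workspace; paths start at the lakehouse item.
fs = service.get_file_system_client("production")
for item in fs.get_paths(path="main.Lakehouse/Files/reports", recursive=False):
    print(item.name, item.last_modified)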
4. Data Pipeline Improvements
New activities and connector options arrive as well, including AI-assisted column mapping and schema evolution handling in the Copy activity (illustrative pipeline definition below):
{
"name": "Enhanced Copy Activity",
"activities": [
{
"name": "Copy with AI Mapping",
"type": "Copy",
"inputs": [
{
"type": "AzureBlobStorageSource",
"connection": "blob-connection"
}
],
"outputs": [
{
"type": "LakehouseTable",
"lakehouse": "my-lakehouse",
"table": "destination_table"
}
],
"mappingType": "intelligent",
"schemaEvolution": {
"enabled": true,
"addNewColumns": true,
"handleTypeChanges": "coerce"
}
}
]
}
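Pipelines that use these new activities can also be kicked off programmatically. Below is a hedged sketch using the Fabric REST API's on-demand job endpoint; the workspace and item GUIDs are placeholders, and the endpoint path and token scope should be verified against the current API reference:
# Trigger a pipeline run on demand via the Fabric REST API job scheduler.
# GUIDs are placeholders; confirm the endpoint and scope for your tenant.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://api.fabric.microsoft.com/.default").token
workspace_id = "00000000-0000-0000-0000-000000000000"      # placeholder
pipeline_item_id = "11111111-1111-1111-1111-111111111111"  # placeholder

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_item_id}/jobs/instances?jobType=Pipeline",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
# A 202 response includes a Location header pointing at the job instance for polling.
print(resp.status_code, resp.headers.get("Location"))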
5. Real-Time Analytics Updates
Eventstreams and the KQL query surface gain new capabilities, including tumbling-window aggregation with late-arrival handling:
// New KQL functions for real-time data
// Tumbling window with late arrival handling
events
| where ingestion_time() > ago(1h)
| summarize
event_count = count(),
avg_value = avg(value),
late_arrivals = countif(event_time < bin(ingestion_time(), 5m) - 10m)
by bin(event_time, 5m)
| where late_arrivals > 0
| project event_time, event_count, avg_value, late_arrivals
Practical Implementation
Setting Up a Modern Data Pipeline
# Complete Fabric notebook for data ingestion pattern
from pyspark.sql.functions import *
from pyspark.sql.types import *
from delta.tables import DeltaTable
import json
# Configuration
config = {
    "source_path": "abfss://raw@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/incoming",
    # Lakehouse table names (stored under Tables/ in OneLake); name-based Spark APIs
    # such as saveAsTable() and spark.read.table() expect plain names, not paths.
    "bronze_table": "bronze_sales",
    "silver_table": "silver_sales",
    "gold_table": "gold_sales_summary"
}
# Bronze Layer: Raw ingestion
def ingest_to_bronze(source_path: str, bronze_table: str):
"""Ingest raw data to bronze layer."""
# Read with schema inference
raw_df = spark.read \
.option("inferSchema", "true") \
.option("header", "true") \
.csv(source_path)
# Add metadata
bronze_df = raw_df \
.withColumn("_ingestion_timestamp", current_timestamp()) \
.withColumn("_source_file", input_file_name()) \
.withColumn("_batch_id", lit(spark.sparkContext.applicationId))
# Write to bronze (append mode)
bronze_df.write \
.format("delta") \
.mode("append") \
.option("mergeSchema", "true") \
.saveAsTable(bronze_table)
return bronze_df.count()
# Silver Layer: Cleansed and conformed
def process_to_silver(bronze_table: str, silver_table: str):
"""Transform bronze to silver with quality checks."""
bronze_df = spark.read.table(bronze_table)
# Data quality checks
quality_checks = bronze_df \
.withColumn("is_valid_date", col("sale_date").isNotNull()) \
.withColumn("is_valid_amount", col("amount") > 0) \
.withColumn("is_valid_customer", col("customer_id").isNotNull())
# Filter to valid records only
valid_df = quality_checks.filter(
col("is_valid_date") & col("is_valid_amount") & col("is_valid_customer")
)
# Log invalid records
invalid_count = quality_checks.filter(
~(col("is_valid_date") & col("is_valid_amount") & col("is_valid_customer"))
).count()
if invalid_count > 0:
print(f"Warning: {invalid_count} invalid records filtered out")
# Transform
silver_df = valid_df \
.withColumn("sale_date", to_date("sale_date")) \
.withColumn("amount", col("amount").cast("decimal(18,2)")) \
.withColumn("year", year("sale_date")) \
.withColumn("month", month("sale_date")) \
.withColumn("quarter", quarter("sale_date")) \
.select(
"sale_id",
"customer_id",
"product_id",
"sale_date",
"amount",
"quantity",
"year",
"month",
"quarter",
"_ingestion_timestamp"
)
# Merge into silver table (SCD Type 1)
    if spark.catalog.tableExists(silver_table):
delta_table = DeltaTable.forName(spark, silver_table)
delta_table.alias("target") \
.merge(silver_df.alias("source"), "target.sale_id = source.sale_id") \
.whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()
else:
silver_df.write.format("delta").saveAsTable(silver_table)
return silver_df.count()
# Gold Layer: Business aggregates
def build_gold_summary(silver_table: str, gold_table: str):
"""Build gold summary table for reporting."""
silver_df = spark.read.table(silver_table)
# Create business summary
gold_df = silver_df \
.groupBy("year", "month", "quarter", "product_id") \
.agg(
count("sale_id").alias("transaction_count"),
sum("amount").alias("total_revenue"),
avg("amount").alias("avg_transaction_value"),
countDistinct("customer_id").alias("unique_customers"),
sum("quantity").alias("units_sold")
) \
.withColumn("revenue_per_customer",
col("total_revenue") / col("unique_customers")
) \
.withColumn("_processed_at", current_timestamp())
# Overwrite gold table
gold_df.write \
.format("delta") \
.mode("overwrite") \
.saveAsTable(gold_table)
return gold_df.count()
# Execute pipeline
print(f"Bronze ingested: {ingest_to_bronze(config['source_path'], config['bronze_table'])} rows")
print(f"Silver processed: {process_to_silver(config['bronze_table'], config['silver_table'])} rows")
print(f"Gold summarized: {build_gold_summary(config['silver_table'], config['gold_table'])} rows")
Monitoring with Fabric Metrics
# Monitoring with the semantic link (sempy) library (illustrative helper calls; confirm availability in your sempy version)
from sempy import fabric
# Get workspace capacity utilization
capacity_metrics = fabric.get_capacity_metrics(
capacity_name="your-capacity",
timeframe="last_24h"
)
# Display CU utilization
for metric in capacity_metrics:
print(f"Time: {metric['timestamp']}")
print(f" CU Used: {metric['cu_used']:.2f}")
print(f" CU Available: {metric['cu_available']:.2f}")
print(f" Utilization: {metric['utilization_pct']:.1f}%")
# Get pipeline run history
pipeline_runs = fabric.get_pipeline_runs(
workspace="your-workspace",
pipeline="daily-ingestion",
days=7
)
# Analyze failures
failures = [r for r in pipeline_runs if r["status"] == "Failed"]
if failures:
print(f"\n{len(failures)} failures in the last 7 days:")
for f in failures:
print(f" {f['start_time']}: {f['error_message']}")
Best Practices for January 2024
- Enable Copilot selectively - Start with data engineering notebooks
- Monitor Direct Lake fallbacks - Set up alerts for fallback events
- Use intelligent schema mapping - Reduces manual pipeline work
- Leverage OneLake shortcuts - Avoid data duplication by referencing data in place (see the sketch after this list)
- Implement medallion architecture - Bronze/Silver/Gold pattern
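As an illustration of the shortcuts point above: once a shortcut is created in a lakehouse (for example through the lakehouse UI), the remote data is queryable in place with no copy job. The shortcut names below are hypothetical:
# "shared_sales" is a hypothetical table shortcut under Tables/, and
# "Files/shared_exports" a hypothetical folder shortcut under Files/ --
# both read like local data, with nothing duplicated into this lakehouse.
shared_df = spark.read.table("shared_sales")
print(f"Rows available without duplication: {shared_df.count()}")

exports_df = spark.read.format("parquet").load("Files/shared_exports")
exports_df.printSchema()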
Coming Soon
Based on the Fabric roadmap:
- Git integration GA (expected Q1 2024)
- Enhanced workspace migration tools
- Improved cross-workspace lineage
- More Copilot capabilities
Conclusion
Microsoft Fabric’s January 2024 updates focus on polish and enterprise readiness. The improvements to Direct Lake, Copilot, and data pipelines address real production needs. Start incorporating these features into your data platform strategy.