Microsoft Fabric January 2024 Updates: What's New and What It Means
Microsoft Fabric has continued to evolve rapidly since reaching general availability in November 2023, and January 2024 brings significant updates across the platform. Here's what's new and how to take advantage of these features.
Key Updates Overview
1. Copilot for Fabric Enhancements
Copilot capabilities have expanded across workloads:
# New: Natural language to Spark code
# In Fabric notebooks, use the Copilot sidebar
# Example prompt: "Load the sales data from OneLake and calculate year-over-year growth by region"
# Copilot generates:
from pyspark.sql.functions import col, sum, lag, year
from pyspark.sql.window import Window
# Load sales data
sales_df = spark.read.format("delta").load("Tables/sales_fact")
# Calculate YoY growth
window_spec = Window.partitionBy("region").orderBy("year")
growth_df = sales_df \
.groupBy("region", year("sale_date").alias("year")) \
.agg(sum("amount").alias("total_sales")) \
.withColumn("prev_year_sales", lag("total_sales").over(window_spec)) \
.withColumn(
"yoy_growth",
(col("total_sales") - col("prev_year_sales")) / col("prev_year_sales") * 100
)
display(growth_df)
2. Direct Lake Performance Improvements
Direct Lake mode now supports:
- Larger tables (up to 50GB per table in preview)
- Improved fallback handling
- Better query performance
# Monitor Direct Lake performance (illustrative helper calls; confirm which helpers your sempy version exposes)
from sempy import fabric
# Get Direct Lake metrics
metrics = fabric.get_direct_lake_metrics(
workspace="your-workspace",
dataset="your-dataset"
)
# Check for fallback events
fallback_events = [m for m in metrics if m["type"] == "fallback"]
if fallback_events:
print("Warning: Direct Lake fell back to import mode")
for event in fallback_events:
print(f" Table: {event['table']}, Reason: {event['reason']}")
3. OneLake File Explorer Updates
The OneLake File Explorer now supports:
- Sync status indicators
- Selective sync
- Conflict resolution
# Example command-line operations for OneLake management (illustrative syntax)
# Sync specific folders
onelake sync --workspace "production" --lakehouse "main" --path "Files/reports"
# Check sync status
onelake status --workspace "production"
# Resolve conflicts
onelake resolve --workspace "production" --strategy "remote-wins"
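Beyond the File Explorer and CLI, OneLake also exposes an ADLS Gen2-compatible endpoint, so the same folders can be scripted with the standard Azure SDK. A minimal sketch, assuming the azure-identity and azure-storage-file-datalake packages, with the "production" workspace and "main" lakehouse names reused from above as placeholders:
# List files in a lakehouse folder over OneLake's ADLS Gen2-compatible endpoint.
# "production" (workspace) and "main" (lakehouse) are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
# The file system maps to the workspace; paths start at the lakehouse item.
fs = service.get_file_system_client("production")
for item in fs.get_paths(path="main.Lakehouse/Files/reports", recursive=False):
    print(item.name, item.last_modified)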
4. Data Pipeline Improvements
New activities and connector options arrive as well, including AI-assisted column mapping and schema evolution handling in the Copy activity (illustrative pipeline definition below):
{
"name": "Enhanced Copy Activity",
"activities": [
{
"name": "Copy with AI Mapping",
"type": "Copy",
"inputs": [
{
"type": "AzureBlobStorageSource",
"connection": "blob-connection"
}
],
"outputs": [
{
"type": "LakehouseTable",
"lakehouse": "my-lakehouse",
"table": "destination_table"
}
],
"mappingType": "intelligent",
"schemaEvolution": {
"enabled": true,
"addNewColumns": true,
"handleTypeChanges": "coerce"
}
}
]
}
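Pipelines that use these new activities can also be kicked off programmatically. Below is a hedged sketch using the Fabric REST API's on-demand job endpoint; the workspace and item GUIDs are placeholders, and the endpoint path and token scope should be verified against the current API reference:
# Trigger a pipeline run on demand via the Fabric REST API job scheduler.
# GUIDs are placeholders; confirm the endpoint and scope for your tenant.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://api.fabric.microsoft.com/.default").token
workspace_id = "00000000-0000-0000-0000-000000000000"      # placeholder
pipeline_item_id = "11111111-1111-1111-1111-111111111111"  # placeholder

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_item_id}/jobs/instances?jobType=Pipeline",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
# A 202 response includes a Location header pointing at the job instance for polling.
print(resp.status_code, resp.headers.get("Location"))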
5. Real-Time Analytics Updates
Eventstreams and the KQL query surface gain new capabilities, including tumbling-window aggregation with late-arrival handling:
// New KQL functions for real-time data
// Tumbling window with late arrival handling
events
| where ingestion_time() > ago(1h)
| summarize
event_count = count(),
avg_value = avg(value),
late_arrivals = countif(event_time < bin(ingestion_time(), 5m) - 10m)
by bin(event_time, 5m)
| where late_arrivals > 0
| project event_time, event_count, avg_value, late_arrivals
Practical Implementation
Setting Up a Modern Data Pipeline
# Complete Fabric notebook for data ingestion pattern
from pyspark.sql.functions import *
from pyspark.sql.types import *
from delta.tables import DeltaTable
import json
# Configuration
config = {
    "source_path": "abfss://raw@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/incoming",
    # Lakehouse table names (stored under Tables/ in OneLake); name-based Spark APIs
    # such as saveAsTable() and spark.read.table() expect plain names, not paths.
    "bronze_table": "bronze_sales",
    "silver_table": "silver_sales",
    "gold_table": "gold_sales_summary"
}
# Bronze Layer: Raw ingestion
def ingest_to_bronze(source_path: str, bronze_table: str):
"""Ingest raw data to bronze layer."""
# Read with schema inference
raw_df = spark.read \
.option("inferSchema", "true") \
.option("header", "true") \
.csv(source_path)
# Add metadata
bronze_df = raw_df \
.withColumn("_ingestion_timestamp", current_timestamp()) \
.withColumn("_source_file", input_file_name()) \
.withColumn("_batch_id", lit(spark.sparkContext.applicationId))
# Write to bronze (append mode)
bronze_df.write \
.format("delta") \
.mode("append") \
.option("mergeSchema", "true") \
.saveAsTable(bronze_table)
return bronze_df.count()
# Silver Layer: Cleansed and conformed
def process_to_silver(bronze_table: str, silver_table: str):
"""Transform bronze to silver with quality checks."""
bronze_df = spark.read.table(bronze_table)
# Data quality checks
quality_checks = bronze_df \
.withColumn("is_valid_date", col("sale_date").isNotNull()) \
.withColumn("is_valid_amount", col("amount") > 0) \
.withColumn("is_valid_customer", col("customer_id").isNotNull())
# Filter to valid records only
valid_df = quality_checks.filter(
col("is_valid_date") & col("is_valid_amount") & col("is_valid_customer")
)
# Log invalid records
invalid_count = quality_checks.filter(
~(col("is_valid_date") & col("is_valid_amount") & col("is_valid_customer"))
).count()
if invalid_count > 0:
print(f"Warning: {invalid_count} invalid records filtered out")
# Transform
silver_df = valid_df \
.withColumn("sale_date", to_date("sale_date")) \
.withColumn("amount", col("amount").cast("decimal(18,2)")) \
.withColumn("year", year("sale_date")) \
.withColumn("month", month("sale_date")) \
.withColumn("quarter", quarter("sale_date")) \
.select(
"sale_id",
"customer_id",
"product_id",
"sale_date",
"amount",
"quantity",
"year",
"month",
"quarter",
"_ingestion_timestamp"
)
# Merge into silver table (SCD Type 1)
    if spark.catalog.tableExists(silver_table):
delta_table = DeltaTable.forName(spark, silver_table)
delta_table.alias("target") \
.merge(silver_df.alias("source"), "target.sale_id = source.sale_id") \
.whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()
else:
silver_df.write.format("delta").saveAsTable(silver_table)
return silver_df.count()
# Gold Layer: Business aggregates
def build_gold_summary(silver_table: str, gold_table: str):
"""Build gold summary table for reporting."""
silver_df = spark.read.table(silver_table)
# Create business summary
gold_df = silver_df \
.groupBy("year", "month", "quarter", "product_id") \
.agg(
count("sale_id").alias("transaction_count"),
sum("amount").alias("total_revenue"),
avg("amount").alias("avg_transaction_value"),
countDistinct("customer_id").alias("unique_customers"),
sum("quantity").alias("units_sold")
) \
.withColumn("revenue_per_customer",
col("total_revenue") / col("unique_customers")
) \
.withColumn("_processed_at", current_timestamp())
# Overwrite gold table
gold_df.write \
.format("delta") \
.mode("overwrite") \
.saveAsTable(gold_table)
return gold_df.count()
# Execute pipeline
print(f"Bronze ingested: {ingest_to_bronze(config['source_path'], config['bronze_table'])} rows")
print(f"Silver processed: {process_to_silver(config['bronze_table'], config['silver_table'])} rows")
print(f"Gold summarized: {build_gold_summary(config['silver_table'], config['gold_table'])} rows")
Monitoring with Fabric Metrics
# Monitoring with the semantic link (sempy) library (illustrative helper calls; confirm availability in your sempy version)
from sempy import fabric
# Get workspace capacity utilization
capacity_metrics = fabric.get_capacity_metrics(
capacity_name="your-capacity",
timeframe="last_24h"
)
# Display CU utilization
for metric in capacity_metrics:
print(f"Time: {metric['timestamp']}")
print(f" CU Used: {metric['cu_used']:.2f}")
print(f" CU Available: {metric['cu_available']:.2f}")
print(f" Utilization: {metric['utilization_pct']:.1f}%")
# Get pipeline run history
pipeline_runs = fabric.get_pipeline_runs(
workspace="your-workspace",
pipeline="daily-ingestion",
days=7
)
# Analyze failures
failures = [r for r in pipeline_runs if r["status"] == "Failed"]
if failures:
print(f"\n{len(failures)} failures in the last 7 days:")
for f in failures:
print(f" {f['start_time']}: {f['error_message']}")
Best Practices for January 2024
- Enable Copilot selectively - Start with data engineering notebooks
- Monitor Direct Lake fallbacks - Set up alerts for fallback events
- Use intelligent schema mapping - Reduces manual pipeline work
- Leverage OneLake shortcuts - Avoid data duplication by referencing data in place (see the sketch after this list)
- Implement medallion architecture - Bronze/Silver/Gold pattern
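As an illustration of the shortcuts point above: once a shortcut is created in a lakehouse (for example through the lakehouse UI), the remote data is queryable in place with no copy job. The shortcut names below are hypothetical:
# "shared_sales" is a hypothetical table shortcut under Tables/, and
# "Files/shared_exports" a hypothetical folder shortcut under Files/ --
# both read like local data, with nothing duplicated into this lakehouse.
shared_df = spark.read.table("shared_sales")
print(f"Rows available without duplication: {shared_df.count()}")

exports_df = spark.read.format("parquet").load("Files/shared_exports")
exports_df.printSchema()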
Coming Soon
Based on the Fabric roadmap:
- Git integration GA (expected Q1 2024)
- Enhanced workspace migration tools
- Improved cross-workspace lineage
- More Copilot capabilities
Conclusion
Microsoft Fabric’s January 2024 updates focus on polish and enterprise readiness. The improvements to Direct Lake, Copilot, and data pipelines address real production needs. Start incorporating these features into your data platform strategy.