Microsoft Fabric GA: The Unified Data Platform Revolution
Microsoft Fabric reached General Availability at Ignite 2023 (November 15-17). That milestone turns the six-month public preview from an extended beta into a platform organisations can make long-term commitments to. Six months of preview was enough time to do real work: I’ve built lakehouse pipelines on it, run notebooks on it, and watched the Capacity Metrics app tell me when I was over-consuming capacity units (CUs) during late-night data loads.

GA doesn’t mean complete, and some roadmap items remain in preview within the GA platform. What it does mean is that Microsoft is committing to backward compatibility and production SLAs for the core platform items. The workloads reaching GA are Lakehouse, Spark (Data Engineering), Data Factory, Warehouse, Power BI, and Real-Time Analytics. Still in preview at GA: some Copilot features, some data science capabilities, and the Fabric Real-Time hub.
What is Microsoft Fabric?
Microsoft Fabric is an end-to-end analytics platform that brings together:
- Data Engineering (Data Factory, Spark)
- Data Science (ML models, experiments)
- Data Warehousing (Synapse warehouse)
- Real-time Analytics (KQL database)
- Business Intelligence (Power BI)
- Data Integration (Pipelines, Dataflows)
All built on a unified OneLake foundation.
The OneLake Revolution
OneLake is Fabric’s unified storage layer - think of it as OneDrive for data:
# Connecting to OneLake from Python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
# OneLake uses the same APIs as Azure Data Lake Storage Gen2
account_url = "https://onelake.dfs.fabric.microsoft.com"
credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(account_url, credential=credential)
# Access workspace as a filesystem
workspace_name = "my-workspace"
file_system_client = service_client.get_file_system_client(workspace_name)
# List items in workspace
paths = file_system_client.get_paths()
for path in paths:
    print(f"{path.name} - {'Directory' if path.is_directory else 'File'}")
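Because OneLake speaks the ADLS Gen2 protocol, Spark and other ABFS-aware tools usually address items by URI rather than through the SDK. Here is a minimal sketch of building such a URI. The `onelake_uri` helper and the workspace and lakehouse names are hypothetical, and the `<item>.<ItemType>` layout is worth checking against the current OneLake documentation before you rely on it.

```python
from posixpath import join as path_join

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, relative_path: str = "") -> str:
    """Build an abfss:// URI for a path inside a Fabric item (hypothetical helper)."""
    # Items appear under the workspace as "<item name>.<item type>", e.g. "sales_lakehouse.Lakehouse"
    item_root = f"{item}.{item_type}"
    full_path = path_join(item_root, relative_path.lstrip("/"))
    return f"abfss://{workspace}@{ONELAKE_HOST}/{full_path}"


# Example names are placeholders, not real resources
uri = onelake_uri("my-workspace", "sales_lakehouse", "Lakehouse", "Tables/sales_data")
print(uri)
```

The workspace plays the role of the ADLS container and the item is the first path segment, which is why the SDK example above treats the workspace as a filesystem.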
Key Fabric Components
1. Lakehouse
The Lakehouse combines data lake flexibility with warehouse structure:
-- Creating tables in Fabric Lakehouse (Spark SQL)
CREATE TABLE sales_data (
    sale_id BIGINT,
    product_id INT,
    customer_id INT,
    sale_date DATE,
    quantity INT,
    unit_price DECIMAL(10,2),
    total_amount DECIMAL(10,2)
)
USING DELTA
PARTITIONED BY (sale_date);

-- Query with SQL
SELECT
    DATE_TRUNC('month', sale_date) AS month,
    SUM(total_amount) AS revenue
FROM sales_data
WHERE sale_date >= '2023-01-01'
GROUP BY DATE_TRUNC('month', sale_date)
ORDER BY month;
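Partitioning by `sale_date` means each distinct date gets its own directory under the table folder, so a filter like `sale_date >= '2023-01-01'` can skip whole directories. The sketch below illustrates the Hive-style `column=value` layout that Delta uses by default. The `partition_dir` helper and the sample rows are made up for illustration, and nothing here touches a real lakehouse.

```python
from collections import defaultdict
from datetime import date

TABLE_ROOT = "Tables/sales_data"  # relative table path, as used elsewhere in this post


def partition_dir(table_root: str, sale_date: date) -> str:
    """Return the Hive-style partition directory for one sale_date value."""
    return f"{table_root}/sale_date={sale_date.isoformat()}"


# Made-up sample rows, for illustration only
rows = [
    {"sale_id": 1, "sale_date": date(2022, 12, 31)},
    {"sale_id": 2, "sale_date": date(2023, 1, 1)},
    {"sale_id": 3, "sale_date": date(2023, 1, 1)},
    {"sale_id": 4, "sale_date": date(2023, 1, 2)},
]

# Group row ids by the directory they would be written to
ids_by_partition = defaultdict(list)
for row in rows:
    ids_by_partition[partition_dir(TABLE_ROOT, row["sale_date"])].append(row["sale_id"])

# Partition pruning: only directories whose date satisfies the filter need scanning
cutoff = date(2023, 1, 1)
scanned = sorted(
    directory
    for directory in ids_by_partition
    if date.fromisoformat(directory.rsplit("=", 1)[1]) >= cutoff
)
print(scanned)
```

One design note: a partition per day can leave a small table with many tiny files, so a coarser partition column (month, say) may serve better for modest data volumes.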
2. Warehouse
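The Warehouse offers T-SQL data warehousing, though not the full SQL Server surface area:
-- Fabric Warehouse supports T-SQL with a reduced type surface:
-- use VARCHAR rather than NVARCHAR, and give DATETIME2 an explicit precision (max 6)
CREATE TABLE dim_customer (
    customer_key INT NOT NULL,
    customer_id VARCHAR(20),
    customer_name VARCHAR(100),
    email VARCHAR(200),
    segment VARCHAR(50),
    valid_from DATETIME2(6),
    valid_to DATETIME2(6),
    is_current BIT
);

-- Create stored procedures
CREATE PROCEDURE usp_UpdateCustomerDimension
AS
BEGIN
    -- SCD Type 2, step 1: expire current rows whose tracked attribute (customer_name) changed
    UPDATE dim_customer
    SET valid_to = GETDATE(),
        is_current = 0
    WHERE is_current = 1
      AND EXISTS (
          SELECT 1 FROM staging_customer s
          WHERE s.customer_id = dim_customer.customer_id
            AND s.customer_name <> dim_customer.customer_name
      );

    -- Step 2: insert a current row for new customers and for the rows expired above.
    -- Fabric Warehouse at GA does not offer sequences or IDENTITY columns, so the
    -- surrogate key is derived from MAX(customer_key) + ROW_NUMBER() instead.
    INSERT INTO dim_customer
        (customer_key, customer_id, customer_name, email, segment, valid_from, valid_to, is_current)
    SELECT
        (SELECT COALESCE(MAX(customer_key), 0) FROM dim_customer)
            + ROW_NUMBER() OVER (ORDER BY s.customer_id),
        s.customer_id,
        s.customer_name,
        s.email,
        s.segment,
        GETDATE(),
        '9999-12-31',
        1
    FROM staging_customer s
    WHERE NOT EXISTS (
        SELECT 1 FROM dim_customer d
        WHERE d.customer_id = s.customer_id AND d.is_current = 1
    );
END;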
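If the expire-then-insert order in that procedure is hard to follow, the same SCD Type 2 logic can be sketched in plain Python over in-memory rows. This is an illustration only: `apply_scd2` and the sample customers are hypothetical, and like the procedure it tracks changes to `customer_name` alone.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # "still current" marker, matching '9999-12-31' in the SQL


def apply_scd2(dim_rows, staging_rows, now):
    """Apply SCD Type 2 changes to dim_rows in place; return the newly inserted rows."""
    current = {r["customer_id"]: r for r in dim_rows if r["is_current"]}
    next_key = max((r["customer_key"] for r in dim_rows), default=0) + 1
    inserted = []
    for s in staging_rows:
        existing = current.get(s["customer_id"])
        if existing is not None and existing["customer_name"] == s["customer_name"]:
            continue  # unchanged customer: nothing to do
        if existing is not None:
            # Changed customer: close out the old version
            existing["valid_to"] = now
            existing["is_current"] = False
        # New or changed customer: add a fresh current version
        new_row = {"customer_key": next_key, **s,
                   "valid_from": now, "valid_to": OPEN_END, "is_current": True}
        next_key += 1
        dim_rows.append(new_row)
        inserted.append(new_row)
    return inserted


# Made-up sample data
dim = [{"customer_key": 1, "customer_id": "C1", "customer_name": "Ada",
        "valid_from": datetime(2023, 1, 1), "valid_to": OPEN_END, "is_current": True}]
staging = [{"customer_id": "C1", "customer_name": "Ada Lovelace"},
           {"customer_id": "C2", "customer_name": "Grace"}]
now = datetime(2023, 11, 15)
inserted = apply_scd2(dim, staging, now)
print([(r["customer_key"], r["customer_id"], r["is_current"]) for r in dim])
```

After the run, the original `C1` row is closed out and both `C1` (renamed) and `C2` (new) have current rows, which is the state the stored procedure aims for.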
3. Data Engineering with Spark
# PySpark in Fabric notebooks
from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # avoids shadowing Python's built-in sum()
# A Spark session is pre-configured in Fabric notebooks; getOrCreate() returns it
spark = SparkSession.builder.getOrCreate()
# Read from the notebook's default Lakehouse
df = spark.read.format("delta").load("Tables/sales_data")
# sale_date is already a DATE, so group on it directly for a daily summary
daily_summary = df.groupBy("sale_date").agg(
    F.sum("total_amount").alias("daily_revenue"),
    F.avg("quantity").alias("avg_quantity"),
    F.sum("quantity").alias("total_units"),
)
# Write back to Lakehouse
daily_summary.write.format("delta").mode("overwrite").saveAsTable("daily_sales_summary")
# Delta Lake time travel: read the table as of an earlier version (5 is just an example)
historical_df = spark.read.format("delta").option("versionAsOf", 5).load("Tables/sales_data")
4. Real-time Analytics
// KQL for real-time analytics
SalesEvents
| where EventTime > ago(1h)
| summarize
    TotalSales = sum(Amount),
    TransactionCount = count(),
    AvgOrderValue = avg(Amount)
    by bin(EventTime, 5m)
| render timechart

// Create materialized view for dashboards
.create materialized-view HourlySalesSummary on table SalesEvents
{
    SalesEvents
    | summarize Sales = sum(Amount), Count = count() by bin(EventTime, 1h), Region
}
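The `bin(EventTime, 5m)` call is what turns a stream of events into fixed-width buckets: each timestamp is floored to the start of its 5-minute window before the aggregates are computed. A rough Python equivalent is sketched below. `bin_time` and the sample events are hypothetical, and the sketch assumes buckets are aligned to a fixed origin, which makes no difference for widths that divide a day evenly.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_time(ts: datetime, width: timedelta) -> datetime:
    """Floor ts to the start of its bucket, in the spirit of KQL's bin(ts, width)."""
    origin = datetime.min  # fixed origin (assumption); irrelevant for 5m or 1h widths
    return origin + ((ts - origin) // width) * width


# Made-up (EventTime, Amount) pairs
events = [
    (datetime(2023, 11, 15, 10, 2), 20.0),
    (datetime(2023, 11, 15, 10, 4), 30.0),
    (datetime(2023, 11, 15, 10, 7), 50.0),
]

totals = defaultdict(lambda: {"TotalSales": 0.0, "TransactionCount": 0})
for event_time, amount in events:
    bucket = totals[bin_time(event_time, timedelta(minutes=5))]
    bucket["TotalSales"] += amount
    bucket["TransactionCount"] += 1

# Add the average once the sums and counts are complete
summary = {
    start: {**agg, "AvgOrderValue": agg["TotalSales"] / agg["TransactionCount"]}
    for start, agg in sorted(totals.items())
}
for start, agg in summary.items():
    print(start, agg)
```

The first two events land in the 10:00 bucket and the third in the 10:05 bucket, which mirrors how the KQL query groups rows before `render timechart` plots them.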
Migration Considerations
If you’re moving from existing platforms:
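# Example: Migrating from Azure Synapse to Fabric
# Step 1 (run in a Synapse Spark notebook): export the table to Parquet.
# Assumes the Synapse dedicated SQL pool connector; the three-part table name is a placeholder.
import com.microsoft.spark.sqlanalytics
synapse_df = spark.read.synapsesql("your_db.your_schema.your_table")
source_count = synapse_df.count()  # note this value for Step 4
synapse_df.write.format("parquet").save("abfss://container@storage.dfs.core.windows.net/migration/")
# Step 2: Create a shortcut in OneLake (through the Fabric UI) so the export appears under Files/migration
# Or copy data directly
# Step 3 (run in a Fabric notebook): create a Delta table from the exported files
fabric_df = spark.read.format("parquet").load("Files/migration/")
fabric_df.write.format("delta").saveAsTable("migrated_table")
# Step 4: Validate data. synapse_df does not exist in the Fabric session, so compare
# the staged Parquet with the Delta table, then check both against source_count from Step 1.
staged_count = fabric_df.count()
target_count = spark.table("migrated_table").count()
assert staged_count == target_count, "Data count mismatch!"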
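Matching row counts will catch dropped rows but not silently altered values. One common complement is an order-independent checksum over the rows. The sketch below shows the idea on small in-memory samples; `table_fingerprint` and the data are hypothetical, and at real volumes you would compute the equivalent inside Spark rather than collecting rows locally.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum); the checksum does not depend on row order."""
    count = 0
    checksum = 0
    for row in rows:
        # Canonical form: columns sorted by name so dict ordering cannot affect the hash
        canonical = "|".join(f"{key}={row[key]!r}" for key in sorted(row))
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        # Summing (rather than XOR-ing) keeps duplicate rows from cancelling out
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, checksum


# Made-up samples: same rows in a different order, then one altered value
source = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7.0}]
target_ok = [{"id": 2, "amount": 7.0}, {"id": 1, "amount": 10.5}]
target_bad = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7.5}]

print(table_fingerprint(source) == table_fingerprint(target_ok))
print(table_fingerprint(source) == table_fingerprint(target_bad))
```

Note that the comparison is only meaningful if both sides render values the same way, so types such as decimals and timestamps should be normalised before hashing.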
What’s Next
Fabric GA is just the beginning. In the coming posts, we’ll explore:
- Licensing and capacity planning
- Governance and security features
- Copilot in Fabric
- Migration best practices
This is a transformative moment for data platforms. Whether you’re currently on Azure Synapse, Databricks, or on-premises solutions, Fabric deserves serious consideration for your data strategy.