Microsoft Fabric GA: The Unified Data Platform Revolution

Microsoft Fabric reached General Availability at Ignite 2023 (November 15-17). That milestone turns the six-month public preview from an extended beta into a platform organisations can make long-term commitments to. Six months of preview was enough time to do real work: I’ve built lakehouse pipelines on it, run notebooks on it, and watched the Capacity Metrics app tell me when I was over-consuming capacity units (CUs) during late-night data loads. GA doesn’t mean complete, since some roadmap items remain in preview within the GA platform, but it does mean Microsoft is committed to backward compatibility and production SLAs for the core platform items.

The workloads reaching GA are Lakehouse, Spark (Data Engineering), Data Factory, Warehouse, Power BI, and Real-Time Analytics. Still in preview at GA are some Copilot features, some data science capabilities, and the Fabric Real-Time hub.

What is Microsoft Fabric?

Microsoft Fabric is an end-to-end analytics platform that brings together:

  • Data Engineering (Spark, notebooks, Lakehouse)
  • Data Science (ML models, experiments)
  • Data Warehousing (Synapse warehouse)
  • Real-time Analytics (KQL database)
  • Business Intelligence (Power BI)
  • Data Integration (Data Factory pipelines, Dataflows)

All built on a unified OneLake foundation.

The OneLake Revolution

OneLake is Fabric’s unified storage layer - think of it as OneDrive for data:

# Connecting to OneLake from Python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake uses the same APIs as Azure Data Lake Storage Gen2
account_url = "https://onelake.dfs.fabric.microsoft.com"
credential = DefaultAzureCredential()

service_client = DataLakeServiceClient(account_url, credential=credential)

# Access workspace as a filesystem
workspace_name = "my-workspace"
file_system_client = service_client.get_file_system_client(workspace_name)

# List items in workspace
paths = file_system_client.get_paths()
for path in paths:
    print(f"{path.name} - {'Directory' if path.is_directory else 'File'}")
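Spark and other ABFS-aware tools reach the same storage through an abfss:// URI rather than the SDK client. Here is a minimal sketch of a helper that builds one, assuming the OneLake layout of workspace, then item name with its type suffix, then the path inside the item. The function name `onelake_uri` and the workspace and lakehouse names are hypothetical.

```python
from posixpath import join as join_path

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, relative_path: str = "") -> str:
    """Build an abfss:// URI for a path inside a Fabric item (hypothetical helper)."""
    # Items are addressed as "<name>.<type>", for example "sales.Lakehouse".
    item_root = f"{item}.{item_type}"
    # Strip leading slashes so the join does not treat the path as absolute.
    path = join_path(item_root, relative_path.lstrip("/")) if relative_path else item_root
    return f"abfss://{workspace}@{ONELAKE_HOST}/{path}"


# Example with made-up names
print(onelake_uri("my-workspace", "sales", "Lakehouse", "Tables/sales_data"))
# abfss://my-workspace@onelake.dfs.fabric.microsoft.com/sales.Lakehouse/Tables/sales_data
```

The resulting string can be passed to `spark.read.format("delta").load(...)` when a notebook needs to read from a Lakehouse other than its default one.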

Key Fabric Components

1. Lakehouse

The Lakehouse combines data lake flexibility with warehouse structure:

-- Creating tables in Fabric Lakehouse
CREATE TABLE sales_data (
    sale_id BIGINT,
    product_id INT,
    customer_id INT,
    sale_date DATE,
    quantity INT,
    unit_price DECIMAL(10,2),
    total_amount DECIMAL(10,2)
)
USING DELTA
PARTITIONED BY (sale_date);

-- Query with SQL
SELECT
    DATE_TRUNC('month', sale_date) as month,
    SUM(total_amount) as revenue
FROM sales_data
WHERE sale_date >= '2023-01-01'
GROUP BY DATE_TRUNC('month', sale_date)
ORDER BY month;

2. Warehouse

Full T-SQL data warehouse capabilities:

-- Fabric Warehouse supports a subset of T-SQL data types:
-- use VARCHAR instead of NVARCHAR, and give DATETIME2 a precision of 6 or less
CREATE TABLE dim_customer (
    customer_key INT NOT NULL,
    customer_id VARCHAR(20),
    customer_name VARCHAR(100),
    email VARCHAR(200),
    segment VARCHAR(50),
    valid_from DATETIME2(6),
    valid_to DATETIME2(6),
    is_current BIT
);

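-- Create stored procedures
CREATE PROCEDURE usp_UpdateCustomerDimension
AS
BEGIN
    -- SCD Type 2, step 1: expire the current row when the customer name has changed
    UPDATE dim_customer
    SET valid_to = GETDATE(),
        is_current = 0
    WHERE is_current = 1
      AND EXISTS (
        SELECT 1 FROM staging_customer s
        WHERE s.customer_id = dim_customer.customer_id
          AND s.customer_name <> dim_customer.customer_name
    );

    -- Step 2: insert a current row for new customers and for those just expired.
    -- Assumption: sequences are not available in Fabric Warehouse, so surrogate
    -- keys are derived from MAX(customer_key) + ROW_NUMBER() instead.
    INSERT INTO dim_customer
        (customer_key, customer_id, customer_name, email, segment,
         valid_from, valid_to, is_current)
    SELECT
        (SELECT COALESCE(MAX(customer_key), 0) FROM dim_customer)
            + ROW_NUMBER() OVER (ORDER BY s.customer_id),
        s.customer_id,
        s.customer_name,
        s.email,
        s.segment,
        GETDATE(),
        '9999-12-31',
        1
    FROM staging_customer s
    WHERE NOT EXISTS (
        SELECT 1 FROM dim_customer d
        WHERE d.customer_id = s.customer_id AND d.is_current = 1
    );
END;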
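To make the expire-then-insert logic easier to follow, here is a minimal in-memory sketch of the same SCD Type 2 flow in plain Python. It is an illustration under simplifying assumptions, not the warehouse implementation. Rows are dicts, only `customer_name` is tracked for changes (as in the procedure), `email` and `segment` are omitted for brevity, and `apply_scd2` is a hypothetical name.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # "still current" marker, as in the SQL


def apply_scd2(dimension, staging, now):
    """Apply SCD Type 2 changes to `dimension` (a list of dicts) in place."""
    current = {row["customer_id"]: row for row in dimension if row["is_current"]}
    next_key = max((row["customer_key"] for row in dimension), default=0) + 1
    for incoming in staging:
        existing = current.get(incoming["customer_id"])
        if existing is not None and existing["customer_name"] == incoming["customer_name"]:
            continue  # unchanged: keep the current version as it is
        if existing is not None:
            # changed: close off the current version
            existing["valid_to"] = now
            existing["is_current"] = False
        # new or changed: add a fresh current version
        dimension.append({
            "customer_key": next_key,
            "customer_id": incoming["customer_id"],
            "customer_name": incoming["customer_name"],
            "valid_from": now,
            "valid_to": OPEN_END,
            "is_current": True,
        })
        next_key += 1
    return dimension


# Made-up data: C1 is renamed, C2 is new
dim = [{"customer_key": 1, "customer_id": "C1", "customer_name": "Acme",
        "valid_from": datetime(2023, 1, 1), "valid_to": OPEN_END, "is_current": True}]
staging = [{"customer_id": "C1", "customer_name": "Acme Ltd"},
           {"customer_id": "C2", "customer_name": "Globex"}]
apply_scd2(dim, staging, now=datetime(2023, 11, 15))
for row in dim:
    print(row["customer_key"], row["customer_id"], row["customer_name"], row["is_current"])
```

After the run, the dimension holds three rows: the expired "Acme" version, a current "Acme Ltd" version, and a current row for the new customer.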

3. Data Engineering with Spark

# PySpark in Fabric notebooks
from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # avoids shadowing Python's built-in sum

# Spark session is pre-configured in Fabric
spark = SparkSession.builder.getOrCreate()

# Read from Lakehouse (relative paths resolve against the notebook's default Lakehouse)
df = spark.read.format("delta").load("Tables/sales_data")

# Perform transformations: sale_date is already a DATE, so group on it directly
daily_summary = df.groupBy("sale_date").agg(
    F.sum("total_amount").alias("daily_revenue"),
    F.avg("quantity").alias("avg_quantity"),
    F.sum("quantity").alias("total_units")
)

# Write back to Lakehouse
daily_summary.write.format("delta").mode("overwrite").saveAsTable("daily_sales_summary")

# Or use Delta Lake time travel
historical_df = spark.read.format("delta").option("versionAsOf", 5).load("Tables/sales_data")

4. Real-time Analytics

// KQL for real-time analytics
SalesEvents
| where EventTime > ago(1h)
| summarize
    TotalSales = sum(Amount),
    TransactionCount = count(),
    AvgOrderValue = avg(Amount)
    by bin(EventTime, 5m)
| render timechart

// Create materialized view for dashboards
.create materialized-view HourlySalesSummary on table SalesEvents
{
    SalesEvents
    | summarize Sales = sum(Amount), Count = count() by bin(EventTime, 1h), Region
}
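For readers new to KQL, `bin(EventTime, 5m)` rounds each timestamp down to the start of its five-minute bucket, and `summarize ... by` aggregates per bucket. A rough plain-Python sketch of that behaviour follows, assuming naive datetimes and bucket widths that divide a day evenly. The function names and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_time(ts: datetime, width: timedelta) -> datetime:
    """Round `ts` down to a multiple of `width`, similar to KQL's bin() on datetimes."""
    return ts - ((ts - datetime.min) % width)


def summarize(events, width):
    """Aggregate (event_time, amount) pairs per time bucket."""
    buckets = defaultdict(list)
    for event_time, amount in events:
        buckets[bin_time(event_time, width)].append(amount)
    return {
        start: {
            "TotalSales": sum(amounts),
            "TransactionCount": len(amounts),
            "AvgOrderValue": sum(amounts) / len(amounts),
        }
        for start, amounts in sorted(buckets.items())
    }


# Made-up events: two fall in the 12:00 bucket, one in the 12:05 bucket
events = [
    (datetime(2023, 11, 15, 12, 1), 10.0),
    (datetime(2023, 11, 15, 12, 4), 30.0),
    (datetime(2023, 11, 15, 12, 7), 5.0),
]
result = summarize(events, timedelta(minutes=5))
for start, stats in result.items():
    print(start, stats)
```

The materialized view applies the same idea with a one-hour bucket and an extra `Region` grouping key, with the engine keeping the aggregates up to date as new events arrive.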

Migration Considerations

If you’re moving from existing platforms:

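# Example: Migrating from Azure Synapse to Fabric
# Step 1 (run in a Synapse Spark notebook): export data to Parquet.
# Assumes the Synapse Dedicated SQL Pool connector, which provides synapsesql();
# adjust the three-part table name to your environment.
import com.microsoft.spark.sqlanalytics
synapse_df = spark.read.synapsesql("your_database.your_schema.your_table")
synapse_df.write.format("parquet").save("abfss://container@storage.dfs.core.windows.net/migration/")
source_count = synapse_df.count()  # record this value for Step 4

# Step 2: Create shortcut in OneLake (through Fabric UI)
# Or copy data directly

# Step 3 (run in a Fabric notebook): create Delta table in Fabric
fabric_df = spark.read.format("parquet").load("Files/migration/")
fabric_df.write.format("delta").saveAsTable("migrated_table")

# Step 4: Validate data against the count recorded in Step 1
# (source_count does not carry over between the Synapse and Fabric sessions)
target_count = spark.table("migrated_table").count()
assert source_count == target_count, f"Data count mismatch: {source_count} vs {target_count}"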
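A matching row count is a fairly weak check, since it says nothing about the values themselves. One option is to collect a small profile on each side (row count, a column sum, a distinct-key count) and compare the two. The sketch below covers only the comparison step. How the profiles are collected is left out, and the function name, metric names, and numbers are all hypothetical.

```python
def compare_profiles(source: dict, target: dict, tolerance: float = 0.0) -> list:
    """Return readable mismatches between two {metric: numeric value} profiles."""
    problems = []
    for metric in sorted(source.keys() | target.keys()):
        if metric not in source or metric not in target:
            problems.append(f"{metric}: missing on one side")
        elif abs(source[metric] - target[metric]) > tolerance:
            problems.append(f"{metric}: source={source[metric]} target={target[metric]}")
    return problems


# Made-up profiles: the distinct customer count differs after migration
source_profile = {"row_count": 1000, "sum_total_amount": 52340.75, "distinct_customers": 120}
target_profile = {"row_count": 1000, "sum_total_amount": 52340.75, "distinct_customers": 119}

problems = compare_profiles(source_profile, target_profile)
print(problems)  # ['distinct_customers: source=120 target=119']
```

The `tolerance` parameter is there because floating-point sums computed by two different engines may differ slightly even when the data is identical.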

What’s Next

Fabric GA is just the beginning. In the coming posts, we’ll explore:

  • Licensing and capacity planning
  • Governance and security features
  • Copilot in Fabric
  • Migration best practices

This is a transformative moment for data platforms. Whether you’re currently on Azure Synapse, Databricks, or on-premises solutions, Fabric deserves serious consideration for your data strategy.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.