Microsoft Fabric: The Biggest Data Launch Since SQL Server
At Microsoft Build 2023, Satya Nadella announced Microsoft Fabric with a bold claim: “perhaps the biggest launch of a data product from Microsoft since the launch of SQL Server.” Having spent the past 48 hours exploring the preview, I think he might be right.
What is Microsoft Fabric?
Microsoft Fabric is a unified SaaS analytics platform that integrates:
- Power BI - Business intelligence and visualization
- Data Factory - Data integration and ETL
- Synapse Data Engineering - Spark-based data engineering
- Synapse Data Warehouse - SQL-based warehousing
- Synapse Data Science - ML and data science workloads
- Synapse Real-Time Analytics - Streaming and time-series data
- Data Activator - Event-driven actions and alerts
All of this sits on top of OneLake - a single, unified data lake for your entire organization.
Why This Matters
The promise is simple: one platform, one experience, one security model, one governance layer, one capacity model.
If you’ve built data platforms on Azure, you know the pain:
- Separate services with different permission models
- Data copied between storage accounts
- Multiple admin experiences
- Complex capacity planning
Fabric aims to solve all of this.
OneLake: The Foundation
OneLake is the most significant architectural decision. It’s a single, organization-wide data lake that:
- Automatically provisions with your Fabric tenant
- Uses Delta Lake format by default
- Enables shortcuts to existing ADLS, S3, or Dataverse data
- Applies consistent governance across all data
# In a Fabric notebook, all data is accessible via OneLake paths
df = spark.read.format("delta").load("abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse/Tables/sales")
# You don't need to manage storage accounts or credentials
# OneLake handles it all
The shortcut feature is clever - you can create references to data in existing storage without copying it. This enables gradual migration without disruption.
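To make the shortcut idea concrete, here is a hedged sketch of how shortcut data reads from a notebook. The workspace, lakehouse, and shortcut names (`sales_ws`, `lakehouse`, `s3_sales`) are illustrative; the shortcut itself is created in the Lakehouse UI or via the REST API, after which it reads like any other OneLake folder:

```python
# Hypothetical example: reading S3 data through a OneLake shortcut.
# All names below are assumptions; substitute your own workspace,
# lakehouse, and shortcut names.
def onelake_path(workspace: str, lakehouse: str, shortcut: str) -> str:
    """Build the ABFS path OneLake exposes for a Lakehouse shortcut."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}/Files/{shortcut}"
    )

path = onelake_path("sales_ws", "lakehouse", "s3_sales")
# In a Fabric notebook, the shortcut reads like any other OneLake folder:
# df = spark.read.format("delta").load(path)
```

The key point: the S3 data is never copied. Spark reads it through OneLake's path virtualization, so governance and lineage stay in one place.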
The Lakehouse Experience
Fabric’s Lakehouse combines the best of data lakes and data warehouses:
Write with Spark:
# Data engineering in notebooks
from pyspark.sql.functions import col, year, month
raw_df = spark.read.format("delta").load("Files/raw/sales/")
transformed = raw_df \
    .withColumn("year", year(col("sale_date"))) \
    .withColumn("month", month(col("sale_date"))) \
    .filter(col("amount") > 0)
transformed.write \
    .mode("overwrite") \
    .format("delta") \
    .saveAsTable("sales_curated")  # registered under Tables/ automatically
Query with SQL:
-- SQL analytics endpoint provides instant T-SQL access
-- No data copy, same Delta tables
SELECT
    year,
    month,
    SUM(amount) AS total_sales,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM sales_curated
GROUP BY year, month
ORDER BY year, month;
Visualize in Power BI:
Power BI connects directly to the Lakehouse via the SQL endpoint and the new Direct Lake mode. Changes to your Lakehouse tables appear in reports without an import refresh.
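The SQL endpoint speaks the standard SQL Server (TDS) protocol, so any SQL Server client can query it, not just Power BI. A hedged Python sketch; the server name and auth mode are assumptions, so copy the real endpoint address from your Lakehouse settings page:

```python
# Sketch: connecting to the Lakehouse SQL endpoint from Python.
# The server name below is a placeholder; interactive Azure AD auth
# is one of several supported modes.
def endpoint_conn_str(server: str, database: str) -> str:
    """Build an ODBC connection string for the Fabric SQL endpoint."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};Database={database};"
        "Authentication=ActiveDirectoryInteractive;Encrypt=yes;"
    )

conn = endpoint_conn_str(
    "myendpoint.datawarehouse.fabric.microsoft.com", "lakehouse"
)
# import pyodbc
# with pyodbc.connect(conn) as cx:
#     rows = cx.execute("SELECT TOP 5 * FROM sales_curated").fetchall()
```

This is read-only against Lakehouse tables, which is exactly what you want for ad-hoc analysis tools sitting outside Fabric.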
Data Factory Redesigned
The Data Factory experience in Fabric is modernized:
Data Pipelines:
- Familiar orchestration concepts from ADF
- Native integration with Fabric artifacts
- Simplified linked services (OneLake handles connections)
Dataflows Gen2:
- Power Query-based transformations
- Direct output to Lakehouse tables
- Improved performance with Lakehouse staging
// Dataflow Gen2 M expression
let
    Source = Web.Contents("https://api.example.com/sales"),
    Data = Json.Document(Source),
    ToTable = Table.FromList(Data, Splitter.SplitByNothing()),
    Expanded = Table.ExpandRecordColumn(ToTable, "Column1", {"id", "date", "amount", "customer_id"})
in
    Expanded
// Output directly to: Lakehouse > Tables > raw_sales
Data Warehouse
Fabric includes a cloud-native data warehouse with:
- Full T-SQL support
- Automatic distribution and indexing
- Cross-database queries
- No cluster management
-- Create a warehouse table with no infrastructure concern
CREATE TABLE dim_customer (
    customer_key INT NOT NULL,
    customer_id NVARCHAR(50) NOT NULL,
    customer_name NVARCHAR(200),
    segment NVARCHAR(50),
    created_date DATE
);
-- Load from Lakehouse using COPY
COPY INTO dim_customer
FROM 'abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse/Tables/customers'
WITH (FILE_TYPE = 'PARQUET');
Real-Time Analytics
For streaming and time-series workloads, Real-Time Analytics provides:
- KQL (Kusto Query Language) databases
- Sub-second ingestion latency
- Direct integration with Event Hubs and Kafka
- Real-time dashboards
// KQL query for streaming data analysis
sales_events
| where ingestion_time() > ago(1h)
| summarize
    event_count = count(),
    total_amount = sum(amount),
    avg_amount = avg(amount)
    by bin(event_time, 5m)
| render timechart
Pricing Model
Fabric uses a unified capacity model:
- A single capacity measure - Capacity Units (CUs) - spans all workloads
- No separate compute charges per engine (OneLake storage is metered separately)
- Pause/resume capacity for cost control
- Auto-scale within capacity limits
This is a significant simplification. Instead of managing:
- Azure Synapse dedicated pool DWUs
- Databricks DBUs
- Power BI Premium capacity
- Azure Data Factory DIUs
You manage one capacity that covers everything.
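Since every engine draws from the same budget, capacity planning reduces to one back-of-envelope calculation. A sketch, with illustrative numbers (the F64 SKU size and the consumption figure are assumptions, not published benchmarks):

```python
# Back-of-envelope capacity math: one pool of Capacity Units (CUs)
# shared by all Fabric engines. Numbers are illustrative only.
def cu_seconds_available(capacity_cus: int, hours: float) -> float:
    """Total CU-seconds a capacity provides over a time window."""
    return capacity_cus * hours * 3600

def utilization(consumed_cu_seconds: float, capacity_cus: int, hours: float) -> float:
    """Fraction of the capacity's budget consumed in the window."""
    return consumed_cu_seconds / cu_seconds_available(capacity_cus, hours)

# A hypothetical F64 capacity over a 24-hour day:
budget = cu_seconds_available(64, 24)        # 5,529,600 CU-seconds
half_used = utilization(2_764_800, 64, 24)   # 0.5
```

Compare that to reconciling DWUs, DBUs, DIUs, and Premium capacity separately: one number against one budget is a real operational win, provided the pricing holds up at scale.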
What I Like
Unified Experience: One portal, one security model, one capacity. The cognitive load reduction is significant.
OneLake: A single data lake with consistent governance is what enterprises have been building manually for years.
Lakehouse Architecture: Write with Spark, query with SQL, visualize with Power BI - on the same data, no copying.
SaaS Model: No cluster management, auto-updates, built-in security.
Concerns
Maturity: This is a preview. Some features have rough edges; others are incomplete.
Vendor Lock-in: Fabric is deeply Microsoft-native. Multi-cloud strategies become harder.
Migration Complexity: Existing Azure data platforms won’t migrate overnight.
Pricing at Scale: Need to see how capacity pricing works for large workloads.
Migration Thinking
If you’re running:
- Power BI Premium + Azure Synapse: Fabric is a natural consolidation
- Pure Databricks: Stay there; the Spark experience is more mature
- Azure Data Factory + ADLS + Synapse Serverless: Fabric simplifies this significantly
For new projects, I’d start in Fabric if your organization is Microsoft-centric.
Getting Started
Fabric is available in preview:
- Enable Fabric in your Power BI admin portal
- Create a Fabric capacity (or use trial)
- Create a workspace with Fabric enabled
- Start with a Lakehouse - it’s the foundation
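Once the Lakehouse exists, a first notebook cell is a good smoke test. A minimal sketch, assuming a Fabric notebook attached to the new Lakehouse (where `spark` is pre-provided; table and column names are illustrative):

```python
from datetime import date

# Illustrative seed data for a first Delta table:
rows = [(1, date(2023, 5, 1), 120.0), (2, date(2023, 5, 2), 89.5)]
cols = ["id", "sale_date", "amount"]

# In the notebook, `spark` is provided by the Fabric runtime:
# df = spark.createDataFrame(rows, cols)
# df.write.mode("overwrite").format("delta").saveAsTable("first_sales")
```

The table lands under Tables/ in the Lakehouse and is immediately visible to the SQL endpoint and Power BI, which makes it a quick end-to-end check of the whole loop.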
My Take
Microsoft Fabric is Microsoft’s answer to the modern data platform. It’s ambitious, it’s opinionated, and it’s deeply integrated. For organizations committed to the Microsoft ecosystem, this could be the platform that finally delivers the “single pane of glass” promise.
I’m cautiously optimistic. The architecture is sound. The execution in preview is promising. The real test will be production workloads at scale.
More deep dives coming as I explore each component.