Microsoft Fabric: The Biggest Data Launch Since SQL Server
At Microsoft Build 2023, Satya Nadella announced Microsoft Fabric with a bold claim: “perhaps the biggest launch of a data product from Microsoft since the launch of SQL Server.” Having spent the past 48 hours exploring the preview, I think he might be right.
What is Microsoft Fabric?
Microsoft Fabric is a unified SaaS analytics platform that integrates:
- Power BI - Business intelligence and visualization
- Data Factory - Data integration and ETL
- Synapse Data Engineering - Spark-based data engineering
- Synapse Data Warehouse - SQL-based warehousing
- Synapse Data Science - ML and data science workloads
- Synapse Real-Time Analytics - Streaming and time-series data
- Data Activator - Event-driven actions and alerts
All of this sits on top of OneLake - a single, unified data lake for your entire organization.
Why This Matters
The promise is simple: one platform, one experience, one security model, one governance layer, one capacity model.
If you’ve built data platforms on Azure, you know the pain:
- Separate services with different permission models
- Data copied between storage accounts
- Multiple admin experiences
- Complex capacity planning
Fabric aims to solve all of this.
OneLake: The Foundation
OneLake is the most significant architectural decision. It’s a single, organization-wide data lake that:
- Automatically provisions with your Fabric tenant
- Uses Delta Lake format by default
- Enables shortcuts to existing ADLS, S3, or Dataverse data
- Applies consistent governance across all data
# In a Fabric notebook, all data is accessible via OneLake paths
df = spark.read.format("delta").load("abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse/Tables/sales")
# You don't need to manage storage accounts or credentials
# OneLake handles it all
The shortcut feature is clever - you can create references to data in existing storage without copying it. This enables gradual migration without disruption.
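To make the shortcut idea concrete, here is a hedged sketch of how shortcut data reads from a notebook. The workspace, lakehouse, and shortcut names (`sales_ws`, `lakehouse`, `s3_sales`) are illustrative; the shortcut itself is created in the Lakehouse UI or via the REST API, after which it reads like any other OneLake folder:

```python
# Hypothetical example: reading S3 data through a OneLake shortcut.
# All names below are assumptions; substitute your own workspace,
# lakehouse, and shortcut names.
def onelake_path(workspace: str, lakehouse: str, shortcut: str) -> str:
    """Build the ABFS path OneLake exposes for a Lakehouse shortcut."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}/Files/{shortcut}"
    )

path = onelake_path("sales_ws", "lakehouse", "s3_sales")
# In a Fabric notebook, the shortcut reads like any other OneLake folder:
# df = spark.read.format("delta").load(path)
```

The key point: the S3 data is never copied. Spark reads it through OneLake's path virtualization, so governance and lineage stay in one place.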
The Lakehouse Experience
Fabric’s Lakehouse combines the best of data lakes and data warehouses:
Write with Spark:
# Data engineering in notebooks
from pyspark.sql.functions import col, year, month
raw_df = spark.read.format("delta").load("Files/raw/sales/")
transformed = raw_df \
    .withColumn("year", year(col("sale_date"))) \
    .withColumn("month", month(col("sale_date"))) \
    .filter(col("amount") > 0)
transformed.write \
    .mode("overwrite") \
    .format("delta") \
    .saveAsTable("sales_curated")  # registered under Tables/ automatically
Query with SQL:
-- SQL analytics endpoint provides instant T-SQL access
-- No data copy, same Delta tables
SELECT
    year,
    month,
    SUM(amount) AS total_sales,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM sales_curated
GROUP BY year, month
ORDER BY year, month;
Visualize in Power BI:
Power BI connects directly to the Lakehouse via the SQL endpoint and the new Direct Lake mode. Changes to your Lakehouse tables appear in reports without an import refresh.
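The SQL endpoint speaks the standard SQL Server (TDS) protocol, so any SQL Server client can query it, not just Power BI. A hedged Python sketch; the server name and auth mode are assumptions, so copy the real endpoint address from your Lakehouse settings page:

```python
# Sketch: connecting to the Lakehouse SQL endpoint from Python.
# The server name below is a placeholder; interactive Azure AD auth
# is one of several supported modes.
def endpoint_conn_str(server: str, database: str) -> str:
    """Build an ODBC connection string for the Fabric SQL endpoint."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};Database={database};"
        "Authentication=ActiveDirectoryInteractive;Encrypt=yes;"
    )

conn = endpoint_conn_str(
    "myendpoint.datawarehouse.fabric.microsoft.com", "lakehouse"
)
# import pyodbc
# with pyodbc.connect(conn) as cx:
#     rows = cx.execute("SELECT TOP 5 * FROM sales_curated").fetchall()
```

This is read-only against Lakehouse tables, which is exactly what you want for ad-hoc analysis tools sitting outside Fabric.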
Data Factory Redesigned
The Data Factory experience in Fabric is modernized:
Data Pipelines:
- Familiar orchestration concepts from ADF
- Native integration with Fabric artifacts
- Simplified linked services (OneLake handles connections)
Dataflows Gen2:
- Power Query-based transformations
- Direct output to Lakehouse tables
- Improved performance with Lakehouse staging
// Dataflow Gen2 M expression
let
    Source = Web.Contents("https://api.example.com/sales"),
    Data = Json.Document(Source),
    ToTable = Table.FromList(Data, Splitter.SplitByNothing()),
    Expanded = Table.ExpandRecordColumn(ToTable, "Column1", {"id", "date", "amount", "customer_id"})
in
    Expanded
// Output directly to: Lakehouse > Tables > raw_sales
Data Warehouse
Fabric includes a cloud-native data warehouse with:
- Full T-SQL support
- Automatic distribution and indexing
- Cross-database queries
- No cluster management
-- Create a warehouse table with no infrastructure concern
CREATE TABLE dim_customer (
    customer_key INT NOT NULL,
    customer_id NVARCHAR(50) NOT NULL,
    customer_name NVARCHAR(200),
    segment NVARCHAR(50),
    created_date DATE
);
-- Load from Lakehouse using COPY
COPY INTO dim_customer
FROM 'abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse/Tables/customers'
WITH (FILE_TYPE = 'PARQUET');
Real-Time Analytics
For streaming and time-series workloads, Real-Time Analytics provides:
- KQL (Kusto Query Language) databases
- Sub-second ingestion latency
- Direct integration with Event Hubs and Kafka
- Real-time dashboards
// KQL query for streaming data analysis
sales_events
| where ingestion_time() > ago(1h)
| summarize
    event_count = count(),
    total_amount = sum(amount),
    avg_amount = avg(amount)
    by bin(event_time, 5m)
| render timechart
Pricing Model
Fabric uses a unified capacity model:
- A single capacity measure - Capacity Units (CUs) - spans all workloads
- No separate compute charges per engine (OneLake storage is metered separately)
- Pause/resume capacity for cost control
- Auto-scale within capacity limits
This is a significant simplification. Instead of managing:
- Azure Synapse dedicated pool DWUs
- Databricks DBUs
- Power BI Premium capacity
- Azure Data Factory DIUs
You manage one capacity that covers everything.
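Since every engine draws from the same budget, capacity planning reduces to one back-of-envelope calculation. A sketch, with illustrative numbers (the F64 SKU size and the consumption figure are assumptions, not published benchmarks):

```python
# Back-of-envelope capacity math: one pool of Capacity Units (CUs)
# shared by all Fabric engines. Numbers are illustrative only.
def cu_seconds_available(capacity_cus: int, hours: float) -> float:
    """Total CU-seconds a capacity provides over a time window."""
    return capacity_cus * hours * 3600

def utilization(consumed_cu_seconds: float, capacity_cus: int, hours: float) -> float:
    """Fraction of the capacity's budget consumed in the window."""
    return consumed_cu_seconds / cu_seconds_available(capacity_cus, hours)

# A hypothetical F64 capacity over a 24-hour day:
budget = cu_seconds_available(64, 24)        # 5,529,600 CU-seconds
half_used = utilization(2_764_800, 64, 24)   # 0.5
```

Compare that to reconciling DWUs, DBUs, DIUs, and Premium capacity separately: one number against one budget is a real operational win, provided the pricing holds up at scale.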
What I Like
Unified Experience: One portal, one security model, one capacity. The cognitive load reduction is significant.
OneLake: A single data lake with consistent governance is what enterprises have been building manually for years.
Lakehouse Architecture: Write with Spark, query with SQL, visualize with Power BI - on the same data, no copying.
SaaS Model: No cluster management, auto-updates, built-in security.
Concerns
Maturity: This is a preview. Some features have rough edges; others are incomplete.
Vendor Lock-in: Fabric is deeply Microsoft-native. Multi-cloud strategies become harder.
Migration Complexity: Existing Azure data platforms won’t migrate overnight.
Pricing at Scale: Need to see how capacity pricing works for large workloads.
Migration Thinking
If you’re running:
- Power BI Premium + Azure Synapse: Fabric is a natural consolidation
- Pure Databricks: Stay there; the Spark experience is more mature
- Azure Data Factory + ADLS + Synapse Serverless: Fabric simplifies this significantly
For new projects, I’d start in Fabric if your organization is Microsoft-centric.
Getting Started
Fabric is available in preview:
- Enable Fabric in your Power BI admin portal
- Create a Fabric capacity (or use trial)
- Create a workspace with Fabric enabled
- Start with a Lakehouse - it’s the foundation
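Once the Lakehouse exists, a first notebook cell is a good smoke test. A minimal sketch, assuming a Fabric notebook attached to the new Lakehouse (where `spark` is pre-provided; table and column names are illustrative):

```python
from datetime import date

# Illustrative seed data for a first Delta table:
rows = [(1, date(2023, 5, 1), 120.0), (2, date(2023, 5, 2), 89.5)]
cols = ["id", "sale_date", "amount"]

# In the notebook, `spark` is provided by the Fabric runtime:
# df = spark.createDataFrame(rows, cols)
# df.write.mode("overwrite").format("delta").saveAsTable("first_sales")
```

The table lands under Tables/ in the Lakehouse and is immediately visible to the SQL endpoint and Power BI, which makes it a quick end-to-end check of the whole loop.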
My Take
Microsoft Fabric is Microsoft’s answer to the modern data platform. It’s ambitious, it’s opinionated, and it’s deeply integrated. For organizations committed to the Microsoft ecosystem, this could be the platform that finally delivers the “single pane of glass” promise.
I’m cautiously optimistic. The architecture is sound. The execution in preview is promising. The real test will be production workloads at scale.
More deep dives coming as I explore each component.