OneLake: The Foundation of Microsoft Fabric
OneLake is the most architecturally significant component of Microsoft Fabric. It represents Microsoft’s answer to the fragmented storage landscape that has plagued enterprise data platforms. Today, I will explore what OneLake is, how it works, and why it matters.
What is OneLake?
OneLake is a single, unified, logical data lake for your entire organization. Think of it as “OneDrive for data” - automatically provisioned when you enable Fabric, with no storage accounts to create or manage.
# Traditional Azure Storage Model
# Multiple storage accounts, multiple configurations
storage_accounts = [
    "adlsrawdata",    # Raw data landing
    "adlscurated",    # Curated/transformed data
    "adlsserving",    # Serving layer
    "adlsml",         # ML artifacts
]
# OneLake Model
# One logical lake, organized by workspaces
onelake = {
    "organization": "contoso",
    "endpoint": "onelake.dfs.fabric.microsoft.com",   # one endpoint for the whole tenant
    "workspaces": [
        "Sales Analytics",
        "Marketing Data",
        "Finance Reporting",
        "Data Science Lab"
    ]
}
OneLake Architecture
┌─────────────────────────────────────────────────────────────┐
│                           OneLake                            │
│                    (Organization Level)                      │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
│  │  Workspace  │   │  Workspace  │   │  Workspace  │        │
│  │    Sales    │   │  Marketing  │   │   Finance   │        │
│  ├─────────────┤   ├─────────────┤   ├─────────────┤        │
│  │ Lakehouse A │   │ Lakehouse C │   │ Warehouse E │        │
│  │ Lakehouse B │   │ Lakehouse D │   │ Lakehouse F │        │
│  └─────────────┘   └─────────────┘   └─────────────┘        │
├─────────────────────────────────────────────────────────────┤
│                      Delta Lake Format                       │
│                 (Parquet + Transaction Log)                  │
└─────────────────────────────────────────────────────────────┘
Key OneLake Features
1. Automatic Provisioning
# No infrastructure code needed
# OneLake is automatically available when Fabric is enabled
# Access pattern for Spark
lakehouse_path = "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables/sales"
# Read data - no storage account keys, no SAS tokens
df = spark.read.format("delta").load(lakehouse_path)
2. Delta Lake by Default
All tabular data in OneLake is stored in the open Delta Lake format by default:
# Write data to OneLake - automatically uses Delta
df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("customers")   # created as a managed Delta table under Tables/
# Benefits of Delta Lake:
# - ACID transactions
# - Schema enforcement
# - Time travel
# - Efficient updates (MERGE)
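The last two benefits are easy to demonstrate. Below is a minimal sketch of time travel and a MERGE-based upsert against the same table; the staging DataFrame updates_df and the customer_id key are assumptions for illustration:
# Time travel: read the table as it looked at an earlier version
df_v0 = spark.read.format("delta") \
    .option("versionAsOf", 0) \
    .load("Tables/customers")
# Efficient upsert with MERGE (updates_df is a hypothetical staging DataFrame)
from delta.tables import DeltaTable
customers = DeltaTable.forName(spark, "customers")
customers.alias("t").merge(
        updates_df.alias("s"),
        "t.customer_id = s.customer_id"
    ) \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .execute()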
3. Unified Security
# Security is managed through Fabric workspace roles
# No need to configure:
# - Storage account RBAC
# - ACLs on folders
# - SAS token policies
# Workspace roles map to data access:
workspace_roles = {
    "Admin": "Full control of workspace and all items",
    "Member": "Edit all items, share items",
    "Contributor": "Edit all items",
    "Viewer": "View all items, cannot edit"
}
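Role assignments themselves can be automated. The sketch below adds a member to a workspace through the Fabric REST API; the endpoint and payload shape are my assumptions from the public API surface, so verify them against the current documentation, and the GUIDs and token are placeholders:
import requests
# Hypothetical sketch: add a workspace role assignment via the Fabric REST API
workspace_id = "<workspace-guid>"      # placeholder
token = "<entra-id-access-token>"      # e.g. obtained with azure-identity
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/roleAssignments",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "principal": {"id": "<user-object-id>", "type": "User"},
        "role": "Contributor"          # one of the roles listed above
    }
)
resp.raise_for_status()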
4. Shortcuts
Shortcuts are a game-changing feature that lets you reference external data without copying it:
# Create a shortcut to existing ADLS Gen2 data
# This appears as a folder in your Lakehouse but data stays in place
shortcut_definition = {
    "name": "external_sales",
    "target": {
        "adlsGen2": {
            "location": "https://existingstorageaccount.dfs.core.windows.net/",
            "path": "/raw/sales/"
        }
    }
}
# After creating the shortcut, access it like native OneLake data
df = spark.read.format("delta").load("Files/external_sales/")
Shortcuts support:
- Azure Data Lake Storage Gen2
- Amazon S3
- Google Cloud Storage (coming)
- Dataverse
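Shortcuts are usually created through the Lakehouse UI, but they can also be scripted. A hedged sketch against the OneLake shortcuts REST API follows; the endpoint, the adlsGen2 target field names, and the connection GUID are assumptions to check against the current documentation:
import requests
# Hypothetical sketch: create the ADLS Gen2 shortcut defined above via the REST API
workspace_id = "<workspace-guid>"        # placeholder
lakehouse_id = "<lakehouse-item-guid>"   # placeholder
token = "<entra-id-access-token>"
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "Files",                 # where the shortcut appears in the Lakehouse
        "name": "external_sales",
        "target": {
            "adlsGen2": {
                "location": "https://existingstorageaccount.dfs.core.windows.net",
                "subpath": "/raw/sales",
                "connectionId": "<connection-guid>"   # existing cloud connection
            }
        }
    }
)
resp.raise_for_status()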
OneLake File Structure
# Lakehouse structure in OneLake
workspace/
└── lakehouse.Lakehouse/
    ├── Tables/                  # Managed Delta tables
    │   ├── customers/
    │   │   ├── _delta_log/
    │   │   └── *.parquet
    │   └── orders/
    │       ├── _delta_log/
    │       └── *.parquet
    └── Files/                   # Unmanaged files (any format)
        ├── raw/
        │   └── data.csv
        └── staging/
            └── temp.json
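From a notebook attached to the Lakehouse you can browse this structure programmatically; here is a small sketch using mssparkutils, which ships with the Fabric Spark runtime (the relative paths assume a default Lakehouse is attached):
from notebookutils import mssparkutils
# List managed Delta tables
for entry in mssparkutils.fs.ls("Tables/"):
    print("table:", entry.name)
# List unmanaged files and folders
for entry in mssparkutils.fs.ls("Files/"):
    print("file or folder:", entry.name)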
Accessing OneLake
From Spark Notebooks
# Relative paths within the Lakehouse
df = spark.read.table("sales")
# Absolute OneLake paths
df = spark.read.format("delta").load(
    "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables/sales"
)
From T-SQL (SQL Endpoint)
-- Lakehouse tables appear automatically in the SQL endpoint
SELECT * FROM lakehouse.dbo.sales;
-- Query across multiple Lakehouses
SELECT * FROM lakehouse1.dbo.customers c
JOIN lakehouse2.dbo.orders o ON c.customer_id = o.customer_id;
From Power BI
// Direct Lake mode - no import, no DirectQuery limitations
// Power BI reads directly from Delta tables in OneLake
From External Tools
# Use the Azure Storage SDK with the OneLake endpoint
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential()
)
# Access the workspace as a container, the lakehouse as a directory
file_system = service.get_file_system_client("workspace-name")
directory = file_system.get_directory_client("lakehouse.Lakehouse/Files")
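Continuing that sketch, the same SDK can enumerate and download files; get_paths and download_file are standard azure-storage-file-datalake calls, and the file path here is illustrative:
# Enumerate everything beneath the Lakehouse's Files folder
for p in file_system.get_paths(path="lakehouse.Lakehouse/Files"):
    print(p.name, "(dir)" if p.is_directory else f"{p.content_length} bytes")
# Download a single file
file_client = directory.get_file_client("raw/data.csv")
content = file_client.download_file().readall()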
Migration Considerations
If you are moving from ADLS Gen2 to OneLake:
# Option 1: Use shortcuts (no data movement)
# Best for: Large datasets, gradual migration
# Option 2: Copy data using pipelines
# Best for: Clean break, new governance
# Option 3: Hybrid approach
# Use shortcuts for historical data
# Write new data directly to OneLake
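As a minimal sketch of option 2, a Spark copy from ADLS Gen2 into OneLake might look like this; the account, container, and table names are placeholders, and the session must already be able to authenticate to the source storage account:
# Read the historical dataset directly from ADLS Gen2
source_path = "abfss://raw@existingstorageaccount.dfs.core.windows.net/sales/"
history_df = spark.read.format("parquet").load(source_path)
# Land it in OneLake as a managed Delta table in the attached Lakehouse
history_df.write.format("delta").mode("overwrite").saveAsTable("sales_history")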
Best Practices
- Organize by Workspace: Each business domain gets its own workspace
- Use Tables for Structured Data: Leverage Delta table management
- Use Files for Landing Zones: Raw files before transformation
- Leverage Shortcuts: Avoid copying data when possible
- Plan for Cross-Workspace Access: Use workspace roles carefully
OneLake is the foundation that makes Fabric’s unified experience possible. Tomorrow, I will explore the Lakehouse - the primary artifact you will build on top of OneLake.