OneLake: The Foundation of Microsoft Fabric
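I wrote this post to share practical, production-minded guidance on OneLake: what it is, how it is organized, how to read and write it from Spark, T-SQL, Power BI, and external tools, and how to migrate existing ADLS Gen2 data into it.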
What is OneLake?
OneLake is a single, unified, logical data lake for your entire organization; each Fabric tenant gets exactly one. Think of it as “OneDrive for data”: it is provisioned automatically when you enable Fabric, with no storage accounts to create or manage.
# Traditional Azure Storage Model
# Multiple storage accounts, multiple configurations
storage_accounts = [
    "adlsrawdata",  # Raw data landing
    "adlscurated",  # Curated/transformed data
    "adlsserving",  # Serving layer
    "adlsml",       # ML artifacts
]
# OneLake Model
# One logical lake per tenant, organized by workspaces
onelake = {
    # A single global endpoint: the tenant comes from your Microsoft Entra ID
    # sign-in, not from a per-organization host name
    "endpoint": "onelake.dfs.fabric.microsoft.com",
    "workspaces": [
        "Sales Analytics",
        "Marketing Data",
        "Finance Reporting",
        "Data Science Lab",
    ],
}
OneLake Architecture
┌─────────────────────────────────────────────────────────────┐
│                           OneLake                           │
│                    (Organization Level)                     │
├─────────────────────────────────────────────────────────────┤
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │
│    │ Workspace   │    │ Workspace   │    │ Workspace   │    │
│    │ Sales       │    │ Marketing   │    │ Finance     │    │
│    ├─────────────┤    ├─────────────┤    ├─────────────┤    │
│    │ Lakehouse A │    │ Lakehouse C │    │ Warehouse E │    │
│    │ Lakehouse B │    │ Lakehouse D │    │ Lakehouse F │    │
│    └─────────────┘    └─────────────┘    └─────────────┘    │
├─────────────────────────────────────────────────────────────┤
│                      Delta Lake Format                      │
│                 (Parquet + Transaction Log)                 │
└─────────────────────────────────────────────────────────────┘
Key OneLake Features
1. Automatic Provisioning
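# No infrastructure code needed:
# OneLake is available as soon as Fabric is enabled
# Access pattern for Spark ("workspace" and "lakehouse" are placeholders for your own names)
lakehouse_path = "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables/sales"
# Read data with your Microsoft Entra ID identity: no storage account keys, no SAS tokens
# (`spark` is the SparkSession that Fabric notebooks create for you)
df = spark.read.format("delta").load(lakehouse_path)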
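Because every item lives under the same endpoint, a OneLake address reduces to three parts: workspace, item, and a path inside the item. The helper below is a hypothetical convenience (`onelake_uri` is not part of any Fabric SDK) that assembles the two URI shapes used in this post, the `abfss://` form for Spark and the `https://` form for ADLS Gen2 clients. It handles name-based paths only; OneLake also accepts workspace and item GUIDs, in which case the `.Lakehouse`-style suffix is not used.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str,
                path: str = "", scheme: str = "abfss") -> str:
    """Build a name-based OneLake URI such as .../lakehouse.Lakehouse/Tables/sales."""
    relative = path.strip("/")
    item_segment = f"{item}.{item_type}"  # e.g. "lakehouse.Lakehouse"
    if scheme == "abfss":
        # Spark form: the workspace sits in the "container" slot before the @
        base = f"abfss://{workspace}@{ONELAKE_HOST}/{item_segment}"
    elif scheme == "https":
        # DFS REST/SDK form: the workspace is the first path segment
        base = f"https://{ONELAKE_HOST}/{workspace}/{item_segment}"
    else:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"{base}/{relative}" if relative else base


print(onelake_uri("workspace", "lakehouse", "Lakehouse", "Tables/sales"))
# abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables/sales
```

Generating paths from one function keeps notebooks from drifting into slightly different hand-typed variants of the same address.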
2. Delta Lake by Default
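Tables in OneLake, both Lakehouse tables and Warehouse tables, are stored in Delta Lake format; the Lakehouse Files area can still hold files in any format: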
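# Write a managed table to the Lakehouse in Delta format
# (Delta is also the default table format in Fabric Spark)
df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("customers")  # a table name, not a path; stored under Tables/customers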
# Benefits of Delta Lake:
# - ACID transactions
# - Schema enforcement
# - Time travel
# - Efficient updates (MERGE)
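ACID reads and time travel both come from the `_delta_log` folder: each commit is a numbered JSON file of newline-delimited actions, and a table version is the set of data files added and not yet removed up to that commit. The sketch below replays simplified `add` and `remove` actions from in-memory commit strings to reconstruct the snapshot at a given version. It deliberately ignores checkpoints, deletion vectors, extra action fields, and the other action types a real Delta reader must handle; in a notebook you would use Spark's `versionAsOf` read option instead.

```python
import json


def snapshot_files(commits: list[str], version: int) -> set[str]:
    """Return the data files that make up the table at `version`.

    commits[i] holds the text of _delta_log/{i:020d}.json: one JSON action per line.
    """
    if not 0 <= version < len(commits):
        raise ValueError(f"version {version} not in log")
    active: set[str] = set()
    for commit in commits[: version + 1]:  # replay commits in order
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


# Version 0 writes two files; version 1 rewrites one of them (as a MERGE might)
log = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-001.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
]
print(sorted(snapshot_files(log, 0)))  # ['part-000.parquet', 'part-001.parquet']
print(sorted(snapshot_files(log, 1)))  # ['part-000.parquet', 'part-002.parquet']
```

Because old files are only marked as removed, not deleted, reading version 0 after version 1 is committed still works until a vacuum cleans them up.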
3. Unified Security
# Access is governed by Fabric permissions (workspace roles,
# plus item-level sharing) rather than by storage-account settings
# No need to configure:
# - Storage account RBAC
# - ACLs on folders
# - SAS token policies
# Workspace roles map to data access:
workspace_roles = {
"Admin": "Full control of workspace and all items",
"Member": "Edit all items, share items",
"Contributor": "Edit all items",
"Viewer": "View all items, cannot edit"
}
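Because roles are assigned per workspace, an access check reduces to two questions: which role does this principal hold in this workspace, and does that role include the action? The sketch below encodes the simplified role table above as capability sets. The capability names and the `can` helper are hypothetical, and real Fabric permissions are more granular (item sharing and SQL permissions layer on top of workspace roles).

```python
ROLE_CAPABILITIES = {
    "Admin": {"view", "edit", "share", "manage_workspace"},
    "Member": {"view", "edit", "share"},
    "Contributor": {"view", "edit"},
    "Viewer": {"view"},
}


def can(assignments: dict[str, str], principal: str, action: str) -> bool:
    """assignments maps principal -> role within a single workspace."""
    role = assignments.get(principal)
    if role is None:
        return False  # no workspace role: no access through this workspace
    return action in ROLE_CAPABILITIES[role]


sales_workspace = {"alice@contoso.com": "Member", "bob@contoso.com": "Viewer"}
print(can(sales_workspace, "alice@contoso.com", "share"))  # True
print(can(sales_workspace, "bob@contoso.com", "edit"))     # False
print(can(sales_workspace, "carol@contoso.com", "view"))   # False
```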
4. Shortcuts
Shortcuts let you reference data that lives elsewhere, in another OneLake location or in external storage, without copying it:
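# Create a shortcut to existing ADLS Gen2 data
# This appears as a folder in your Lakehouse but data stays in place
# The dict mirrors the request body of the Fabric REST API "Create Shortcut" call
shortcut_definition = {
    "path": "Files",  # where the shortcut is created inside the Lakehouse
    "name": "external_sales",
    "target": {
        "adlsGen2": {
            "location": "https://existingstorageaccount.dfs.core.windows.net",
            "subpath": "/raw/sales",
            "connectionId": "<connection-id>",  # placeholder: a Fabric connection to the account
        }
    },
}
# After creating the shortcut, access it like native OneLake data
# (this read assumes the target folder holds a Delta table)
df = spark.read.format("delta").load("Files/external_sales/")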
Shortcuts support:
- Azure Data Lake Storage Gen2
- Amazon S3
- Google Cloud Storage (coming)
- Dataverse
- OneLake itself (internal shortcuts to other Fabric items)
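Conceptually, a shortcut is a pointer: reads under the shortcut folder are redirected to the target, so nothing is copied. The sketch below models that redirection with a plain dictionary. It is a hypothetical illustration of the idea, not how Fabric implements shortcuts; the storage account and folder are the example values from the shortcut definition above.

```python
from typing import Optional

# Hypothetical shortcut table: Lakehouse-relative folder -> target folder URL
SHORTCUTS = {
    "Files/external_sales": "https://existingstorageaccount.dfs.core.windows.net/raw/sales",
}


def resolve(lakehouse_path: str) -> Optional[str]:
    """Return where the bytes for a Lakehouse path live, or None if stored in OneLake."""
    normalized = lakehouse_path.strip("/")
    for prefix, target in SHORTCUTS.items():
        # Match the folder itself or anything below it, but not "external_sales_old"
        if normalized == prefix or normalized.startswith(prefix + "/"):
            return target + normalized[len(prefix):]
    return None


print(resolve("Files/external_sales/2024/part-0.parquet"))
# https://existingstorageaccount.dfs.core.windows.net/raw/sales/2024/part-0.parquet
print(resolve("Files/raw/data.csv"))  # None
```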
OneLake File Structure
# Lakehouse structure in OneLake
workspace/
└── lakehouse.Lakehouse/
├── Tables/ # Managed Delta tables
│ ├── customers/
│ │ ├── _delta_log/
│ │ └── *.parquet
│ └── orders/
│ ├── _delta_log/
│ └── *.parquet
└── Files/ # Unmanaged files (any format)
├── raw/
│ └── data.csv
└── staging/
└── temp.json
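The layout makes Delta tables easy to recognize: a folder is a Delta table when it contains a `_delta_log` subfolder. As a sketch, the function below applies that rule to a local copy of a Lakehouse (for example, one downloaded with the Azure Storage SDK). `find_delta_tables` is a hypothetical helper, and it treats any folder with a `_delta_log` as a table, whether it sits under Tables or Files.

```python
import tempfile
from pathlib import Path


def find_delta_tables(root: Path) -> list[str]:
    """Return root-relative folders that contain a _delta_log directory."""
    tables = []
    for log_dir in root.rglob("_delta_log"):
        if log_dir.is_dir():
            tables.append(log_dir.parent.relative_to(root).as_posix())
    return sorted(tables)


# Build a throwaway tree that mirrors the structure above
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "lakehouse.Lakehouse"
    for table in ("customers", "orders"):
        (root / "Tables" / table / "_delta_log").mkdir(parents=True)
    (root / "Files" / "raw").mkdir(parents=True)
    (root / "Files" / "raw" / "data.csv").write_text("id,name\n")
    print(find_delta_tables(root))  # ['Tables/customers', 'Tables/orders']
```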
Accessing OneLake
From Spark Notebooks
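# Table name, resolved against the notebook's default Lakehouse
df = spark.read.table("sales")
# Relative path within the default Lakehouse
df = spark.read.format("delta").load("Tables/sales")
# Absolute OneLake path (useful for Lakehouses other than the default one)
df = spark.read.format("delta").load(
    "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables/sales"
)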
From T-SQL (SQL Endpoint)
-- Lakehouse tables appear automatically in the SQL endpoint (read-only for Lakehouse data)
SELECT * FROM lakehouse.dbo.sales;
-- Query across multiple Lakehouses in the same workspace with three-part names
SELECT * FROM lakehouse1.dbo.customers c
JOIN lakehouse2.dbo.orders o ON c.customer_id = o.customer_id;
From Power BI
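Direct Lake mode: Power BI reads the Delta tables in OneLake directly, so there is no imported copy of the data to refresh and no SQL engine in the query path as there is with DirectQuery. Direct Lake models can still fall back to DirectQuery in some situations, for example when a query touches a view in the SQL endpoint.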
From External Tools
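# Use the Azure Storage SDK against the OneLake endpoint
# (pip install azure-storage-file-datalake azure-identity)
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
# Access workspace as container (file system), lakehouse as directory
file_system = service.get_file_system_client("workspace-name")
directory = file_system.get_directory_client("lakehouse.Lakehouse/Files")
# List the files under the Lakehouse's Files area
for path in file_system.get_paths(path="lakehouse.Lakehouse/Files"):
    print(path.name)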
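When a tool hands you a full OneLake URL, the SDK calls above need it split into the file system (the workspace) and the path inside it (item plus folder). A minimal sketch with `urllib.parse`: `split_onelake_url` is a hypothetical helper that accepts the `https://` and `abfss://` forms used in this post and only recognizes the global `onelake.dfs.fabric.microsoft.com` host.

```python
from urllib.parse import unquote, urlsplit

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def split_onelake_url(url: str) -> tuple[str, str]:
    """Return (file_system, path) for get_file_system_client / get_directory_client."""
    parts = urlsplit(url)
    if parts.scheme == "abfss":
        # abfss://<workspace>@<host>/<item>/<path>
        workspace, _, host = parts.netloc.partition("@")
        path = parts.path.lstrip("/")
    elif parts.scheme == "https":
        # https://<host>/<workspace>/<item>/<path>
        host = parts.netloc
        workspace, _, path = parts.path.lstrip("/").partition("/")
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if host != ONELAKE_HOST:
        raise ValueError(f"not a OneLake URL: {url}")
    # Decode %20 and friends so names with spaces match the workspace name
    return unquote(workspace), unquote(path)


print(split_onelake_url(
    "https://onelake.dfs.fabric.microsoft.com/Sales%20Analytics/lakehouse.Lakehouse/Files/raw/data.csv"
))
# ('Sales Analytics', 'lakehouse.Lakehouse/Files/raw/data.csv')
```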
Migration Considerations
If you are moving from ADLS Gen2 to OneLake:
# Option 1: Use shortcuts (no data movement)
# Best for: Large datasets, gradual migration
# Option 2: Copy data using pipelines
# Best for: Clean break, new governance
# Option 3: Hybrid approach
# Use shortcuts for historical data
# Write new data directly to OneLake
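For the hybrid option, jobs that still read legacy `abfss://<container>@<account>.dfs.core.windows.net/...` paths need to be repointed at their OneLake equivalents. One way to do that, sketched below under the assumption that each legacy folder is exposed through a shortcut, is a prefix map from old URI to new Lakehouse path. `PATH_MAP` and `to_onelake` are hypothetical names, and the values reuse this post's shortcut example, assuming `raw` is the container name.

```python
# Hypothetical map: legacy ADLS Gen2 folder -> OneLake path of the shortcut that exposes it
PATH_MAP = {
    "abfss://raw@existingstorageaccount.dfs.core.windows.net/sales":
        "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/external_sales",
}


def to_onelake(legacy_uri: str) -> str:
    """Rewrite a legacy URI; fail loudly for anything not yet migrated."""
    uri = legacy_uri.rstrip("/")
    for old, new in PATH_MAP.items():
        if uri == old or uri.startswith(old + "/"):
            return new + uri[len(old):]
    raise KeyError(f"no OneLake mapping for {legacy_uri}")


print(to_onelake("abfss://raw@existingstorageaccount.dfs.core.windows.net/sales/2024/01"))
# abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/external_sales/2024/01
```

Raising on unmapped paths, rather than passing them through unchanged, surfaces any job that still depends on a folder you have not exposed in OneLake yet.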
Best Practices
- Organize by Workspace: Each business domain gets its own workspace
- Use Tables for Structured Data: Leverage Delta table management
- Use Files for Landing Zones: Raw files before transformation
- Leverage Shortcuts: Avoid copying data when possible
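- Plan for Cross-Workspace Access: Grant the narrowest workspace role that works, and prefer shortcuts or item-level sharing when another domain only needs to read your data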
OneLake is the foundation that makes Fabric’s unified experience possible. Tomorrow, I will explore the Lakehouse, the primary artifact you will build on top of OneLake.