OneLake: The Foundation of Microsoft Fabric
OneLake is the most architecturally significant component of Microsoft Fabric. It represents Microsoft’s answer to the fragmented storage landscape that has plagued enterprise data platforms. Today, I will explore what OneLake is, how it works, and why it matters.
What is OneLake?
OneLake is a single, unified, logical data lake for your entire organization. Think of it as “OneDrive for data” - automatically provisioned when you enable Fabric, with no storage accounts to create or manage.
# Traditional Azure Storage Model
# Multiple storage accounts, multiple configurations
storage_accounts = [
    "adlsrawdata",    # Raw data landing
    "adlscurated",    # Curated/transformed data
    "adlsserving",    # Serving layer
    "adlsml",         # ML artifacts
]
# OneLake Model
# One logical lake, organized by workspaces
onelake = {
    "organization": "contoso",
    "endpoint": "onelake.dfs.fabric.microsoft.com",   # one endpoint for the whole tenant
    "workspaces": [
        "Sales Analytics",
        "Marketing Data",
        "Finance Reporting",
        "Data Science Lab"
    ]
}
OneLake Architecture
┌─────────────────────────────────────────────────────────────┐
│                           OneLake                            │
│                    (Organization Level)                      │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
│  │  Workspace  │   │  Workspace  │   │  Workspace  │        │
│  │    Sales    │   │  Marketing  │   │   Finance   │        │
│  ├─────────────┤   ├─────────────┤   ├─────────────┤        │
│  │ Lakehouse A │   │ Lakehouse C │   │ Warehouse E │        │
│  │ Lakehouse B │   │ Lakehouse D │   │ Lakehouse F │        │
│  └─────────────┘   └─────────────┘   └─────────────┘        │
├─────────────────────────────────────────────────────────────┤
│                      Delta Lake Format                       │
│                 (Parquet + Transaction Log)                  │
└─────────────────────────────────────────────────────────────┘
Key OneLake Features
1. Automatic Provisioning
# No infrastructure code needed
# OneLake is automatically available when Fabric is enabled
# Access pattern for Spark
lakehouse_path = "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables/sales"
# Read data - no storage account keys, no SAS tokens
df = spark.read.format("delta").load(lakehouse_path)
2. Delta Lake by Default
All tabular data in OneLake is stored in the open Delta Lake format by default:
# Write data to OneLake - automatically uses Delta
df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("customers")   # created as a managed Delta table under Tables/
# Benefits of Delta Lake:
# - ACID transactions
# - Schema enforcement
# - Time travel
# - Efficient updates (MERGE)
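The last two benefits are easy to demonstrate. Below is a minimal sketch of time travel and a MERGE-based upsert against the same table; the staging DataFrame updates_df and the customer_id key are assumptions for illustration:
# Time travel: read the table as it looked at an earlier version
df_v0 = spark.read.format("delta") \
    .option("versionAsOf", 0) \
    .load("Tables/customers")
# Efficient upsert with MERGE (updates_df is a hypothetical staging DataFrame)
from delta.tables import DeltaTable
customers = DeltaTable.forName(spark, "customers")
customers.alias("t").merge(
        updates_df.alias("s"),
        "t.customer_id = s.customer_id"
    ) \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .execute()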
3. Unified Security
# Security is managed through Fabric workspace roles
# No need to configure:
# - Storage account RBAC
# - ACLs on folders
# - SAS token policies
# Workspace roles map to data access:
workspace_roles = {
    "Admin": "Full control of workspace and all items",
    "Member": "Edit all items, share items",
    "Contributor": "Edit all items",
    "Viewer": "View all items, cannot edit"
}
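Role assignments themselves can be automated. The sketch below adds a member to a workspace through the Fabric REST API; the endpoint and payload shape are my assumptions from the public API surface, so verify them against the current documentation, and the GUIDs and token are placeholders:
import requests
# Hypothetical sketch: add a workspace role assignment via the Fabric REST API
workspace_id = "<workspace-guid>"      # placeholder
token = "<entra-id-access-token>"      # e.g. obtained with azure-identity
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/roleAssignments",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "principal": {"id": "<user-object-id>", "type": "User"},
        "role": "Contributor"          # one of the roles listed above
    }
)
resp.raise_for_status()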
4. Shortcuts
Shortcuts are a game-changing feature that lets you reference external data without copying it:
# Create a shortcut to existing ADLS Gen2 data
# This appears as a folder in your Lakehouse but data stays in place
shortcut_definition = {
    "name": "external_sales",
    "target": {
        "adlsGen2": {
            "location": "https://existingstorageaccount.dfs.core.windows.net/",
            "path": "/raw/sales/"
        }
    }
}
# After creating the shortcut, access it like native OneLake data
df = spark.read.format("delta").load("Files/external_sales/")
Shortcuts support:
- Azure Data Lake Storage Gen2
- Amazon S3
- Google Cloud Storage (coming)
- Dataverse
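Shortcuts are usually created through the Lakehouse UI, but they can also be scripted. A hedged sketch against the OneLake shortcuts REST API follows; the endpoint, the adlsGen2 target field names, and the connection GUID are assumptions to check against the current documentation:
import requests
# Hypothetical sketch: create the ADLS Gen2 shortcut defined above via the REST API
workspace_id = "<workspace-guid>"        # placeholder
lakehouse_id = "<lakehouse-item-guid>"   # placeholder
token = "<entra-id-access-token>"
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "Files",                 # where the shortcut appears in the Lakehouse
        "name": "external_sales",
        "target": {
            "adlsGen2": {
                "location": "https://existingstorageaccount.dfs.core.windows.net",
                "subpath": "/raw/sales",
                "connectionId": "<connection-guid>"   # existing cloud connection
            }
        }
    }
)
resp.raise_for_status()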
OneLake File Structure
# Lakehouse structure in OneLake
workspace/
└── lakehouse.Lakehouse/
    ├── Tables/                  # Managed Delta tables
    │   ├── customers/
    │   │   ├── _delta_log/
    │   │   └── *.parquet
    │   └── orders/
    │       ├── _delta_log/
    │       └── *.parquet
    └── Files/                   # Unmanaged files (any format)
        ├── raw/
        │   └── data.csv
        └── staging/
            └── temp.json
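From a notebook attached to the Lakehouse you can browse this structure programmatically; here is a small sketch using mssparkutils, which ships with the Fabric Spark runtime (the relative paths assume a default Lakehouse is attached):
from notebookutils import mssparkutils
# List managed Delta tables
for entry in mssparkutils.fs.ls("Tables/"):
    print("table:", entry.name)
# List unmanaged files and folders
for entry in mssparkutils.fs.ls("Files/"):
    print("file or folder:", entry.name)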
Accessing OneLake
From Spark Notebooks
# Relative paths within the Lakehouse
df = spark.read.table("sales")
# Absolute OneLake paths
df = spark.read.format("delta").load(
    "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables/sales"
)
From T-SQL (SQL Endpoint)
-- Lakehouse tables appear automatically in the SQL endpoint
SELECT * FROM lakehouse.dbo.sales;
-- Query across multiple Lakehouses
SELECT * FROM lakehouse1.dbo.customers c
JOIN lakehouse2.dbo.orders o ON c.customer_id = o.customer_id;
From Power BI
// Direct Lake mode - no import, no DirectQuery limitations
// Power BI reads directly from Delta tables in OneLake
From External Tools
# Use the Azure Storage SDK with the OneLake endpoint
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential()
)
# Access the workspace as a container, the lakehouse as a directory
file_system = service.get_file_system_client("workspace-name")
directory = file_system.get_directory_client("lakehouse.Lakehouse/Files")
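Continuing that sketch, the same SDK can enumerate and download files; get_paths and download_file are standard azure-storage-file-datalake calls, and the file path here is illustrative:
# Enumerate everything beneath the Lakehouse's Files folder
for p in file_system.get_paths(path="lakehouse.Lakehouse/Files"):
    print(p.name, "(dir)" if p.is_directory else f"{p.content_length} bytes")
# Download a single file
file_client = directory.get_file_client("raw/data.csv")
content = file_client.download_file().readall()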
Migration Considerations
If you are moving from ADLS Gen2 to OneLake:
# Option 1: Use shortcuts (no data movement)
# Best for: Large datasets, gradual migration
# Option 2: Copy data using pipelines
# Best for: Clean break, new governance
# Option 3: Hybrid approach
# Use shortcuts for historical data
# Write new data directly to OneLake
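As a minimal sketch of option 2, a Spark copy from ADLS Gen2 into OneLake might look like this; the account, container, and table names are placeholders, and the session must already be able to authenticate to the source storage account:
# Read the historical dataset directly from ADLS Gen2
source_path = "abfss://raw@existingstorageaccount.dfs.core.windows.net/sales/"
history_df = spark.read.format("parquet").load(source_path)
# Land it in OneLake as a managed Delta table in the attached Lakehouse
history_df.write.format("delta").mode("overwrite").saveAsTable("sales_history")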
Best Practices
- Organize by Workspace: Each business domain gets its own workspace
- Use Tables for Structured Data: Leverage Delta table management
- Use Files for Landing Zones: Raw files before transformation
- Leverage Shortcuts: Avoid copying data when possible
- Plan for Cross-Workspace Access: Use workspace roles carefully
OneLake is the foundation that makes Fabric’s unified experience possible. Tomorrow, I will explore the Lakehouse - the primary artifact you will build on top of OneLake.