OneLake Explorer: Navigating Your Data Lake in Fabric
OneLake is the foundation of Microsoft Fabric - a single, unified data lake for your entire organization. Today we’ll explore OneLake Explorer, the tool that lets you browse and manage your data lake like a local file system.
What is OneLake?
OneLake is automatically provisioned with your Fabric tenant:
# OneLake characteristics
onelake_features = {
    "single_instance": "One OneLake per tenant",
    "hierarchical_namespace": "Like ADLS Gen2",
    "format": "Delta Lake by default",
    "access": "ABFS protocol compatible",
    "storage": "Built-in, no separate provisioning"
}
# OneLake structure
"""
OneLake (Tenant)
├── Workspace A
│   ├── Lakehouse 1
│   │   ├── Files/
│   │   └── Tables/
│   └── Lakehouse 2
├── Workspace B
│   └── Lakehouse 3
└── Workspace C
    └── Warehouse 1
"""
Installing OneLake Explorer
OneLake Explorer is a Windows application that integrates OneLake into File Explorer, much like OneDrive, so you can browse your data lake as ordinary folders:
# Download OneLake Explorer
# Go to: https://www.microsoft.com/en-us/download/details.aspx?id=105367
# Or install via winget
winget install Microsoft.OneLake
# After installation:
# 1. Sign in with your Microsoft Entra ID (work or school) account
# 2. OneLake appears in File Explorer
# 3. Access path: OneLake - {TenantName}
Navigating OneLake
Once installed, OneLake appears in the Windows File Explorer navigation pane and you can browse it like local folders:
OneLake - Contoso/
├── data-engineering-sandbox/
│   └── sales_lakehouse/
│       ├── Files/
│       │   ├── raw/
│       │   │   └── sales_2023.csv
│       │   └── processed/
│       └── Tables/
│           ├── customers/
│           │   ├── _delta_log/
│           │   └── part-00000.parquet
│           └── orders/
└── analytics-workspace/
    └── reporting_lakehouse/
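Because the Explorer surfaces OneLake as ordinary folders, you can also script against it with plain file APIs. A minimal sketch that lists workspaces and items from Python follows; the local sync folder name ("OneLake - Microsoft" under the user profile) is an assumption, so use whatever path OneLake Explorer shows on your machine.
# Sketch: browse the locally synced OneLake folder with plain file APIs
# The folder name "OneLake - Microsoft" is an assumption -- check your machine.
import os
from pathlib import Path

onelake_root = Path(os.environ["USERPROFILE"]) / "OneLake - Microsoft"  # assumed sync location

# Walk two levels: workspaces, then items (lakehouses, warehouses)
for workspace in sorted(p for p in onelake_root.iterdir() if p.is_dir()):
    print(workspace.name)
    for item in sorted(p for p in workspace.iterdir() if p.is_dir()):
        print(f"  {item.name}")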
Accessing OneLake Programmatically
Beyond the explorer, access OneLake via code:
# In Fabric notebooks (automatic authentication)
df = spark.read.format("delta").load("Tables/customers")
# Using the ABFS path explicitly (workspace before the @, lakehouse item in the path)
abfs_path = "abfss://data-engineering-sandbox@onelake.dfs.fabric.microsoft.com/sales_lakehouse.Lakehouse/Tables/customers"
df = spark.read.format("delta").load(abfs_path)
# List files in a directory (Fabric notebooks expose mssparkutils, not dbutils)
files = mssparkutils.fs.ls("Files/raw/")
for file in files:
    print(f"{file.name}: {file.size} bytes")
# From external Python (using Azure Identity)
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential
)

# The workspace is the container (file system); lakehouses sit below it
file_system_client = service_client.get_file_system_client(
    file_system="data-engineering-sandbox"  # workspace name
)

# List paths inside the lakehouse item
paths = file_system_client.get_paths(path="sales_lakehouse.Lakehouse/Files/raw")
for path in paths:
    print(path.name)
Common Operations with OneLake Explorer
Uploading Files
# Via Explorer:
# 1. Navigate to Files/ folder
# 2. Drag and drop files
# 3. Files appear immediately in Fabric
# Via code (in notebook):
# For small files, use mssparkutils
mssparkutils.fs.cp("file:/tmp/local_file.csv", "Files/uploads/local_file.csv")
# For larger uploads, consider Data Pipeline
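If you are pushing files from outside Fabric rather than from a notebook, the same DataLakeServiceClient shown earlier can write into the Files/ area. A minimal sketch, reusing the illustrative workspace and lakehouse names from above:
# Sketch: upload a local file to OneLake from external Python
# Workspace/lakehouse names are the illustrative ones used earlier in this post.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential()
)
fs_client = service_client.get_file_system_client("data-engineering-sandbox")  # workspace
file_client = fs_client.get_file_client("sales_lakehouse.Lakehouse/Files/uploads/local_file.csv")

with open("/tmp/local_file.csv", "rb") as data:
    file_client.upload_data(data, overwrite=True)  # single-shot upload; fine for small/medium files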
Downloading Files
# Via Explorer:
# 1. Navigate to the file
# 2. Copy to local drive
# Via code:
dbutils.fs.cp("Files/exports/report.csv", "file:/tmp/report.csv")
Managing Delta Tables
# Delta tables in OneLake have this structure:
"""
Tables/
└── customers/
    ├── _delta_log/
    │   ├── 00000000000000000000.json
    │   ├── 00000000000000000001.json
    │   └── _last_checkpoint
    ├── part-00000-xxx.snappy.parquet
    ├── part-00001-xxx.snappy.parquet
    └── part-00002-xxx.snappy.parquet
"""
# Don't manually edit these files!
# Use Spark or SQL to manage Delta tables
# View table history
history_df = spark.sql("DESCRIBE HISTORY customers")
history_df.show()
# Vacuum old files (be careful!)
spark.sql("VACUUM customers RETAIN 168 HOURS")
OneLake Shortcuts
Shortcuts let you access external data without copying:
# Shortcut types:
shortcut_sources = {
    "onelake": "Another Fabric workspace/lakehouse",
    "adls_gen2": "Azure Data Lake Storage Gen2",
    "s3": "Amazon S3",
    "dataverse": "Dataverse tables"
}
# Creating a shortcut via API (Python SDK)
# Note: Usually done via UI, but possible programmatically
# Shortcut appears like regular folder/table
# Data stays in original location
# No data movement or duplication
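As the comments above note, shortcuts can also be created programmatically. Here is a minimal sketch against the Fabric REST API; the endpoint shape, payload fields, and every ID below are assumptions for illustration, so verify against the current Fabric REST documentation before relying on it.
# Sketch: create an ADLS Gen2 shortcut via the Fabric REST API
# Endpoint/payload shape and all IDs below are illustrative assumptions.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://api.fabric.microsoft.com/.default")

workspace_id = "<workspace-guid>"    # hypothetical
lakehouse_id = "<lakehouse-guid>"    # hypothetical
connection_id = "<connection-guid>"  # hypothetical, a saved ADLS Gen2 connection

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token.token}"},
    json={
        "path": "Files",
        "name": "external_raw",
        "target": {
            "adlsGen2": {
                "location": "https://mystorageaccount.dfs.core.windows.net",
                "subpath": "/container/raw",
                "connectionId": connection_id
            }
        }
    }
)
print(response.status_code, response.json())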
Creating Shortcuts via UI
1. In your Lakehouse, right-click Tables or Files
2. Select "New shortcut"
3. Choose source type:
   - OneLake (another Fabric location)
   - Azure Data Lake Storage Gen2
   - Amazon S3
4. Configure connection and path
5. Name the shortcut
6. Click "Create"
OneLake API Access
Access OneLake via REST API:
import requests
from azure.identity import DefaultAzureCredential

# Get token
credential = DefaultAzureCredential()
token = credential.get_token("https://storage.azure.com/.default")

# OneLake REST endpoint (ADLS Gen2-compatible)
base_url = "https://onelake.dfs.fabric.microsoft.com"
workspace = "data-engineering-sandbox"        # the workspace acts as the file system
path = "sales_lakehouse.Lakehouse/Files/raw"  # item and path inside the workspace

# List files
response = requests.get(
    f"{base_url}/{workspace}",
    headers={
        "Authorization": f"Bearer {token.token}",
        "x-ms-version": "2021-06-08"
    },
    params={
        "resource": "filesystem",
        "recursive": "false",
        "directory": path
    }
)
print(response.json())
Performance Considerations
# OneLake performance tips:
performance_tips = {
    "file_sizes": "Target 100MB-1GB files for optimal performance",
    "partitioning": "Use appropriate partition columns",
    "caching": "Leverage Fabric's built-in caching",
    "shortcuts": "Shortcuts have network latency for external data"
}
# Example: Optimize file sizes during write
df.repartition(10).write \
    .format("delta") \
    .option("maxRecordsPerFile", 1000000) \
    .mode("overwrite") \
    .save("Tables/optimized_table")
Security and Access Control
OneLake inherits Fabric’s security model:
# Access is controlled at workspace level
# Item-level permissions for finer control
security_model = {
    "authentication": "Microsoft Entra ID",
    "authorization": "Workspace roles + item permissions",
    "network": "Fabric Private Links (preview)",
    "encryption": "At-rest and in-transit (automatic)"
}
# For shortcuts to external data:
# - ADLS Gen2: Uses stored credentials or SPN
# - S3: Uses IAM credentials
# - Credentials stored securely in Fabric
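For unattended access from outside Fabric (for example, a scheduled job reading OneLake), you can swap DefaultAzureCredential for an explicit service principal. A sketch using azure-identity; the environment variable names are just this example's convention, and the service principal still needs access to the workspace:
# Sketch: authenticate to OneLake as a service principal (SPN)
# The env var names are this example's convention, not a OneLake requirement.
import os
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id=os.environ["FABRIC_TENANT_ID"],
    client_id=os.environ["FABRIC_CLIENT_ID"],
    client_secret=os.environ["FABRIC_CLIENT_SECRET"],
)
service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential
)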
Troubleshooting OneLake Explorer
Common issues and solutions:
troubleshooting = {
    "explorer_not_showing": {
        "cause": "Sign-in issue",
        "solution": "Sign out and sign in again"
    },
    "empty_workspace": {
        "cause": "Workspace not Fabric-enabled",
        "solution": "Ensure workspace is on Fabric capacity"
    },
    "slow_performance": {
        "cause": "Large directory listings",
        "solution": "Navigate directly to subdirectories"
    },
    "permission_denied": {
        "cause": "Insufficient workspace access",
        "solution": "Request appropriate workspace role"
    }
}
Tomorrow we’ll create our first Lakehouse and understand the structure in detail.