OneLake Explorer: Navigating Your Data Lake in Fabric
OneLake is the foundation of Microsoft Fabric - a single, unified data lake for your entire organization. Today we’ll explore OneLake Explorer, the tool that lets you browse and manage your data lake like a local file system.
What is OneLake?
OneLake is automatically provisioned with your Fabric tenant:
# OneLake characteristics
onelake_features = {
    "single_instance": "One OneLake per tenant",
    "hierarchical_namespace": "Like ADLS Gen2",
    "format": "Delta Lake by default",
    "access": "ABFS protocol compatible",
    "storage": "Built-in, no separate provisioning"
}
# OneLake structure
"""
OneLake (Tenant)
├── Workspace A
│   ├── Lakehouse 1
│   │   ├── Files/
│   │   └── Tables/
│   └── Lakehouse 2
├── Workspace B
│   └── Lakehouse 3
└── Workspace C
    └── Warehouse 1
"""
Installing OneLake Explorer
OneLake Explorer is a Windows application that integrates OneLake into File Explorer, much like OneDrive, so you can browse your data lake as ordinary folders:
# Download OneLake Explorer
# Go to: https://www.microsoft.com/en-us/download/details.aspx?id=105367
# Or install via winget
winget install Microsoft.OneLake
# After installation:
# 1. Sign in with your Microsoft Entra ID (work or school) account
# 2. OneLake appears in File Explorer
# 3. Access path: OneLake - {TenantName}
Navigating OneLake
Once installed, OneLake appears in the Windows File Explorer navigation pane and you can browse it like local folders:
OneLake - Contoso/
├── data-engineering-sandbox/
│   └── sales_lakehouse/
│       ├── Files/
│       │   ├── raw/
│       │   │   └── sales_2023.csv
│       │   └── processed/
│       └── Tables/
│           ├── customers/
│           │   ├── _delta_log/
│           │   └── part-00000.parquet
│           └── orders/
└── analytics-workspace/
    └── reporting_lakehouse/
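Because the Explorer surfaces OneLake as ordinary folders, you can also script against it with plain file APIs. A minimal sketch that lists workspaces and items from Python follows; the local sync folder name ("OneLake - Microsoft" under the user profile) is an assumption, so use whatever path OneLake Explorer shows on your machine.
# Sketch: browse the locally synced OneLake folder with plain file APIs
# The folder name "OneLake - Microsoft" is an assumption -- check your machine.
import os
from pathlib import Path

onelake_root = Path(os.environ["USERPROFILE"]) / "OneLake - Microsoft"  # assumed sync location

# Walk two levels: workspaces, then items (lakehouses, warehouses)
for workspace in sorted(p for p in onelake_root.iterdir() if p.is_dir()):
    print(workspace.name)
    for item in sorted(p for p in workspace.iterdir() if p.is_dir()):
        print(f"  {item.name}")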
Accessing OneLake Programmatically
Beyond the explorer, access OneLake via code:
# In Fabric notebooks (automatic authentication)
df = spark.read.format("delta").load("Tables/customers")
# Using the ABFS path explicitly (workspace before the @, lakehouse item in the path)
abfs_path = "abfss://data-engineering-sandbox@onelake.dfs.fabric.microsoft.com/sales_lakehouse.Lakehouse/Tables/customers"
df = spark.read.format("delta").load(abfs_path)
# List files in a directory (Fabric notebooks expose mssparkutils, not dbutils)
files = mssparkutils.fs.ls("Files/raw/")
for file in files:
    print(f"{file.name}: {file.size} bytes")
# From external Python (using Azure Identity)
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential
)

# The workspace is the container (file system); lakehouses sit below it
file_system_client = service_client.get_file_system_client(
    file_system="data-engineering-sandbox"  # workspace name
)

# List paths inside the lakehouse item
paths = file_system_client.get_paths(path="sales_lakehouse.Lakehouse/Files/raw")
for path in paths:
    print(path.name)
Common Operations with OneLake Explorer
Uploading Files
# Via Explorer:
# 1. Navigate to Files/ folder
# 2. Drag and drop files
# 3. Files appear immediately in Fabric
# Via code (in notebook):
# For small files, use mssparkutils
mssparkutils.fs.cp("file:/tmp/local_file.csv", "Files/uploads/local_file.csv")
# For larger uploads, consider Data Pipeline
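If you are pushing files from outside Fabric rather than from a notebook, the same DataLakeServiceClient shown earlier can write into the Files/ area. A minimal sketch, reusing the illustrative workspace and lakehouse names from above:
# Sketch: upload a local file to OneLake from external Python
# Workspace/lakehouse names are the illustrative ones used earlier in this post.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential()
)
fs_client = service_client.get_file_system_client("data-engineering-sandbox")  # workspace
file_client = fs_client.get_file_client("sales_lakehouse.Lakehouse/Files/uploads/local_file.csv")

with open("/tmp/local_file.csv", "rb") as data:
    file_client.upload_data(data, overwrite=True)  # single-shot upload; fine for small/medium files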
Downloading Files
# Via Explorer:
# 1. Navigate to the file
# 2. Copy to local drive
# Via code:
dbutils.fs.cp("Files/exports/report.csv", "file:/tmp/report.csv")
Managing Delta Tables
# Delta tables in OneLake have this structure:
"""
Tables/
└── customers/
    ├── _delta_log/
    │   ├── 00000000000000000000.json
    │   ├── 00000000000000000001.json
    │   └── _last_checkpoint
    ├── part-00000-xxx.snappy.parquet
    ├── part-00001-xxx.snappy.parquet
    └── part-00002-xxx.snappy.parquet
"""
# Don't manually edit these files!
# Use Spark or SQL to manage Delta tables
# View table history
history_df = spark.sql("DESCRIBE HISTORY customers")
history_df.show()
# Vacuum old files (be careful!)
spark.sql("VACUUM customers RETAIN 168 HOURS")
OneLake Shortcuts
Shortcuts let you access external data without copying:
# Shortcut types:
shortcut_sources = {
    "onelake": "Another Fabric workspace/lakehouse",
    "adls_gen2": "Azure Data Lake Storage Gen2",
    "s3": "Amazon S3",
    "dataverse": "Dataverse tables"
}
# Creating a shortcut via API (Python SDK)
# Note: Usually done via UI, but possible programmatically
# Shortcut appears like regular folder/table
# Data stays in original location
# No data movement or duplication
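As the comments above note, shortcuts can also be created programmatically. Here is a minimal sketch against the Fabric REST API; the endpoint shape, payload fields, and every ID below are assumptions for illustration, so verify against the current Fabric REST documentation before relying on it.
# Sketch: create an ADLS Gen2 shortcut via the Fabric REST API
# Endpoint/payload shape and all IDs below are illustrative assumptions.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://api.fabric.microsoft.com/.default")

workspace_id = "<workspace-guid>"    # hypothetical
lakehouse_id = "<lakehouse-guid>"    # hypothetical
connection_id = "<connection-guid>"  # hypothetical, a saved ADLS Gen2 connection

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token.token}"},
    json={
        "path": "Files",
        "name": "external_raw",
        "target": {
            "adlsGen2": {
                "location": "https://mystorageaccount.dfs.core.windows.net",
                "subpath": "/container/raw",
                "connectionId": connection_id
            }
        }
    }
)
print(response.status_code, response.json())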
Creating Shortcuts via UI
1. In your Lakehouse, right-click Tables or Files
2. Select "New shortcut"
3. Choose source type:
   - OneLake (another Fabric location)
   - Azure Data Lake Storage Gen2
   - Amazon S3
4. Configure connection and path
5. Name the shortcut
6. Click "Create"
OneLake API Access
Access OneLake via REST API:
import requests
from azure.identity import DefaultAzureCredential

# Get token
credential = DefaultAzureCredential()
token = credential.get_token("https://storage.azure.com/.default")

# OneLake REST endpoint (ADLS Gen2-compatible)
base_url = "https://onelake.dfs.fabric.microsoft.com"
workspace = "data-engineering-sandbox"        # the workspace acts as the file system
path = "sales_lakehouse.Lakehouse/Files/raw"  # item and path inside the workspace

# List files
response = requests.get(
    f"{base_url}/{workspace}",
    headers={
        "Authorization": f"Bearer {token.token}",
        "x-ms-version": "2021-06-08"
    },
    params={
        "resource": "filesystem",
        "recursive": "false",
        "directory": path
    }
)
print(response.json())
Performance Considerations
# OneLake performance tips:
performance_tips = {
    "file_sizes": "Target 100MB-1GB files for optimal performance",
    "partitioning": "Use appropriate partition columns",
    "caching": "Leverage Fabric's built-in caching",
    "shortcuts": "Shortcuts have network latency for external data"
}
# Example: Optimize file sizes during write
df.repartition(10).write \
    .format("delta") \
    .option("maxRecordsPerFile", 1000000) \
    .mode("overwrite") \
    .save("Tables/optimized_table")
Security and Access Control
OneLake inherits Fabric’s security model:
# Access is controlled at workspace level
# Item-level permissions for finer control
security_model = {
    "authentication": "Microsoft Entra ID",
    "authorization": "Workspace roles + item permissions",
    "network": "Fabric Private Links (preview)",
    "encryption": "At-rest and in-transit (automatic)"
}
# For shortcuts to external data:
# - ADLS Gen2: Uses stored credentials or SPN
# - S3: Uses IAM credentials
# - Credentials stored securely in Fabric
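For unattended access from outside Fabric (for example, a scheduled job reading OneLake), you can swap DefaultAzureCredential for an explicit service principal. A sketch using azure-identity; the environment variable names are just this example's convention, and the service principal still needs access to the workspace:
# Sketch: authenticate to OneLake as a service principal (SPN)
# The env var names are this example's convention, not a OneLake requirement.
import os
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id=os.environ["FABRIC_TENANT_ID"],
    client_id=os.environ["FABRIC_CLIENT_ID"],
    client_secret=os.environ["FABRIC_CLIENT_SECRET"],
)
service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential
)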
Troubleshooting OneLake Explorer
Common issues and solutions:
troubleshooting = {
    "explorer_not_showing": {
        "cause": "Sign-in issue",
        "solution": "Sign out and sign in again"
    },
    "empty_workspace": {
        "cause": "Workspace not Fabric-enabled",
        "solution": "Ensure workspace is on Fabric capacity"
    },
    "slow_performance": {
        "cause": "Large directory listings",
        "solution": "Navigate directly to subdirectories"
    },
    "permission_denied": {
        "cause": "Insufficient workspace access",
        "solution": "Request appropriate workspace role"
    }
}
Tomorrow we’ll create our first Lakehouse and understand the structure in detail.