Fabric Mirroring GA: Real-Time Database Replication
Database mirroring in Microsoft Fabric is now generally available. It replicates operational databases into OneLake in near real time, eliminating the need for traditional ETL in many scenarios. Let's explore how it works and when to use it.
What is Mirroring?
Mirroring creates a continuously synchronized copy of your operational database in OneLake as Delta Lake tables:
Source Database (OLTP)
│
│ Change Data Capture
▼
Mirroring Service
│
│ Near Real-Time
▼
OneLake (Delta Lake)
│
├── Lakehouse SQL Endpoint
├── Spark Access
└── Power BI Direct Lake
Supported Sources
As of GA:
- Azure SQL Database: Full support
- Azure SQL Managed Instance: Full support
- Azure Cosmos DB: Analytical store integration
- Snowflake: Cross-cloud preview
Setting Up Azure SQL Mirroring
Prerequisites
-- 1. Ensure compatibility level
ALTER DATABASE [YourDB] SET COMPATIBILITY_LEVEL = 150;
-- 2. Enable change tracking (required)
ALTER DATABASE [YourDB] SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);
-- 3. Enable change tracking for each table
ALTER TABLE [dbo].[Customers]
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
ALTER TABLE [dbo].[Orders]
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
-- 4. Check current state
SELECT
    DB_NAME() AS DatabaseName,
    t.name AS TableName,
    ct.is_track_columns_updated_on
FROM sys.change_tracking_tables ct
JOIN sys.tables t ON ct.object_id = t.object_id;
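If many tables need mirroring, step 3 can be scripted from Python. Here is a minimal sketch using pyodbc, assuming an ODBC connection string for the source database; the table list and connection string below are placeholders:

import pyodbc

def enable_change_tracking(connection_string: str, tables: list[str]) -> None:
    """Enable change tracking on each table that doesn't already have it."""
    with pyodbc.connect(connection_string, autocommit=True) as conn:
        cursor = conn.cursor()
        for table in tables:
            # Skip tables where change tracking is already enabled
            cursor.execute(
                "SELECT 1 FROM sys.change_tracking_tables "
                "WHERE object_id = OBJECT_ID(?)", f"dbo.{table}"
            )
            if cursor.fetchone():
                continue
            cursor.execute(
                f"ALTER TABLE [dbo].[{table}] "
                "ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON)"
            )

# Placeholder connection string and table list
enable_change_tracking(
    "Driver={ODBC Driver 18 for SQL Server};Server=your-server.database.windows.net;"
    "Database=YourDB;Authentication=ActiveDirectoryInteractive",
    ["Customers", "Orders", "Products", "OrderItems"],
)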
Create the Mirror
import base64
import json

import requests
from azure.identity import DefaultAzureCredential

def create_sql_mirror(
    workspace_id: str,
    display_name: str,
    sql_connection_string: str,
    tables_to_mirror: list[str]
) -> dict:
    """Create a mirrored database from Azure SQL."""
    credential = DefaultAzureCredential()
    token = credential.get_token("https://api.fabric.microsoft.com/.default")
    headers = {
        "Authorization": f"Bearer {token.token}",
        "Content-Type": "application/json"
    }

    # Mirroring definition shipped as the mirroring.json part
    mirroring_definition = {
        "sourceType": "AzureSqlDatabase",
        "connection": {
            "connectionString": sql_connection_string,
            "authentication": "ManagedIdentity"
        },
        "tables": [
            {"schema": "dbo", "name": table}
            for table in tables_to_mirror
        ],
        "syncSettings": {
            "mode": "Continuous",
            "initialSyncType": "Full"
        }
    }

    payload = {
        "displayName": display_name,
        "definition": {
            "parts": [
                {
                    "path": "mirroring.json",
                    # Fabric item definition parts are sent base64-encoded
                    "payload": base64.b64encode(
                        json.dumps(mirroring_definition).encode("utf-8")
                    ).decode("utf-8"),
                    "payloadType": "InlineBase64"
                }
            ]
        }
    }

    response = requests.post(
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/mirroredDatabases",
        headers=headers,
        json=payload
    )
    response.raise_for_status()
    return response.json()

# Usage
mirror = create_sql_mirror(
    workspace_id="your-workspace-id",
    display_name="Sales-Mirror",
    sql_connection_string="your-connection-string",
    tables_to_mirror=["Customers", "Orders", "Products", "OrderItems"]
)
Understanding Sync Behavior
Initial Sync
The first synchronization captures all existing data:
import time

def monitor_initial_sync(workspace_id: str, mirror_id: str):
    """Monitor initial sync progress until continuous replication takes over."""
    while True:
        # get_mirror_status wraps the mirror status API (see the sketch below)
        status = get_mirror_status(workspace_id, mirror_id)
        print(f"Status: {status['syncStatus']}")
        print(f"Tables completed: {status['tablesCompleted']}/{status['totalTables']}")
        print(f"Rows synced: {status['rowsSynced']:,}")
        if status['syncStatus'] == 'Running':
            # 'Running' means the initial snapshot finished and continuous sync started
            break
        elif status['syncStatus'] == 'Failed':
            raise Exception(f"Initial sync failed: {status['error']}")
        time.sleep(30)
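The get_mirror_status helper used above isn't defined by the mirroring feature itself. Here is a minimal sketch of what it could look like, reusing requests and DefaultAzureCredential from the earlier example; the endpoint path and the response fields (syncStatus, rowsSynced, and so on) are assumptions, so check the Fabric REST reference and adapt the field names to the real status payload:

def get_mirror_status(workspace_id: str, mirror_id: str) -> dict:
    """Fetch sync status for a mirrored database (sketch; endpoint path assumed)."""
    credential = DefaultAzureCredential()
    token = credential.get_token("https://api.fabric.microsoft.com/.default")
    response = requests.post(
        # Assumed status endpoint - verify against the Fabric REST API reference
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
        f"/mirroredDatabases/{mirror_id}/getMirroringStatus",
        headers={"Authorization": f"Bearer {token.token}"}
    )
    response.raise_for_status()
    return response.json()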
Continuous Sync
After initial sync, changes flow continuously:
Source Transaction
│
├── Write to source database
│
├── Change tracking captures change
│
├── Mirroring service polls changes (every few seconds)
│
└── Delta Lake append/update in OneLake
Typical latency: 30 seconds to 2 minutes
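To check what latency you actually see, one option is to write a timestamped marker row to the source and poll the mirrored copy through the SQL analytics endpoint until it arrives. A rough sketch with pyodbc; the dbo.MirrorHeartbeat table and both connection strings are hypothetical placeholders:

import time
import pyodbc

def measure_replication_lag(source_conn_str: str, endpoint_conn_str: str) -> float:
    """Insert a marker row at the source and time how long until it appears in the mirror."""
    marker = f"hb-{int(time.time())}"
    with pyodbc.connect(source_conn_str, autocommit=True) as src:
        src.cursor().execute(
            "INSERT INTO dbo.MirrorHeartbeat (Marker, CreatedAt) VALUES (?, SYSUTCDATETIME())",
            marker,
        )
    start = time.time()
    with pyodbc.connect(endpoint_conn_str, autocommit=True) as mirror:
        cursor = mirror.cursor()
        while True:
            cursor.execute("SELECT 1 FROM dbo.MirrorHeartbeat WHERE Marker = ?", marker)
            if cursor.fetchone():
                return time.time() - start
            time.sleep(5)  # poll every few seconds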
Handling Schema Changes
# Supported schema changes (automatic)
supported_changes = [
    "Add column (nullable)",
    "Add table (with change tracking)",
    "Modify column data type (compatible)",
]

# Unsupported (requires mirror recreation)
unsupported_changes = [
    "Drop column",
    "Rename column",
    "Change primary key",
    "Drop table",
]

def handle_schema_change(mirror_id: str, change_type: str):
    """Handle schema changes in mirrored database."""
    if change_type in ["add_column", "add_table"]:
        # Automatic - just wait for sync
        print("Schema change will sync automatically")
        return
    if change_type in ["drop_column", "rename_column"]:
        # Requires intervention
        print("Warning: This change requires mirror recreation")
        print("Steps:")
        print("1. Pause mirror")
        print("2. Apply schema change to source")
        print("3. Delete and recreate mirror")
        return
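For the unsupported cases, the recreate workflow can be scripted against the Fabric REST API, reusing the imports and create_sql_mirror function from earlier. The stopMirroring endpoint name is an assumption here, so verify the exact paths in the Fabric REST reference before relying on this sketch:

def recreate_mirror(workspace_id: str, mirror_id: str, **create_kwargs) -> dict:
    """Stop mirroring, delete the mirrored database item, and create a fresh mirror."""
    credential = DefaultAzureCredential()
    token = credential.get_token("https://api.fabric.microsoft.com/.default")
    headers = {"Authorization": f"Bearer {token.token}"}
    base = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
        f"/mirroredDatabases/{mirror_id}"
    )

    # 1. Pause replication (assumed endpoint name)
    requests.post(f"{base}/stopMirroring", headers=headers).raise_for_status()

    # 2. Apply the schema change to the source database out of band

    # 3. Delete and recreate the mirror
    requests.delete(base, headers=headers).raise_for_status()
    return create_sql_mirror(workspace_id=workspace_id, **create_kwargs)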
Querying Mirrored Data
Via SQL Endpoint
-- Connect to the Lakehouse SQL endpoint
-- Mirrored tables appear as regular tables
SELECT
    c.CustomerName,
    COUNT(o.OrderID) AS OrderCount,
    SUM(o.TotalAmount) AS TotalRevenue
FROM dbo.Customers c
JOIN dbo.Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= DATEADD(month, -3, GETDATE())
GROUP BY c.CustomerName
ORDER BY TotalRevenue DESC;
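The same endpoint can be queried from Python, for example with pyodbc and Microsoft Entra authentication. A minimal sketch; the server and database names are placeholders taken from the SQL analytics endpoint's connection settings:

import pyodbc

# Placeholder endpoint; copy the actual connection details from the SQL analytics endpoint settings
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=YourLakehouse;"
    "Authentication=ActiveDirectoryInteractive"
)
cursor = conn.cursor()
cursor.execute("""
    SELECT TOP 10 c.CustomerName, SUM(o.TotalAmount) AS TotalRevenue
    FROM dbo.Customers c
    JOIN dbo.Orders o ON c.CustomerID = o.CustomerID
    GROUP BY c.CustomerName
    ORDER BY TotalRevenue DESC
""")
for row in cursor.fetchall():
    print(row.CustomerName, row.TotalRevenue)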
Via Spark
# In a Fabric notebook (the built-in spark session is available)
from pyspark.sql import functions as F

# Read from mirrored tables
customers_df = spark.read.format("delta").load(
    "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse/Tables/Customers"
)
orders_df = spark.read.format("delta").load(
    "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse/Tables/Orders"
)

# Join and analyze
analysis = customers_df.join(
    orders_df,
    customers_df.CustomerID == orders_df.CustomerID
).groupBy("CustomerName").agg(
    F.count("OrderID").alias("order_count"),
    F.sum("TotalAmount").alias("total_revenue")
)

display(analysis)
Via Power BI Direct Lake
A Power BI semantic model connects directly to the mirrored data in Direct Lake mode, so no import is needed. Benefits:
- Sub-second query response
- Always up-to-date data
- No data movement
- Reduced refresh times
Performance Considerations
def estimate_mirror_performance(
    source_db_size_gb: float,
    transaction_rate_per_second: float,
    avg_row_size_bytes: int
) -> dict:
    """Estimate mirroring performance metrics."""
    # Initial sync estimate
    initial_sync_hours = source_db_size_gb / 50  # ~50 GB/hour typical

    # Continuous sync latency
    if transaction_rate_per_second < 100:
        latency_seconds = 30
    elif transaction_rate_per_second < 1000:
        latency_seconds = 60
    else:
        latency_seconds = 120

    # OneLake storage
    storage_gb = source_db_size_gb * 1.2  # Delta overhead

    # Daily change data volume
    daily_changes_gb = (
        transaction_rate_per_second * 86400 * avg_row_size_bytes
    ) / 1e9

    return {
        "initial_sync_estimate_hours": initial_sync_hours,
        "typical_latency_seconds": latency_seconds,
        "onelake_storage_gb": storage_gb,
        "daily_change_volume_gb": daily_changes_gb,
        "recommended_capacity": "F8" if source_db_size_gb < 100 else "F16"
    }
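As an illustration, a 200 GB database averaging 500 transactions per second with roughly 1 KB rows comes out as:

estimate = estimate_mirror_performance(
    source_db_size_gb=200,
    transaction_rate_per_second=500,
    avg_row_size_bytes=1024
)
print(estimate)
# {'initial_sync_estimate_hours': 4.0, 'typical_latency_seconds': 60,
#  'onelake_storage_gb': 240.0, 'daily_change_volume_gb': 44.2368,
#  'recommended_capacity': 'F16'}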
Best Practices
- Enable change tracking early: Do this before creating the mirror
- Start with non-critical tables: Test with less important data first
- Monitor sync lag: Set up alerts for latency spikes (see the sketch after this list)
- Plan for schema changes: Document the process for your team
- Use appropriate capacity: Under-provisioning causes throttling
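For the sync-lag alerts mentioned above, a simple check can reuse the get_mirror_status helper from earlier. This sketch assumes the status payload exposes a lastSyncTime ISO timestamp; adjust the field name to whatever your status response actually contains, and wire the print statement into your own alerting channel:

from datetime import datetime, timezone

def check_sync_lag(workspace_id: str, mirror_id: str, threshold_seconds: int = 300) -> None:
    """Warn when the mirror has not synced within the allowed threshold."""
    status = get_mirror_status(workspace_id, mirror_id)
    # lastSyncTime is an assumed field name - adapt to the real status payload
    last_sync = datetime.fromisoformat(status["lastSyncTime"].replace("Z", "+00:00"))
    lag_seconds = (datetime.now(timezone.utc) - last_sync).total_seconds()
    if lag_seconds > threshold_seconds:
        # Replace print with your alerting hook (email, Teams webhook, etc.)
        print(f"ALERT: mirror {mirror_id} is {lag_seconds:.0f}s behind "
              f"(threshold {threshold_seconds}s)")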
When Not to Use Mirroring
Consider alternatives when:
- Sub-second latency required: Use event streaming instead
- Complex transformations needed: Traditional ETL may be better
- Source doesn’t support change tracking: Some databases can’t be mirrored
- Network constraints: Large data volumes over slow connections
Conclusion
Mirroring simplifies the data pipeline from operational databases to analytics. With GA, it’s ready for production use with Azure SQL Database and related sources.
Start by identifying tables that would benefit from near real-time analytics access, set up change tracking, and create your first mirror. The reduction in ETL complexity alone makes it worthwhile.