Delta Sharing: Secure Data Exchange Across Organizations
Delta Sharing is an open protocol for secure data sharing, enabling organizations to share data without copying it. Built on Delta Lake, it works across clouds and platforms.
What is Delta Sharing?
Delta Sharing allows you to:
- Share live data without copying
- Control access with revocable tokens
- Support any client (Python, Spark, Power BI, etc.)
- Share across organizations and cloud providers
The protocol is open source, meaning recipients don’t need Databricks to access shared data.
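Access is bootstrapped by a small credential file: a recipient receives (or downloads via an activation link) a profile file, conventionally named config.share, containing the sharing server endpoint and a bearer token. Here is a minimal sketch of writing one in Python; the field names follow the open protocol's documented profile format, while the endpoint and token values are placeholders:
import json

# The profile file a recipient uses to authenticate. Field names follow
# the open protocol's profile format; endpoint and token are placeholders.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://<provider-host>/api/2.0/delta-sharing/metastores/<metastore-id>",
    "bearerToken": "<recipient-token>",
}

with open("config.share", "w") as f:
    json.dump(profile, f, indent=2)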
Architecture Overview
┌─────────────────┐            ┌─────────────────┐
│  Data Provider  │            │  Data Recipient │
│   (Databricks)  │            │   (Any Client)  │
│                 │            │                 │
│ ┌───────────┐   │    REST    │  ┌───────────┐  │
│ │Delta Table│◄──┼────API─────┼──│  Client   │  │
│ └───────────┘   │            │  └───────────┘  │
│        ▲        │            │                 │
│  Access Control │            │   - Python      │
│   & Auditing    │            │   - Spark       │
│                 │            │   - Power BI    │
└─────────────────┘            │   - pandas      │
                               └─────────────────┘
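Because the interface is plain REST, any HTTP client can explore a share. As a sketch, the protocol's documented listing endpoint can be called directly; the endpoint URL and token below are placeholders taken from a recipient's profile file:
import requests

# Values from the recipient's profile file (placeholders here)
ENDPOINT = "https://<provider-host>/api/2.0/delta-sharing"
TOKEN = "<recipient-token>"

# GET /shares is part of the open Delta Sharing REST protocol; it lists
# the shares visible to this recipient's token
resp = requests.get(
    f"{ENDPOINT}/shares",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
for share in resp.json().get("items", []):
    print(share["name"])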
Setting Up Delta Sharing
Enable Delta Sharing in Unity Catalog
-- Create a share
CREATE SHARE customer_analytics
COMMENT 'Customer analytics data for partners';
-- Add tables to the share
ALTER SHARE customer_analytics
ADD TABLE production.analytics.customer_segments;
ALTER SHARE customer_analytics
ADD TABLE production.analytics.purchase_patterns
PARTITION (region = 'US'); -- Share specific partitions only
-- Add a schema (all tables in schema)
ALTER SHARE customer_analytics
ADD SCHEMA production.public_metrics;
Create Recipients
-- Create a recipient (external organization)
CREATE RECIPIENT partner_company
COMMENT 'Analytics partner - Contoso Inc.';
-- Get the activation link to send to recipient
DESCRIBE RECIPIENT partner_company;
-- Returns an activation link they use to get their credential
-- For Databricks-to-Databricks sharing, create the recipient from the
-- other metastore's sharing identifier (cloud:region:metastore-uuid)
CREATE RECIPIENT internal_team
USING ID 'aws:us-west-2:workspace-12345';
Grant Access
-- Grant access to the share
GRANT SELECT ON SHARE customer_analytics TO RECIPIENT partner_company;
-- View current grants
SHOW GRANTS ON SHARE customer_analytics;
-- Revoke access
REVOKE SELECT ON SHARE customer_analytics FROM RECIPIENT partner_company;
Consuming Shared Data
Python Client
import delta_sharing

# Path to the profile file (received from the data provider)
profile_file = "config.share"

# Create a sharing client
client = delta_sharing.SharingClient(profile_file)

# List available shares
shares = client.list_shares()
for share in shares:
    print(f"Share: {share.name}")

# List schemas in a share
schemas = client.list_schemas(delta_sharing.Share(name="customer_analytics"))
for schema in schemas:
    print(f"Schema: {schema.name}")

# List tables in a schema
tables = client.list_tables(
    delta_sharing.Schema(name="public_metrics", share="customer_analytics")
)
for table in tables:
    print(f"Table: {table.name}")

# Load a table as a pandas DataFrame
df = delta_sharing.load_as_pandas(
    f"{profile_file}#customer_analytics.public_metrics.daily_summary"
)
print(df.head())

# Load as a Spark DataFrame (requires an active SparkSession with the
# delta-sharing-spark package installed)
spark_df = delta_sharing.load_as_spark(
    f"{profile_file}#customer_analytics.public_metrics.daily_summary"
)
spark_df.show()
Apache Spark
from pyspark.sql import SparkSession

# Configure Spark with the Delta Sharing connector
spark = SparkSession.builder \
    .config("spark.jars.packages", "io.delta:delta-sharing-spark_2.12:0.6.0") \
    .getOrCreate()

# Read a shared table
df = spark.read.format("deltaSharing") \
    .load("config.share#customer_analytics.public_metrics.daily_summary")
df.show()

# Register the shared table so it can be queried with SQL
spark.sql("""
    CREATE TABLE IF NOT EXISTS shared_data
    USING deltaSharing
    LOCATION 'config.share#customer_analytics.public_metrics.daily_summary'
""")
spark.sql("SELECT * FROM shared_data WHERE date > '2022-01-01'").show()
Power BI
# Build the Power BI-compatible sharing URL in a Databricks notebook
# (workspace_url is the provider's workspace hostname)
share_url = (
    f"https://{workspace_url}/api/2.0/delta-sharing"
    "/shares/customer_analytics/schemas/public_metrics/tables/daily_summary"
)

# In Power BI:
# 1. Get Data -> Web
# 2. Enter the share URL
# 3. Use Bearer token authentication with the recipient token
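Before wiring the URL into Power BI, it can help to confirm that the URL and token line up. A quick check in Python against the protocol's table metadata endpoint; the URL and token here are placeholders:
import requests

# Sanity-check the share URL and recipient token before using Power BI
url = (
    "https://<workspace-url>/api/2.0/delta-sharing/shares/customer_analytics"
    "/schemas/public_metrics/tables/daily_summary/metadata"
)
resp = requests.get(url, headers={"Authorization": "Bearer <recipient-token>"})
print(resp.status_code)  # 200 means the share, schema, table, and token resolve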
Advanced Sharing Scenarios
Partition-Based Sharing
Share only specific data partitions:
-- Share only US region data
ALTER SHARE regional_data
ADD TABLE production.sales.transactions
PARTITION (region = 'US');

-- To share multiple partitions, list them in a single ADD TABLE
-- (a table can be added to a given share only once)
ALTER SHARE regional_data
ADD TABLE production.sales.transactions
PARTITION (region = 'US'), (region = 'CA');

-- Share recent data only; partition specifications support exact
-- matches and LIKE patterns rather than range predicates
ALTER SHARE recent_data
ADD TABLE production.sales.transactions
PARTITION (date LIKE '2022%');
Sharing Views
Share computed results without exposing raw data:
-- Create a view with aggregated/anonymized data
CREATE VIEW production.shared.customer_summary AS
SELECT
customer_segment,
region,
COUNT(*) as customer_count,
AVG(lifetime_value) as avg_ltv,
SUM(total_orders) as total_orders
FROM production.analytics.customer_details
GROUP BY customer_segment, region;
-- Share the view (views are added with ADD VIEW, not ADD TABLE)
ALTER SHARE analytics_share
ADD VIEW production.shared.customer_summary;
-- Recipients see aggregated data, not individual customer records
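From the recipient's side, the shared view reads like any other shared table; only the aggregated columns defined in the view come back. A short sketch, with share and schema names that are illustrative:
import delta_sharing

# Load the shared view; recipients see the aggregates only
df = delta_sharing.load_as_pandas(
    "config.share#analytics_share.shared.customer_summary"
)
print(df.columns)  # segment/region aggregates, no raw customer rows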
Time-Limited Access
Implement expiring shares:
import time

import schedule

def check_share_expiration():
    """Revoke shares whose expiration date has passed."""
    # governance.sharing.share_metadata is this pipeline's own tracking table
    expiring_shares = spark.sql("""
        SELECT
            share_name,
            recipient_name,
            expiration_date
        FROM governance.sharing.share_metadata
        WHERE expiration_date <= current_date()
          AND status = 'active'
    """).collect()

    for share in expiring_shares:
        # Revoke access
        spark.sql(f"""
            REVOKE SELECT ON SHARE {share['share_name']}
            FROM RECIPIENT {share['recipient_name']}
        """)

        # Update status
        spark.sql(f"""
            UPDATE governance.sharing.share_metadata
            SET status = 'expired'
            WHERE share_name = '{share['share_name']}'
              AND recipient_name = '{share['recipient_name']}'
        """)

        print(f"Revoked: {share['share_name']} from {share['recipient_name']}")

# Run daily at midnight
schedule.every().day.at("00:00").do(check_share_expiration)
while True:
    schedule.run_pending()
    time.sleep(60)
Monitoring and Auditing
Track sharing activity:
-- View sharing audit logs
SELECT
event_time,
action_name,
request_params.share_name,
request_params.recipient_name,
user_identity.email,
response.status_code
FROM system.access.audit
WHERE action_name LIKE '%Share%'
ORDER BY event_time DESC;
-- Track data access by recipients
SELECT
event_time,
action_name,
request_params.table_name,
source_ip_address,
user_agent
FROM system.access.audit
WHERE service_name = 'deltaSharing'
AND action_name = 'getTableData'
ORDER BY event_time DESC;
Usage Analytics
def generate_sharing_report():
    """Generate a monthly sharing usage report from the audit log."""
    # share_name, recipient_name, and table_name live in request_params;
    # bytes_read is assumed to be recorded for Delta Sharing events in
    # this environment's audit schema
    report = spark.sql("""
        SELECT
            request_params.share_name AS share_name,
            request_params.recipient_name AS recipient_name,
            request_params.table_name AS table_name,
            COUNT(*) AS access_count,
            SUM(bytes_read) AS total_bytes_read,
            MIN(event_time) AS first_access,
            MAX(event_time) AS last_access
        FROM system.access.audit
        WHERE service_name = 'deltaSharing'
          AND event_time >= date_trunc('month', current_date())
        GROUP BY request_params.share_name,
                 request_params.recipient_name,
                 request_params.table_name
    """)
    return report

# Persist the monthly report for downstream dashboards
report_df = generate_sharing_report()
report_df.write.mode("overwrite").saveAsTable("governance.reports.sharing_usage")
Security Best Practices
Token Management
def rotate_recipient_tokens():
    """Rotate bearer tokens for all recipients periodically."""
    recipients = spark.sql("SHOW RECIPIENTS").collect()

    for recipient in recipients:
        # Rotate the recipient's token
        spark.sql(f"ALTER RECIPIENT {recipient['name']} ROTATE TOKEN")

        # Notify the recipient of the new token
        # (send_token_notification is a placeholder for your own hook)
        send_token_notification(recipient['name'])

        print(f"Rotated token for: {recipient['name']}")

# Schedule monthly rotation (e.g., as a Databricks job)
IP Restrictions
-- Create the recipient; note that recipient IP access lists are
-- typically managed through the Databricks recipients API or UI, so
-- treat this property as illustrative metadata
CREATE RECIPIENT secure_partner
COMMENT 'Partner with IP restriction'
PROPERTIES (
  'allowed_ip_ranges' = '10.0.0.0/8,192.168.1.0/24'
);
Data Minimization
-- Share only necessary columns
CREATE VIEW production.shared.minimal_customer AS
SELECT
customer_id, -- Anonymized ID
segment,
region,
signup_year -- Generalized date
FROM production.sales.customers;
-- Don't share: email, phone, address, full name, exact dates
Cross-Cloud Sharing
Share data across cloud providers:
# Provider on Azure Databricks sharing to a recipient on AWS.
# The protocol works identically regardless of cloud: the recipient
# never needs credentials for the provider's Azure storage.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.jars.packages", "io.delta:delta-sharing-spark_2.12:0.6.0") \
    .getOrCreate()

# Read shared data (the data stays on Azure; the sharing server brokers
# access over the REST API with short-lived URLs)
df = spark.read.format("deltaSharing") \
    .load("azure_provider.share#share_name.schema.table")

# No direct cross-cloud storage access is required
Building a Data Marketplace
Create an internal data marketplace:
class DataMarketplace:
    """Thin wrapper around shares for an internal data marketplace.

    Note: the f-string SQL below is fine for a trusted notebook sketch,
    but validate or escape inputs before production use.
    """

    def __init__(self, spark):
        self.spark = spark

    def register_product(self, product_name, tables, description, owner):
        """Register a new data product for sharing."""
        # Create the share
        self.spark.sql(f"""
            CREATE SHARE IF NOT EXISTS {product_name}
            COMMENT '{description}'
        """)

        # Add tables
        for table in tables:
            self.spark.sql(f"ALTER SHARE {product_name} ADD TABLE {table}")

        # Register in the marketplace catalog
        self.spark.sql(f"""
            INSERT INTO governance.marketplace.products VALUES (
                '{product_name}',
                '{description}',
                '{owner}',
                current_timestamp(),
                'active'
            )
        """)

    def request_access(self, product_name, requester, justification):
        """Submit an access request for a data product."""
        self.spark.sql(f"""
            INSERT INTO governance.marketplace.access_requests VALUES (
                uuid(),
                '{product_name}',
                '{requester}',
                '{justification}',
                current_timestamp(),
                'pending'
            )
        """)
        # Notify the product owner (notify_owner is a placeholder hook)
        notify_owner(product_name, requester, justification)

    def approve_access(self, request_id):
        """Approve an access request."""
        request = self.spark.sql(f"""
            SELECT * FROM governance.marketplace.access_requests
            WHERE request_id = '{request_id}'
        """).first()

        # Create the recipient and grant access to the share
        self.spark.sql(f"CREATE RECIPIENT IF NOT EXISTS {request['requester']}")
        self.spark.sql(f"""
            GRANT SELECT ON SHARE {request['product_name']}
            TO RECIPIENT {request['requester']}
        """)

        # Update the request status
        self.spark.sql(f"""
            UPDATE governance.marketplace.access_requests
            SET status = 'approved'
            WHERE request_id = '{request_id}'
        """)
Conclusion
Delta Sharing revolutionizes how organizations exchange data. By enabling secure, live data sharing without copying, it reduces data duplication, ensures freshness, and simplifies governance.
Key benefits:
- Share data without ETL or copying
- Open protocol works with any client
- Fine-grained access control
- Complete audit trail
- Cross-cloud and cross-platform support
Whether sharing with partners, customers, or between internal teams, Delta Sharing provides a modern approach to data exchange.