January 8, 2021 1 min read

Databricks SQL Analytics: BI on the Lakehouse

Databricks SQL Analytics brings SQL analytics to the lakehouse. Query Delta Lake with familiar SQL, visualize with built-in dashboards, and connect to BI tools.

SQL Endpoints

-- Create SQL endpoint (via UI or API)
-- SQL endpoints provide serverless compute for queries

-- Query Delta tables
SELECT
    product_category,
    SUM(revenue) as total_revenue,
    COUNT(DISTINCT customer_id) as unique_customers
FROM delta.`/mnt/data/sales`
WHERE order_date >= '2021-01-01'
GROUP BY product_category
ORDER BY total_revenue DESC

SQL Endpoint Configuration

{
    "name": "analytics-endpoint",
    "cluster_size": "Medium",
    "min_num_clusters": 1,
    "max_num_clusters": 10,
    "auto_stop_mins": 30,
    "enable_photon": true,
    "spot_instance_policy": "COST_OPTIMIZED",
    "channel": "CHANNEL_NAME_CURRENT"
}

Query History and Optimization

-- View query history
SELECT
    query_id,
    query_text,
    execution_time_ms,
    rows_produced,
    bytes_read
FROM system.query_history
WHERE start_time >= current_date() - INTERVAL 7 DAYS
ORDER BY execution_time_ms DESC
LIMIT 20

-- Analyze query plan
EXPLAIN FORMATTED
SELECT * FROM sales WHERE product_id = 'ABC123'

Photon Engine

Photon accelerates SQL queries on Delta Lake:

-- Photon automatically optimizes:
-- - Joins
-- - Aggregations
-- - Filters on Parquet/Delta

-- Check Photon usage
SELECT
    query_id,
    photon_enabled,
    photon_cpu_time_ms,
    total_time_ms
FROM system.query_history
WHERE start_time >= current_date()

Built-in Visualizations

-- Query results can be visualized directly
-- Supported chart types:
-- - Bar, Line, Area charts
-- - Pie, Scatter plots
-- - Maps
-- - Tables with formatting
-- - Counters and gauges

SELECT
    DATE_TRUNC('month', order_date) as month,
    SUM(revenue) as monthly_revenue
FROM sales
WHERE order_date >= '2020-01-01'
GROUP BY 1
ORDER BY 1
-- Visualize as Line Chart

Dashboards

# Create dashboard via API
import requests

dashboard = {
    "name": "Sales Dashboard",
    "tags": ["sales", "monthly"],
    "widgets": [
        {
            "visualization_id": "viz_123",
            "options": {
                "position": {"col": 0, "row": 0, "size_x": 6, "size_y": 4}
            }
        }
    ]
}

response = requests.post(
    f"{workspace_url}/api/2.0/preview/sql/dashboards",
    headers={"Authorization": f"Bearer {token}"},
    json=dashboard
)

Alerts

-- Create alert on query results
-- Alert when daily revenue drops below threshold

SELECT SUM(revenue) as daily_revenue
FROM sales
WHERE order_date = current_date()

-- Alert configuration:
-- Trigger when: daily_revenue < 10000
-- Frequency: Every 1 hour
-- Notify: email, Slack, PagerDuty

Connect BI Tools

Power BI

// Connection string
Server: adb-xxxxx.azuredatabricks.net
HTTP Path: /sql/1.0/endpoints/xxxxx
Authentication: Azure AD

Tableau

Server: adb-xxxxx.azuredatabricks.net
Port: 443
HTTP Path: /sql/1.0/endpoints/xxxxx
Authentication: Personal Access Token

Query Parameters

-- Parameterized queries
SELECT *
FROM sales
WHERE product_category = {{ category }}
  AND order_date BETWEEN {{ start_date }} AND {{ end_date }}
  AND revenue >= {{ min_revenue }}

-- Parameters appear as input fields in the UI

Access Control

-- Grant query access
GRANT SELECT ON TABLE sales TO `analysts@company.com`;
GRANT USAGE ON SCHEMA analytics TO `analysts@company.com`;

-- Grant dashboard access
GRANT VIEW ON DASHBOARD `Sales Dashboard` TO `managers@company.com`;

Cost Optimization

Tip	Benefit
Use Photon	2-8x faster queries
Auto-stop	No idle costs
Spot instances	Lower compute costs
Query caching	Avoid redundant scans

Databricks SQL: warehouse speed on lakehouse data.