3 min read
Databricks SQL Analytics: BI on the Lakehouse
Databricks SQL Analytics brings SQL analytics to the lakehouse. Query Delta Lake with familiar SQL, visualize with built-in dashboards, and connect to BI tools.
SQL Endpoints
-- Create SQL endpoint (via UI or API)
-- SQL endpoints provide serverless compute for queries
-- Query Delta tables
SELECT
product_category,
SUM(revenue) as total_revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM delta.`/mnt/data/sales`
WHERE order_date >= '2021-01-01'
GROUP BY product_category
ORDER BY total_revenue DESC
SQL Endpoint Configuration
{
"name": "analytics-endpoint",
"cluster_size": "Medium",
"min_num_clusters": 1,
"max_num_clusters": 10,
"auto_stop_mins": 30,
"enable_photon": true,
"spot_instance_policy": "COST_OPTIMIZED",
"channel": "CHANNEL_NAME_CURRENT"
}
Query History and Optimization
-- View query history
SELECT
query_id,
query_text,
execution_time_ms,
rows_produced,
bytes_read
FROM system.query_history
WHERE start_time >= current_date() - INTERVAL 7 DAYS
ORDER BY execution_time_ms DESC
LIMIT 20
-- Analyze query plan
EXPLAIN FORMATTED
SELECT * FROM sales WHERE product_id = 'ABC123'
Photon Engine
Photon accelerates SQL queries on Delta Lake:
-- Photon automatically optimizes:
-- - Joins
-- - Aggregations
-- - Filters on Parquet/Delta
-- Check Photon usage
SELECT
query_id,
photon_enabled,
photon_cpu_time_ms,
total_time_ms
FROM system.query_history
WHERE start_time >= current_date()
Built-in Visualizations
-- Query results can be visualized directly
-- Supported chart types:
-- - Bar, Line, Area charts
-- - Pie, Scatter plots
-- - Maps
-- - Tables with formatting
-- - Counters and gauges
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(revenue) as monthly_revenue
FROM sales
WHERE order_date >= '2020-01-01'
GROUP BY 1
ORDER BY 1
-- Visualize as Line Chart
Dashboards
# Create dashboard via API
import requests
dashboard = {
"name": "Sales Dashboard",
"tags": ["sales", "monthly"],
"widgets": [
{
"visualization_id": "viz_123",
"options": {
"position": {"col": 0, "row": 0, "size_x": 6, "size_y": 4}
}
}
]
}
response = requests.post(
f"{workspace_url}/api/2.0/preview/sql/dashboards",
headers={"Authorization": f"Bearer {token}"},
json=dashboard
)
Alerts
-- Create alert on query results
-- Alert when daily revenue drops below threshold
SELECT SUM(revenue) as daily_revenue
FROM sales
WHERE order_date = current_date()
-- Alert configuration:
-- Trigger when: daily_revenue < 10000
-- Frequency: Every 1 hour
-- Notify: email, Slack, PagerDuty
Connect BI Tools
Power BI
// Connection string
Server: adb-xxxxx.azuredatabricks.net
HTTP Path: /sql/1.0/endpoints/xxxxx
Authentication: Azure AD
Tableau
Server: adb-xxxxx.azuredatabricks.net
Port: 443
HTTP Path: /sql/1.0/endpoints/xxxxx
Authentication: Personal Access Token
Query Parameters
-- Parameterized queries
SELECT *
FROM sales
WHERE product_category = {{ category }}
AND order_date BETWEEN {{ start_date }} AND {{ end_date }}
AND revenue >= {{ min_revenue }}
-- Parameters appear as input fields in the UI
Access Control
-- Grant query access
GRANT SELECT ON TABLE sales TO `analysts@company.com`;
GRANT USAGE ON SCHEMA analytics TO `analysts@company.com`;
-- Grant dashboard access
GRANT VIEW ON DASHBOARD `Sales Dashboard` TO `managers@company.com`;
Cost Optimization
| Tip | Benefit |
|---|---|
| Use Photon | 2-8x faster queries |
| Auto-stop | No idle costs |
| Spot instances | Lower compute costs |
| Query caching | Avoid redundant scans |
Databricks SQL: warehouse speed on lakehouse data.