1 min read
Azure Synapse Dedicated SQL Pools (Preview): Enterprise Data Warehousing
I wrote “Azure Synapse Dedicated SQL Pools (Preview): Enterprise Data Warehousing” to share practical, production-minded guidance on this topic.
Note: While dedicated SQL pools are mature (evolved from Azure SQL DW), the Synapse unified experience is still in preview.
Architecture
┌──────────────────────────────────────────────┐
│ Control Node │
│ (Query parsing, optimization) │
└─────────────────────┬────────────────────────┘
│
┌─────────────────┼─────────────────┐
│ │ │
┌───┴───┐ ┌───┴───┐ ┌───┴───┐
│Compute│ │Compute│ │Compute│
│Node 1 │ │Node 2 │ │Node N │
└───┬───┘ └───┬───┘ └───┬───┘
│ │ │
┌───┴────────────────┴────────────────┴───┐
│ Azure Storage │
│ (60 distributions across nodes) │
└─────────────────────────────────────────┘
Creating a Pool
-- In Synapse workspace
CREATE DATABASE mywarehouse
COLLATE Latin1_General_100_BIN2_UTF8;
-- Scale DWUs (Data Warehouse Units)
ALTER DATABASE mywarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW1000c');
Table Distribution
Hash Distribution
-- Best for large fact tables
CREATE TABLE FactSales (
SalesKey BIGINT NOT NULL,
DateKey INT NOT NULL,
ProductKey INT NOT NULL,
CustomerKey INT NOT NULL,
Amount DECIMAL(18,2)
)
WITH (
DISTRIBUTION = HASH(CustomerKey),
CLUSTERED COLUMNSTORE INDEX
);
Round Robin
-- For staging tables or when no good hash column
CREATE TABLE StgSales (
SalesKey BIGINT,
DateKey INT,
Amount DECIMAL(18,2)
)
WITH (
DISTRIBUTION = ROUND_ROBIN,
HEAP
);
Replicated
-- For small dimension tables (< 2GB)
CREATE TABLE DimProduct (
ProductKey INT NOT NULL,
ProductName NVARCHAR(100),
Category NVARCHAR(50)
)
WITH (
DISTRIBUTION = REPLICATE,
CLUSTERED COLUMNSTORE INDEX
);
Best Practices
Distribution Key Selection
-- Good: High cardinality, evenly distributed, frequently joined
DISTRIBUTION = HASH(CustomerKey) -- millions of customers
-- Bad: Low cardinality, skewed data
DISTRIBUTION = HASH(Country) -- few countries, uneven data
Columnstore Indexes
-- Default for fact tables - best compression and performance
CREATE TABLE FactSales (...)
WITH (CLUSTERED COLUMNSTORE INDEX);
-- Check index health
SELECT
object_name(object_id) AS table_name,
avg_fragmentation_in_percent
FROM sys.dm_pdw_dms_external_work
WHERE type = 'COLUMNSTORE_REBUILD';
Loading Data
COPY Command (Recommended)
COPY INTO FactSales
FROM 'https://mystorageaccount.blob.core.windows.net/data/sales/*.parquet'
WITH (
FILE_TYPE = 'PARQUET',
CREDENTIAL = (IDENTITY = 'Managed Identity')
);
PolyBase
-- External data source
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://data@mystorageaccount.blob.core.windows.net',
CREDENTIAL = MyBlobCredential
);
-- External file format
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);
-- External table
CREATE EXTERNAL TABLE ExtSales (
SalesKey BIGINT,
Amount DECIMAL(18,2)
)
WITH (
LOCATION = '/sales/',
DATA_SOURCE = AzureBlob,
FILE_FORMAT = ParquetFormat
);
-- Load via CTAS
CREATE TABLE FactSales
WITH (DISTRIBUTION = HASH(CustomerKey))
AS SELECT * FROM ExtSales;
Performance Tuning
-- Check data skew
DBCC PDW_SHOWSPACEUSED('FactSales');
-- Explain query plan
EXPLAIN SELECT * FROM FactSales WHERE CustomerKey = 123;
-- Update statistics
UPDATE STATISTICS FactSales;
-- Rebuild indexes
ALTER INDEX ALL ON FactSales REBUILD;
Pause and Resume
# Pause to save costs
az synapse sql pool pause --name mypool --workspace-name myws --resource-group myRG
# Resume when needed
az synapse sql pool resume --name mypool --workspace-name myws --resource-group myRG
Dedicated SQL pools deliver warehouse-scale analytics performance.\n\n## Takeaways\n\nAdd a concise, personal takeaway and recommended next steps here.\n