Back to Blog
2 min read

Azure Synapse Serverless SQL (Preview): Query Your Data Lake

Azure Synapse Analytics is currently in preview. One of its most exciting features is Serverless SQL (also called SQL on-demand), which lets you query your data lake files with T-SQL. No provisioning, pay per query.

Note: Synapse is in public preview with GA expected in late 2020. Features may change.

Basic Query

SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales

CSV Files

SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/customers/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
) WITH (
    customer_id INT,
    name VARCHAR(100),
    email VARCHAR(256),
    created_date DATE
) AS customers

Creating Views

CREATE VIEW curated.vw_monthly_sales
AS
SELECT
    year(sale_date) AS year,
    month(sale_date) AS month,
    region,
    SUM(amount) AS total_sales
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/**',
    FORMAT = 'PARQUET'
) AS sales
GROUP BY year(sale_date), month(sale_date), region

External Tables

CREATE EXTERNAL DATA SOURCE DataLake
WITH (LOCATION = 'https://mydatalake.dfs.core.windows.net/curated/');

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE curated.sales (
    sale_id BIGINT,
    customer_id INT,
    amount DECIMAL(18,2),
    sale_date DATE
)
WITH (
    LOCATION = 'sales/',
    DATA_SOURCE = DataLake,
    FILE_FORMAT = ParquetFormat
);

Cost Management

Pricing: ~$5 per TB scanned

Optimize by:

  1. Using Parquet (columnar, compressed)
  2. Partitioning by date or other filter columns
  3. Selecting only needed columns
  4. Filtering early in queries
-- Good: Selective query
SELECT sale_date, amount
FROM curated.sales
WHERE sale_date >= '2020-01-01'

-- Bad: Full table scan
SELECT * FROM curated.sales

Serverless SQL democratizes data lake access for SQL users.

Michael John Peña

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.