6 min read
Result Set Caching for Query Acceleration in Synapse
Result Set Caching for Query Acceleration in Synapse
Result set caching in Azure Synapse stores query results for reuse, dramatically improving performance for repeated queries. This is especially valuable for dashboards and reports that run the same queries frequently.
Understanding Result Set Caching
When enabled, Synapse:
- Stores query results in the control node
- Returns cached results for identical queries
- Automatically invalidates cache when data changes
- Provides sub-second response for cached queries
Enabling Result Set Caching
At Database Level
-- Enable result set caching for the database
ALTER DATABASE [AnalyticsDB]
SET RESULT_SET_CACHING ON;
-- Check current setting
SELECT name, is_result_set_caching_on
FROM sys.databases
WHERE name = 'AnalyticsDB';
At Session Level
-- Disable caching for current session (useful for testing)
SET RESULT_SET_CACHING OFF;
-- Re-enable for session
SET RESULT_SET_CACHING ON;
Verifying Cache Usage
Check if Query Hit Cache
-- Run a query
SELECT
Region,
SUM(Amount) as TotalSales
FROM dbo.FactSales
GROUP BY Region;
-- Check cache status
SELECT
request_id,
command,
result_cache_hit
FROM sys.dm_pdw_exec_requests
WHERE command LIKE '%FactSales%'
ORDER BY submit_time DESC;
View Cache Statistics
-- Get cache statistics
SELECT
COUNT(*) as CachedQueries,
SUM(CASE WHEN result_cache_hit = 1 THEN 1 ELSE 0 END) as CacheHits,
SUM(CASE WHEN result_cache_hit = 0 THEN 1 ELSE 0 END) as CacheMisses
FROM sys.dm_pdw_exec_requests
WHERE submit_time > DATEADD(hour, -24, GETDATE())
AND status = 'Completed';
Cache Behavior
Automatic Cache Invalidation
The cache is automatically invalidated when:
- Underlying data changes (INSERT, UPDATE, DELETE)
- Table schema changes
- Statistics are updated
- Permissions change
-- This invalidates the cache for affected tables
INSERT INTO dbo.FactSales (OrderId, Amount)
VALUES (1001, 500.00);
-- Next query will miss cache
SELECT SUM(Amount) FROM dbo.FactSales;
Cache Lifetime
-- Cached results expire after 48 hours of inactivity
-- To manually clear cache:
DBCC DROPRESULTSETCACHE;
-- Check cache size
SELECT
cache_object_count,
space_used_mb = space_used_bytes / 1024 / 1024
FROM sys.dm_pdw_nodes_db_file_space_usage
WHERE database_id = DB_ID()
AND file_id = 65537; -- Cache file ID
Cache-Friendly Query Patterns
Consistent Query Text
-- These queries WON'T share cache (different text)
SELECT * FROM dbo.FactSales WHERE Region = 'East';
SELECT * FROM dbo.FactSales WHERE Region = 'East'; -- Extra space
select * from dbo.FactSales where Region = 'East'; -- Different case
-- Use consistent formatting for cache hits
SELECT * FROM dbo.FactSales WHERE Region = 'East';
Parameterized Queries
-- Each different value creates a new cache entry
SELECT * FROM dbo.FactSales WHERE Region = 'East';
SELECT * FROM dbo.FactSales WHERE Region = 'West';
-- Both are cached separately
Deterministic Queries
-- Non-deterministic functions prevent caching
SELECT *, GETDATE() as QueryTime -- Won't be cached
FROM dbo.FactSales;
-- Use deterministic alternatives
SELECT *, '2021-10-22' as ReportDate -- Will be cached
FROM dbo.FactSales;
Best Practices for Caching
Dashboard Queries
-- Dashboard-style queries benefit most from caching
-- Aggregated results over large datasets
-- Good candidate for caching
SELECT
YEAR(OrderDate) as Year,
MONTH(OrderDate) as Month,
Region,
COUNT(*) as OrderCount,
SUM(Amount) as Revenue
FROM dbo.FactSales
WHERE OrderDate >= '2021-01-01'
GROUP BY YEAR(OrderDate), MONTH(OrderDate), Region;
Pre-Warming Cache
-- Run common queries during off-peak hours to warm cache
DECLARE @sql NVARCHAR(MAX);
-- Query 1: Daily summary
SET @sql = '
SELECT
CAST(OrderDate AS DATE) as Date,
SUM(Amount) as DailyRevenue
FROM dbo.FactSales
WHERE OrderDate >= DATEADD(day, -30, CAST(GETDATE() AS DATE))
GROUP BY CAST(OrderDate AS DATE);
';
EXEC sp_executesql @sql;
-- Query 2: Regional summary
SET @sql = '
SELECT
Region,
COUNT(*) as Orders,
SUM(Amount) as Revenue
FROM dbo.FactSales
WHERE OrderDate >= DATEADD(day, -7, CAST(GETDATE() AS DATE))
GROUP BY Region;
';
EXEC sp_executesql @sql;
Monitoring Cache Effectiveness
-- Create view for cache monitoring
CREATE VIEW vw_CacheMetrics
AS
SELECT
CAST(submit_time AS DATE) as QueryDate,
COUNT(*) as TotalQueries,
SUM(CASE WHEN result_cache_hit = 1 THEN 1 ELSE 0 END) as CacheHits,
SUM(CASE WHEN result_cache_hit = 0 THEN 1 ELSE 0 END) as CacheMisses,
CAST(SUM(CASE WHEN result_cache_hit = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS DECIMAL(5,2)) as HitRatePercent,
AVG(CASE WHEN result_cache_hit = 1 THEN total_elapsed_time END) as AvgCacheHitTime,
AVG(CASE WHEN result_cache_hit = 0 THEN total_elapsed_time END) as AvgCacheMissTime
FROM sys.dm_pdw_exec_requests
WHERE status = 'Completed'
AND command NOT LIKE '%sys.%'
GROUP BY CAST(submit_time AS DATE);
-- Query the metrics
SELECT * FROM vw_CacheMetrics
ORDER BY QueryDate DESC;
Limitations
Queries That Cannot Be Cached
-- These queries won't be cached:
-- 1. Non-deterministic functions
SELECT *, NEWID() as UniqueId FROM dbo.FactSales;
SELECT *, GETDATE() as CurrentTime FROM dbo.FactSales;
-- 2. User-defined functions
SELECT *, dbo.MyFunction(Amount) FROM dbo.FactSales;
-- 3. Queries with TOP without ORDER BY
SELECT TOP 100 * FROM dbo.FactSales;
-- 4. Queries returning > 10GB results
SELECT * FROM dbo.VeryLargeTable;
Cache Size Limits
- Maximum single result: 10 GB
- Total cache size: Limited by control node memory
- Maximum cache entries: Based on available space
Troubleshooting
Why Isn’t My Query Cached?
-- Check query details
SELECT
request_id,
command,
result_cache_hit,
status,
error_id
FROM sys.dm_pdw_exec_requests
WHERE request_id = 'QID1234';
-- Check for errors
SELECT *
FROM sys.dm_pdw_errors
WHERE request_id = 'QID1234';
Cache Miss Reasons
-- Analyze cache miss patterns
SELECT
command,
COUNT(*) as Executions,
SUM(CASE WHEN result_cache_hit = 1 THEN 1 ELSE 0 END) as Hits,
MAX(submit_time) as LastRun
FROM sys.dm_pdw_exec_requests
WHERE status = 'Completed'
AND submit_time > DATEADD(hour, -24, GETDATE())
GROUP BY command
HAVING SUM(CASE WHEN result_cache_hit = 0 THEN 1 ELSE 0 END) >
SUM(CASE WHEN result_cache_hit = 1 THEN 1 ELSE 0 END)
ORDER BY Executions DESC;
Combining with Other Features
With Materialized Views
-- Materialized views + result set caching = maximum performance
-- Create materialized view
CREATE MATERIALIZED VIEW mv_DailySales
WITH (DISTRIBUTION = HASH(Region))
AS
SELECT
CAST(OrderDate AS DATE) as SalesDate,
Region,
SUM(Amount) as TotalAmount,
COUNT(*) as OrderCount
FROM dbo.FactSales
GROUP BY CAST(OrderDate AS DATE), Region;
-- Query the view (benefits from both MV and caching)
SELECT * FROM mv_DailySales WHERE SalesDate >= '2021-10-01';
Best Practices Summary
- Enable at database level - Default for interactive workloads
- Use consistent query text - Exact match required
- Avoid non-deterministic functions - Use literals instead
- Pre-warm cache - Run common queries in advance
- Monitor hit rates - Track cache effectiveness
- Combine with materialized views - Maximum performance
Conclusion
Result set caching is a powerful feature for accelerating dashboard and reporting workloads in Synapse. By understanding its behavior and limitations, you can design query patterns that maximize cache hits and deliver sub-second response times.
Tomorrow, we’ll explore workload management for balancing different types of queries.