1 min read
Mastering Distributed Queries in Citus
I wrote “Mastering Distributed Queries in Citus” to share practical, production-minded guidance on this topic.
Types of Distributed Queries
1. Router Queries (Single-Shard)
Queries that target a specific shard based on the distribution column:
-- This query routes to a single shard
SELECT * FROM orders WHERE tenant_id = 42;
-- Explain shows it's a router query
EXPLAIN SELECT * FROM orders WHERE tenant_id = 42;
-- Task Count: 1 (single shard)
2. Scatter-Gather Queries
Queries that run on all shards and aggregate results:
-- Runs on all shards in parallel
SELECT
DATE_TRUNC('day', created_at) as day,
COUNT(*) as order_count,
SUM(amount) as total_amount
FROM orders
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY day;
3. Co-located Joins
Joins between tables with the same distribution column:
-- Orders and order_items distributed by tenant_id
-- This join happens locally on each shard
SELECT
o.id,
o.created_at,
SUM(oi.quantity * oi.unit_price) as total
FROM orders o
JOIN order_items oi ON o.tenant_id = oi.tenant_id
AND o.id = oi.order_id
WHERE o.tenant_id = 42
GROUP BY o.id, o.created_at;
Analyzing Query Performance
-- Enable timing and analyze
\timing on
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT
tenant_id,
COUNT(*) as order_count
FROM orders
WHERE created_at >= '2022-07-01'
GROUP BY tenant_id;
Optimizing Distributed Queries
Push Down Filters
-- Good: Filter pushes down to workers
SELECT * FROM orders
WHERE tenant_id = 42
AND created_at >= '2022-07-01';
-- Avoid: Functions on distribution column prevent routing
SELECT * FROM orders
WHERE tenant_id::text = '42'; -- Bad!
Use CTEs Wisely
-- CTEs can be optimized in Citus 11+
WITH recent_orders AS (
SELECT * FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT tenant_id, COUNT(*)
FROM recent_orders
GROUP BY tenant_id;
Parallel DML
-- Insert from SELECT runs in parallel
INSERT INTO order_archive (tenant_id, order_id, data)
SELECT
tenant_id,
id,
row_to_json(orders.*)::jsonb
FROM orders
WHERE created_at < '2022-01-01';
Monitoring Distributed Queries
-- View active distributed queries
SELECT * FROM citus_stat_activity
WHERE query NOT LIKE '%citus_stat_activity%';
-- Check query statistics
SELECT
queryid,
query,
calls,
mean_time,
total_time
FROM citus_stat_statements
ORDER BY total_time DESC
LIMIT 10;
Understanding these query patterns helps you leverage Citus effectively for high-performance distributed PostgreSQL applications.\n\n## Takeaways\n\nAdd a concise, personal takeaway and recommended next steps here.\n