Understanding Hyperscale Citus Architecture
Hyperscale Citus transforms PostgreSQL into a distributed database through a coordinator-worker architecture. Understanding this architecture is crucial for designing efficient distributed applications.
The Coordinator-Worker Model
In a Citus cluster, there are two types of nodes:
- Coordinator Node: The entry point for all queries. It stores metadata about how data is distributed and routes each query to the relevant workers.
- Worker Nodes: Store the actual data shards and execute the distributed query fragments; both node types are visible in the coordinator's metadata, as shown below.
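Both node types are ordinary PostgreSQL servers with the Citus extension installed; what differs is the role they play. As a quick sketch, assuming you are connected to the coordinator, you can list the nodes in the cluster from the pg_dist_node metadata table:
-- Run on the coordinator: every node Citus knows about
SELECT nodename, nodeport, noderole, isactive
FROM pg_dist_node;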
How Data is Distributed
-- Create a distributed table
CREATE TABLE events (
event_id bigserial,
tenant_id int,
event_type text,
event_data jsonb,
created_at timestamptz DEFAULT now()
);
-- Distribute the table by tenant_id
SELECT create_distributed_table('events', 'tenant_id');
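Once distributed, the table is used like any other PostgreSQL table: Citus hashes each row's tenant_id to pick the shard that stores it. A minimal sketch with made-up sample values:
-- Routed to the shard that owns the hash range for tenant 42
INSERT INTO events (tenant_id, event_type, event_data)
VALUES (42, 'page_view', '{"path": "/home"}');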
When you distribute a table, Citus:
- Splits it into 32 shards by default (the count is configurable, as shown below)
- Assigns each shard to a worker node
- Routes queries to the matching shards based on a hash of the distribution column
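The shard count is a setting, not a fixed property. As a sketch, assuming an otherwise default setup (page_views is a hypothetical table), setting citus.shard_count in the session changes the count for tables distributed afterwards:
-- Hypothetical example: distribute a table across 64 shards instead of 32
SET citus.shard_count = 64;
SELECT create_distributed_table('page_views', 'tenant_id');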
Query Flow
-- This query is routed to a single shard
SELECT * FROM events WHERE tenant_id = 42;
-- This query runs on all shards in parallel
SELECT event_type, COUNT(*)
FROM events
GROUP BY event_type;
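You can verify how Citus routes a given query with EXPLAIN: distributed queries appear as a Custom Scan node, and the task count indicates how many shards participate. A sketch, with output abbreviated (exact wording varies by Citus version):
-- A filter on the distribution column should produce a single task
EXPLAIN SELECT * FROM events WHERE tenant_id = 42;
--  Custom Scan (Citus Adaptive)
--    Task Count: 1
--    ...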
Viewing Cluster Metadata
-- See all distributed tables
SELECT * FROM citus_tables;
-- View shard placements
SELECT
    shardid,
    nodename,
    nodeport
FROM pg_dist_shard_placement
WHERE shardid IN (
    SELECT shardid
    FROM pg_dist_shard
    WHERE logicalrelid = 'events'::regclass
);
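Newer Citus releases (10 and later) also provide a citus_shards view that combines shard placement with size information, which is often more convenient than joining the pg_dist_* tables by hand. A sketch:
-- Shard locations and on-disk sizes for the events table
SELECT shard_name, nodename, nodeport, shard_size
FROM citus_shards
WHERE table_name = 'events'::regclass;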
Scaling the Cluster
# Add more worker nodes to your cluster
az cosmosdb postgres cluster update \
    --name mypostgrescluster \
    --resource-group myResourceGroup \
    --node-count 4
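Once the scale-out completes, it's worth confirming that the coordinator sees the new workers before moving any data. A sketch, assuming a recent Citus version:
-- Should now list four active workers
SELECT * FROM citus_get_active_worker_nodes();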
After adding nodes, rebalance so that existing shards are redistributed onto the new workers:
SELECT rebalance_table_shards();
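Rebalancing copies shards between workers and can run for a while on large tables. A sketch for monitoring it from a separate session, assuming a Citus version that ships the progress function:
-- Per-shard progress of an in-flight rebalance
SELECT * FROM get_rebalance_progress();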
Understanding this architecture helps you make informed decisions about data modeling and query optimization in your distributed PostgreSQL deployments.