Understanding Hyperscale Citus Architecture
Hyperscale Citus transforms PostgreSQL into a distributed database through a coordinator-worker architecture. Understanding this architecture is crucial for designing efficient distributed applications.
The Coordinator-Worker Model
In a Citus cluster, there are two types of nodes:
- Coordinator Node: The entry point for all queries. It stores metadata about how data is distributed and routes each query to the relevant workers.
- Worker Nodes: Store the actual data shards and execute the distributed query fragments; both node types are visible in the coordinator's metadata, as shown below.
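Both node types are ordinary PostgreSQL servers with the Citus extension installed; what differs is the role they play. As a quick sketch, assuming you are connected to the coordinator, you can list the nodes in the cluster from the pg_dist_node metadata table:
-- Run on the coordinator: every node Citus knows about
SELECT nodename, nodeport, noderole, isactive
FROM pg_dist_node;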
How Data is Distributed
-- Create a distributed table
CREATE TABLE events (
event_id bigserial,
tenant_id int,
event_type text,
event_data jsonb,
created_at timestamptz DEFAULT now()
);
-- Distribute the table by tenant_id
SELECT create_distributed_table('events', 'tenant_id');
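Once distributed, the table is used like any other PostgreSQL table: Citus hashes each row's tenant_id to pick the shard that stores it. A minimal sketch with made-up sample values:
-- Routed to the shard that owns the hash range for tenant 42
INSERT INTO events (tenant_id, event_type, event_data)
VALUES (42, 'page_view', '{"path": "/home"}');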
When you distribute a table, Citus:
- Splits it into 32 shards by default (the count is configurable, as shown below)
- Assigns each shard to a worker node
- Routes queries to the matching shards based on a hash of the distribution column
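The shard count is a setting, not a fixed property. As a sketch, assuming an otherwise default setup (page_views is a hypothetical table), setting citus.shard_count in the session changes the count for tables distributed afterwards:
-- Hypothetical example: distribute a table across 64 shards instead of 32
SET citus.shard_count = 64;
SELECT create_distributed_table('page_views', 'tenant_id');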
Query Flow
-- This query is routed to a single shard
SELECT * FROM events WHERE tenant_id = 42;
-- This query runs on all shards in parallel
SELECT event_type, COUNT(*)
FROM events
GROUP BY event_type;
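You can verify how Citus routes a given query with EXPLAIN: distributed queries appear as a Custom Scan node, and the task count indicates how many shards participate. A sketch, with output abbreviated (exact wording varies by Citus version):
-- A filter on the distribution column should produce a single task
EXPLAIN SELECT * FROM events WHERE tenant_id = 42;
--  Custom Scan (Citus Adaptive)
--    Task Count: 1
--    ...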
Viewing Cluster Metadata
-- See all distributed tables
SELECT * FROM citus_tables;
-- View shard placements
SELECT
    shardid,
    nodename,
    nodeport
FROM pg_dist_shard_placement
WHERE shardid IN (
    SELECT shardid
    FROM pg_dist_shard
    WHERE logicalrelid = 'events'::regclass
);
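Newer Citus releases (10 and later) also provide a citus_shards view that combines shard placement with size information, which is often more convenient than joining the pg_dist_* tables by hand. A sketch:
-- Shard locations and on-disk sizes for the events table
SELECT shard_name, nodename, nodeport, shard_size
FROM citus_shards
WHERE table_name = 'events'::regclass;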
Scaling the Cluster
# Add more worker nodes to your cluster
az cosmosdb postgres cluster update \
    --name mypostgrescluster \
    --resource-group myResourceGroup \
    --node-count 4
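Once the scale-out completes, it's worth confirming that the coordinator sees the new workers before moving any data. A sketch, assuming a recent Citus version:
-- Should now list four active workers
SELECT * FROM citus_get_active_worker_nodes();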
After adding nodes, rebalance so that existing shards are redistributed onto the new workers:
SELECT rebalance_table_shards();
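Rebalancing copies shards between workers and can run for a while on large tables. A sketch for monitoring it from a separate session, assuming a Citus version that ships the progress function:
-- Per-shard progress of an in-flight rebalance
SELECT * FROM get_rebalance_progress();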
Understanding this architecture helps you make informed decisions about data modeling and query optimization in your distributed PostgreSQL deployments.