Skip to content

Distributed SQL Databases

When single-server PostgreSQL reaches its limits, distributed databases provide horizontal scaling. Each has different trade-offs between consistency, complexity, and workload suitability.

Comparison

Feature Citus CockroachDB Greenplum YugabyteDB
Base PG extension Custom (PG wire) PG fork Custom (PG wire)
Consistency Per-shard Serializable Per-segment Serializable
Best for Multi-tenant SaaS Geo-distributed OLTP Analytics/DWH Distributed OLTP
OLTP Good Good Poor Good
OLAP Good Moderate Excellent Moderate
Complexity Low (extension) Medium High (separate product) Medium

Citus (PostgreSQL Extension)

Acquired by Microsoft (2019). Transforms single PostgreSQL into distributed database.

Table Types

  • Distributed: Hash-sharded across workers by distribution column
  • Reference: Replicated to all workers (dimension/lookup tables)
  • Local: On coordinator only
-- Create distributed table
SELECT create_distributed_table('events', 'tenant_id');

-- Queries automatically routed to relevant shard
SELECT count(*) FROM events WHERE tenant_id = 42;

-- Cross-shard queries supported but slower
SELECT tenant_id, count(*) FROM events GROUP BY tenant_id;

Since Citus 11.0: Queries can run from any node. citus.shard_replication_factor default 1 (no fault tolerance).

CockroachDB

PostgreSQL-compatible distributed SQL. Multi-master, automatic sharding, serializable isolation.

# 3-node cluster
cockroach start --certs-dir=certs --advertise-addr=node1 --join=node1,node2,node3
cockroach init --certs-dir=certs --host=node1

Geo-distributed: 9 nodes across 3 regions. Requires time sync (500ms drift tolerance). Multi-master: writes on any region visible on all others.

Greenplum MPP

Open-source MPP RDBMS based on PostgreSQL. Columnar storage, massively parallel processing.

Architecture

  • Master host: Entry point, SQL coordinator, no user data
  • Segment hosts: 2-8 PostgreSQL segments per host, primary + mirror
  • Interconnect: Dedicated high-speed network

Storage Types

Type Best For
Heap Concurrent UPDATE/DELETE/INSERT
Append-Optimized (AO) Batch loads + analytics, rarely updated
Row-oriented Wide SELECT, mixed workloads
Column-oriented Narrow SELECT on wide tables, aggregations

Performance Rules

  • Speed = CPU cores / data ratio per node
  • 1 query = max 1 core per segment
  • Cluster speed = slowest segment - balanced distribution critical
  • Design insert-only model; truncate daily partition before loading

OLTP vs OLAP

OLTP: Normalized, high insert/update rate, many concurrent users, small transactions, indexed for fast access. Three-tier architecture.

OLAP: Historical data reports, columnar storage, denormalized fact tables with dimensions, infrequent updates, fast aggregation reads.

MPP (Massive Parallel Processing)

Data and compute distributed across leader + worker nodes. Shared-nothing architecture.

Pros: Fast large-data processing, easy horizontal scaling, fault tolerance via mirroring. Cons: High resource requirements, poor for many simple queries, segment distribution imbalance.

ClickHouse

Column-oriented OLAP database (C++, Yandex). Vectorized query processing, efficient compression, built-in replication/sharding. Rich SQL dialect with JSON, IP, UUID, Array types. Integration with Kafka, MySQL, PostgreSQL.

Other Solutions

  • YugabyteDB: PG-compatible distributed SQL with serializable isolation
  • Postgres-BDR / Bucardo: Multi-master using logical replication
  • Shardman (Postgres Pro): FDW-based sharding, hash-only distribution, significant limitations
  • Arenadata DB: Russian commercial Greenplum distribution

Key Facts

  • PostgreSQL is NOT designed for big data: transactions/WAL impede large operations, no built-in sharding
  • Citus is the simplest path from single PostgreSQL to distributed
  • CockroachDB provides strongest consistency (serializable) across distributed nodes
  • Greenplum excels at analytics but is poor for OLTP
  • Sharding key must have high cardinality and uniform distribution

Gotchas

  • Citus: shard_replication_factor = 1 (default) means no fault tolerance for individual shards
  • CockroachDB: 500ms time drift tolerance - NTP synchronization required
  • Greenplum: 1 core per query per segment limits fine-grained query parallelism
  • Shardman: no ALTER TABLE (except owner/NULL/ADD column), no foreign keys, SELECT-only transport
  • Multi-master replication has complex conflict resolution - avoid unless truly needed

See Also