Crunchy Data Operator Overview

jon-funk edited this page Jan 23, 2025 · 3 revisions

Crunchy Data Operator

  • The Platform Services team installs and manages an operator on the OpenShift platform. The operator runs a control loop that continuously monitors every PostgresCluster object in the cluster where it is installed. PostgresCluster is a custom resource kind, defined by a Custom Resource Definition (CRD).
  • Any team (including ours) can create PostgresCluster objects in its own namespaces. The operator detects each creation or change and responds by provisioning the workloads needed to build and manage the Postgres cluster.
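
A minimal sketch of what creating such an object can look like, written in Python for illustration. The cluster name, namespace, instance-set name, storage size, and Postgres version are hypothetical. The field names follow the upstream PostgresCluster v1beta1 schema as best recalled, so check them against the CRD version installed on the platform before relying on them.

```python
import json


def volume_claim(storage):
    """Build a basic PersistentVolumeClaim spec for the requested size."""
    return {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": storage}},
    }


def build_postgres_cluster(name, namespace, replicas=2, storage="1Gi", postgres_version=16):
    """Return a minimal PostgresCluster manifest as a plain dict.

    All argument values are illustrative assumptions, not platform defaults.
    """
    return {
        "apiVersion": "postgres-operator.crunchydata.com/v1beta1",
        "kind": "PostgresCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "postgresVersion": postgres_version,
            # One instance set; each replica is expected to get its own pod.
            "instances": [
                {
                    "name": "db",  # hypothetical instance-set name
                    "replicas": replicas,
                    "dataVolumeClaimSpec": volume_claim(storage),
                }
            ],
            # A single volume-backed pgBackRest repository for backups.
            "backups": {
                "pgbackrest": {
                    "repos": [
                        {
                            "name": "repo1",
                            "volume": {"volumeClaimSpec": volume_claim(storage)},
                        }
                    ]
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_postgres_cluster("example-db", "example-namespace")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with the usual cluster tooling. Once the object exists in the namespace, the operator is the component that turns it into running workloads.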

Architecture

Each PostgresCluster object created in a namespace results in the following workloads:

  • Two or more StatefulSets, each with 1 pod running 4 containers
    • Each pod contains:
      • The Postgres container itself, managed by Patroni
      • A container to manage certs
      • pgBackRest (the backup creation and management solution)
      • A configuration container for pgBackRest
    • A volume to store Postgres data
    • A volume to store Postgres WAL transaction logs
  • A StatefulSet that hosts the repositories used to manage backups (see the backup docs)

Of the StatefulSets running the database pods, one is designated the leader (primary) and generally receives the majority of traffic. The secondaries can receive read traffic and remain on standby so that, if the leader has an outage, traffic fails over to a secondary, which then becomes the new leader.

Reference: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/high-availability

Leader Management

As part of Patroni, the Postgres containers determine which of them is the leader during startup using the Raft algorithm.

You can determine which StatefulSet is the leader/primary by checking the pod logs:

A secondary (failover) outputs:

2025-01-23 20:56:42,017 INFO: no action. I am (postgres-crunchy-dev-db-tmv8-0), a secondary, and following a leader (postgres-crunchy-dev-db-b2wh-0)

A leader outputs:

2025-01-23 20:57:21,985 INFO: no action. I am (postgres-crunchy-dev-db-b2wh-0), the leader with the lock
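
The log check above can be automated with a small parser. This is a sketch that assumes the log lines keep exactly the wording shown in the two samples; the function name `classify_patroni_line` is hypothetical and not part of Patroni or the operator.

```python
import re

# Patterns derived only from the two sample log lines shown above.
_LEADER = re.compile(r"I am \((?P<member>[^)]+)\), the leader with the lock")
_SECONDARY = re.compile(
    r"I am \((?P<member>[^)]+)\), a secondary, and following a leader \((?P<leader>[^)]+)\)"
)


def classify_patroni_line(line):
    """Return the member name, its role, and the leader it reports.

    Returns None when the line matches neither sample format.
    """
    match = _LEADER.search(line)
    if match:
        member = match.group("member")
        return {"member": member, "role": "leader", "leader": member}
    match = _SECONDARY.search(line)
    if match:
        return {
            "member": match.group("member"),
            "role": "secondary",
            "leader": match.group("leader"),
        }
    return None


if __name__ == "__main__":
    sample = (
        "2025-01-23 20:57:21,985 INFO: no action. "
        "I am (postgres-crunchy-dev-db-b2wh-0), the leader with the lock"
    )
    print(classify_patroni_line(sample))
```

Feeding it the most recent log line from each database pod gives a quick view of which pod currently holds the leader lock and which pods are following it.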

Notes

GitHub Repository: https://github.com/CrunchyData/postgres-operator

Docs start: https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart

Raft algorithm details and visualization: https://raft.github.io/