使用repmgr实现PostgreSQL主备高可用性

发表时间: 2021-11-17 19:01

前面已经在两台CentOS 7服务器上从源码编译安装了PostgreSQL，当前这两台服务器的PostgreSQL数据库是相互独立，它们之间没有任何关系。本文将介绍基于repmgr的postgresql主备高可用方案，使用repmgr将这两个PostgreSQL设置为一个基本的复制集群，其占用一个作为主服务(primary)，另一个作为备用服务(standby)。

repmgr是一个用于管理PostgreSQL集群的复制和故障转移的开源工具套件，它增强了PostgreSQL内建的热备功能，可以使用命令工具设置备用服务器、监控复制和执行任务管理，进行故障转移或者手动切换。

1.安装环境和准备工作

两台CentOS 7服务器如下:

192.168.100.151    node1192.168.100.152    node2

前面已经在这两台服务器上编译安装了PostgreSQL 13.5。要使用repmgr进行主备切换(switchover)，需要用到node1和node2之间postgres用户的ssh无密码使用key直接登录，这个需要提前配置好。此外，postgres用户还需要无密执行systemd启动和停止数据服务。

echo "postgres ALL = (root) NOPASSWD:/usr/bin/systemctl start pgserver,/usr/bin/systemctl restart pgserver,/usr/bin/systemctl stop pgserver,/usr/bin/systemctl reload pgserver,/usr/bin/systemctl status pgserver" | sudo tee /etc/sudoers.d/postgres

下载并解压缩repmgr的源码:

tar -zxvf repmgr-5.3.0.tar.gz

安装相关工具及相关依赖:

yum install -y flex libselinux-devel libxml2-devel libxslt-devel openssl-devel pam-devel readline-devel

2.编译安装repmgr

接下来在node1和node2这两台服务器上从源码编译安装repmgr。进入到源码解压缩路径完成编译安装:

cd repmgr-5.3.0export PG_CONFIG=/usr/local/pgsql/bin/pg_config./configuremakemake install

编译安装后，在PostgreSQL安装目录的bin目录中会多出repmgr和repmgrd两个二进制文件:

ls /usr/local/pgsql/bin | grep repmgrrepmgrrepmgrd

3.repmgr配置

3.1 PostgreSQL的配置

这里选择node1作为主库，主库PostgreSQL的配置文件
/home/postgres/data/postgresql.conf中关于复制的配置需要做一下调整:

max_wal_senders = 10max_replication_slots = 10wal_level = replicahot_standby = onarchive_mode = onarchive_command = '/bin/true'wal_log_hints=onshared_preload_libraries = 'repmgr'

修改完成后重启数据库服务systemctl restart pgserver，确保没有错误发生。

下面是一个可供参考的完整的postgresql.conf配置示例文件:

data_directory = '/home/postgres/data'hba_file = '/home/postgres/data/pg_hba.conf'ident_file = '/home/postgres/data/pg_ident.conf'external_pid_file = '/home/postgres/data/postmaster.pid'listen_addresses = '*'port = 5432max_connections = 150superuser_reserved_connections = 10authentication_timeout = 30spassword_encryption = md5shared_buffers = 512MBdynamic_shared_memory_type = posixwal_level = replicafsync = onwal_log_hints = onmax_wal_size = 1GBmin_wal_size = 80MBwal_keep_size = 1GBarchive_mode = onarchive_command = '/bin/true'max_wal_senders = 10max_replication_slots = 10hot_standby = onlog_destination = 'stderr'logging_collector = onlog_directory = 'log'log_filename = 'postgresql-%a.log'log_truncate_on_rotation = onlog_rotation_age = 1dlog_rotation_size = 0log_line_prefix = '%m [%p] 'log_timezone = 'PRC'row_security = ondatestyle = 'iso, mdy'timezone = 'PRC'lc_messages = 'en_US.UTF-8'lc_monetary = 'zh_CN.utf8'lc_numeric = 'zh_CN.UTF-8'lc_time = 'zh_CN.UTF-8'default_text_search_config = 'pg_catalog.english'shared_preload_libraries = 'repmgr'

3.2 创建repmgr用户和数据库

接下来在主服务器node1的PostgreSQL上创建一个专用超级用户repmgr和一个专用的数据库repmgr。

/usr/local/pgsql/bin/createuser -s repmgr/usr/local/pgsql/bin/createdb repmgr -O repmgr/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Upostgres -c 'ALTER USER repmgr SET search_path TO repmgr, "$user", public;'

repmgr将安装repmgr扩展，它将创建一个包含repmgr元数据表以及其他函数和视图的repmgr schema。

3.3 在pg_hba.conf中配置认证

在主服务器node1的
/home/postgres/data/pg_hba.conf中配置repmgr用户的认证权限，确保其有适当的权限，并且能够以复制模式连接:

local   replication   repmgr                              trusthost    replication   repmgr      127.0.0.1/32            trusthost    replication   repmgr      192.168.100.151/32       trusthost    replication   repmgr      192.168.100.152/32       trustlocal   repmgr        repmgr                              trusthost    repmgr        repmgr      127.0.0.1/32            trusthost    repmgr        repmgr      192.168.100.151/32       trusthost    repmgr        repmgr      192.168.100.152/32       trust

修改完配置需要重启服务sudo systemctl restart pgserver。

3.4 备用节点node2上的准备工作

在备库node2上停止PostgreSQL服务，并删除其数据目录:

systemctl stop pgserverrm -rf /home/postgres/data/*

从测试一下从备节点node2到主节点node1的连接，确保可以连接上:

/usr/local/pgsql/bin/psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'

因为这里是第一次初始主备PostgreSQL服务，我们将node1服务器作为主库，将node2服务器作为从库。

3.5 创建repmgr.conf配置文件

在node1和node2上创建
/usr/local/pgsql/repmgr.conf配置文件

node1的repmgr.conf:

node_id=1node_name=node1conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'data_directory='/home/postgres/data'pg_bindir='/usr/local/pgsql/bin/'ssh_options='-q -o ConnectTimeout=10'failover='automatic'promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'service_start_command='sudo systemctl start pgserver'service_stop_command='sudo systemctl stop pgserver'service_restart_command='sudo systemctl restart pgserver'service_reload_command='sudo systemctl reload pgserver'

node2的repmgr.conf:

node_id=2node_name=node2conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'data_directory='/home/postgres/data'pg_bindir='/usr/local/pgsql/bin/'ssh_options='-q -o ConnectTimeout=10'failover='automatic'promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'service_start_command='sudo systemctl start pgserver'service_stop_command='sudo systemctl stop pgserver'service_restart_command='sudo systemctl restart pgserver'service_reload_command='sudo systemctl reload pgserver'

3.6 注册主节点

为了使repmgr支持复制集群，主节点必须注册到repmgr。这将安装repmgr扩展和元数据对象，并为主服务器添加一个元数据记录。在node1上使用repmgr命令将node1注册为主库:

su postgres/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf primary register --forceINFO: connecting to primary database...NOTICE: attempting to install extension "repmgr"NOTICE: "repmgr" extension successfully installedNOTICE: primary node record (ID: 1) registered

查看一下此时集群状态，只有node1这一个主节点:

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------- 1  | node1 | primary | * running |          | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2

repmgr元数据表中的数据如下:

/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgr\xExpanded display is on.repmgr=# SELECT * FROM repmgr.nodes;-[ RECORD 1 ]----+-------------------------------------------------------node_id          | 1upstream_node_id |active           | tnode_name        | node1type             | primarylocation         | defaultpriority         | 100conninfo         | host=node1 user=repmgr dbname=repmgr connect_timeout=2repluser         | repmgrslot_name        |config_file      | /usr/local/pgsql/repmgr.conf

一个PostgreSQL复制集群中的每个服务器都将拥有自己的记录。在使用repmgrd时，当节点状态或角色发生变化时，upstream_node_id、active和type字段会更新。

3.7 克隆生成备节点数据

在备节点node2上，测试(dry run)一下能否clone主库的数据:

su postgres/usr/local/pgsql/bin/repmgr -h node1 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone --dry-runNOTICE: destination directory "/home/postgres/data" providedINFO: connecting to source nodeDETAIL: connection string is: host=node1 user=repmgr dbname=repmgrDETAIL: current installation size is 31 MBINFO: "repmgr" extension is installed in database "repmgr"INFO: replication slot usage not requested;  no replication slot will be set up for this standbyINFO: parameter "max_wal_senders" set to 10NOTICE: checking for available walsenders on the source node (2 required)INFO: sufficient walsenders available on the source nodeDETAIL: 2 required, 10 availableNOTICE: checking replication connections can be made to the source server (2 required)INFO: required number of replication connections could be made to the source serverDETAIL: 2 replication connections requiredNOTICE: standby will attach to upstream node 1HINT: consider using the -c/--fast-checkpoint optionINFO: would execute:  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h node1 -p 5432 -U repmgr -X streamINFO: all prerequisites for "standby clone" are met

以上测试没有问题的话，在从库node2上运行下面的命令clone主库node1上的数据:

/usr/local/pgsql/bin/repmgr -h node1 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby cloneNOTICE: destination directory "/home/postgres/data" providedINFO: connecting to source nodeDETAIL: connection string is: host=node1 user=repmgr dbname=repmgrDETAIL: current installation size is 31 MBINFO: replication slot usage not requested;  no replication slot will be set up for this standbyNOTICE: checking for available walsenders on the source node (2 required)NOTICE: checking replication connections can be made to the source server (2 required)INFO: checking and correcting permissions on existing directory "/home/postgres/data"NOTICE: starting backup (using pg_basebackup)...HINT: this may take some time; consider using the -c/--fast-checkpoint optionINFO: executing:  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h node1 -p 5432 -U repmgr -X streamNOTICE: standby clone (using pg_basebackup) completeNOTICE: you can now start your PostgreSQL serverHINT: for example: sudo systemctl start pgserverHINT: after starting the server, you need to register this standby with "repmgr standby register"

实际上使用了pg_basebackup命令clone了主节点的数据目录文件。主节点数据目录中的配置文件也都被复制到了备节点的数据目录中，包括postgresql.conf, postgresql.auto.conf, pg_hba.conf和pg_ident.conf。如果不需要针对备节点定制修改这些配置的话，就可以启动备节点的数据库服务了:

systemctl start pgserver

3.9 验证复制是否正常工作

连接到主节点数据库执行下面的查询:

/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgrrepmgr=# \xExpanded display is on.repmgr=# SELECT * FROM pg_stat_replication;-[ RECORD 1 ]----+------------------------------pid              | 3636usesysid         | 16384usename          | repmgrapplication_name | node2client_addr      | 192.168.100.152client_hostname  |client_port      | 44160backend_start    | 2021-11-14 14:48:17.856758+08backend_xmin     |state            | streamingsent_lsn         | 0/40002D0write_lsn        | 0/40002D0flush_lsn        | 0/40002D0replay_lsn       | 0/40002D0write_lag        |flush_lag        |replay_lag       |sync_priority    | 0sync_state       | asyncreply_time       | 2021-11-14 14:50:48.201894+08

这表明之前克隆的备用节点(在application_name字段中显示的node2)已经从IP地址192.168.100.152连接到主节点。

连接到备库执行下面的查询:

/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -UrepmgrsELECT * FROM pg_stat_wal_receiver;-[ RECORD 1 ]---------+----------------------------------pid                   | 2927status                | streamingreceive_start_lsn     | 0/5000000receive_start_tli     | 1written_lsn           | 0/50000A0flushed_lsn           | 0/50000A0received_tli          | 1last_msg_send_time    | 2021-11-14 14:55:55.416634+08last_msg_receipt_time | 2021-11-14 14:55:55.415109+08latest_end_lsn        | 0/50000A0latest_end_time       | 2021-11-17 14:55:55.408763+08slot_name             |sender_host           | node1sender_port           | 5432

3.9 注册备节点

在node2上使用repmgr命令将node2注册为备节点:

su postgres/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby register --forceINFO: connecting to local node "node2" (ID: 2)INFO: connecting to primary databaseWARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)INFO: standby registration completeNOTICE: standby node "node2" (ID: 2) successfully registered

查看一下此时集群状态，node2作为备节点被加入到了集群中:

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------- 1  | node1 | primary | * running |          | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

3.10 主备手动切换测试

现在主节点node1和备节点node2都正常运行，在备节点node1上执行切换操作，手动将node1切换为主节点:

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby switchoverNOTICE: executing switchover on node "node2" (ID: 2)NOTICE: attempting to pause repmgrd on 2 nodesNOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standbyNOTICE: stopping current primary node "node1" (ID: 1)NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)DETAIL: executing server command "sudo systemctl stop pgserver"INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")NOTICE: current primary has been cleanly shut down at location 0/6000028NOTICE: promoting standby to primaryDETAIL: promoting server "node2" (ID: 2) using pg_promote()NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to completeNOTICE: STANDBY PROMOTE successfulDETAIL: server "node2" (ID: 2) was successfully promoted to primaryWARNING: node "node1" attached in state "startup"INFO: waiting for node "node1" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")DETAIL: node "node2" (ID: 1) is currently attached to its upstream node in state "startup"NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standbyNOTICE: switchover was successfulDETAIL: node "node2" is now primary and node "node1" is attached as standbyNOTICE: STANDBY SWITCHOVER has completed successfully

上面的命令成功的将备节点node2切换为主节点，主节点node1重启后作为新的备节点加入到集群:

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster showID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------- 1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | primary | * running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

4.使用repmgrd实现自动故障转移

repmgrd是一个管理和监控守护进程，运行在PostgreSQL复制集群的每个节点上,可以自动执行故障转移和更新备用节点等操作以跟上新的主节点，并提供关于每个备用服务器状态的监视信息。

4.1 启动repmgrd

在node1和node2上创建repmgrd的systemd配置文件
/etc/systemd/system/repmgrd.service：

[Unit]Description=A replication manager, and failover management tool for PostgreSQLAfter=syslog.targetAfter=network.targetAfter=pgserver.service[Service]Type=forkingUser=postgresGroup=postgres# PID filePIDFile=/home/postgres/data/repmgrd.pid# Location of repmgr conf file:Environment=REPMGRDCONF=/usr/local/pgsql/repmgr.confEnvironment=PIDFILE=/home/postgres/data/repmgrd.pid# Where to send early-startup messages from the server# This is normally controlled by the global default set by systemd# StandardOutput=syslogExecStart=/usr/local/pgsql/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verboseExecStop=/usr/bin/kill -TERM $MAINPIDExecReload=/usr/bin/kill -HUP $MAINPID# Give a reasonable amount of time for the server to start up/shut downTimeoutSec=300[Install]WantedBy=multi-user.target

在node1和node2上启动repmgrd服务:

systemctl enable repmgrd --now

4.2 自动故障转移测试

当前集群的状态如下:

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------- 1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=node1 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | primary | * running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

当前node2为主节点，现在模拟故障，将node2上数据库服务停掉:

systemctl stop pgserver

等一小会儿，node1会成为新的主节点，node2会被标记为failed状态:

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------- 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | primary | - failed  | ?        | default  | 100      |          | host=node2 user=repmgr dbname=repmgr connect_timeout=2

此时如果将node2的数据库服务重新起来，就会出现下面的状态，这个也是这种主备高可用方案的缺点，出现了两个主节点，相当于发生了脑裂。

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------- 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | primary | ! running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

node2的状态是不对的，将其服务再次停止，并重新以备节点的角色加入到集群:

systemctl stop pgserver/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf node rejoin  -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------- 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | standby |   running | node1    | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

5.总结

本文中使用repmgr实现PostgreSQL主备高可用的方案的缺点就是，如果集群的网络出现脑裂(网络分区)，就会出现两个节点都认为自己是主节点的情况，如果两个节点都同时承载了业务读写，就会出现数据不一致的情况。一般造成脑裂的主要原因是主备之间网络问题导致备库认为主库故障自动切换为主库，这就造成出现了双主的情况，另一个原因是人工进行主备切换后，由将原来的主节点启动(人工启动或服务器宕机后开机启动启动)。发生脑裂后通常需要人工介入，并且介入越早越好，如果可以忽略脑裂期间分叉造成的部分数据丢失，可以通过pg_rewind将其中一个节点重新作为备节点加入集群，恢复单主的集群状态。

关于脑裂的问题，repmgr还提供了witness节点的方案，witness节点作为一个普通的PostgreSQL实例，它不是流复制集群的一部分，其目的是，如果发生故障转移时，提供证据证明是主服务器本身故障，还是网络原因造成的脑裂。关于repmgr中witness的部分，本文没有涉及。

参考

https://www.postgresql.org/docs/13/admin.html
https://repmgr.org/docs/5.3/index.html
https://repmgr.org/docs/5.3/quickstart.html