PostgreSQL Streaming Replication: Five Steps to Configure HA on CentOS, RHEL, Debian & Ubuntu

MSquare Sys

8 Nov 2022 14:04

Summary

TL;DR: This video tutorial walks through setting up PostgreSQL streaming replication for high availability. It covers the key steps: configuring a master and a standby node, creating a replication user, modifying the PostgreSQL configuration files, and verifying replication status. It also explains how to take a base backup with `pg_basebackup`, restart the PostgreSQL services, and monitor replication synchronization through the `pg_stat_replication` and `pg_stat_wal_receiver` views. The setup is demonstrated across several Linux distributions and PostgreSQL versions, giving an overview of how to provide data redundancy and failover in a PostgreSQL environment.

Takeaways

😀 PostgreSQL streaming replication allows real-time data replication from a primary (master) server to a secondary (standby) server for high availability.
😀 PostgreSQL versions 12, 13, and 14+ are supported for this replication setup.
😀 Linux-based systems, including RHEL/CentOS 7+, Debian 9+, and Ubuntu 18+, are suitable environments for PostgreSQL streaming replication.
😀 The master server should have `listen_addresses = '*'` set in `postgresql.conf` to allow connections from the standby server.
😀 A dedicated replication user (`replicator`) needs to be created on the master server to handle replication tasks.
😀 The `pg_hba.conf` file on the master server should be updated to allow the standby server's IP address to connect using the `replicator` user.
😀 Use the `pg_basebackup` tool to create a base backup of the primary database and copy it to the standby server's data directory.
😀 The standby server is marked with a `standby.signal` file and configured with the `primary_conninfo` parameter pointing to the master server.
😀 Once configured, start the PostgreSQL service on the standby server to begin streaming replication.
😀 Replication status can be monitored using the `pg_stat_replication` and `pg_stat_wal_receiver` views on the master and standby servers, respectively.
😀 Troubleshooting replication issues involves checking PostgreSQL configuration files, network connectivity, and reviewing PostgreSQL logs for errors.

Q & A

What is the purpose of PostgreSQL streaming replication?
-PostgreSQL streaming replication is used to replicate data in real-time from a master node to a standby (slave) node. This setup ensures high availability and data redundancy, allowing the standby node to take over if the master node fails.
What are the system requirements for setting up PostgreSQL streaming replication?
-PostgreSQL 12, 13, or 14+ is required. The setup should be done on Linux servers, such as RHEL/CentOS 7+, Debian 9+, or Ubuntu 18+. You will need two nodes: one master and one standby.
What is the role of the 'replicator' user in the replication setup?
-The 'replicator' user is a dedicated role created on the master node. It carries the REPLICATION attribute, which permits the standby node to connect to the master and receive data changes.
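As a rough sketch of this step, the helper below only builds the SQL statement for the replication user; it does not run it. The function name `replication_role_sql` and the example password are illustrative assumptions, not taken from the video.

```shell
# Hypothetical helper: print the SQL that creates a login role with the
# REPLICATION attribute. Single quotes in the password are doubled so the
# generated string literal stays valid.
replication_role_sql() (
  role="$1"
  password="$2"
  escaped=$(printf '%s' "$password" | sed "s/'/''/g")
  printf "CREATE ROLE %s WITH REPLICATION LOGIN PASSWORD '%s';\n" "$role" "$escaped"
)

# Usage on the master (needs a running server, so it is left commented out):
# replication_role_sql replicator 'change-me' | sudo -u postgres psql
```

Printing the statement first makes it easy to review before piping it into `psql`.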
How do you configure the master node for replication?
-On the master node, you need to modify the 'postgresql.conf' file to allow connections, create the 'replicator' user, and configure the 'pg_hba.conf' file to permit the standby node's IP address. After these changes, restart PostgreSQL.
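A minimal sketch of the 'postgresql.conf' change, assuming the setting is appended to the file rather than edited in place. The function name `allow_remote_connections`, the file path, and the service name are placeholders; the real path and service name depend on the distribution and PostgreSQL version.

```shell
# Hypothetical helper: append the setting to a postgresql.conf file.
# When a parameter appears more than once in the file, PostgreSQL uses the
# last occurrence, so appending overrides an earlier value in the same file.
allow_remote_connections() (
  conf="$1"
  printf "listen_addresses = '*'\n" >> "$conf"
)

# Usage on the master (left commented out; both values are placeholders):
# allow_remote_connections /path/to/postgresql.conf
# sudo systemctl restart postgresql   # listen_addresses needs a restart
```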
Why is it important to configure the 'pg_hba.conf' file on the master node?
-The 'pg_hba.conf' file controls client authentication for PostgreSQL. For replication to work, the master node needs to allow connections from the standby node. The file must be updated to grant the replicator user access from the standby node's IP address.
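The rule itself could look like the output of the sketch below. The function name `hba_replication_rule`, the example IP address, and the authentication method are assumptions; the source does not say which method the video uses, so it is passed in explicitly.

```shell
# Hypothetical helper: print one pg_hba.conf line that lets a single standby
# host open replication connections as the given user. The special database
# name "replication" matches physical replication connections.
hba_replication_rule() (
  repl_user="$1"
  standby_ip="$2"
  auth_method="$3"
  printf 'host    replication    %s    %s/32    %s\n' "$repl_user" "$standby_ip" "$auth_method"
)

# Usage on the master (left commented out; the path is a placeholder):
# hba_replication_rule replicator 192.0.2.11 scram-sha-256 >> /path/to/pg_hba.conf
# sudo -u postgres psql -c "SELECT pg_reload_conf();"
```

The `/32` mask restricts the rule to exactly one address. A configuration reload is enough for a change to 'pg_hba.conf' on its own.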
What is the 'pg_basebackup' command used for in the replication process?
-The 'pg_basebackup' command is used to take a base backup of the master node's data and copy it to the standby node. This is the initial step in setting up replication, ensuring the standby node has an exact copy of the master's data.
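One way to sketch the base backup is shown below. The `run` wrapper, the `DRY_RUN` variable, and the function name `take_base_backup` are illustrative assumptions; by default the sketch only prints the command so it can be reviewed first. The flags are standard `pg_basebackup` options: `-X stream` streams WAL during the backup, `-P` shows progress, and `-R` writes the standby settings into the target directory (on PostgreSQL 12+ it creates 'standby.signal' and appends the connection settings to 'postgresql.auto.conf').

```shell
# Dry-run by default: print the command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: arguments are the master host, the replication user,
# and the standby data directory (which must be empty or not yet exist).
take_base_backup() (
  run pg_basebackup -h "$1" -U "$2" -D "$3" -X stream -P -R
)

# Usage on the standby, as the postgres OS user (left commented out):
# DRY_RUN=0 take_base_backup 192.0.2.10 replicator /path/to/standby/data
```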
What files need to be configured on the standby node to enable replication?
-On the standby node, the 'postgresql.auto.conf' file (or 'recovery.conf' in versions before PostgreSQL 12) must include the primary connection information in 'primary_conninfo': the master's IP address and port, plus the replication user credentials. A 'standby.signal' file in the data directory marks the node as a standby.
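If the standby settings are written by hand rather than generated, the result might resemble the sketch below for PostgreSQL 12 or later. The function name `configure_standby`, port 5432, and the example values are assumptions, and the sketch does not quote passwords that contain spaces or quotes.

```shell
# Hypothetical helper: append primary_conninfo to postgresql.auto.conf and
# create an empty standby.signal in the standby's data directory.
# Run it while the standby server is stopped.
configure_standby() (
  datadir="$1"
  primary_host="$2"
  repl_user="$3"
  repl_password="$4"
  printf "primary_conninfo = 'host=%s port=5432 user=%s password=%s'\n" \
    "$primary_host" "$repl_user" "$repl_password" >> "$datadir/postgresql.auto.conf"
  : > "$datadir/standby.signal"
)

# Usage on the standby (left commented out; the path is a placeholder):
# configure_standby /path/to/standby/data 192.0.2.10 replicator 'change-me'
```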
How can you verify that replication is working correctly between the master and standby nodes?
-You can verify replication by checking the 'pg_stat_replication' view on the master node and the 'pg_stat_wal_receiver' view on the standby node. The Log Sequence Number (LSN) positions reported on the two sides should match, or differ only briefly while the standby catches up, indicating that data is being replicated in real time.
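To make the LSN comparison concrete, the sketch below converts two LSN strings to byte positions and subtracts them. The function names are illustrative assumptions; PostgreSQL also provides the built-in `pg_wal_lsn_diff()` function for the same calculation inside SQL.

```shell
# Convert an LSN such as 0/3000148 to a byte position: the part before the
# slash is the high 32 bits, the part after it the low 32 bits (both hex).
lsn_to_bytes() (
  hi=${1%%/*}
  lo=${1##*/}
  echo $(( (0x$hi << 32) + 0x$lo ))
)

# Hypothetical helper: bytes by which the first LSN is ahead of the second.
lsn_diff_bytes() (
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
)

# Usage (left commented out; each query runs on a different node):
# master_lsn=$(psql -Atc "SELECT pg_current_wal_lsn();")        # on the master
# standby_lsn=$(psql -Atc "SELECT pg_last_wal_receive_lsn();")  # on the standby
# lsn_diff_bytes "$master_lsn" "$standby_lsn"
```

A result of 0 means the standby has received everything the master has written; a small positive number under write load is expected.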
What command is used to check the replication status on the master node?
-To check the replication status on the master node, you can run the SQL query 'SELECT * FROM pg_stat_replication;'. This will show details about the replication connection, including the current Log Sequence Number (LSN).
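For scripted checks, the query output can be reduced to a yes/no answer. The sketch below assumes unaligned, tuples-only `psql` output (`-At`), where fields are separated by `|`; the function name `standby_is_streaming` and the example address are hypothetical.

```shell
# Hypothetical helper: read "client_addr|state" rows on stdin and succeed
# only if the given standby address appears with state "streaming".
standby_is_streaming() {
  awk -F'|' -v ip="$1" '$1 == ip && $2 == "streaming" { found = 1 } END { exit !found }'
}

# Usage on the master (left commented out; needs a running server):
# psql -Atc "SELECT client_addr, state FROM pg_stat_replication;" \
#   | standby_is_streaming 192.0.2.11 && echo "standby is streaming"
```

An empty result from 'pg_stat_replication' means no standby is connected, in which case the check fails.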
How do you start the PostgreSQL service on the standby node after configuring it for replication?
-After configuring the standby node, use the command 'pg_ctl start -D /path/to/standby/data' to start PostgreSQL and begin replication. Ensure that the 'postgresql.auto.conf' file is properly configured with the master's connection information.