(Draft)[]
Overview of Datanode HA configuration[]
Because table rows can be distributed among multiple datanodes, they need to be configured with their backups. In this page, we use synchronous replication of PostgreSQL 9.1. Asynchronous replication has a chance to loose updates, which may lead to data inconsistency among the datanodes.
Now, we start wil the situation where datanodes are running on node01 and node02. We configure each slave on node02 and node01 respectively and then kill one of the datanode and failover to its slave.
Configuration Outline[]
We configure datanode slaves as follows:
Slave of the Datanode on node01 (datanode1)[]
- Slave runs on node02.
- Slave's working directory is /home/postgresxc/pgxc/datanode1_slave on node02.
- We need WAL archive transfered to node02's /home/postgresxc/pgxc/datanode1_arclog
- Slave will run in hot-standby mode.
- hot-standby will use the port 20010.
Slave of the Datanode on node02 (datanode2)[]
- Slave runs on node01.
- Slave's working directory is /home/postgresxc/pgxc/datanode2_slave on node01.
- We need WAL archive transfered to node01's /home/postgresxc/pgxc/datanode2_arclog
- Slave will run in hot-standby mode.
- hot-standby will use the port 20010.
Configure datanode1's master[]
Obtain base backup of the datanode1[]
Configure datanode1 master for hot standby[]
[main]$ ssh node01 [node01]$ cat >> /home/postgresxc/pgxc/datanode/postgresql.conf << EOF wal_level = hot_standby archive_mode = on archive_command = 'rsync %p node02:/home/postgresxc/pgxc/datanode1_arclog/%f' max_wal_senders = 10 EOF [node01]$
Note that max_wal_senders parameter value is just an example.
Configure pg_hba.conf of the master'[]
[main]$ ssh node01 [node01]$ cat >> /home/postgresxc/pgxc/datanode/pg_hba.conf << EOF host replication postgresxc 192.168.1.0/24 trust EOF [node01]$ exit [main]$
Please use approripate value for 192.168.1.1/24 to reflect your network configuration.
Then load this change.
[main]$ ssh node01 [node01]$ pg_ctl reload -D /home/postgresxc/pgxc/datanode [node01]$ exit [main]
Configure datanode1's slave[]
Obtain base backup of datanode1[]
[main]$ ssh node02 [node02]$ pg_basebackup -p 20006 -h node01 -D /home/postgresxc/pgxc/datanode1_slave [node02]$ exit [main]$
Run datanode1 standby server in asynchronous replication mode[]
Configure recovery.conf file for datanode1 standby[]
[main]$ ssh node02 [node02]$ cat >> /home/postgresxc/pgxc/datanode1_slave/recovery.conf << EOF standby_mode = on primary_conninfo = 'host = node01 port = 20006 user = postgresxc application_name = datanode1_slave' restore_command = 'cp /home/postgresxc/pgxc/datanode1_arclog/%f %p' archive_cleanup_command = 'pg_archivecleanup /home/postgresxc/pgxc/datanode1_arclog %r' EOF [node02]$ exit [main]$
Note that pg_archivecleanup is contrib module and you may have to build it separately.
Configure datanode1 standby[]
[main]$ ssh node02 [node02]$ cat >> /home/postgresxc/pgxc/datanode1_slave/postgresql.conf hot_standby = on port = 20010 EOF [node02]$ exit [main]$
Please note that /home/postgresxc/pgxc/datanode1_slave access privilege is 0700.
Start datanode1 standby[]
[main]$ ssh node02 [node02]$ pg_ctl start -Z datanode -D /home/postgresxc/pgxc/datanode1_slave [node02]$ exit [main]$
Change datanode1 slave to synchronous mode[]
[main]$ ssh node01 [node01]$ cat >> /home/postgresxc/pgxc/datanoded1/postgresql.conf << EOF synchronous_commit = on synchronous_standby_names = 'datanode1_slave' EOF [node01]$ pg_ctl reload -Z datanode -D /home/postgresxc/pgxc/datanode [node01]$ exit [main]$
How to stop datanode1[]
In synchronous replication mode, master will be blocked if the slave does not respond. So you should stop master first and then the slave.
[main]$ ssh node02 [node02]$ pg_ctl stop -Z datanode -D /home/postgresxc/pgxc/datanode1_slave [node02]$ exit [main]$ ssh node01 [node01]$ pg_ctl stop -Z datanode -D /home/postgresxc/pgxc/datanode [node02]$ exit [main]$
Or you can change replication to asynchronous mode and stop slave and master.
[main]$ ssh node01 [node01]$ cat >> /home/postgresxc/pgxc/datanode/postgresql.conf << EOF synchronous_standby_names = EOF [node01]$ pg_ctl reload -Z datanode -D /home/postgresxc/pgxc/datanode [node01]$ exit [main]$ ssh node02 [node02]$ pg_ctl stop -Z datanode -D /home/postgresxc/pgxc/datanode1_slave [node02]$ exit [main]$ ssh node01 [node01]$ pg_ctl stop -Z datanode -D /home/postgresxc/pgxc/datanode [node01]$ exit [main]
You may want to monitor the slave. If it crashes, then you need to change the master's replication to asynchronous mode to continue Postgres-XC operation.
Configure datanode2's slave[]
This is opposite to datanode1 slave configuration. This will be left to your exercise.
When Datanode1 Master Fails[]
When datanode1 master fails, you can promote datanode1 slave to the new master as follows:
[main]$ ssh node02 [node02]$ pg_ctl promote -Z datanode -D /home/postgresxc/pgxc/datanode1_slave [node02]$ exit [main]$
Please note that coord1 and coord2 don't know datanode1 now failed over. You should notify coord1 and coord2 as follows:
[main]$ psql -p 20004 -h node01 # ALTER NODE datanode1 WITH (host = 'node02', port = 20010); # select pgxc_pool_reload(); # \q [main]$ psql -p 20004 -h node02 # ALTER NODE datanode1 WITH (host = 'node02', port = 20010); # select pgxc_pool_reload(); # \q [main]$