"If no one has the leader key it runs health checks and takes over as leader."
I'm no expert at all on this stuff, but I do smell either a race condition (if another node comes alive and checks who owns the leader key in etcd before the first node takes over as leader) or a longer-than-needed window without a leader (where the new node knows it wants to become the leader but is still running health checks).
The code relies on etcd to prevent the race condition: by using `prevExist=false` when acquiring the leader key, the write fails if another node has already won the race.
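A minimal sketch of why that works, using an in-memory stand-in for etcd's compare-and-swap (the key path and node names here are invented; the real code issues the equivalent `prevExist=false` write over etcd's HTTP API):

```python
import threading

class FakeEtcd:
    """In-memory stand-in for etcd's prevExist=false semantics."""
    def __init__(self):
        self._lock = threading.Lock()
        self._keys = {}

    def set_if_absent(self, key, value):
        # Mirrors a PUT with prevExist=false: succeeds only if the key
        # does not already exist, so exactly one racing node can win.
        with self._lock:
            if key in self._keys:
                return False
            self._keys[key] = value
            return True

etcd = FakeEtcd()
# Two nodes race to become leader; exactly one write succeeds.
print(etcd.set_if_absent("/service/pg/leader", "node-1"))  # True
print(etcd.set_if_absent("/service/pg/leader", "node-2"))  # False
```

However many nodes race, only the first write lands; the losers see a failure and stay followers, so there is no window in which two nodes both believe they are leader.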
It simply relies on the voting feature of etcd (Raft). Locking with etcd is really simple to use, and etcd is really, really stable.
However, it would be easier to install etcd on every Postgres node and write a small Go library that sets the Postgres master to the etcd leader (etcd elects a leader of its own), with systemd keeping the overall system healthy. (That's what we at envisia do.) Each node repeatedly checks whether it is the etcd leader, and if so writes its own URL to an etcd key. Overall we use 3 Postgres machines; one could fail and we would still have a voting quorum. That's just for a single master where we don't need to read from the slaves, but it's easily extendable.
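A hedged sketch of one iteration of that loop (the original poster describes a Go library; this Python version uses a plain dict as a stand-in for etcd, an invented key path, and a stubbed `is_leader` callable in place of querying etcd's leader status):

```python
def advertise_if_leader(store, node_url, is_leader, key="/service/pg/master"):
    """If this machine is the etcd leader, publish its URL under `key`.

    `store` is any dict-like key/value store standing in for etcd;
    `is_leader` is a callable standing in for asking etcd who leads.
    Returns True if this node published itself as master.
    """
    if is_leader():
        store[key] = node_url
        return True
    return False

# One iteration of the loop each node would run repeatedly:
store = {}
advertise_if_leader(store, "postgres://10.0.0.5:5432", is_leader=lambda: True)
print(store)  # {'/service/pg/master': 'postgres://10.0.0.5:5432'}
```

Clients then read the master key from etcd to find the current primary; followers (where `is_leader` returns False) leave the key alone.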
The problem with etcd members on every Postgres node is that etcd clusters expect a fixed set of members. etcd doesn't function well in an environment where you tear down and build up new nodes. Most of our Postgres service runs on AWS, so we must expect that any single node may vanish and our system must replace that node. We tried running etcd alongside Postgres in an early prototype, but ran into etcd cluster stability issues when destroying and recreating nodes. Thus, we opt for a standalone etcd cluster distinct from the Postgres cluster.
You can set up a local etcd proxy to mitigate this. You'd run the proxy listening on localhost and connect it to the stable etcd cluster elsewhere.
The proxy can be pointed at the cluster manually or discover it via DNS SRV records. Autoscale the Postgres machines as much as you want after that, while leaving etcd on stable machines.
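As a rough example, with etcd v2 a local proxy on each Postgres node can be started along these lines (`example.com` is a placeholder domain; `--discovery-srv` tells the proxy to find the stable cluster via SRV records):

```shell
# Run a local etcd proxy on the Postgres node (etcd v2 flags).
# Clients on this box talk to localhost:2379; the proxy forwards
# requests to the stable cluster discovered under example.com.
etcd --proxy on \
     --listen-client-urls http://127.0.0.1:2379 \
     --discovery-srv example.com
```

The Postgres node then treats `http://127.0.0.1:2379` as its etcd endpoint and never needs to know the cluster's actual membership.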
That's basically what we're trying to do in the future, but it's really hard if you want a 5-node etcd cluster running at all times. You would need to detect that an etcd member died and then either promote a proxy to a full etcd member or spin up a new machine (the latter is only possible in clouds or virtual environments).
You can do that trivially with Mesos by having it always ensure 5 instances are running. Bonus points: it runs identically on bare metal and across clouds, which means less vendor lock-in for you.