"If no one has the leader key it runs health checks and takes over as leader."
I'm no expert at all on this stuff, but I do smell either a race condition (if another node comes alive and checks who owns the leader key in etcd before the first node takes over as leader) or a longer-than-needed window without a leader (where the new node knows it wants to become the leader but is still running health checks).
The code relies on etcd to prevent the race condition: by using `prevExist=false` when acquiring the leader key, the write fails if another node has already won the race.
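A minimal sketch of why that works, using an in-memory stand-in for etcd's compare-and-swap (the key path and node names here are invented; the real code issues the equivalent `prevExist=false` write over etcd's HTTP API):

```python
import threading

class FakeEtcd:
    """In-memory stand-in for etcd's prevExist=false semantics."""
    def __init__(self):
        self._lock = threading.Lock()
        self._keys = {}

    def set_if_absent(self, key, value):
        # Mirrors a PUT with prevExist=false: succeeds only if the key
        # does not already exist, so exactly one racing node can win.
        with self._lock:
            if key in self._keys:
                return False
            self._keys[key] = value
            return True

etcd = FakeEtcd()
# Two nodes race to become leader; exactly one write succeeds.
print(etcd.set_if_absent("/service/pg/leader", "node-1"))  # True
print(etcd.set_if_absent("/service/pg/leader", "node-2"))  # False
```

However many nodes race, only the first write lands; the losers see a failure and stay followers, so there is no window in which two nodes both believe they are leader.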
It simply relies on the voting feature of etcd (Raft). Locking with etcd is really simple to use, and etcd is really, really stable.
However, it would be easier to install etcd on every Postgres node and write a small Go library that sets the Postgres master to the etcd leader (etcd elects a leader of its own), with systemd keeping the overall system healthy. (That's what we at envisia do.) Each node repeatedly checks whether it is the etcd leader, and if so writes its own URL to an etcd key. Overall we use 3 Postgres machines; one could fail and we would still have a voting quorum. That's just for a single master where we don't need to read from the slaves, but it's easily extendable.
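A hedged sketch of one iteration of that loop (the original poster describes a Go library; this Python version uses a plain dict as a stand-in for etcd, an invented key path, and a stubbed `is_leader` callable in place of querying etcd's leader status):

```python
def advertise_if_leader(store, node_url, is_leader, key="/service/pg/master"):
    """If this machine is the etcd leader, publish its URL under `key`.

    `store` is any dict-like key/value store standing in for etcd;
    `is_leader` is a callable standing in for asking etcd who leads.
    Returns True if this node published itself as master.
    """
    if is_leader():
        store[key] = node_url
        return True
    return False

# One iteration of the loop each node would run repeatedly:
store = {}
advertise_if_leader(store, "postgres://10.0.0.5:5432", is_leader=lambda: True)
print(store)  # {'/service/pg/master': 'postgres://10.0.0.5:5432'}
```

Clients then read the master key from etcd to find the current primary; followers (where `is_leader` returns False) leave the key alone.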
The problem with etcd members on every Postgres node is that etcd clusters expect a fixed set of members. etcd doesn't function well in an environment where you tear down and build up new nodes. Most of our Postgres service runs on AWS, so we must expect that any single node may vanish and our system must replace that node. We tried running etcd alongside Postgres in an early prototype, but ran into etcd cluster stability issues when destroying and recreating nodes. Thus, we opt for a standalone etcd cluster distinct from the Postgres cluster.
You can set up a local etcd proxy to mitigate this. You'd run the proxy listening on localhost and connect it to the stable etcd cluster elsewhere.
The proxy can be pointed at the cluster manually or discover it via DNS SRV records. Autoscale the Postgres machines as much as you want after that, while leaving etcd on stable machines.
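As a rough example, with etcd v2 a local proxy on each Postgres node can be started along these lines (`example.com` is a placeholder domain; `--discovery-srv` tells the proxy to find the stable cluster via SRV records):

```shell
# Run a local etcd proxy on the Postgres node (etcd v2 flags).
# Clients on this box talk to localhost:2379; the proxy forwards
# requests to the stable cluster discovered under example.com.
etcd --proxy on \
     --listen-client-urls http://127.0.0.1:2379 \
     --discovery-srv example.com
```

The Postgres node then treats `http://127.0.0.1:2379` as its etcd endpoint and never needs to know the cluster's actual membership.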
That's basically what we're trying to do in the future, but it's really hard if you want a 5-node etcd cluster running at all times. You would need to detect that an etcd member died and then either promote a proxy to a full etcd member or spin up a new machine (the latter is only possible in clouds or virtual environments).
You can do that trivially with Mesos by having it always ensure 5 instances are running. Bonus points: it runs identically on bare metal and across clouds, which means less vendor lock-in for you.