Pg_paxos: High Availability Data Replication for PostgreSQL

theptip · on Nov 27, 2015

I'm always skeptical when I see a new Paxos implementation surface without rigorous consistency test results published front and center.

See Google's "Paxos Made Live" paper [1], which opens with the wonderful understatement "Despite the existing literature in the field, building such a database proved to be non-trivial...", and finishes with "Despite the large body of literature in the field, algorithms dating back more then 15 years, and experience of our team (one of us has designed a similar system before and the others have built other types of complex systems in the past), it was significantly harder to build this system then originally anticipated..."

IOW if Google writes a 15-page paper about a problem being hard, you are going to need to provide a large body of evidence that your implementation is correct before I'll go near it.

[1]: http://www.cs.cmu.edu/~15-440/READINGS/paxos-made-live.pdf

rdtsc · on Nov 27, 2015

Definitely should at least unleash Aphyr's Knossos on it and see what happens.

But on the Google thing: it is a good point, but that was also 10 years ago. Sometimes things get easier in technology in that time frame.

takeda · on Nov 27, 2015

I'm wondering if they have taken a look at BookKeeper[1] for this? It's basically a system on top of ZK to handle a replication log in a decentralized way.

[1] https://bookkeeper.apache.org/index.html

Edit: It would be nice to tell me why my post has been downvoted. BookKeeper is not a competing solution, but something that was written for that purpose and they could utilize it.

assface · on Nov 27, 2015

Does anybody actually use any of CitusData's stuff in production? I've heard that their free extensions are mostly junk and then when they don't work they try to get you to buy their proprietary components.

ozgune · on Dec 1, 2015

(Ozgun from Citus Data)

Both our open source extensions and the (currently) proprietary CitusDB product are heavily used in production. For example, https://blog.cloudflare.com/scaling-out-postgresql-for-cloud...

We're working on improving all our products, and we welcome your feedback.

Could you elaborate more on what you'd like to see? Please also feel free to use GitHub issues on our open source projects or hit me directly at ozgun at citusdata.com.

pella · on Nov 28, 2015

just a note: the next version (CitusDB 5.0) will be Open Source: https://twitter.com/frsyuki/status/667031289506062336

video: "Keynote by Umur Cubukcu of Citus Data at PGConf Silicon Valley 2015" : https://youtu.be/_nun2S6EdWo?t=513

---

CitusDB 5.0 Pre-Beta Signup

"CitusDB 5.0 transforms PostgreSQL into a horizontally scalable, distributed database. It enables you to perform both real-time inserts/updates and multi-core, parallel analytics, in a way that is transparent to the application. It will:

- Allow you to deploy CitusDB as a standard PostgreSQL extension

- Be compatible with the latest version of PostgreSQL, v9.5

- Have an open source community edition"

http://info.citusdata.com/CitusDB-5-Pre-Beta.html

_ondq · on Nov 28, 2015

"Have an open source community edition" is not the same as "will be Open Source".

rpedela · on Nov 27, 2015

Are there plans to support DDL in 9.5 with the new event trigger functionality? [1]

1. http://www.postgresql.org/docs/devel/static/functions-event-...

Palomides · on Nov 27, 2015

takeda · on Nov 27, 2015

The META.json says that it is LGPL 3.0; perhaps the lack of a LICENSE file was an oversight?

mslot · on Dec 3, 2015

Added a license file (PostgreSQL license).

powerbook5300CS · on Nov 27, 2015

How is this not part of Postgres proper already?

Sanddancer · on Nov 27, 2015

Distribution is one of those issues where an implementation has to be opinionated, and weigh the pros and cons of each approach. How do you handle partition events? How do you handle conflicts between two nodes, especially when resolving one conflict can generate other conflicts? How do you handle latency? How do you handle failover?

Extensions like this are how Postgres has traditionally handled interesting opinionated technologies. The mailing lists will mull over the various implementations, scratch their chin a lot, and finally come to a conclusion after much discussion. It's why the database is so strong and stable, and why some seemingly obvious features take a while to get into the core database.

gtirloni · on Nov 27, 2015

There should be sensible defaults for anyone deploying a small cluster. Right now it's a full research project to figure this out for PgSQL while for MariaDB/MySQL not so much (even though it might not be 'perfect' as many commentators here expect things to be with PgSQL).

detaro · on Nov 27, 2015

Because Postgres is conservative with new features, and they prefer not having a feature (with the option to include it later) to being stuck supporting an imperfect attempt.

rdtsc · on Nov 27, 2015

Why would it be? Postgres is not a distributed database.

jzd · on Nov 27, 2015

It is very difficult to do distribution right, and Postgres is extremely conservative when it comes to new features.

Stability, reliability, and performance are #1.

rdtsc · on Nov 27, 2015

Yep, agreed. Slapping something on just to claim "distributed", and then having users lose data, would tarnish the brand and would not be a good idea. (Not that it stopped other products from going that route).

Distributed stuff done right is difficult (as seen in Aphyr's Call Me Maybe tests, where few products survived despite the high claims in their marketing).

Drdrdrq · on Nov 27, 2015

Genuinely curious: which products survived? All of the ones I read about had problems.

rdtsc · on Nov 27, 2015

Basho's Riak operating in non-last-write-wins mode didn't lose any writes, so that is one solid off-the-shelf product that gets distribution right. ZooKeeper also performed well.

Others that failed redeemed themselves and fixed issues after the results came out. I think the etcd and Consul devs did that.

sargun · on Nov 28, 2015

Riak, PostgreSQL, and ZooKeeper are the only ones that never lost data. Consul and etcd were mostly correct.

threeseed · on Nov 28, 2015

PostgreSQL was only tested as a single node.

So it's meaningless even bringing it up.

kev009 · on Nov 28, 2015

It sure isn't. A DB client and server are distributed, even if not "scale out". It's very easy to get fundamentals wrong if you do not design for them.

threeseed · on Nov 28, 2015

It makes no sense to compare PostgreSQL (single node) with Riak, Cassandra, etc. (multiple nodes). And distributed generally means distributing your data amongst multiple nodes. I don't expect to see a Call Me Maybe test for Chrome even though by your definition it is distributed.

mslot · on Dec 3, 2015

pg_paxos is brand new, and while the current release is reasonably stable and tested under various failure scenarios on EC2, it's also lacking in some areas, especially performance (think 10-100 writes/s). We will continue developing it to make it faster, easier to deploy, and suitable for many interesting applications. Eventually we might submit it to Postgres, but it's easier to develop it as an extension for now.