While I have high respect for the exposition in this article, and by extension Mr. Vogels' work that the author references, I think it's a tad too early to declare eventual consistency a clear winner. In fact, I think it's too early to declare that there is a contest in the first place. While there seems to be some level of agreement that partition tolerance is required for real-world systems, consistency vs. availability is more of a continuum than two discrete choices. I find that in day-to-day work some data requires a Paxos level of consistency (say, a configuration file containing your inter-cluster topology) and some data can be highly inconsistent (like activity data such as the 'read' status of emails).
As a side note, I would argue that all eventually consistent systems rely on some domain knowledge about the data that makes it possible for the system to reliably merge conflicting replicas. Such merge algorithms usually have the property that they only move state in one direction; emails only move from unread to read, for example.
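A quick Go sketch of what such a one-directional merge might look like (the names are made up; it's just the 'read' flag example from above):

```go
package main

import "fmt"

// mergeRead merges two replicas of an email's read flag. Because state
// only moves from unread (false) to read (true), a logical OR is a safe,
// order-independent merge: applying it in any order, any number of
// times, yields the same result.
func mergeRead(a, b bool) bool {
	return a || b
}

func main() {
	replicaA := true  // user marked the email read on device A
	replicaB := false // device B never saw the update
	fmt.Println(mergeRead(replicaA, replicaB)) // true: conflict resolved
}
```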
What I'm trying hard to figure out is how Consistency, Availability and Partition Tolerance map onto Intelligence, Emotional Stability and Good Looks. Do we always require Good Looks and thus end up choosing between Intelligence and Emotional Stability or ...
I happen to agree, funnily enough. I only talked about the extreme ends of the spectrum. I suspect that C/A systems have a natural place, say, up to datacenter-sized systems. After that they founder on latency and partitions.
The big thing about Dynamo in particular isn't eventual consistency but configurable consistency. You can specify how much consistency you want and how long you will wait for it.
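Concretely, that knob is the (N, R, W) triple from the Dynamo paper: N replicas, a write returns after W acknowledgements, a read after R responses. A rough sketch of the trade-off check in Go (the struct is illustrative, not a real client API):

```go
package main

import "fmt"

// QuorumConfig mirrors Dynamo-style tunable consistency: N replicas,
// writes wait for W acks, reads wait for R responses.
type QuorumConfig struct {
	N, R, W int
}

// StronglyConsistent reports whether read and write quorums must
// overlap, which guarantees a read sees the latest acknowledged write.
func (q QuorumConfig) StronglyConsistent() bool {
	return q.R+q.W > q.N
}

func main() {
	strict := QuorumConfig{N: 3, R: 2, W: 2} // overlapping quorums
	loose := QuorumConfig{N: 3, R: 1, W: 1}  // fast, but only eventually consistent
	fmt.Println(strict.StronglyConsistent(), loose.StronglyConsistent()) // true false
}
```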
Great footnote. I'm undecided if I should tackle Clojure or Erlang next. Each one is excellent and interesting in its own way.
> A lot of writeups repeat a "nine nines", i.e. 99.9999999% reliability claim for Erlang-based Ericsson telephone switches owned by British Telecom. This works out to 31 milliseconds of downtime per year, which hovers near the edge of measurability, not to say plausibility.

I was present at a talk Armstrong gave in early 2010 during which he was asked about this. There was a little foot shuffling as he qualified it: it was actually 6 or so seconds of downtime in one device during a code update. Since BT had X devices over Y years, they calculated it as 31ms of average downtime per device per year. Or something like that. Either way it's an impressive feat.
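For what it's worth, the arithmetic on the headline number does check out; a quick back-of-the-envelope in Go:

```go
package main

import "fmt"

func main() {
	// Nine nines of availability: the fraction of a year spent down.
	const availability = 0.999999999
	const secondsPerYear = 365.25 * 24 * 60 * 60 // ~31.56 million seconds
	downtime := (1 - availability) * secondsPerYear
	fmt.Printf("%.1f ms of downtime per year\n", downtime*1000) // ~31.6 ms
}
```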
Having done a lot of Erlang, I am toying with Clojure at the moment. So far it is looking like a tool I will pull out for "big data" tasks, but I am having a really hard time resisting the urge to drop back into Erlang for big concurrency tasks. Right now my take on the two languages is that Clojure's sweet spot is as a more elegant way to exploit Java's SMP strengths, but Erlang still wins for systems distributed across a large number of nodes. (In this case it is OTP providing the win, not anything specific in Erlang that Clojure lacks, but that framework is still the best and most battle-tested thing out there for many-node systems.)
Hey -- very good questions. This article was a broad overview, and I wanted to get across the fundamental "physics" of concurrency more than specific solutions. Where I talked about solutions it was to illustrate how those principles operate.
I'm also not actually that knowledgeable about concurrency so I didn't want to say anything stupid. :D I'm more journalist than expert.
I've gotten many responses like yours, though. This subject probably deserves another round of research and a follow-on article or two.
> A Consistent/Available system means that reading and writing always works the way you expect, but if one node fails the whole system stops working.
I've seen this mistaken statement several times now and it really bugs me. A consistent/available system can't tolerate partitions, i.e. a split in the network. However, it can certainly tolerate a single failure and remain consistent and available. That's the whole point. A trivial system with these properties is one that replicates every update to all available nodes and requires reads to get matching results from at least half of the nodes. Paxos is another example.
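To make the read rule concrete, here's a deliberately simplified toy version in Go (it just takes a majority vote over the replica responses; real systems would track versions, not raw values):

```go
package main

import "fmt"

// majorityRead returns the value reported by more than half of the
// replica responses, or an error if no value reaches a majority
// (e.g. because too many nodes are down or disagree).
func majorityRead(responses []string) (string, error) {
	counts := make(map[string]int)
	for _, v := range responses {
		counts[v]++
		if counts[v] > len(responses)/2 {
			return v, nil
		}
	}
	return "", fmt.Errorf("no majority among %d responses", len(responses))
}

func main() {
	val, err := majorityRead([]string{"v2", "v2", "v1"}) // one stale node
	fmt.Println(val, err)                                // "v2" wins, <nil>
}
```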
You've got a good point, and I'm not satisfied with that section. Can you think of a real-world example of a network partition? Maybe a parade that bisects the city?
"Think of a parliment that must have more than half of members present in order to hold a vote. If too many can't make it, say because a flood washes out the bridge, a quorum can't be formed and business can't proceed."
> A trivial system with these properties is one that replicates every update to all available nodes and requires reads to get matching results from at least half of the nodes.
Nope. You haven't thought this through.
One problem is that "all available nodes" is one of those fuzzy concepts that sounds reasonable but is impossible to actually implement.
Say you send a write to 3 replicas, all of which you think are up. Only one acks the write within 10s. Did the other nodes fail before getting the write? You have no way to know. In the meantime, what do you do with the node that did ack your write? If you tell it to rollback, what happens if the other, "failed" nodes failed after performing the write, but before the ack? Or are they about to ack, if only you wait another 1s?
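Here's a sketch of that ambiguity in Go (a hypothetical setup; the point is that a timed-out ack tells the coordinator nothing about the replica's actual state):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	acks := make(chan string, 3)

	// Pretend to replicate one write to three nodes. From here, a slow
	// replica and a crashed one are indistinguishable: the ack just
	// never arrives in time.
	send := func(replica string, delay time.Duration) {
		go func() {
			time.Sleep(delay) // network + processing, or a crash
			acks <- replica
		}()
	}
	send("A", 10*time.Millisecond)
	send("B", 5*time.Second) // did B apply the write and die before acking?
	send("C", 5*time.Second) // or never receive it at all? No way to know.

	deadline := time.After(100 * time.Millisecond)
	got := 0
	for done := false; !done && got < 3; {
		select {
		case r := <-acks:
			fmt.Println("ack from", r)
			got++
		case <-deadline:
			done = true // the timeout reveals nothing about B's and C's state
		}
	}
	fmt.Printf("%d of 3 acks before the deadline\n", got)
}
```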
That's not really fair. Paxos provably addresses all of the missing ack problems you suggest and makes progress as long as a quorum of nodes can communicate with each other. That's why I mentioned it.
I'm also aware of the problems with two phase commit, namely the single point of failure. Funny enough, Paxos Commit is the solution to that problem.
So really, Paxos is a magic wand when it comes to available and consistent distributed systems. You shouldn't use it for every single update in your distributed database; it's way too much overhead. But somewhere in the system there's probably some algorithm that looks like Paxos handling some important piece of metadata or the system isn't truly fault-tolerant.
The link you gave is consistent with what I've said. Namely, the concept of quorum is a way to sacrifice availability for the minority of nodes in the presence of a partition but otherwise remain consistent. Paxos does precisely that. It's consistent and available in the absence of partitions, and it sacrifices availability for some in the presence of a partition.
By the way, the other possibility in the design space is to remain available but sacrifice consistency in the face of a partition. This is the eventual consistency camp.
Thanks for a nice overview. I have by and large managed to bypass concurrency issues so far, but these days an i7 CPU runs 8 threads and a run-of-the-mill dual socket Xeon could have 24 threads, so I suppose I'll just have to bite the bullet in order to get anywhere near the performance even a single box can supply.
Where would goroutines, from Google's Go (http://golang.org/ ), fit in here?
My understanding is that goroutines are pretty much the same as Erlang message passing, except that Erlang messages can work across multiple computers, while goroutines are restricted to one computer (same memory space). Is this correct?
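For reference, here's the single-machine message passing I mean, in standard Go (nothing hypothetical; this is the core channel idiom):

```go
package main

import "fmt"

func main() {
	// A channel is the Go analogue of an Erlang mailbox, except it
	// lives inside one address space and cannot cross machine boundaries.
	messages := make(chan string)

	// Spawn a lightweight process, like Erlang's spawn/1.
	go func() {
		messages <- "hello from a goroutine"
	}()

	fmt.Println(<-messages) // blocking receive, like Erlang's receive ... end
}
```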
Will this change with quantum computing and entanglement?
You can't avoid it by using electrons instead of street urchins. It's impossible for an event happening in one place (e.g. data changing inside one computer or process) to affect any other place (e.g. other computers or processes) until the information has had time to travel between them.
My understanding is that 'entanglement' is like having two urchins each draw a ball from a bag with a black/white pair. One urchin goes East, the other goes West.
Assuming they arrive at A and B, respectively, this may be useful somehow. But it cannot really communicate new information from A to B.
(Unless the balls are carried unobserved, and one of the recipients - by staring real hard ? - can somehow choose the color of the ball he is unwrapping.)
Your analogy is correct inasmuch as the urchins cannot communicate for the same reason that Alice and Bob, each with one particle of an entangled pair, cannot communicate. But it can be shown by very clever experiments that the future statistical observations of Alice and Bob cannot be explained by assuming each had a particle with properties fixed when the two parted ways. Entanglement is very subtle.