While I have high respect for the exposition in this article, and by extension Mr. Vogels' work that the author references, I think it's a tad too early to declare eventual consistency a clear winner. In fact, I think it's too early to declare that there is a contest in the first place. While there seems to be some level of agreement that partition tolerance is required for real-world systems, consistency vs. availability is more of a continuum than two discrete choices. I find that in day-to-day work some data requires a Paxos level of consistency (say, a configuration file containing your inter-cluster topology) and some data can be highly inconsistent (like activity data such as the 'read' status of emails).
As a side note, I would argue that all eventually consistent systems rely on some domain knowledge about the data that makes it possible for the system to reliably merge conflicting replicas. Such merge algorithms usually have the property that they only move state in one direction; emails only move from unread to read, for example.
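A quick Go sketch of what such a one-directional merge might look like (the names are made up; it's just the 'read' flag example from above):

```go
package main

import "fmt"

// mergeRead merges two replicas of an email's read flag. Because state
// only moves from unread (false) to read (true), a logical OR is a safe,
// order-independent merge: applying it in any order, any number of
// times, yields the same result.
func mergeRead(a, b bool) bool {
	return a || b
}

func main() {
	replicaA := true  // user marked the email read on device A
	replicaB := false // device B never saw the update
	fmt.Println(mergeRead(replicaA, replicaB)) // true: conflict resolved
}
```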
What I'm trying hard to figure out is how Consistency, Availability and Partition Tolerance map onto Intelligence, Emotional Stability and Good Looks. Do we always require Good Looks and thus end up choosing between Intelligence and Emotional Stability or ...
I happen to agree, funnily enough. I only talked about the extreme ends of the spectrum. I suspect that C/A systems have a natural place, say, up to datacenter-sized systems. After that they founder on latency and partitions.
The big thing about Dynamo in particular isn't eventual consistency but configurable consistency. You can specify how much consistency you want and how long you will wait for it.
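Concretely, that knob is the (N, R, W) triple from the Dynamo paper: N replicas, a write returns after W acknowledgements, a read after R responses. A rough sketch of the trade-off check in Go (the struct is illustrative, not a real client API):

```go
package main

import "fmt"

// QuorumConfig mirrors Dynamo-style tunable consistency: N replicas,
// writes wait for W acks, reads wait for R responses.
type QuorumConfig struct {
	N, R, W int
}

// StronglyConsistent reports whether read and write quorums must
// overlap, which guarantees a read sees the latest acknowledged write.
func (q QuorumConfig) StronglyConsistent() bool {
	return q.R+q.W > q.N
}

func main() {
	strict := QuorumConfig{N: 3, R: 2, W: 2} // overlapping quorums
	loose := QuorumConfig{N: 3, R: 1, W: 1}  // fast, but only eventually consistent
	fmt.Println(strict.StronglyConsistent(), loose.StronglyConsistent()) // true false
}
```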
Great footnote. I'm undecided if I should tackle Clojure or Erlang next. Each one is excellent and interesting in its own way.
> A lot of writeups repeat a "nine nines", i.e. 99.9999999% reliability claim for Erlang-based Ericsson telephone switches owned by British Telecom. This works out to 31 milliseconds of downtime per year, which hovers near the edge of measurability, not to say plausibility.

I was present at a talk Armstrong gave in early 2010 during which he was asked about this. There was a little foot shuffling as he qualified it: it was actually 6 or so seconds of downtime in one device during a code update. Since BT had X devices over Y years, they calculated it as 31ms of average downtime per device per year. Or something like that. Either way it's an impressive feat.
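For what it's worth, the arithmetic on the headline number does check out; a quick back-of-the-envelope in Go:

```go
package main

import "fmt"

func main() {
	// Nine nines of availability: the fraction of a year spent down.
	const availability = 0.999999999
	const secondsPerYear = 365.25 * 24 * 60 * 60 // ~31.56 million seconds
	downtime := (1 - availability) * secondsPerYear
	fmt.Printf("%.1f ms of downtime per year\n", downtime*1000) // ~31.6 ms
}
```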
Having done a lot of Erlang, I am toying with Clojure at the moment. So far it is looking like a tool I will pull out for "big data" tasks, but I am having a really hard time resisting the urge to drop back into Erlang for big concurrency tasks. Right now my take on the two languages is that Clojure's sweet spot is as a more elegant way to exploit Java's SMP strengths, but Erlang still wins for systems distributed across a large number of nodes. (In this case it is OTP providing the win, not anything specific in Erlang that Clojure lacks, but that framework is still the best and most battle-tested thing out there for many-node systems.)
Hey -- very good questions. This article was a broad overview, and I wanted to get across the fundamental "physics" of concurrency more than specific solutions. Where I talked about solutions it was to illustrate how those principles operate.
I'm also not actually that knowledgeable about concurrency so I didn't want to say anything stupid. :D I'm more journalist than expert.
I've gotten many responses like yours, though. This subject probably deserves another round of research and a follow-on article or two.
> A Consistent/Available system means that reading and writing always works the way you expect, but if one node fails the whole system stops working.
I've seen this mistaken statement several times now and it really bugs me. A consistent/available system can't tolerate partitions, i.e. a split in the network. However, it can certainly tolerate a single failure and remain consistent and available. That's the whole point. A trivial system with these properties is one that replicates every update to all available nodes and requires reads to get matching results from at least half of the nodes. Paxos is another example.
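To make the read rule concrete, here's a deliberately simplified toy version in Go (it just takes a majority vote over the replica responses; real systems would track versions, not raw values):

```go
package main

import "fmt"

// majorityRead returns the value reported by more than half of the
// replica responses, or an error if no value reaches a majority
// (e.g. because too many nodes are down or disagree).
func majorityRead(responses []string) (string, error) {
	counts := make(map[string]int)
	for _, v := range responses {
		counts[v]++
		if counts[v] > len(responses)/2 {
			return v, nil
		}
	}
	return "", fmt.Errorf("no majority among %d responses", len(responses))
}

func main() {
	val, err := majorityRead([]string{"v2", "v2", "v1"}) // one stale node
	fmt.Println(val, err)                                // "v2" wins, <nil>
}
```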
You've got a good point, and I'm not satisfied with that section. Can you think of a real-world example of a network partition? Maybe a parade that bisects the city?
"Think of a parliment that must have more than half of members present in order to hold a vote. If too many can't make it, say because a flood washes out the bridge, a quorum can't be formed and business can't proceed."
> A trivial system with these properties is one that replicates every update to all available nodes and requires reads to get matching results from at least half of the nodes.
Nope. You haven't thought this through.
One problem is that "all available nodes" is one of those fuzzy concepts that sounds reasonable but is impossible to actually implement.
Say you send a write to 3 replicas, all of which you think are up. Only one acks the write within 10s. Did the other nodes fail before getting the write? You have no way to know. In the meantime, what do you do with the node that did ack your write? If you tell it to rollback, what happens if the other, "failed" nodes failed after performing the write, but before the ack? Or are they about to ack, if only you wait another 1s?
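Here's a sketch of that ambiguity in Go (a hypothetical setup; the point is that a timed-out ack tells the coordinator nothing about the replica's actual state):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	acks := make(chan string, 3)

	// Pretend to replicate one write to three nodes. From here, a slow
	// replica and a crashed one are indistinguishable: the ack just
	// never arrives in time.
	send := func(replica string, delay time.Duration) {
		go func() {
			time.Sleep(delay) // network + processing, or a crash
			acks <- replica
		}()
	}
	send("A", 10*time.Millisecond)
	send("B", 5*time.Second) // did B apply the write and die before acking?
	send("C", 5*time.Second) // or never receive it at all? No way to know.

	deadline := time.After(100 * time.Millisecond)
	got := 0
	for done := false; !done && got < 3; {
		select {
		case r := <-acks:
			fmt.Println("ack from", r)
			got++
		case <-deadline:
			done = true // the timeout reveals nothing about B's and C's state
		}
	}
	fmt.Printf("%d of 3 acks before the deadline\n", got)
}
```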
That's not really fair. Paxos provably addresses all of the missing ack problems you suggest and makes progress as long as a quorum of nodes can communicate with each other. That's why I mentioned it.
I'm also aware of the problems with two phase commit, namely the single point of failure. Funny enough, Paxos Commit is the solution to that problem.
So really, Paxos is a magic wand when it comes to available and consistent distributed systems. You shouldn't use it for every single update in your distributed database; it's way too much overhead. But somewhere in the system there's probably some algorithm that looks like Paxos handling some important piece of metadata or the system isn't truly fault-tolerant.
The link you gave is consistent with what I've said. Namely, the concept of quorum is a way to sacrifice availability for the minority of nodes in the presence of a partition but otherwise remain consistent. Paxos does precisely that. It's consistent and available in the absence of partitions, and it sacrifices availability for some in the presence of a partition.
By the way, the other possibility in the design space is to remain available but sacrifice consistency in the face of a partition. This is the eventual consistency camp.
Thanks for a nice overview. I have by and large managed to bypass concurrency issues so far, but these days an i7 CPU runs 8 threads and a run-of-the-mill dual socket Xeon could have 24 threads, so I suppose I'll just have to bite the bullet in order to get anywhere near the performance even a single box can supply.
Where would goroutines, from Google's Go (http://golang.org/ ), fit in here?
My understanding is that goroutines are pretty much the same as Erlang message passing, except that Erlang messages can work across multiple computers, while goroutines are restricted to one computer (same memory space). Is this correct?
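For reference, here's the single-machine message passing I mean, in standard Go (nothing hypothetical; this is the core channel idiom):

```go
package main

import "fmt"

func main() {
	// A channel is the Go analogue of an Erlang mailbox, except it
	// lives inside one address space and cannot cross machine boundaries.
	messages := make(chan string)

	// Spawn a lightweight process, like Erlang's spawn/1.
	go func() {
		messages <- "hello from a goroutine"
	}()

	fmt.Println(<-messages) // blocking receive, like Erlang's receive ... end
}
```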
Will this change with quantum computing and entanglement?
You can't avoid it by using electrons instead of street urchins. It's impossible for an event happening in one place (e.g. data changing inside one computer or process) to affect any other place (e.g. other computers or processes) until the information has had time to travel between them.
My understanding is that 'entanglement' is like having two urchins each draw a ball from a bag with a black/white pair. One urchin goes East, the other goes West.
Assuming they arrive at A and B, respectively, this may be useful somehow. But it cannot really communicate new information from A to B.
(Unless the balls are carried unobserved, and one of the recipients - by staring real hard ? - can somehow choose the color of the ball he is unwrapping.)
Your analogy is correct inasmuch as the urchins cannot communicate for the same reason that Alice and Bob, each with one particle of an entangled pair, cannot communicate. But it can be shown by very clever experiments that the future statistical observations of Alice and Bob cannot be explained by assuming each had a particle with properties fixed when the two parted ways. Entanglement is very subtle.