I'm also wondering why this kind of intervention is necessary at all. The NoSQL solution we use at work has load-based automatic splitting, and I'd have thought (though I haven't confirmed) that this would be an obvious feature to include.
I would speculate that it's a poorly-chosen shard key. MongoDB's built-in sharding uses range-based partitioning of the shard key. If you choose user_id as your shard key, and those are autoincrementing integers, then you're screwed if newer users tend to be more active on average than older ones: all the new (hot) IDs fall in the same key range, so they all land on the same shard.
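To make that failure mode concrete, here's a toy sketch (chunk boundaries and names are invented, not MongoDB's actual internals) of range-based partitioning over autoincrementing IDs:

```python
import bisect

# Hypothetical chunk boundaries on user_id: shard 0 holds ids below
# 1_000_000, shard 1 holds 1_000_000..1_999_999, and so on.
chunk_bounds = [1_000_000, 2_000_000, 3_000_000]

def shard_for(user_id):
    """Range-based partitioning: find the chunk containing user_id."""
    return bisect.bisect_right(chunk_bounds, user_id)

# With autoincrementing ids, every newly signed-up (and typically most
# active) user falls into the highest range, i.e. the last shard:
new_users = [3_000_001, 3_000_002, 3_000_003]
print({uid: shard_for(uid) for uid in new_users})
# every new user maps to shard 3 -- a write hotspot
```

The old, quiet users are spread over the low ranges while all the fresh traffic piles onto one box.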
Where I work (Etsy) we keep an index server that maps each user to a shard on an individual basis. There are a number of advantages to it. For example, if one user generated a ton of activity they could in theory be moved to their own server. Approximately random works for the initial assignment.
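A minimal sketch of that lookup scheme (all names hypothetical; a real deployment would back the mapping with a replicated index database, not an in-memory dict):

```python
import random

SHARDS = ["db1", "db2", "db3", "db4"]

# The "index server": an explicit per-user mapping to a shard.
user_to_shard = {}

def assign(user_id):
    """Approximately random initial placement for a new user."""
    shard = random.choice(SHARDS)
    user_to_shard[user_id] = shard
    return shard

def lookup(user_id):
    return user_to_shard[user_id]

def migrate(user_id, new_shard):
    """A hot user can be moved (after copying their data over) by
    repointing one index entry -- no global rehash required."""
    user_to_shard[user_id] = new_shard

assign(42)
migrate(42, "db_dedicated")
print(lookup(42))  # "db_dedicated"
```

The win over any hash-based scheme is exactly that migrate step: placement is a policy decision, not a function of the key.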
Flickr works the same way (not by coincidence, since we have several former Flickr engineers on staff).
Sounds like a good system. I've noticed that people tend to do things like shard based on even/odd, and then they realize they need a third database and have to reshuffle everything.
I've never had either problem though... but if I ever need to shard I plan on doing it based on object ID. Then one request can be handled by multiple databases, "for free", increasing throughput and reducing response time.
Actually for anonymous sharding (without a central index) a consistent hash is about the closest you can get to ideal distribution and flexibility. I haven't looked but I presume that's what mongo uses under the hood for their auto-sharding, too.
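A toy consistent-hash ring (just a sketch; production implementations like Ketama use the same idea with tuned hash functions and many virtual nodes) shows the flexibility claim: adding a server remaps only the keys that fall to the new node, not nearly all of them as with modulo.

```python
import bisect
import hashlib

def h(key):
    """Map a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=100):
        # vnodes: virtual nodes per server, to smooth the distribution.
        self.points = sorted(
            (h(f"{n}:{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.keys = [p for p, _ in self.points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash (with wraparound).
        i = bisect.bisect(self.keys, h(key)) % len(self.keys)
        return self.points[i][1]

ring = Ring(["db1", "db2", "db3"])
bigger = Ring(["db1", "db2", "db3", "db4"])
keys = [f"user{i}" for i in range(1000)]
moved = sum(1 for k in keys if ring.node_for(k) != bigger.node_for(k))
print(moved)  # roughly a quarter of the keys move -- the ones db4 took over
```

Every key that moves is one the new node claimed; the other three shards keep what they had.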