Congratulations to the RethinkDB team on this! I'm sure it was no small feat to get RethinkDB running on Windows, and between this and the Jepsen results, they are clearly doing very well right now.
Particularly concerning is this combination, which makes me think I'll have to shard like I would with MySQL rather than relying on RethinkDB in certain situations:
I mean, don't get me wrong, it's a hard problem to solve, and overall I think the product will work out, but it does seem to have some limits on its horizontal scaling. It seems like you are better off stopping somewhere in the 5-9 node range, scaling those nodes vertically, and then sharding across clusters of nodes.
We're working hard on fixing every bug and performance issue in the system, so the explanation below isn't meant to shirk responsibility. That being said...
Every large system (including the Linux kernel, MySQL, Postgres, etc.) has a large number of bugs lurking under the hood. These bugs are often edge-case scenarios that are specific to a given workload, hardware/software configuration, or some other aspect of a given user's system. I'd be careful about looking at the tracker and drawing conclusions -- in my experience, bugs like these aren't representative of the overall experience of using the product. (That's not to say that RethinkDB is as stable as the Linux kernel; I'm just using it to illustrate that bugs are unavoidable and don't necessarily color the experience of the average user.)
Again, I wanted to give a different perspective from the POV of a maintainer of a large project. We're always working on fixing all important issues that prevent people from using Rethink, so I don't mean to imply in any way that we aren't responsible for these.
I don't mean to imply you weren't working hard or that these bugs were unusual for any complex project. I'm just getting the impression that, with the current version of RethinkDB, you end up scaling like this if you're trying to be safe:
1) Get to 5-9 nodes.
2) Vertically scale this cluster.
3) Shard by adding additional clusters (e.g. 5 clusters of 7 nodes), since at around 30 nodes you run into performance issues for certain workloads.
Not quite true -- there are many deployments with >30 nodes in the wild. Some deployments with >30 nodes did run into issues, but I think a lot of that is extremely user-specific (and if you get to that level, RethinkDB engineers will help with all the custom issues you might run into).
I really don't like RQL. Why couldn't it just be a document-flavored SQL? Having a DSL for writing in-database procedures is fine. But tying all external queries to a Builder is, from a purely aesthetic POV, pretty ugly IMO.
But I'm sure that can be addressed down the line, if nothing else by other people building new DSLs on top of the Builder interfaces.
(Note: Yes, I realize the Builder supports map/reduce as well. In practice I'm skeptical of how well that works in an on-demand fashion.)
Having only one fully managed host available is a minor negative, though we use Cloudant right now, so that's perhaps a bit hypocritical.
Lack of integrated full-text search makes things (application-side) a lot more complex than Cloudant or a traditional RDBMS with integrated full-text search. Maybe not in a "look what I can do" prototype-ish phase, but definitely in a "what would Jepsen do to my system" sort of scenario (see the "Warning" here: https://www.rethinkdb.com/docs/elasticsearch/).
That means you're going to want to routinely compare the search index against the database and reindex. Which, ugh.
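A minimal sketch of that routine compare-and-reindex pass, assuming every document carries a `version` field and each search-index entry records the version it was built from. Plain `Map`s stand in for the database and the external index, and `reconcile`/`tokenize` are hypothetical helpers, not RethinkDB or Elasticsearch APIs.

```javascript
// Hypothetical reconciliation pass. `db` and `searchIndex` are Maps standing
// in for the primary store and an external full-text index. Each db document
// has a `version`; each index entry records the version it was built from.
function reconcile(db, searchIndex, indexDoc) {
  const stats = { reindexed: 0, removed: 0 };

  // Re-index documents missing from the index or indexed from a stale version.
  for (const [id, doc] of db) {
    const entry = searchIndex.get(id);
    if (!entry || entry.version !== doc.version) {
      searchIndex.set(id, { version: doc.version, terms: indexDoc(doc) });
      stats.reindexed++;
    }
  }

  // Drop index entries whose source document no longer exists.
  for (const id of [...searchIndex.keys()]) {
    if (!db.has(id)) {
      searchIndex.delete(id);
      stats.removed++;
    }
  }
  return stats;
}

// Naive tokenizer used as the (hypothetical) indexing function.
const tokenize = (doc) => doc.title.toLowerCase().split(/\s+/);

const db = new Map([
  ["a", { version: 2, title: "Star Trek" }],
  ["b", { version: 1, title: "Battlestar Galactica" }],
]);
const searchIndex = new Map([
  ["a", { version: 1, terms: ["star"] }], // stale: built from version 1
  ["c", { version: 1, terms: ["gone"] }], // orphan: source was deleted
]);
console.log(reconcile(db, searchIndex, tokenize)); // { reindexed: 2, removed: 1 }
```

Running it again right away reports nothing to do, which is what makes it safe to schedule periodically. It only narrows the window in which search results can be stale; it doesn't close it.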
That's just my 2c on why I'm not super excited about RethinkDB. There doesn't seem to be a whole lot to differentiate it from other NoSQL databases beyond being a better MongoDB. It doesn't integrate full-text search like Cloudant. It doesn't have transactions. It's got this streaming thing (changefeeds), which seems like a pretty niche feature. But maybe that's just me.
It is free though, so that's definitely something. If I were a cash-strapped startup that might be enough to move it near the top of my list. OTOH if I'm paying for a fully managed, hosted service that matters a whole lot less.
Before I reply, let me clarify that I don't work there anymore, and my thoughts could have diverged since leaving their corporate hive-mind collective of the sort that employees often claim not to represent.
> Why couldn't it just be a document-flavored SQL? Having a DSL for writing in-database procedures is fine. But tying all external queries to a Builder is, from a purely aesthetic POV, pretty ugly IMO.
Having raw SQL in an application is insane. Being able to connect directly to a database and hand-roll a query in SQL to look at stuff directly is not. But building such a query in JavaScript, like you do in the admin UI, is fine too (actually, it's way more annoying than with the Python or Ruby APIs). The benefit is that you have only one way to learn to write queries instead of two; the downside is not having infix operators and having to close many more parentheses and such. There might also be training costs if you have a semi-technical role that would have involved making SQL queries.
I don't believe in an aesthetic POV. Unless ease-of-use is the aesthetic.
> Yes, I realize the Builder supports map/reduce as well. In practice I'm skeptical of how well that works in an on-demand fashion.
SQL databases already have aggregation queries, so there's nothing going on that they wouldn't already do. Map/reduce isn't, like, a big deal, other than in the mundane way that a table scan is generally a big deal.
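To make the equivalence concrete: a count per group written as map/reduce computes the same thing as `SELECT genre, COUNT(*) FROM shows GROUP BY genre`, and either way every row gets visited. This is a plain JavaScript sketch over made-up in-memory data, not RQL.

```javascript
// Map/reduce form of: SELECT genre, COUNT(*) FROM shows GROUP BY genre
function countByGenre(shows) {
  return shows
    .map((show) => [show.genre, 1]) // map: emit one [key, value] pair per row
    .reduce((totals, [genre, n]) => { // reduce: fold pairs into per-key totals
      totals.set(genre, (totals.get(genre) || 0) + n);
      return totals;
    }, new Map());
}

const shows = [
  { title: "The Wire", genre: "drama" },
  { title: "Community", genre: "comedy" },
  { title: "Deadwood", genre: "drama" },
];
console.log(countByGenre(shows));
```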
> SQL databases already have aggregation queries, so there's nothing going on that they wouldn't already do.
That's true. I guess it depends on the implementation. I was imagining something like CouchDB, which has to parse each document and run your map and reduce functions over it to build an index. Not something you'd want to do on demand.
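Roughly, the index-time approach runs the map function once per document write and keeps the emitted rows, so a query only reduces over rows that already exist. A heavily simplified, hypothetical in-memory sketch; the class and method names are made up, not CouchDB's, and real CouchDB views are persisted on disk and do much more.

```javascript
// Hypothetical write-time view: mapFn(doc, emit) runs once per document
// write; queries read the stored rows and never re-run the map.
class MaterializedView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.byKey = new Map();     // key -> Map(docId -> [emitted values])
    this.keysByDoc = new Map(); // docId -> Set of keys it emitted (for cleanup)
  }

  // On insert/update: drop the document's old rows, then map it again.
  update(id, doc) {
    this.remove(id);
    const keys = new Set();
    this.mapFn(doc, (key, value) => {
      if (!this.byKey.has(key)) this.byKey.set(key, new Map());
      const perDoc = this.byKey.get(key);
      if (!perDoc.has(id)) perDoc.set(id, []);
      perDoc.get(id).push(value);
      keys.add(key);
    });
    this.keysByDoc.set(id, keys);
  }

  remove(id) {
    for (const key of this.keysByDoc.get(id) || []) {
      this.byKey.get(key).delete(id);
    }
    this.keysByDoc.delete(id);
  }

  // Query time: reduce over the precomputed rows for one key.
  query(key, reduceFn, initial) {
    let acc = initial;
    for (const values of (this.byKey.get(key) || new Map()).values()) {
      for (const v of values) acc = reduceFn(acc, v);
    }
    return acc;
  }
}

let mapCalls = 0;
const view = new MaterializedView((doc, emit) => {
  mapCalls++;
  emit(doc.genre, 1);
});
view.update("a", { genre: "drama" });
view.update("b", { genre: "drama" });
const sum = (acc, v) => acc + v;
console.log(view.query("drama", sum, 0), mapCalls); // 2 2
```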
> Having raw SQL in an application is insane.
That's just like, your opinion man. We could argue all day on that one, but suffice it to say it's definitely one of those Novice -> Intermediate -> Expert -> Master circle-of-life things IME.
You can syntax-check SQL. It's easily understood by most developers regardless of background. You probably won't have to spend more than 5 minutes getting up to speed with whatever driver you're using. There's never been an app that didn't require major refactoring after swapping out the database layer. There's just very little point in abstracting it IME. I've written a number of O/R mappers in my time, including the second most popular Ruby one.
I wouldn't do it again. ;-) I'd use something like http://slick.typesafe.com/doc/3.1.1/introduction.html#plain-... write a few type-conversion protocols and call it a day. It's safe, simple, productive and can be maintained by anyone. Any downsides are pretty minor IMO.
The comment on aesthetics is just that the primary interface to the database has a "JavaScript First" feel to it. It's an odd choice (to me) to use a context bound Builder as opposed to an AST Builder you just pass to connection.execute or something.
> It's an odd choice (to me) to use a context bound Builder as opposed to an AST Builder you just pass to connection.execute or something.
But that's what RethinkDB's query builder is. You construct a free-standing AST, let's call it x, then call x.run(conn) or (IIRC) conn.run(x) to run it.
Edit: > You can syntax check ... O/R Mappers ...
It's not a dichotomy between SQL and O/R mappers here, and sorry if you weren't trying to imply that. An SQL query building API can easily be understood by developers that know SQL (correct me if I'm wrong), you don't need to syntax check it, and it survives refactorings just as well. And it's hard not to need to dynamically construct queries anyway.
> But that's what RethinkDB's query builder is. You construct a free-standing AST, let's call it x, then call x.run(conn) or (IIRC) conn.run(x) to run it.
That's not what http://rethinkdb.com/docs/quickstart/ says. It shows a connection that exposes context-bound Builder methods with trigger (insert, changes, run, etc) methods.
Though that is a little more high-level than I thought at first. You're right that not everything is a `run`.
It feels like a stretch to call it an AST, but maybe that's just a question of taking a closer look at the wire protocol.
> An SQL query building API can easily be understood by developers that know SQL
It can, but it's more complex with more mental overhead. It's the difference between Active Record scopes and HQL. Where HQL is a flavor of SQL with extensions for object notation that can feel pretty similar to qualified database.table.column syntax, AR scopes are (I feel) objectively more complex.
> you don't need to syntax check it
You actually do. It's just your interpreter or compiler that does so. A compiler plugin (for native support) or embedded DSL such as LINQ or the Slick DSL can accomplish the same for SQL though.
> it's hard not to need to dynamically construct queries anyway
True, but IME a Builder-based API is a poor choice for that unless the scope of the Builder usage is private to a single class. Once you start passing around a mutable builder, you can easily end up with unintentional side effects.
In SQL builders it's easy, for example, to override a sort clause so that it's incompatible with a previous aggregation, unless the builder takes great care to make that impossible. And then you might well end up in a situation where you need to discard previously defined scopes (such as the problematic aggregation). How do you do that?
IME once you let a builder escape a very narrow scope (class-level at the most), you create a great maintenance burden and a new category of very-difficult-to-avoid bugs.
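A small illustration of that hazard with two hypothetical builders (not any particular library): with a mutable builder, one caller's `orderBy` on a shared base query leaks into every other holder of it, while an immutable variant returns a new query from each call.

```javascript
// Illustration only: conditions are raw strings with no escaping.
function renderSQL({ table, where, orderBy }) {
  let sql = `SELECT * FROM ${table}`;
  if (where.length) sql += ` WHERE ${where.join(" AND ")}`;
  if (orderBy) sql += ` ORDER BY ${orderBy}`;
  return sql;
}

// Hypothetical mutable builder: each method modifies `this` and returns it.
class MutableQuery {
  constructor(table) { this.parts = { table, where: [], orderBy: null }; }
  where(cond) { this.parts.where.push(cond); return this; }
  orderBy(col) { this.parts.orderBy = col; return this; }
  toSQL() { return renderSQL(this.parts); }
}

// Immutable variant: each method returns a new query; the receiver is unchanged.
class ImmutableQuery {
  constructor(parts) { this.parts = Object.freeze(parts); }
  static from(table) { return new ImmutableQuery({ table, where: [], orderBy: null }); }
  where(cond) { return new ImmutableQuery({ ...this.parts, where: [...this.parts.where, cond] }); }
  orderBy(col) { return new ImmutableQuery({ ...this.parts, orderBy: col }); }
  toSQL() { return renderSQL(this.parts); }
}

// A shared base query handed to two callers.
const mutableBase = new MutableQuery("shows").where("active = 1");
mutableBase.orderBy("rating DESC"); // caller 1 only wanted this for its report...
console.log(mutableBase.toSQL());   // ...but caller 2 now gets the ORDER BY too

const immutableBase = ImmutableQuery.from("shows").where("active = 1");
const report = immutableBase.orderBy("rating DESC");
console.log(immutableBase.toSQL()); // SELECT * FROM shows WHERE active = 1
console.log(report.toSQL());        // SELECT * FROM shows WHERE active = 1 ORDER BY rating DESC
```

Immutability removes the aliasing bug, but not the sort-vs-aggregation compatibility problem; that still needs validation somewhere.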
Interpolation/materialization of terms into an AST is much simpler, and encourages simpler designs (IME). IOW: writing builders should be up to the user. If the "driver" provides it, it's too easy to fall into anti-pattern traps with unintended consequences.
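One possible reading of "materializing terms into an AST": the query is plain, shareable data with named placeholders, and values are bound into a fresh copy of the tree at execution time instead of being accumulated through a builder. The node shapes and the `bind` helper below are hypothetical, made up for illustration.

```javascript
// A query is plain data: nested nodes, with {param: name} placeholders.
const showsByGenre = Object.freeze({
  op: "filter",
  table: "shows",
  predicate: { op: "eq", field: "genre", value: { param: "genre" } },
});

// Materialize: return a new tree with every placeholder replaced by its value.
// The template itself is never modified, so it can be shared freely.
function bind(node, params) {
  if (Array.isArray(node)) return node.map((child) => bind(child, params));
  if (node === null || typeof node !== "object") return node;
  if ("param" in node) {
    if (!(node.param in params)) throw new Error(`missing param: ${node.param}`);
    return params[node.param];
  }
  const out = {};
  for (const [key, child] of Object.entries(node)) out[key] = bind(child, params);
  return out;
}

const dramaQuery = bind(showsByGenre, { genre: "drama" });
console.log(JSON.stringify(dramaQuery.predicate)); // {"op":"eq","field":"genre","value":"drama"}
```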
That's just my opinion. But bottom line, the API is much too high-level for my taste. I'd prefer something much closer to Cloudant, with a well-defined syntax (imagine the HTTP endpoints' query parameters folded into the request body instead; the query and update APIs aren't super consistent about which goes where, e.g. _deleted or _rev vs. keys or include_docs).
Such a high-level interface for building queries being provided by the driver is fairly unique among the databases I'm familiar with. And while it seems like a nice affordance for JavaScript users, it feels like an anti-pattern on other platforms.
I don't mean to trash RethinkDB, though. On the other end of the spectrum you have DynamoDB's very cumbersome API. If I had to pick between the two, I'd take RethinkDB's any day. I just think a more formal syntax would make it feel less JavaScript-centric and more at home on other platforms.
> That's not what http://rethinkdb.com/docs/quickstart/ says. It shows a connection that exposes context-bound Builder methods with trigger (insert, changes, run, etc) methods.
I think the confusion stems from the fact that the Quickstart guide assumes that you're running queries in the Data Explorer, a web frontend for prototyping queries. In the Data Explorer, clicking the "Run" button is what triggers the execution of the AST.
If you wrote something like `r.table("tv_shows").insert(...)` in your application code, it wouldn't do anything except for returning an AST object. You can store that object, or call the `run(conn)` method on it to send it over a RethinkDB connection and execute it.
Note that the `r` object in these queries has no state. You can think of it as a namespace that serves as a starting point for building queries.
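A toy model of the design described above, making no claims about the real driver's internals: a stateless `r` whose methods only return new AST nodes, and a separate `run(conn)` that hands the finished tree to a connection. `Term` and `FakeConnection` are made-up names; a real driver would serialize the query and send it to the server instead of recording it.

```javascript
// Toy AST node: an operation, its arguments, and the term it is chained on.
class Term {
  constructor(op, args, parent = null) {
    this.op = op;
    this.args = args;
    this.parent = parent;
  }

  // Chaining builds a new node; the receiver is never modified.
  insert(doc) { return new Term("insert", [doc], this); }
  filter(pred) { return new Term("filter", [pred], this); }

  // Flatten to plain nested data, the kind of thing a driver would serialize.
  toJSON() {
    return { op: this.op, args: this.args, on: this.parent ? this.parent.toJSON() : null };
  }

  // Nothing is executed until the finished tree is handed to a connection.
  run(conn) { return conn.send(this.toJSON()); }
}

// Stateless namespace: only a starting point for building terms.
const r = Object.freeze({ table: (name) => new Term("table", [name]) });

// Fake connection that records queries instead of talking to a server.
class FakeConnection {
  constructor() { this.sent = []; }
  send(query) { this.sent.push(query); return `ran ${query.op}`; }
}

const query = r.table("tv_shows").insert({ name: "Star Trek TNG" }); // just data so far
const conn = new FakeConnection();
console.log(conn.sent.length); // 0
console.log(query.run(conn));  // ran insert
```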
> That's not what http://rethinkdb.com/docs/quickstart/ says. It shows a connection that exposes context-bound Builder methods with trigger (insert, changes, run, etc) methods.
There isn't any sort of builder or any sort of context that anything is bound to. insert and changes aren't anything like run -- they build an AST that describes an insert query, and then you have to run them with run.
I prefer simple row mapping to an ORM making your queries for you any day. On a previous Java project, all the SQL was actually in XML files that were loaded by the repositories that used them. Debugging that was much easier (especially when it came to performance issues) than with an ORM, because there was no ORM to reason about (and possibly be wrong about).
As far as cutting down on repetition with queries, I found that using some higher order functions (essentially templating, at that point) would clear that right up.
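A rough sketch of that higher-order-function approach: a function takes the fixed parts of a query and returns a function that produces SQL text plus bound values. The table and column names are hypothetical, and the `?` placeholder style is an assumption (it varies by driver). Identifiers come only from an allow-list, since they can't be bound as parameters.

```javascript
// Returns a function that builds a paginated "find by column" query.
// Column names are checked against an allow-list; values always go
// through placeholders so they are never spliced into the SQL text.
function makeFinder(table, allowedColumns) {
  return function find(column, value, { limit = 50, offset = 0 } = {}) {
    if (!allowedColumns.includes(column)) {
      throw new Error(`column not allowed: ${column}`);
    }
    return {
      text: `SELECT * FROM ${table} WHERE ${column} = ? ORDER BY id LIMIT ? OFFSET ?`,
      values: [value, limit, offset],
    };
  };
}

// One template, reused for a specific (hypothetical) table.
const findShows = makeFinder("shows", ["genre", "network"]);
console.log(findShows("genre", "drama", { limit: 10 }));
```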
The DSL problem is probably fixed relatively straightforwardly (just write a DSL that translates to builder calls), but I think RQL sprang up from a disdain for DSLs like the ones built by Mongo (and probably, in some part, SQL), which often become overly complicated, hard to use, and sometimes use obtuse/hard-to-remember terminology.
The other missing-add-on points are valid, of course, but they're just about equally valid for any document store you can find. I don't speak for RethinkDB, but as far as I can see they don't claim to offer those things, so I don't know if it should be held against them that they don't. Of course, it makes sense that you're not super excited about it because of those missing features you want.
However, I think you'd be hard-pressed to find another document store that has gotten the rest of the things it does as right as RDB has. RDB has iterated its way to a really high-quality document store, yet (I find) it doesn't get the press to match how well it performs (and how many big pitfalls it has avoided).
That sounds pretty fair. I think the RQL Builder issue is a bit bigger than that since it would make writing your own DSLs a bit more awkward, with more overhead. I think a simple published protocol would be the better alternative. But even so, it's not insurmountable and there are databases with much much more awkward APIs. RethinkDB's isn't all that compelling IMO, but at least it's not actively hostile.
Cloudant gets most of the rest correct. Their (dedicated) clustering has been largely flawless IME. Their availability is probably the best of any system I've used (database or otherwise). Very, very impressive. And their integrated Lucene takes a whole category of issues out of the equation.
They don't have transactions either, though, which is a bigger issue than in RethinkDB since they also don't have joins. If you need to denormalize a navigable tree into your documents, that's easy enough. But what happens when the source of truth for your tree (another document, in my case) changes? You have to reprocess the entire live database. The fact that you can't do that atomically means you either: A) hope for the best, or B) do a lot of extra work implementing processing that will (hopefully) converge on eventual consistency.
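A sketch of option B under simple assumptions: the source-of-truth tree carries a version number, each document records the tree version its embedded copy came from, and a pass rewrites only stale documents. `reprocess` and `reprocessUntilConsistent` are hypothetical helpers over an in-memory `Map`. No Cloudant API is involved, and real concurrent writers would need more care (e.g. conflict handling on each write).

```javascript
// One non-atomic pass: documents are rewritten one at a time, so readers may
// briefly see a mix of old and new trees. `maxUpdates` simulates a pass that
// dies partway through.
function reprocess(docs, tree, { maxUpdates = Infinity } = {}) {
  let updated = 0;
  for (const [id, doc] of docs) {
    if (updated >= maxUpdates) break;
    if (doc.treeVersion === tree.version) continue; // already current: skip
    const treeCopy = JSON.parse(JSON.stringify(tree.nodes));
    docs.set(id, { ...doc, tree: treeCopy, treeVersion: tree.version });
    updated++;
  }
  return updated;
}

// Because each update is idempotent, rerunning passes until none are stale
// converges on a consistent state, even though no single pass is atomic.
function reprocessUntilConsistent(docs, tree, opts) {
  let passes = 0;
  while (reprocess(docs, tree, opts) > 0) passes++;
  return passes;
}

const docs = new Map();
for (let i = 1; i <= 5; i++) docs.set(`doc${i}`, { treeVersion: 1, tree: ["old-root"] });
const tree = { version: 2, nodes: ["root", "child"] };
console.log(reprocessUntilConsistent(docs, tree, { maxUpdates: 2 })); // 3
```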
I feel like RethinkDB is moving up my list during this thread though. So there's that. :)
The lack of integrated full-text-search that provides decent guarantees is a real sticking point, but outside of Cloudant I can't think of another NoSQL solution that attempts it.
I'll say this, though: if RethinkDB resolved the search issue, it would probably be THE DB to use IMO. RQL (despite the space I've given it) is pretty petty in comparison. And the lack of multi-document atomic transactions is much less of an issue with the schema flexibility afforded by JOIN support.
Even without resolving it I think I'm convinced enough to give it a closer look.
> However, I think you'd be hard-pressed to find another document store that [...]
CouchDB. Older by 6 yrs, so it probably can't compete with RethinkDB on the wow-yet-another-new-fresh-baked-tool-with-fancy-webpage-and-admin-ui scale ;)
In everything else, CouchDB has more features than RethinkDB, and it's much more mature, but somehow that word is understood backwards these days. Technology ageism, I suppose ;)
CouchDB doesn't do joins, which can impose serious schema design changes depending on your application. map/reduce is done at index time. You wouldn't ever want to do it on the fly.
The base CouchDB also wasn't clustered last I looked (a year ago maybe?); you had to use BigCouch, which was an OSS contribution by Cloudant that lagged both Cloudant's own offering and CouchDB releases. It was supposed to get merged into head though so there's a good chance that actually happened while I wasn't looking.
If you're just looking at CouchDB though and not Cloudant, it's not all that compelling (IMO). No clustering. No integrated search. No transactions. No joins. Pretty shallow server metrics in the dashboard. CouchDB doesn't even really have queries AFAIK. Just index scans.