I'm asking this in good faith. What are the best use cases nowadays for Mongo? I...

fyrerise · on Sept 22, 2017

This is probably not the answer you're looking for, but MongoDB no longer has much technical justification.

Compared to Mysql and Postgres, it does have an out-of-the-box partitioning scheme, which is something, and a reason to use it (although I'm not sure it outweighs the downsides).

Compared to newer cloud databases like Spanner, Citus, or Aurora which scale well but also provide you with ACID guarantees reminiscent of traditional RDBMSes ... there's really not much there.

meritt · on Sept 22, 2017

> Compared to Mysql and Postgres, it does have an out-of-the-box partitioning scheme,

Both MySQL and PostgreSQL support out-of-the-box partitioning. Are you referring to sharding?

fyrerise · on Sept 22, 2017

> Both MySQL and PostgreSQL support out-of-the-box partitioning. Are you referring to sharding?

Hopefully it's fairly obvious from the context, but yes, I'm referring to horizontal partitioning [1], or sharding.

---

[1] https://en.wikipedia.org/wiki/Partition_(database)

meritt · on Sept 22, 2017

I didn't feel it was obvious.

The top post on HN right now is about PostgreSQL 10 features, of which the very first feature listed is "Native Partitioning" (and it's not about sharding). Relational databases have been using the term "partitioning" for a very long time so if you're going to compare features across comparable systems, you need to use the correct terminology.

https://dev.mysql.com/doc/refman/5.7/en/partitioning.html

https://www.postgresql.org/docs/9.1/static/ddl-partitioning....

Meanwhile the feature you're referring to is sharding and each of these databases refer to it accordingly:

https://docs.mongodb.com/manual/sharding/

https://wiki.postgresql.org/wiki/Built-in_Sharding (via forks)

https://www.mysql.com/products/cluster/scalability.html (via a separate product)

Let's also keep in mind we're discussing MongoDB which has a very rich history of using misleading benchmarks to showcase their product (e.g. benchmarking the number of requests/sec the server can select() versus the fsync() rate of a RDBMS), so I think it's useful to be extremely clear what we're talking about.
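
To make the terminology concrete, here is a toy Python sketch of the distinction as the terms are used in this comment. Everything in it (the `partition_for` and `shard_for` helpers, the table naming, the server names) is hypothetical and only illustrates the idea; it is not how any of these databases implement either feature.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["db-0", "db-1", "db-2"]


def partition_for(year: int) -> str:
    """Partitioning: one logical table split into several physical
    tables on the SAME server, here by a range on the year."""
    return f"orders_{year}"


def shard_for(customer_id: str) -> str:
    """Sharding: rows are spread across DIFFERENT servers, here by
    hashing a shard key and taking it modulo the server count."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]


print(partition_for(2017))       # which local table holds the row
print(shard_for("customer-42"))  # which server holds the row
```

In the first case a single server still holds all the data; in the second, each server holds only a slice of it, which is the property being compared across MongoDB, MySQL and PostgreSQL.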

mrgordon · on Sept 22, 2017

Yeah a colleague referred to MongoDB as the best marketed database and he was 100% correct. They’ll make some misleading benchmarks, throw a few conferences, and voila you have something to replace your ol’ RDBMS

My favorite MongoDB moment was when Stripe wrote MoSQL so they could sync Mongo into PostgreSQL so data scientists can do joins and otherwise interpret the data. Now you have a “SQL replacement” and SQL instead of just using SQL!

I can see it for truly self contained records at massive scale but IMO it falls flat as a RDBMS replacement which is how most people probably use it still.

fyrerise · on Sept 22, 2017

Fair enough. When I first read your original comment, I thought that you were being unnecessarily pedantic.

I imagined that by using "partitioning" most people would probably know what I was talking about, but I've been working with MongoDB for a long time, so I probably have some cognitive bias here.

"Sharding" is the more specific/better term, but I'd still say that although more general and possibly ambiguous, partitioning is still roughly correct.

lackbeard · on Sept 22, 2017

Yes, he means what you are calling sharding.

Back in 2004 or 2005 when I first learned about "sharding" it was introduced to me as partitioning.

vacri · on Sept 22, 2017

> Compared to newer cloud databases like Spanner, Citus, or Aurora

There's a lot more help available online for Mongo than any of those. Of course, *SQL has a lot as well.

bdcravens · on Sept 22, 2017

Citus and Aurora ARE *SQL (Postgresql and MySQL)

jakereps · on Sept 22, 2017

We use it for our database engine, as the nature of our data does not work in a SQL table. I work with DNA sequencing, and we store our data in a sample metadata, feature metadata, and sample-by-feature count table.

So our sample metadata has the generic metadata about each sample (where it's from, when it was collected, etc.). Our feature metadata is the particular feature metadata (DNA amplicon sequence variant, taxonomy, all seven taxa levels pre-split). These would work perfectly fine in a standard SQL database, but the problem is when we get into our sample-by-feature table/collection. Currently, we have over 400k unique features and tens of thousands of samples. If we were to try and map the frequency of each feature in our samples in a standard row/column relational database table, with one column per feature, our table would be unusably large (or would just not work at all, as PostgreSQL and MySQL have a hard column limit). MongoDB's document model avoids the issues this causes.

Each of our documents in the sample by feature table can consist of ONLY the counts of the features that were in the sample, ignoring all the other N-thousand features that may be present in others. Building a pandas DataFrame from a query of this data will just fill the absent features with NaN, and we can just fill those NaN's with zeroes and carry on working with our target subset of the database.
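
As a rough sketch of that workflow, assuming documents shaped like the hypothetical ones below (the `sample_id` and `counts` field names and the feature names are made up for illustration; the real schema isn't shown here):

```python
import pandas as pd

# Hypothetical documents as they might come back from a query: each
# sample stores counts only for the features observed in it.
docs = [
    {"sample_id": "S1", "counts": {"feat_a": 12, "feat_c": 3}},
    {"sample_id": "S2", "counts": {"feat_b": 7}},
    {"sample_id": "S3", "counts": {"feat_a": 1, "feat_b": 4}},
]

# One row per sample, one column per feature seen in any sample.
table = pd.DataFrame(
    [d["counts"] for d in docs],
    index=[d["sample_id"] for d in docs],
)

# Features absent from a sample come back as NaN; treat them as zero.
table = table.fillna(0).astype(int)
print(table)
```

Only the features present in the queried subset become columns, so the frame stays far narrower than the full 400k-feature space.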

smt88 · on Sept 22, 2017

Your entire response seems to be about why you chose document storage instead of relational data. Relational data makes sense most of the time, but not always.

The question, again, is why you'd use MongoDB even for document storage when there are so many alternatives that are safer and better-engineered. Even Postgres has a JSON data type that, by some reports, works better than Mongo and is much safer to use.

Another thread here[1] has gone into some detail about alternatives.

1. https://news.ycombinator.com/item?id=15308214

jakereps · on Sept 22, 2017

If that's the case, then it's probably purely marketing. I had no idea PostgreSQL offered a comparable option.

btown · on Sept 22, 2017

JSON wasn't stable in Postgres until 2012, whereas the "web scale" memes for Mongo started after its initial release in 2009. Mongo indeed had better marketing and a first mover advantage, and it never let go.

cervo · on Sept 22, 2017

Early versions of MongoDB locked the entire database server for every single write. Later versions locked the entire database for every single write. Only since Mongo 3.0 does MMAP lock just a single collection for a write, and only since then has WiredTiger been available to offer you MVCC. Even the MongoDB World materials advertised that you could now use more of your hardware... Granted, you could always shard, though that gets very expensive very quickly.

Meanwhile PostgreSQL had MVCC all along, and its write speed was always faster than Mongo's on a single-server basis. And you could always serialize data that changed into text fields as XML/JSON/CSV/some other format.

Mongo was mostly great marketing and a lousy product. Over time it has gotten much better: Mongo 3.2 is a way, way, way better product than Mongo 1.x. But marketing did capitalize on a lot of hype which made no sense. Developers enjoyed just serializing their objects (the ones who didn't think of doing this into a text column in a relational DB), and the ones who did not need 'web scale' had no idea about the severe concurrency issues introduced by Mongo's writes. And for the rest, there are plenty of posts about hacking around it or switching from Mongo to something else.

_ugfj · on Sept 22, 2017

And it wasn't usable until practically 2015 (2014-12-18 if you so want).

smt88 · on Sept 22, 2017

Something I don't think is mentioned elsewhere: other databases have shipped with Mongo compatibility. This means you can migrate from Mongo to something better without changing your code.

The most mainstream and interesting of these is Cosmos:

https://docs.microsoft.com/en-us/azure/cosmos-db/connect-mon...

(Cosmos is actually one of the most interesting DB products available for production right now, in my opinion.)

cervo · on Sept 22, 2017

Even prior to the JSON type you could always have serialized the data for the sample to some kind of string (JSON/XML/CSV) and put it in a column. But I think the one thing NoSQL data stores, e.g. HBase/Cassandra and to a lesser degree MongoDB, have taught me is to pay attention to the tradeoffs: number of rows, size of index, etc.

deepsun · on Sept 22, 2017

Yep, I benchmarked it myself, JSONB column type in PostgreSQL worked faster for me than same data in MongoDB (both reads and writes).

tzmudzin · on Sept 22, 2017

Maybe I misunderstood your problem, but how about a relational DB model with:

- FEATURES table (~400k rows), primary key: Feature_ID

- SAMPLES table (tens of thousands), primary key: Sample_ID

- OCCURRENCE table (what you observed), with the fields Feature_ID, Sample_ID, Occurrence_Count.

This is pretty much a standard solution well proven in data warehousing...
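
A minimal sketch of that three-table layout, using Python's built-in `sqlite3` purely as a stand-in for a real RDBMS. The extra columns (`sequence`, `origin`) and the sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (feature_id INTEGER PRIMARY KEY, sequence TEXT);
CREATE TABLE samples  (sample_id  INTEGER PRIMARY KEY, origin   TEXT);
CREATE TABLE occurrence (
    feature_id       INTEGER NOT NULL REFERENCES features(feature_id),
    sample_id        INTEGER NOT NULL REFERENCES samples(sample_id),
    occurrence_count INTEGER NOT NULL,
    PRIMARY KEY (sample_id, feature_id)
);
""")
conn.executemany("INSERT INTO features VALUES (?, ?)",
                 [(1, "ACGT"), (2, "TTGA"), (3, "GGCC")])
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(10, "soil"), (11, "gut")])
# Only observed (feature, sample) pairs get a row; absent ones cost nothing.
conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)",
                 [(1, 10, 12), (3, 10, 3), (2, 11, 7)])

# Rebuild the dense sample-by-feature view on demand, zero-filling gaps.
rows = conn.execute("""
    SELECT s.sample_id, f.feature_id, COALESCE(o.occurrence_count, 0)
    FROM samples s
    CROSS JOIN features f
    LEFT JOIN occurrence o
           ON o.sample_id = s.sample_id AND o.feature_id = f.feature_id
    ORDER BY s.sample_id, f.feature_id
""").fetchall()
print(rows)
```

The number of columns stays fixed at three no matter how many features exist, so the column limit never comes into play; the table grows in rows instead.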

gaius · on Sept 22, 2017

If you process this kind of high-throughput data in R then yep, you get it as 3 data.frames as you describe, it's nicely tabular and easy to "join" by just matching index numbers.

jacquesc · on Sept 22, 2017

I've been using and relying on MongoDB for years (since 2010). It's pleasant to develop for, the query interface is awesome, and the library / ORM support is top notch (Mongoid on ruby, and Mongoose on node).

I also use Postgres regularly so I can compare the 2. And in my experience, I always enjoy working with Mongo more.

Many people don't care about the developer experience and instead focus on Mongo's lack of features like joins and transactions. There are definitely tradeoffs to choosing Mongo and I wouldn't criticize anyone for picking Postgres over Mongo; however, the amount of belly-aching about how Mongo is "the worst" was always pretty ridiculous.

mrgordon · on Sept 22, 2017

If you’ve been using it since 2010, then your usage predates the aggregation framework. I’ll just state for the record that writing a JavaScript mapreduce (or telling a data scientist to write one) to count records is not a good experience.

The other issue that got me was that I was running queries that should have used an index but indexOnly kept coming back false. I dug into the Mongo JIRA and sure enough it was a bug: queries were returning incorrect values for whether they were using an index.

The reason most people got mad at Mongo was probably their benchmark gaming though. They made unsafe writes the default for a long time so their default configuration benchmark would blow away SQL databases which fsync to disk by default (very reasonably!)
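
For anyone unfamiliar with why that matters, here is a generic Python sketch of the difference between acknowledging a write before and after it is forced to disk. This is not MongoDB's or any database's actual code; the `write_record` helper is hypothetical.

```python
import os
import tempfile


def write_record(path: str, data: bytes, durable: bool) -> None:
    """Append one record; when durable is True, do not return until
    the OS has been asked to flush the data to the storage device."""
    with open(path, "ab") as f:
        f.write(data)
        if durable:
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push its cache to disk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.bin")
    write_record(path, b"fast\n", durable=False)  # returns without fsync
    write_record(path, b"safe\n", durable=True)   # returns after fsync
    with open(path, "rb") as f:
        contents = f.read()

print(contents)
```

Both calls leave the same bytes in the file under normal operation; the difference only shows up on a crash or power loss, and in how long each call takes. Timing the first style against the second compares two different guarantees.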

franciscop · on Sept 22, 2017

I have been using Mongoose for long enough (since 2012?) to have felt its pain points. The documentation is lacking (mainly in examples) and contradictory, and there are way too many breaking changes. This makes searching quite ineffective, since you have to check every single piece of code you find in depth to make sure it still works.

_ugfj · on Sept 22, 2017

> the query interface is awesome

Huh! We started with MongoDB in 2009 and went live with it in 2010 (though I left that site in 2010) and as far as I am aware we were (one of?) the first paying customers of 10gen. I remember my boss joking about how he needed to press 10gen to actually accept money for support.

I even contracted with MongoDB Inc in 2012-2015 but gosh, the query interface being awesome is not my experience for complex queries. Multiple $elemMatch makes for nightmare queries. Note: $elemMatch got added because we asked for the functionality but the syntax is not our making, that was Eliot.

In fact, out of all MongoDB criticism https://www.linkedin.com/pulse/mongodb-frankenstein-monster-... this one is the one I sorta agree with (even if the author is reprehensible but let's try not get caught up on that).

cervo · on Sept 22, 2017

Agreed, and the other annoying thing is that the queries and most of the APIs are not the same, so you work out and build the query you want in JSON and then you need to translate that to program code :( And vice versa: if something is wrong with the query and you want to figure it out without being in a debugger, you need to translate the code back to a query. There are libraries that accept the JSON, but the standard one has an API that does not use the JSON queries directly.

_ugfj · on Sept 22, 2017

typo: though I left that site in 2010 -- I meant 2011.

tomphoolery · on Sept 22, 2017

We use MongoDB to build eCommerce apps at my company, using a Ruby-based platform that can be easily extended. Here's a couple of the reasons why we use it:

1. It's easier to extend the pre-defined model classes that our platform provides with new fields, without having to manage migrations. Mongoid manages the schema for you.

2. Because our apps represent an eCommerce "storefront", they are _not_ the system of record for our clients. Each client has their own way of storing this data, and fulfills orders through an order management system. They control all of that, obviously, because it contains important financial data that they need to retain.

3. MongoDB is a well-supported database, with a large ecosystem and lots of users. It's a familiar DB for new developers that we hire, as well as for systems integrators who are also building apps on our platform. Using MongoCloud and official support from MongoDB, Inc., hard questions about our database technology are just a phone call away. Postgres is amazing, but it isn't supported by a company you can turn to when shit really goes south, so it's a scary choice for enterprise service providers.

4. It doesn't have terrible documentation. Pretty much everything is there, it's just hard to find. I really wish they would use something other than Confluence because the search on there is awful.

All this aside, I still use Postgres on my own Rails apps because I don't need the complex features of MongoDB. But after a few years at my current job, I can definitely see why Mongo was chosen.

nrser · on Sept 22, 2017

NoSQL has real use cases. The model is much closer to how you usually are structuring and handling data in your application. If your application is handling hierarchically structured, variously shaped, low-relational data - which many are because that's pretty much the default for how programming languages work - and does its data processing and validation at the application level, then it can be nice to have a storage system that matches that closely.

It's that situation where a single structured User object would "foil-out" to like 10 de-normalized tables. It feels ridiculous. It often is a bit ridiculous. Also, I'm not sure how much Open Source RDBMS have improved in the last 8 years, but I know for a fact that MySQL used to just melt under joins like that with any serious load.

As far as the recent OS RDBMS JSON additions, they're just that - recent additions. They feel tacked on. They're treated differently than first-level values. Support in ORMs is lacking. It's hard to find examples online. They're not nearly as easy to use and well supported as hierarchical data is in systems built around it from the ground up (of course).

As far as why MongoDB... because it's the most popular solution. If you hit a problem, someone had probably hit it before, and you can probably find some material online. The leading drivers and libraries are relatively mature and widely used. It makes it a much safer bet as far as tractability of issues you will hit.

I don't really like MongoDB, but the case for using it seems pretty straightforward to me.

curiousDog · on Sept 22, 2017

Sure, but why wouldn't you use one of the myriad options offered by Google/Microsoft/Amazon? DynamoDB, Bigtable, DocumentDB would be better options than betting on a startup, no? Their NoSQL tech was novel several years ago, and now the cloud providers are far ahead and there's no way MongoDB will be able to catch up in terms of features, scale, reliability and availability.

nrser · on Sept 22, 2017

I think all of those services are cloud-based, yeah?

Cloud services are a great option to have available, but there are many reasons organizations choose to manage their own instances... pre-existing infrastructure investment, flexibility, data location and control, cost structure, regulatory requirements, etc.

I read here that Mongo has started a cloud-based offering, but the core of their business competes against other self-managed storage software like MySQL, Postgres, Hadoop, etc.

flavio81 · on Sept 22, 2017

>What are the best use cases nowadays for Mongo?

Prototyping apps with little relational data.

Logging.

toomuchtodo · on Sept 22, 2017

> Logging.

Would ElasticSearch not be superior for logging? It's (mostly) solid as a Graylog backend storing 10s of TBs of logs (in my experience).

KallDrexx · on Sept 22, 2017

I don't use ElasticSearch for persistent storage of logs, only for the recent logs that need to be searchable (past month or so). I've found enough times I've had to wipe out ES indexes to keep things humming along well.

laichzeit0 · on Sept 22, 2017

I've been doing a stack that uses Mongo, Sails and Vue for about a year now. What I like is that it feels like I'm just writing Javascript all the way from front end to query. Mongo queries are just JSON documents. I'm not sure how nicely Postgres queries would work in Nodejs? Can I just feed it a JSON object that represents a query and it will "work"?

The aggregation pipeline is also quite nice. I actually prefer it to Elasticsearch aggregations now. I've done some reasonably complex queries involving grouping deeply nested arrays of embedded documents, etc. using the aggregation pipeline.

Put it this way, with Elasticsearch when you have nested documents and the schema is variable and you want to do aggregation you end up in mapping hell. There is none of that crap in Mongo.

harunurhan · on Sept 22, 2017

> Can I just feed it a JSON object that represents a query and it will "work"?

You can use a query builder [1][2]. It's not exactly creating a JSON object for a query, but it's still better than writing SQL directly, and IMHO it's cleaner than JSON object queries.

[1] - https://github.com/tgriesser/knex

[2] - https://github.com/hiddentao/squel
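
To illustrate the general idea of turning a query object into SQL, here is a toy Python sketch. It is not the API of knex or squel; the `to_where` helper, the operator table and the `orders` schema are all hypothetical, it supports only a handful of comparison operators, and `sqlite3` stands in for Postgres.

```python
import sqlite3

# Hypothetical mapping from a few Mongo-style operators to SQL operators.
OPERATORS = {"$eq": "=", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


def to_where(filter_doc: dict):
    """Translate a flat Mongo-style filter into a parameterized WHERE
    clause. Values always go through placeholders, never into the SQL."""
    clauses, params = [], []
    for field, cond in filter_doc.items():
        if not field.isidentifier():
            raise ValueError(f"bad field name: {field!r}")
        if not isinstance(cond, dict):
            cond = {"$eq": cond}  # {"a": 1} is shorthand for {"a": {"$eq": 1}}
        for op, value in cond.items():
            clauses.append(f"{field} {OPERATORS[op]} ?")
            params.append(value)
    return " AND ".join(clauses), params


where, params = to_where({"status": "paid", "total": {"$gte": 100}})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "paid", 150), (2, "paid", 50), (3, "open", 500)])
rows = conn.execute(f"SELECT id FROM orders WHERE {where}", params).fetchall()
print(where, params, rows)
```

Real query builders cover far more (joins, nesting, dialect differences), but the shape is the same: the application builds a data structure and the library emits parameterized SQL.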

dustingetz · on Sept 22, 2017

Ease of use - you can get a lot done with only one developer

Mongo doesn't have JOIN pain, sparse tables, many to many join tables, etc - all stuff you need programmers for on sql

sgt · on Sept 22, 2017

I think you need to work on some real systems using MongoDB. Your statement is incredibly naïve, implying that there's little complexity in using MongoDB.