2021 in Database Startups: Gold Rush

JCM9 · on Dec 31, 2021

The VC market is super frothy at the moment.

That drives up valuations in the short term and results in a very easy time to raise capital (aka Gold Rush). However, what follows is a period where the market will turn the screws really hard on companies to show they have a viable and economically justifiable business or they’ll get washed out real quick.

We’re not in a bubble like dot com but it will get very uncomfortable for companies that don’t have a clear path to self-sustaining economics.

ashvardanian · on Dec 31, 2021

Maybe not exactly the same bubble, but it's hard not to draw parallels. Pets.com, one of the most comical cases of the 2000s, was worth $400M at peak. When Chewy.com went public, they dwarfed that number with a new record of $13B. The analogy was so obvious that CNBC wrote an article on that [1].

After that $CHWY managed to go as high as $118/share, translating into a market cap of roughly $50 Billion in December 2021. Now they are trading at about half that price.

[1]: https://www.cnbc.com/2019/07/26/opinion-chewy-is-no-petscom....

joshgel · on Dec 31, 2021

Isn’t the fact that they got cut in half (and other examples, Peloton for example) evidence that this isn’t a bubble? There are companies that are exploding in value, yes, but also companies just getting crushed left and right. If you’re growing, your value is very high, but miss your targets and boom, bye bye froth.

Traster · on Dec 31, 2021

Isn't what you're describing stage 4 of a bubble? https://www.investopedia.com/articles/stocks/10/5-steps-of-a...

geophile · on Dec 31, 2021

Having been in or associated with many database startups, I would not invest in one. You need to evangelize a product on which the life of someone’s project or company depends. In such a situation, you want boring technology that just works. Hard to imagine a new feature, or performance win, that justifies the vastly increased risk of obtaining it.

mnahkies · on Dec 31, 2021

I'm not sure I agree with this - there have been many postgres features over the past ~10 years that got me excited, some of the highlights that immediately spring to mind:

- JSON / jsonb

- improved partitioning support

- identity columns

I agree that I wouldn't want to trust a new entrant to the space unless it was subject to rigorous testing, eg https://jepsen.io/

Nextgrid · on Dec 31, 2021

> postgres

Postgres has the advantage that it's still open-source, can be run locally and self-hosted (and has a healthy ecosystem of managed providers) and won't go away anytime soon. None of these are true for all these new DB startups.

geophile · on Dec 31, 2021

Doing well on the Jepsen tests is a very good sign. But it isn't anywhere near enough. Other things to consider:

- The product has bugs not addressed by Jepsen.

- Performance is inadequate.

- The optimizer is immature, and you find yourself struggling to understand and work around its limitations.

- Replication is missing or doesn't work well enough.

- There is absolutely no substitute for N years of real-world experience, no matter how rigorous the company's testing.

- Impossibility of finding people already familiar with the product, because they are all employed by the vendor.

- Even aside from product issues, is the company profitable? If not, how long is it going to be in existence?

KronisLV · on Dec 31, 2021

> You need to evangelize a product on which the life of someone’s project or company depends.

I've actually seen what happens when a database product gets picked and the company behind it then ceases to exist. It wasn't pretty: at one point the product stopped being supported, so no updates or fixes were made, it wasn't available in the repositories for new OS distros, and eventually even the documentation for it went offline. Having to support a system that integrated with it was an unpleasant experience, right up until it was eventually replaced with something else.

Therefore, it probably makes a lot of sense to base something as critical as your data storage layer on proven technologies that have demonstrated that they'll probably be supported in one form or another for the following years or even decades, unless you have a good reason for choosing something else.

Those reasons might deal with particular workloads or requirements, e.g. clustering solutions for PostgreSQL/MySQL/MariaDB/..., geospatial extensions, solutions to integrate with it through REST interfaces or even GraphQL or something like that, with a stable and proven piece of software still at the core of it all.

In case anyone is wondering, the product in question was Clusterpoint, about which you can read a bit more here: https://en.wikipedia.org/wiki/Clusterpoint It's a NoSQL database that actually predates MongoDB by a few years, as far as I know. Of course, now it seems like even their homepage is offline.

jd_mongodb · on Jan 2, 2022

MongoDB 1.0 was released in 2009. ClusterPoint seems to have launched several years later. https://www.mongodb.com/blog/post/10-ga-released. MongoDB the company was founded in 2007.

KronisLV · on Jan 2, 2022

Clusterpoint: https://en.wikipedia.org/wiki/Clusterpoint

> Clusterpoint Ltd.

> Founded 2006

> Clusterpoint Database

> Initial release 2006

> Stable release 2015

MongoDB: https://en.wikipedia.org/wiki/MongoDB

> MongoDB, Inc. (formerly 10gen, Inc.)

> Founded 2007

> MongoDB

> Initial release 2009

> Stable release 2021

Then again, maybe Clusterpoint's Wikipedia page is lying, I don't really care much about it anymore.

zitterbewegung · on Dec 31, 2021

I think that open source database startups (or even ones that do data management, like Elasticsearch) have a couple of big problems (other than finding traction, obviously). One is that if you release your source code, it’s very easy for someone else to just copy the straightforward business model of selling it as SaaS.

The second one would be keeping your promises or even competing with entrenched databases.

Even when you have gotten this far, one way you can exit is to get an enterprise customer so convinced they need it that they must acquire you.

The biggest lie I have seen across databases is actual time travel for your data. Or in other words given a date and time in the past it gives you what the data was then.

Traster · on Dec 31, 2021

On the business model side of things, I think the most important thing to think about is the threat of the cloud providers. In the same way that MSFT was able to take most of the profits from the 90s/00s PC business, Apple was able to take most of the profits from the App Store, and Amazon takes most of the profits from the Amazon marketplace, the cloud providers are realistically going to take the vast majority of the profits from running your DB in the cloud. Elasticsearch has exactly this problem. If they're lucky, Amazon will buy them. If they're unlucky, Amazon will just steal their tech and take the profits (some would argue this already happened). They're by and large living in someone else's ecosystem.

lumost · on Dec 31, 2021

> The biggest lie I have seen across databases is actual time travel for your data. Or in other words given a date and time in the past it gives you what the data was then.

I worked at a company where we built this for our internal HBase clusters. Given any timestamp (or even better a commit ID) we could restore the cluster to that point in time. This was done through a combination of backing up the store files + write ahead logs. As I recall, we had a retention limit on how far back we could do this, but it was high enough for all of the "oh crap" moments.
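For readers who want to see the shape of the idea, here is a minimal sketch of point-in-time restore from "store file backups + write ahead logs". It is not HBase's API and not a description of the system above: `Snapshot`, `WalEntry`, and `restore_as_of` are hypothetical names, the "database" is a plain in-memory dict, and timestamps are bare integers. The approach is to pick the newest snapshot at or before the target time, then replay only the log entries that fall between that snapshot and the target.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WalEntry:
    ts: int                # commit timestamp (any monotonically increasing unit)
    key: str
    value: Optional[str]   # None marks a delete


@dataclass(frozen=True)
class Snapshot:
    ts: int                # time at which the full copy was taken
    data: dict             # full key-value state at that time


def restore_as_of(snapshots, wal, target_ts):
    """Rebuild the key-value state as it was at target_ts.

    Assumes both `snapshots` and `wal` are sorted by timestamp.
    """
    # Newest snapshot taken at or before the target time.
    idx = bisect_right([s.ts for s in snapshots], target_ts) - 1
    if idx < 0:
        # Nothing retained that far back: this is the retention limit.
        raise ValueError("target_ts is older than the oldest retained snapshot")

    base = snapshots[idx]
    state = dict(base.data)  # copy, so the stored snapshot is never mutated

    # Replay log entries in the window (base.ts, target_ts].
    for entry in wal:
        if entry.ts <= base.ts:
            continue  # already reflected in the snapshot
        if entry.ts > target_ts:
            break     # everything after this is in the "future"
        if entry.value is None:
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.value
    return state


snapshots = [Snapshot(ts=100, data={"a": "1"})]
wal = [
    WalEntry(ts=110, key="b", value="2"),
    WalEntry(ts=120, key="a", value=None),
    WalEntry(ts=130, key="a", value="3"),
]
print(restore_as_of(snapshots, wal, 115))
```

In this toy version the cost trade-off is visible: more frequent snapshots mean less log to replay but more storage, and the oldest retained snapshot sets how far back you can go.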

pengaru · on Dec 31, 2021

That seems like it would burn a lot of space on store file snapshots to replay the relevant WAL atop.

Did they integrate with an underlying filesystem's snapshotting feature to make the store file snapshots differentials, or was that also implemented in the database? Or did they just throw space at the problem?

bushbaba · on Dec 31, 2021

While not a database, Apache Iceberg has time travel functionality for read-heavy big data analytics, among other optimizations.

christkv · on Dec 31, 2021

That's the reason they have created new licenses that poison the well for companies trying to offer SaaS services on top of them, Mongo and Elastic being two examples, or just gone with GPL in the case of Neo4j.

zitterbewegung · on Jan 1, 2022

I don’t know if they understood the implications of the Apache Software License when they should have used a closed source and/or AGPL license.

mamcx · on Dec 31, 2021

My dream is to enter this space with a version of Access+Excel in the spirit of the old FoxPro/dBase tools.

I have a sketch of the lang side here:

https://tablam.org

A big problem is that getting funding outside of the USA is not that easy, and here in Colombia this kind of project is not considered "profitable" by investors.

But I'm still convinced a lot of untapped potential exists in this area!

dang · on Dec 31, 2021

Recent and related:

Databases in 2021: A Year in Review - https://news.ycombinator.com/item?id=29731885 - Dec 2021 (126 comments)

ashvardanian · on Dec 31, 2021

Yes it is! It’s the first link in this article as well)

cletus · on Dec 31, 2021

Two thoughts spring to mind.

The first is RethinkDB [1], which was a darling for a while but couldn't seem to find a business model. I really wonder how much money it could've raised if it had come later. There were claims made that the cloud killed RethinkDB. This past year seems to fly in the face of that theory.

The second is: is this just database "success" (as measured by funding rounds and valuations) or is this just the general case with all startups? I honestly don't know.

Still, I wasn't aware of all these players and it was a good post so thanks for that.

[1]: https://rethinkdb.com/blog/rethinkdb-shutdown/

ashvardanian · on Dec 31, 2021

Thanks a lot! I have never worked with RethinkDB specifically, but I can assure you - it's only one of many names in the DBMS graveyard. Most have probably died because of inability to monetize open-source software.

For DBMS startups that don't offer an order-of-magnitude improvement, the problem is convincing anyone to replace a fundamental piece of infrastructure. For disruptive startups the problem is even bigger. As soon as you go open-source, all of your competitive technological advantage is gone.

Andys · on Jan 1, 2022

I'm happy with CockroachDB lately. It is a drop-in replacement for Postgres for most apps. It is quite a joy to use now.

They have a new PAYG serverless offering that starts at $0 and scales up.

m_ke · on Dec 31, 2021

It has been a gold rush for all startups, especially late stage. (see: https://www.economist.com/sites/default/files/images/2021/11...) I wouldn't use valuations as a signal of traction.

titanomachy · on Dec 31, 2021

This is the first I’m hearing of Unum. Their benchmarks are impressive. I’ve often wondered how much we are held back in performance by all the layers of abstraction on data storage (db, os, filesystem, disk layout) and it seems they’re able to significantly beat the incumbents, partly by collapsing down the layers.

I’m not sure what BLAS has to do with a key-value store.

ashvardanian · on Dec 31, 2021

> it seems they’re able to significantly beat the incumbents, partly by collapsing down the layers.

Hahaha, thanks for the kind words! Not sure if I could describe our methodology better)) We originally come from and hope to continue working on AGI technologies (Artificial General Intelligence).

Writing a key-value store and, subsequently, a DBMS came out of necessity, as other solutions seemed too bulky, slow, outdated and expensive. So we invested time into building UnumDB for internal use, but after the first benchmarks it was clear that this must be available to a broader audience. We will polish it in the next couple of months and make it deployable in public clouds, just like other DBMS brands.

Now DBMS is our central focus, but with essential R&D out of the way, we will rebalance our priorities and continue working on BLAS and sparse neural nets.

sona_zevit · on Dec 31, 2021

It has really been an amazing year for startup DB providers in terms of the raised capital but we are yet to see the yield as users…

therealdrag0 · on Dec 31, 2021

Anyone know about RavenDB? Came across them years ago. Seems neat, but never heard of anyone using them. They've also been a for-profit company for a while, but I'm not sure if they're venture-funded or self-sustaining.

redwood · on Dec 31, 2021

Seemed like they had some traction in MSFT shops 5-6 years ago, guessing they hit some synergies in the .NET ecosystem?

jeremycarter · on Dec 31, 2021

Yeah, a lot of .NET shops backed Raven due to its integration with LINQ at the time. It was actually very powerful and feature rich. Haven't touched Raven in 8 years, although I have checked out their time series functionality.

redwood · on Dec 31, 2021

Great list. One omission I'm certainly tracking is Fauna

ashvardanian · on Dec 31, 2021

Thanks!

It was on the longer list of smaller DBMS brands, but I was afraid to make the volume incomprehensible. Maybe we will write a more focused version, covering just smaller startups in 2022.

redwood · on Dec 31, 2021

Sure, Fauna are interesting with their serverless traction.

Arango is another one, seems to target Germany maybe kind of like SUSE

ashvardanian · on Dec 31, 2021

There is also ScyllaDB, Rockset, Hazelcast, Aerospike, VoltDB, GridGain… too many to count and all valued in hundreds of millions of $

redwood · on Jan 3, 2022

Oh right Scylla! I have to say, it's remarkable that there are two independent companies dividing the already small C* space (they and Datastax)