Echoing the sentiment expressed by others here, for a scalable time-series database that continues to invest in its community and plays well with others, please check out TimescaleDB.
Just wanted to say I am super impressed with the work TimescaleDB has been doing.
Previously at NGINX I was part of a team that built out a sharded timeseries database using Postgres 9.4. When I left it was ingesting ~2 TB worth of monitoring data a day (so not super large, but not trivial either).
Currently I have built out a data warehouse using Postgres 11 and Citus. The only reason I didn't use TimescaleDB was its lack of multi-node support as of October last year.
I sort of view TimescaleDB as the next evolution of this style of Postgres scaling. I think in a year or so I will be very seriously looking at migrating to TimescaleDB, but for now Citus is adequate (with some rough edges) for our needs.
If you're looking at scaling monitoring time-series data and want a more availability-leaning architecture (in the CAP-theorem sense) with respect to replication (i.e. quorum write/read replication rather than a strict leader/follower, active/passive architecture), you might also want to check out the Apache 2-licensed M3 and M3DB projects at m3db.io.
I am obviously biased as a contributor. Having said that, I think it's always worth understanding active/passive replication and its implications, and seeing how other solutions handle this scaling and reliability problem, to better understand the underlying challenges you'll face with instance upgrades, failover, and failures in a cluster.
Neat! Hadn't heard of M3DB before, but a cursory poke around the docs suggests it's a pretty solid solution/approach.
My current use case isn't monitoring, or even time series anymore, but will keep M3DB in mind next time I have to seriously push a time series/monitoring solution.
We were successfully ingesting hundreds of billions of ad-serving events per day into ClickHouse. It is much faster at queries than any Postgres-based database (for instance, it can scan tens of billions of rows per second on a single node), and it scales to many nodes.
While it is possible to store monitoring data in ClickHouse, it can be non-trivial to set up. So we decided to create VictoriaMetrics [2]. It is built on design ideas from ClickHouse, so it offers high performance in addition to ease of setup and operation, as publicly available case studies show [3].
ClickHouse's initial release was circa 2016 IIRC. The work I was doing at NGINX predates ClickHouse's initial release by 1-2 years.
ClickHouse was certainly something we evaluated later on when we were looking at moving to a true columnar storage approach, but like most columnar systems there are trade-offs.
* Partial SQL support.
* No transactions (not ACID).
* Certain workloads, like updates and deletes or single-key lookups, are less efficient.
None of these are unique to ClickHouse; they are fairly well-known trade-offs that most columnar stores make to improve write throughput and prioritize highly scalable sequential-read performance. As I mentioned before, the amount of data we were ingesting never really reached the limits of even Postgres 9.4, so we didn't feel like we had to make those trade-offs...yet.
I would imagine that servicing ad events is at a scale several times larger than what we were dealing with.
I've been using InfluxDB, but I'm not satisfied with the limited InfluxQL or the over-complicated Flux query languages. I love Postgres, so TimescaleDB looks awesome.
The main issue I've got is how to actually get data into TimescaleDB. We use telegraf right now, but the telegraf Postgres output pull request still hasn't been merged:
https://github.com/influxdata/telegraf/pull/3428
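For now the fallback is just writing rows into a hypertable directly. A minimal sketch of that (the connection string, table, and column names are made up for illustration):

    import psycopg2

    # Hypothetical DSN and schema -- adjust to your own setup.
    conn = psycopg2.connect("postgresql://metrics:secret@localhost:5432/metrics")
    cur = conn.cursor()

    # One-time setup: a plain table turned into a TimescaleDB hypertable.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS host_metrics (
            time  TIMESTAMPTZ NOT NULL,
            host  TEXT        NOT NULL,
            name  TEXT        NOT NULL,
            value DOUBLE PRECISION
        );
    """)
    cur.execute("SELECT create_hypertable('host_metrics', 'time', if_not_exists => TRUE);")

    # Ingest: batched inserts of (time, host, metric name, value) rows.
    rows = [("2020-05-01T12:00:00Z", "web-1", "cpu_idle", 93.5)]
    cur.executemany(
        "INSERT INTO host_metrics (time, host, name, value) VALUES (%s, %s, %s, %s)",
        rows,
    )
    conn.commit()

Not pretty compared to a proper telegraf output plugin, but it works.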
I was testing out both TimescaleDB and InfluxDB recently, including possibly using them with Prometheus/Grafana. I was leaning towards Timescale, but with InfluxDB it was indeed a lot easier to quickly boot a "batteries included" setup and start working with live data.
I eventually spent a while reading about Timescale 1 vs. 2 and testing the pg_prometheus[1] adapter; I started thinking through integrating its schema with our other needs, then realized it's been "sunsetted", then read about the new timescale-prometheus[2] adapter and its ongoing design doc[3], whose updated schema I'm less a fan of.
I finally wound up mostly settling on Timescale, although I've put the Prometheus-extension question on hold; just pulling in metrics data and outputting it with ChartJS and some basic queries got me a lot closer to done for now. Our use case may be a little odd regardless, but I think a timescale-prometheus extension with some ability to customize how the data is persisted would be quite useful.
Thanks. As I said, our use case may be too odd to be worth supporting, but effectively we're trying to add a basic metrics (prom/ad-hoc) feature to an existing product (using Postgres) with an existing, sort of opinionated "ORM"/toolkit for inserting/querying data.
Because of that and the small scale required, the choice of table-per-metric would be a tough fit, and I think a single table with JSONB and maybe some partial indexes is going to work a lot better for us. It would just be nice if we could somehow code in our schema mapping and use the supported extension, but I get that it may be too baked into the implementation.
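Roughly the shape I have in mind, as a sketch only (table, column, and label names are placeholders):

    import psycopg2

    conn = psycopg2.connect("postgresql://app:secret@localhost:5432/app")
    cur = conn.cursor()

    # One hypertable for all metrics, with free-form labels in a JSONB column,
    # so adding a new dimension never needs a table migration.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS app_metrics (
            time   TIMESTAMPTZ NOT NULL,
            name   TEXT        NOT NULL,
            labels JSONB       NOT NULL DEFAULT '{}',
            value  DOUBLE PRECISION
        );
    """)
    cur.execute("SELECT create_hypertable('app_metrics', 'time', if_not_exists => TRUE);")

    # Partial index for the handful of label filters we actually query on.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS app_metrics_checkout_idx
        ON app_metrics (name, time DESC)
        WHERE labels @> '{"service": "checkout"}';
    """)
    conn.commit()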
Anyway, overall we're quite happy with TimescaleDB!
I see, that makes a whole lot of sense. This is an interesting use-case. It actually may be possible to use some of the aggregates provided by the extension even with a different schema. If you are interested in exploring further, my username on slack is `mat` and my email is the same @timescale.com
If you want the best of both worlds, then try VictoriaMetrics. It is simple to deploy and operate, and it is more resource-efficient compared to InfluxDB. On top of that, it supports data ingestion over the Influx line protocol [1].
For small and medium needs, InfluxDB is a delight to use. On top of that, the data exploration tool within Chronograf is one-of-a-kind when it comes to exploring your metrics.
If you are looking for a vendor to host, manage, and provide attentive engineering support, check out my company: https://HostedMetrics.com
One of the biggest quirks that I had bumped up against with TimescaleDB is that it's backed by a relational database.
We are a company that ingests around 90M datapoints per minute across an engineering org of around 4,000 developers. How do we scale a timeseries solution that requires an upfront schema to be defined? What if a developer wants to add a new dimension to their metrics, would that require us to perform an online table migration? Does using JSONB as a field type allow for all the desirable properties that a first-class column would?
90M datapoints per minute means 90M/60=1.5M datapoints per second. Such amounts of data may be easily handled by specialized time series databases even in a single-node setup [1].
> What if a developer wants to add a new dimension to their metrics, would that require us to perform an online table migration?
Specialized time series databases usually don't need defining any schema upfront - just ingest metrics with new dimensions (labels) whenever you wish. I'm unsure whether this works with TimescaleDB.
Completely anecdotal, but I went from ~50 bytes to ~2 bytes per data point using the compression feature. Mind you, this is for slow-moving sensor data collected at regular 1-second intervals.
Prior to the compression feature, I had the same complaint. Timescale strongly advocated using ZFS disk compression if compression was really required, which wasn't feasible for me.
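For reference, turning native compression on looked roughly like this for me; a sketch only, since the segmentby column depends on your schema, and older TimescaleDB releases used add_compress_chunks_policy instead of add_compression_policy:

    import psycopg2

    conn = psycopg2.connect("postgresql://sensors:secret@localhost:5432/sensors")
    cur = conn.cursor()

    # Enable native compression on the hypertable, segmented by sensor id so
    # each sensor's values are stored (and compressed) together.
    cur.execute("""
        ALTER TABLE sensor_data SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'sensor_id'
        );
    """)

    # Compress chunks once they are older than a week.
    cur.execute("SELECT add_compression_policy('sensor_data', INTERVAL '7 days');")
    conn.commit()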
I don't consider TimescaleDB to be a serious contender as long as I need a 2000-line script to install functions and views to get something as essential for time-series data as dimensions:
Because TimescaleDB is a relational database for general-purpose time-series data, it can be used in a wide variety of settings.
The configuration you cite isn't for the core TimescaleDB time-series database or its internals; it adds a specific configuration and setup optimized for Prometheus data, so that users don't need to think about it and TimescaleDB works with the Prometheus data model right out of the box.
Those “scripts” automatically set up a deployment that is optimized specifically for TimescaleDB to serve as a remote backend for Prometheus, without making users think about anything. Some databases would need a lot of specific internal code for this; TimescaleDB can enable this flexibility by configuring proper schemas, indexes, and views instead (in addition to some specific in-database optimizations, written in Rust, to support PromQL pushdown).
This setup also isn't something users have to think about; they install our Prometheus stack with a single helm command.
$ helm install timescale/timescale-observability
The Timescale-Prometheus connector (written in Go) sets this all up automatically and transparently to users. All you are seeing is the result of the system being open-source rather than hiding these configuration details behind closed source :)
(TimescaleDB engineer here). This is a really unfair critique.
The project you cite is super-optimized for the Prometheus use case and data model. TimescaleDB beats InfluxDB on performance even without these optimizations. It's also not possible to optimize in this way in most other time-series databases.
These scripts also work hard to give users a UI/UX that mimics PromQL in a lot of ways. This isn't necessary for most projects and is a very specialized use case. That takes up a lot of the 2000 lines you are talking about.
> TimescaleDB beats InfluxDB on performance even without these optimizations
Would you mind sharing the schema used for this comparison? Maybe I missed it in your documentation of use-cases.
When implementing dynamic tags in my own model, my tests showed that your approach is indeed necessary.
It's about as rude as telling someone their criticism is misplaced in that way. You've added more words to why you believe that, but that doesn't change the original sentiment.
You can say you don't want to support that use case, but someone asked whether you would. You said no, and that they're wrong to compare the two. Sounds a lot like "you're holding it wrong."
(Not affiliated with TimescaleDB, just trying to understand your critique)
TimescaleDB is a Postgres extension, which means it has to play by PG's rules, which rather implies a complicated mesh of functions, views, and "internal-use-only" tables.
But once they're there, you can pretty much pretend they don't exist (until they break, of course, but this is true for everything in your software stack).
Is your complaint targeting the size of these scripts, or their contents? Because everything in there appears pretty reasonable to me.
My problem is what these scripts do. When working with timeseries you want to be able to filter on dimensions/tags/labels.
TimescaleDB doesn't support this out of the box, the reason being Postgres's limitations on multi-column indexes when you have data types like jsonb.
What they have to do to work around this is maintain a mapping table from dimensions/tags/labels, which are stored in a jsonb field, to integers, which they then use to look the data up in the metrics table.
I would expect that a timeseries database is optimized for this type of lookup without hacks involving two tables.
(TimescaleDB engineer) Actually, the reason we break out the JSONB like this is that JSON blobs are big and take up a lot of space, so we normalize them out. This is the same thing that InfluxDB or Prometheus does under the hood. We also do this under the hood for Timescale-Prometheus.
If you have a data model that is slightly more fixed than Prometheus's, you'd create a column per tag and have a foreign key to the labels table. That's a pretty simple schema which avoids a whole lot of the complexity in here. So if you are developing most apps, this isn't a problem and filtering on dimensions/tags/labels isn't a problem. Again, this is only an issue if you have very dynamic user-defined tags, as Prometheus does, in which case you can just use Timescale-Prometheus.
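A rough sketch of that fixed-tag layout (table and column names are just illustrative):

    import psycopg2

    conn = psycopg2.connect("postgresql://metrics:secret@localhost:5432/metrics")
    cur = conn.cursor()

    # Tags live once in a small lookup table; the metrics hypertable only
    # carries an integer foreign key, which keeps each row narrow.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS series (
            id     SERIAL PRIMARY KEY,
            host   TEXT NOT NULL,
            region TEXT NOT NULL,
            UNIQUE (host, region)
        );

        CREATE TABLE IF NOT EXISTS measurements (
            time      TIMESTAMPTZ NOT NULL,
            series_id INTEGER NOT NULL REFERENCES series (id),
            value     DOUBLE PRECISION
        );
    """)
    cur.execute("SELECT create_hypertable('measurements', 'time', if_not_exists => TRUE);")
    conn.commit()

Filtering on a tag is then just an indexed join against the small series table rather than a scan over repeated JSON blobs.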
I'll echo the sibling comment by the TDB engineer, having read and contributed to similar systems myself.
Most of these custom databases (Influx) use techniques similar to TDB's under the hood in order to persist the information in a read/write-optimized way; they just don't "expose" it as clearly to the end user, because they aren't offering a relational database.
A critical difference between the two is that custom databases don't benefit from the fine-tuned performance characteristics of Postgres. So even though they're doing the same thing, they'll generally be doing it slower.
Which kinda confirms the parent's issues... InfluxDB is really easy to deploy and forget. With TimescaleDB you should be ready to know the ins and outs of PG to secure and maintain it correctly.
Sure, for scaling and high loads TDB might be good, but InfluxDB is easier, suitable for most loads, and simpler to maintain.
> With TimescaleDB you should be ready to know the ins and outs of PG to secure and maintain it correctly.
Gotcha, this makes sense. To this, I'd pose the question: is the same not true for Influx? (i.e. With IFDB, you should be ready to know its ins and outs to secure and maintain it correctly). I guess I think about choosing a database like buying a house. I want it to be as good in X years as it is today, maybe better.
From this perspective, PG has been around for a long time, and will continue to be around for a long time. When it comes time to make tweaks to your 5 year old database system, it will be easier to find engineers with PG experience than Influx experience. Not to mention all of the security flaws that have been uncovered and solved by the PG community, that will need to be retrodden by InfluxDB etc.
Anyways, it's just an opinion, and it's good to have a diversity of them. FWIW, I think your perspective is totally valid. It's always interesting to find points where reasonable people differ.
Think of InfluxDB as "move-in ready" when buying a house. Sure, you won't know the electrical and plumbing as well as you would with a "fixer-upper" in the long run, but you'll have a place to sleep a lot faster and easier.
I think it's a good analogy; very reasonable people come to different conclusions on this problem IRL too.
I prefer older houses (at least pre-80s). They have problems that are very predictable and minor. If they had major problems, they would show them quite clearly in 2020.
My friends prefer newer properties, but this is an act of faith, that the builders today have built something that won't have severe structural/electrical/plumbing problems in 20 years.
Of course, if you only plan on living some place for 3-5 years, none of this matters; you want to live where you're comfortable.
The linked scripts are not part of the base installation, yet TimescaleDB claims, for example, Prometheus support.
If you design your own schema and want to filter on such a dynamic tag field you have to replicate what they have implemented in these scripts. If you look at the code you'll see that this isn't exactly easy and other products do this out of the box.
I use TimescaleDB and have suffered from this, otherwise I wouldn't be talking here. Another problem is that these scripts are not compatible with all versions of TimescaleDB so you have to be careful when updating either one.
IMO, the HN guidelines are pretty clear and reasonable regarding these situations.
> Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
The original poster was making an assertion that TDB is too complex. As the guidelines suggest, it doesn't help the conversation to assume that they arrived upon this conclusion unreasonably, and that they're irrational or otherwise unwilling to have an open conversation about it.
We have to assume that they are open to a discussion, despite whatever we might interpret as evidence to the contrary.
Why shouldn't performance matter in this case? The more instances, the more time series, and the more datapoints you have, the more you care about speed when you need to query the data, for example to visualize week-over-week changes in, say, memory usage.
Consider the case at hand, involving streaming charts and alerts. There will be zero perceptible difference in the streaming charts regardless of what database is used. Alerts won't trigger for whatever millisecond difference there may be, and I don't think that this matters to any developer or manager awaiting the 3am call.
My team is using TimescaleDB, but one significant drawback compared to solutions like Prometheus is the limitations of continuous aggregations (basically no joins, no order by, no window functions). That's a problem when you want to consolidate old data.
(TimescaleDB engineer) We hear you and are working on making continuous aggregations easier to use.
For now, the recommended approach is to perform continuous aggregates on single tables and do the joins, order by, and window functions when querying the materialized aggregate rather than when materializing.
This often has the added benefit of making the materialization more general, so that a wider range of queries can use it.
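For example, something along these lines; a sketch assuming TimescaleDB 2.x syntax, with illustrative names (device_metrics is the hypertable, devices an ordinary metadata table):

    import psycopg2

    conn = psycopg2.connect("postgresql://metrics:secret@localhost:5432/metrics")
    conn.autocommit = True  # continuous aggregate creation can't run inside a transaction block
    cur = conn.cursor()

    # Materialize a plain per-device hourly rollup over a single hypertable...
    cur.execute("""
        CREATE MATERIALIZED VIEW device_hourly
        WITH (timescaledb.continuous) AS
        SELECT time_bucket('1 hour', time) AS bucket,
               device_id,
               avg(value) AS avg_value
        FROM device_metrics
        GROUP BY time_bucket('1 hour', time), device_id;
    """)

    # ...and do the join, ordering, and windowing when querying the rollup.
    cur.execute("""
        SELECT d.name,
               h.bucket,
               h.avg_value,
               avg(h.avg_value) OVER (PARTITION BY h.device_id
                                      ORDER BY h.bucket
                                      ROWS 23 PRECEDING) AS rolling_24h
        FROM device_hourly h
        JOIN devices d ON d.id = h.device_id
        ORDER BY d.name, h.bucket;
    """)
    print(cur.fetchall())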
Thanks for the advice! Makes sense. We are doing something similar (aggregate on a single table and join later), but we're still looking for a solution to compute aggregated increments when there are counter resets.
A TimescaleDB engineer here. The current implementation of database distribution in TimescaleDB is centralized: all traffic goes through an access node, which distributes the load across data nodes. The implementation uses 2PC. PostgreSQL's ability to generate distributed query plans is used together with TimescaleDB's optimizations, so PostgreSQL isn't used as just dumb storage :)
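Roughly, from the access node's point of view the setup looks like this (a sketch; host names, the table, and the space-partitioning column are placeholders):

    import psycopg2

    # Run against the access node; it coordinates the data nodes via 2PC.
    conn = psycopg2.connect("postgresql://postgres:secret@access-node:5432/metrics")
    conn.autocommit = True
    cur = conn.cursor()

    # Register the data nodes the access node should distribute chunks to.
    cur.execute("SELECT add_data_node('dn1', host => 'data-node-1');")
    cur.execute("SELECT add_data_node('dn2', host => 'data-node-2');")

    # A distributed hypertable: partitioned by time, and by device across nodes.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS conditions (
            time      TIMESTAMPTZ NOT NULL,
            device_id TEXT        NOT NULL,
            temp      DOUBLE PRECISION
        );
    """)
    cur.execute("SELECT create_distributed_hypertable('conditions', 'time', 'device_id');")

Queries against the conditions table then go through the access node, which pushes work down to the data nodes where it can.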
After a few years in systems engineering and administration, I think "no complaints so far" is one of the best compliments a piece of software can receive.
We (I work at TimescaleDB) recently announced that multi-node TimescaleDB will be available for free, specifically as a way to keep investing in our community: https://blog.timescale.com/blog/multi-node-petabyte-scale-ti...
Today TimescaleDB outperforms InfluxDB across almost all dimensions (credit goes to our database team!), especially for high-cardinality workloads: https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-...
TimescaleDB also works with Grafana, Prometheus, Telegraf, Kafka, Apache Spark, Tableau, Django, Rails, anything that speaks SQL...