Weird, I just ran some benchmarks with bun a few days ago for my own purposes, and I concluded that their marketing slogan was bs and Node was faster.
However, I did it again with this example and Bun is indeed significantly faster
- 15.1k inserts/s with `uuid` and `Date.toUTCString()`
- 20k inserts/s with faking uuid and date (generated unique strings)
Bun is now scientifically at the Rust level of Web Scale, which is pretty crazy. I wonder how they do that, because it's still just a JS engine (JavaScriptCore, not V8) under the hood? Mayhaps their sqlite driver is faster.
What benchmark did you run where Node outperformed Bun? Happy to take a look.
There are still "obvious" things we need to fix, like our EventEmitter implementation is currently around 3x slower than Node due to C++ <> JS callback overhead and several functions in node:crypto are still using JS-only polyfills intended for web browsers (instead of BoringSSL/OpenSSL).
In general, I’ve found that tons of projects suffer leaks. People don’t care about leaks :( (or, alternatively, the tooling for discovering them kinda sucks)
That's an important point but not likely the bulk of the issue (vs Node). Deno, which uses V8, is also faster than Node. Node has a lot of other issues before we even get to the JavaScript engine part.
Even with node.js, it's possible to be faster and/or use less memory than better-sqlite3. For example, here is an opinionated sqlite addon I wrote that shows just that (while executing queries asynchronously): https://github.com/mscdex/esqlite
If WAL mode is enabled and the inserts are running in a separate thread or process, then insert performance will stay exactly the same while any number of selects are running.
For reference, a single Apache Cassandra node does about 150k inserts/second, or likewise about 150k selects/second, on my laptop. All of that from a single client thread using a very similar approach (also in Rust, with async).
CREATE TABLE IF NOT EXISTS users (
    id BLOB PRIMARY KEY NOT NULL,
    created_at TEXT NOT NULL,
    username TEXT NOT NULL
);
The data:
let user = User {
    id: uuid::Uuid::new_v4(),
    created_at: chrono::Utc::now(),
    username: String::from("Hello"),
};
If you can't insert 15,000 of these records per second then there's something wrong with your database. I'm aware that people are all on the SQLite hype train, but this kind of stuff is table-stakes.
I really like SQLite. But this blog isn't any kind of great performance indicator for it.
Well, some people think 3 inserts per second is fast, so maybe it gives some perspective? To be fair, I wouldn’t expect those to hang around hackernews though.
I think a lot of people have forgotten or don't know how powerful CPUs are. Simple operations should be insanely fast, like easily sub-millisecond if not sub-nanosecond fast.
Counterintuitively, write performance in modern databases is primarily constrained by disk read performance.
That said, it is increasingly rare for disk I/O to be a bottleneck for inserts. When your storage will happily sustain 10+ GB/s of write throughput, you tend to run out of things like network and memory bandwidth first.
I don't see why you wouldn't just bundle them all into a transaction at that point. The idea of 1 insert at a time is to roughly estimate the overhead/perf in a real environment, where you're only adding 1 entry at a time.
I think it's useful stress testing for when your product goes viral.
For example, if the whole world wanted to create an account, and an account consisted of a UUID and a date, it'd take 5.4 days using this metric as a minimum viable benchmark.
It's all about context, perception. 15,000 seems like a large number, and 1 second seems like a small amount of time. On the other hand, 5.4 days seems like quite a large number. Likewise at small scales, 15,000/second operated at a fidelity only able to capture 0.0005% of a typical processor's clock.
Like a picture in a frame.
This is not to say that the ability to casually write 15,000 things a second to a database is not impressive. Nor that a beowulf cluster of it wouldn't be.
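For reference, the arithmetic behind those figures (assuming roughly 7 billion accounts and a ~3 GHz clock, which are my own round numbers):

```python
# 15,000 inserts/second, one account per person (~7 billion accounts)
seconds = 7_000_000_000 / 15_000
days = seconds / 86_400
print(round(days, 1))  # 5.4

# 15,000 ops/second as a fraction of a ~3 GHz clock
fraction = 15_000 / 3_000_000_000
print(f"{fraction:.4%}")  # 0.0005%
```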
I made a prototype message queue in Rust that processed about 7 million messages/second. That was running from RAM.
I also had a prototype that ran from disk and it very rapidly maxed out the random write performance of the SSD. Can't remember the number but it was in the order of 15K to 30K messages a second, bottlenecked by the SSD.
If the sqlite test is writing to disk it is likely also bottlenecked at the disk.
Is it possible to improve this micro benchmark? Of course, by bundling all the inserts in a single transaction, for example, or by using another, non-async database driver, but it does not make sense, as that's not what a real-world codebase accessing a database looks like. We favor simplicity over theoretical numbers.
I don't understand. Using transactions and prepared statements you would get hundreds of thousands of inserts per second even using languages like python/ruby.
Nothing theoretical or not-real-world about it. It is how SQLite docs recommend you do it.
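For illustration, here's a minimal sketch of that pattern using Python's stdlib `sqlite3` against an in-memory database (the article's schema, not its actual Rust code):

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id BLOB PRIMARY KEY NOT NULL, "
    "created_at TEXT NOT NULL, username TEXT NOT NULL)"
)

rows = [
    (uuid.uuid4().bytes, datetime.now(timezone.utc).isoformat(), "Hello")
    for _ in range(100_000)
]

# One transaction for all rows; executemany() prepares the statement
# once and reuses it instead of parsing SQL per insert.
with conn:  # opens a transaction, commits on success
    conn.executemany(
        "INSERT INTO users (id, created_at, username) VALUES (?, ?, ?)", rows
    )

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 100000
```

On most machines this finishes in a fraction of a second, which is the hundreds-of-thousands-per-second ballpark the parent describes.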
It would not be real-world in the sense that most OLTP workloads are inserting (or updating) only a couple rows at a time. In this sense it provides an upper bound on transactions per second, which for many real-world workloads essentially means requests per second (or requests*n per second for small n).
However, it's still kind of weak because you really need to interleave those with SELECTs and some UPDATEs to get a decent view of how it would perform in such a case.
This is so lazy as to be irritating to me. Magic pragmas with no explanation, no use of the important ones if you're going for max performance (synchronous and journal mode), no discussion of statement compilation and the benefits of plan caching, no discussion of indexes or transactions... Just garbage to make a lazy point that maybe SQLite is fast.
Anyone able to replicate this? With XFS on an NVMe device I only get 500 inserts per second. I wonder if the 15000 figure is attributable to the scaleway virtual server SCSI device ignoring syncs.
They're using WAL and synchronous = NORMAL, so there shouldn't be that many syncs to begin with.
By default, I think it'd sync only on checkpoints, and a checkpoint would happen for every 1000 pages of WAL. 1000 is the default value of the wal_autocheckpoint pragma.
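As a sketch of those settings (Python's stdlib `sqlite3` against a throwaway file database, since WAL mode needs a real file rather than `:memory:`):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)

# WAL: commits append to a -wal file; fsyncs mostly happen at checkpoints.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# NORMAL: sync the WAL at checkpoint time rather than on every commit.
conn.execute("PRAGMA synchronous=NORMAL")
# Checkpoint threshold, in WAL pages; 1000 is the documented default.
autocheckpoint = conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0]

print(mode, autocheckpoint)  # wal 1000
```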
I am very surprised that you cannot reproduce more than 500 inserts/second on an NVMe drive with the same settings. I didn't try running this benchmark, as I'm not familiar with rust, but I've achieved ~15k/sec write transactions with python bindings under similar settings.
Well, there's something about either XFS, Linux 6.2, or the sqlite libraries on Ubuntu 23.04 because on my machine it's hitting 300 flushes per second while doing about 1300 writes / 13MB per second.
This is running on an AMD EPYC 7543 CPU, which has 32 cores that can each cycle 2,794,750,000 times per second. It's running with 3 threads (if I understand correctly), so as a rough estimate it's doing one insert every 500,000 CPU cycles (2,794,750,000 * 3 / 15000).
It's not enormous, but it's good for the simplicity and would probably be sufficient for quite a number of use cases. And I'd guess that it's also probably more than what many programmers get by Lego stacking 14 different abstractions.
The article is not meant to show off how fast it is, but how easy it is to achieve a good-enough level of performance without anything special. It's meant to show that you don't need specialized "infinitely scalable" services.
>"It's meant to show that you don't need specialized "infinitely scalable" services."
1) This "revelation" has been known to any decent programmer for ages.
2) The example is very contrived and has zero real practical value, as the access pattern, number of tables involved, constraints, and ACID requirements in real life are very different.
I think articles like this are written to gain some points for employment purpose. If I was hiring however this type of article would be a net negative as it leaves me with the impression that the author has no clue about real life situations.
I work with sqlite a lot, usually with millions of rows. The last time I checked, my script written in Crystal read ten million entries from disk, removed the duplicates, then wrote the result to an sqlite3 database file, and it all happened in about 70 seconds.
And I didn't have to do any fancy tricks like in this article; just setting `journal_mode=WAL` and `synchronous=normal` was enough.
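The same read-dedupe-write pattern can be sketched in Python's stdlib `sqlite3` (the `entries` schema and the helper name are made up for illustration, not taken from the Crystal script):

```python
import os
import sqlite3
import tempfile

def dedupe_to_sqlite(entries, db_path):
    """Drop duplicate entries, then bulk-write the survivors to SQLite."""
    unique = list(dict.fromkeys(entries))  # order-preserving dedupe
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("CREATE TABLE IF NOT EXISTS entries (value TEXT PRIMARY KEY)")
    with conn:  # single transaction for the whole bulk write
        conn.executemany(
            "INSERT OR IGNORE INTO entries (value) VALUES (?)",
            ((e,) for e in unique),
        )
    n = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    conn.close()
    return n

path = os.path.join(tempfile.mkdtemp(), "dedupe.db")
n = dedupe_to_sqlite(["a", "b", "a", "c", "b"], path)
print(n)  # 3
```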
This reminded me of a prior discussion[0] on bulk data generation in SQLite with Rust (vs Python vs PyPy) which previously led me to trying out two different techniques using just SQLite[1]. The approach here is so similar I tried my prior solution on the slowest VPS I have access to (1 vCPU core 2.4GHz, 512Mi, $2.50/month from vultr):
sqlite> create table users (id blob primary key not null, created_at text not null, username text not null);
sqlite> create unique index idx_users_on_id on users(id);
sqlite> pragma journal_mode=wal;
sqlite> .load '/tmp/uuid.c.so'
sqlite> .timer on
sqlite> insert into users(id, created_at, username)
   ...>   select uuid(), strftime('%Y-%m-%dT%H:%M:%fZ'), 'hello'
   ...>   from generate_series limit 100000;
Run Time: real 1.159 user 0.572631 sys 0.442133
where the UUID extension comes from the SQLite authors[2] and generate_series is compiled into the SQLite CLI. It is possible further pragma-tweaking might eke out more performance, but I feel like this is representative of the no-optimization scenario I typically find myself in.
In the interest of finding where the bulk of the time is spent and on a hunch I tried swapping the UUID for plain auto-incrementing primary keys as well:
sqlite> insert into users(created_at, username) select strftime('%Y-%m-%dT%H:%M:%fZ'), 'hello' from generate_series limit 100000;
Run Time: real 0.142 user 0.068090 sys 0.025507
We chose sqlx for a SQLite based project because it provides a reasonable async interface, and our experience has not been great. Development momentum has been poor, PRs and issue reports are languishing, the developer is unresponsive to ergonomic and technical feedback, sqlite doesn't seem to be a priority, and the fundamental mechanism for compile-time query validation is very, very flawed. As our project has become more important to us, we've become more and more tempted to write our own SQLite library (maybe forking some parts of sqlx).
AFAICS you're doing all your inserts in a single transaction, unlike the article. Using transactions is better for performance, but probably less representative in this specific case.
A commit on every insert would definitely crush your performance. That's one of the worst ways to do bulk inserts.
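To make the difference concrete, a small sketch in Python's stdlib `sqlite3` that runs the same inserts both ways, commit-per-insert vs one wrapping transaction (the row count and timings are arbitrary and will vary by disk and settings):

```python
import os
import sqlite3
import tempfile
import time

def bench(batched, n=2000):
    path = os.path.join(tempfile.mkdtemp(), "bench.db")
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    start = time.perf_counter()
    if batched:
        conn.execute("BEGIN")
    for i in range(n):
        # Without an explicit BEGIN, each INSERT is its own commit.
        conn.execute("INSERT INTO t (v) VALUES (?)", (f"row{i}",))
    if batched:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count, elapsed

n_batched, t_batched = bench(batched=True)
n_single, t_single = bench(batched=False)
print(n_batched, n_single)  # 2000 2000
```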
Really the whole thing is kind of pointless because there are so many factors. It's not about the language, it's more about sqlite and its configuration.
This isn't about language use. This is about what is possible with a simple setup: a single service in the language of your choice and just sqlite. The idea being that unless there is a good reason, you probably don't need k8s, a high-availability database server cluster, and a full-on proxy routing layer. Adding more layers of architecture comes with costs, and unless you get something essential in return for paying that cost you should probably not bother. You may in fact be able to get by with a single server backed by sqlite (and maybe litestream for backups), and if that is the case then there isn't any reason not to, and a lot of reasons to do so.
They're working on it. For example, knowing that you can use async in some contexts (functions) but not in others (traits) is complexity that needs to be learned. By making it work everywhere, the language becomes simpler.
But no, there will always be some inherent complexity in the language because it has new concepts not present elsewhere. Newer tutorials in novel formats are being made all the time and they help with learning, but there's no way around having to learn new things.
I think async exists and it works well. There were 2 equally popular runtimes earlier but now there's only 1 that everyone uses. I wouldn't characterise that as contention, just people trying different stuff out.
Tokio is super popular but there are still better runtimes at the moment. The proper design you want is thread per core with some background pool of threads available for work stealing / batch jobs. The reason is that you can completely remove the need for any locking in your critical path. I think tokio is working on making that paradigm available but I don’t know how that’s progressing
For what it’s worth, Glommio is a thread-per-core async executor library in Rust. It comes with io_uring support too. Not as widely supported as tokio, obviously, but still viable enough; worth keeping an eye on IMO.
Yes that’s what I was referring to. I’m using that. ByteDance has a similar one although I haven’t tried it. The annoying thing is how deeply a specific async implementation “infects” all parts of the code. Would be nice for something more general so I could more easily test out individual implementations to compare performance vs having to rewrite a good chunk of the application.
Yeah I 100% agree, more executor-neutral APIs would be amazing. Especially for stuff like networking client libs; I wish hyper, Axum, tower, tonic, etc. were all executor-neutral, or at least provided executor specialisations via features or something.
As a guy planning to use SQLite with Rust, it was interesting to me, since you often don't have visibility into the performance cost when you cross the C-Rust barrier, or other FFI bridges. You have to write the code first, at which point it's a 70% done deal.
Plus the top comment gave useful additional data.
This saves time for other people looking to make an informed decision.
---
People disliking Rust seem to always be on the lookout for the Boogeyman to me. And degrading the quality of HN's comment sections. :(
Just look at your other replies. Seems that several people got lost on their way to 4chan and Reddit.
I don't dislike Rust, I'm trying to understand the blind upvotes that seem to regularly happen. You don't see "with $OTHERLANGUAGE" in the majority of post titles.
RE my posts, I'm old. I ask questions, I challenge things I think are wrong. I also praise. This is normal interaction, rather than blind cargo cult following.
Every community has bad apples. I've worked with hardcore Rust devs and they hate the cargo culting as well.
What ticks me off is that people single out Rust and that's very visible. Go criticize C/C++ legacy baggage and you'll get a very different engagement. Say something about Python and Golang. Very different responses.
At this point and to me, as a mostly indifferent side observer, people negatively overreact if Rust is praised -- and only cite a bunch of loonies that even the Rust devs and core contributors dislike.
Does not seem like a fair and objective assessment to me.
As techies we should be much better than that. Merit before bias and all that.
How do you know all the others are "blind cargo cult following"? Maybe it's that people who frequent Hacker News are interested in Rust, and as Rust is in the wind a lot of content is made about Rust, which again means that a lot of content is upvoted. I don't think it means it's anything nefarious. A new language can get a lot of hype, and a lot of hype means a lot of content. (I say this as someone who hasn't done any Rust other than looking at some intro exercises a bit over a year ago.)
It's also that most devs don't have reasonable estimates of how many inserts/sec a database should be able to handle, or how many requests/sec a website should be able to handle, so they upvote and wax ecstatic about some of the most unimpressive stats ever.
100k inserts using `better-sqlite3` (difference: it does use prepared statements), all single-threaded with the same pragmas.
- 8.6k inserts/s with `uuid` and `Date.toUTCString()`
- 12.0k inserts/s with faking uuid and date (generated unique strings)
...So a large amount of time was spent on UUID generation and getting the current date.
EDIT: I mixed up the two numbers above. Simple strings is of course faster than syscall+date format+entropy. Fixed now.
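A rough Python analogue of that split (stdlib `sqlite3`, in-memory database; the value shapes are my assumptions, not the parent's exact `better-sqlite3` code):

```python
import sqlite3
import time
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id TEXT PRIMARY KEY, created_at TEXT, username TEXT)"
)
N = 20_000

def run(make_row):
    conn.execute("DELETE FROM users")
    start = time.perf_counter()
    with conn:  # single transaction; executemany reuses the prepared statement
        conn.executemany(
            "INSERT INTO users VALUES (?, ?, ?)",
            (make_row(i) for i in range(N)),
        )
    return time.perf_counter() - start

# Real UUID + formatted timestamp vs. cheap precomputed-style unique strings.
t_real = run(lambda i: (str(uuid.uuid4()),
                        datetime.now(timezone.utc).isoformat(), "Hello"))
t_fake = run(lambda i: (f"id-{i}", "2023-01-01T00:00:00Z", "Hello"))

total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(total)  # 20000
```

Comparing `t_real` and `t_fake` shows how much of the per-insert budget goes to value generation rather than to SQLite itself.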
It's fair to say that although Rust is slightly more Web Scale, this benchmark simply measures the excellent performance of SQLite.