Hacker News
15k inserts/s with Rust and SQLite (2021) (kerkour.com)
116 points by mattrighetti on April 2, 2023 | 93 comments


Decided to run this with NodeJS (on Linux) for mandatory Web Scale:

100k inserts using `better-sqlite3` (difference: it does use prepared statements), all single-threaded with the same pragmas.

- 8.6k inserts/s with `uuid` and `Date.toUTCString()`

- 12.0k inserts/s with faking uuid and date (generated unique strings)

...So a large amount of time was spent on UUID generation and date formatting.

EDIT: I mixed up the two numbers above. Simple strings are of course faster than syscall + date formatting + entropy. Fixed now.
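
For the curious, here is a rough Rust-side sketch of the same comparison. This is my own code, not the article's; it assumes the uuid and chrono crates and only times value generation, not the inserts:

    use std::time::Instant;

    fn main() {
        let n = 100_000;

        // Real UUIDs + formatted timestamps (what the benchmarked rows contain).
        let start = Instant::now();
        for _ in 0..n {
            let id = uuid::Uuid::new_v4().to_string();
            let created_at = chrono::Utc::now().to_rfc3339();
            // black_box keeps the optimizer from deleting the work.
            std::hint::black_box((id, created_at));
        }
        println!("uuid + chrono: {:?}", start.elapsed());

        // "Faked" ids and dates: unique strings from a counter.
        let start = Instant::now();
        for i in 0..n {
            let row = (format!("id-{i}"), format!("date-{i}"));
            std::hint::black_box(row);
        }
        println!("plain strings: {:?}", start.elapsed());
    }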

It's fair to say that although Rust is slightly more Web Scale, this benchmark simply measures the excellent performance of SQLite.


Try again with Bun. It uses a baked-in SQLite client similar to better-sqlite3.

https://bun.sh


Weird, I just ran some benchmarks with bun a few days ago for my own purposes, and I concluded that their marketing slogan was bs and Node was faster.

However, I did it again with this example and Bun is indeed significantly faster

- 15.1k inserts/s with `uuid` and `Date.toUTCString()`

- 20k inserts/s with faking uuid and date (generated unique strings)

Bun is now scientifically at the Rust level of Web Scale, which is pretty crazy. I wonder how they do that, because it's still just v8 under the hood? Mayhaps their sqlite driver is faster.


What benchmark did you run where Node outperformed Bun? Happy to take a look.

There are still "obvious" things we need to fix. For example, our EventEmitter implementation is currently around 3x slower than Node due to C++ <> JS callback overhead, and several functions in node:crypto are still using JS-only polyfills intended for web browsers (instead of BoringSSL/OpenSSL).


> What benchmark did you run where Node outperformed Bun?

It was a very similar SQLite bench but I may have done something wrong.

> Happy to take a look.

That’s ok but let me bait and switch you for this dead socket leak issue: https://github.com/oven-sh/bun/issues/1397

In general, I’ve found that tons of projects suffer leaks. People don’t care about leaks :( (or alternatively, the tooling for discovering them kinda sucks)


One of their tricks is that Bun doesn't use Chrome's V8. Instead, they use Safari's JavaScriptCore, which tends to be faster [citation needed].


That's an important point but not likely the bulk of the issue (vs Node). Deno, which uses V8, is also faster than Node. Node has a lot of other issues before we even get to the JavaScript engine part.


Not V8… it’s Apple’s JavaScript Core.


Even with node.js, it's possible to be faster and/or use less memory than better-sqlite3. For example, here is an opinionated sqlite addon I wrote that shows just that (while executing queries asynchronously): https://github.com/mscdex/esqlite


From my experience years ago a tight loop { bind() step() } in a transaction can run fast.
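
Roughly this shape, as a minimal rusqlite sketch (my own code, not the article's sqlx version; it assumes the rusqlite, uuid and chrono crates): prepare once, then bind and step inside a single transaction.

    use rusqlite::{params, Connection};
    use std::time::Instant;

    fn main() -> rusqlite::Result<()> {
        let mut conn = Connection::open("bench.db")?;
        // WAL + NORMAL, same pragmas the article uses.
        conn.query_row("PRAGMA journal_mode=WAL", [], |r| r.get::<_, String>(0))?;
        conn.execute_batch(
            "PRAGMA synchronous = NORMAL;
             CREATE TABLE IF NOT EXISTS users (
                 id BLOB PRIMARY KEY NOT NULL,
                 created_at TEXT NOT NULL,
                 username TEXT NOT NULL
             );",
        )?;

        let start = Instant::now();
        let tx = conn.transaction()?;
        {
            // Prepared once; the loop only binds new values and steps.
            let mut stmt = tx.prepare(
                "INSERT INTO users (id, created_at, username) VALUES (?1, ?2, ?3)",
            )?;
            for _ in 0..100_000 {
                let id = uuid::Uuid::new_v4();
                let created_at = chrono::Utc::now().to_rfc3339();
                stmt.execute(params![id.as_bytes().to_vec(), created_at, "Hello"])?;
            }
        }
        tx.commit()?;
        println!("100k inserts in {:?}", start.elapsed());
        Ok(())
    }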


15k inserts/second is basically nothing. This is just a batch load of INSERT statements.

Now do it while simultaneously serving 15k selects/second.

Now try it with 15k counts/second. You'll see where this breaks down fast.


If WAL mode is enabled and the inserts are running in a separate thread or process, then insert performance will be essentially the same while any number of selects are running.
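
A quick illustration of the claim, as a rusqlite sketch I put together rather than anything from the article: one writer thread inserting while a reader thread repeatedly SELECTs from the same WAL-mode file.

    use rusqlite::Connection;
    use std::{thread, time::Duration};

    fn open(path: &str) -> rusqlite::Result<Connection> {
        let conn = Connection::open(path)?;
        conn.busy_timeout(Duration::from_secs(5))?;
        conn.query_row("PRAGMA journal_mode=WAL", [], |r| r.get::<_, String>(0))?;
        Ok(conn)
    }

    fn main() -> rusqlite::Result<()> {
        let path = "wal_demo.db";
        open(path)?.execute_batch(
            "CREATE TABLE IF NOT EXISTS users (
                 id BLOB PRIMARY KEY NOT NULL,
                 created_at TEXT NOT NULL,
                 username TEXT NOT NULL
             );",
        )?;

        // Writer: plain autocommit inserts on its own connection.
        let writer = thread::spawn(move || -> rusqlite::Result<()> {
            let conn = open(path)?;
            for i in 0..50_000u64 {
                conn.execute(
                    "INSERT INTO users (id, created_at, username) VALUES (?1, ?2, ?3)",
                    rusqlite::params![i.to_be_bytes().to_vec(), "2023-04-02T00:00:00Z", "Hello"],
                )?;
            }
            Ok(())
        });

        // Reader: under WAL it doesn't block the writer and sees consistent snapshots.
        let reader = thread::spawn(move || -> rusqlite::Result<()> {
            let conn = open(path)?;
            for _ in 0..100 {
                let n: i64 = conn.query_row("SELECT COUNT(*) FROM users", [], |r| r.get(0))?;
                println!("rows so far: {n}");
                thread::sleep(Duration::from_millis(10));
            }
            Ok(())
        });

        writer.join().unwrap()?;
        reader.join().unwrap()?;
        Ok(())
    }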


For reference, a single Apache Cassandra node does about 150k inserts/second or also 150k selects/second on my laptop. All that from a single client thread using a very similar approach (also in Rust with async).


Can you qualify your statement that this is nothing?

How many CPUs and cores are available? Are we talking SSDs?

How much data is being inserted?

What are the keys and index?

15,000 per second seems like plenty in some contexts.


The schema:

    CREATE TABLE IF NOT EXISTS users (
        id BLOB PRIMARY KEY NOT NULL,
        created_at TEXT NOT NULL,
        username TEXT NOT NULL
    );

The data:

    let user = User {
        id: uuid::Uuid::new_v4(),
        created_at: chrono::Utc::now(),
        username: String::from("Hello"),
    };

If you can't insert 15,000 of these records per second then there's something wrong with your database. I'm aware that people are all on the SQLite hype train, but this kind of stuff is table-stakes.

I really like SQLite. But this blog isn't any kind of great performance indicator for it.


This entire post reduces to:

    insert into users (id, created_at, username) values ("<id>", "<created_at>", "Hello"), ...15,000 times.

And we're going to be impressed with sqlite's performance?


Well, some people think 3 inserts per second is fast, so maybe it gives some perspective? To be fair, I wouldn’t expect those to hang around hackernews though.


I think a lot of people have forgotten or don't know how powerful CPUs are. Simple operations should be insanely fast, like easily sub-millisecond if not sub-nanosecond fast.


But, for insert, the CPU is not the bottleneck. It's I/O constrained.

Disk I/O has always been the bottleneck for writes.


Counterintuitively, write performance in modern databases is primarily constrained by disk read performance.

That said, it is increasingly rare for disk I/O to be a bottleneck for inserts. When your storage will happily sustain 10+ GB/s of write throughput, you tend to run out of things like network and memory bandwidth first.


You can double the performance of this whole thing by inserting 2 values at a time.


I don't see why you wouldn't just bundle them all into a transaction at that point. The idea of 1 insert at a time is to roughly estimate the overhead/perf in a real environment, where you're only adding 1 entry at a time.


I think it's useful stress testing for when your product goes viral.

For example, if the whole world wanted to create an account, and an account consisted of a UUID and a date, it'd take about 5.4 days at this rate (roughly 7 billion accounts / 15,000 per second ≈ 467,000 seconds), using this metric as a minimum viable benchmark.

It's all about context and perception. 15,000 seems like a large number, and 1 second seems like a small amount of time. On the other hand, 5.4 days seems like quite a large number. Likewise, at small scales, 15,000/second operates at a fidelity only able to capture 0.0005% of a typical processor's clock cycles.

Like a picture in a frame.

This is not to say that the ability to casually write 15,000 things a second to a database is not impressive. Nor that a beowulf cluster of it wouldn't be.


15K doesn't seem particularly noteworthy.

I made a prototype message queue in Rust that processed about 7 million messages/second. That was running from RAM.

I also had a prototype that ran from disk and it very rapidly maxed out the random write performance of the SSD. Can't remember the number but it was in the order of 15K to 30K messages a second, bottlenecked by the SSD.

If the sqlite test is writing to disk it is likely also bottlenecked at the disk.


To be fair it's very easy to outperform a DBMS by not offering ACID guarantees.


Agree. I know for Postgres, we are getting at least 50K / second sustained and not doing anything special - other than using bulk insert.


> I made a prototype message queue in Rust that processed about 7 million messages/second. That was running from RAM.

Can you share the source code?


It was pure garbage - a failed attempt to learn Rust.


Need some garbage collection added?


  Is it possible to improve this micro benchmark? Of course, by bundling all the inserts in a single transaction, for example, or by using another, non-async database driver, but it does not make sense as it's not how a real-world codebase accessing a database looks like. We favor simplicity over theorical numbers.

I don't understand. Using transactions and prepared statements you would get hundreds of thousands of inserts per second even using languages like python/ruby.

Nothing theoretical or not-real-world about it. It is how SQLite docs recommend you do it.

https://www.sqlite.org/faq.html#q19

https://rogerbinns.github.io/apsw/tips.html#sqlite-is-differ...


It would not be real-world in the sense that most OLTP workloads are inserting (or updating) only a couple rows at a time. In this sense it provides an upper bound on transactions per second, which for many real-world workloads essentially means requests per second (or requests*n per second for small n).

However, it's still kind of weak because you really need to interleave those with SELECTs and some UPDATEs to get a decent view of how it would perform in such a case.


I had 30k with PostgreSQL 6.5.3 like in 1999 on a PIII-400 or something like that.


This is so lazy as to be irritating to me. Magic pragmas with no explanation, no use of the important ones if you're going for max performance (synchronous and journal mode), no discussion of statement compilation and the benefits of plan caching, no discussion of indexes or transactions... Just garbage to make a lazy point that maybe SQLite is fast.


I agree that there's lots of room for improvement in both the benchmark and the exposition around it, but I disagree with:

> no use of [...] synchronous and journal mode

They enable both of these when configuring the connection:

    let connection_options = SqliteConnectOptions::from_str(&database_url)?
        .create_if_missing(true)
        .journal_mode(SqliteJournalMode::Wal)
        .synchronous(SqliteSynchronous::Normal)
        .busy_timeout(pool_timeout);


Anyone able to replicate this? With XFS on an NVMe device I only get 500 inserts per second. I wonder if the 15000 figure is attributable to the scaleway virtual server SCSI device ignoring syncs.


They're using WAL and synchronous = NORMAL, so there shouldn't be that many syncs to begin with.

By default, I think it'd sync only on checkpoints, and a checkpoint would happen for every 1000 pages of WAL. 1000 is the default value of the wal_autocheckpoint pragma.

I am very surprised that you cannot reproduce more than 500 inserts/second on an NVMe drive with the same settings. I didn't try running this benchmark, as I'm not familiar with rust, but I've achieved ~15k/sec write transactions with python bindings under similar settings.
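
If you want to sanity-check that cadence on your own machine, something along these lines should do it (a rusqlite sketch of mine; the pragmas themselves are standard SQLite, not anything specific to the article's setup):

    use rusqlite::Connection;

    fn main() -> rusqlite::Result<()> {
        let conn = Connection::open("bench.db")?;
        conn.query_row("PRAGMA journal_mode=WAL", [], |r| r.get::<_, String>(0))?;

        // Defaults to 1000 pages: a checkpoint (and the associated fsyncs) is only
        // attempted once the WAL grows past this many pages.
        let pages: i64 = conn.query_row("PRAGMA wal_autocheckpoint", [], |r| r.get(0))?;
        let page_size: i64 = conn.query_row("PRAGMA page_size", [], |r| r.get(0))?;
        println!(
            "checkpoint every {} pages (~{} KiB of WAL)",
            pages,
            pages * page_size / 1024
        );
        Ok(())
    }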


Using the code from the repo, I got this on a standard Dell XPS laptop, running Ubuntu 20.04.1 with ext4 on an NVMe:

    $ cargo run --release -- -c 100 -i 100000
    ...
    Inserting 100000 records. concurrency: 100
    Time elapsed to insert 100000 records: 7.939734809s (12594.88 inserts/s)


Well, there's something about either XFS, Linux 6.2, or the sqlite libraries on Ubuntu 23.04 because on my machine it's hitting 300 flushes per second while doing about 1300 writes / 13MB per second.


> pragma temp_store = memory;

Did you do this?


I am just taking their repo and command directly from their blog post.


I'm probably missing something, but why would the temp_store pragma be relevant? They don't appear to be using temp tables in this benchmark.


Is that a lot?


This is running on an AMD EPYC 7543 CPU, which has 32 cores that can each cycle 2,794,750,000 times per second. It's running with 3 threads (if I understand correctly), so as a rough estimate it's doing one insert every ~560,000 CPU cycles (2,794,750,000 * 3 / 15,000).


Pretty sure cpu is not a limiting factor here


15 per millisecond sounds horrible, and it looks like it is actually using 3 threads, so it's even worse.


It's not enormous, but it's good for the simplicity and would probably be sufficient for quite a number of use cases. And I'd guess that it's also probably more than what many programmers get by Lego stacking 14 different abstractions.


Doesn't strike me as a lot honestly.


Translation: SQLite is capable of 15K batched, uniform inserts/s under some configuration. I have no clue what is so impressive here.

And what magical potion does Rust add to this result?


The article is not meant to show off how fast it is, but how easy it is to achieve a good-enough level of performance without anything special. It's meant to show that you don't need specialized "infinitely scalable" services.


>"It's meant to show that you don't need specialized "infinitely scalable" services."

1) This "revelation" has been known to any decent programmer for ages.

2) The example is very contrived and has zero real practical value as the access pattern, amount of tables involved, constraints and ACID requirements in real life are very different.


You're correct. Posts like these are made by people who haven't actually tried to scale the approaches they're advocating.


I think articles like this are written to gain some points for employment purpose. If I was hiring however this type of article would be a net negative as it leaves me with the impression that the author has no clue about real life situations.


I work with SQLite a lot, usually with millions of rows. The last time I checked, my script written in Crystal read ten million entries from disk, removed the duplicates, then wrote the result to an SQLite database file, and it all happened in 70 seconds or so.

And I didn't have to do any fancy tricks like in this article; just setting `journal_mode=WAL` and `synchronous=normal` was enough.

P.S.: yeah, transactions help a lot.


This reminded me of a prior discussion[0] on bulk data generation in SQLite with Rust (vs Python vs PyPy) which previously led me to trying out two different techniques using just SQLite[1]. The approach here is so similar I tried my prior solution on the slowest VPS I have access to (1 vCPU core 2.4GHz, 512Mi, $2.50/month from vultr):

    sqlite> create table users (id blob primary key not null, created_at text not null, username text not null);
    sqlite> create unique index idx_users_on_id on users(id);
    sqlite> pragma journal_mode=wal;
    sqlite> .load '/tmp/uuid.c.so'
    sqlite> .timer on
    sqlite> insert into users(id, created_at, username)
            select uuid(), strftime('%Y-%m-%dT%H:%M:%fZ'), 'hello'
            from generate_series limit 100000;
    Run Time: real 1.159 user 0.572631 sys 0.442133
where the UUID extension comes from the SQLite authors[2] and generate_series is compiled into the SQLite CLI. It is possible further pragma-tweaking might eke out more performance, but I feel like this is representative of the no-optimization scenario I typically find myself in.

In the interest of finding where the bulk of the time is spent and on a hunch I tried swapping the UUID for plain auto-incrementing primary keys as well:

    sqlite> insert into users(created_at, username) select strftime('%Y-%m-%dT%H:%M:%fZ'), 'hello' from generate_series limit 100000;
    Run Time: real 0.142 user 0.068090 sys 0.025507
Clearly UUIDs are not free!

[0]: https://news.ycombinator.com/item?id=27872575

[1]: https://idle.nprescott.com/2021/bulk-data-generation-in-sqli...

[2]: https://sqlite.org/src/file/ext/misc/uuid.c


We chose sqlx for a SQLite based project because it provides a reasonable async interface, and our experience has not been great. Development momentum has been poor, PRs and issue reports are languishing, the developer is unresponsive to ergonomic and technical feedback, sqlite doesn't seem to be a priority, and the fundamental mechanism for compile-time query validation is very, very flawed. As our project has become more important to us, we've become more and more tempted to write our own SQLite library (maybe forking some parts of sqlx).


Have you checked out rusqlite? No affiliation, just curious.


TLDR: C++ is 115K inserts/sec to 166K inserts/sec

Here is a totally unscientific benchmark in C++. Of course there is "lies, damned lies and benchmarks".

AMD Ryzen 9 5900X 12-Core Processor

./sqlitetest

For writing to disk: Inserted 1500000 records in 13 seconds, at a rate of 115385 records per second.

For writing to RAM: change users.db to :memory: Inserted 1500000 records in 9 seconds, at a rate of 166667 records per second.

Credit goes not to me but to everyone's favorite AI programmer.

I've made a tiny attempt to optimise - likely much more can be done.

To compile on Linux:

g++ sqlitetest.cpp -lsqlite3 -luuid -o sqlitetest

    #include <iostream>
    #include <cstring>
    #include <ctime>
    #include <chrono>
    #include <sqlite3.h>
    #include <uuid/uuid.h>
    
    struct User {
        uuid_t id;
        char created_at[25];
        char username[6];
    };
    
    int main() {
        sqlite3 *db;
        sqlite3_open(":memory:", &db);
    
        const char *sql = "CREATE TABLE IF NOT EXISTS users ("
                          "id BLOB PRIMARY KEY NOT NULL,"
                          "created_at TEXT NOT NULL,"
                          "username TEXT NOT NULL"
                          ");";
        sqlite3_exec(db, sql, nullptr, nullptr, nullptr);
    
        sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
    
        User user;
        char uuid_str[37];
        time_t now = time(nullptr);
    
        int count = 0;
        auto start = std::chrono::steady_clock::now();
    
        for (int i = 0; i < 1500000; i++) {
            uuid_generate(user.id);
            uuid_unparse_lower(user.id, uuid_str);
            strftime(user.created_at, sizeof(user.created_at), "%Y-%m-%d %H:%M:%S", localtime(&now));
            strncpy(user.username, "Hello", sizeof(user.username));
            sqlite3_stmt *stmt;
            sqlite3_prepare_v2(db, "INSERT INTO users (id, created_at, username) VALUES (?, ?, ?);", -1, &stmt, nullptr);
            sqlite3_bind_blob(stmt, 1, user.id, sizeof(user.id), SQLITE_STATIC);
            sqlite3_bind_text(stmt, 2, user.created_at, -1, SQLITE_STATIC);
            sqlite3_bind_text(stmt, 3, user.username, -1, SQLITE_STATIC);
            sqlite3_step(stmt);
            sqlite3_finalize(stmt);
            count++;
        }
    
        sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);
    
        auto end = std::chrono::steady_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(end - start).count();
        double rate = static_cast<double>(count) / elapsed;
    
        std::cout << "Inserted " << count << " records in " << elapsed << " seconds, at a rate of " << rate << " records per second." << std::endl;
    
        sqlite3_close(db);
    
        return 0;
    }


AFAICS you're doing all your inserts in a single transaction, unlike the article. Using transactions is better for performance, but probably less representative in this specific case.


True.

A commit on every insert would definitely crush your performance. That's one of the worst ways to do bulk inserts.

Really the whole thing is kind of pointless because there are so many factors. It's not about the language, it's more about sqlite and its configuration.


The original code doesn’t use a single transaction.


The thing to note of course is that this is single core. Only one core lights up when running.


The bottleneck is SQLite; you'll get the same performance with any programming language. Rust has nothing to do with this.


How is this not completely I/O bound? What's so cpu intensive that programming language matters?


This isn't about language use. This is about what is possible with a simple setup: a single service in the language of your choice, plus SQLite. The idea is that unless there is a good reason, you probably don't need k8s, a high-availability database server cluster, and a full-on proxy routing layer. Adding more layers of architecture comes with costs, and unless you get something essential in return for paying that cost, you should probably not bother. You may in fact be able to get by with a single server backed by SQLite (and maybe Litestream for backups), and if that is the case, there isn't any reason not to, and a lot of reasons to do so.


Ah. Thanks for the context


How many MB per second is this?

I was inserting data at 5MB/s using MySQL and C++ some years ago.

MySQL was the bottleneck, the C++ part could potentially run many times faster.


Yeah, but has Rust gotten simpler?


They're working on it. For example, knowing that you can use async in some contexts (functions) but not in others (traits) is complexity that needs to be learned. By making it work everywhere, the language becomes simpler.

But no, there will always be some inherent complexity in the language because it has new concepts not present elsewhere. Newer tutorials in novel formats are being made all the time and they help with learning, but there's no way around having to learn new things.
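
To make the functions-vs-traits point concrete, here's a sketch of the situation as it stood when this thread was written: a plain async fn is fine, but putting one directly in a trait needed a workaround such as the async-trait crate (the trait and type names below are just for illustration):

    use async_trait::async_trait;

    // A free async function: always fine.
    async fn fetch_user(id: u64) -> String {
        format!("user-{id}")
    }

    // `async fn` directly in a trait wasn't allowed on stable at the time, so the
    // usual workaround was #[async_trait], which rewrites the method to return a
    // boxed future behind the scenes.
    #[async_trait]
    trait UserStore {
        async fn get(&self, id: u64) -> String;
    }

    struct InMemoryStore;

    #[async_trait]
    impl UserStore for InMemoryStore {
        async fn get(&self, id: u64) -> String {
            fetch_user(id).await
        }
    }

    fn main() {
        let user = futures::executor::block_on(InMemoryStore.get(42));
        println!("{user}");
    }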


I thought there was contention within the community about async?


I think async exists and it works well. There were 2 equally popular runtimes earlier but now there's only 1 that everyone uses. I wouldn't characterise that as contention, just people trying different stuff out.


I still think async/await is a bit of a fad outside web services, and one I can't ignore if I want to use some popular crates.


Tokio is super popular but there are still better runtimes at the moment. The proper design you want is thread per core with some background pool of threads available for work stealing / batch jobs. The reason is that you can completely remove the need for any locking in your critical path. I think tokio is working on making that paradigm available but I don’t know how that’s progressing


For what it’s worth, Glommio is a thread-per-core async executor library for Rust. Comes with io_uring support too. Not as widely supported as tokio, obviously, but still viable enough; worth keeping an eye on IMO.


Yes that’s what I was referring to. I’m using that. ByteDance has a similar one although I haven’t tried it. The annoying thing is how deeply a specific async implementation “infects” all parts of the code. Would be nice for something more general so I could more easily test out individual implementations to compare performance vs having to rewrite a good chunk of the application.


Oh cool.

Yeah, I 100% agree, more executor-neutral APIs would be amazing. Especially for stuff like networking client libs; I wish hyper, Axum, tower, tonic, etc. were all executor-neutral, or at least provided executor specialisations via features or something.


For me, every day.


No idea why this is on the front page. Doesn't seem like anything special. People seem to blindly upvote anything "with Rust", and I've no idea why.


As a guy planning to use SQLite with Rust, it was interesting to me, since you often don't have visibility into the performance cost when you cross the C-Rust barrier or other FFI bridges. You have to write the code first, at which point it's a 70% done deal.

Plus the top comment gave useful additional data.

This saves time for other people looking to make an informed decision.

---

People disliking Rust seem to always be on the lookout for the Boogeyman to me. And degrading the quality of HN's comment sections. :(

Just look at your other replies. Seems that several people got lost on their way to 4chan and Reddit.


I don't dislike Rust, I'm trying to understand the blind upvotes that seem to regularly happen. You don't see "with $OTHERLANGUAGE" in the majority of post titles.

RE my posts, I'm old. I ask questions, I challenge things I think are wrong. I also praise. This is normal interaction, rather than blind cargo cult following.


Every community has bad apples. I've worked with hardcore Rust devs and they hate the cargo culting as well.

What ticks me off is that people single out Rust and that's very visible. Go criticize C/C++ legacy baggage and you'll get a very different engagement. Say something about Python and Golang. Very different responses.

At this point and to me, as a mostly indifferent side observer, people negatively overreact if Rust is praised -- and only cite a bunch of loonies that even the Rust devs and core contributors dislike.

Does not seem like a fair and objective assessment to me.

As techies we should be much better than that. Merit before bias and all that.


How do you know all the others are "blind cargo cult following"? Maybe it's that people who frequent Hacker News are interested in Rust, and as Rust is in the wind, a lot of content is made about Rust, which again means that a lot of content gets upvoted. I don't think it means it's anything nefarious. A new language can get a lot of hype, and a lot of hype means a lot of content. (I say this as someone who hasn't done any Rust other than looking at some intro exercises a bit over a year ago.)


Eh this has happened before on HN. I remember when it was "with Ruby" or "with Go".

Just move on. Rust will pass eventually into the realm of boring and it will be "with Fooboz" or something


It's also that most devs don't have a reasonable estimate of how many inserts/sec a database should be able to handle or how many requests/sec a website should be able to handle, so they upvote and wax ecstatic about some of the most unimpressive stats ever.


Anything with Rust or SQLite is basically instant front page.


And anything opposing it is instant downvote :p


Nah, the WebAssembly version of Postgres got upvoted.

I'm waiting for someone to transpile MySQL to PHP to see what the limits of HN upvotes are, however.


So what you're saying is I have a weekend project now...


It’s a known tactic to get anything on HN frontpage. Just put Rust in the title, even if it’s about a food recipe.


“Baking pancakes with Rust, a delicious recipe!”


No that's r/castiron


This may be a legacy of kuro5hin.


I clicked the link. I read the article. Understood the code. I like rust. I forgot to click upvote button. Thanks for the reminder.


upvoted this post to make this argument



