Hacker Newsnew | past | comments | ask | show | jobs | submit | more alexgarcia-xyz's commentslogin

Even though it's written in Rust, you can still use it in other languages like Python or Node.js. It compiles to a shared library file, which most SQLite clients support with `.loadExtension()` or another similarly named method.

This extension isn't the best example, since it's a thrown-together demo, but sqlite-ulid is a similar extension written in Rust that could be run anywhere, not just Rust

That being said, writing in Rust instead of C has many drawbacks (slightly slower, cross compiling, larger binary sizes, WASM is more difficult, statically compiling is complex, etc.). But for cases like this, many SQLite extensions I write in Rust are just light wrappers around extremely high quality Rust crates (like jiff), which makes my life easier and it "good enough"


Why would WASM be any more difficult in Rust than in C?


Because of SQLite. The SQLite official WASM build is a very complicated emscripten build, so if you want to write a SQLite extension for the WASM build, you need to statically compile it into the complex emscripten build process. This is hard to do in general, but especially for Rust

That being said, I find WASM projects that are written entirely in Rust to be pleasant, more pleasant than C WASM projects. But for SQLite extensions specifically, Rust/WASM gets a bit harder


> slightly slower

Because of the FFI overhead?


I think so - I wrote my own Rust FFI bindings for SQLite extensions, and I tried to make it as fast as possible, but a "hello world" extension was still 10-15% slower in Rust than C[0].

That being said, it depends what the extension does - a "hello world" extension mainly just calls the same SQLite C APIs over and over again, so the small Rust layer makes it a bit slower. However, my Rust extensions for regex[1] and CSV parsing[2] are usually faster than the C counterparts, mostly due to less memory allocations and batching. It's not a 1:1 comparison (both extensions have slightly different APIs and features), but I'd say a lot of "real world" features available in Rust can be faster than what's available in C.

That being said, I'm sure someone could write their own faster CSV or regex extension in C that is faster than the Rust ones. But it's a ton of work to do that from scratch, and I find wrapping a pre-existing Rust implementation to be much easier

[0] https://github.com/asg017/sqlite-loadable-rs?tab=readme-ov-f... [1] https://github.com/asg017/sqlite-regex [2] https://github.com/asg017/sqlite-xsv


First, Rust compiled is not wt that performant. Well behind c, go, c++,...

But also because SQLite is based on crazily optimized C!


Internally vectors (at least float vectors) get converted to float * arrays, where the vectors lengths are manually checked. If you provide a vector in JSON form, then it parses that JSON into a float * array. Vectors must be the same length for distance operations.


I plan to have v0.1.0 in about a month or so! Definitely will include a ton of docs and a quickstart guide. There's an undocumented pip package "sqlite-vec" that you might be able to use now, if you wanted to call it from your Python "Agent Backend" directly.


Only public SQLite APIs! So no need to append to amalgamation.

I'm a big fan of your wazero SQLite bindings! I actually plan on providing 1) CGO bindings to sqlite-vec and 2) a custom WASI build sqlite-vec that can be used in go-sqlite3 directly. My plan was to build a sqlite3.wasm file using the build scripts in your repo. If you did want to support it in your project directly, I think you could just drop the sqlite-vec.c/h file in go-sqlite3/sqlite3 and be good to go

Re incremental Blob I/O: I learned that the hard way! It's definitely the limiting factor on query speed for sqlite-vec. I've found that keeping the chunks relatively low in size (low MB's) and increasing the page_size strikes a good balance, but page_size in particular has consequences. PRAGMA mmap_size also helps out a ton, since it seems to keep pages in memory and makes overflow lookups faster, but that of course means a ton more memory usage. It's a difficult balance!


OK, when you feel it's time, feel free to ping me on my repo, and I'll look into it.

A custom Wasm blob definitely works: it's an explicit design goal of my bindings that one can bring their own Wasm (e.g. because one wants to configure the build differently, or bought the SEE extension, or something). And if your extension turns out much like FTS5 that would work.

Still, "one big Wasm blob" is less flexible than I'd like, because all connections in an app (currently? maybe this could be lifted) need to use the same Wasm blob. Then (and if you want to import different extensions...) it becomes a fight over who got to initialize the global variable last.

So, I've been on-and-off looking into what it would take to "dynamically link" a C extension as a second Wasm, but... I'm not even sure it's possible.


Ya dynamically linking extensions in WASM would be amazing, but really hard to see how it would work. DuckDB was able to figure it out[0], but I struggle to see how it would work for SQLite. Though if anything, I think your library would make it easier to solve, since you don't have to fight with the browser/JavaScript

[0] https://duckdb.org/2023/12/18/duckdb-extensions-in-wasm.html


Thanks for the kinds words! Really cool to hear that people were able to use sqlite-vss, lol.


As of now for this beta release not really, but after v0.1.0 is released, will definitely have contributing guides + issues that people could tackle!


Oooh I haven't thought of seperating the SQLite stuff and the vector search stuff. Good idea, will keep that in mind!


Yes it is! Sorry for not following up there. Actually, when I first read that ticket, it started me down the rabbit-hole of "how can I make sqlite-vss" better, which eventually turned into "I should make sqlite-vec." So thanks for helping me go down this path!

With sqlite-vec's builtin binary quantization, you should be able to do something like:

  CREATE VIRTUAL TABLE vec_files USING vec0 (
    contents_embedding bit[1536]
  );

  INSERT INTO vec_files(rowid,contents_embedding) 
    VALUES (
      (1, vec_quantize_binary( /* 1536-dimension float vector here*/))
    )


Hey, you’re welcome! Looking forward to trying this at some point. Sooner if rust bindings are available, later if I have to contribute them.

My embedding is already binary, presumably I can bind a blob instead of the call to vec_quantize_binary?


Yup, you'll be able to bind blobs, you'll currently have to do use vec_bit(?) which is a bit awkward, but it works!


It could! It's based on the official SQLite WASM build, so you can use the same persistent options[0] that are offered there. Not sure if IndexedDB is specifically supported, but localStorage/OPFS VFS is available.

[0] https://sqlite.org/wasm/doc/trunk/persistence.md#kvvfs


OP is correct, the official WASM SQLite build is a sync only build and doesn't support async VFSs (which a IndexedDB VFS needs to be as it's an async api, unless you load the whole file into memory).

The best option for IndexedDB backed WASM SQLite is wa-sqlite, it offered both a sync and async build and a bunch of different VFSs, including an IndexedDB one. Note however that the async builds add significant overhead and reduce performance.

The most performant VFS is the "OPFS access handle pool" VFS that Roy developed for wa-SQLite and the official build also adopted as an option. That would be my recommendation for now.


I definitely plan to! I have a much larger list of SQLite extensions I've built here: https://github.com/asg017/sqlite-ecosystem

Here's a few other references you may enjoy if you wanna learn more about SQLite extensions:

- The single source file for sqlite-vec: https://github.com/asg017/sqlite-vec/blob/main/sqlite-vec.c

- sqlean, a project from Anton Zhiyanov which is good base of great SQLite extensions: https://github.com/nalgeon/sqlean

- The official SQLite docs: https://www.sqlite.org/loadext.html

- The "hello world" SQLite extension example: https://www.sqlite.org/src/file/ext/misc/rot13.c


Awesome! Thanks


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: