Simon already gave a great intro to my SQLite extensions, and I just want to point folks to Anton Zhiyanov's sqlean project, for pure-C extensions: https://github.com/nalgeon/sqlean
Also, some of these extensions are packaged as Datasette plugins, so you can run `datasette install datasette-sqlite-regex` or `datasette install datasette-sqlite-ulid` to add these extensions to your Datasette instances! https://docs.datasette.io/en/stable/plugins.html
Have you looked into cross-compiling those ARM targets using Docker and GitHub Actions? I'm not familiar with the toolchain you'd have to use, but other projects do that.
I've tried it! For Rust SQLite extensions, I'm able to use cross compilers for some extra ARM targets on Linux[0], I just haven't had the time to do it for all extensions yet. For macOS ARM builds, Cargo has the `aarch64-apple-darwin` target, which works but is a bit slower than expected at runtime. I've also tried Zig for cross-compiling C extensions to different platforms, which seemed to work OK, but I've heard of some perf issues cropping up with those.
Not yet! SpatiaLite is a very big and intimidating codebase that's hard to compile. I've been mulling the idea of creating several smaller GIS SQLite extensions, especially since SpatiaLite development is starting to go stale, but nothing to share for now.
A soon-to-be-released sqlite-loadable extension that's the fastest CSV parser for SQLite, and rivals DuckDB's parser at non-analytical queries: https://github.com/asg017/sqlite-xsv
The API for virtual tables is tricky, because you never consume the API from Rust yourself: SQLite does. However, as it is presented, I think the interface is unsafe.
In the sqlite-xsv VTab::open impl for XsvTable [0], you pass a &'vtab XsvTable to XsvCursor. But that means that if open is called again, to create a second cursor, a `&mut XsvTable` and a `&XsvTable` reference exist at the same time, which is unsafe. Note that sqlite-xsv doesn't actually keep the `&XsvTable`, so it's fine, but it serves as an example of where a safety problem could surface.
The same problem also applies to VTabWriteable::update. The fix is to make both of these methods receive an immutable reference and force implementors to use interior mutability. Note this isn't hypothetical, it's actually unsafe, and would appear if you had a unit test that implemented a writable virtual table backed by a rust data structure, and attempted to iterate and update at the same time.
I have an as-yet-unpublished code base that experiments with this. It's not published yet because it doesn't cover everything I want it to, and I'm trying to avoid publishing something where the API might have to break. My goals are explicitly safety first, performance second. https://github.com/CGamesPlay/sqlite3_ext
Please consider a GIS/spatial extension. SpatiaLite is pretty bad; for example, the KNN feature is deprecated and the replacement hasn't made it into a release yet.
Yeah, "don't make network requests from the database" is a common antipattern, but it doesn't quite apply to sqlite-http. It's mainly meant for data processing scripts that use SQL, typically on a local database file with pretty low risk. You probably wouldn't want/need sqlite-http in a SQLite DB that a live application uses, unless you use the no-network[0] option
Re pipelining: unless Go's net/http package does this magically under the hood, sqlite-http doesn't take advantage of it! If you think it would be a nice feature, file an issue and I'll see what can be done
How complex is your text processing? If I'm working with a small dataset, I typically just save the "raw" responses into a table and create several intermediate tables that extract the data I need. Something like:
create table raw_responses as select http_get('https://...') as response_body;
create table intermediate1 as select xxxx(response_body) as message from raw_responses;
create table final as select message from intermediate1;
(some CTEs[0] will be very useful here)
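That pipeline, runnable end-to-end with Python's built-in sqlite3 (hard-coded strings stand in for http_get's responses, since the extension isn't loaded here, and the `substr` extraction is just a made-up example of "the data I need"):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Stand-in for `create table raw_responses as select http_get(...)`:
# the "response bodies" here are just hard-coded strings.
db.executescript("""
    create table raw_responses(response_body text);
    insert into raw_responses values ('hello: alpha'), ('hello: beta');

    -- intermediate table that extracts just the piece we care about
    create table intermediate1 as
      select substr(response_body, 8) as message from raw_responses;

    -- final, cleaned-up table
    create table final as select message from intermediate1;
""")

print(db.execute("select message from final order by message").fetchall())
# [('alpha',), ('beta',)]
```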
The "lambda" pattern described about half-way through this post might be useful as well, if you need to do some complex text processing.
Go's HTTP client doesn't support pipelining in HTTP/1.1. It will do keep-alive / socket re-use. HTTP/2+ has support for request multiplexing, which Go supports. Your code appears to create a new client on each request, so I'm not sure if the underlying transport will be re-used or if any multiplexing will take place. I would be surprised if it is.
On SSRF: There is a "no network"[0] option for the extension, if you don't trust SQL input. This is what the example Datasette server[1] uses
Also of note: this is mainly meant for local data processing; I doubt there's a need for sqlite-http in a SQLite DB used by regular applications. If you're just wrangling some data locally with SQLite and these extensions, I doubt there's much of a security risk
Not an extension, but I wanted to add that sqlite-utils is an incredible Python library for working with SQLite dbs. Great for squashing several lines of SQL into a one-liner.
totally not dumb - in general there isn't much upside. Maybe there's a use case in server-side rendering HTML where you already use SQLite somewhere, or in a CLI tool if you're using sqlite3. But I doubt sqlite-html will be as fast as other tools. The querying functions are much more useful than the generating ones, IMO
Re inferring the schema of input JSON: That would be slick! Though SQLite does have some limitations here with table-valued functions vs. virtual tables. I won't go into the specifics, but something like this isn't possible in SQLite:
select name, age from lines_json_read('students.json')
The "name" and "age" dynamic columns aren't possible when using the "table function" syntax, but something like this is possible using traditional "virtual table" syntax:
create virtual table students using lines_json_read(filename="students.json");
select name, age from students;
It's a small difference, but definitely possible! Though parsing JSON in C is tricky, I'd definitely accept contributions that figure it out.
And re benchmarks - thanks for sharing! Yeah, they're pretty basic, so would love to add more complex ones. With the books.json example, I think what's happening is that SQLite's JSON functions parse the JSON from scratch in each json_extract call - so it parses twice for each row in that query. I also suspect that the long strings in "reviewText" might slow things down, but can't be sure. Once I get some free time I'll add OctoSQL to the benchmark suite, along with this new books dataset
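To illustrate, here's a sketch with Python's built-in sqlite3 (assuming a SQLite build with the JSON functions, and a made-up one-row books table): json_extract also accepts multiple paths, so you can parse each document once per row instead of once per extracted column:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table books(body text)")
db.execute("""insert into books values ('{"title": "Dune", "reviewText": "great"}')""")

# Two calls: the JSON body is parsed twice per row
two = db.execute(
    "select json_extract(body, '$.title'), json_extract(body, '$.reviewText') from books"
).fetchone()

# One call with two paths: parsed once per row; the result
# comes back as a single JSON array
one = db.execute(
    "select json_extract(body, '$.title', '$.reviewText') from books"
).fetchone()

print(two)  # ('Dune', 'great')
print(one)  # ('["Dune","great"]',)
```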
Hmm, so you have lines that have fields in fixed indexed positions? Maybe something like this:
$ cat hn.txt
11alex    a
12brian   b
12craig   c
$ cat test.sql
.header on
.mode box
.load dist/lines0
select
  substr(line, 1, 2) as id,
  substr(line, 3, 7) as name,
  substr(line, 11, 2) as age
from lines_read('hn.txt')
$ sqlite3x :memory: '.read test.sql'
┌────┬─────────┬─────┐
│ id │ name    │ age │
├────┼─────────┼─────┤
│ 11 │ alex    │ a   │
│ 12 │ brian   │ b   │
│ 12 │ craig   │ c   │
└────┴─────────┴─────┘
Unfortunately SQLite's substr function is a bit awkward, but if each field has constant indices it may work
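The same trick runs anywhere SQLite does; here's a sketch with Python's built-in sqlite3 and no extension (a union-all subquery stands in for lines_read, and the field widths are made up for this sample data):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Fixed-width layout for this sample: chars 1-2 = id, 3-9 = name, 10-11 = age
rows = db.execute("""
    select
      substr(line, 1, 2)       as id,
      trim(substr(line, 3, 7)) as name,
      substr(line, 10, 2)      as age
    from (select '11alex   30' as line
          union all
          select '12brian  25')
""").fetchall()

print(rows)  # [('11', 'alex', '30'), ('12', 'brian', '25')]
```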
Hey, author here, happy to answer any questions! Also check out this notebook for a deeper dive into sqlite-lines, along with a slick WASM demonstration and more thoughts on the codebase itself https://observablehq.com/@asg017/introducing-sqlite-lines
I really dig SQLite, and I believe SQLite extensions will push it to another level. I rarely reach for Pandas or other "traditional" tools and query languages, and instead opt for plain ol' SQLite and other extensions. As a shameless plug, I recently started a blog series on SQLite and related tools and extensions if you want to learn more! Next week I'll be publishing more SQLite extensions for parsing HTML + making HTTP requests https://observablehq.com/@asg017/a-new-sqlite-blog-series
I'm so interested to hear about the HTTP requests stuff.
I've played a tiny bit with SQLite Python extensions that perform HTTP requests, but I'm very unsure of the implications of this kind of thing.
What happens to the rest of the SQLite query engine if your custom function hits a page like https://httpbin.org/delay/10 with a deliberate 10 second delay before the request returns?
Maybe this is fine - your transaction takes 10 seconds longer but everything else continues to work as normal. I've not done the work to figure that out.
Yup, if it hits https://httpbin.org/delay/10 then the entire script will delay for 10 seconds (unless you have a custom lower timeout setting). Which means you can only make HTTP requests serially, which sucks, but it's much simpler than coordinating with other tools
It'll come out in a week, but lmk if you want to play with it beforehand!
I think you're right, though I'm not an expert with multiple connections. I think if the DB is in WAL mode and you're only issuing read-only requests, then there will be no lock and long HTTP requests won't lock anything up. But if you issue a write that takes the write lock, then other writes will be blocked for the duration of those long requests.
But if you're in rollback-journal mode, then there's not much you can do. Please correct me if I'm wrong!
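To see the WAL behavior concretely, here's a sketch with Python's built-in sqlite3 (using a temp file, since WAL needs a file-backed database): a reader on a second connection isn't blocked by a writer's open transaction; it just sees the last committed snapshot:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit BEGIN/COMMIT
mode = writer.execute("pragma journal_mode=wal").fetchall()[0][0]  # 'wal' on success
writer.execute("create table t(x)")
writer.execute("insert into t values (1)")

# Writer opens a transaction and makes an uncommitted change
writer.execute("begin immediate")
writer.execute("insert into t values (2)")

# A second connection can still read; it sees the committed snapshot
reader = sqlite3.connect(path)
before = reader.execute("select count(*) from t").fetchall()[0][0]
print(before)  # 1

writer.execute("commit")
after = reader.execute("select count(*) from t").fetchall()[0][0]
print(after)   # 2
```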
How difficult do you think it would be to compile SpatiaLite (a big hairy C codebase) to WASM and load it as a dynamic module, like you've done with sqlite-lines?