
Hey!

OctoSQL[0] author here, this is really impressive! I like this much more than the approach taken by other sqlite-based tools which first load stuff into SQLite and then let you query it.

On the other hand, it does have the downside that it doesn't automatically infer the schema of the input JSON, so you still have to manually parse the raw lines. Maybe it would be possible to surmount this by exposing a JSON-dedicated file-reading function which also does schema inference (I'm not knowledgeable about SQLite internals)?
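For illustration, here's a rough sketch (in Python, with a hypothetical helper name) of the kind of schema inference I mean, assuming newline-delimited JSON input:

```python
import json

# Hypothetical helper: infer SQLite column types from the first
# `sample` lines of a newline-delimited JSON file.
def infer_schema(path, sample=100):
    types = {}
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= sample:
                break
            for key, value in json.loads(line).items():
                t = {bool: "INTEGER", int: "INTEGER",
                     float: "REAL", str: "TEXT"}.get(type(value), "TEXT")
                prev = types.get(key)
                # Widen INTEGER to REAL if both appear for the same key.
                types[key] = "REAL" if {prev, t} == {"INTEGER", "REAL"} else t
    return types
```

A reader like that could declare the inferred columns up front instead of handing back raw lines.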

One piece of feedback is with regard to the benchmarks: I think it would be worth it to add additional benchmarks which work on slightly more complex datasets than the one used here. I did a comparison of this vs OctoSQL on the Brazil dataset, and - as expected - sqlite-lines wiped the floor with it. However, then I ran the following queries on a slightly more complex dataset (the Amazon review dataset in this case, from SPyQL's benchmark notebook[1]):

  ~> time OCTOSQL_NO_TELEMETRY=1 ./octosql "SELECT COUNT(*), AVG(overall) 
                                            FROM books.json
                                            WHERE reviewerName = 'James'"
  +-------+-------------------+
  | count |    avg_overall    |
  +-------+-------------------+
  |  3010 | 4.402325581395349 |
  +-------+-------------------+
  real 0m49.805s
  user 0m32.169s
  sys 0m9.163s
  
  ~> time ./lines0-linux-amd64-sqlite3 :memory: "SELECT COUNT(*), AVG(json_extract(line, '$.overall'))
                                                 FROM lines_read('books.json') 
                                                 WHERE json_extract(line, '$.reviewerName') = 'James'"
  3010|4.40232558139535
  real 1m47.933s
  user 1m27.024s
  sys 0m11.559s
and as you can see, the results go in a very different direction.

But anyhow, congrats on the project, and I'm pumped to see what you come up with next!

[0]: https://github.com/cube2222/octosql

[1]: https://github.com/dcmoura/spyql/blob/master/notebooks/json_...



Hey thanks for sharing!

Re inferring the schema of input JSON: That would be slick! Though SQLite does have some limitations here with table-valued functions vs. virtual tables. I won't go into the specifics, but something like this isn't possible in SQLite:

  select name, age from lines_json_read('students.json')
The "name" and "age" dynamic columns aren't possible when using the "table function" syntax, but something like this is possible using traditional "virtual table" syntax:

  create virtual table students using lines_json_read(filename="students.json");
  
  select name, age from students;
It's a small difference, but definitely possible! Parsing JSON in C is tricky, though, so I'd definitely accept contributions that figure it out.

And re benchmarks - thanks for sharing! Yeah, they're pretty basic, so I'd love to add more complex ones. With the books.json example, I think what's happening is that SQLite's JSON functions re-parse the JSON in every json_extract call - so that query parses each line twice. I also suspect that the long strings in "reviewText" might slow things down, but I can't be sure. Once I get some free time I'll add OctoSQL and this new books dataset to the benchmark suite.
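For what it's worth, one pattern that avoids the double parse is multi-path json_extract (it accepts several paths and returns a single JSON array per call), materialized into an intermediate table so each long line is parsed only once. A self-contained sketch with toy data standing in for books.json:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reviews(line TEXT)")
data = [{"reviewerName": "James", "overall": 5.0},
        {"reviewerName": "Anna", "overall": 3.0},
        {"reviewerName": "James", "overall": 4.0}]
con.executemany("INSERT INTO reviews VALUES (?)",
                [(json.dumps(d),) for d in data])

# Multi-path json_extract parses each line once and emits a small
# JSON array; materializing into a temp table keeps SQLite from
# re-evaluating the expensive parse per reference.
con.execute("""
    CREATE TEMP TABLE parsed AS
    SELECT json_extract(line, '$.reviewerName', '$.overall') AS fields
    FROM reviews
""")
cnt, avg = con.execute("""
    SELECT COUNT(*), AVG(json_extract(fields, '$[1]'))
    FROM parsed
    WHERE json_extract(fields, '$[0]') = 'James'
""").fetchone()
print(cnt, avg)  # prints: 2 4.5
```

Whether that beats two plain json_extract calls in practice would need measuring on the real dataset.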


Another option would be to create the virtual table used for the table-valued function using a schema to define the columns:

    CREATE VIRTUAL TABLE students_json_read USING lines_json_read(schema="students-schema.json");
Then, that one table could be used for multiple files:

    SELECT * FROM students_json_read('school1-students.json');
    SELECT * FROM students_json_read('school2-students.json');
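In spirit, the schema file could just map column names to SQLite types. Here's a toy Python sketch of what such an extension might do internally - the file format and function names are my guesses, not anything the extension actually exposes:

```python
import json
import sqlite3

def create_from_schema(con, table, schema_path):
    """Create `table` with columns taken from a {name: type} JSON file."""
    with open(schema_path) as f:
        schema = json.load(f)
    cols = ", ".join(f'"{name}" {typ}' for name, typ in schema.items())
    con.execute(f'CREATE TABLE "{table}" ({cols})')
    return list(schema)

def load_ndjson(con, table, cols, data_path):
    """Insert one row per newline-delimited JSON line, picking out
    only the columns named in the schema."""
    placeholders = ", ".join("?" for _ in cols)
    with open(data_path) as f:
        con.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            ([json.loads(line).get(c) for c in cols]
             for line in f if line.strip()))
```

The real extension would do this lazily per row in C, of course, but the schema-file idea maps onto it directly.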



