Using Askgit – A SQL interface to your Git repository

andreypopp · on Aug 16, 2020

What would be nice is a way to plug in language-specific semantic analysers. Then it would be possible to run queries like "which commits changed a specific function" and so on.

JulianWasTaken · on Aug 16, 2020

That's in some sense already native git functionality.

(Though I recognize that the entire askgit here is essentially a layer on top of some existing functionality).

But tracking changes to a specific function is a particularly underused feature in git.

You do `git log -L :nameOfFunction:fileItLives.ext` and git will show you changes to the function over time.

Of course that syntax is completely arcane (which contributes to its underuse), but e.g. for Python I just wrote a trivial thing to let me instead write `git pylog foo.bar.baz` and it translates to the right syntax above (that lives here: https://github.com/Julian/dotfiles/blob/76bd63f6c9a2c650c185...)
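
A minimal sketch of that kind of translation, in case it helps. This is not the linked dotfiles code; the helper name `pylog_args` and the assumption that a dotted name maps directly onto a `.py` path relative to the repo root are both hypothetical:

```python
from pathlib import PurePosixPath


def pylog_args(dotted_name: str) -> list[str]:
    """Translate 'pkg.module.func' into an argument list for `git log -L`.

    Assumption: every part except the last names a package or module, and
    the module lives at the matching .py path relative to the repo root.
    """
    *module_parts, function = dotted_name.split(".")
    if not module_parts:
        raise ValueError("expected at least 'module.function'")
    path = PurePosixPath(*module_parts).with_suffix(".py")
    return ["git", "log", "-L", f":{function}:{path}"]


print(" ".join(pylog_args("foo.bar.baz")))
# git log -L :baz:foo/bar.py
```

The returned list could be handed to `subprocess.run` inside a repository; methods nested in classes would need more care than this sketch takes.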

mattigames · on Aug 16, 2020

I have always wondered why GitHub doesn't add such functionality. It's a no-brainer how much it would help to be able to right-click a function and see how it has changed through history. It could use an interface where you scroll through time (e.g. the more you scroll, the older the versions you see, until you reach the point where it was created); same thing for files.

sigstoat · on Aug 16, 2020

maybe some day after their search develops even the faintest idea of what programming language lexemes look like.

oefrha · on Aug 16, 2020

Presumably you’ve heard about git-blame, which is available on GitHub.

mattigames · on Aug 16, 2020

Yeah, I find it of little use due to it being syntax-agnostic (e.g. it can't tell me about the changes to a function because it doesn't know what a function is). If you are talking about me asking for the same for "files", I meant being able to see all the versions of a file on a single page (maybe using "infinite scroll" in case there are too many).

patrickdevivo · on Aug 17, 2020

I'm the creator of AskGit - and this is something I've been thinking a bit about, unsure of the best way to implement and very open to ideas! Adding language specific analysis (this line is a comment, this is a function name, etc) is really interesting to me and I think could really kick this tool up a notch (there are still some more fundamental needs I'd like to address though, such as diffs and blames, query performance, etc).

I've been considering leveraging the syntax highlighting implementation in editors to make this possible (rather than something as heavy handed as parsing a "universal" AST, though I know there are projects to do this)...

taspeotis · on Aug 16, 2020

Visual Studio has a limited version of what you want: if you start with a function it can show you the revision history.

https://docs.microsoft.com/en-us/visualstudio/ide/find-code-...

I think Rider has it as well.

cube2222 · on Aug 16, 2020

Yep, JetBrains IDEs can show git history for a code selection. I've been finding it immensely useful.

gigatexal · on Aug 16, 2020

I’m an absolute fanboy of all the efforts to bolt on SQL interfaces to things. I love this. Kudos to the creator.

cube2222 · on Aug 16, 2020

If you like this, check out OctoSQL[0], it bolts SQL onto JSON, CSV, Excel and Parquet files.

It also lets you join them with each other (and other databases).

[0]:https://github.com/cube2222/octosql

sirodoht · on Aug 16, 2020

There is also xsv [0] for CSV SQLing

[0] https://github.com/BurntSushi/xsv

gigatexal · on Aug 16, 2020

For this and the above +1 to you both and thanks.

eitland · on Aug 16, 2020

You may already be aware of this, but someone here might like to know that the Fossil DVCS is built on top of SQLite, by the same author (edit, as pointed out by Gaelan:) as SQLite.

Gaelan · on Aug 16, 2020

To be clear, the same author as SQLite, not Askgit.

eitland · on Aug 16, 2020

Thanks for pointing that out. I thought it was obvious from the context, but I'm glad someone tells me when it is not.

gigatexal · on Aug 16, 2020

I did not know that. Thank you!

tyingq · on Aug 16, 2020

SQLite has an API for "virtual tables" that provides a fairly easy way to bolt SQL on top of things. https://www.sqlite.org/vtab.html

I haven't used it directly, but I did use a Perl module that hooks into it, and that made it very easy to cobble together. https://metacpan.org/pod/release/SALVA/SQLite-VirtualTable-0...

thechao · on Aug 16, 2020

The VTAB extension is new. The documentation is a bit thin compared to the rest of SQLite. (To be clear: the documentation is good; just not fantastically awesome like the rest of SQLite.)

The main Achilles' heel of VTAB is wrapping your head around the indexing API: the basic idea is you need to tell the query planner which of a set of columns is best to work with. It is exceedingly unintuitive. This is compounded by the problem that the planner essentially refuses to do binary-search queries on more than one column of a VTAB. (Hell -- getting it to use any binary-search on any index is nearly impossible.) The result is that unless you're in a real bind, you should always translate your custom VTAB data into a reified table.

This is all compounded by the fact that the slightest misstep in the callback API will corrupt/confuse/lock up SQLite (the in-memory representation -- your data is still safe). I'm glad VTAB exists, and I know it's just a side-effect of the compression support, but goddamn it's frustrating for it to be almost awesome...
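
For what it's worth, the "reified table" route mentioned above can be sketched with nothing but Python's built-in `sqlite3` module: copy the rows into an ordinary table, add a regular index, and let the planner do its usual thing. The rows and the column names (`id`, `author_email`, `author_when`, `message`) are made up for illustration and are not meant to reflect AskGit's actual schema:

```python
import sqlite3

# Hypothetical commit rows; in practice these would come from walking the repo.
COMMITS = [
    ("a1f3", "alice@example.com", "2020-08-01", "Fix bug in parser"),
    ("b2e4", "bob@example.com", "2020-08-03", "Add CSV export"),
    ("c3d5", "alice@example.com", "2020-08-09", "Refactor parser"),
]


def build_commit_table(rows):
    """Copy commit rows into an ordinary in-memory table with a normal index."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE commits ("
        "id TEXT PRIMARY KEY, author_email TEXT, author_when TEXT, message TEXT)"
    )
    db.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)", rows)
    # A plain multi-column index, which the planner can use without any
    # custom index-selection callback.
    db.execute(
        "CREATE INDEX commits_by_author ON commits (author_email, author_when)"
    )
    return db


db = build_commit_table(COMMITS)
counts = db.execute(
    "SELECT author_email, COUNT(*) FROM commits "
    "GROUP BY author_email ORDER BY author_email"
).fetchall()
print(counts)
# [('alice@example.com', 2), ('bob@example.com', 1)]
```

The trade-off is that the copy has to be rebuilt or updated whenever the underlying data changes, which a virtual table avoids.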

gigatexal · on Aug 17, 2020

Bummer. Perhaps over time these pain points will be remedied.

shoo · on Aug 16, 2020

It'd be interesting to see some examples of queries that operate on the graph structure of the repo.

Mercurial offers a language where you can build expressions to select revsets:

E.g.

> Changesets mentioning "bug" or "issue" that are not in a tagged release:

`hg log -r "(keyword(bug) or keyword(issue)) and not ancestors(tag())"`

https://hg.mozilla.org/mozilla-central/help/revsets
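
To make the graph part concrete, here is a rough sketch of what that revset computes, written against a toy in-memory history. The commit IDs, messages, and function names are invented, and the keyword check only looks at commit messages, so it is an approximation of Mercurial's `keyword()` rather than a reimplementation:

```python
def ancestors(parents, heads):
    """Return every commit reachable from `heads`, including the heads."""
    seen = set()
    stack = list(heads)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


# Hypothetical linear history c1 <- c2 <- c3 <- c4, with c2 tagged.
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}
MESSAGES = {
    "c1": "Fix bug in init",
    "c2": "Release prep",
    "c3": "Fix issue with paths",
    "c4": "Add docs",
}
TAGGED = {"c2"}


def unreleased_fixes(parents, messages, tagged):
    """Commits mentioning 'bug' or 'issue' that no tagged commit can reach."""
    released = ancestors(parents, tagged)
    return sorted(
        commit
        for commit, message in messages.items()
        if commit not in released
        and ("bug" in message.lower() or "issue" in message.lower())
    )


print(unreleased_fixes(PARENTS, MESSAGES, TAGGED))
# ['c3']
```

Expressing the `ancestors` step in SQL would presumably need a recursive query over a parent/child relation, which is why examples of graph queries would be interesting to see.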

bravura · on Aug 16, 2020

What are good APIs to git, besides git itself? Something high level, ergonomic, and with write operations, not just querying?

gru · on Aug 16, 2020

libgit2[0] has a comprehensive API and bindings for dozens of languages.

I also like go-git[1] - a pure Go implementation with idiomatic Go API and cool features like in-memory filesystem.

[0] https://libgit2.org/

[1] https://github.com/go-git/go-git

danielbigham · on Aug 16, 2020

This is really nice. I experimented converting natural language to git log commands this week (https://twitter.com/danielbigham/status/1294461750251839489), but mapping natural language to git sql might be better and more flexible in some cases.

burstmode · on Aug 16, 2020

If adding a SQL interface makes a system easier to access, that says a lot about how convoluted the original interface is...

Next logical step: a kernel-level SQL interpreter integrated into systemd.

jarym · on Aug 16, 2020

Actually I think it says a lot about how expressive SQL is.

I get why a lot of developers take swipes at SQL - it is a bit different to other languages. But you cannot beat it for what it does.

Another ‘great’ language is XSLT - so far I’ve seen nothing else that comes close for transforming data. It’s just a shame it’s so closely tied to XML, which has understandably fallen out of favour.

rco8786 · on Aug 16, 2020

I’m not sure there’s much truth to your statement. There’s nothing inherently complex about git’s internal data structure nor does putting a SQL interface on something indicate a level of complexity.

teget · on Aug 16, 2020

As arankine noted, there is the platform-independent osquery. There is also SQL for WMI [1], which predates osquery, I believe.

[1]: https://docs.microsoft.com/en-us/windows/win32/wmisdk/queryi...

arankine · on Aug 16, 2020

There is OSQuery (https://osquery.io/)!

cocktailpeanuts · on Aug 16, 2020

Most people only need to know git commit, git push, git pull, and just a handful of commands.

It is by design that something like this is not included in the original interface. It would be terrible design.