The Database Inside Your Codebase (feifan.blog)
145 points by todsacerdoti on Feb 16, 2021 | 84 comments


Check out Aquameta, a web dev stack built entirely in PostgreSQL (self-plug): http://github.com/aquametalabs/aquameta

There's a ton down this rabbit hole. One of the great anomalies of our industry is that we've used the database to bring coherence to countless "user domains", but never applied the same principles to our own stack. The benefits of doing so compound exponentially.


So basically Oracle Apex, oracle html db before that, and end of the 90s we were generating this from Oracle Designer 2000: low-code tools to generate web applications from the database...


When I hear "low code" I get a little skeptical. Often such tools make the first 80% of the project simpler but the last 20% a bear, unless you live with clunky defaults. There's a difference between managing code better and removing code. Generally I find the best path to "low code" is to write small simplification wrappers that fit your shop's conventions, because a big vendor probably won't fit your shop's conventions out of the box.

Thus, you can write code like currentForm.AddButton("clickMe", destination: screenY); and all the styling etc. is done by your shop's wrapper to fit your shop's preferences. The wrapper won't fit all needs, but if it fits 90% of buttons, then you only have to specialize 10% of them. I don't know why people tolerate copy and paste of verbose snippets for such things. Wrap the repetitive clutter away to make it easier to grok your primary work.

I like optional named parameters such that customization is incremental:

currentForm.AddButton("clickMe", destination: screenY, color: "green");

currentForm.AddButton("clickMe", destination: screenY, style: "compact", animation: "images/wiggle7.gif");

It's kind of like query-by-example: only specify the constraints you need instead of every potential attribute under the sun.
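To make the wrapper idea concrete, here is a minimal Python sketch of the same pattern. Everything in it (the `Form` class, `add_button`, and the `SHOP_DEFAULTS` values) is hypothetical and only stands in for whatever a shop's own wrapper would look like.

```python
from dataclasses import dataclass, field

# Hypothetical shop-wide defaults; the option names are illustrative only.
SHOP_DEFAULTS = {"color": "blue", "style": "standard", "animation": None}


@dataclass
class Form:
    buttons: list = field(default_factory=list)

    def add_button(self, name, destination, **overrides):
        """Add a button, filling in shop defaults for anything not specified."""
        unknown = set(overrides) - set(SHOP_DEFAULTS)
        if unknown:
            raise TypeError(f"unknown button options: {sorted(unknown)}")
        # Later keys win, so explicit overrides replace the defaults.
        button = {"name": name, "destination": destination,
                  **SHOP_DEFAULTS, **overrides}
        self.buttons.append(button)
        return button


current_form = Form()
plain = current_form.add_button("clickMe", destination="screenY")
green = current_form.add_button("clickMe", destination="screenY", color="green")
```

The common case stays a one-liner, and each specialization only names the attributes that differ from the defaults.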


I worked on Oracle Developer/Designer in the 90s. Was pretty good. I also went to the Oracle course at Reading for MOD PL/SQL which Htmldb and Apex were based on. There is still quite a bit of Apex work around, and all the oracle "applications" of course as well.


This is wild and ridiculous and I am genuinely excited to try it in a project.



Whoa, this project looks really cool. I'm trying to read the docs though, and there's a lot of broken links — e.g. all the "Backend Documentation" links here: https://github.com/aquametalabs/aquameta/tree/master/docs


Yes, sorry. It's in a state of major upheaval. I'll get it cleaned up ASAP. Nice to see there's actually some interest in this direction from others!


Oh the Eric Hanson I guess? I remember watching an interview with you where you said something to the effect "back then I had time and money on my hands so I went and tried to build this thing. Four years later, and none of that is true anymore". That stuck with me.


Lucid put C++ into a database back in the day. Didn't catch on, sadly.


Dudes and dudettes let me tell you about this sweet technology called Oracle Forms.


I've been playing with some ideas for creating a SQLite database of classes, functions and suchlike found in Python code, so I can analyze my codebases with SQL queries.

I've had some good initial results with https://github.com/davidhalter/jedi - which is the Python introspection library that powers various editor autocomplete implementations. I have a prototype which uses that to create a SQL database of functions, classes and places that they are used.

I've also been playing with https://github.com/github/semantic - it can parse Python, JavaScript and other languages and offers a --json-symbols option which dumps out a JSON object showing the symbols (functions, variables etc) found in the code.
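For anyone who wants to try the general idea without extra dependencies, here is a rough sketch using only Python's standard `ast` and `sqlite3` modules. It is not the Jedi-based prototype described above; the `symbols` table, the `index_source` helper, and the sample source are assumptions made up for illustration.

```python
import ast
import sqlite3

# Made-up sample code to index.
SOURCE = '''
class Job:
    def run(self):
        return helper()

def helper():
    return 42
'''


def index_source(conn, filename, source):
    """Record every class and function definition found in `source`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS symbols "
        "(file TEXT, kind TEXT, name TEXT, line INTEGER)"
    )
    kinds = {
        ast.ClassDef: "class",
        ast.FunctionDef: "function",
        ast.AsyncFunctionDef: "function",
    }
    for node in ast.walk(ast.parse(source)):
        kind = kinds.get(type(node))
        if kind:
            conn.execute(
                "INSERT INTO symbols VALUES (?, ?, ?, ?)",
                (filename, kind, node.name, node.lineno),
            )


conn = sqlite3.connect(":memory:")
index_source(conn, "example.py", SOURCE)
rows = conn.execute(
    "SELECT kind, name, line FROM symbols ORDER BY line"
).fetchall()
```

Once the definitions are rows, questions like "which files define the most classes?" become ordinary `GROUP BY` queries. Tracking where symbols are used would need real name resolution, which is the part a tool like Jedi handles.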


Oh this sounds cool! When I was writing this post I was thinking about ASTs and transforming them into domain-specific semantic graphs.

E.g. that `run_config` example would be a generic "method call" node in an AST, but a domain-specific AST crawler could recognize that a "method call node whose name is `run_config`" should be replaced with a semantically-meaningful `run_config` node.

And maybe that would be an interesting way to build up a conceptual graph of a codebase?
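A small sketch of what such a domain-specific crawler might look like with Python's `ast` module. The job classes and the exact shape of the `run_config(...)` call are guesses for illustration, and the sketch assumes all arguments are keyword literals.

```python
import ast

# Hypothetical job definitions; the real codebase may look quite different.
SOURCE = '''
class NightlyReport:
    config = run_config(schedule="0 3 * * *", retries=2)

class Cleanup:
    config = run_config(schedule="@hourly")
'''


class RunConfigFinder(ast.NodeVisitor):
    """Collect a domain-level record for each generic call named `run_config`."""

    def __init__(self):
        self.configs = []
        self._class_stack = []

    def visit_ClassDef(self, node):
        # Remember which class we are inside so calls can be attributed to it.
        self._class_stack.append(node.name)
        self.generic_visit(node)
        self._class_stack.pop()

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == "run_config":
            # Assumes every argument is a keyword with a literal value.
            params = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            owner = self._class_stack[-1] if self._class_stack else None
            self.configs.append({"job": owner, **params})
        self.generic_visit(node)


finder = RunConfigFinder()
finder.visit(ast.parse(SOURCE))
```

After the visit, `finder.configs` holds one dict per job, which is the "semantically meaningful node" in its simplest form and could be loaded straight into a table.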


Although I understand what we're trying to achieve here is easier code navigation and manipulation on a much broader scale with preferably a single tool, I couldn't shake the feeling that some of the examples you gave have existing solutions.

The `run_config` one stood out the most. Isn't it perfectly feasible to have a `jobs` and a related `run_configs` table, instead of them only being objects? You could even put them in a separate database, much like one does with the auditing tables in a Rails application, or with reporting tables that need to be accessible by the likes of PowerBI. That separate database could have its own light-weight GUI, something that just plugs your data structure into the interface, like Django's admin back-end.


Sure, but then you'd have the "config" for a job separate from its implementation. If I'm looking at the code for a job, how do I easily find when it's scheduled to run? If I'm looking at the config database, how do I easily get to the implementation of the job?

Putting aside the feasibility of migrating a large codebase that exists one way or the other, my point here is that these shouldn't be separate tools.


I'm with you on this one. I'd actually suggest https://github.com/CoatiSoftware/Sourcetrail could be extended to do this, though I haven't found the time yet for my own codebases. For example https://github.com/CoatiSoftware/SourcetrailPythonIndexer and under the hood the file format is SQLite: https://github.com/CoatiSoftware/SourcetrailDB


do you have a repo? this sounds cool


I think you could drop the first "base" and just say "the data inside your codebase"

Generally speaking if declarative data can be factored out of part of my code, I try and do that (in practice I'm usually doing this in JS, which is well-suited to that style). This makes it less opinionated about when and where it gets executed, or even how. It is simply information. It also allows for some neat reflective stuff as the author describes.

Incidentally this has always been the calling card of Lisp: "code as data". I don't love Lisp as a whole, personally, but this is a powerful benefit of it


Really interesting piece. I like how it discusses various seemingly disparate issues and ties them together.

> In our codebase, we have hundreds of these jobs, each with their own run_config. If you look at this codebase as a database, you can imagine these jobs being rows in a jobs table, where the fields on each row correspond to the parameters provided to the RunConfig class.

> Wouldn’t it be great to be able to browse all these jobs in a table?

I know the article suggests a database / table, and that could work, but I feel like there is a suite of things you'd want on top of that which aren't given for free -- JSON validation, history, libraries with caching, namespaces, etc.

For this exact subproblem, I created www.config.ly .


I had this realisation too, after spending countless hours searching the code bases with regexes to try to extract valuable information.

Right now we have IDEs that provide rich language support, but they only expose extremely basic traversals over language symbols. For example, I can't semantically query for all the deprecated classes in a particular package that have 3 or more instances.

Right now we have to resort to grep and other text-finding tools to achieve the same functionality, which is neither efficient nor reliable.

Once we can semantically query our code bases using a rich expressive language then we could visualise the system and gain richer understanding.

Some of these ideas are being implemented in https://gtoolkit.com/, a fantastic tool built on one simple yet powerful idea: that each object can be represented in multiple forms, not just syntactic code.
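As a rough illustration of the kind of query meant above, suppose an indexer had already filled two tables. The `classes` and `instantiations` schemas and the sample rows below are invented for this sketch; no existing tool is implied to produce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema an indexer might populate.
    CREATE TABLE classes (name TEXT PRIMARY KEY, package TEXT, deprecated INTEGER);
    CREATE TABLE instantiations (class_name TEXT, file TEXT, line INTEGER);
    INSERT INTO classes VALUES
        ('OldParser', 'legacy', 1),
        ('OldWriter', 'legacy', 1),
        ('NewParser', 'legacy', 0);
    INSERT INTO instantiations VALUES
        ('OldParser', 'a.py', 10), ('OldParser', 'b.py', 4), ('OldParser', 'c.py', 7),
        ('OldWriter', 'a.py', 22),
        ('NewParser', 'a.py', 3), ('NewParser', 'b.py', 9), ('NewParser', 'c.py', 1);
""")

# Deprecated classes in one package that are instantiated 3 or more times.
QUERY = """
    SELECT c.name, COUNT(*) AS uses
    FROM classes AS c
    JOIN instantiations AS i ON i.class_name = c.name
    WHERE c.package = ? AND c.deprecated = 1
    GROUP BY c.name
    HAVING COUNT(*) >= 3
"""
heavily_used_deprecated = conn.execute(QUERY, ("legacy",)).fetchall()
```

The hard part is populating the tables accurately, not the query itself, which is a plain join with a `HAVING` clause.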


This is one succinct summary of what Glamorous Toolkit is!


Great post.

I see two big challenges.

1. How do you do this without changing the way we program? The most value I would get from this kind of tool is being able to spelunk in a huge codebase with lots of legacy gunk. The cost of entry for this cannot be "rewrite all the codes."

2. What does the query language look like and how do you expose this to the user? This is especially challenging because the query language needs to serve so many different roles.

I'm excited to see what comes of this as I see this problem as a blocker for the next generation of software development. We've more or less solved infrastructure, yet taming complexity still eludes us.

Shameless plug, with a bit of fortuitous timing:

I just released an alpha CLI for my project SourceScape.

It's a tool that indexes your Typescript and Javascript code (Ruby coming soon.) You can then query your code by code structure instead of just raw text. As a trivial example, you can search for all classes with a render method that returns a jsx element with name div.

Install instructions here: https://github.com/sourcescapeio/sourcescape-cli#install

Marketing materials (a bit outdated, but looks pretty): https://sourcescape.io


Re: "How do you do this without changing the way we program?"

Let's change the way we program. If we outgrow trees, we've outgrown trees. I don't know if existing code will be easy to convert. It's kind of like when OOP IDEs fell out of favor for web stacks: the old OOP classes couldn't be reshaped for web because the web is state-poor and OOP is state-rich. We had to dump OOP libraries and stacks into the trash.

Re: "What does the query language look like and how do you expose this to the user? This is especially challenging because the query language needs to serve so many different roles."

I suggest using existing RDBMS and SQL. There's no reason to reinvent the wheel unless inherent flaws can be found in RDBMS for code management purposes. I haven't found any yet in my little experiments in "table oriented programming".

I take that back a bit. Different UI components/widgets will have different attribute structures. It's hard to change the schema every time a new widget type is added. Possible solutions include Dynamic Relational, or a Windows-Registry-like "attribute tree" for UI models. Attributes of UI widgets then could be accessed like:

     setAttrib("screenX.widgetY.maxHeight", 37);
     // context-based shortcut:
     setAttrib("currentWidget.maxHeight", 37);
The second form is handy because instead of writing explicit loops, event triggers may be used to traverse and customize per-widget behavior. The traversal mechanism will create a reference to the "current" item per behind-the-scenes looping to simplify path references. More samples:

     // sequence control (display order):
     setAttrib("screenX.widgetY.sequence", 47.5);
     // deactivate:
     setAttrib("screenX.widgetY.active", false);
     // alternative:
     setAttrib("screenX.widgetY.status", "inactive");
     // more uses of "status" attrib:
     setAttrib("screenX.widgetY.status", "hidden");
     // Grid samples:
     setAttrib("screenX.grid4.row.7.col.9.value", "foo");
     setAttrib("screenX.grid4.row.7.backGroundColor", "green");
Helper APIs would probably simplify grid path management.
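For what it's worth, the attribute-tree idea can be prototyped in a few lines. This Python sketch uses nested dicts and a simple alias table to stand in for the "current" item; the `AttributeTree` class and its method names are made up for illustration and are not taken from any existing tool.

```python
class AttributeTree:
    """A nested-dict store addressed by dotted paths, with path aliases."""

    def __init__(self):
        self._root = {}
        self._aliases = {}

    def set_alias(self, alias, path):
        # e.g. a traversal loop would point "currentWidget" at each widget in turn.
        self._aliases[alias] = path

    def _resolve(self, path):
        head, _, rest = path.partition(".")
        if head in self._aliases:
            path = self._aliases[head] + ("." + rest if rest else "")
        return path.split(".")

    def set_attrib(self, path, value):
        *parents, leaf = self._resolve(path)
        node = self._root
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value

    def get_attrib(self, path, default=None):
        node = self._root
        for key in self._resolve(path):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node


tree = AttributeTree()
tree.set_attrib("screenX.widgetY.maxHeight", 37)
tree.set_alias("currentWidget", "screenX.widgetY")
tree.set_attrib("currentWidget.active", False)
tree.set_attrib("screenX.grid4.row.7.col.9.value", "foo")
```

No schema change is needed when a new widget type shows up with new attributes, which is the point of the tree form. The trade-off is that nothing validates attribute names or types.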


I'm all for a rewrite of existing systems, but I think that's going to be a long-term thing (5-10 years) that will happen in parallel to developments in treating code as a database.

I've found RDBMS is quite limited in terms of graph traversal. Maybe you've found a way around this? Would be happy to riff on that.
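One workaround that comes up in this context is the recursive common table expression, which SQLite and most other SQL databases support. A minimal sketch, assuming a hypothetical `calls` table of caller/callee edges:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical call graph stored as an edge list.
    CREATE TABLE calls (caller TEXT, callee TEXT);
    INSERT INTO calls VALUES
        ('main', 'load'), ('main', 'report'),
        ('load', 'parse'), ('parse', 'tokenize'),
        ('report', 'format');
""")

# Everything transitively reachable from a starting function.
# UNION (not UNION ALL) drops duplicates, so cycles terminate.
REACHABLE = """
    WITH RECURSIVE reachable(name) AS (
        SELECT callee FROM calls WHERE caller = ?
        UNION
        SELECT calls.callee
        FROM calls JOIN reachable ON calls.caller = reachable.name
    )
    SELECT name FROM reachable ORDER BY name
"""
reachable_from_load = [row[0] for row in conn.execute(REACHABLE, ("load",))]
reachable_from_main = [row[0] for row in conn.execute(REACHABLE, ("main",))]
```

This covers transitive closure, but path-pattern queries get verbose quickly, so it may not remove the limitation so much as push it further out.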


C# and Visual Studio just does the whole 'Where is this used?' thing out of the box.

And one of the intentions of the Roslyn compiler was to open up code analysis to the IDE even more, I believe.


I am really glad they mentioned Smalltalk! I recently had a chance to experience Smalltalk and it was just mind blowing. Check out a video demo of the code introspection and visualization at minute 17 in https://m.youtube.com/watch?v=baxtyeFVn3w


That was a nice demo, indeed. If you liked that one, you might want to take a look at gtoolkit.com, too.


It's really a shame the database is not at the base of what we're doing. We're inventing and re-inventing tons of file formats and ways to persist data without using the database, we have to eternally worry about race conditions in the file system and stuff, and all the while most of our concerns would be well served by an RDBMS. To be sure, we still need file formats and persisting data to files so we can conceptually read, understand, and pass around our data, but when we work on it, when we deal with data, we should most of the time do that with a database. Except we don't. Just look at the mess that querying installation data from APT is. Startup time, functionality and search time of the synaptic GUI are abysmal.


At r2c, we've been using our own https://semgrep.dev/ to search across our repositories and take inventory of certain things. For instance, we have a Semgrep rule that finds and extracts API routes and whether they require authentication.

See page 27 for a visualization (PDF warning): https://owasp.org/www-chapter-dorset/assets/presentations/20...


This brings back thoughts I've had that we should be working with normalized data in code, rather than thick objects. I think the main reason we don't is because there's a lack of tooling around it in our languages. I think a system/language/library designed around this could solve some of the problems in the article as well. First class support for having only one value for a given domain specific id, relations, declaratively describing constraints, and strong querying seem like they would be very helpful to lots of programming problems in complex apps.


For a while I've been saying our stacks are outgrowing hierarchical file systems. Hierarchies force one trait to have more power than another, even though each may be very important at different times. There may still be inherent hierarchical patterns to the code, but that doesn't have to be our only view.

Further, during compilation a file hierarchy may still be needed, but the IDE doesn't necessarily have to force that view during the normal course of dev work. It's mostly to make the compiler happy, not the developer.

Related ideas:

https://www.reddit.com/r/AskProgramming/comments/j6uivx/crud...

I'm glad to see experiments along that line. I doubt anybody will get it right the first time, but technology often requires experimentation to make it practical. I just know that hierarchies are too constraining for non-trivial code bases.

Typical CRUD stacks can be viewed as a big pile of event handlers. How all these events are listed, sorted, searched, filtered, etc. should be based on RDBMS or RDBMS-like tools to give as-needed views via queries and query-like interfaces, such as Query-By-Example.


It feels like DB technology is catching up to the cloud era. Back in the 2000s it was common for entire products to effectively be written as stored procedures, which carried a lot of benefits when it came to data consistency and performance. I'd bet good money that if Oracle didn't charge a C round, and it was easy to scale out RDBMSs with HA in the cloud that this era wouldn't have ended.

Now that we have managed, fast distributed SQL DBs, with high availability - I wouldn't be surprised to see companies moving more logic back to the DB.


Good piece, tho not new ideas. Ton of references, two recent that immediately come to mind

https://github.com/src-d/guide https://github.com/Datomic/codeq

But work here goes back at least to the 1970s.


The future's already here, just not evenly distributed :) I wrote this in part to surface these ideas to new people!


I'm pretty sure Eclipse or IntelliJ build a meaningful index of my code base... What about that ?


They do but those aren't made available to you like a database, just through their search tool and indirectly through their contextual lookups.

See also tools that build databases from code history, like

https://www.adamtornhill.com/articles/crimescene/codeascrime...


I can barely read with that font colour..


I found it perfectly readable and checked with Firefox's accessibility tools: the contrast of the main text is 12.19 and thus meets the WCAG AAA standard for accessible text. [1] is the documentation Firefox links to.

[1] https://developer.mozilla.org/en-US/docs/Web/Accessibility/U...
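For reference, the number these tools report is the WCAG 2.x contrast ratio, computed from the relative luminance of the two colours. Here is a small Python sketch of that formula; the function names are my own, and the black-on-white example is not the article's actual colours.

```python
AA_NORMAL_TEXT = 4.5   # WCAG AA minimum for normal-size text
AAA_NORMAL_TEXT = 7.0  # WCAG AAA minimum for normal-size text


def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Return the contrast ratio between two sRGB colours, from 1.0 to 21.0."""
    lum_fg, lum_bg = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)


black_on_white = contrast_ratio((0, 0, 0), (255, 255, 255))
```

A reported ratio of 12.19 clears the 7:1 AAA threshold. Note that the formula only looks at the two colours, so it says nothing about stroke width or which background a given browser actually paints.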


The page has a different (lighter) text colour for dark mode but the background colour doesn't change on dark mode, making it nearly unreadable.

The Firefox tools tell me it has a contrast of 1.43


Yep, that's it. Turns out, Safari/Chrome change the page background automatically based on your OS setting. FF seems not to.


I find this response hilarious, because you jumped to a standard to make your argument without considering that what you are seeing may not match what others are seeing.

The page is super hard to read with the white background it has in Firefox, but fine with the dark background it has in Chrome.

I.e., the problem is that the website sometimes renders with a white background and very light pink text.


> the dark background it has in Chrome

...eh? https://i.imgur.com/Qwn7A4f.png


Maybe it's random? I get the dark in Chrome and white in Firefox.


Oh, word? Mine's at 1.43. https://imgur.com/a/AVQxh20


Contrast isn't the only indicator of readability. The font choice here is IMHO terrible for screen. It's much too thin.

Taking this to the logical extreme - just for fun - imagine the perfect contrast score of Vantablack™ on Spectralon white. It wouldn't matter if the typeface stroke widths were only a micron wide. It would still be unreadable.


This is why metrics versus real world measurement are at odds. Sure, measure the luminance all you want; sure ignore the fact that color spaces and visual impairment aren't evenly distributed. But don't tell someone they can see something they can't.


I was not telling anyone that they must be able to see this properly. I was saying that I was able to see this properly and I was saying that according to an established standard it should be as readable as it gets.

Sure, the standard might be lacking, but then the standard should be fixed. As a website author you need something to refer to when you don't experience these issues yourself.


> Sure, the standard might be lacking, but then the standard should be fixed. As a website author you need something to refer to when you don't experience these issues yourself.

The problem is that it’s the wrong standard that’s lacking. WCAG does the best it can with the web specs, but it can’t just impose a perceptual color space on HTML/CSS, and it would be impossible for mere mortals to evaluate a perceptual contrast standard on RGB/HSL.

But you can easily find A11y articles pointing out where the contrast ratios which pass or fail are very flawed for even people with good vision.


Just because you find it readable does not mean others can; also, color contrast is not the only thing that affects readability. In this case, the issue is font weight.

Here's some discussion on the issue: https://github.com/w3c/wcag/issues/665


Hey, author here — sorry about that. I don't use/didn't test with Firefox, but it looks like it's not setting the page background properly in dark mode. I'll get that fixed!


Consider also testing "-webkit-font-smoothing: antialiased;" on non-retina screens. When I removed it, the text became somewhat thicker and significantly more readable.


I'll try that tonight! Any suggestions for testing/seeing how the anti-aliasing works when I don't physically have non-retina screens?


Done — I think antialiasing looks better on Retina screens in dark mode so I kept it only for that case (removed it for non-dark mode or non-Retina).


It's just the right amount of contrast to allow you to read a word or two easily, but it's difficult to scan the article. I used Dark Reader and tweaked the contrast, and made it much easier to read.


Same. Reading mode in Firefox was a quick fix


I think I fixed for Firefox — let me know if you're still seeing 1.4 contrast/super-light text.


I found it hard to read too.


This seems like a GNU Global use case https://www.gnu.org/software/global/


Ah, BlueJ...takes me back to Intro to CS where an entire class was spent on getting that program alone to run on everyone's computer.


Somewhat related: https://www.unisonweb.org/



Yep, I had a brief chat with one of their engineers a few days ago: https://twitter.com/FeifanZ/status/1361392018514010116


The article shows Blush and the visualization chart it can produce. Is there something like this for JavaScript already?


*BlueJ


I remember learning AspectJ, and finally deciding it was a query language for your code.


Ugh, the site is giving me an eye sore. Bad font or bad color choice?


Geez, have you heard of a contrast checker before?

https://webaim.org/resources/contrastchecker/


Geez, have you heard of being nice? https://news.ycombinator.com/item?id=26161659

> author here — sorry about that. I don't use/didn't test with Firefox, but it looks like it's not setting the page background properly in dark mode. I'll get that fixed!


[Speaking only for myself, with poor eyesight, a cognitive hearing disorder, and a variety of sensory sensitivities related to ADHD]

Hey, it's awesome that you were so quick to address the issue. But I think you might want to take this one with a little grace. Putting aside the contrast you intended and missed by mistake, try to understand that accessibility is a pretty sore spot for people who need it. It's often ignored completely, or just jammed into a Lighthouse or similar score/checklist without regard for whether people can actually... access things. When people have difficulty with vision or hearing or any other sense, and when it's commonplace for that to be disregarded in very pronounced ways, it's not uncommon to feel invisible and react that way.


This is totally understandable! Thanks for pointing this out.


Thank you for being receptive! Let’s go make the world more accessible.


Snark isn't the way to go.


That seems like the literal opposite of snark.


Thank you for policing my self advocacy but I don’t need your help.

Edit: oh my god please downvote this. I definitely need more people telling me they don’t approve of me clearly stating what actually helps my sensory limitations and offering nothing constructive at all.



Believe it or not, just because a tool says the contrast is fine doesn't mean that it is. Even assuming a "normal" human range of color spectrum vision, many of these tools measure contrast ratios in color spaces that aren't even representative of that "normal" spectrum. All of the color spaces in native use on the web are not proportional to "normal" perception of color. Of course that gets even more complicated when different vision conditions are introduced.

Don't just quote some automated tool at people who very clearly experience something else.


This seems like a hard job then. If we adjust the site to make it accessible for you, then we might make it less accessible for me.

Maybe that’s why CSS was originally made to be overridden client side, so everyone can make their own choices about font and background.


> If we adjust the site to make it accessible for you, then we might make it less accessible for me.

Yeah, this is certainly possible. A physical manifestation of this is how curb cuts for improving accessibility for wheelchair users on their own reduce accessibility for people with low vision who no longer have a clear demarcation between the sidewalk and the road (where there is automobile traffic).

That's where tactile paving comes in: https://en.wikipedia.org/wiki/Tactile_paving

One name for this general principle is https://en.wikipedia.org/wiki/Universal_design . Users making choices about font and background is a good example.


> This seems like a hard job then. If we adjust the site to make it accessible for you, then we might make it less accessible for me.

I don’t think it works like that. Generally speaking most people with good or limited vision share the luminance spectrum and portions of the color spectrum. Increasing contrast doesn’t hurt anyone (though dark mode users generally prefer slightly lower contrast for lower luminance overall). Increasing type size doesn’t hurt anyone but might make HN users complain about information density.

> Maybe that’s why CSS was originally made to be overridden client side, so everyone can make their own choices about font and background.

It still can be! In fact it’s standardized in web extensions. As an example you can try my HNDarkMode theme on my GitHub (same handle).


> I don’t think it works like that. Generally speaking most people with good or limited vision share the luminance spectrum and portions of the color spectrum. Increasing contrast doesn’t hurt anyone (though dark mode users generally prefer slightly lower contrast for lower luminance overall). Increasing type size doesn’t hurt anyone but might make HN users complain about information density.

I think there's enough human variation that it can be hard to say anything general like this. Setting aside the semantic debate about what is a disability, or what is accessibility, let me take for granted that "attention disorder" is a thing and anything that distracts a reader could hurt. It is hard to determine flat-out which complaints are valid.

I like to watch captioned content. Others find it distracting, or hate having the punchline to a joke come early (usually due to bad captioning!). I am hearing and AFAIK don't have an audio-processing disorder. I don't "need" captions, otherwise it would probably be a clear case of definitely my needs override someone who just doesn't "want" captions. But that's also another semantic debate.

I don't think that should be used as an argument for not trying to improve accessibility in the first place. But it certainly is something to continually improve.


> "attention disorder" is a thing and anything that distracts a reader could hurt.

This is true and I have ADHD so I can speak to it from experience, although not for everyone with ADHD of course. At least for me, motion is far more distracting than contrast as such. The places where contrast may distract me tend to be places where there are already accessibility issues for people with color blindness.

You’re right to say I’ve overgeneralized a bit. But for the vast majority of people with vision suitable enough for them to choose to read, the things I said tend to be true.

> I don't think that should be used as an argument for not trying to improve accessibility in the first place. But it certainly is something to continually improve.

Full agree here!


> Don't just quote some automated tool at people who very clearly experience something else.

I don't think that's a fair response to my comment when it was GP who specifically brought contrast into the discussion and even linked to a contrast checker that also checks two colors against the WCAG guidelines.

I wasn't claiming that everyone is able to read it. I was just saying that according to the standard they themselves referenced it was fine. The fact that it actually was not fine for certain dark mode configurations is orthogonal to that and was not taken into account by either of us.


I (presumably along with GP) am using Firefox where the background shows up white and the contrast ratio weighs in at 1.43:1. But it looks fine on Chrome.


What's funny is that Chrome renders the page worst out of the 3 browsers installed on my machine

https://imgur.com/a/fh0BAzV

E: Edge F: Firefox C: Chrome



