
This is the central problem with all programming today.

You can think of everything as configuration. Parameters of a function are just configuration for that function.

What we need is traceability of every value in a system, not just configuration. You click on any value and you see a graph/chain of all the transformations this data goes through right back to the source of truth. This is how we should code. Think about how difficult this is to do today with your current IDE/editor.
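One way to picture it: values that carry their own provenance. A minimal sketch (the `Traced` wrapper and its labels are hypothetical, not an existing library):

```python
from dataclasses import dataclass, field

@dataclass
class Traced:
    """A value that remembers every transformation applied to it."""
    value: object
    history: list = field(default_factory=list)

    def apply(self, label, fn):
        # Record the transformation and return a new traced value.
        result = fn(self.value)
        return Traced(result, self.history + [(label, self.value, result)])

# Source of truth: a config file, a request, whatever.
price = Traced(100.0, [("source", None, 100.0)])
price = price.apply("apply 20% discount", lambda v: v * 0.8)
price = price.apply("add 10% tax", lambda v: v * 1.1)

# "Click on the value and see the chain back to the source of truth":
for label, before, after in price.history:
    print(f"{label}: {before} -> {after}")
```

The point is that the chain exists as data, so a tool could render it as a graph instead of you reconstructing it by reading code.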

Every task in software is wanting to change what is displayed in a user interface of some sort. Usually you have to make your way through a labyrinth to figure out what to change where, and also what side-effects will occur.

If this is not smack bang in front of your face when you are coding, then you are just adding more and more tangled data flows and complexity to your system.



Truer words have never been spoken.

Microsoft IntelliTest (formerly Pex) [1] internally uses the Z3 constraint solver, which traces program data and the control flow graph well enough to generate input values that reach a given statement of code. It can even run the program to figure out runtime values, hence the technique is called Dynamic Symbolic Execution [3]. We have the technology for this, it just hasn't been applied in the right places yet.

I would also like to be able to point at any function in my IDE and ask it:

- "Can you show me the usual runtime input/output pairs for this function?"

- "Can you show me the preconditions and postconditions this function obeys?"

There are plenty of research prototypes doing this (Whyline [4], Daikon [5], ...) but sadly, no tool usable for the daily grind.
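The first wish can be crudely approximated today with instrumentation; a hedged sketch of recording runtime input/output pairs via a decorator (the function and names are made up for illustration):

```python
import functools

OBSERVED = {}  # function name -> list of (args, result) pairs

def record_io(fn):
    """Collect an input/output pair every time fn runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        OBSERVED.setdefault(fn.__name__, []).append((args, result))
        return result
    return wrapper

@record_io
def normalize(s):
    return s.strip().lower()

normalize("  Hello ")
normalize("WORLD")
# An IDE could surface these pairs on hover instead of us printing them:
print(OBSERVED["normalize"])
```

Daikon essentially does the second wish (likely invariants) by generalizing over exactly this kind of observed data.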

[1] https://learn.microsoft.com/en-us/visualstudio/test/intellit...

[2] https://link.springer.com/chapter/10.1007/978-3-642-38916-0_...

[3] https://queue.acm.org/detail.cfm?id=2094081

[4] https://www.cs.cmu.edu/~NatProg/whyline.html

[5] https://plse.cs.washington.edu/daikon/


I suspect this is an area that is part of the essential complexity [1] of programming. Reachability, variable liveness, pointer aliasing, etc. are undecidable problems for arbitrary programs. Also, the control flow graph of most non-trivial programs is probably too complex to visualize in a usable way.

However, there is still a lot of room for improvement in compilers and debuggers. If I could tell the compiler to instrument my code to track the values that I care about, and then have the debugger visualize the flow of information through the system, that would be very useful. It wouldn't guarantee that the data flow analysis applies to subsequent runs of the program, but it could reduce the amount of time spent reading and debugging code.

[1] https://en.wikipedia.org/wiki/No_Silver_Bullet
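As a sketch of what that opt-in instrumentation could look like in Python today, `sys.settrace` can log every value a watched local variable takes during a run (the variable and function here are hypothetical examples):

```python
import sys

WATCH = "total"   # the value I told the "compiler" I care about
log = []

def tracer(frame, event, arg):
    if event == "line" and WATCH in frame.f_locals:
        current = frame.f_locals[WATCH]
        # Record each new value the watched variable takes.
        if not log or log[-1] != current:
            log.append(current)
    return tracer

def compute():
    total = 0
    for i in range(1, 4):
        total += i
    return total

sys.settrace(tracer)
compute()
sys.settrace(None)
print(log)  # the values 'total' flowed through during the run
</n```

A debugger could render that log as a timeline rather than making you single-step to reconstruct it; and as the parent says, it documents one run, not all possible runs.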


Something that gives me hope is that I haven't seen anyone really prioritize debuggability in a modern programming language or even a framework yet, so there is still room for someone to do it. Vale is maybe the first getting close: https://verdagon.dev/blog/perfect-replayability-prototyped.

I think there is a new era yet to come.

> probably too complex to visualize in a usable way

You can always abstract away the complexity, e.g. a simplified control flow graph at the function level.

As an example, I want to understand the Postgres codebase better... there are a lot of very visualizable things going on there. If you look at lecture slides on relational DBMSs, there are lots of diagrams... so why can't we view those diagrams animating as the code executes?

I was thinking of instrumenting the code to get a trace of every function call and its arguments, and then filter that and visualize it.
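In Python, for instance, a crude version of that trace is a few lines of `sys.setprofile`; the functions here are hypothetical stand-ins for parser/planner code:

```python
import sys

calls = []

def profiler(frame, event, arg):
    # 'call' fires on every Python function entry; f_locals holds the arguments.
    if event == "call":
        calls.append((frame.f_code.co_name, dict(frame.f_locals)))

def parse(query):
    return plan(query.upper())

def plan(q):
    return f"PLAN({q})"

sys.setprofile(profiler)
parse("select 1")
sys.setprofile(None)

for name, args in calls:
    print(name, args)
```

Filtering and visualizing that stream is the hard part, but collecting it is cheap.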

The key to this approach working, though, is I think that ultimately we need to write and organize our code such that it can be easily visualized. The purpose of code is for humans to understand it; otherwise we would just be writing assembly.


> I haven't seen anyone prioritize debuggability

Static-checking tools and strongly-typed languages: "Are we a joke to you?"

Finding and removing bugs before the code runs is still debugging.


> This is the central problem with all programming today.

Sweeping generalizations rarely age well.

> You can think of everything as configuration. Parameters of a function are just configuration for that function.

No, parameters of a function are contextual collaborators that participate in a function's implementation. Configuration canonically influences a program apart from its individual run-time interactions.

Is a value given to the cosine function a configuration? Words matter.

> Every task in software is wanting to change what is displayed in a user interface of some sort.

This is a myopic perspective. Software processes data into information which provides value in context. This may never be "displayed in a user interface."

To wit, the network gear involved in my posting this reply most certainly qualifies as a "task in software." So, too, is the software running in the power companies involved. It would be quite a stretch to categorize them as "wanting to change what is displayed in a user interface."


This would be sooooo nice, especially in DI heavy languages like Java. It might not be turtles all the way down, but finding the final turtle can be so difficult.


It drives me nuts when systems are unnecessarily built with "magic DI" frameworks rather than initializer/constructor DI. Why is it so hard to pass a dep into the initializer? Make the initializer parameter a protocol/interface and you can pass a mock in at testing. And you can also grep the whole codebase for callers and find the exact concrete type passed in at runtime.
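A sketch of that constructor-injection style in Python, using `typing.Protocol` (all class names here are hypothetical):

```python
from typing import Protocol

class PaymentGateway(Protocol):
    def charge(self, cents: int) -> bool: ...

class StripeGateway:
    def charge(self, cents: int) -> bool:
        # real network call would go here
        return True

class FakeGateway:
    def __init__(self):
        self.charged = []
    def charge(self, cents: int) -> bool:
        self.charged.append(cents)
        return True

class Checkout:
    # The dependency is visible in the constructor signature:
    # grep "Checkout(" to find every concrete gateway ever passed in.
    def __init__(self, gateway: PaymentGateway):
        self.gateway = gateway

    def buy(self, cents: int) -> bool:
        return self.gateway.charge(cents)

# Production would do Checkout(StripeGateway()); a test just swaps the type:
fake = FakeGateway()
assert Checkout(fake).buy(499)
assert fake.charged == [499]
```

No container, no annotations, and the "final turtle" is always one grep away.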


IMO, automatic DI frameworks were a big mistake. An idea that looks useful at the time, but actually leads to spaghetti mess code that is harder to maintain than it would be without any automatic DI.


I agree, though I'd say in the more general case that overly flexible DI frameworks are the broader problem.

Designers will put much more thought into how best to interact with a framework, so start with a small, opinionated interface, then consider extension feature requests on a case-by-case basis.


Agreed.


Eh, it's not like it's much easier to find the last turtle in python. Or C++ when people are extensively using patterns like PointerToImpl*, which can really obfuscate things.


Object orientation kills traceability. Classes are instantiated in one trace, then methods are called in another trace, and if something goes wrong in the latter, it is not trivial to find the wrong settings passed in the former.

I think some functional programming languages solve that by running a good chunk (if not all) of the application in a single trace.


Yes, this is why we love functional programming! "What happened along the way" equals the call stack, as long as there is no field mutation involved.

Except, of course, for async/non-blocking calls, as tracing a call across different threads or promises may not always be available.


"""The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened."""
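Anton's exercise in miniature: a closure-based "object" whose private state is just a closed-over variable (a toy sketch, not a full object system):

```python
def make_counter(start=0):
    # The closed-over variable plays the role of a private field.
    count = start

    def increment():
        nonlocal count
        count += 1
        return count

    def value():
        return count

    # The dict of "methods" plays the role of the object.
    return {"increment": increment, "value": value}

counter = make_counter(10)
counter["increment"]()
counter["increment"]()
print(counter["value"]())  # 12
```

Read one way, the closure is the poor man's object; read the other way, an object is just a record of closures over shared state. Hence the stick.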


OOP is actually class-oriented programming, and classes are orthogonal to the object/closure duality.

For example, in PHP, `class A { public $a = new B; }` is not valid.

https://onlinephp.io/c/31246

Such a restriction makes no sense in a "closures are a poor man's object" world.


Not sure what you're trying to say. My point was that invoking a function that acts on arguments defined away from the call site is not an OO-specific feature.


Yeah, exactly. Referring back to the DI comment earlier... In FP, "DI" is literally just calling a function.
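For example (a toy sketch): the "injected" dependency is just a parameter, defaulted for production and overridden in tests:

```python
import time

def make_token(clock=time.time):
    # 'clock' is the injected dependency; the caller picks the implementation.
    return f"token-{int(clock())}"

# Production: make_token()
# Test: pin the clock instead of wiring up a mock framework.
assert make_token(clock=lambda: 1700000000.0) == "token-1700000000"
```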


> And, of course, async/non-blocking calls, as tracing a call along different threads or promises may not be available all the time.

I think there is still hope on that front. Following structured concurrency patterns like the supervisor model should handle it. See the discussion about "Notes on structured concurrency, or: Go statement considered harmful" [1].

[1] https://news.ycombinator.com/item?id=16921761


I disagree. Pretty much any program of sufficient size, including those written in functional languages, is going to have state that outlives the lifetime of a single call stack.


hopefully not

it's certainly not a given


Simple example: the web server that takes in the HTTP requests? That one definitely got initialized outside the call trace that's serving the HTTP request.

No different from an object that got constructed and then a method called on it.


ah, sorry, i misinterpreted what you meant -- obviously, yes, you're correct

i thought you meant state that is created from a call stack and persists after that call stack returns -- which is something else


> You click on any value and you see a graph/chain of all the transformations this data goes through right back to the source of truth.

This kind of reminds me of the Whyline: https://www.cs.cmu.edu/~NatProg/whyline.html


> Now Patented!

Yippee. /s


So it's an interesting thought experiment, but... I suspect the Rust compiler could statically figure this out for a large number of cases, based on following the borrow/copy/clone tree for any particular variable. It could in theory be possible to write a tool that traces that graph back to the snippets of code where it originates, though that would usually end up being a command-line args parser, or a config file reader...

It's mostly feasible in other languages too, but probably not with the same level of rigor.

Anybody here from JetBrains?


This should be provided by the language server for the language, not by the IDE.


First thing I thought of after reading your comment was web inspectors which do help a little bit in figuring out stuff like "Why is this line in italics?" by showing you computed styles. But they could go a lot farther. I'd like to see a temporal graph of everything (e.g. HTML, JS, CSS, Virtual DOM, whatever) that had any visual or behavioral impact on an object, and the reasons why things changed in that order, why some changes overrode others, etc.


Sounds a lot like what systeminit is working on: https://www.systeminit.com/


It seemed very promising until it showed DevOps engineers dragging blocks in a shared (synchronous) canvas like Figma.

Am I the only one that thinks pull requests (asynchronous) are a more appropriate medium of collaboration for this kind of work that requires strong consistency between elements?


It works with pull requests too AFAIK.


This is a minor thing which I found useful- The "help" functions of all my bash-defined functions report which file they're defined in (such as: `echo "This function is defined in ${BASH_SOURCE[0]}"`). If I could do something similar for my environment-defined values (hmmm, now I'm thinking...), I would. (I could create a function that tracks at least the changes _I'VE_ made to these values, which could help...)


> What we need is traceability of every value in a system, not just configuration.

Man, you would love functional programming. I'm not even joking.


anyParadigm :: RAM -> RAM


A feature of actuarial software is generating dependency graphs of functions. Actuaries don't write the financial models that power reporting processes in standard languages (Python, Julia, C++) and instead have proprietary systems with features like this.


What systems are you thinking of?

In my experience most actuaries I interact with maintain truly horrendous spreadsheets with labyrinthine complexity.


i wonder whether our current strong compile-time/runtime distinction can support this. it may be slightly easier for interpreted and vm-based languages but i’m not sure we can have it more broadly.

lately i’ve been playing with prolog and revisiting computer science/computer programming before imperative languages drew a curtain on innovation and i think we have lost a lot. our modern programming isn’t the programming the founders had in mind, it seems.


Compiled languages absolutely can support this.

But imperative ones create a bit of a problem. Nothing impossible to deal with, but it becomes harder.

But any choice of language, the OS can enforce it too, and maybe in a much easier way than messing with your language.


I think it could be done for Rust


Sounds quite a lot like the Common Lisp experience when using Sly and Emacs. You can (trace) every function and see lexical scopes of variables, even bind them if you execute some s-expr that does not currently have some variable under that symbol.

The distributed aspect is a bit more difficult though. I don't know if there is a system that truly got this right. Maybe Erlang's BEAM VM together with a good debugger I don't yet know about.


FWIW, we do something like this in Spack[1] with `spack config blame`. There are different YAML configs, they can be merged and override each other based on precedence, but we tell you where all the merged values are coming from. Here's a simplified example -- you can see that some settings are coming from the user's "default" environment, some from Spack's defaults, and some from specialized defaults for Darwin:

    > spack -e default config blame packages
    ---                                               packages:
    spack/environments/default/spack.yaml:59            fftw:
    spack/environments/default/spack.yaml:60              require: ~mpi
    spack/etc/spack/defaults/darwin/packages.yaml:17    all:
    spack/defaults/packages.yaml:18                       compiler: [apple-clang, gcc, clang]
    spack/defaults/darwin/packages.yaml:20                providers:
    spack/defaults/darwin/packages.yaml:21                  unwind: [apple-libunwind]
    spack/defaults/packages.yaml:20                         blas: [openblas, amdblis]
    spack/defaults/packages.yaml:21                         jpeg: [libjpeg-turbo, libjpeg]
    spack/defaults/packages.yaml:22                         lapack: [openblas, amdlibflame]
    spack/defaults/packages.yaml:23                         mpi: [openmpi, mpich]
    spack/defaults/darwin/packages.yaml:22              apple-libunwind:
    spack/defaults/darwin/packages.yaml:23                buildable: False
    spack/defaults/darwin/packages.yaml:24                externals:
    spack/defaults/darwin/packages.yaml:27                - spec: apple-libunwind@35.3
    spack/defaults/darwin/packages.yaml:28                  prefix: /usr
The filenames are shown in color in the UI; you can see what it really looks like here: https://imgur.com/uqeiBb5

For the curious, there is more on scopes and merging [2]. This isn't exactly easy to do -- we use ruamel.yaml [3], but had to add our own finer-grained source attribution to get this level of detail. There are also tricky things to deal with to maintain provenance on plain old Python dict/str/list/etc. objects [4,5].

[1] https://github.com/spack/spack

[2] https://spack.readthedocs.io/en/latest/configuration.html

[3] https://yaml.readthedocs.io/en/latest/

[4] https://github.com/spack/spack/blob/95ca9de/lib/spack/spack/...

[5] https://github.com/spack/spack/blob/95ca9de/lib/spack/spack/...
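The attribution idea in miniature: merge config layers while recording which source each key came from. This is only a toy sketch, not Spack's actual implementation (which attributes at a much finer grain, down to YAML lines):

```python
def merge_with_blame(layers):
    """layers: list of (source_name, dict), lowest precedence first."""
    merged, blame = {}, {}
    for source, settings in layers:
        for key, value in settings.items():
            merged[key] = value
            blame[key] = source   # later (higher-precedence) layers win
    return merged, blame

defaults = ("defaults/packages.yaml", {"compiler": "gcc", "mpi": "mpich"})
darwin   = ("defaults/darwin/packages.yaml", {"compiler": "apple-clang"})
user     = ("spack.yaml", {"mpi": "openmpi"})

merged, blame = merge_with_blame([defaults, darwin, user])
for key in merged:
    print(f"{blame[key]:35} {key}: {merged[key]}")
```

Once blame is tracked alongside the merge, "where did this value come from?" is a lookup instead of an archaeology project.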


This is mostly due to the procedural "recipe" style of programming that most developers seem to default to. It doesn't scale well past small programs: with the control flow embedded in the code, the only good way to follow the flow is to track things through the callers. This is one of the reasons I prefer dataflow programming styles that make the control flow explicit.
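A minimal illustration of the contrast: the same pipeline written recipe-style, and as an explicit dataflow structure you can print, trace, or draw (toy example):

```python
# Recipe style: the flow is implicit in the sequence of statements.
def recipe(xs):
    ys = [x * 2 for x in xs]
    ys = [y for y in ys if y > 4]
    return sum(ys)

# Dataflow style: the flow is a data structure, inspectable by tools.
PIPELINE = [
    ("double", lambda xs: [x * 2 for x in xs]),
    ("keep>4", lambda xs: [x for x in xs if x > 4]),
    ("sum",    lambda xs: sum(xs)),
]

def run(pipeline, value, trace=None):
    for name, step in pipeline:
        value = step(value)
        if trace is not None:
            trace.append((name, value))  # intermediate values for free
    return value

trace = []
result = run(PIPELINE, [1, 2, 3], trace)
print(result, trace)
```

Both compute the same thing, but only the second one can answer "what were the intermediate values?" without a debugger.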


Yes, but wouldn't the logging take up quite a bit of memory? Something like O(compute cycles × (log(lines of code) + log(memory allocated))) before compression.



In complex prod systems, logging and traceability are generally more important than memory consumption and disk space. Memory and Disk are cheaper than a lawsuit if something goes awry


> You click on any value and you see a graph/chain of all the transformations this data goes through right back to the source of truth

Isn't that just a debugger?


Not in a distributed system


This is what's really needed. In one code base it's relatively easy for the dev team. Across your microservices it's practically impossible unless all teams are involved. This is currently a nadir in dev productivity.


Which debugger are you using?


And now people are resorting to obfuscating code, just to protect their IP. fun times.


The perfect is the enemy of the good.


Look into Nix / NixOS



