
This is the central problem with all programming today.

You can think of everything as configuration. Parameters of a function are just configuration for that function.

What we need is traceability of every value in a system, not just configuration. You click on any value and you see a graph/chain of all the transformations this data goes through right back to the source of truth. This is how we should code. Think about how difficult this is to do today with your current IDE/editor.
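One way to picture it: values that carry their own provenance. A minimal sketch (the `Traced` wrapper and its labels are hypothetical, not an existing library):

```python
from dataclasses import dataclass, field

@dataclass
class Traced:
    """A value that remembers every transformation applied to it."""
    value: object
    history: list = field(default_factory=list)

    def apply(self, label, fn):
        # Record the transformation and return a new traced value.
        result = fn(self.value)
        return Traced(result, self.history + [(label, self.value, result)])

# Source of truth: a config file, a request, whatever.
price = Traced(100.0, [("source", None, 100.0)])
price = price.apply("apply 20% discount", lambda v: v * 0.8)
price = price.apply("add 10% tax", lambda v: v * 1.1)

# "Click on the value and see the chain back to the source of truth":
for label, before, after in price.history:
    print(f"{label}: {before} -> {after}")
```

The point is that the chain exists as data, so a tool could render it as a graph instead of you reconstructing it by reading code.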

Every task in software is wanting to change what is displayed in a user interface of some sort. Usually you have to make your way through a labyrinth to figure out what to change where, and also what side-effects will occur.

If this is not smack bang in front of your face when you are coding, then you are just adding more and more tangled data flows and complexity to your system.



Truer words have never been spoken.

Microsoft IntelliTest (formerly Pex) [1] internally uses the Z3 constraint solver, which traces program data and the control flow graph well enough to generate input values that reach a given statement of code. It can even run the program to figure out runtime values, hence the technique is called Dynamic Symbolic Execution [3]. We have the technology for this, it just hasn't been applied in the right places yet.

I would also like to be able to point at any function in my IDE and ask it:

- "Can you show me the usual runtime input/output pairs for this function?"

- "Can you show me the preconditions and postconditions this function obeys?"

There are plenty of research prototypes doing this (Whyline [4], Daikon [5], ...) but sadly, no tool usable for the daily grind.
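The first wish can be crudely approximated today with instrumentation; a hedged sketch of recording runtime input/output pairs via a decorator (the function and names are made up for illustration):

```python
import functools

OBSERVED = {}  # function name -> list of (args, result) pairs

def record_io(fn):
    """Collect an input/output pair every time fn runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        OBSERVED.setdefault(fn.__name__, []).append((args, result))
        return result
    return wrapper

@record_io
def normalize(s):
    return s.strip().lower()

normalize("  Hello ")
normalize("WORLD")
# An IDE could surface these pairs on hover instead of us printing them:
print(OBSERVED["normalize"])
```

Daikon essentially does the second wish (likely invariants) by generalizing over exactly this kind of observed data.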

[1] https://learn.microsoft.com/en-us/visualstudio/test/intellit...

[2] https://link.springer.com/chapter/10.1007/978-3-642-38916-0_...

[3] https://queue.acm.org/detail.cfm?id=2094081

[4] https://www.cs.cmu.edu/~NatProg/whyline.html

[5] https://plse.cs.washington.edu/daikon/


I suspect this is an area that is part of the essential complexity [1] of programming. Reachability, variable liveness, pointer aliasing, etc. are undecidable problems for arbitrary programs. Also, the control flow graph of most non-trivial programs is probably too complex to visualize in a usable way.

However, there is still a lot of room for improvement in compilers and debuggers. If I could tell the compiler to instrument my code to track the values that I care about, and then have the debugger visualize the flow of information through the system, that would be very useful. It wouldn't guarantee that the data flow analysis applies to subsequent runs of the program, but it could reduce the amount of time spent reading and debugging code.

[1] https://en.wikipedia.org/wiki/No_Silver_Bullet
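As a sketch of what that opt-in instrumentation could look like in Python today, `sys.settrace` can log every value a watched local variable takes during a run (the variable and function here are hypothetical examples):

```python
import sys

WATCH = "total"   # the value I told the "compiler" I care about
log = []

def tracer(frame, event, arg):
    if event == "line" and WATCH in frame.f_locals:
        current = frame.f_locals[WATCH]
        # Record each new value the watched variable takes.
        if not log or log[-1] != current:
            log.append(current)
    return tracer

def compute():
    total = 0
    for i in range(1, 4):
        total += i
    return total

sys.settrace(tracer)
compute()
sys.settrace(None)
print(log)  # the values 'total' flowed through during the run
</n```

A debugger could render that log as a timeline rather than making you single-step to reconstruct it; and as the parent says, it documents one run, not all possible runs.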


Something that gives me hope is that I haven't seen anyone really prioritize debuggability in a modern programming language or even a framework yet, so there is still room for someone to do it. Vale is maybe the first getting close: https://verdagon.dev/blog/perfect-replayability-prototyped.

I think there is a new era yet to come.

> probably too complex to visualize in a usable way

You can always abstract away the complexity, e.g. a simplified control flow graph at the function level.

As an example, I want to understand the Postgres codebase better... there are a lot of very visualizable things going on there. If you look at lecture slides on relational DBMSs, there are lots of diagrams... so why can't we view those diagrams animating as the code executes?

I was thinking of instrumenting the code to get a trace of every function call and its arguments, and then filter that and visualize it.
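In Python, for instance, a crude version of that trace is a few lines of `sys.setprofile`; the functions here are hypothetical stand-ins for parser/planner code:

```python
import sys

calls = []

def profiler(frame, event, arg):
    # 'call' fires on every Python function entry; f_locals holds the arguments.
    if event == "call":
        calls.append((frame.f_code.co_name, dict(frame.f_locals)))

def parse(query):
    return plan(query.upper())

def plan(q):
    return f"PLAN({q})"

sys.setprofile(profiler)
parse("select 1")
sys.setprofile(None)

for name, args in calls:
    print(name, args)
```

Filtering and visualizing that stream is the hard part, but collecting it is cheap.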

The key to this approach working, though, is I think that ultimately we need to write and organize our code such that it can be easily visualized. The purpose of code is for humans to understand it; otherwise we would just be writing assembly.


> I haven't seen anyone prioritize debuggability

Static-checking tools and strongly-typed languages: "Are we a joke to you?"

Finding and removing bugs before the code runs is still debugging.


> This is the central problem with all programming today.

Sweeping generalizations rarely age well.

> You can think of everything as configuration. Parameters of a function are just configuration for that function.

No, parameters of a function are contextual collaborators that participate in a function's implementation. Configuration canonically influences a program apart from its individual run-time interactions.

Is a value given to the cosine function a configuration? Words matter.

> Every task in software is wanting to change what is displayed in a user interface of some sort.

This is a myopic perspective. Software processes data into information which provides value in context. This may never be "displayed in a user interface."

To wit, the network gear involved in my posting this reply most certainly qualifies as a "task in software." So, too, is the software running in the power companies involved. It would be quite a stretch to categorize them as "wanting to change what is displayed in a user interface."


This would be sooooo nice, especially in DI heavy languages like Java. It might not be turtles all the way down, but finding the final turtle can be so difficult.


It drives me nuts when systems are unnecessarily built with "magic DI" frameworks rather than initializer/constructor DI. Why is it so hard to pass a dep into the initializer? Make the initializer parameter a protocol/interface and you can pass a mock in at testing. And you can also grep the whole codebase for callers and find the exact concrete type passed in at runtime.
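A sketch of that constructor-injection style in Python, using `typing.Protocol` (all class names here are hypothetical):

```python
from typing import Protocol

class PaymentGateway(Protocol):
    def charge(self, cents: int) -> bool: ...

class StripeGateway:
    def charge(self, cents: int) -> bool:
        # real network call would go here
        return True

class FakeGateway:
    def __init__(self):
        self.charged = []
    def charge(self, cents: int) -> bool:
        self.charged.append(cents)
        return True

class Checkout:
    # The dependency is visible in the constructor signature:
    # grep "Checkout(" to find every concrete gateway ever passed in.
    def __init__(self, gateway: PaymentGateway):
        self.gateway = gateway

    def buy(self, cents: int) -> bool:
        return self.gateway.charge(cents)

# Production would do Checkout(StripeGateway()); a test just swaps the type:
fake = FakeGateway()
assert Checkout(fake).buy(499)
assert fake.charged == [499]
```

No container, no annotations, and the "final turtle" is always one grep away.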


IMO, automatic DI frameworks were a big mistake. An idea that looks useful at the time, but actually leads to spaghetti mess code that is harder to maintain than it would be without any automatic DI.


I agree, though I'd say in the more general case that overly flexible DI frameworks are the broader problem.

Designers will put much more thought into how best to interact with a framework, so start with a small, opinionated interface, then consider extension feature requests on a case-by-case basis.


Agreed.


Eh, it's not like it's much easier to find the last turtle in python. Or C++ when people are extensively using patterns like PointerToImpl*, which can really obfuscate things.


Object orientation kills traceability. Classes are instantiated in one trace, then methods are called in another trace, and if something goes wrong in the latter, it is not trivial to find the wrong settings passed in the former.

I think some functional programming languages solve that by running a good chunk (if not all) of the application in a single trace.


Yes, this is why we love functional programming! "What happened along the way" equals the call stack, as long as there is no field mutation involved.

Except, of course, for async/non-blocking calls, as tracing a call across different threads or promises may not always be available.


"""The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened."""
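Anton's exercise in miniature: a closure-based "object" whose private state is just a closed-over variable (a toy sketch, not a full object system):

```python
def make_counter(start=0):
    # The closed-over variable plays the role of a private field.
    count = start

    def increment():
        nonlocal count
        count += 1
        return count

    def value():
        return count

    # The dict of "methods" plays the role of the object.
    return {"increment": increment, "value": value}

counter = make_counter(10)
counter["increment"]()
counter["increment"]()
print(counter["value"]())  # 12
```

Read one way, the closure is the poor man's object; read the other way, an object is just a record of closures over shared state. Hence the stick.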


OOP is actually class-oriented programming, and classes are orthogonal to the object/closure duality.

For example, in PHP, `class A { public $a = new B; }` is not valid.

https://onlinephp.io/c/31246

Such a restriction makes no sense in a "closures are a poor man's object" world.


Not sure what you're trying to say. My point was that invoking a function that acts on arguments defined away from the call site is not an OO-specific feature.


Yeah, exactly. Referring back to the DI comment earlier... In FP, "DI" is literally just calling a function.
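For example (a toy sketch): the "injected" dependency is just a parameter, defaulted for production and overridden in tests:

```python
import time

def make_token(clock=time.time):
    # 'clock' is the injected dependency; the caller picks the implementation.
    return f"token-{int(clock())}"

# Production: make_token()
# Test: pin the clock instead of wiring up a mock framework.
assert make_token(clock=lambda: 1700000000.0) == "token-1700000000"
```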


> And, of course, async/non-blocking calls, as tracing a call along different threads or promises may not be available all the time.

I think there is still hope on that front. Following structured concurrency patterns like the supervisor model should handle it. See the discussion about "Notes on structured concurrency, or: Go statement considered harmful" [1].

[1] https://news.ycombinator.com/item?id=16921761


I disagree. Pretty much any program of sufficient size, including those written in functional languages, is going to have state that outlives the lifetime of a single call stack.


hopefully not

it's certainly not a given


Simple example: the web server that takes in the HTTP requests? That one definitely got initialized outside the call trace that's serving the HTTP request.

No different from an object that got constructed and then a method called on it.


ah, sorry, i misinterpreted what you meant -- obviously, yes, you're correct

i thought you meant state that is created from a call stack and persists after that call stack returns -- which is something else


> You click on any value and you see a graph/chain of all the transformations this data goes through right back to the source of truth.

This kind of reminds me of the Whyline: https://www.cs.cmu.edu/~NatProg/whyline.html


> Now Patented!

Yippee. /s


So it's an interesting thought experiment, but... I suspect the Rust compiler could statically figure this out for a large number of cases, based on following the borrow/copy/clone tree for any particular variable. It could in theory be possible to write a tool that traces that graph back to the snippets of code where it originates, though that would usually end up being a command-line args parser, or a config file reader...

It's mostly feasible in other languages too, but probably not with the same level of rigor.

Anybody here from JetBrains?


This should be provided by the language server for the language, not by the IDE.


First thing I thought of after reading your comment was web inspectors which do help a little bit in figuring out stuff like "Why is this line in italics?" by showing you computed styles. But they could go a lot farther. I'd like to see a temporal graph of everything (e.g. HTML, JS, CSS, Virtual DOM, whatever) that had any visual or behavioral impact on an object, and the reasons why things changed in that order, why some changes overrode others, etc.


Sounds a lot like what systeminit is working on: https://www.systeminit.com/


It seemed very promising until it showed DevOps engineers dragging blocks in a shared (synchronous) canvas like Figma.

Am I the only one that thinks pull requests (asynchronous) are a more appropriate medium of collaboration for this kind of work that requires strong consistency between elements?


It works with pull requests too AFAIK.


This is a minor thing which I found useful- The "help" functions of all my bash-defined functions report which file they're defined in (such as: `echo "This function is defined in ${BASH_SOURCE[0]}"`). If I could do something similar for my environment-defined values (hmmm, now I'm thinking...), I would. (I could create a function that tracks at least the changes _I'VE_ made to these values, which could help...)


> What we need is traceability of every value in a system, not just configuration.

Man, you would love functional programming. I'm not even joking.


anyParadigm :: RAM -> RAM


A feature of actuarial software is generating dependency graphs of functions. Actuaries don't write the financial models that power reporting processes in standard languages (Python, Julia, C++) and instead have proprietary systems with features like this.


What systems are you thinking of?

In my experience most actuaries I interact with maintain truly horrendous spreadsheets with labyrinthine complexity.


i wonder whether our current strong compile-time/runtime distinction can support this. it may be slightly easier for interpreted and vm-based languages but i’m not sure we can have it more broadly.

lately i’ve been playing with prolog and revisiting computer science/computer programming before imperative languages drew a curtain on innovation and i think we have lost a lot. our modern programming isn’t the programming the founders had in mind, it seems.


Compiled languages absolutely can support this.

But imperative ones create a bit of a problem. Nothing impossible to deal with, but it becomes harder.

But any choice of language, the OS can enforce it too, and maybe in a much easier way than messing with your language.


I think it could be done for Rust


Sounds quite a lot like the Common Lisp experience when using Sly and Emacs. You can (trace) every function and see lexical scopes of variables, even bind them if you execute some s-expr that does not currently have some variable under that symbol.

The distributed aspect is a bit more difficult though. I don't know if there is a system that truly got this right. Maybe Erlang's BEAM VM together with a good debugger I don't yet know about.


FWIW, we do something like this in Spack[1] with `spack config blame`. There are different YAML configs, they can be merged and override each other based on precedence, but we tell you where all the merged values are coming from. Here's a simplified example -- you can see that some settings are coming from the user's "default" environment, some from Spack's defaults, and some from specialized defaults for Darwin:

    > spack -e default config blame packages
    ---                                               packages:
    spack/environments/default/spack.yaml:59            fftw:
    spack/environments/default/spack.yaml:60              require: ~mpi
    spack/etc/spack/defaults/darwin/packages.yaml:17    all:
    spack/defaults/packages.yaml:18                       compiler: [apple-clang, gcc, clang]
    spack/defaults/darwin/packages.yaml:20                providers:
    spack/defaults/darwin/packages.yaml:21                  unwind: [apple-libunwind]
    spack/defaults/packages.yaml:20                         blas: [openblas, amdblis]
    spack/defaults/packages.yaml:21                         jpeg: [libjpeg-turbo, libjpeg]
    spack/defaults/packages.yaml:22                         lapack: [openblas, amdlibflame]
    spack/defaults/packages.yaml:23                         mpi: [openmpi, mpich]
    spack/defaults/darwin/packages.yaml:22              apple-libunwind:
    spack/defaults/darwin/packages.yaml:23                buildable: False
    spack/defaults/darwin/packages.yaml:24                externals:
    spack/defaults/darwin/packages.yaml:27                - spec: apple-libunwind@35.3
    spack/defaults/darwin/packages.yaml:28                  prefix: /usr
The filenames are shown in color in the UI; you can see what it really looks like here: https://imgur.com/uqeiBb5

For the curious, there is more on scopes and merging [2]. This isn't exactly easy to do -- we use ruamel.yaml [3], but had to add our own finer-grained source attribution to get this level of detail. There are also tricky things to deal with to maintain provenance on plain old Python dict/str/list/etc. objects [4,5].

[1] https://github.com/spack/spack

[2] https://spack.readthedocs.io/en/latest/configuration.html

[3] https://yaml.readthedocs.io/en/latest/

[4] https://github.com/spack/spack/blob/95ca9de/lib/spack/spack/...

[5] https://github.com/spack/spack/blob/95ca9de/lib/spack/spack/...
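The attribution idea in miniature: merge config layers while recording which source each key came from. This is only a toy sketch, not Spack's actual implementation (which attributes at a much finer grain, down to YAML lines):

```python
def merge_with_blame(layers):
    """layers: list of (source_name, dict), lowest precedence first."""
    merged, blame = {}, {}
    for source, settings in layers:
        for key, value in settings.items():
            merged[key] = value
            blame[key] = source   # later (higher-precedence) layers win
    return merged, blame

defaults = ("defaults/packages.yaml", {"compiler": "gcc", "mpi": "mpich"})
darwin   = ("defaults/darwin/packages.yaml", {"compiler": "apple-clang"})
user     = ("spack.yaml", {"mpi": "openmpi"})

merged, blame = merge_with_blame([defaults, darwin, user])
for key in merged:
    print(f"{blame[key]:35} {key}: {merged[key]}")
```

Once blame is tracked alongside the merge, "where did this value come from?" is a lookup instead of an archaeology project.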


This is mostly due to the procedural "recipe" style of programming that most developers seem to default to. It doesn't scale well past small programs: with the control flow embedded in the code, the only good way to follow the flow is to track things through the callers. This is one of the reasons I prefer dataflow programming styles that make the control flow explicit.
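A minimal illustration of the contrast: the same pipeline written recipe-style, and as an explicit dataflow structure you can print, trace, or draw (toy example):

```python
# Recipe style: the flow is implicit in the sequence of statements.
def recipe(xs):
    ys = [x * 2 for x in xs]
    ys = [y for y in ys if y > 4]
    return sum(ys)

# Dataflow style: the flow is a data structure, inspectable by tools.
PIPELINE = [
    ("double", lambda xs: [x * 2 for x in xs]),
    ("keep>4", lambda xs: [x for x in xs if x > 4]),
    ("sum",    lambda xs: sum(xs)),
]

def run(pipeline, value, trace=None):
    for name, step in pipeline:
        value = step(value)
        if trace is not None:
            trace.append((name, value))  # intermediate values for free
    return value

trace = []
result = run(PIPELINE, [1, 2, 3], trace)
print(result, trace)
```

Both compute the same thing, but only the second one can answer "what were the intermediate values?" without a debugger.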


Yes, but wouldn't the logging take up quite a bit of memory? Something like O(compute cycles × (log(lines of code) + log(memory allocated))) before compression.



In complex prod systems, logging and traceability are generally more important than memory consumption and disk space. Memory and Disk are cheaper than a lawsuit if something goes awry


> You click on any value and you see a graph/chain of all the transformations this data goes through right back to the source of truth

Isn't that just a debugger?


Not in a distributed system


This is what's really needed. In one code base it's relatively easy for the dev team. Across your microservices it's practically impossible unless all teams are involved. This is currently a nadir in dev productivity.


Which debugger are you using?


And now people are resorting to obfuscating code, just to protect their IP. fun times.


The perfect is the enemy of the good.


Look into Nix / NixOS



