I understand what you mean, but in my opinion there's a big difference between writing in natural language and actively engaging your brain with writing code, looking up documentation, etc.
It also sort of feels like "you don't know what you don't know", i.e. would you have considered an alternative better solution if you thought about it yourself, went to the documentation, found a tutorial on the web?
Of course, production is arguably a lot faster, but it feels like a trade-off is emerging: the models feel so capable that we stop trying to find the solution to the problem ourselves, and thus perhaps degrade our personal reasoning capabilities. I say this as something I'm afraid is happening, not something I'm certain of.
A compiler is a predictable, testable, deterministic piece of software.
An LLM is not.
Sure, all abstractions leak; so, at some point in time, for some reason, you may need to check its compiled code (cough, cough, gcc 2.96). But if your code compiles properly today, it will compile properly tomorrow as well.
LLMs can be deterministic as well - same prompt on the same model produces the same input. On the other hand, compilers can be quite nondeterministic - you get a new version of compiler, or change compiler options (turn on optimizations) - you might get a very different binary. And JIT compilers (and GC languages) are even less deterministic; their compilation can depend on the nature of the inputs.
But I think, in the analogy compiler ~ LLM, the issue is more one of trust than of determinism. It took assembler programmers decades to trust compilers enough not to write code in assembler. Something similar will happen with AI - some will embrace it sooner than others.
> LLMs can be deterministic as well - same prompt on the same model produces the same input
> compilers can be quite nondeterministic - you get a new version of compiler, or change compiler options (turn on optimizations)
That’s a whole other level of bad faith argument right here. Flags and options are input too.
> It took assembler programmers decades to trust compilers enough not to write code in assembler.
You do realize that Cobol, Algol, and Lisp are very old, and they were not assembly. And that Unix was written in C shortly after the language was created.
> That’s a whole other level of bad faith argument right here.
Not sure where you see the bad faith argument. (Btw I mean "same output", not "same input", it was a typo.)
Take the JVM, for example. It used to be horribly bad and unpredictable, performance-wise, in the 90s. Sun tried to base a desktop environment on it - it didn't work.
> You do realize that Cobol, Algol, and Lisp are very old, and they were not assembly.
Of course! But people were hand-writing assembler until the late 2000s, because compilers were simply not that good.
The same will happen with LLMs - some people will not trust them and won't use them for decades, possibly. Some have already embraced them.
Your proof for your argument that a compiler is nondeterministic is to change the whole compiler to another version and say it won’t produce the same output as the old one.
> But people were hand-writing assembler until the late 2000s, because compilers were simply not that good.
And we have software like Unix, emacs, ksh, awk… that’s all written in C. I strongly believe that those people who were writing assembly were optimizing stuff or dealing with constraints (like the 640 KB of DOS). Just like today, you may still have to write assembly for microcontrollers or video codecs. Compilers were expensive, but people were paying for them.
> Your proof for your argument that a compiler is nondeterministic is to change the whole compiler to another version and say it won’t produce the same output as the old one.
Fair enough. What I meant though was that compilation as a process is not deterministic, because often when you recompile a couple of years later, you're using a different compiler. (In the modern world it can be a much shorter time, actually.)
> And we have software like Unix, emacs, ksh, awk… that’s all written in C.
So? IIRC, the first compiler was FORTRAN, invented in 1958. OpenAI Codex, the first coding LLM, came out in August 2021. So we are, like, in the year 1963. By this comparison, we have ten more years to produce (using a coding LLM) a compiler and an operating system just from a textual specification, without an intermediate formal programming language. Funny - we have actually already done that (Claude C Compiler, VibexOS).
> So? IIRC, the first compiler was FORTRAN, invented in 1958. OpenAI Codex, the first coding LLM, came out in August 2021. So we are, like, in the year 1963. By this comparison, we have ten more years to produce (using a coding LLM) a compiler and an operating system just from a textual specification, without an intermediate formal programming language.
Nope, the timeframe would have been three years:
In 1961, the MCP was the first OS written exclusively in a high-level language (HLL).[0]
So by 2024, we should all have been able to verify that LLMs are reliable enough to produce a good enough product. Instead, it’s just slop everywhere, where the one producing it does not even care about its creation.
are you saying ai writes code that is semantically wrong? because i dont think humans write deterministic code - they come up with different solutions to the same problem.
This would only be somewhat equivalent if you compiled your code into assembly and committed that output to the repo, and then had to continue development within the assembly codebase using the same method.
How is that relevant to the topic of this discussion?
Compilation from higher order languages to the machine code is deterministic. It is sufficient to review and thoroughly test the tool which does the translation. Given the same input, the output will always be the same.
Transformation of a natural language prompt to code by an AI tool is non-deterministic. The outputs will vary between runs. Therefore, it is always necessary to verify them.
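A quick way to check the "same input, same output" claim for yourself. This is only a sketch: it uses CPython's built-in `compile()` as a stand-in for "a compiler" (my assumption, nobody above was talking about Python specifically), and `SOURCE` and `bytecode_digest` are made-up names. It also covers the point made upthread that flags and options are part of the input.

```python
import hashlib

# Tiny made-up program; the assert is there so an optimization flag has
# something to strip.
SOURCE = "assert x > 0\ny = x + 1\n"


def bytecode_digest(source, optimize):
    """Compile source with an explicit optimization level and hash the bytecode."""
    code = compile(source, "<demo>", "exec", optimize=optimize)
    return hashlib.sha256(code.co_code).hexdigest()


# Same source, same flags, compiled twice.
same_a = bytecode_digest(SOURCE, optimize=0)
same_b = bytecode_digest(SOURCE, optimize=0)

# Same source, different flag: optimize=2 drops assert statements.
stripped = bytecode_digest(SOURCE, optimize=2)

print("same input, same bytecode:", same_a == same_b)
print("different flag, same bytecode:", same_a == stripped)
```

The second comparison differing is not nondeterminism; the flag is simply another input to the same function.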
Compilation is not deterministic, see JITs and GCs. What is deterministic is the resulting program output, but not its performance. So with compilers, we traded away the determinism of performance in exchange for ease of programming.
With LLMs, we are trading away the determinism of the program output as well, in exchange for even easier programming. Is it a good or bad thing? There are ways to mitigate the problem, just like there are with compilers.
You could argue the determinism of the program output was never really there, because the specification at a high enough level was always unclear. So we are not really losing that much, just accepting a messier reality.
Then the only question that remains is whether these computer programs (LLMs) can do a better job (and where) than a SW developer, who is supposed to translate unclear specifications into a formal language (source code). It happened with compilers - eventually they got better than all of the assembler programmers. The same happened with chess players.
> Compilation is not deterministic, see JITs and GCs. What is deterministic is the resulting program output, but not its performance.
Does a JIT compile some other program code instead of the one being run? Does it produce bytecodes for a different VM? Does it try to compile parts of the program that have not been executed or aren’t going to be?
Does a GC destroy objects that are in use? Does it ignore instances and memory that have been properly released?
JITs and GCs are deterministic algorithms; you can predict their behavior by just reading their code. LLM tooling involves an actual random number generator for its output.
> Does a JIT compile some other program code instead of the one being run? Does it produce bytecodes for a different VM? Does it try to compile parts of the program that have not been executed or aren’t going to be?
Sure, but the same is true for LLMs - the leading models no longer make trivial mistakes like answering "What is the capital of France?" wrong.
> JITs and GCs are deterministic algorithms; you can predict their behavior by just reading their code.
On large enough systems, you can't, just like it's difficult to predict the weather. Determinism has little to do with it. At work, I have just witnessed a bug in the JIT (it seems to have been fixed in OpenJDK 25). It inlined the wrong method. We weren't able to reproduce the error conditions without a private customer dataset.
And the fact is, historically, there have been many bugs in compilers, or they have been bad at their job of producing performant programs. The output (resulting program) of a good compiler is difficult to understand (because it is written to be efficient). LLMs (for the programming use case) are different quantitatively, not qualitatively.
It’s really weird how you shift the goalposts and your own definitions.
No one is saying that a compiler can’t have bugs. What we have been saying is that if we take the compiler as a black box, we’re reasonably certain, given that we know the input, what the output will be. And the output will stay the same if you keep the input the same.
But you can send the LLM the same prompt, and it will give you a different answer each time. And it’s not even about the verbiage used.
An LLM doesn't have to be non-deterministic; it can work just like any other deterministic algorithm.
But I am not sure why the insistence on the relevance of (non)determinism, rather than on the chaotic relation of the output to the input (which is true for both compilers and LLMs). In practice, inputs to the LLM, as well as to the compiler, change. And the fact is, the output can change radically due to that.
I think nobody really sends the same prompt twice to the LLM, so nobody cares about it being deterministic. I think what you're looking for is something different, some form of stability (as opposed to chaotic behavior). Although it's hard to define exactly, because in the case of LLMs theory lags behind practice. (And as I said - we already gave up on stability with respect to performance by using compilers. We resolve that issue by doing performance testing.)
(I asked AI what's the opposite of "chaotic", I use "stable", but it seems that people use "deterministic" or "predictable" also in that meaning. So if you're using "deterministic" in that meaning, then you don't really care about sampling and temperature, i.e. determinism in the philosophical sense, but rather whether the output is consistent, albeit expressed differently.)
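For what it's worth, here is a toy sketch of the sampling/temperature distinction. The vocabulary and scores in `LOGITS` are invented, and `pick_token` and `generate` are hypothetical helpers, not any real model's API. It only shows where the randomness enters: greedy picking (temperature 0) involves no random draw at all, and sampling with a fixed seed is reproducible. Real inference stacks may have other sources of variation that this ignores.

```python
import math
import random

# Hypothetical next-token scores for a toy vocabulary (made up for illustration).
LOGITS = {"return": 2.0, "yield": 1.5, "raise": 0.5, "pass": 0.1}


def softmax(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def pick_token(logits, temperature, rng):
    """Greedy when temperature == 0, otherwise sample from the softmax."""
    if temperature == 0:
        return max(logits, key=logits.get)
    probs = softmax(logits, temperature)
    return rng.choices(list(logits), weights=probs, k=1)[0]


def generate(n_tokens, temperature, seed=None):
    """Draw n_tokens independently; seed=None means OS-provided randomness."""
    rng = random.Random(seed)
    return [pick_token(LOGITS, temperature, rng) for _ in range(n_tokens)]


greedy_a = generate(8, temperature=0)
greedy_b = generate(8, temperature=0)
seeded_a = generate(8, temperature=1.0, seed=42)
seeded_b = generate(8, temperature=1.0, seed=42)
unseeded = generate(8, temperature=1.0)

print("greedy runs match:", greedy_a == greedy_b)
print("seeded runs match:", seeded_a == seeded_b)
print("unseeded sample:", unseeded)
```

Whether the two seeded or greedy runs are *useful* is a separate question from whether they are repeatable, which is roughly the stability-versus-determinism distinction above.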
The whole point of technology is control and consistency. Even with random parameters, we want their values to be items in a specific set. When I use a tool, I want it to produce the outcome I want, not any other outcome it wants to produce. If it fails at that, it’s a defective tool.
> Compilation from higher order languages to the machine code is deterministic.
but that's not the analogy. there are problems that you can solve better if you can go deeper in the stack, and they can have different solutions.
The usual response to this is the "but high level languages are deterministic blah blah blah" (which IMO would be a good enough argument but well, we know how this goes now)
I posit a different argument. When you install a compiler on your computer, that compiler is "yours" for as long as you have the binary. You are able to completely forget about assembly because of 1. a reliable _enough_ compiler and 2. reliable access to said compiler.
Let's rewind decades back and pretend that the very first assembly compiler was behind a monthly subscription*. Do you think we'd be in the same place now?
Now, the natural follow-up to this is "but the open models are close to SotA now". Well why aren't we using them? Do we really think we'd have a GNU moment for """open""" models? And are we willing to bet our industry on that?
But my point is, _these are not the same things_ and positing them as such is frankly insulting. How good are you at writing assembly when your compiler is inevitably taken away?
* I'm not a historian, so I wouldn't be surprised if some versions of them were
This is a great point! And it's not only a compiler behind a subscription, it's also a compiler whose vendor's financial interests are not aligned with being the best compiler but with being the one that makes the most money, and it's unclear what that means at this moment. Will it have ads? Will it give preference to some technology over another? Will it steal your code? It's an unreliable and opaque compiler!
We are though? It just depends on the task and the costs.
> Do we really think we'd have a GNU moment for """open""" models? And are we willing to bet our industry on that?
Yes and yes. We're in the mainframe era. But history this time around is passing us by at a ridiculously fast clip. Local models become "good enough" for new tasks by the day, after which they continue to shrink for a given performance level.
I'm not going to bet against either Moore's law or relentless increases in model efficiency any time soon.
There is an argument I’ve been seeing more recently for why we should expect open models to eventually become good enough that people use them over frontier commercial models.
Basically it boils down to geopolitics: the US economy is currently being propped up by a small subset of companies, and a lot of that is based on proprietary models and speculation in the market around them. China is going to continue to dump better and better free models out to compete, thus pulling the rug out from under all that speculation.
Interactions with agents are conversational, while higher order langs are declarative. Spec driven development has been failing us, because there is no feedback loop from the runtime to the spec.
Maybe I’m pushing it a bit, I know, but a couple of decades ago you could’ve been asking this instead.