Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

AI field desperately needs smarter models - not faster models.


Definitely needs faster and cheaper models. Fast and cheap models could replace software in tons of situations. Imagine a vending machine or a mobile game or a word processor where basically all logic is implemented as a prompt to an llm. It would serve as the ultimate high level programming language.


I think natural language to code is the right abstraction. Easy enough barrier to entry but still debuggable. Debugging why an LLM randomly gives you Mountain Dew instead of Sprite if you have a southern accent sounds like a nightmare.


I'm not sure it would be that hard to debug. Make sure you can reproduce the llm state (by storing the random seed for the session, or something like that) and then ask it "why did you just now give that customer mountain dew when they ordered sprite?"


> and then ask it "why did you just now give that customer mountain dew when they ordered sprite?"

Worse than useless for debugging.

An LLM can't think and doesn't have capabilities for self-reflection.

It will just generate a plausible stream of tokens in reply that may or may not correspond to the real reason why.


Of course a llm can't think. But that doesn't mean it can't answer simple questions about the output that was produced. Just try it out with chatgpt when you have time. Even if it's not perfectly accurate it's still useful for debugging.

Just think about it as a human employee. Can they always say why they did what they did? Often, but not always. Sometimes you will have to work to figure out the misunderstanding.


> it's still useful for debugging

How so? What the LLM says is whatever is more likely given the context. It has no relation to the underlying reality whatsoever.


Not sure what you mean by "relation to the underlying reality". The explanation is likely to be correlated with the underlying reason for the answer.

For example, here is a simple query:

> I put my bike in the bike stand, I removed my bike from the bike stand and biked to the beach, then I biked home, where is my bike. Answer only with the answer and no explanation

> Chatgpt: At home

> Why did you answer that?

> I answered that your bike is at home because the last action you described was biking home, which implies you took the bike with you and ended your journey there. Therefore, the bike would logically be at home now.

Do you doubt that the answer would change if I changed the query to make the final destination be "the park" instead of "home"? If you don't doubt that, what do you mean that the answer doesn't correspond to the underlying reality? The reality is the answer depends on the final destination mentioned, and that's also the explanation given by the LLM, clearly the reality and the answers are related.


You need to find an example of the LLM making a mistake. In your example, ChatGPT answered correctly. There are many examples online of LLMs answering basic questions incorrectly, and then the person asking the LLM why it did so. The LLM response is usually nonsense.

Then there is the question of what you would do with its response. It’s not like code where you can go in and update the logic. There are billions of floating point numbers. If you actually wanted to update the weights you’ll quickly find yourself fine-tuning the monstrosity. Orders of magnitude more work than updating an “if” statement.


I don't think llms always can give correct explanations for their answers. That's a misunderstanding.

> Then there is the question of what you would do with its response. I

Sure but that's a separate question. I'd say the first course of action would be to edit the prompt. If you have to resort to fine tuning I'd say the approach has failed and the tool was insufficient for the task.


It’s not really a separate question imo. We want to know whether computer code or prompts are better for programming things like vending machines.

For LLMs, interpretability is one problem. The ability to effectively apply fixes is another. If we are talking about business logic, have the LLM write code for it and don’t tie yourself in knots begging the LLM to do things correctly.

There is a grey area though, which is where code sucks and statistical models shine. If your task was to differentiate between a cat and a dog visually, good luck writing code for that. But neural nets do that for breakfast. It’s all about using the right tool for the job.


> The explanation is likely to be correlated with the underlying reason for the answer.

No it isn't. You misunderstand how LLMs work. They're giant Mad Libs machines: given these surrounding words, fill in this blank with whatever statistically is most likely. LLMs don't model reality in any way.


Did you read the example above? Do you disagree that the LLM provided a correct explanation for the reason it answered as it did?

> They're giant Mad Libs machines: given these surrounding words, fill in this blank with whatever statistically is most likely. LLMs don't model reality in any way.

Not sure why you think this is incompatible with the statement you disagreed with.


> Do you disagree that the LLM provided a correct explanation for the reason it answered as it did?

Yes, I do. An LLM replies with the most likely string of tokens. Which may or may not correspond with the correct or reasonable string of tokens, depending on how stars align. In this case the statistically most likely explanation the LLM replied with just happened to correspond with the correct one.


> In this case the statistically most likely explanation the LLM replied with just happened to correspond with the correct one.

I claim that case is not so uncommon as people in this thread seem to think


Why not just store the state in the code and debug as usual, perhaps with LLM assistance? At least that’s tractable.


Why on earth would you implement a vending machine using an LLM?


The same reason we make the butter dish suffer from existential angst.


Because it's easy and cheap. Like how many products use a Raspberry Pi or ESP32 when an ATtiny would do.


How in the world is this easy and cheap? Are you planning to run this LLM inside the vending machine? Or are you planning to send those prompts to a remote LLM somewhere?


The premise here is that the model runs fast and cheap. With the current state of the technology running a vending machine using an LLM is of course absurd. The point is that accuracy is not the only dimension that brings qualitative change to the kind of applications that LLMs are useful for.


Running a vending machine using an LLM is absurd not because we can't run LLMs fast or cheap enough - it's because LLMs are not reliable, and we don't know yet how to make them more reliable. Our best LLM - o3 - doubled the previous model (o1) hallucination rate. OpenAI says it hallucinated a wrong answer 33% of the time in benchmarks. Do you want a vending machine that screws up 33% of the time?

Today, the accuracy of LLMs is by far a bigger concern (and a harder problem to solve) than its speed. If someone releases a model which is 10x slower than o3, but is 20% better in terms of accuracy, reliability, or some other metric of its output quality, I'd switch to it in a heartbeat (and I'd be ready to pay more for it). I can't wait until o3-pro is released.


Do you seriously think a typical contemporary LLM would screw up 33% of vending machine orders?

I don't know what benchmark you're looking at but I'm sure the questions in it were more complicated than the logic inside a vending machine.

Why don't you just try it out? It's easy to simulate, just tell the bot about the task and explain to it what actions to perform in different situations, then provide some user input and see if it works or not.


You could run a 3B model on 200 dollars worth of hardware and it would do just fine, 100 percent of the time, most of the time. I could definitely see someone talking it out of a free coke now and then though.

With vending machines costing 2-5k, it’s not out of the question, but it’s hard to imagine the business case for it. Maybe the tantalizing possibility of getting a free soda would attract traffic and result in additional sales from frustrated grifters? Idk.


Yet deepseek has shown that more dialogue increases quality. Increasing speed is therefore important if you need thinking models.


If you have much more speed in the available time, for an activity like coding, you could use that for iteration, writing more tests and satisfying them, especially if you can pair that with a concurrent test runner to provide feedback. I'm not sure the end result would be lower scoring/smartness than an LLM could achieve in the same duration.


I'm not sure the end result would be lower scoring/smartness than an LLM could achieve in the same duration.

It probably wouldn’t with current models. That’s exactly why I said we need smarter models - not more speed. Unless you want to “use that for iteration, writing more tests and satisfying them, especially if you can pair that with a concurrent test runner to provide feedback.” - I personally don’t.


LLM's can't think, so "smarter" is not possible.


They can by the normal English definitions of "think" and "smart". You're just redefining those words to exclude AI because you feel threatened by it. It's tedious.


Incorrect. LLM's have no self-reflection capability. That's a key prerequisite for "thinking". ("I think, therefore I am.")

They are simple calculators that answer with whatever tokens are most likely given the context. If you want reasonable or correct answers (rather than the most likely) then you're out of luck.


It is not a key prerequisite for "thinking". It's "I think therefore I am" not "I am self-aware therefore I think".

In the 90s if your cursor turned into an hourglass and someone said "it's thinking" would you have pedantically said "NO! It is merely calculating!"

Maybe you would... but normal people with normal English would not.


Self-reflection is not the same thing as self-awareness.

Computers have self-reflection to a degree - e.g., they react to malfunctions and can evaluate their own behavior. LLMs can't do this, in this respect they are even less of a thinking machine than plain old dumb software.


Technically correct and completely besides the point.


People cant fly.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: