
I still don’t really get this argument/excuse for why it’s acceptable that LLMs hallucinate. These tools are meant to support us, but we end up with two parties who are, as you say, prone to “hallucination” and it becomes a situation of the blind leading the blind. Ideally in these scenarios there’s at least one party with a definitive or deterministic view so the other party (i.e. us) at least has some trust in the information they’re receiving and any decisions they make off the back of it.




For these types of problems (i.e. most problems in the real world), a "definitive or deterministic" party isn't really possible. An unreliable party that you can throw at the problem from a hundred thousand directions simultaneously, and cheaply, is still useful.

"The airplane wing broke and fell off during flight"

"Well humans break their leg too!"

It is just a mindlessly stupid response and a giant category error.

The way an airplane wing fails and the way a human limb fails are not at all in the same category.

There is even another layer to this: comparing LLMs to the brain might be wrong because of the mereological fallacy, which is attributing "thinking" to the brain rather than to the person/system as a whole.


You are right that the wing/leg comparison is often lazy rhetoric: we hold engineered systems to different failure standards for good reason.

But you are misusing the mereological fallacy. It does not dismiss LLM/brain comparisons: it actually strengthens them. If the brain does not "think" (the person does), then LLMs do not "think" either. Both are subsystems in larger systems. That is not a category error; it is a structural similarity.

This does not excuse LLM limitations - rimeice's concern about two unreliable parties is valid. But dismissing comparisons as "category errors" without examining which properties are being compared is just as lazy as the wing/leg response.


Have you ever employed anyone?

People, when tasked with a job, often get it right. I've been blessed by working with many great people who really do an amazing job of generally getting things right -- or at least, right-enough.

But in any line of work: Sometimes people fuck it up. Sometimes, they forget important steps. Sometimes, they're sure they did it one way when instead they did it some other way and fix it themselves. Sometimes, they even say they did the job and did it as-prescribed and actually believe themselves, when they've done neither -- and they're perplexed when they're shown this. They "hallucinate" and do dumb things for reasons that aren't real.

And sometimes, they just make shit up and lie. They know they're lying and they lie anyway, doubling-down over and over again.

Sometimes they even go all spastic and deliberately throw monkey wrenches into the works, just because they feel something that makes them think that this kind of willfully-destructive action benefits them.

All employees suck some of the time. They each have their own issues. And all employees are expensive to hire, and expensive to fire, and expensive to keep going. But some of their outputs are useful, so we employ people anyway. (And we're human; even the very best of us are going to make mistakes.)

LLMs are not so different in this way, as a general construct. They can get things right. They can also make shit up. They can skip steps. They can lie, and double-down on those lies. They hallucinate.

LLMs suck. All of them. They all fucking suck. They aren't even good at sucking, and they persist at doing it anyway.

(But some of their outputs are useful, and LLMs generally cost a lot less to make use of than people do, so here we are.)


I don’t get the comparison. It would be like saying it’s okay if an Excel formula gives me different outcomes every time with the same arguments, sometimes right, but mostly wrong.

People can accomplish useful things, but sometimes make mistakes and do shit wrong.

The bot can also accomplish useful things, and sometimes make mistakes and do shit wrong.

(These two statements are more similar in their truthiness than they are different.)


As far as I can tell (as someone who worked on the early foundation of this tech at Google for 10 years) making up “shit” then using your force of will to make it true is a huge part of the construction of reality with intelligence.

Will to reality through forecasting possible worlds is one of our two primary functions.



