
I'm not sure it's overrated, but the concerns are very real.

We love the model because it speaks our language as if it's "one of us", but this may be deceiving, and the complete lack of a model for truth is disturbing. Making silly poems is fun, but the real uses are in medicine and biology, fields so complex that they are probably impenetrable to the human mind. Can reinforcement learning alone create a model for truth? The Transformer does not seem to have one; it only works with syntax and referencing. What percentage of truthfulness can we achieve, and is it good enough for scientific applications? If a blocker is found in the interface between the model and reality, it will be a huge disappointment.



> model for the truth?

Without sensing/experiencing the world, there is no truth.

The only truth we can ever truly know is the present moment.

Even the memories of things that we “know” happened, we perceive in the now.

Language doesn’t have a truth. You can make up anything you want with language.

So the only “truth” you could teach an LLM is your own description of it. But these LLMs are trained on thousands or even millions of different versions of “truth”. Which is the correct one?


There is a paper showing you can infer when the model is telling the truth by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. Apparently we can even detect when the model is being deceitful.

https://arxiv.org/abs/2212.03827
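
If I remember right, the probe in that paper (CCS) is trained with nothing but a consistency term plus a term that rules out the degenerate "everything is 0.5" answer. A rough PyTorch sketch of that idea, with a made-up hidden size and random placeholder activations standing in for real hidden states (the paper itself does more, e.g. normalizing the activations):

    import torch
    import torch.nn as nn

    hidden_dim = 4096  # hypothetical hidden-state size of the LLM

    # Linear probe mapping a hidden activation to a probability of "true".
    probe = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

    def ccs_loss(p_pos, p_neg):
        # A statement and its negation should get complementary probabilities...
        consistency = (p_pos - (1 - p_neg)) ** 2
        # ...and the probe shouldn't dodge the question by always answering 0.5.
        confidence = torch.minimum(p_pos, p_neg) ** 2
        return (consistency + confidence).mean()

    # Placeholder activations for "X" / "not X" prompt pairs; real hidden states in practice.
    acts_pos = torch.randn(256, hidden_dim)
    acts_neg = torch.randn(256, hidden_dim)

    for _ in range(100):
        opt.zero_grad()
        loss = ccs_loss(probe(acts_pos), probe(acts_neg))
        loss.backward()
        opt.step()

Of course a probe like that can only find a truth-like direction if one already exists in the activations.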

Another approach: a model can learn the distribution of a fact - is it present in the training set or not, how many times does it appear, and is the distribution unimodal (agreement) or multi-modal (disagreement, or just high variance)? Knowing this, a model can adjust its responses accordingly, for example by presenting multiple possibilities or declining to answer when there is no information.
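
That's about what the model picks up during training, but you can get a crude inference-time proxy for the same idea by sampling the model several times and checking how spread out the answers are. A toy sketch, where toy_ask stands in for a real sampled LLM call at temperature > 0:

    import random
    from collections import Counter

    def toy_ask(question):
        # Stand-in for a sampled LLM call; replace with a real model.
        return random.choice(["Paris", "Paris", "Paris", "Lyon"])

    def answer_with_uncertainty(ask, question, n=10):
        # Sample several answers and look at how concentrated the distribution is.
        answers = [ask(question) for _ in range(n)]
        counts = Counter(answers)
        top, freq = counts.most_common(1)[0]
        if freq / n >= 0.8:           # unimodal: strong agreement, answer directly
            return top
        if len(counts) > n // 2:      # very spread out: probably no real information
            return "I don't know."
        # multi-modal: present the competing possibilities instead of picking one
        return "Possible answers: " + ", ".join(a for a, _ in counts.most_common(3))

    print(answer_with_uncertainty(toy_ask, "What is the capital of France?"))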


I think for practical purposes you could hold that text from Wikipedia or scientific papers is true, for example. The issue I think OP is referring to is whether an LLM can refer back to these axiomatically true sources to ground and justify its outputs like a human would.


Well in that case, maybe the debate is: do we want that? Why?


If you can trust that the model is at least as accurate as Wikipedia, then it becomes a drop-in replacement for every task you do that requires Wikipedia.

There is a whole range of tasks that can’t be done today with an LLM because of the hallucination issues. You can’t rely on the information it gives you when writing a research paper, for example.


For starters because one of the first products people decided to use these models for is a search engine, and I don't think it is a stretch to argue that search engines should have a positive relationship, rather than indifference, towards facts and the truth.


You can make up any reality that you want, just consume these first and don’t ask me where I found them.

In all seriousness though, what you are asking is whether an objective reality exists, which is not a settled debate. There is also the whole solipsism thing, though many disregard it as a valid view of the world because it can be used to justify anything and it is not a particularly interesting position.

Of course there is also the whole local realism thing with QM, and the whole relativity thing, with time flowing at different speeds destroying a universal “now”.

Then there is the whole issue with our senses being fallible and our brains hallucinating reality in a manner that is as confident as GPT3.5 is when making up facts.

In fact, it’s all just information and information doesn’t need a medium.


Our senses lie to us all the time. What we perceive may have anywhere from strong to almost no correlation with reality. Can you see in the ultraviolet? No human can. Flowers look completely different there. Same goes for sounds and smells.


It can be exact and self-consistent; you can teach the rules of mathematics. There are some things that are provably unprovable, but that's a known fact.


You can still express contradiction in math.

The rules don’t determine the interpretation.

An LLM will pretty much always respect the rules of language, but it can use them to tell you completely fake stuff.


math is language


In exact domains you can often validate the model with numerical simulations, or use the simulations for reinforcement learning or evolution. The model can learn from outcomes, not only from humans. In biology it is necessary to validate experimentally, like any other drug or procedure.
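
To make the "learn from outcomes, not only from humans" point concrete, here is a toy evolutionary loop where a stand-in simulator is the only source of feedback; everything in it is a placeholder, not a real physics or biology model:

    import random

    def simulate(candidate):
        # Stand-in for an expensive numerical simulation that scores a candidate design.
        target = 0.7
        return -abs(candidate - target)

    def propose(parent, step=0.1):
        # Stand-in for a model proposing a variation on a previous candidate.
        return parent + random.uniform(-step, step)

    # Evolutionary loop: feedback comes only from simulated outcomes, never from humans.
    best = random.random()
    for _ in range(1000):
        child = propose(best)
        if simulate(child) > simulate(best):
            best = child

    print(f"best candidate: {best:.3f}")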


I am not so sure; there seems to be accumulating evidence that "finding the optimal solutions" means (requires) building a world model. Whether it's consistent with ground truth probably depends on what you mean by ground truth.

Given the hypothesis that the optimal solution for deep learning, presented with a given training set, is to represent (simulate) the formal, systemic relationships that generated that set, by "modeling" such relationships (or discovering non-lossy, optimized simplifications), I believe an implicit corollary is that the fidelity of the simulation is bounded only by the information in the original data.

Prediction: a big enough network, well enough trained, is capable of simulating with arbitrary fidelity, an arbitrarily complex system, to the point that lack of fidelity hits a noise floor.

The testable bit of interest is whether such simulations predict novel states and outcomes (real-world behavior) well enough.

I don't see why they shouldn't, but the X-factor would seem to be the resolution and comprehensiveness of our training data.

I can imagine toy domains like SHRDLU that are simple enough that we should already be able to build models large enough to "model" them, and to test this sort of speculation experimentally.

I hope (assume) this is already being done...


> there seems to be accumulating evidence that "finding the optimal solutions" means (requires) building a world model.

Was this ever in doubt? This has been the case forever (even before "AI"), and I thought it was well-established. The fidelity of the model is the core problem. What "AI" is really providing is a shortcut that allows the creation of better models.

But no model can ever be perfect, because the value of them is that they're an abstraction. As the old truism goes, a perfect map of a terrain would necessarily be indistinguishable from the actual terrain.


> But no model can ever be perfect, because the value of them is that they're an abstraction. As the old truism goes, a perfect map of a terrain would necessarily be indistinguishable from the actual terrain.

Not sure why but I find this incredibly insightful…


> Prediction: a big enough network, well enough trained, is capable of simulating with arbitrary fidelity, an arbitrarily complex system, to the point that lack of fidelity hits a noise floor.

That is a pretty good description of human brains/bodies. You could also say that quantum physics is where our noise floor might be.


Here's an alternative to a model for truth: there is no truth, only power. Suppose we completely abandon logical semantics and instead focus on social semantics. Instead of the usual boolean True/False variables and logical relations, we'd have people-valued variables and like/dislike relations: a system entirely for reasoning about how much pull and persuasion is present, without ever grounding out in any truth-based reasons. In other words, a bullshit reasoning system. Can ordinary truth reasoning be jerry-rigged out of this system?
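
To make that half-serious proposal concrete, here's a toy sketch of what I have in mind, with invented people and influence weights; claims are ranked purely by the social weight behind them, never by whether they're true:

    from collections import defaultdict

    # Toy "social semantics": statements are scored by who endorses them, not by facts.
    influence = {"alice": 5.0, "bob": 1.0, "carol": 3.0}   # invented power weights

    endorsements = defaultdict(list)   # statement -> list of (person, +1 like / -1 dislike)
    endorsements["the earth is round"] += [("alice", +1), ("bob", -1)]
    endorsements["the earth is flat"] += [("carol", +1)]

    def pull(statement):
        # Total persuasive weight behind a statement; no grounding in truth anywhere.
        return sum(sign * influence[person] for person, sign in endorsements[statement])

    # "Reasoning" here just means siding with whichever claim has more pull.
    claims = ["the earth is round", "the earth is flat"]
    print(max(claims, key=pull))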


Yes, it's called empiricism.


This was rhetorical. My point was that a system or model which cares about something other than the truth can, upon reaching a certain level of sophistication, end up able to reason about truth. E.g., an AI that cares entirely about approval for what it says, rather than the actuality of what it says, could still end up reasoning about truth, given that truth is heavily correlated with approval. I reject the premise that there has to be an a priori truth model under the hood.


RLHF is basically just applying social power to the machine. It’s used for good (ChatGPT won’t help you spread Nazi memes) and hegemony (ChatGPT won’t help you overthrow capitalism).



