
> but are the hallucinations always wrong to the same degree

Not exactly, but largely yes, because you're asking the same types of questions with the same rough parameters, so it'll make up roughly the same sort of thing (e.g., citations) again.

The issue is that the LLM is trained to generate plausible words, not to recall which piece of its training data would be the best source. If you want to build an app on "AI" you need to target what it does well. If you want it to write citations, you need to give it your list of references and tell it to use only those.
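For example, one way to set that up — a minimal sketch assuming the OpenAI Python client; the model name, reference entries, and prompt wording are all placeholders, not anything from this thread:

    # Minimal sketch: constrain citations to a supplied list.
    # Assumes the OpenAI Python client; model name and the
    # reference entries are hypothetical placeholders.
    from openai import OpenAI

    references = [
        "Smith, J. (2019). Placeholder reference one.",
        "Doe, A. (2021). Placeholder reference two.",
    ]

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Cite only from the numbered reference list below. "
                "If no listed reference supports a claim, say so "
                "instead of inventing a citation.\n\n"
                + "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(references))
            )},
            {"role": "user", "content": "Summarize the evidence, with citations."},
        ],
    )
    print(response.choices[0].message.content)

The model can still misuse a listed source, but at least it can no longer fabricate one out of thin air, and anything cited outside the list is mechanically detectable.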

> I'm imagining an investigator with reams and reams of information about a murder case and suspect. Then, prompting an LLM trained on all the case data and social media history and anything else available about their main suspect, "where did so-and-so hide the body?". Would the response, being what's most probable based on the data, be completely worthless or would it be worth the investigator's time to check it out?

That specific question would produce results about as reliable as astrology, because unless the suspect actually wrote those words somewhere in the training data, the model is just as likely to hallucinate any other answer that fits the tone of the prompt.

But trying to think of where it would be helpful ... if you had a task where the style was what mattered, like matching a suspect's known writing, or drafting similar-style posts as bait, that wouldn't require the model to make up facts, so it wouldn't.
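E.g. a few-shot prompt built from known writing samples; the samples here are hypothetical, and you'd send the prompt the same way as in the sketch above:

    # Minimal sketch of few-shot style matching; the sample
    # posts and the task wording are hypothetical placeholders.
    known_samples = [
        "first known post by the suspect",
        "second known post by the suspect",
    ]

    prompt = (
        "Here are writing samples from one author:\n\n"
        + "\n---\n".join(known_samples)
        + "\n\nWrite a short forum post in the same voice."
    )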

And maybe there's an English suspect taunting the police, and the AI could help an FBI agent track them down by translating cockney slang, or something. Or by explaining a foreign idiom they might have missed.

Anything where you just ask the AI what the answer is, is not realistic.

> Would the investigator have any idea if the response is worthless or not?

They'd have to know which types of questions it can't answer. It's not that the model can be trusted whenever it can be shown not to have hallucinated; it's that it isn't, and can't be used as, an information-recall-from-training tool, so all such answers are suspect.
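One narrow check that does work for the citation setup above is verifying that the output only cites from the supplied list. A minimal sketch, assuming citations appear as bracketed numbers like [3] — that format is a hypothetical convention, not a standard:

    # Minimal sketch: flag cited numbers outside the supplied list.
    # The bracketed-number citation format is an assumption.
    import re

    def unknown_citations(output: str, num_references: int) -> list[int]:
        cited = {int(m) for m in re.findall(r"\[(\d+)\]", output)}
        return sorted(n for n in cited if not 1 <= n <= num_references)

    # With 2 references supplied, a cited "[7]" gets flagged.
    print(unknown_citations("As shown in [1] and [7].", 2))  # -> [7]

That only catches fabricated citations, though; whether the model characterized a real source correctly still needs a human to read it.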


