
It's not strictly that, though; it's next word prediction with regularization.

And the reason LLMs are interesting is that they /fail/ to learn it, but in a good way. If it were purely a "next word predictor", it wouldn't answer questions; it would continue them.
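You can see this with a base model that has no instruction tuning. A rough sketch using the Hugging Face transformers library and GPT-2 (a pure pretrained model, chosen here only as an illustration):

    from transformers import pipeline

    # GPT-2 is a base model: trained only on next-token prediction,
    # with no chat/instruction post-training.
    generator = pipeline("text-generation", model="gpt2")

    out = generator("Q: What is the capital of France?",
                    max_new_tokens=20)
    print(out[0]["generated_text"])
    # A base model will often continue the *text* -- e.g. with more
    # questions or filler -- rather than reliably answer, whereas an
    # instruction-tuned model answers.

The exact continuation varies by sampling, but the behavioral gap between base and chat models is the point.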

Also, it's a next token predictor, not a word predictor - which is important because the "just a predictor" theory then can't explain how it can form whole words at all!
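Concretely, tokens are usually subword pieces, not words. A quick sketch with the tiktoken library (the "cl100k_base" encoding here is just one example vocabulary):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("antidisestablishmentarianism")
    print(tokens)                               # list of integer token ids
    print([enc.decode([t]) for t in tokens])    # the subword pieces
    # A long or rare word typically splits into several pieces, so
    # producing a correct word means getting a *sequence* of token
    # predictions right, not a single lookup.

So "it just predicts the next unit" has to explain how coherent words emerge from sub-word pieces in the first place.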



Yes, I know; I was clarifying their immediate misunderstanding using the same terminology they used.

There's obviously a lot more going on behind the scenes, especially with today's mid- and post-training work!



