
It's not strictly that, though; it's next word prediction with regularization.

And the reason LLMs are interesting is that they /fail/ to learn it, but in a good way. If it were purely a "next word predictor", it wouldn't answer questions; it would continue them.
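You can see this with a base model that has no instruction tuning. A rough sketch using the Hugging Face transformers library and GPT-2 (a pure pretrained model, chosen here only as an illustration):

    from transformers import pipeline

    # GPT-2 is a base model: trained only on next-token prediction,
    # with no chat/instruction post-training.
    generator = pipeline("text-generation", model="gpt2")

    out = generator("Q: What is the capital of France?",
                    max_new_tokens=20)
    print(out[0]["generated_text"])
    # A base model will often continue the *text* -- e.g. with more
    # questions or filler -- rather than reliably answer, whereas an
    # instruction-tuned model answers.

The exact continuation varies by sampling, but the behavioral gap between base and chat models is the point.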

Also, it's a next token predictor, not a word predictor - which is important because the "just a predictor" theory then can't explain how it can form whole words at all!
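Concretely, tokens are usually subword pieces, not words. A quick sketch with the tiktoken library (the "cl100k_base" encoding here is just one example vocabulary):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("antidisestablishmentarianism")
    print(tokens)                               # list of integer token ids
    print([enc.decode([t]) for t in tokens])    # the subword pieces
    # A long or rare word typically splits into several pieces, so
    # producing a correct word means getting a *sequence* of token
    # predictions right, not a single lookup.

So "it just predicts the next unit" has to explain how coherent words emerge from sub-word pieces in the first place.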



Yes, I know; I was clarifying their immediate misunderstanding using the same terminology they used.

There's obviously a lot more going on behind the scenes, especially with today's mid- and post-training work!



