
Thank you. If you don't mind...

> tweaking its parameters to more likely say the correct thing next time.

Is this done entirely, or only partially, via human feedback on models like GPT-4 and Llama 3, for example?



Pretraining is done by having a (decoder-only) model predict the next token given the preceding text, repeated over a huge corpus. RLHF (reinforcement learning from human feedback) is a separate fine-tuning stage afterwards: humans rank candidate model responses, and those preferences are used to nudge the model toward answers people prefer. The preference data is expensive to collect and is a comparatively small dataset.
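To make the pretraining objective concrete, here's a minimal sketch of next-token prediction loss. The "model" is just a hypothetical table of bigram probabilities standing in for a real network; training would adjust those probabilities to lower the average negative log-likelihood.

```python
import math

# Toy causal language modeling: at each position the model assigns a
# probability to the token that actually came next, and the loss is the
# average negative log of those probabilities. The text and the bigram
# table below are made-up stand-ins for a real corpus and model.
text = ["the", "cat", "sat", "on", "the", "mat"]

# Hypothetical "model": fixed P(next | current) for pairs it has seen.
bigram = {
    ("the", "cat"): 0.5, ("the", "mat"): 0.5,
    ("cat", "sat"): 1.0,
    ("sat", "on"): 1.0,
    ("on", "the"): 1.0,
}

def next_token_loss(tokens, probs):
    """Average negative log-likelihood of each token given its predecessor."""
    nll = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = probs.get((prev, nxt), 1e-9)  # tiny floor for unseen pairs
        nll -= math.log(p)
    return nll / (len(tokens) - 1)

loss = next_token_loss(text, bigram)
print(round(loss, 4))  # lower loss = better predictions
```

A real LLM does the same thing, except the probabilities come from a neural network over a vocabulary of tens of thousands of tokens, and gradient descent tweaks the network's parameters to reduce this loss.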


Look into agent training.



