
Thank you. If you don't mind...

> tweaking its parameters to more likely say the correct thing next time.

Is this done entirely, or only partially, via human feedback on models like GPT-4 and Llama 3, for example?



Pretraining is done by having a (decoder-only) model predict the next token given the preceding text, repeated over a huge corpus. RLHF (reinforcement learning from human feedback) is a separate fine-tuning stage afterwards: humans rank candidate model responses, and those preferences are used to nudge the model toward answers people prefer. The preference data is expensive to collect and is a comparatively small dataset.
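To make the pretraining objective concrete, here's a minimal sketch of next-token prediction loss. The "model" is just a hypothetical table of bigram probabilities standing in for a real network; training would adjust those probabilities to lower the average negative log-likelihood.

```python
import math

# Toy causal language modeling: at each position the model assigns a
# probability to the token that actually came next, and the loss is the
# average negative log of those probabilities. The text and the bigram
# table below are made-up stand-ins for a real corpus and model.
text = ["the", "cat", "sat", "on", "the", "mat"]

# Hypothetical "model": fixed P(next | current) for pairs it has seen.
bigram = {
    ("the", "cat"): 0.5, ("the", "mat"): 0.5,
    ("cat", "sat"): 1.0,
    ("sat", "on"): 1.0,
    ("on", "the"): 1.0,
}

def next_token_loss(tokens, probs):
    """Average negative log-likelihood of each token given its predecessor."""
    nll = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = probs.get((prev, nxt), 1e-9)  # tiny floor for unseen pairs
        nll -= math.log(p)
    return nll / (len(tokens) - 1)

loss = next_token_loss(text, bigram)
print(round(loss, 4))  # lower loss = better predictions
```

A real LLM does the same thing, except the probabilities come from a neural network over a vocabulary of tens of thousands of tokens, and gradient descent tweaks the network's parameters to reduce this loss.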


Look into agent training.



