I thought that whenever the knowledge cutoff increased, it meant they'd trained a new model. I guess that's completely wrong?




They add new data to the existing base model via continued pre-training. You save on the bulk of pre-training (the next-token prediction task), but you still have to re-run the mid- and post-training stages: context length extension, supervised fine-tuning, reinforcement learning, safety alignment, and so on.
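
For a concrete picture, here's a rough sketch of what continued pre-training can look like with the Hugging Face transformers Trainer. The checkpoint name, corpus file, and hyperparameters are all made up for illustration, not anyone's actual recipe:

    # Continued pre-training sketch: resume next-token-prediction training of an
    # existing base model on a newer corpus. All names/hyperparameters are illustrative.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "existing-base-model"                    # hypothetical prior checkpoint
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.pad_token or tok.eos_token  # causal LM tokenizers often lack a pad token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Newer text, tokenized for plain next-token prediction.
    new_data = load_dataset("text", data_files={"train": "new_corpus.txt"})["train"]
    new_data = new_data.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
                            batched=True, remove_columns=["text"])

    args = TrainingArguments(output_dir="cpt-out",
                             per_device_train_batch_size=4,
                             learning_rate=1e-5,    # far lower than the original pre-training LR
                             num_train_epochs=1)

    Trainer(model=model, args=args, train_dataset=new_data,
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()

The mid/post-training stages mentioned above would then be re-run on top of the updated base.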

Continued pretraining has its issues, though: the model starts forgetting the older material (catastrophic forgetting). There is some research into other approaches.
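
One common mitigation (no idea whether the big labs actually do this) is replay: mix a fraction of the original pre-training corpus back in with the new data so the model keeps seeing the old distribution. A sketch with the datasets library, where the 10% replay ratio and file names are assumptions:

    # Replay mixing sketch: interleave old and new corpora so continued
    # pre-training doesn't drift entirely onto the new distribution.
    from datasets import load_dataset, interleave_datasets

    old = load_dataset("text", data_files={"train": "original_corpus.txt"})["train"]
    new = load_dataset("text", data_files={"train": "new_corpus.txt"})["train"]

    # Roughly 90% new examples, 10% replayed old examples per batch on average.
    mixed = interleave_datasets([new, old], probabilities=[0.9, 0.1], seed=42)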

Typically that's the case, I think, but you could also pre-train your previous model on new data.

I don’t think it’s publicly known for sure how different the models really are. You can improve a lot just by improving the post-training set.



