I thought that whenever the knowledge cutoff increased, it meant they'd trained a new model. I guess that's completely wrong?




They add new data to the existing base model via continued pre-training. You save on the bulk of pre-training (the next-token prediction task), but you still have to re-run the mid- and post-training stages: context length extension, supervised fine-tuning, reinforcement learning, safety alignment, and so on.
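
For a concrete picture, here's a rough sketch of what continued pre-training can look like with the Hugging Face transformers Trainer. The checkpoint name, corpus file, and hyperparameters are all made up for illustration, not anyone's actual recipe:

    # Continued pre-training sketch: resume next-token-prediction training of an
    # existing base model on a newer corpus. All names/hyperparameters are illustrative.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "existing-base-model"                    # hypothetical prior checkpoint
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.pad_token or tok.eos_token  # causal LM tokenizers often lack a pad token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Newer text, tokenized for plain next-token prediction.
    new_data = load_dataset("text", data_files={"train": "new_corpus.txt"})["train"]
    new_data = new_data.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
                            batched=True, remove_columns=["text"])

    args = TrainingArguments(output_dir="cpt-out",
                             per_device_train_batch_size=4,
                             learning_rate=1e-5,    # far lower than the original pre-training LR
                             num_train_epochs=1)

    Trainer(model=model, args=args, train_dataset=new_data,
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()

The mid/post-training stages mentioned above would then be re-run on top of the updated base.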

Continued pretraining has its issues, though: the model starts forgetting the older material (catastrophic forgetting). There is some research into other approaches.
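
One common mitigation (no idea whether the big labs actually do this) is replay: mix a fraction of the original pre-training corpus back in with the new data so the model keeps seeing the old distribution. A sketch with the datasets library, where the 10% replay ratio and file names are assumptions:

    # Replay mixing sketch: interleave old and new corpora so continued
    # pre-training doesn't drift entirely onto the new distribution.
    from datasets import load_dataset, interleave_datasets

    old = load_dataset("text", data_files={"train": "original_corpus.txt"})["train"]
    new = load_dataset("text", data_files={"train": "new_corpus.txt"})["train"]

    # Roughly 90% new examples, 10% replayed old examples per batch on average.
    mixed = interleave_datasets([new, old], probabilities=[0.9, 0.1], seed=42)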

Typically that's the case, I think, but you could also pre-train your previous model on new data.

I don’t think it’s publicly known for sure how different the models really are. You can improve a lot just by improving the post-training set.



