Rumor has it that GPT-4.5 is an order of magnitude larger: around 12 trillion parameters total (compared to GPT-4's 1.2 trillion). It's almost certainly MoE as well, just a scaled-up version, which would explain the cost. OpenAI also said this is what they originally developed as "Omni", the model that was supposed to succeed GPT-4 but fell short of expectations. So they renamed it 4.5 and shoehorned it in to stay in the news amid all the competitor releases.
Appreciate the corrections, but I'm still a bit puzzled. Which part is wrong: 4.5 having 12 trillion parameters, its originally being intended as Orion (not Omni), or its being the expected successor to GPT-4? And do you have any related reading that speaks to any of this?
That sounds right except for the total parameter count. 110B per expert at 16 experts puts you just shy of 1.8T. Are you suggesting there are ca. 30B shared params between the experts?
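The arithmetic here can be sketched out. All the figures below (16 experts, 110B per expert, ~30B shared) are the rumored numbers from this thread, not anything confirmed:

```python
# Back-of-the-envelope MoE parameter count using the rumored figures.
# None of these numbers are confirmed; they're the thread's assumptions.

def moe_total_params(num_experts: int, params_per_expert: float,
                     shared_params: float = 0.0) -> float:
    """Total checkpoint size of a mixture-of-experts model:
    every expert's weights exist on disk, plus any shared
    (attention/embedding) weights used on every token."""
    return num_experts * params_per_expert + shared_params

experts_only = moe_total_params(16, 110e9)        # 16 x 110B = 1.76T
with_shared = moe_total_params(16, 110e9, 30e9)   # + ~30B shared = 1.79T

print(f"{experts_only / 1e12:.2f}T experts only")
print(f"{with_shared / 1e12:.2f}T with ~30B shared")
```

So 16 experts alone land at 1.76T, and roughly 30B of shared parameters would bring the total to about 1.79T, i.e. "just shy of 1.8T" either way.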