Yes, OpenAI is dumping on the market with ChatGPT 3.5. Vulture capital behaviour at its finest, and I'm sure government regulators will definitely catch on to this in 20 or 30 years...
It's cheaper than the ELECTRICITY cost of running Llama 70B on your own M1 Max (a very energy-efficient chip), assuming free hardware.
I guess they are also getting a pretty good cache hit rate - there are only so many questions people ask at scale. But still, it's dumping.
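For anyone who wants to check the electricity comparison themselves, here is a rough back-of-envelope sketch. The function name and all three inputs (power draw, generation speed, electricity price) are my own placeholders, not measurements; plug in your own numbers and compare the result with the API's per-token price.

```python
def electricity_cost_per_million_tokens(
    power_watts: float,
    tokens_per_second: float,
    price_per_kwh: float,
) -> float:
    """Electricity cost of generating one million tokens locally.

    The result is in the same currency as price_per_kwh.
    Hardware cost is ignored, matching the "free hardware" assumption.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds = 1_000_000 / tokens_per_second
    # watts * seconds = joules; 3.6 million joules = 1 kWh
    kwh = power_watts * seconds / 3_600_000
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Placeholder inputs only: measure your own machine and use your own tariff.
    cost = electricity_cost_per_million_tokens(
        power_watts=60.0,        # assumed average draw while generating
        tokens_per_second=5.0,   # assumed generation speed
        price_per_kwh=0.30,      # assumed electricity price
    )
    print(f"Electricity per 1M tokens: {cost:.2f}")
```

The comparison is very sensitive to tokens per second, which is exactly where batching on the provider side changes the picture.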
Based on my research, GPT-3.5 is likely significantly smaller than 70B parameters, so it would make sense that it's cheaper to run. My guess is that OpenAI significantly overtrained GPT-3.5 to get as small a model as possible and optimize for inference. Also, Nvidia chips are way more efficient at inference than the M1 Max. OpenAI also has the advantage of batching API calls, which leads to better hardware utilization. I don't have definitive proof that they're not dumping, but economies of scale and optimization seem like better explanations to me.
I also do not have proof of anything here, but can't it be both?
They have lots of money now and the market lead. They want to keep the lead, and some extra electricity and hardware costs are surely worth it to them if that keeps the competition from getting traction.
GPT-3.5 Turbo is (most likely) Curie, which is (most likely) 6.7B params. So, yeah, it makes perfect sense that a 70B model can't compete with it on cost.
You think they are caching? Even though one of the parameters is temperature? That's a can of worms, and it should be reflected in the pricing if true. Don't get me started if they are charging per token for cached responses.
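To make the temperature objection concrete, here is a minimal sketch of what an exact-match response cache would have to look like. This is purely hypothetical: the `ResponseCache` class and the `generate` callback are made up for illustration, and nothing here is known about how OpenAI actually serves requests. The point is that reuse is only safe when sampling is deterministic (temperature 0), so any request with a non-zero temperature has to bypass the cache.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """Hypothetical exact-match cache keyed on model, prompt, and temperature."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(model: str, prompt: str, temperature: float) -> str:
        # Serialize the request deterministically, then hash it.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "temperature": temperature},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(
        self,
        model: str,
        prompt: str,
        temperature: float,
        generate: Callable[[str], str],
    ) -> str:
        # With temperature > 0 the output is sampled, so a stored reply
        # would change behaviour; always call the model in that case.
        if temperature != 0:
            return generate(prompt)
        key = self._key(model, prompt, temperature)
        if key not in self._store:
            self._store[key] = generate(prompt)
        return self._store[key]
```

Under this scheme, only identical temperature-0 requests ever hit the cache, which limits how much a cache could explain the pricing.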