OpenAI is valued at $90 billion and all they do is make GPT; Twitter was valued at $40 billion, and it was essentially a vanity side project by a cowboy CEO. Presuming the benchmarks and the general "it's about the level of 3.5" consensus are accurate, it's inefficient, but not incredibly inefficient imho
They weren't even worth $44B when Elon took the keys; he specifically tried to back out of the deal because $44B was an insane peak-'21 asset-bubble price. In truth they were probably worth more like $10-15B at that moment. And now that a bunch of advertisers have left because of you-know-who, it's probably about $10B
Twitter was valued at around $30 billion when Musk tried getting out of buying it (then the market cap went up once it became clear he would be forced to pay full price)
Since it is MoE, a quantized version could run on cheaper hardware with just consumer networking in between, instead of needing Epyc/Xeon levels of PCIe lanes, NVLink, or InfiniBand-type networking. It could even run with people pooling smaller systems over slow internet links.
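To make the "cheaper hardware" intuition concrete, here's a back-of-envelope sketch of weight storage vs. per-token reads at different quantization levels. The figures use Grok-1's reported sizes (~314B total parameters, ~86B active per token from 2 of 8 experts); treat those numbers as assumptions, not spec.

```python
# Back-of-envelope memory estimate for a quantized MoE model.
# Parameter counts below are Grok-1's *reported* figures (assumption).
TOTAL_PARAMS = 314e9   # all experts combined
ACTIVE_PARAMS = 86e9   # experts actually used per token (2 of 8)

def gib(params: float, bits_per_param: int) -> float:
    """Weight storage in GiB at a given quantization bit width."""
    return params * bits_per_param / 8 / 2**30

for bits in (16, 8, 4):
    print(f"{bits}-bit: store ~{gib(TOTAL_PARAMS, bits):.0f} GiB total, "
          f"touch ~{gib(ACTIVE_PARAMS, bits):.0f} GiB per token")
```

The key point is the gap between the two columns: only the active-expert slice has to move per token, which is why slower interconnects between pooled boxes become plausible.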
Nothing about using data in "real time" requires the model's parameter count to be this large, and a model this size is likely quite inefficient for their "non-woke" instructional use case.
Agreed. We have been building our real-time GPT flows for news & social as part of Louie.AI, think monitoring & investigations... long-term, continuous training will become amazing, but for the next couple of years, most of our users would prefer GPT-4 or Groq over what's here, paired with much smarter RAG. More strongly, the interesting part is how the RAG is done. Qdrant is cool, but it's just a DB with a simple vector index, so nothing in Grok's release is tech we find relevant to our engine.
E.g., there is a lot of noise in social data, and worse, misinfo/spam/etc, so we spend a lot of energy on adversarial data integration. Likewise, queries are often neurosymbolic, like over a date range or with inclusion/exclusion criteria. Pulling the top 20 most similar tweets to a query and running them through a slow, dumb, & manipulated LLM would be a bad experience. We have been pulling in ideas from agents, knowledge graphs, digital forensics & SNA, code synthesis, GNNs, etc for our roadmap, which feels quite different from what is being shown here.
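The "neurosymbolic" query shape described above can be sketched as symbolic filters (date range, exclusion terms) applied before vector ranking. This is a toy illustration with hypothetical names, not Louie.AI's actual pipeline; real systems would push these filters into the vector DB rather than into Python.

```python
# Toy sketch: symbolic filters first, then rank survivors by similarity.
# Doc, search, and all field names here are illustrative (assumptions).
from dataclasses import dataclass
from datetime import date
import math

@dataclass
class Doc:
    text: str
    embedding: list[float]
    posted: date

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(docs, query_vec, since, until, exclude_terms, k=20):
    # Symbolic stage: hard date-range and exclusion-term constraints.
    candidates = [
        d for d in docs
        if since <= d.posted <= until
        and not any(t in d.text.lower() for t in exclude_terms)
    ]
    # Neural stage: rank only the survivors by embedding similarity.
    candidates.sort(key=lambda d: cosine(d.embedding, query_vec), reverse=True)
    return candidates[:k]
```

The point of the ordering is that spam or out-of-range items never reach the LLM at all, instead of hoping the top-k similarity cutoff happens to exclude them.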
We do have pure LLM work, but it's more about fine-tuning smaller or smarter models, and we find that to be a tiny % of what people care about. Ex: spam classifications flowing into our RAG/KG pipelines or small-model training matter more to us than flowing into big-model training. Long-term, I do expect growing emphasis on the big models we use, but that is a more nuanced discussion.
(We have been piloting with gov types and are preparing for the next cohorts, in case that's useful to anyone working on real problems.)