$6 billion raised in 2 months for their Series C is blowing my mind. What does Anthropic have that OpenAI or other LLM startups don't? What other companies have raised that much in a single round?
I’m not saying your experience is wrong, but for my use case (code, mainly), I’ve found Claude to be far more reliable (not to mention faster) than GPT4.
It's such a small playing field with too few players.
I have a slide in a presentation on the topic that's an inverted pyramid, as pretty much the entire LLM field rests on only a few companies, who aren't necessarily even doing everything correctly.
The fact that they even have a seat at such a small table, with so much already in the pot and much more forecasted, means they command a large buy-in regardless of their tech. They don't ever need to be in the lead; they just need to keep their seat at the table and they'll still have money thrown their way.
The threat of missing the next wave or letting a competitor gain exclusive access is too high at this point.
Of course, FOMO-driven investment is also a very well-known pattern of a bubble. We may have a bit of a generative AI bubble among the top firms, where large sums of money go down the drain on overvalued promises, because the cost of missing a promise that actually comes to fruition is considered too high.
Ironically, the real payoff is probably in focusing on integration layers at this point, particularly given the performance gains research has demonstrated over the past year from improved interfacing with SotA models.
LLMs at the foundational pretrained layer are in a race towards parity. Having access to model A isn't going to be much more interesting than having access to model B. But if you have a plug-and-play intermediate product that can hook into either model A or B and deliver improved results compared to direct access to either, that's where the money is going to be for cloud providers in the next 18 months.
The "big context window" talk reminds me of high wattage speakers in the 80s. Yes, it's loud. "Does it sound good?", is the question.
Having a large context window is pointless unless the model can actually attend to the document submitted. Since RAG is basically search, it helps focus attention regardless of the model's context window size.
Stuffing random things into a prompt doesn't improve results, so RAG is always required, even if it's just loading the conversation history into the window.
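The retrieval step the thread keeps referring to can be sketched without any LLM at all. This is a deliberately naive illustration using bag-of-words cosine similarity (real pipelines use embedding models and vector stores; the chunks and helper names here are invented for the example):

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts, ignoring punctuation."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """RAG's 'search' step: rank chunks against the query, keep the top k."""
    qv = bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(bag_of_words(c), qv), reverse=True)[:k]

# Hypothetical chunks from a pre-split document.
chunks = [
    "Rosaline is Capulet's niece and Romeo's first infatuation.",
    "The balcony scene takes place in the Capulet orchard.",
    "Mercutio is slain by Tybalt under Romeo's arm.",
]
question = "Who is Rosaline?"
context = retrieve(chunks, question, k=1)
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
```

Only the retrieved chunk reaches the model, which is exactly why this works for "Who is Rosaline?" but not for questions whose answer is spread across the whole document.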
RAG is not always required. If you can fit a whole document/story/etc in the context length, it often performs better, especially for complex questions (not just basic retrieval).
Exactly. RAG is great if you need certain paragraphs to answer specific questions but there are limits.
Assuming a new, unheard-of play called "Romeo and Juliet", RAG can answer "Who is Rosaline?".
But a large context window can answer "How does Shakespeare use foreshadowing to build tension and anticipation throughout the play?", "How are gender roles and societal expectations depicted in the play?", or "What does the play suggest about the power of love to overcome adversity?".
In other words, RAG doesn't help you answer questions that pertain to the whole document and aren't keyword-driven retrieval queries. That's a pretty big limitation if you aren't just looking for a specific fact.
RAG limits you to answers where the information can be contained in your chunk size.
This is just a prompting hack you can use with any LLM, not exclusive to Claude. But I do like the fact that they include these tricks in their documentation.
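The parent doesn't name the specific trick, but one commonly cited tip from Anthropic's long-context documentation is to place the long document first, wrapped in XML-style tags, with the question at the end. A minimal sketch of that layout (the tag name and wording are illustrative assumptions, not quoted from the docs):

```python
def build_prompt(document: str, question: str) -> str:
    """Assemble a long-document prompt: document first, in XML-style
    tags, with the question last -- a layout long-context models
    tend to handle better than question-first prompts."""
    return (
        "<document>\n"
        f"{document}\n"
        "</document>\n\n"
        f"Using only the document above, answer: {question}"
    )

prompt = build_prompt("Rosaline is Romeo's first infatuation.", "Who is Rosaline?")
```

As the parent notes, nothing here is Claude-specific; the same structure can be sent to any chat model.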
The large (100k-token) context window, together with the fact that it can actually use the information in that window. From personal experience, other models, including OpenAI's, fail to properly answer when provided large (more than 5k tokens) inputs as context, even when the model officially accepts much larger contexts. But the Claude 2 models are uncannily good at taking all that context into consideration.
I regularly compare queries with both GPT-4 and Claude 2 and I find I prefer Claude's answers most often. The 100k token context is a major plus too.
Claude is more creative, and that comes with a higher rate of hallucinations. I hope I'm not misremembering, but GPT-4 was also initially more prone to hallucinations while being more capable in some ways.
If you ran a successful lemonade stand, and suddenly I discovered that Al Capone gave you the money to buy lemons, I would find it relevant, although not sure how I would act on the information.
Also, possibly there is some money laundering aspect to these investments that we don't fully see through.