The "big context window" talk reminds me of high-wattage speakers in the '80s. Yes, it's loud. "Does it sound good?" is the question.
Having a large context window is pointless unless the model can actually attend to the document submitted. Since RAG is basically search, it helps direct attention regardless of the model's context window size.
Stuffing random things into a prompt doesn't improve anything, so RAG is always required, even if it's just loading the history of the conversation into the window.
RAG is not always required. If you can fit a whole document/story/etc in the context length, it often performs better, especially for complex questions (not just basic retrieval).
Exactly. RAG is great if you need certain paragraphs to answer specific questions but there are limits.
Assuming a new, unheard-of play called "Romeo and Juliet", RAG can answer "Who is Rosaline?".
But a large context window can answer "How does Shakespeare use foreshadowing to build tension and anticipation throughout the play?", "How are gender roles and societal expectations depicted in the play?", or "What does the play suggest about the power of love to overcome adversity?".
In other words, RAG doesn't help you answer questions that pertain to the whole document and aren't keyword-driven retrieval queries. That's a pretty big limitation if you aren't just looking for a specific fact.
RAG limits you to answers where the information can be contained in your chunk size.
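A toy sketch of that chunk-size limitation, assuming a naive keyword-overlap retriever standing in for embedding search (the document text, chunk size, and function names here are all illustrative, not any particular RAG library):

```python
import re

# Hypothetical source text: a fact about Rosaline plus thematic material
# that spans the whole "document".
document = (
    "Rosaline is Romeo's first love, a niece of Lord Capulet. "
    "She never appears on stage. "
    "Foreshadowing builds tension across many scenes of the play. "
    "Gender roles shape the expectations placed on Juliet throughout."
)

def chunk(text, size=60):
    # Fixed-size chunking: any answer must fit inside one retrieved chunk.
    return [text[i:i + size] for i in range(0, len(text), size)]

def tokens(text):
    # Crude tokenizer so punctuation doesn't block keyword matches.
    return set(re.findall(r"[a-z']+", text.lower()))

def retrieve(query, chunks):
    # Return the single chunk with the most keyword overlap with the query.
    q = tokens(query)
    return max(chunks, key=lambda c: len(q & tokens(c)))

chunks = chunk(document)
# A fact question hits the one chunk that names Rosaline.
print(retrieve("Who is Rosaline?", chunks))
# A thematic question still returns just one chunk, even though the
# evidence (foreshadowing, gender roles, etc.) is spread across all of them.
print(retrieve("How does the play build tension and depict gender roles?", chunks))
```

The fact question works because its answer fits in a chunk; the thematic question gets exactly one chunk back no matter what, which is the whole point of the comment above.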