vLLM automatic prefix / prompt caching (vllm.ai)
2 points by danielhanchen on Aug 25, 2024 | 1 comment


vLLM, an open source library for LLM serving, has automatic prefix (prompt) caching, similar to what Claude provides. The difference is that vLLM caches prefixes on the fly; ChatGPT and other chat systems probably already enable something like this by default in their backends, they just don't expose the cost reductions to the user.
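For anyone who wants to try it, here's a minimal sketch using vLLM's offline LLM API. The enable_prefix_caching flag is vLLM's actual engine argument; the model name and prompts are just placeholders:

    from vllm import LLM, SamplingParams

    # enable_prefix_caching tells vLLM to hash KV-cache blocks and reuse
    # them across requests that share a common prompt prefix.
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
        enable_prefix_caching=True,
    )

    # A long shared prefix (e.g. a system prompt or document) followed by
    # different questions -- only the first request pays for the prefix.
    shared_prefix = "You are a helpful assistant. Here is a long document: ..."
    prompts = [
        shared_prefix + "\n\nQ: Summarize the document.",
        shared_prefix + "\n\nQ: List the key dates mentioned.",
    ]

    outputs = llm.generate(prompts, SamplingParams(max_tokens=128))
    for out in outputs:
        print(out.outputs[0].text)

The second prompt should hit the cached KV blocks for shared_prefix, so only the question suffix gets prefilled.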



