vLLM automatic prefix / prompt caching (vllm.ai)
2 points by danielhanchen on Aug 25, 2024 | 1 comment


vLLM, an open source library for LLM serving, has automatic prefix (prompt) caching, similar to what Claude provides. The difference is that vLLM caches prefixes on the fly; ChatGPT and other chat systems probably already enable something like this by default in their backends, they just don't expose the cost reductions to the user.
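For anyone who wants to try it, here's a minimal sketch using vLLM's offline LLM API. The enable_prefix_caching flag is vLLM's actual engine argument; the model name and prompts are just placeholders:

    from vllm import LLM, SamplingParams

    # enable_prefix_caching tells vLLM to hash KV-cache blocks and reuse
    # them across requests that share a common prompt prefix.
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
        enable_prefix_caching=True,
    )

    # A long shared prefix (e.g. a system prompt or document) followed by
    # different questions -- only the first request pays for the prefix.
    shared_prefix = "You are a helpful assistant. Here is a long document: ..."
    prompts = [
        shared_prefix + "\n\nQ: Summarize the document.",
        shared_prefix + "\n\nQ: List the key dates mentioned.",
    ]

    outputs = llm.generate(prompts, SamplingParams(max_tokens=128))
    for out in outputs:
        print(out.outputs[0].text)

The second prompt should hit the cached KV blocks for shared_prefix, so only the question suffix gets prefilled.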



