Hacker News

The gold standard of local-only model inference for LLaMA, Alpaca, and friends is llama.cpp, https://github.com/ggerganov/llama.cpp No dependencies, no GPU needed; just point it at a model snapshot that you download separately via BitTorrent. Simple CLI tools that are (somewhat) usable from shell scripts.
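A minimal sketch of driving the llama.cpp CLI from a script, using its -m (model), -p (prompt), and -n (tokens to generate) flags; the binary and model paths here are placeholders, not something the thread specifies:

```python
import subprocess

def build_llama_cmd(binary, model_path, prompt, n_predict=128):
    # -m, -p and -n are llama.cpp's flags for the model path, the
    # prompt, and the number of tokens to generate.
    return [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]

def run_llama(binary, model_path, prompt, n_predict=128):
    """Run the llama.cpp CLI once and return its stdout as text."""
    cmd = build_llama_cmd(binary, model_path, prompt, n_predict)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout
```

For example, `run_llama("./main", "models/7B/ggml-model.bin", "Q: What is 2+2?\nA:")` would block until generation finishes and hand back the raw output, which a shell pipeline could then post-process.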

Hoping they add support for llama 2 soon!



The real gold standard is https://github.com/oobabooga/text-generation-webui

Which includes the llama.cpp backend, and a lot more.

Unfortunately, despite claiming to be the "Automatic1111" of text generation, it doesn't support any of the prompt engineering capabilities (i.e. negative prompts, prompt weights, prompt blending, etc.) available in Automatic1111, even though they're not difficult to implement - https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...
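The negative-prompt trick from image generators is essentially classifier-free guidance applied to next-token logits: run the model on the positive prompt and on the negative prompt, then push the distribution away from the latter. A minimal numpy sketch of that combination step (function name and scale are illustrative, not taken from the linked gist):

```python
import numpy as np

def cfg_logits(cond_logits, neg_logits, guidance_scale=1.5):
    """Combine next-token logits in classifier-free-guidance style.

    cond_logits: logits from a forward pass on the positive prompt
    neg_logits:  logits from a forward pass on the negative prompt
    A scale of 1.0 ignores the negative prompt; larger values steer
    the distribution further away from it.
    """
    return neg_logits + guidance_scale * (cond_logits - neg_logits)
```

The combined logits would then go through the usual softmax/sampling step in place of the raw conditional logits.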

Luckily for Ooga Booga, no one else supports them either. Why is this? I have no explanation, except that the NLP community doesn't know jack about prompt engineering, which is Kafkaesque.


> Hoping they add support for llama 2 soon!

The 7B and 13B models use the same architecture, so llama.cpp already supports it. (Source: running 13b-chat myself using llama.cpp GPU offload.) It's only 70B that has additional extensions that llama.cpp would need to implement.


Using their default CLI tools from a shell script is sadly a little bit tricky.

I opened a feature request a while back suggesting they add a --json mode to make that easier; it hasn't gained much traction, though: https://github.com/ggerganov/llama.cpp/issues/1739
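Part of what makes scripting tricky is that the CLI's stdout mixes the echoed prompt with the completion, so scripts end up stripping it by hand. A hedged sketch of that workaround (it assumes the output begins with a verbatim echo of the prompt, which is the default behavior but not guaranteed across versions or flags):

```python
def strip_prompt(output, prompt):
    """Remove the echoed prompt from CLI output, if present.

    Assumes the output starts with a verbatim copy of the prompt;
    returns the output unchanged when it does not.
    """
    if output.startswith(prompt):
        return output[len(prompt):]
    return output
```

A --json mode would make this kind of fragile string surgery unnecessary, which is the point of the linked issue.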


Llama 2 7B and 13B GGML models are up and work with existing llama.cpp, no changes needed! The 70B does require a llama.cpp change, but I'm sure it won't take long.


Are there new system requirements known for these?



