The gold standard for local-only inference with LLaMA, Alpaca, and friends is llama.cpp: https://github.com/ggerganov/llama.cpp No dependencies, no GPU needed; just point it at a model snapshot that you download separately, e.g. over BitTorrent. Simple CLI tools that are (somewhat) usable from shell scripts.
There's also oobabooga's text-generation-webui, which includes the llama.cpp backend, and a lot more.
Unfortunately, despite billing itself as the "Automatic1111" of text generation, it doesn't support any of the prompt engineering capabilities available in Automatic1111 (e.g. negative prompts, prompt weights, prompt blending), even though they're not difficult to implement: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...
Luckily for Ooga Booga, no one else supports these features either. Why this is, I can't explain, except that the NLP community doesn't know jack about prompt engineering, which is Kafkaesque.
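For the curious, here's a minimal sketch of how two of those features can work at the next-token-logit level, assuming you can run the model twice per step (once on the positive prompt, once on the negative/unconditional prompt). The function names and the logit lists are illustrative, not any library's actual API; negative prompting is shown as classifier-free guidance over logits, which is one common approach:

```python
def cfg_logits(cond, uncond, scale=1.5):
    """Negative prompting via classifier-free guidance:
    push the next-token logits toward the positive prompt (cond)
    and away from the negative prompt (uncond).
    scale=1.0 reduces to plain conditional sampling."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def blend_logits(logit_sets, weights):
    """Prompt blending / weighting: a normalized weighted mix of the
    next-token logits produced by several different prompts."""
    total = float(sum(weights))
    return [
        sum((w / total) * logits[i] for w, logits in zip(weights, logit_sets))
        for i in range(len(logit_sets[0]))
    ]
```

You'd apply one of these to the raw logits each decoding step, before softmax/sampling; nothing about it is specific to diffusion models, which is the commenter's point.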
Llama 2's 7B and 13B models use the same architecture as the original LLaMA, so llama.cpp already supports them. (Source: running 13b-chat myself with llama.cpp GPU offload.) It's only 70B that has additional extensions llama.cpp would need to implement.
Llama 2 7B and 13B GGML models are up and work with existing llama.cpp, no changes needed! The 70B does require a llama.cpp change, but I'm sure it won't take long.
Hoping they add support for Llama 2 soon!