Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> adjust your paths as necessary. It has a tendency to talk to itself.

This is always a fun surprise. What I've seen help, especially with chat models, is to use a prompt template. Some tools (e.g. https://ollama.ai/ – mentioned in the article) use a default, model-specific prompt template when you run the model. This is easier for users since they can just input your chat messages and get answers. The hard part is every model is trained (and behaves) differently.

With llama.cpp you'd need wrap your prompt text with the right template. For llama 2, the facebook developers' generation code wraps the system prompt and user prompts in specific tags (<<SYS>>{system prompt}<</SYS>> and [INST]{user prompt}[/INST]) respectively): https://github.com/facebookresearch/llama/blob/main/llama/ge....

Worth noting that customizing prompt templates can be fun – you don't have to use "prescribed" one – I've had a model generate a conversation between a few characters that talk to each other for example – it's pretty entertaining!



As a followup on LLama2's prompting, here's a good thread with some more details: https://www.reddit.com/r/LocalLLaMA/comments/155po2p/get_lla... (see also: https://www.reddit.com/r/LocalLLaMA/comments/1561vn5/here_is... - it's a bit complex and people are still wrapping their heads around it)

Note, it is possible to reinject <<SYS>> prompts during the conversation to keep the LLM on target: https://twitter.com/overlordayn/status/1681631554672513025 (but obviously this should be the first thing you filter and track if you are running an LLM that is accessible to end-users - rebuff is a lib w/ some good ideas to start with)

With new fine tunes coming out, it's worth noting again that different datasets/models all use slightly different prompt formats (many are using multiple datasets these days, so it's hard to say just how much it matters now): https://www.reddit.com/r/LocalLLaMA/comments/13lwwux/comment...

Ideally, fine tunes for LLama2 would regularize all datasets to the official tokens/format and inference interfaces could standardize too (or there could be some metadata collected for what model uses what, but some of the low hanging fruit that's out for the future I guess). One other thing to keep in mind is that all the benchmarks/leaderboards don't use instruct formatting at all, and don't really represent capabilities for a model. IMO, ELO-style rankings against models for specific tasks would probably be more representative.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: