I think it's misleading to suggest today that tool calling for nontrivial tasks really works with local models. It only works in demos because demo tools always accept one or two arguments, usually string literals or numbers.
In the real world, functions take more complex arguments: many arguments, or a single argument that's an object with multiple attributes, and so on. You can start to work around this by passing function signatures, typing details, and JSON schemas to set expectations in context, but local models tend to fail at this kind of thing long before you ever hit the limits of the context window. There's a reason demos always use one string literal like a hostname, or two floats like lat/long. Passing a dictionary with a few strict requirements can routinely take 300 retries instead of 3 to get a tool call that's syntactically correct with properly passed arguments. For what it's worth, `ping --help` shows me around 20 options, and any attempt at a 1:1 mapping of something with that many args would start to break down pretty quickly.
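To make the contrast concrete, here's roughly what the two kinds of tool schemas look like. The `ping` tool mirrors the one-string-argument demos I mean; `deploy_service` is a made-up example of the nested, strictly-typed kind that small models tend to fumble:

```python
import json

# Demo-friendly tool: a single string argument. Local models handle this fine.
ping_tool = {
    "name": "ping",
    "description": "Ping a host once and return the latency.",
    "parameters": {
        "type": "object",
        "properties": {
            "hostname": {"type": "string"},
        },
        "required": ["hostname"],
    },
}

# More realistic tool: a nested object with strict requirements.
# Small models frequently emit calls that violate schemas like this one.
deploy_tool = {
    "name": "deploy_service",
    "description": "Deploy a service to a target environment.",
    "parameters": {
        "type": "object",
        "properties": {
            "config": {
                "type": "object",
                "properties": {
                    "image": {"type": "string"},
                    "replicas": {"type": "integer", "minimum": 1},
                    "env_vars": {
                        "type": "object",
                        "additionalProperties": {"type": "string"},
                    },
                },
                "required": ["image", "replicas"],
            },
            "dry_run": {"type": "boolean"},
        },
        "required": ["config"],
    },
}

print(json.dumps(deploy_tool, indent=2))
```

The failure mode isn't just malformed JSON; it's valid JSON that silently drops a required nested field like `replicas`, or flattens `config` into top-level keys.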
Zooming in on the details is fun, but it doesn't change the shape of what I was saying before. No need to muddy the water: very, very simple stuff still requires very big local hardware or a SOTA model.
You and I clearly have a different idea of what "very very simple stuff" involves.
Even the small models are very capable of stringing together a short sequence of simple tool calls these days - and if you have 32GB of RAM (eg a ~$1500 laptop) you can run models like gpt-oss:20b which are capable of operating tools like bash in a reasonably useful way.
This wasn't true even six months ago - the local models released in 2025 have almost all had tool calling specially trained into them.
You mean like a demo for simple stuff? Something like hello-world-type tasks? The small models you mentioned earlier are incapable of doing anything genuinely useful for daily use. The few tasks they can handle are easier and faster to do yourself, with the added assurance that no mistakes will be made.
I'd love to have small local models capable of running tools the way current SOTA models do, but the reality is that small models still can't, and hardly anyone has a machine powerful enough to run the 1-trillion-parameter Kimi model.
Yes, I mean a demo for simple stuff. This whole conversation is attached to an article about building the simplest possible tool-in-a-loop agent as a learning exercise for how they work.
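For reference, the tool-in-a-loop pattern that kind of article builds really is tiny. Here's a sketch with the model call stubbed out (`fake_model`, `run_bash`, and the message format are all hypothetical stand-ins; a real version would call an actual LLM API and parse tool calls from its responses):

```python
import json
import subprocess

def fake_model(messages):
    # Stand-in for the LLM call (hypothetical; a real agent would hit a
    # model API and extract a tool call from the response).
    if not any(m["role"] == "tool" for m in messages):
        # First turn: "decide" to run a shell command.
        return {"tool": "bash", "arguments": json.dumps({"command": "echo hello"})}
    # A tool result is now in context: produce a final answer.
    return {"answer": "done: " + messages[-1]["content"].strip()}

def run_bash(command):
    # The one tool: execute a shell command and return its stdout.
    return subprocess.run(command, shell=True, capture_output=True, text=True).stdout

TOOLS = {"bash": lambda args: run_bash(json.loads(args)["command"])}

def agent(prompt, max_turns=10):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):  # cap the loop so a confused model can't spin forever
        reply = fake_model(messages)
        if "answer" in reply:
            return reply["answer"]
        # Execute the requested tool and feed the result back into context.
        result = TOOLS[reply["tool"]](reply["arguments"])
        messages.append({"role": "tool", "content": result})

print(agent("Say hello via bash"))  # prints "done: hello"
```

The whole disagreement is about what happens when `TOOLS` contains schemas like the nested ones above rather than a single string argument.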