Hacker News | riidom's comments

If you are willing to switch, Vivaldi has that.

That feature list is what I don't want

In Chrome, I can disable them via flags/settings


That MLX is for Apple hardware only, though? Or did I misunderstand something?

It needs a llama.cpp fork, too, so the stock runtime used by LM Studio (based on upstream llama.cpp) presumably won't work with it.

Not a word about the tok/sec, unfortunately.

It won’t be meaningful considering the architecture: it’s a harness around the model that generates multiple solutions in multiple passes, using the tests to measure compliance and repair broken solutions. The resulting program won’t be streamed to you, because by the time it reaches you it has already existed for minutes while going through that cycle. It’s more for an asynchronous use case.
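A minimal sketch of that kind of generate-test-repair loop (all names are hypothetical; the model calls are stubbed out as plain callables so only the control flow is shown):

```python
# Sketch of a generate-test-repair harness: sample n candidate programs,
# keep the ones that pass the tests, and try to repair the failures.
# `generate`, `run_tests`, and `repair` stand in for model calls.
def best_of_n_with_repair(generate, run_tests, repair, n=4, max_repairs=2):
    """Return candidate programs that eventually pass `run_tests`."""
    survivors = []
    for _ in range(n):
        program = generate()
        for _ in range(max_repairs + 1):
            failures = run_tests(program)
            if not failures:
                survivors.append(program)
                break
            # Feed the failing tests back to the model for a fix attempt.
            program = repair(program, failures)
    return survivors
```

The user-visible latency is the whole loop, not a single generation pass, which is why per-token speed tells you little here.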

I, too, was interested because I am always eager to use local models in my claw-like. It looks like this could be useful for an async portion of the harness but it wouldn’t work in interactive contexts.

Very cool ensemble of techniques, particularly because they’re so accessible. I think I will use this form for reusable portions of web browsing functionality in my personal agent.


> A single patched llama-server runs on K3s, providing both generation with speculative decoding (~100 tok/s)

There seems to be at least some detail on that point.
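For context, speculative decoding in llama-server pairs the main model with a small draft model whose proposed tokens the big model verifies in batches. A hedged sketch of such an invocation (model file names are illustrative; flag names are from recent llama.cpp builds):

```shell
# Hypothetical llama-server launch with speculative decoding.
# -md loads the small draft model; --draft-max caps proposed tokens
# per step; -ngl 99 offloads all layers to the GPU.
llama-server -m ./big-model-q4_k_m.gguf \
  -md ./small-draft-q8_0.gguf \
  --draft-max 16 -ngl 99
```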


on phone: 2FA Manager from OpenStore on UBports phone

on work laptop: 1PW


Came here for this. In one of Nathan's blog posts he describes the notification as noticeably driving donations. I have never seen it myself, and what I also haven't seen is any complaints about it.

It would have been a better comparison than Wikimedia, I guess, but aside from that, the LibreOffice team still has a valid point that the reactions are unjustified.

In a week nobody will talk about it anymore though, so LO team, just sit it through :)


Oh yes. OK, that's partly on bash, but you look at the script and it's like 200 lines of code. Then you read the alternate install instructions and they go like "download binary, make executable, add to $PATH, run" - ???


The text is misleading too. 5-7 tok/sec is not reading speed, it's a tad slower. For me, at least, and I am an experienced reader, though not especially schooled in speed-reading.
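As a rough sanity check (assuming ~0.75 English words per token, a common tokenizer rule of thumb, and roughly 240 words/min as a typical silent-reading pace; both numbers are approximations, not from the article):

```python
# Convert generation speed (tokens/sec) into an approximate reading
# speed in words per minute. 0.75 words per token is a rough rule of
# thumb for English text with typical LLM tokenizers.
WORDS_PER_TOKEN = 0.75

def tok_per_sec_to_wpm(tok_per_sec: float) -> float:
    return tok_per_sec * WORDS_PER_TOKEN * 60

for tps in (5, 7, 10):
    print(f"{tps} tok/s ~ {tok_per_sec_to_wpm(tps):.0f} words/min")
```

By that estimate, 5 tok/s lands around 225 words/min, just under a typical silent-reading pace, which matches the "a tad slower" impression.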

I happened to "live" on 7.0-7.5 tok/sec output speed for a while, and it is an annoying experience. It is the equivalent of walking behind someone slightly slower than you on a sidewalk. I dealt with it by deliberately looking away for a minute until output was "buffered" and only then starting to read.

For any local setup I'd try to reach for 10 tok/sec. Sacrifice some kv cache and shove a few more layers on your GPU, it's worth it.
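For llama.cpp-based setups, that trade-off maps to a couple of server flags. A hypothetical invocation (model path and numbers are illustrative; flag names are from recent llama.cpp builds):

```shell
# Quantize the KV cache to q8_0 to free some VRAM, then spend it on
# offloading more layers to the GPU (-ngl). Tune the layer count and
# context size (-c) to what your card actually fits.
llama-server -m ./model-q4_k_m.gguf \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -ngl 32 -c 8192
```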


An example of this is the Blender addon ecosystem. Blender moves very fast, with breaking API changes every few versions. Now, I am not an addon developer myself, but from the GitHub issues I follow, the required changes are fairly often trivial to make.
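One concrete example of such a trivial break: Blender 2.80 moved the active object from `bpy.context.scene.objects.active` to `bpy.context.view_layer.objects.active`. A version-gated shim is about all the fix takes (a sketch; the context and version are passed in here so it also runs outside Blender):

```python
# Sketch of the kind of one-line compatibility fix addon maintainers
# make after a Blender API break: the active object moved from
# scene.objects.active (2.7x) to view_layer.objects.active (2.80+).
def active_object(context, version):
    """Return the active object across the 2.79 -> 2.80 API break."""
    if version >= (2, 80, 0):
        return context.view_layer.objects.active  # 2.80+ API
    return context.scene.objects.active           # legacy 2.7x API
```

Inside Blender you would call it as `active_object(bpy.context, bpy.app.version)`.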

Yet someone has to make them. Ideally it is the creator of the addon; sometimes it's the users who do it, when the addon is no longer maintained (in the case of trivial changes).

It kinda works that way, but it also is some kind of gamble for the user. When you see a new addon (and a new addon developer), you can't know whether they're going to stick with it or not.

If you have to pay for the addon, it's more likely they maintain it, of course. But also not a guarantee.


LM Studio has an option on model load that I believe does what you're describing here: "K Cache Quantization Type" (and similar for "V"). It's marked as experimental and says the effect is basically hard to predict. Never tried it myself, though.


These 200-LOC install scripts heavily turn me off as well. But at least in this case, you can also just download the correct zip, extract the binary, and run "./llmfit".
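For reference, the manual route is roughly the following (the URL and archive name are hypothetical; only the `llmfit` binary name is from the project):

```shell
# Hypothetical manual install: fetch the release archive, extract it,
# mark the binary executable, run it. Substitute the real release URL
# and archive name for your platform.
curl -LO https://example.com/llmfit-linux-x86_64.zip
unzip llmfit-linux-x86_64.zip
chmod +x llmfit
./llmfit
```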

