I use 4.6, because 4.7 is super lazy, deflects responsibility, and assumes it is good and I am bad, and avoids checking reality. It looks like it's trained on lazy humans instead of good engineers.
Should I try 4.8? I am happy with 4.6. I am not happy with 4.7.
I still use 4.7. I don’t know what I’m doing wrong, but 4.7 frequently tells me it’s time to sleep, at all hours of the day, while I’m working. I’ve tried clearing all my memory/agents files.
I’m hoping the “go to sleep” behavior has been RLHF’d away in 4.8.
The go to sleep issue is common and has nothing to do with your setup. I suspect it's because for the agent to predict the End of Response token, its response needs some kind of closing, and the most final kind of closing is something like "get some rest".
I also ran into this behaviour sometimes. It’s specifically called out in the system card in section 6.2.1.1, although I didn’t check whether they said they had decisively fixed the issue.
One big difference I see in how minds work is their ability to model. A few people I know focus on end-to-end model thinking — trying to maximize against some functions.
The rest of them seem to avoid thinking outside a small info bubble/closed system. I'm not sure why, but taking in too much info seems to create anxiety for them, so they stay inside the bubble instead of trying to extract abstractions that make it possible to model the complexity and fuzz the non-important parts.
The same people, when they come up with a product implementation idea, avoid thinking about all the things required to be in place for the product to actually satisfy some real need/want. The product ends up detached from user need. The tech doesn't work.
And it doesn't matter if they can code or if Claude can work, because the directions required to define what needs to be there are not present.
I realized I need to treat the thinking of these people, their ideas and advice, as inputs that need to be sanitized. The easy way I found is to remodel why they think the way they think. It's like having a safe VM running their code (their thinking) on my computer (my brain).
These same people, usually years later, realize you were telling them things they were not able to comprehend at the time, things that turned out to be important and would have saved them a lot of suffering. They start to treat your voice with reverence instead of thinking things through with their own minds and stress-testing against reality.
I would love to read some research about this and how to take advantage of it, or at least avoid the toxic influence such minds can have against one's own well-being and success.
When I was reasonably experienced, and about 6 months into a new job, one of my co-workers, Fred, had been there for a very long time.
Fred was very knowledgeable, and Fred could think. But Fred had deeply held preconceived notions that interfered with the utility of his thoughts.
Fred also had something else, though -- he could set aside those deeply preconceived notions. Not for himself, mind you, but for someone else.
I could, and did on a very regular basis, go to Fred and say "Fred! Assume X, Y, and Z. What happens if A and B?" Now even though Fred was completely sure that I was wrong about at least Y and Z, he could set aside those prejudices, and (correctly!) reason through what would happen.
Fred reminds me of the people you describe, except that instead of having Fred's code running through my brain, I was able to insert my code into Fred's brain.
There has to be some language for what you're describing, because I've experienced similar perceptions of people. I've been on both sides of the "you don't actually understand what is being said until years later" discussions as well.
One thing I'd posit is that it might not be the people, but rather an ability or skill people can improve. It takes difficulty and focus and time, but it can be honed. I know this to be true; otherwise I would not be able to understand that which I dismissed years ago!
The problem with this, if it is an active ability rather than a person's inherent trait, is that it becomes impossible to recognize by any identifiable markers. I've often found that some of the most intelligent, "look at the problem from all angles" kind of minds have very glaring blind spots (look no further than political subjects, for example). These blind spots manifest even in the most generally intelligent thinkers; that's what makes people human, after all. So it can be said that good thinking is a very delicate thing, and it can be difficult to recognize against noise no matter who is speaking or what they are discussing.
I hope they can figure out a cheaper and less bulky option. It's pretty cool to unlock the ability to have windows and apps popping up in real space without the constraint of a physical screen.
I would not pay all that money to wear something that is that bulky and gets annoying after a couple of hours.
I wonder how much of this reasoning will make sense in the future. How much of the way you are thinking is based on the curves reality followed in the past? Are you taking into account exponential acceleration? I guess abundance will flow in such a way that the idea of debt will be a thing of the past.
I asked a couple of days ago why they were not doing what Google does and offering OSS models, but damn, offering Anthropic models after all the badmouthing. Next news: OpenAI models live on Colossus 2.
I want to just talk to the Mac and have it do things. I tried computer use and other alternatives, but the latency made it unusable.
I want to be able to control the Mac, its apps, and the browser. I also need it to figure things out by itself given a goal.
Claude Code with the --chrome flag is kind of good, but it's too slow. I wanted to try faster APIs, like the one hosted on Cerebras, but it's too expensive.
Do you want to do something that can't be done through AppleScript, macOS accessibility APIs, and something like Puppeteer to control the browser?
Or something you don't understand how to do manually?
Because I guess I don't understand the attraction of using an LLM for system automation where existing interfaces exist, other than as a form of documentation, or to write code using these interfaces.
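For what it's worth, here is a minimal sketch of the AppleScript route, driven from Python through the `osascript` command-line tool that ships with macOS. The helper names `osascript_command` and `run_applescript` are made up for illustration, and the Finder script is just a placeholder action; browser control through something like Puppeteer is not covered here.

```python
import platform
import subprocess


def osascript_command(script):
    """Build the argv for running a one-line AppleScript with osascript."""
    return ["osascript", "-e", script]


def run_applescript(script):
    """Run the script on macOS and return its output; return None elsewhere."""
    if platform.system() != "Darwin":
        return None  # osascript only exists on macOS
    result = subprocess.run(
        osascript_command(script),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder action: bring Finder to the front.
cmd = osascript_command('tell application "Finder" to activate')
print(cmd)
```

On a Mac, calling `run_applescript('tell application "Finder" to activate')` would execute the same command directly, with no model in the loop, which is presumably where the latency difference comes from.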
Grok is pretty bad. No wonder usage is low. I think they messed up when they removed the human annotation team and went in the direction of automation.
The bet can eventually pay off when they figure out how to train without human help and also generate useful models. Imagine is terrible too.
More competition is great for us users. I hope they recover. In the meantime, why not host OSS models like Google does?
My understanding is that inference (running existing models) is around 1/4th of the average compute budget for AI companies. Training new models takes up about 3/4ths.
As such, using only 11% of their GPUs indicates that they've elected not to do as much training as they are capable of.
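A rough back-of-the-envelope version of that reasoning, assuming the 1/4 inference to 3/4 training split applies to whatever share of the fleet is actually in use, and taking the 11% utilization figure at face value. Both numbers are loose figures from this thread, not measured data, and `fleet_breakdown` is just an illustrative helper.

```python
from fractions import Fraction


def fleet_breakdown(utilization, inference_share=Fraction(1, 4)):
    """Split a GPU fleet into inference, training, and idle fractions.

    utilization: fraction of GPUs in use (assumed figure from the thread).
    inference_share: fraction of the *used* compute spent on inference (assumed).
    """
    inference = utilization * inference_share
    training = utilization * (1 - inference_share)
    idle = 1 - utilization
    return inference, training, idle


inference, training, idle = fleet_breakdown(Fraction(11, 100))
print(f"inference {float(inference):.2%}, "
      f"training {float(training):.2%}, "
      f"idle {float(idle):.2%}")
```

Under these assumptions about 89% of the fleet sits idle, and that is the capacity that could in principle be going to training.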
I tried all the versions, starting from Grok3. Grok3 was the one I was getting real work done with. After that just terrible experiences.
I use LLMs for asking questions and coding. Generated answers are bad. Generated code is bad. Images and videos look super fake. And there is no fixed subscription to use it in CLI terminals.
If it was as good as other LLMs, as you say, why the 11% usage, and why sell compute to Claude after all the badmouthing? I prefer DeepSeek 10x.
I was interested in trusted execution environments and how safe they were. If you look on Google Scholar and start reading, they seem super vulnerable. The feeling is that the industry has no better option and that they are a way to tell customers they are safe when they're not.