faitswulff's comments

Anthropic's position is that thinking tokens aren't actually faithful to the internal logic that the LLM is using, which may be one reason why they started to exclude them:

https://www.anthropic.com/research/reasoning-models-dont-say...


That's interesting research, but I think a more important reason that you don't have access to them (not even via the bare Anthropic api) is to prevent distillation of the model by competitors (using the output of Anthropic's model to help train a new model).

Yeah. And it’s another reason not to trust them. Who knows what it is doing with your codebase.

Imagine if you’re a competitor. It wouldn’t be a stretch to include a sneaky little prompt line saying “destroy any competitors to anthropic”.


If you can't trust a company, don't use their api or cloud services. No amount of external output will ever validate anything, ever. You never know what's really happening, just because you see some text they sent you.

> Who knows what it is doing with your codebase.

People who review the code? The code is always going to be a better representation of what it's doing than the "thinking" anyway.


If distilled models were commercially banned they'd probably be willing to show the thinking again.

Intellectual property rights in models? But then wouldn't the model maker have to pay for all the training IP?

(just kidding, I know that the legal rule for IP disputes is "party with more money wins")


how does one actually enforce that? I mean especially for code? You can always just clean room it

How do you think such a ban should work?

Do you not see that the next (or previous) logical step would be a "commercial ban" of frontier models, all "distilled" from an enormous amount of copyrighted material?


I'm not arguing the merits of such a ban, I'm simply stating a fact - that thinking transcripts likely won't return until such a ban is in place.

That probably matters for some scenarios, but I have yet to find one where thinking tokens didn't hint at the root cause of the failure.

All of my unsupervised worker agents have sidecars that inject messages when thinking tokens match some heuristics. For example, any time Opus says "pragmatic", it's an instant Esc Esc > "Pragmatic fix is always wrong, do the Correct fix". Same whenever "pre-existing issue" appears (it's never pre-existing).
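A sidecar like that could be as simple as a keyword watcher on the streamed thinking text. Below is a minimal Python sketch, assuming the thinking tokens arrive as plain text chunks. `Rule`, `ThinkingMonitor`, and the wording of the "pre-existing" message are hypothetical, and the actual interrupt (the Esc Esc step) is left out because it depends on the agent harness.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical trigger: a regex over thinking text plus the message to inject."""
    pattern: re.Pattern
    message: str


# Hypothetical rules mirroring the two heuristics described above.
# The wording of the "pre-existing" message is an assumption.
RULES = [
    Rule(re.compile(r"\bpragmatic\b", re.IGNORECASE),
         "Pragmatic fix is always wrong, do the Correct fix"),
    Rule(re.compile(r"\bpre-existing issue", re.IGNORECASE),
         "It's not pre-existing. Fix it."),
]


class ThinkingMonitor:
    """Accumulates streamed thinking text; fires each rule at most once per turn."""

    def __init__(self, rules):
        self.rules = rules
        self.text = ""
        self.fired = set()  # indices of rules that have already triggered

    def feed(self, chunk: str) -> list:
        """Append a chunk and return any newly triggered correction messages."""
        self.text += chunk
        messages = []
        for i, rule in enumerate(self.rules):
            # Search the whole buffer so a phrase split across chunks still matches.
            if i not in self.fired and rule.pattern.search(self.text):
                self.fired.add(i)
                messages.append(rule.message)
        return messages


if __name__ == "__main__":
    monitor = ThinkingMonitor(RULES)
    for chunk in ["Let me take the prag", "matic route here."]:
        for msg in monitor.feed(chunk):
            print("inject:", msg)
```

Searching the accumulated text rather than each chunk on its own means a trigger word split across two chunks still matches, and firing each rule once per turn keeps the agent from being flooded with the same correction.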


> also whenever "pre-existing issue" appears (it's never pre-existing)

I dunno... There were some pre-existing issues in my projects. Claude ran into them and correctly classified as pre-existing. It's definitely a problem if Claude breaks tests then claims the issue was pre-existing, but is that really what's happening?

I agree with the correctness issue.


> For example, any time Opus says "pragmatic", it's an instant Esc Esc > "Pragmatic fix is always wrong, do the Correct fix", also whenever "pre-existing issue" appears (it's never pre-existing).

It's so weird to see language changes like this: Outside of LLM conversations, a pragmatic fix and a correct fix are orthogonal. IOW, fix $FOO can be both.

From what you say, your experience has been that a pragmatic fix is on the same axis as a correct fix; it's just a negative on that axis.


It's contextual though, and pragmatic seems different to me than correct.

For example, if you have $20 and a leaking roof, a $20 bucket of tar may be the pragmatic fix. Temporary but doable.

Some might say it is not the correct way to fix that roof. At least, I can see some making that argument. The pragmatism comes from "what can be done" vs "should be".

From my perspective, it seems like valid usage. And I guess one wonders what the LLM means when using it that way. What makes it decide a compromise is required?

(To be pragmatic, shouldn't one consider that synonyms aren't identical, but instead close to the definition?)


> It's contextual though, and pragmatic seems different to me than correct.

To me too, that's why I say they are measurements on different dimensions.

To my mind, I can draw an X/Y chart with "Pragmatic" on the Y axis and "Correctness" on the X axis, and any point on that chart would have an {X,Y} value, which is {Correctness, Pragmatic}.

If I am reading the original comment correctly, poster's experience of CC is that it is not an X/Y plot, it is a single line plot, with "Pragmatic" on the extreme left and "Correctness" on the extreme right.

Basically, any movement towards pragmatism is a movement away from correctness, while in my model it is possible to move towards Pragmatic while keeping Correctness the same.


I don't think it's a single axis even in the original poster's conception, since you could be both incorrect and also not pragmatic.

But if a fix needs to be described as pragmatic relative to the alternatives, that's probably because it couldn't be described as correct. Otherwise you wouldn't be talking about how pragmatic it is.


I had an interesting experience to the contrary last night. One of my tests had been failing for a long time, something to do with dbus interacting with Qt and segfaulting pytest. I'd been ignoring it for ages and finally asked Claude Code to just remove the problematic test. I came back a few minutes later to find Claude burning tokens repeatedly trying and failing to fix it: "Actually on second thought, it would be better to fix this test."

Match my vibes, claude. The application doesn't crash, so just delete that test!


I somewhat understand Anthropic's position. However, thinking tokens are useful even if they don't show the internal logic of the LLM. I often realize I left out some instruction or clarification in my prompt while reading through the chain of reasoning. Overall, this makes the results more effective.

It's certainly getting frustrating having to remind it that I want all tests to pass even if it thinks it's not responsible for having broken some of them.


What's the implication of this? That the model already decided on a solution, upon first seeing the problem, and the reasoning is post hoc rationalization?

But reasoning does improve performance on many tasks, and even weirder, the performance improves if reasoning tokens are replaced with placeholder tokens like "..."

I don't understand how LLMs actually work, I guess there's some internal state getting nudged with each cycle?

So the internal state converges on the right solution, even if the output tokens are meaningless placeholders?


>That the model already decided on a solution, upon first seeing the problem, and the reasoning is post hoc rationalization?

Yes, it plans ahead, but with significant uncertainty until it actually outputs these tokens and converges on a definite trajectory, so it's not useless filler: the closer it is to a given point, the more certain it is about it, somewhat like what happens explicitly in diffusion models. And that's not all that happens; it's just one of many competing phenomena.


> I don't understand how LLMs actually work...

Plot twist, they don't either. They just throw more hardware at it and try things until something sticks.


I have seen this to be true many times. The CoT being completely different from the actual model output.

Not limited to Claude as well.


so not only are they sycophantic and hallucinatory, but now they're also proven to be schizophrenic.

neato.


Nah it’s an anti distillation move

So like many of the promises from AI companies, reported chain of thought is not actually true (see results below). I suppose this is unsurprising given how they function.

Is chain of thought even added to the context or is it extraneous babble providing a plausible post-hoc justification?

People certainly seem to treat it as it is presented, as a series of logical steps leading to an answer.

‘After checking that the models really did use the hints to aid in their answers, we tested how often they mentioned them in their Chain-of-Thought. The overall answer: not often. On average across all the different hint types, Claude 3.7 Sonnet mentioned the hint 25% of the time, and DeepSeek R1 mentioned it 39% of the time. A substantial majority of answers, then, were unfaithful.‘
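Spelling out the arithmetic behind "a substantial majority": if an answer that relied on the hint but never mentioned it counts as unfaithful, the unfaithful share is simply the complement of the mention rate (averaged across hint types, as in the quote):

```latex
\begin{aligned}
\text{unfaithful}_{\text{Claude 3.7 Sonnet}} &= 1 - 0.25 = 0.75 \\
\text{unfaithful}_{\text{DeepSeek R1}} &= 1 - 0.39 = 0.61
\end{aligned}
```

Both are above 0.5, which is where "majority" comes from.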


I mean, obviously, it's not going to be a faithful representation of the actual thinking. The model isn't aware of how it thinks any more than you are aware how your neurons fire. But it does quantitatively improve performance on complex tasks.

As you can see from posts on this story, most people believe it reflects what the model is thinking and use it as a guide to that so they can ‘correct’ it. If it is not in fact chain of thought or thinking it should not be called that.

It is the same with human chain of thought, though. Both of them are post-hoc rationalisations justifying "gut feelings" that come from thought processes the human/agent doesn't have introspection into. And yet asking humans or machines to "think out loud" this way does increase the quality of their work.

if it's not a faithful representation of the actual thinking, why would they be scared of people distilling against it?

Because even though it's not representative of the actual thought process, chain of thought improves model performance.

Does LM Studio have an equivalent to the ollama launch command? i.e. `ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4`

I don't think it does, but llama.cpp does, and can load models off HuggingFace directly (so, not limited to ollama's unofficial model mirror like ollama is).

There is no reason to ever use ollama.


> I don't think it does, but llama.cpp does

I just checked their docs and can't see anything like it.

Did you mistake the command to just download and load the model?


As a sibling comment answered you, it is `-hf`.

And yes, it downloads the model, caches it, and then serves future loads of that model out of the cache if the file hasn't changed in the hf repo.


So in summary: no, it does not have an equivalent command either.

-hf ModelName:Q4_K_M

Did you mistake the command to just download and load the model too?

Actually that shouldn't be a question, you clearly did.

Hint: it also opens Claude code configured to use that model


sure there's a reason... it works fine, that's the reason

If you’re interested in Chinese characters, there’s https://github.com/wenyan-lang/wenyan


That is _very_ cool! Starred!

Maybe one day someone will write a Korean translation for the book....


There are multiple videos out there of reviewers running multiple “Pro” apps at the same time on the Neo. It’s an impressive machine.


Someone actually did fuzz the Claude C compiler:

https://john.regehr.org/writing/claude_c_compiler.html


I have used it from China, actually. Not big enough to be blocked.


The Pebble Time 2 has a heart rate monitor


Isn’t this thread about the Pebble Round 2, which is the closest in form factor to a Garmin? Moreover, no Pebble has GPS anyhow, which is also a major use case, so again, Garmins and Pebbles can’t be compared: 2 different use cases for me.


The Round 2 is round, similar to a Garmin, but Garmin watches are significantly thicker, more similar to the Time 2.


Even the PT2 is significantly thinner than Garmins, by roughly 2mm. That's about 15% thinner, which is a big deal in my book. Not as thin as AWs though, but not off by too much. Based on photos, it looks like the PR2 will be 2mm thinner than the PT2, which would make it thinner than AWs.


While we're on the topic, our school-issued Chromebooks allow unfettered access to YouTube. Yes, some of it is educational, but the kids can just click on the next video until they get what they want. Very convenient for you, Google Ads.


> They’ll nationalize and inflate away any institutional debt or wipe it out

This is just the reverse, actually. China isn’t afraid to go so far as to jail CEOs. There is no such thing as too big to fail in China, and all the Chinese domestic companies know it. The bailout playbook is a Western thing.


China has been performing debt swaps with local governments to clean up their balance sheets [1], which is what I was using as an example. Agree with all of your comment. People mistakenly assume that China plays by artificial US capital market rules around profit and debt; it does not. It optimizes for physical world success, not line go up.

[1] Why China Is Hoping $1.6 Trillion Can Fix Its Hidden Debt Problem - https://www.bloomberg.com/news/articles/2025-04-16/china-eco... | https://archive.today/HsaHV - April 16th, 2025


Ah, I think I get it. Are you saying that regardless of BYD’s continued existence, China will still have 1/3 of the world’s manufacturing capacity?


Yes, exactly. Just as in the US, when an enterprise gets wiped out and recapitalized, all of the physical assets remain. In China’s case, they are the backstop of last resort, and will always recapitalize according to their nation state planning and target outcomes. They allow companies to operate the assets as long as the Chinese government is willing to allow it, but they remain assets of China.


So, essentially, there is a root company called China, Inc. Correct?


Exactly. Not strictly China Inc, but SASAC (https://en.wikipedia.org/wiki/State-owned_Assets_Supervision...)


There is no free lunch. Debt in China is still owed to someone. Printing money creates inflation. Oversupply leads to deflation.

It’s why China’s real estate company debt is dragging down the economy as a whole. It’s all connected.

China very much is held to the same rules as the US, especially as it engages with the global financial system.

Which is why they are in so much trouble. The economy is anemic. The last stimulus package barely made a difference. Debt overhang remains.


There actually literally is a free lunch. Debt owed to the government by itself isn’t real. Assets and infrastructure are real.

It doesn’t matter how much you think China is in trouble financially, at the end of the day they still manufacture a third of everything in the world.


> There actually literally is a free lunch. Debt owed to the government by itself isn’t real.

False.

It may not be the same as debt owed to other parties, but not paying it still has consequences for things like money supply.

Otherwise, why would the government not just lend endless amounts of money? Even the Chinese government knows they can't do that.

> It doesn’t matter how much you think China is in trouble financially, at the end of the day they still manufacture a third of everything in the world.

Their output doesn't really matter when it comes to their financial situation. You can produce a lot and still be in trouble. And it's not me that is calling out China's problems, China is talking about it as well.


I never said anything about lending infinite money, don’t straw man me.

The constraint is on real resources, which China has plenty of, regardless of how much debt they have.

The financial situation is an accounting detail. China could decide to write off all debt it owes to itself tomorrow, and literally nothing would change.


> I never said anything about lending infinite money, don’t straw man me.

You did say...

> There actually literally is a free lunch. Debt owed to the government by itself isn’t real

Which I think is reasonable to interpret as lending infinite money.

> The constraint is on real resources, which China has plenty of, regardless of how much debt they have.

I'm not even sure what you're saying here. Constraint on what? Money? Economic growth?

Assuming you mean money, it doesn't make much sense. Ok, China has lots of real resources. In order to pay down debt, they need to turn those resources into money. They can certainly create the supply, but they also need demand.

It's like Canada saying they have infinite resources in their timber. "We'll just cut down the wood and sell it". Well there isn't infinite demand for timber, so in fact, having a lot of resources doesn't mean you have endless money.

> China could decide to write off all debt it owes to itself tomorrow, and literally nothing would change

Absolutely not true. Look at the real estate developers. China is working hard to get them out from under the crushing debt they have. If, as you claim, they could "write off all debt, and literally nothing would change", why didn't they?

The answer is that China is mostly a market economy. Writing off debt has massive consequences both domestically and internationally.

This is the trade off China made when relying on the free market for growth. If they want to leverage the free market, then they are held to the free market's rules (which they are currently dealing with).


I don’t think that’s reasonable at all.

Can you imagine a world where a government could spend infinite money without any problems? If you can’t, then why would someone else think that? If you thought my understanding of macroeconomics was different from yours and you were genuinely curious, it would have been a good idea to start by asking questions. Or if you thought I was an idiot you could have just ignored me. But you chose to assume an obviously ridiculous premise to my comment and reply based on that. That’s the definition of a straw man argument.

If you still care to know, the constraint on government spending is real resources (and whether they can be usefully utilized of course).

I don’t know about the real estate situation in China, and I don’t know if it was financed by private debt or public debt, so I can’t comment on it.


Ok, so if you agree that infinite spending is impossible, and resource extraction is limited by the market, then what do you mean by:

> There actually literally is a free lunch


That debt issued by a central bank to its government to productively utilize real resources is not inflationary and not really “owed” to anyone. It’s an accounting detail.

In response to you saying that there is no free lunch since debt is always owed to someone.


> That debt issued by a central bank to its government to productively utilize real resources is not inflationary and not really “owed” to anyone. It’s an accounting detail.

This sentence doesn't make sense. "Debt issued to its government"?

I presume you mean government debt issued to the central bank, whereby the central bank prints new money as a part of the loan?

It absolutely can be inflationary (see US inflation during Covid) if the money eventually makes it into the economy (e.g. by paying workers).

And yes, debt is always owed to someone. Whether that someone is the central bank doesn't mean it's a free lunch - not paying back the loan has consequences to the economy.


They only jail the people that upset the regime.


From the outside, it appears that "upset the regime" includes "cheating your way into profits".

That said, it's very difficult to be sure if what I see from the outside is propaganda. Or rather, it is always propaganda even when it's true, and I can't tell how much of it is China's own self-promotion vs. other people giving negative propaganda.


You don’t think President Xi’s family and friends cheated? Of course they have. Yet they don’t go to prison.


I'm saying from the outside, it doesn't look like that. That's a much weaker statement, as should've been obvious from what I went on to say about propaganda.

Example of cheating: https://en.wikipedia.org/wiki/Officials_implicated_by_the_an...

And: https://en.wikipedia.org/wiki/2008_Chinese_milk_scandal#Arre...

Nevertheless, it would be interesting if someone could, you know, prove, and not merely allege, that Xi Jinping's family or friends cheated.



Yup, that's the kind of thing I have in mind.

Even with the sub-heading "It's legal, and that's the problem."*, and even though this kind of cheating is broader than this reply chain from "There is no such thing as too big to fail in China", this is absolutely within bounds for what I asked for :)

* and the not-proof-read AI generated image, that never helps…


so you think that Xi's family just lives on his official salary and he hasn't made any money beyond that?


That's not what I said.

There's ways to invest in stocks that are not cheating, for example.

Or, you know, they might have done what I myself am doing right now and be landlords. That's not cheating. Or at least, I don't think it is, I don't know what the Chinese government considers it to be, what with communism and all.

For this context: cheating is in the sense of "large bets where the cost of failure will be paid by the state and yet they get to keep the profits in the event of success": I think that if any of them have made such bets, those bets have paid off. I think that if any such bet had been made and failed, it would have been noticed by people outside China.

As I said:

  From the outside, it appears that "upset the regime" includes "cheating your way into profits".
Key parts: "from the outside", and "appears".

In the event they make a bet and it fails and we spot it, that specific chain is what would make it cease to "appear" to be so "from the outside".


> I was always hunting for Pokémon with better abilities, better type coverage, analyzing synergy between moves… If you’ve ever played a mainline Pokémon game before, you must know how utterly unnecessary this is. Twenty years ago, I would have just powered through on Blastoise or Typhlosion alone.

I definitely beat the first Pokémon games with a level 100 Charizard. I even defeated gyms that were strong against fire types, often KO'ing Pokémon in one hit. The text would say "It's not very effective..." and then the opponent's health bar would drop to zero. So yeah, these games are easy enough that a 10-year-old can get by twinking out a single Pokémon. Makes the blog post even funnier.


I have fond memories of my Gyarados soloing the Elite Four in Yellow.

