
dkdcio · 2026-01-11T14:34:05 1768142045

not the same person, but in the flow of doing things those little pauses (tens of milliseconds) do matter. I open/close nvim (and, less so, tmux) a ton and run lots of commands per day. I don’t want to wait

and once you get used to things being that fast, it’s hard to go back (analogous to what people say about high-refresh screens/monitors)

all that said, the speed of the default Mac terminal (and other emulators I tried) was always fine for me; performance was not why I switched to Ghostty

dkdcio · 2026-01-11T11:54:39 1768132479

I think you’re correct. the reproduction isn’t very precise and the solution doesn’t seem right (I’m not seeing anything about the non-standard pages not being freed). I’d guess this was ignored because it was wrong…

dkdcio · 2026-01-11T11:45:36 1768131936

I started heavy usage in April 2025 (Codex CLI -> some Claude Code and trying other CLIs + a bit of Cursor -> Warp.dev -> Claude Code) and I’m still learning as well (and constantly trying to get more efficient)

(I had been using GitHub Copilot for 5+ years already, having started as an early beta tester, but I don’t really consider that the same)

I like to say it’s like learning a programming language. it takes time, but you start pattern matching and knowing what works. it took me multiple attempts and a good amount of time to learn Rust; learning effective use of these tools is similar

I’ve also learned a ton across domains I otherwise wouldn’t have touched

dkdcio · 2026-01-11T11:30:24 1768131024

I have tons of examples of AI not committing secrets. this is one screenshot from twitter? I don’t think it makes your point

CPUs are billions of transistors. sometimes one fails and things still work. “probabilistic quicksand” isn’t the dig you think it is to people who know how this stuff works

rvz · 2026-01-11T12:19:45 1768133985

> I have tons of examples of AI not committing secrets.

"Trust only me bro".

It takes 10 seconds to see the many examples of API keys + prompts on GitHub to verify that tweet. The issue with AI isn't limited to that tweet, which demonstrates its probabilistic nature. Otherwise, why do we need a sandbox to run the agent in the first place?

Nevermind, we know why: Many [0] such [1] cases [2]

> CPUs are billions of transistors. sometimes one fails and things still work. “probabilistic quicksand” isn’t the dig you think it is to people who know how this stuff works

Except you just made a false equivalence. CPUs can be tested / verified transparently, and even if one does go wrong, we know exactly why. Whereas you can't explain why the LLM hallucinated or decided to delete your home folder, because the way it predicts its output is fundamentally stochastic.

[0] https://old.reddit.com/r/ClaudeAI/comments/1pgxckk/claude_cl...

[1] https://old.reddit.com/r/ClaudeAI/comments/1jfidvb/claude_tr...

[2] https://www.google.com/search?q=ai+deleted+files+site%3Anews...

dkdcio · 2026-01-11T12:27:23 1768134443

you could find tons of API keys on GitHub before these “agentic” tools too. that was my point, one screenshot from twitter vs one anecdote from me. I don’t think either proves the point, but posting a screenshot from twitter like it’s proof of some widespread problem is what I was responding to (N=2, 1 vs 1)

my point is more “skill issue” than “trust me this never happens”

my point on CPUs is that people who don’t understand LLMs talk like “hallucinations” are a real thing, as if LLMs are “deciding” to make stuff up rather than just predicting the next token. yes it’s probabilistic, so is practically everything else at scale. yet it works and here we are. can you really explain in detail how everything you use works? I’m guessing I can explain failure modes of agentic systems (and how to avoid them so you don’t look silly on twitter/github) and how neural networks work better than most people can explain the technology they use every day

rvz · 2026-01-11T13:38:27 1768138707

> you could find tons of API keys on GitHub before these “agentic” tools too. that was my point, one screenshot from twitter vs one anecdote from me. I don’t think either proves the point, but posting a screenshot from twitter like it’s proof of some widespread problem is what I was responding to (N=2, 1 vs 1)

That doesn't refute the probabilistic nature of LLMs despite best prompting practices. In fact it emphasises it. More like your 1 anecdotal example vs my 20+ examples on GitHub.

My point is that not only does it indeed happen, but an old issue is now made even worse and more widespread, since we now have vibe-coders without security best practices assuming the agent knows better (when it doesn't).

> my point is more “skill issue” than “trust me this never happens”

So those that have this "skill issue" are also those who are prompting the AI differently then? Either way, this just inadvertently proves my whole point.

> yes it’s probabilistic, so is practically everything else at scale. yet it works and here we are.

The additional problem is: can you explain why it went wrong as you scale the technology? CPU circuit designs go through formal verification, and if a fault happens, we know exactly why; they are deterministic by design, which makes them reliable.

LLMs are not deterministic and have no such verification. Which is why OpenAI had to describe ChatGPT's misaligned behaviour as "sycophancy", but could not explain why it happened other than pointing to the hyper-parameter tweaks that produced that result.

So LLMs being fundamentally probabilistic, and hence much harder to explain, is the reason you have the screenshot of vibe-coders who somehow prompted it wrong and the agent committed the keys.

Maybe that would never have happened to you, but it won't be the last time we see more of this happening on GitHub.

dkdcio · 2026-01-11T13:52:54 1768139574

I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.

yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution

I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic. you can still practically use the tools to great effect, just like we use everything else that has underlying probabilities

OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice

and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction

rvz · 2026-01-11T14:48:13 1768142893

> I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.

Versus your anecdote being proof of what? Skill issue for vibe coders? Someone else prompting it wrong?

You do realize you are proving my entire point?

> yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution

Again, that reinforces my point: it makes the existing issue even worse. Additionally, that wasn't even the only point I made on the subject.

> I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic.

When you scale neural networks to become say, production-grade LLMs, then it does matter. Just like it does matter for CPUs to be reliable when you scale them in production-grade data centers.

But your earlier (fallacious) comparison ignores the reliability differences between CPUs and LLMs, and determinism is a hard requirement for that reliability, which LLMs do not meet.

> OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice

For the press, they had to, but no-one knows the real reason, because it is unexplainable; going back to my other point on reliability.

> and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction

It is indeed wrong for LLMs, because not even the researchers can practically explain why each individual neuron in the network ends up with different values on every fine-tune or training run. Even if the model is "good enough", it can still go wrong at inference time for unexplainable reasons other than "it overfitted".

CPUs on the other hand, have formal verification methods which verify that the CPU conforms to its specification and we can trust that it works as intended and can diagnose the problem accurately without going into atomic-level details.

dkdcio · 2026-01-11T15:51:12 1768146672

…what is your point exactly (and concisely)? I’m saying it doesn’t matter it’s probabilistic, everything is, the tech is still useful

rvz · 2026-01-11T16:08:55 1768147735

No one is arguing that it isn't useful. The problem is this:

> I’m saying it doesn’t matter it’s probabilistic, everything is,

Maybe it doesn't matter for you, but it generally does matter.

The risk level of a technology failing is far higher if it is more random and unexplainable than if it is expected, verified and explainable. The former eliminates many serious use-cases.

This is why your CPU or GPU works.

LLMs are not deterministic, have no formal verification, and are fundamentally black boxes.

That is why many vibe-coders have reported "AI deleted my entire home folder" issues even when they only told it to move a file / folder to another location.

If it did not matter, why do you need sandboxes for the agents in the first place?

dkdcio · 2026-01-11T17:06:54 1768151214

I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)

very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)
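A minimal sketch of the kind of secret-blocking commit hook mentioned above, assuming a Python pre-commit script. The function names `find_secrets` and `main`, and the three key formats checked, are illustrative assumptions rather than a complete scanner; dedicated secret-scanning tools maintain far larger rule sets.

```python
import re
import subprocess
import sys

# Hypothetical, illustrative rules only; a real scanner needs many more.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(diff_text):
    """Return (rule name, line) pairs for added diff lines that match a rule."""
    hits = []
    for line in diff_text.splitlines():
        # Only inspect added lines; skip the "+++ b/<file>" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line))
    return hits


def main():
    """Scan the staged changes; return a non-zero status if anything matched."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    hits = find_secrets(diff)
    for name, line in hits:
        # Truncate the echoed line so the full secret isn't printed back.
        print(f"possible {name}: {line[:24]}...", file=sys.stderr)
    return 1 if hits else 0
```

To wire it up, a hypothetical `.git/hooks/pre-commit` could import this module and call `sys.exit(main())`; git aborts the commit when the pre-commit hook exits with a non-zero status.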

rvz · 2026-01-11T23:28:52 1768174132

> I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)

No.

> very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)

It's what happens when it all goes wrong.

You have to explain exactly why a system failed in heavily regulated sectors.

Giving 'everything is probabilistic' as the cause of an issue is a non-answer if you are a chip designer, air traffic controller, investment banker or medical doctor.

So your point does not follow.

dkdcio · 2026-01-12T00:24:50 1768177490

that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point (in the second part of your response; I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine. they can be explained despite your repeated assertions they cannot be. also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf

rvz · 2026-01-12T01:04:52 1768179892

> that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point

I have repeated myself many times, and you continue to ignore the reliability points that inherently impede LLMs in many use-cases and exclude them from areas where predictability is required in critical production systems.

Vibe coders can use them, but the gulf between "useful for prototyping" and "useful for production" is riddled with hard obstacles: software like LLMs is fundamentally unpredictable, so the risks are far greater.

> I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine.

So when a neural network scales beyond hundreds of layers and billions of parameters, equivalent to a production-grade LLM, explain exactly how such a black box at that scale is explainable when it messes up and goes wrong.

> they can be explained despite your repeated assertions they cannot be.

With what methods exactly?

Early on, I cited formal verification and testing as the way to explain CPU failures at scale. You, on the other hand, have offered no equivalent for LLMs to back your assertion, other than "they can be explained", without providing any evidence.

> also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf

You did not make any point with that, as it was a false equivalence, and I explained why the reliability of a CPU isn't the same as the reliability of an LLM.

Mawr · 2026-01-11T12:53:21 1768136001

I have tons of examples of drivers not running into objects.

dkdcio · 2026-01-11T12:57:55 1768136275

like my other comment, my point is one screenshot from twitter vs one anecdote. neither proves anything. cool snarky response though!

dkdcio · 2026-01-11T11:23:24 1768130604

this is a straw man, nobody serious is promising that. it is a skill like any other that requires learning

robot-wrangler · 2026-01-11T13:06:32 1768136792

I agree about skills actually, but it's also obvious that parent is making a very real point that you cannot just dismiss. For several years now and far short of wild AGI promises, the answer to literally every issue with casual or production AI has been something like "but the rate of model improvement.." or "but the tools and ecosystem will evolve.."

If you believe that uncritically about everything else, then you have to answer why agentic workflows or MCP or whatever is the one thing that it can't evolve to do for us. There's a logical contradiction here where you really can't have it both ways.

dkdcio · 2026-01-11T13:19:16 1768137556

I’m not understanding your point… (and would be genuinely curious to)? the models and systems around them have evolved and gotten better (over the past few years for LLMs and decades for “AI” more broadly)

oh I think I do get your point now after a few rereads (correct me if I’m wrong, but you’re saying it should keep getting better until there’s nothing for us to do). “AI”, and computer systems more broadly, are not and cannot be viable systems. they don’t have agency (ironically) to effect change in their environment (without humans in the loop). computer systems don’t exist/survive without people. all the human concerns around what/why remain; AI is just another tool in a long line of computer systems that make our lives easier/more efficient

robot-wrangler · 2026-01-11T14:38:02 1768142282

AI Engineer to Software Engineer: Humans writing code is a waste of time, you can only hope to add value by designing agentic workflows

Prompt Engineer to AI Engineer: Designing agentic workflows is a waste of time, just pre/postfix whatever input you'd normally give to the agentic system with the request to "build or simulate an appropriate agentic workflow for this problem"

sensanaty · 2026-01-12T01:08:27 1768180107

Nobody serious, like every single AI CEO out there? I mean I agree, nobody should be taking them seriously, yet we're fast on track for a global financial meltdown because of these fraudsters and their "non-serious" words.

fabianholzer · 2026-01-11T13:53:34 1768139614

> nobody serious is promising that

There is a staggering number of unserious folks in the ears of people with corporate purchasing power.

Ekaros · 2026-01-11T11:30:42 1768131042

OpenAI is going to get to AGI. And AGI should in minutes build a system that takes vague input and produces fully functioning product out of it. Isn't singularity being promised by them?

dkdcio · 2026-01-11T11:35:56 1768131356

you’re just repeating the straw man. if you can’t think critically and just regurgitate every dumb thing you hear idk what to tell you. nobody serious thinks a “singularity” is coming. there’s not even a proper definition of “AGI”

your argument amounts to “some people said stupid shit one time and I took it seriously”

dkdcio · 2026-01-09T14:49:02 1767970142

(i.e. competitors can still use Claude models but haven’t achieved the same DevEx as CC so far, at least in my opinion and many others)

also while I was initially on the “they should open source” boat, and I’m happy Codex CLI did, there are a ton of benefits to keeping it closed source. just look at how much spam and dumb community drama OpenAI employees now have to deal with on GitHub. I increasingly think it’s a smart move to keep it closed source and iterate without as much direct community involvement on the codebase for now

rovr138 · 2026-01-09T15:56:17 1767974177

They could open source and not take contributions fwiw.

They could close the issues and only allow discussions.

There was a project mentioned here recently that did just that.

Edit: it was Ghostty:

"Why users cannot create Issues directly" - https://news.ycombinator.com/item?id=46460319

joelwilliamson · 2026-01-09T17:54:20 1767981260

They could open source it and not even have a Github project associated. Just provide a read-only git repo on anthropic.com or drop a source tarball every release.

hxugufjfjf · 2026-01-09T19:15:46 1767986146

Then a ton of vibe-coded Claude Code forks out of their control would pop up on GitHub and people would be even more frustrated at Anthropic for not fixing their issues.

dkdcio · 2026-01-09T14:21:13 1767968473

in this case they’re just British mate

dkdcio · 2026-01-09T04:37:57 1767933477

how so?

dkdcio · 2026-01-08T21:45:01 1767908701

pretty much…they have their own system prompts, you can customize the model, the tools they use, etc.

CC has built in subagents (including at least one not listed) that work very well: https://code.claude.com/docs/en/sub-agents#built-in-subagent...

this was not the case in the past (I swore off subagents), but they got good at some point

dkdcio · 2026-01-08T21:43:08 1767908588

late Feb. 2025: https://www.anthropic.com/news/claude-3-7-sonnet