Hacker News | new | past | comments | ask | show | jobs | submit | jpollock's comments | login

That's deterministic dispatch; as soon as it forks or communicates, it is non-deterministic again?

Don't you need something like a network clock to get deterministic replay?

It can't use immediate return on replay, or else the order will change.

This makes me twitchy. The dependencies should be better modelled, and idempotency used instead of logging and caching.
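To make the alternative concrete: instead of logging results and replaying them, each side-effecting call gets an idempotency key, so retries and replays collapse into a single execution. A minimal Python sketch, with all names (`IdempotentStore`, `charge`) hypothetical:

```python
class IdempotentStore:
    """Remembers the result of each operation by its idempotency key."""
    def __init__(self):
        self._results = {}

    def run(self, key, fn, *args):
        # If this operation already ran, return the saved result
        # instead of executing the side effect again.
        if key in self._results:
            return self._results[key]
        result = fn(*args)
        self._results[key] = result
        return result

calls = []
def charge(amount):
    calls.append(amount)  # stand-in for a non-idempotent side effect
    return f"charged {amount}"

store = IdempotentStore()
a = store.run("order-42", charge, 100)
b = store.run("order-42", charge, 100)  # retry/replay: no duplicate charge
```

The replay order then stops mattering for correctness, which is the point: the dependency is modelled, not recorded.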


Yes, it sets the reviewer's expectations around how much effort was spent reviewing the code before it was sent.

I regularly have tool-generated commits. I send them out with a reference to the tool, what the process is, how much it's been reviewed and what the expectation is of the reviewer.

Otherwise, they all assume "human authored" and "human sponsored". Reviewers will then send comments (instead of proposing the fix themselves). When you're wrangling several hundred changes, that becomes unworkable.


That can have some very extreme legal ramifications.

Consider: it's a VoIP dialing client, which has a requirement to provide location for E911 support.

If the OS vendor starts providing invalid data, it's the OS vendor which ends up being liable for the person's death.

e.g. https://www.cnet.com/home/internet/texas-sues-vonage-over-91...

which is from 2005, but gives you an idea of the liability involved.


Phone companies are required to make sure 911 works on their phones. Random people on the internet aren't required to make sure 911 works on random apps, even if they look like phones.

The comment you're replying to literally has an example of an internet calling service being fined $20,000 for not properly directing 911 calls.

I guess Vonage should try to appeal the case and say pocksuppet said they're not required to do that.


Vonage sells phone services that happen to use the internet. This is not the same as being WhatsApp.

It can't have "extreme ramifications"; Google's own phone couldn't call 911 for a while.

And you can manually enforce this for only the VoIP dialing apps, instead of for everyone.


Documentation rots a lot more quickly than the code - it doesn't need to be correct for the code to work. You are usually better off ignoring the comments (even more so the design document) and going straight to the code.


I maintain you’re either grossly misappropriating the time and energy of new and junior devs if this is the case on your project, or you have gone too long since hiring a new dev and your project is stagnating because of it.

New eyes don’t have the curse of knowledge. They don’t filter out the bullshit bits. And one of the advantages of creating reusable modules is you get more new eyes on your code regularly.

This may also be a place where AI can help. Some of the review tools are already calling us out on making the code not match the documentation.


No, they're 100% correct. This has been my experience at every place I've worked at in SV, from startup to FAANG.

You write the code so you can scan it easily, and you build tools to help, and you ask for help when you need it, but you still gotta build that mental map out.


The last time I tried AI, I tested it with a stopwatch.

The group used feature flags...

    if (a) {
        // new code
    } else {
        // old code
    }

    void testOff() {
        disableFlag(a);
        // test it still works
    }

    void testOn() {
        enableFlag(a);
        // test it still works
    }
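The end state of one cleanup is mechanical: inline the winning branch; delete the losing branch, the flag plumbing, and the now-redundant test. A before/after sketch in Python (hypothetical names):

```python
def handler_before(flag_enabled):
    # pre-cleanup: both branches live behind the flag
    if flag_enabled:
        return "new code"
    else:
        return "old code"

def handler_after():
    # post-cleanup: flag removed, winning branch inlined;
    # the old branch, the flag plumbing, and testOff() are all deleted
    return "new code"
```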
However, as with any cleanup, it doesn't happen. We have thousands of these things lying around taking up space. I thought "I can give this to the AI, it won't get bored or complain."

I can do one flag in ~3 minutes. Code edit, PR prepped and sent.

The AI can do one in 10 minutes, but I couldn't look away. It kept trying to use find/grep to search through a huge repo to find symbols (instead of the MCP service).

Then it ignored instructions and didn't clean up one or the other test, left unused fields or parameters and generally made a mess.

Finally, I needed to review and fix the results, taking another 3-5 minutes, with no guarantee that it compiled.

At that point, a task that takes me 3 minutes has taken me 15.

Sure, it made code changes, and felt "cool", but it cost the company 5x the cost of not using the AI (before considering the token cost).

Even worse, the CI/CD system couldn't keep up with my individual velocity of cleaning these up. With an automated tool? Yeah, not going to be pleasant.

However, I need to try again, everyone's saying there was a step change in December.


I did my own experiment with Claude Code vs Cursor tab completion. The task was to convert an Excel file to a structured format. Nothing fancy at all.

Claude Code took 4 hours, with multiple prompts. At the end, it started to break the previous fixes in favor of new features. The code was spaghetti. There was no way I could fix it myself or steer Claude Code into fixing it the right way. Either it was a dead-end or a dice roll with every prompt.

Then I implemented my own version with Cursor tab completion. It took the same amount of time, 4 hours. The code had a clear object-oriented architecture, with a structure for evolution. Adding a new feature didn't require any prompts at all.

As a result, Claude Code was worse in terms of productivity: the same amount of time, worse quality output, no possibility of (or at best very high cost of) code evolution.


Are you able to share your prompts to Claude Code? I assume not, they are probably not saved - but this genuinely surprised me, it seems like exactly the type of task an LLM would excel at (no pun intended!). What model were you using OOI?


> this genuinely surprised me

Me too. After listening to all the claims about Claude Code's productivity benefits, I was surprised to get the result I got.

I'm not able to share details of my work. I was using Claude Opus 4.5, if I recall correctly.


The exact same prompt? Everything depends on the prompt, and they're different tools. These days the quality of what's built around the prompt matters as much as the code. We can't feed them a generic query.


Similar happened to me just now. Claude whatever-is-the-latest-and-greatest, in Claude Code. I also tried out Windsurf's Arena Mode, with the same failure. To intercept the inevitable "holding it wrong" comments, we have all the AGENTS.md and RULES.md files and all the other snake oil you're told to include in the project. It has full context of the code, and even the ticket. It has very clear instructions on what to do (the kind of instructions I would trust an unpaid intern with, let alone a tool marketed as the next coming of Cyber Jesus that we're paying for), in a chat with minimal context used up already. I manually review every command it runs, because I don't trust it running shell scripts unsupervised.

I wanted it to finish up some tests that I had already prefilled, basically all the AI had to do was convert my comments into the final assertions. A few minutes later of looping, I see it finishes and all tests are green.

A third of the tests were still unfilled, I guess left as an exercise for the reader. Another third was modified beyond what I told it to do, including hardcoding some things which made the test quite literally useless and the last third was fine, but because of all the miscellaneous changes it made I had to double check those anyways. This is about the bare minimum where I would expect these things to do good work, a simple take comment -> spit out the `assert()` block.
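For concreteness, the shape of the task was roughly this (a Python sketch with hypothetical names, not the actual tests):

```python
def parse_csv_row(s):
    # stand-in for the function under test
    return [field for field in s.split(",") if field]

def test_empty_row():
    result = parse_csv_row("")
    # expect: empty input yields an empty list   <- the prefilled comment
    assert result == []                          # <- the assertion to be generated
```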

I ended up wasting more time arguing with it than if I had just done the menial task of filling out the tests myself. It sure did generate a shit ton of code though, and ran in an impressive looking loop for 5-10 minutes! And sure, the majority of the test cases were either not implemented or hardcoded so that they wouldn't actually catch a breakage, but it was all green!!

That's ultimately where this hype is leading us. It's a genuinely useful tool in some circumstances, but we've collectively lost the plot because untold billions have poured into these systems and we now have clueless managers and executives seeing "tests green -> code good" and making decisions based on that.


What model, what harness and about how long was your prompt to fire off this piece of work? All three matters a lot, but importantly missing from your experience.


Won't that show up in roi numbers?


There is no base to compare against.


There are definite discontinuities in there. What works for a team of 5 is different to 50 is different to 500.

Even just taking fault incidence rates, assuming constant injection per dev hour...


If the LLM is able to code it, there is enough training data that you might be better off in a different language that removes the boilerplate.


There are a couple of ways to figure this out.

open a terminal (OSX/Linux) and type:

    man dup
open a browser window and search for:

    man dup
Both will bring up the man page for the function call.
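As an aside on what you'll find there: dup(2) duplicates a file descriptor, and Python exposes the same call as os.dup, so the man page can be exercised directly:

```python
import os

# dup(2) duplicates a file descriptor; both descriptors then refer to
# the same open file description.
r, w = os.pipe()
w2 = os.dup(w)             # duplicate the write end
os.write(w2, b"hi")        # writing through the duplicate...
os.close(w)
os.close(w2)
data = os.read(r, 2)       # ...is visible on the original pipe
os.close(r)
```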

To get recursive, you can try:

    man man unix
(the unix is important, otherwise it gives you manly men)


> otherwise it gives you manly men

That's only just after midnight [1][2]

[1] - https://www.youtube.com/watch?v=XEjLoHdbVeE

[2] - https://unix.stackexchange.com/questions/405783/why-does-man...


I love that this situation occurred.


You may also consider GNU info:

  info dup


The severity of the DoS depends on the system being attacked, and how it is configured to behave on failure.

If the system is configured to "fail open", and it's something validating access (say anti-fraud), then the DoS becomes a fraud hole and profitable to exploit. Once discovered, this runs away _really_ quickly.
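A sketch of the distinction, with hypothetical names (a real system would gate this in config rather than a parameter):

```python
class FraudServiceDown(Exception):
    """Raised when the (hypothetical) anti-fraud dependency is unreachable."""

def check_fraud(txn):
    # Simulate the dependency being DoS'd off the network.
    raise FraudServiceDown()

def authorize(txn, fail_open):
    try:
        return check_fraud(txn)
    except FraudServiceDown:
        # fail open: approve everything, fraud included;
        # fail closed: reject everything, good customers included.
        return fail_open
```

A DoS against check_fraud turns the fail-open path into free approvals, which is why it becomes profitable to exploit.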

Treating DoS as affecting availability converts the issue into a "do I want to spend $X from a shakedown, or $Y to avoid being shaken down in the first place?"

Then, "what happens when people find out I pay out on shakedowns?"


If the system "fails open" then it's not a DoS, it's a privilege escalation. What you're describing here is just a matter of threat modeling, which is up to you to perform and not a matter for CVEs. CVEs are local properties, and DoS does not deserve to be a local property that we issue CVEs for.


You're making too much sense for a computer security specialist.


> If the system is configured to "fail open", and it's something validating access (say anti-fraud),

The problem here isn't the DoS, it's the fail open design.


If the majority of your customers are good, failing closed will cost more than the fraud during the anti-fraud system's downtime.
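Back-of-envelope, with entirely made-up numbers:

```python
# All figures are hypothetical, for one hour of anti-fraud downtime.
txns_per_hour = 10_000
fraudulent = 10                 # assume 0.1% of traffic is fraud
good = txns_per_hour - fraudulent
avg_txn_value = 50              # revenue per good transaction
avg_fraud_loss = 500            # loss per fraudulent transaction let through

# Fail closed: reject everything, forfeiting all the good revenue.
cost_fail_closed = good * avg_txn_value        # 499_500

# Fail open: accept everything, eating the fraud losses.
cost_fail_open = fraudulent * avg_fraud_loss   # 5_000
```

Under those assumptions failing closed is two orders of magnitude more expensive; the ratio obviously flips as the fraud rate or per-fraud loss climbs.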


If that is the mindset in your company, why even bother looking for vulnerabilities?


There is _always_ fraud, and you can't stop it all. All you can do is try to minimize the cost of the fraud.

There is an "acceptable" fraud rate from a payment processor. This explains why there are different rates for "card present" and "card not present" transactions, and why things like Apple Pay and Google Pay are popular with merchants.


You are really running with scissors there. If anyone with less scrupulous morals notices, you’re an outage away from being in deep, deep shit.

The best case is having your credit card processing fees like quadruple, and the worst case is being in a regulated industry and having to explain to regulators why you knowingly allowed a ton of transactions with 0 due diligence.


The concept of due diligence recognizes the limits, past which it becomes too much, or undue.


Until any bad customer learns about the fail-open.


If bad actors learn about the fail-close, they can conceivably cause you more harm.


This is a losing money vs. losing freedom situation.


Maybe. But for a company everything is fungible.


Okay, then the “vulnerability” is de facto simply transitioning the system to an acceptable state.


> Treating DoS as affecting availability converts the issue into a "do I want to spend $X from a shakedown, or $Y to avoid being shaken down in the first place?"

But that is what security is in the real world anyway. Once you move past the imaginary realms of crypto and secure coding that some engineers daydream in, the ultimate reality is always about "do I want to spend $X dealing with consequences of ${specific kind of atack}, or $Y on trying to prevent it" - and the answer is to consider how much $X is likely to be, and how much it'll be reduced by spending $Y, and only spending while the $Y < reduction in $X.


> Treating DoS as affecting availability converts the issue into a "do I want to spend $X from a shakedown, or $Y to avoid being shaken down in the first place?"

> Then, "what happens when people find out I pay out on shakedowns?"

What do you mean? You pay to someone else than who did the DoS. You pay your way out of a DoS by throwing more resources at the problem, both in raw capacity and in network blocking capabilities. So how is that incentivising the attacker? Or did you mean some literal blackmailing??


Literal blackmailing, same as ransomware.


Also, in e.g. C code, many exploits start out as only a DoS, but can later be turned into a more dangerous attack.


If you're submitting a CVE for a primitive that seems likely to be useful for further exploitation, mark it as such. That's not the case for ReDoS or the vast majority of DoS; it's already largely the case that you'd mark something as "privesc" or "rce" if you believe it provides that capability without necessarily having a full, reliable exploit.

CVEs are at the discretion of the reporter.

