The basic technique (as has been publicly described by Anthropic) is you ask one agent to come up with a test case that triggers, say, an ASan use-after-free. Then you have a second agent that validates the test case. This eliminates a lot of false positives. It gets a little tricky when you allow the first agent to modify the code, which is necessary for things like sandbox escapes where you want to demonstrate that sending bad IPC causes problems.
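For a sense of what "triggers an ASan use-after-free" means concretely, here's a minimal generic sketch (my own example, not from the Anthropic work) that AddressSanitizer flags when the program is built with -fsanitize=address. The validating agent's job is essentially to confirm a report like this actually appears when the test case runs:

    /* Minimal heap use-after-free; build with: clang -fsanitize=address uaf.c */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char *buf = malloc(16);
        if (!buf) return 1;
        buf[0] = 'A';
        free(buf);
        printf("%c\n", buf[0]);  /* read after free: ASan aborts here with a
                                    heap-use-after-free report */
        return 0;
    }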
And the part that follows is even more important: the agent that attempts the fix in code, the agent that tests the fix and reports on performance and functional impacts, and the agent that triggers the build and release to production.
Everything up to finding and validating the bug is a huge win for vuln/exploit development; everything after validating the bug is a huge win for defensive security, and a massive gap until those tools are generally available :S
Not really. The models were pointed specifically at the location of the vulnerability and given some extra guidance. That's an easier problem than simply being pointed at the entire code base.
Surely the Anthropic model also only looked at one chunk of code at a time. Cannot fit the entire code base into context. So supplying an identical chunk size (per file, function, whatever) and seeing if the open source model can find anything seems fair. Deliberately prompting with the problem is not.
The flood of reports that open source projects like curl, Linux and Chromium are getting are presumably due to public models like Open 4.6 that were released earlier this year, and not models with limited availability.
You should generally assume that in a web browser any memory corruption bug can, when combined with enough other bugs and a lot of clever engineering, be turned into arbitrary code execution on your computer.
The most important bit is the difficulty: AI finding 21 easily exploitable bugs is a lot more interesting than 21 that need all the planets to align to work.
The bugs that were issued CVEs (the Anthropic blog post says there were 22) were all real security bugs.
The level of AI spam for Firefox security submissions is a lot lower than the curl people have described. I'm not sure why that is. Maybe the size of the code base and the higher bar to submitting issues play a role.
The issue is what each of the projects considers a viable bug: if you count every localized assertion failure as a possible bug, that's different from "give me something that practically affects users."
Further, browsers have a much larger surface area for even minor fuzzing bugs. Curl's much smaller surface area is already well fuzzed and tested.
Chrome has better fuzzing and tests too. Firefox has had fewer resources compared to Google ofc, so understandable.
Ofc not saying it wasn't good. But given the LLM costs I find it hard to believe it was worth it, compared to just better and more innovative fuzzing, which would possibly scale better.
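For comparison, the kind of fuzzing I mean is roughly a libFuzzer harness like the sketch below. The parse_header() target and its length-field bug are made up for illustration, not curl code; the point is that once a harness exists, the fuzzer plus ASan finds and minimizes this class of bug for the cost of CPU time:

    /* Minimal libFuzzer harness sketch; build with:
       clang -fsanitize=fuzzer,address harness.c */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Made-up target: trusts a length field taken from the input. */
    static void parse_header(const uint8_t *data, size_t size) {
        if (size < 1) return;
        size_t claimed = data[0];             /* attacker-controlled length */
        uint8_t copy[64];
        if (claimed < sizeof(copy))
            memcpy(copy, data + 1, claimed);  /* overreads when claimed > size - 1 */
    }

    /* libFuzzer calls this with mutated inputs; ASan turns the overread
       into a crash the fuzzer can then minimize and report. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        parse_header(data, size);
        return 0;
    }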
I think the trick to making the "shorts" feature stop showing scantily clad women is to use it actively a bit, and only watch the videos that are decidedly something else. I did that for a while and now my videos are like "let's see what happens when you pour lava on some soda bottles," which I'm not sure I care that much about, but at least it isn't embarrassing.