There are step changes that actually merit this, though. And a zero-day machine IS one of those. It went from a 4% zero-day success rate to 85% on Firefox.
A 0-day is just a vulnerability that wasn’t known before now.
What’s the criticality of these? Are they realistically exploitable? En masse? Through a complex and highly contextual set of actions? What’s the impact? Etc etc etc.
Yes those numbers are a big change but they’re also not spelling doom for us in the security world until we actually know what they mean.
The demonstrated ones that they have on the red team blog are neat, the kernel chain is impressive and fun. But nothing I’m seeing here is as world ending as the presser implies.
> The demonstrated ones that they have on the red team blog are neat, the kernel chain is impressive and fun
So by your estimation, rogue actors being able to uncover hundreds of vulnerabilities of this class in each major software product, roughly for free, would not be a big issue?
We must have read two different red team blogs from Anthropic if that’s what you think is happening. But let’s go ahead and assume what you’re asking at face value.
It would not be a doomsday issue as implied, no. Org security has gone far beyond static detections and “just exclude some IPs that fail to log in too much and we’re good”. SOAR exists. Behavioral analysis and monitoring exists. Layered defenses exist.
Believe it or not, those of us in security at large, highly targeted companies have been dealing with the potential for multiple chained 0-days for years, and the processes, monitoring, and (yes, automated) response architecture is already there.
I get that this is absolutely frightening for some, and that causes panic, but for us this is Tuesday.
Why does Apple want to make this hardware hard to access?
What actual benefits do they get?
I guess they can have their own models run faster than the competition on their hardware? But they don't even really have anything that consumers use on the ANE, as far as I can tell, and local LLMs are taking off on Macs and could really benefit from this.
I suspect the main benefits are that they have no need to maintain the hardware or software for any longer than it makes sense for their own needs, and they don't have to handhold users through a constantly evolving minefield of performance and technical capabilities.
That's because not everyone thinks the trade deals were lop-sided, and it's difficult to objectively determine whether they are, given that trade deals are just one lever in the relationship between two countries: one lever among millions, constantly calibrated and moved depending on the others.
In a system like this I think it's pretty difficult to say who's getting more and who's getting less.
But Trump doesn't care what is true or false, so for him it's easy to just say what suits him best.
Regarding the war, I can assure you that Trump not ruling out taking Greenland by force has been seen by the EU as a threat of starting a war, given that Greenland is part of the EU.
Also, applying tariffs when European NATO countries sent some troops to Greenland has been perceived as: "Trump wanted to invade Greenland, he felt like EU countries wanted to defend it, so he imposed tariffs because he wanted to invade".
I'm not saying everyone in the EU is thinking this, but I think a lot of people did, and this is some context to help you understand Europe's point of view.
They should do both. Resilience must be achieved in depth.
> It’s petulant the way the EU is throwing a hissy fit after we’ve had lop-sided trade deals for years and funding the entire NATO alliance ourselves.
Most of the outrage in the EU right now is about Trump's threats against another NATO country (Denmark / Greenland). The funding of NATO has been slowly shifting for a few years already.
No it’s not. I have written CUDA kernels and 8-bit optimizers with this.
They’re actually very good at speed optimization and can iterate very quickly, taking notes on trials, failures, and benchmarks. I’ve had it write 10 different attempts in around an hour, benchmark them all, then merge them and beat very strong baselines in PyTorch.
> Claude Code officially added native support for the Language Server Protocol (LSP) in version 2.0.74, released in December 2025.
I think from training it's still biased towards simple tooling.
But there is also real power in simple tools: a small set of general-purpose tools beats a bunch of narrow, specific-use-case tools. It's easier for humans to use high-level tools, but LLMs can instantly compose the low-level tools for their use case and learn to generalize; it's like writing insane Perl one-liners is second nature for them compared to us.
If you watch the tool calls, you'll see they write a ton of one-off small Python programs to test, validate, explore, etc.
If you think about it, any time you use a tool there is probably a 20-line Python program that is more fit to your use case; it's just that it would take you too long to write it, but for an LLM that's 0.5 seconds.
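To make that concrete, here's a hypothetical example of the sort of throwaway script an LLM might emit (the log format and test names are invented for illustration): instead of a `sort | head` pipeline, a few lines of Python that pull out exactly the slowest tests from a runner's output.

```python
# Hypothetical one-off script an LLM might write instead of a
# sort|head shell pipeline: find the slowest tests in runner output.
import re

log = """\
PASS test_parse 0.03s
PASS test_merge 1.72s
FAIL test_io 0.41s
PASS test_index 2.90s
"""

# Pull (name, seconds) pairs out of lines like "PASS test_x 1.23s".
times = [(m.group(1), float(m.group(2)))
         for m in re.finditer(r"^\w+ (\S+) ([\d.]+)s$", log, re.M)]

# Report the slowest first -- the "fit to your use case" part that a
# generic shell pipeline makes awkward.
for name, secs in sorted(times, key=lambda t: -t[1])[:3]:
    print(f"{secs:5.2f}s  {name}")
```

Twenty lines, disposable, and shaped precisely to the question being asked.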
> but for LLM's they can instantly compose the low level tools for their use case and learn to generalize
Hard disagree; this wastes enormous amounts of tokens, and massively pollutes the context window. In addition to being a waste of resources (compute, money, time), this also significantly decreases their output quality. Manually combining painfully rudimentary tools to achieve simple, obvious things -- over and over and over -- is *not* an effective use of a human mind or an expensive LLM.
Just like humans, LLMs benefit from automating the things they need to do repeatedly so that they can reserve their computational capacity for much more interesting problems.
I've written[1] custom MCP servers to provide narrowly focused API search and code indexing, build system wrappers that filter all spurious noise and present only the material warnings and errors, "edit file" hooks that speculatively trigger builds before the LLM even has to ask for it, and a litany of other similar tools.
Due to LLMs' annoying tendency to fall back on inefficient shell scripting, I also had to write a full bash syntax parser and a shell-script-rewriting ruleset engine that silently and trivially rewrites their shell invocations into more optimal forms using the other tools I've written. That way they don't have to do expensive, wasteful things like pipe build output through `head`/`tail`/`grep`/etc., which results in them invariably missing important information and either wandering off into the weeds or, if they notice, consuming a huge number of turns (and time) re-running the commands to get what they need.
Instead, they call build systems directly with arbitrary options, pipe filters, etc., and magically the command gets rewritten to something that will produce the ideal output they actually need, without eating more context and unnecessary turns.
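As a rough illustration of the rewriting idea (the rule table and the `buildwrap` tool name are invented here, not the commenter's actual engine): pattern-match a shell invocation, and substitute a tighter equivalent when a rule applies.

```python
# Toy sketch of a shell-invocation rewriter: map wasteful pipelines
# onto purpose-built wrappers. Rules and tool names are hypothetical.
import re

RULES = [
    # "make ... | grep ..." -> a build wrapper that filters errors itself
    (re.compile(r"^make\b(.*)\|\s*grep\b.*$"),
     lambda m: f"buildwrap --errors-only{m.group(1).rstrip()}"),
    # "cmd | head -n N" -> cap output without a second process
    (re.compile(r"^(.*?)\s*\|\s*head -n (\d+)$"),
     lambda m: f"{m.group(1)} --max-lines={m.group(2)}"),
]

def rewrite(cmd: str) -> str:
    """Return the first matching rewrite, or the command unchanged."""
    for pattern, repl in RULES:
        m = pattern.match(cmd)
        if m:
            return repl(m)
    return cmd

print(rewrite("make -j8 | grep -i error"))  # rewritten
print(rewrite("ls bin"))                    # no rule: passed through
```

A real version would parse the bash AST rather than regex-match strings, as described above, but the shape is the same: the model issues the command it was trained to issue, and the harness transparently substitutes the better tool.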
LLMs benefit from an IDE just like humans do -- even if an "IDE" for them looks very different. The difference is night and day. They produce vastly better code, faster.
[1] And by "I've written", I mean I had an LLM do it.
Note that the Claude Code LSP integration was actually broken for a while after it was released, so make sure you have a very recent version if you want to try it out.
However, as the parent comment said, it seems to always grep instead, unless explicitly told to use the LSP tool.
Correct. If you try to create a coding agent using the raw Codex or Claude API, build your own “write tool”, and don’t give the model its native patch tool, 70%+ of the time its write/patch fails, because it tries to do the operation using the write/patch tool it was trained on.
You're way off; this reads more like anti-capitalist political rhetoric than real reasoning.
Look at Nvidia's Nemotron series. They have become a leading open-source training lab themselves, and they're releasing the best training data, training tooling, and models at this point.
When are people going to drop the assumption that immigration is good at all costs?
We need a well-managed set of immigration policies or countries WILL take advantage of the US. These are our military rivals, and we sell our most advanced math, physics, and engineering seats to the highest bidder. It’s a self-destructive disaster, and it’s not just on us to treat people better.
Look at the rate of Indian asylum seekers in Canada to see the most extreme case. It happens anywhere you extend naivety and boundless goodwill.
Can you not see the significance of that?