More

unsupp0rted · 2026-04-10T13:01:23 1775826083

How about adjusted for people who are welding things underwater or throwing 25 kilo garbage bins into a truck?

lkey · 2026-04-10T13:27:20 1775827640

Thanks for being a big strong tall sexy garbage welder man from Europe commenting on an US specific article.

When women put on scuba gear or touch a arc welder their ovaries explode, killing them instantly.

55lbs also is well known to instantly disable women who do, in fact, labor in physical jobs.

The moment a child turns 7 and reaches that weight, women lose the ability to move the child at all, leading to tragic outcomes.

In all seriousness, just because a certain job is socialized a certain way, doesn't imply a fundamental lack of ability.

Being the only woman on a certain kind of male team sucks so so much more than having to put on scuba gear or lifting something heavy.

unsupp0rted · 2026-04-10T14:42:40 1775832160

This isn't the best way to use HN.

> Be kind. Don't be snarky. Converse curiously;

https://news.ycombinator.com/newsguidelines.html

lkey · 2026-04-10T16:35:49 1775838949

Do you think your comment was any of those things?

Why are you free from the bounds of courtesy, but fall back on them the moment someone responds in kind?

Do you really believe that women cannot arc weld? Do you really believe that women cannot scuba dive? Do you really believe that women cannot lift 25kg or take out the trash?

If you do, then it's rather unkind and incurious to cast half of the human population as so fundamentally incapable, and you should bring evidence to support your assertion.

If you don't actually believe it, then the substance of your comment is unkind snark, no?

Your original comment is just a warmed-over version of "women wouldn't last a day in a real man job"

We've been hearing that repeated ad infinitum as women have continued to prove the assertion wrong.

For the record though, I don't think anyone should do a job that has a 1/15 fatality rate. Men shouldn't be cannon fodder either.

Tangurena2 · 2026-04-10T13:17:57 1775827077

Underwater welding does pay extremely well. But to break into that field needs a commercial diving license and lots of time/skill at saturation diving. Basically, it is easier to train a diver to weld than it is to teach a welder to be a commercial diver. Many (US) entrants are ex-SEAL. There's about 20-25k such people worldwide and the death rate is about 1 in 15. Career ending injuries are about the same. If you need nightmare fuel: "Byford Dolphin".

defrost · 2026-04-10T13:30:57 1775827857

Who still lifts garbage cans into a truck, male or female?

It's been cab and camera operated hydraulic lifts on wheelie bins for decades here.

esseph · 2026-04-10T13:41:38 1775828498

Uh, anywhere around me in the US.

unsupp0rted · 2026-04-08T11:26:52 1775647612

Because it takes years to get paradigm-shifting tech off the ground.

You need hyper-obsessive nerds (the kind who were hyper-obsessive in school about D&D and LotR) to bring it to fruition.

The non-LotR namers give up early.

colesantiago · 2026-04-08T11:41:02 1775648462

This makes absolutely no sense.

unsupp0rted · 2026-04-08T15:26:26 1775661986

To be fair it makes some sense. Otherwise where do you get Anduril?

fearmerchant · 2026-04-08T12:10:46 1775650246

And yet somehow it works...

unsupp0rted · 2026-04-08T11:19:44 1775647184

It’s great on AI Studio. Harness issues, I agree.

unsupp0rted · 2026-04-08T10:46:01 1775645161

Codex 5.4 medium couldn’t figure out how to run tests in my staging Cloudflare so it went ahead and ran those tests against prod. Mission accomplished.

Yes, agents.md yells not to mess with prod.

Orygin · 2026-04-08T14:47:53 1775659673

> Yes, agents.md yells not to mess with prod.

Probably what nudged it to run on prod in the first place

unsupp0rted · 2026-04-07T12:30:58 1775565058

I’m sure they did demand that

unsupp0rted · 2026-04-07T10:25:48 1775557548

It's better PR to close issues and tell users they're holding it wrong, and meanwhile quietly fix the issue in the background. Also possibly safer for legal reasons.

unsupp0rted · 2026-04-06T22:16:02 1775513762

It's fairly simple to identify very smart people, but it takes some time. You ask them what their goals and predictions are, and then watch for a while.

I've noticed the smarter a person is, the fewer qualms they have about sharing exactly what they're aiming to do.

This approach is also a simple way to identify stupid people, but for stupid people there are much quicker methods. And stupid people tend to be cagey, because they have fewer tools for identifying when somebody is trying to take advantage of them, and because they've got experience being taken advantage of.

neonstatic · 2026-04-06T22:31:52 1775514712

I disagree with some of your observations.

Being taken advantage of is not only a function of intelligence. It's also a function of emotional health. Sure, if the person is incapable of understanding they are being taken advantage of, they will be. But one can be perfectly capable of understanding that, see it happen in real time, and let it happen anyway. That has been the case with me for a long time. I could see, but I could not stop it, because I have been emotionally conditioned to allow it. Took a long time to fix.

There is also a risk of confusing a smart person with a person who speaks well. We have a built-in heuristic, that language signals intelligence. To a large extent it does, of course, but it can be deceptive. I've grown very weary of well-spoken people, who seem to want me to think they are also very smart.

Lastly, higher intelligence does not mean the person is a better human being. I find that there is an obsession with intelligence in the West. "Stupid" people can be really lovely and better companions than smart ones. There is something to be said about kindness and honesty.

> I've noticed the smarter a person is, the fewer qualms they have about sharing exactly what they're aiming to do.

I used to be like that. Openly speaking about what I aim to do and how. I ended up moderating that quality a fair bit after noticing some people began copying my ideas or outright stealing them. I was to slow to execute.

unsupp0rted · 2026-04-06T22:37:05 1775515025

The nice thing about observing whether someone is accomplishing what they set out to accomplish is it doesn't matter how well spoken they seemed.

I've found that especially smart people have preternatural bullshit detectors, even when they lack "emotional health" or the ability to socialize well with others.

Smart people can be lovely, stupid people can be lovely, golden retrievers can be lovely... but that's tangential.

neonstatic · 2026-04-06T22:46:32 1775515592

> I've found that especially smart people have preternatural bullshit detectors

I really disagree with that. So many smart people fell for obvious bullshit because it appealed to their intellectualism. Look at all the communist sympathizers in the West. Morons, but also intelligent people most of the time. They believed stories spread by the soviet propaganda, because they wanted to believe them.

> The nice thing about observing whether someone is accomplishing what they set out to accomplish is it doesn't matter how well spoken they seemed.

It's funny that you say that - there's another poster in this thready who claims that looking at the output is the stupid people's way of evaluating intelligence. Seems like we really have no idea how to tell (except for an actual IQ test)

> Smart people can be lovely, stupid people can be lovely, golden retrievers can be lovely... but that's tangential.

Yep, I was just making a note, that intelligence might be overrated as a trait.

unsupp0rted · 2026-04-07T00:10:22 1775520622

Pretty much by definition: if you fall for bullshit so long as it appeals to your vanity, then you’re not especially smart.

neonstatic · 2026-04-07T00:53:20 1775523200

Is Chomsky smart?

unsupp0rted · 2026-04-07T09:48:28 1775555308

Is a 0.01% outlier a member of the class I'm describing? No, he's not.

Because when I say "smart people", I'm not saying "every single smart person on the planet".

unsupp0rted · 2026-04-06T21:33:39 1775511219

I suspect there's some other category, which isn't really a sociopath and isn't really a not-sociopath, which we don't have a good definition for.

We only say a lot of CEOs are sociopaths because they're in that third category we haven't named, where they're very good at manipulating people, but also can feel conscience, guilt, remorse, etc, perhaps just muted or easier to justify against.

E.g. if you think you're doing something for the betterment of mankind, it doesn't really matter if you lie to some board members some year during the multi-decade pursuit.

xg15 · 2026-04-06T23:05:22 1775516722

That's not a third category, that's just a sociopath as seen by themself.

unsupp0rted · 2026-04-07T00:06:01 1775520361

I doubt most sociopaths, when they’re honest, would agree they feel much guilt or remorse at all.

Whereas the people in the category I’m describing might feel those things, but prioritize those feelings far below the benefits of achieving what they set out to achieve.

game_the0ry · 2026-04-07T01:09:05 1775524145

> I doubt most sociopaths, when they’re honest, would agree they feel much guilt or remorse at all.

Yes that is the core trait I highlighted in the 1st bullet.

game_the0ry · 2026-04-07T01:08:24 1775524104

> I suspect there's some other category, which isn't really a sociopath and isn't really a not-sociopath, which we don't have a good definition for.

There is -- I call it "corpo sociopath." The corpo sociopath really comes out in the workplace, less so in personal life.

NeutralCrane · 2026-04-08T14:22:32 1775658152

I think it’s learned sociopathy. People who start out knowing that a particular behavior is wrong, but over time are conditioned to feel like it’s fine, at least in certain situations (the corporate world being a prime example).

unsupp0rted · 2026-04-06T21:26:54 1775510814

Many of us prefer OpenAI's Codex, because we think it's a better product.

No comment on the CEO: I just find the product superior in everything but UI/UX and conversation. It's better at quality code.

mliker · 2026-04-06T21:35:10 1775511310

Who is “us”? It does seem that some scientists prefer Codex for its math capabilities but when it comes to general frontend and backend construction, Claude Code is just as good and possibly made better with its extensive Skills library.

Both codex and Claude code fail when it comes to extremely sophisticated programming for distributed systems

keldaris · 2026-04-07T00:56:29 1775523389

As a scientist (computational physicist, so plenty of math, but also plenty of code, from Python PoCs to explicit SIMD and GPU code, mostly various subsets of C/C++), I can confirm - Codex is qualitatively better for my usecases than Claude. I keep retesting them (not on benchmarks, I simply use both in parallel for my work and see what happens) after every version update and ever since 5.2 Codex seems further and further ahead. The token limits are also far more generous (and it matters, I found it fairly easy to hit the 5h limit on max tier Claude), but mostly it's about quality - the probability that the model will give me something useful I can iterate on as opposed to discard immediately is much higher with Codex.

For the few times I've used both models side by side on more typical tasks (not so much web stuff, which I don't do much of, but more conventional Python scripts, CLI utilities in C, some OpenGL), they seem much more evenly matched. I haven't found a case where Claude would be markedly superior since Codex 5.2 came out, but I'm sure there are plenty. In my view, benchmarks are completely irrelevant at this point, just use models side by side on representative bits of your real work and stick with what works best for you. My software engineer friends often react with disbelief when I say I much prefer Codex, but in my experience it is not a close comparison.

Scene_Cast2 · 2026-04-07T13:32:20 1775568740

Have you tried the latest (3.1 pro) Gemini? In my experience, it's notably better for a similar type of problems than Opus 4.6. However, I don't really use OpenAI products to compare.

keldaris · 2026-04-07T21:25:33 1775597133

I actually haven't - I tried Gemini 3.0 Pro in Antigravity and was disappointed enough that I didn't pay much attention to the 3.1 release, it was notably worse than Opus and GPT at the time, and much more prone to "think" in circles or veer off into irrelevant tangents even with fairly precise instruction. I'll give 3.1 a try tomorrow, see what happens.

physicsguy · 2026-04-07T07:58:23 1775548703

I've tried both against similar and haven't found it such a clear cut difference. I still find neither are able to fully implement a complex algorithm I worked on in the past correctly with the same inputs. Not sharing exactly the benchmark I'm using but think about something for improving performance of N^2 operations that are common in physics and you can probably guess the train of thought.

keldaris · 2026-04-07T21:22:25 1775596945

I've had reasonable success using GPT for both neighbor list and Barnes-Hut implementations (also quad/oct-trees more generally), both of which fit your description, haven't tried Ewald summation or PME / P3M. However, when I say "reasonable success", I don't mean "single shot this algo with a minimal prompt", only that the model can produce working and decently optimized implementations with fairly precise guidance from an experienced user (or a reference paper sometimes) much faster than I would write them by hand. I expect a good PME implementation from scratch would make for a pretty decent benchmark.

physicsguy · 2026-04-08T08:04:13 1775635453

Think another level of complexity of algorithm, different expansion bases plus a mix of input sources. Also not trying to one-shot it.

tirutiru · 2026-04-07T20:20:57 1775593257

I can roughly guess the train of thought and I am a bit surprised that Claude is failing you.

That said, I am puzzled at the algorithms that Claude & GPT "get" and ones that they do not.

(former physicist here. would love to know the kind of things you're working on. email on my profile)

ricksunny · 2026-04-07T02:48:47 1775530127

>As a scientist (computational physicist,

Is there one that you prefer for, i dunno, physics?

zeroxfe · 2026-04-06T22:13:28 1775513608

I'm in that camp -- I have the max-tier subscription to pretty much all the services, and for now Codex seems to win. Primarily because 1) long horizon development tasks are much more reliable with codex, and 2) OpenAI is far more generous with the token limits.

Gemini seems to be the worst of the three, and some open-weight models are not too bad (like Kimi k2.5). Cursor is still pretty good, and copilot just really really sucks.

the__alchemist · 2026-04-07T01:41:35 1775526095

Claude Code, Codex, and Cursor are old news. If you're having problems, it's because you're not using the latest hotness: Cludge. Everyone is using it now - don't get left behind.

outside1234 · 2026-04-07T03:34:57 1775532897

Cludge has been left behind by Clanker, that’s the new hotness. 45B valuation!

p-t · 2026-04-07T13:07:59 1775567279

ive heard that poob has it for you!

unsupp0rted · 2026-04-06T21:47:34 1775512054

Us = me and say /r/codex or wherever Codex users are. I've tried both, liked both, but in my projects one clearly produces better results, more maintainable code and does a better job of debugging and refactoring.

sampullman · 2026-04-06T21:54:28 1775512468

That's interesting, I actively use both and usually find it to be a toss up which one performs better at a given task. I generally find Claude to be better with complex tool calls and Codex to be better at reviewing code, but otherwise don't see a significant difference.

SOLAR_FIELDS · 2026-04-06T23:45:29 1775519129

If you want to find an advocate for Codex that can give a pretty good answer as to why they think it's better, go ask Eric Provencher. He develops https://repoprompt.com/. He spends a lot of time thinking in this space and prefers Codex over Claude, though I haven't checked recently to see if he still has that opinion. He's pretty reachable on Discord if you poke around a bit.

hirako2000 · 2026-04-07T08:25:53 1775550353

Quite irrelevant what factions think. This or that model may be superior for these and those use cases today, and things will flip next week.

Also. RLHF mean that models spit out according to certain human preference, so it depends what set of humans and in what mood they've been when providing the feedback.

SOLAR_FIELDS · 2026-04-07T14:31:00 1775572260

On the contrary, I very much care about what the other factions think because I want to know if things have already flipped and the easiest way to do so is just ask someone who's been using the tool. Of course the correct thing to do is to set up some simple evals, but there is a subjective aspect to these tools that I think hearing boots on the ground anecdata helps with.

tharkun__ · 2026-04-08T04:41:04 1775623264

Haven't done it in a while, but I've done some tasks with both Codex and Claude to compare. In all cases I asked both to put their analysis and plans for implementation into a .md file. Then I asked the other agent to analyze said file for comparison.

In general, Claude was impressed by what Codex produced and noted the parts where it (i.e. Claude) had missed something vs. Codex "thinking of it".

From a "daily driver" perspective I still use Claude all the time as it has plan mode, which means I can guarantee that it won't break out and just do stuff without me wanting it to. With Codex I have to always specify "Don't implement/change, just tell me" and even then it sometimes "breaks out" and just does stuff. Not usually when I start out and just ask it to plan. But after we've started implementation and I review, a simple question of "Why did you do X?" will turn into a huge refactoring instead of just answering my question.

To be fair, that's what most devs do too (at least at first), when you ask them "Why did you do X" questions. They just assume that you are trying to formulate a "Do Y instead of X" as a question, when really you just don't understand their reasoning but there really might be a good reason for doing X. But I guess LLMs aren't sure of themselves, so any questioning of their reasoning obliterates their ego and just turns them into submissive code monkeys (or rather: exposes them as such) vs. being software engineers that do things for actual reasons (whether you agree with them or not).

cher88 · 2026-04-08T14:04:48 1775657088

Codex has plan mode too - /plan

aswanson · 2026-04-06T22:31:26 1775514686

Any difference in performance on mobile development?

sampullman · 2026-04-07T00:07:03 1775520423

For that I'm not so sure. I tried both early 2025 and was disappointed in their ability to deal with a TCA based app (iOS) and Jetpack compose stuff on Android, but I assume Opus 4.6 and GPT 5.4 are much better.

rocketpastsix · 2026-04-06T23:44:51 1775519091

yea Im not in this "us" you speak of.

Finbel · 2026-04-07T06:52:52 1775544772

Of course you're not one of "us" if you're one of "them".

zem · 2026-04-06T23:18:12 1775517492

I've found claude startlingly good at debugging race conditions and other multithreading issues though.

josephg · 2026-04-06T23:47:31 1775519251

My rule of thumb is that its good for anything "broad", and weaker for anything "deep". Broad tasks are tasks which require working knowledge of lots of random stuff. Its bad at deep work - like implementing a complex, novel algorithm.

LLMs aren't able to achieve 100% correctness of every line of code. But luckily, 100% correctness is not required for debugging. So its better at that sort of thing. Its also (comparatively) good at reading lots and lots of code. Better than I am - I get bogged down in details and I exhaust quickly.

An example of broad work is something like: "Compile this C# code to webassembly, then run it from this go program. Write a set of benchmarks of the result, and compare it to the C# code running natively, and this python implementation. Make a chart of the data add it to this latex code." Each of the steps is simple if you have expertise in the languages and tools. But a lot of work otherwise. But for me to do that, I'd need to figure out C# webassembly compilation and go wasm libraries. I'd need to find a good charting library. And so on.

I think its decent at debugging because debugging requires reading a lot of code. And there's lots of weird tools and approaches you can use to debug something. And its not mission critical that every approach works. Debugging plays to the strengths of LLMs.

DeathArrow · 2026-04-07T07:26:44 1775546804

Many paying customers say that Anthropic degraded the capability of Opus and Claude Code in the last months and the outcomes are worse. There are even discussions on HN about this.

Last one is from yesterday: https://news.ycombinator.com/item?id=47660925

lhl · 2026-04-07T06:10:38 1775542238

As some other people mentioned, using both/multiple is the way to go if it's within your means.

I've been working on a wide range of relatively projects and I find that the latest GPT-5.2+ models seem to be generally better coders than Opus 4.6, however the latter tends to be better at big picture thinking, structuring, and communicating so I tend to iterate through Opus 4.6 max -> GPT-5.2 xhigh -> GPT-5.3-Codex xhigh -> GPT-5.4 xhigh. I've found GPT-5.3-Codex is the most detail oriented, but not necessarily the best coder. One interesting thing is for my high-stakes project, I have one coder lane but use all the models do independent review and they tend to catch different subsets of implementation bugs. I also notice huge behavioral changes based on changing AGENTS.md.

In terms of the apps, while Claude Code was ahead for a long while, I'd say Codex has largely caught up in terms of ergonomics, and in some things, like the way it let's you inline or append steering, I like it better now (or where it's far, far, ahead - the compaction is night and day better in Codex).

(These observations are based on about 10-20B/mo combined cached tokens, human-in-the-loop, so heavy usage and most code I no longer eyeball, but not dark factory/slop cannon levels. I haven't found (or built) a multi-agent control plane I really like yet.)

kasey_junk · 2026-04-07T11:37:35 1775561855

Codex won me over with one simple thing. Reliability. It crashed less, had less load shedding and its configuration is well designed.

I do regular evaluation of both codex and Claude (though not to statistical significance) and I’m of the opinion there is more in group variance on outcome performance than between them.

baq · 2026-04-07T08:35:49 1775550949

This is the way. Eg. IME Gemini is really damn good at sql.

Razengan · 2026-04-07T14:39:29 1775572769

I have been using Codex AND Claude side by side for the same project*, with the same prompts.

Codex has been consistently better on almost every level.

* (an open source framework for 2D games in Godot 4.6 GDScript, mostly using AI to review existing code)

7thpower · 2026-04-06T23:00:21 1775516421

Not a scientist and use codex for anything complex.

I enjoy using CC more and use it for non coding tasks primarily, but for anything complex (honestly most of what I do is not that complex), I feel like I am trading future toil for a dopamine hit.

baq · 2026-04-07T08:33:12 1775550792

I’m one of those ‘us’, Claude’s outputs require significant review and iteration effort (to put it bluntly they get destroyed by gpt and Gemini). I’m basically using sonnet to do code search and write up since it is a better (more human-like) writer than gpt and faster and more reliable than gemini, but that’s about it.

bko · 2026-04-07T00:26:01 1775521561

I also find Codex much more generous in terms of what you get with a Pro ($20/mo) subscription. I use it pretty much non-stop and I have yet to hit a limit. Weekly reset is much better as well.

DeathArrow · 2026-04-07T07:38:36 1775547516

I prefer GLM 5.1 and MiniMax 2.7. With a better harness like Forge Code, I have better results for way less money than by using GPT and Opus.

jbergqvist · 2026-04-07T11:55:44 1775562944

Usage limits are more generous and GPT 5.4 is a good model, but yes, UI/UX lags behind Claude Code. Currently I'm especially missing /rewind with code restoration and proper support for plugin marketplaces

KaiserPro · 2026-04-07T08:14:49 1775549689

GPT/claude/gemini is pretty interchangeable at this point.

baq · 2026-04-07T08:37:04 1775551024

Absolutely not the case. They're complementary.

shevy-java · 2026-04-07T07:12:44 1775545964

Does this work for people? To me having a "better product" would be completely irrelevant if the use cases are evil.

thaoanh404 · 2026-04-07T08:56:10 1775552170

i find myself being more productive with codex/copilot on coding tasks, but claude does seem to be better at planning

aaa_aaa · 2026-04-07T04:45:01 1775537101

Shill talk

unsupp0rted · 2026-04-05T23:17:08 1775431028

I've tried z.ai, qwen, deepseek, minmax... they've all been barely half as capable as a middling Codex model.

muyuu · 2026-04-06T00:18:32 1775434712

i would try kimi

Qwen has also been improving recently, in fact most have, so depending on when you last tried them you can try again and see how they work for you

My local Qwen is decent for some things, Kimi is decent for most things and occasionally it has been able to do better than Opus and GPT 5.4 on particular tasks

it will soon be very costly to stay in just one provider