Hacker News | cmovq's comments

Even the function names are identical :/

The difference in size seems to be mainly due to the missing documentation before each function.

I mean, it kind of is, considering that's comparable to a 5070, which has 672 GB/s. Benefit of NVIDIA being the only one using GDDR7 for now, I guess.

7800 XT has 624 GB/s as well, and can be found for $400 used. 16 GB of course.

I've heard ROCm is still a crapshoot though. Is that true?

If you stick with your OS/package manager-distributed version, installation isn't painful anymore (provided that version approximately overlaps with your generation of GPU). It's okay for inference, and okay for training if you don't stray too far beyond plain torch. If you want to run code from a paper or other more esoteric stuff you're still going to have a bad time.

I don't have an Intel dGPU, but I suspect the situation there is even worse. I mean you go to the torch homepage: https://pytorch.org/get-started/locally/ and Intel isn't even mentioned. (It's here though: https://docs.pytorch.org/docs/stable/notes/get_start_xpu.htm...)


I've always assumed minifiers were a kind of lossless compression. I guess this optimization makes it lossy? Even if we can't tell the difference between oklch(0.659432 0.304219 234.75238) and oklch(.659 .304 234.752) they're still different colors.

The whole context of the article is answering your question: a "different color" is a specification-laden notion, and according to the spec the answer is no.

When you're using a programming language that naturally steers you to write slow code you can't only blame the programmer.

I was listening to someone say they write fast code in Java by avoiding allocations with a PoolAllocator that would "cache" small objects with poolAllocator.alloc(), poolAllocator.release(). So just manual memory management with extra steps. At that point why not use a better language for the task?
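The pattern being described looks roughly like this (a sketch with hypothetical names, reconstructed from the description, not from any real library):

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Recycle objects instead of allocating new ones: manual memory
// management with extra steps, as described above.
final class PoolAllocator<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    PoolAllocator(Supplier<T> factory) {
        this.factory = factory;
    }

    T alloc() {
        T obj = free.poll();
        return obj != null ? obj : factory.get(); // reuse if possible
    }

    void release(T obj) {
        free.push(obj); // caller must not touch obj after this
    }
}
```

alloc()/release() is malloc()/free() with different spelling, which is the point: once you're here, the GC is no longer buying you much on this path.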


I decompiled Project Zomboid (written in Java) a while back, because I was curious about the performance issues I was having with the game. (Very laggy on my 10 year old laptop, while looking like The Sims 1.) I figured, best case scenario I find some easy bottlenecks and I can patch in a fix.

Well, the whole thing was standard Java OOP, except they also had a bunch of functional programming stuff on top of that. I can relate to that -- I think they were university students when they started, and I definitely had an OOP and FP phase. But then they just... kept it, 10+ years later.

So while it's true that you can write C in any language... those kinds of folks don't tend to use Java in the first place ;)

--

(Except Notch? Well, his code looks like C, not sure if it's actually fast! I really enjoyed his 4-kilobyte Java games back in the day, I think he published the source for each one too.)

EDIT: Found it!

https://web.archive.org/web/20120317121029/http://www.mojang...

Edit 2: This one has a download, still works!

https://web.archive.org/web/20120301015921/http://www.mojang...


What is the ending of your story!? Did you find and fix some bottlenecks?

TBH, I do not see how Java as a language steers anyone into one of those footguns. E.g., knowledge of algorithmic complexity is foundational, and StringBuilder is junior-level basic knowledge.

How would you handle validating numeric input in a hot path then? All of the solutions proposed in #5 are incomplete or broken, and it stems from the fact that Java's language design over-uses exceptions for error handling in places where an optional value would be much safer and faster.

Normally, in 100% of cases, with parseInt/parseDouble etc. Getting NumberFormatException so frequently on a hot path that it impacts performance means that you aren't solving a number-parsing problem, you are solving a type-guessing problem, which is out of scope for the standard library and requires a custom parser.
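For what it's worth, such a custom parser doesn't have to be large. A sketch of a non-throwing variant (a hypothetical helper, not part of the JDK; it handles an optional sign and decimal digits, and rejects Integer.MIN_VALUE for brevity):

```java
import java.util.OptionalInt;

// Non-throwing int parser for hot paths where invalid input is common:
// returns empty instead of raising NumberFormatException.
final class Ints {
    static OptionalInt tryParse(String s) {
        if (s == null || s.isEmpty()) return OptionalInt.empty();
        int i = 0, sign = 1;
        char c = s.charAt(0);
        if (c == '+' || c == '-') {
            if (s.length() == 1) return OptionalInt.empty();
            sign = (c == '-') ? -1 : 1;
            i = 1;
        }
        long value = 0; // widened to long to keep overflow checks simple
        for (; i < s.length(); i++) {
            c = s.charAt(i);
            if (c < '0' || c > '9') return OptionalInt.empty();
            value = value * 10 + (c - '0');
            if (value > Integer.MAX_VALUE) return OptionalInt.empty();
        }
        return OptionalInt.of(sign * (int) value);
    }
}
```

On the bad-input path this does no exception construction and no stack-trace fill-in, which is where parseInt's cost actually comes from.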

Okay, but this contradicts your original statement that "Java doesn't steer anyone to use these [footguns]". Every language has a way to parse integers, and most developers do not need a custom parser. Only in Java does that suddenly become a performance footgun.

It does not. If you need to parse a number, you use the standard library and you will be fine. The described case, with a huge impact on the hot path, is a demonstration of why using your brain is important. The developer who gets into this mess is the one who will find a way to suffocate his code with performance bottlenecks in a thousand other ways. It's not a language or library problem.

Yes, parseInt et al work very fast for good inputs. What percentage of your inputs are invalid numbers and why ?

> What percentage of your inputs are invalid numbers and why ?

This is the wrong question to ask in this context. The right question is when exceptional flow actually becomes a performance bottleneck. Because, obviously, in a desktop or even a server app, validating a single user input, even with 99% invalid inputs, won't cause any trouble. It may become a problem with bulk processing, but then, and I have to repeat myself here, it is no longer a number-parsing problem, it's a problem of not understanding what your input is.


> Java's language design over-uses exceptions for error handling

No, library authors' design over-uses exceptions. Also refer to people using exceptions to generate 404 HTTP responses in web systems - hey, there's an easy DDoS... This can include some of Java's standard libraries, although nothing springs to mind.

Exceptions are not meant for mainstream happy-path execution; they mean that something is broken. Countless times I have had to deal with garbage-infested logs where one programmer is using exceptions for rudimentary validation and another is dumping the resulting stack traces left and right as if the world is coming to an end.

It is a problem, but it's an abuse problem, not a standard usage problem.


I agree with you that the root problem is that the library authors' design over-uses exceptions. But when the library in question is the standard library and the operation is as basic as Integer.parseInt, then I think it's fair to criticize that as a language issue, because the standard library sets the standard for what is idiomatic + performant for a language.

There is nothing wrong with Integer.parseInt(). It blows up if you give it invalid input. That's standard idiomatic behavior.

It might be helpful to have Integer.validateInt(String), but currently it's up to the author to do that themselves.


The problem with comments like these is that guessing what "better language" a commenter has in mind is always an exercise left to the reader. And that tends to be by design: it's great for potshots and punditry, because it means not having to make a concrete commitment to anything that might similarly be confronted and torn apart in the replies. Like if the "better language" alluded to is C (and it generally is): the language where the standard library "steers" you towards quadratic string operations, because the default/natural way to get a string's length is O(n).

You might have an application for which speed is not important most of the time. Only one or two processes might require allocation-free code. For such a case, why would you burden all of the other code with the additional complexity? Calling out to a different language then may come with baggage you'd rather avoid.

A project might also grow into these requirements. I can easily imagine that something wasn't problematic for a long time but suddenly emerged as an issue over time. At that point you wouldn't want to migrate the whole codebase to a better language anymore.


I saw something like that being suggested when working with GIS data with many points as classes in Java, the object overhead for storing XYZ doubles is quite crazy. The optimization was to build a global double array and use "pointers" to get and set the number in the array.
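A minimal sketch of that layout (hypothetical names, following the description above): one shared double[] holds every coordinate, and an integer offset acts as the "pointer" to a point, so there is no per-point object header.

```java
// Flat storage for XYZ points: one array, no per-point objects.
final class PointStore {
    private final double[] data;

    PointStore(int capacity) {
        this.data = new double[capacity * 3];
    }

    // The "pointer" to point i is just its offset into the array.
    int ptrOf(int i) { return i * 3; }

    void set(int ptr, double x, double y, double z) {
        data[ptr] = x;
        data[ptr + 1] = y;
        data[ptr + 2] = z;
    }

    double x(int ptr) { return data[ptr]; }
    double y(int ptr) { return data[ptr + 1]; }
    double z(int ptr) { return data[ptr + 2]; }
}
```

Each Point3D object would otherwise cost an object header plus a reference per point; here the per-point cost is exactly three doubles.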

Even JavaScript is much better for this, much, much better.


JavaScript has the exact same issue - objects are on the heap and require allocation and pointer dereferencing. For huge collections of numbers, arrays might be better.

But JS has another problem: there's no way to force a number to be unboxed (no primitive vs boxed types), so the array of doubles might very well be an array of pointers to numbers[1].

But with hidden-class optimizations an object might be able to store a float directly in a field, so the array of objects will have one box per (x,y,z), while an array of "numbers" might have one box per number, so 3x as many. My guess, without benchmarking, is that JS is much worse than Java here, because the "optimization" will end up being worse.

[1]: Most JS engines have an optimization for small ints, called SMIs, that uses pointer tagging to store either an int or a reference, but I don't think they typically do this optimization for floats.
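One way to guarantee unboxed storage in JS is a typed array, which holds raw IEEE 754 doubles regardless of engine heuristics. A sketch with points stored interleaved as x,y,z triples:

```javascript
// Float64Array stores raw doubles: no boxing, no pointer chasing.
const n = 4;
const pts = new Float64Array(n * 3); // x,y,z interleaved

function setPoint(i, x, y, z) {
  pts[i * 3] = x;
  pts[i * 3 + 1] = y;
  pts[i * 3 + 2] = z;
}

function getY(i) {
  return pts[i * 3 + 1];
}

setPoint(2, 1.5, 2.5, 3.5);
```

This sidesteps both the SMI question and the hidden-class question entirely, at the cost of giving up named fields.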


You're describing array of structs vs. struct of arrays. Even in JavaScript, you would have to manually do the latter.

V8 automatically optimizes objects with the same shape into efficient structs, making array of objects much more efficient than in Java.

And the manually managed array acts more like system memory: it isn't contiguous per point, so you could have points at indices 0 and 2 and the data would be [1, 2, 3, x, x, x, 3, 2, 1] (3D points).

So I am not describing a struct of arrays.


"Efficient structs" in v8 are just inefficient Java classes. https://www.dashlane.com/blog/how-is-data-stored-in-v8-js-en...

The optimization you discussed for GIS data is called struct of arrays. JavaScript does not do that automatically for you. You would have to do the same thing manually in JavaScript to avoid per-triple object overhead.


Hidden class optimizations just make JavaScript objects behave a little more like Java class instances, where the VM knows where to find each field, rather than having to look it up like a map.

It doesn't make JS faster than Java, it makes it almost as fast in some cases.


I can only say what I observed in testing, and that's that having millions of instances of a class like Point3D{x, y, z} in JS uses significantly less memory than in Java (this was tested on Android, not sure if relevant). It was quite some time ago so I don't remember the details.

Well, Android is not running Java; it runs Android Runtime (formerly Dalvik) bytecode. In general, depending on when it was, that runtime is much, much simpler and does a lot more at compile time at the expense of less JIT optimization.

It's also many versions behind the Java API (depending on when it happened).

So your data point is basically completely irrelevant.


This point gets raised every single time managed languages and low-latency development come up together. The trade-off is running "fast" all of the time, even when you don't have to, vs. running slow most of the time and tinkering when you need to go fast.

I've spent a fair few years developing lowish (10-20us wire to wire) latency trading systems and the majority of the code does not need to go fast. It's just wasted effort, a debugging headache, and technical debt. So the natural trade off is a bit of pain to make the hot path fast through spans, unsafe code, pre-allocated object pools, etc and in return you get to use a safe and easy programming language everywhere else.

In C# low latency dev is not even that painful, as there are a lot of tools available specifically for this purpose by the runtime.


Bad idea. I've made a pool allocator before, but that was for expensive network objects and expensive objects dealing with JNI.

Doing it to avoid memory pressure generally means you simply have a bad algorithm that needs to be tweaked. It's very rarely the right solution.


Not sure why you are downvoted. Depending on how it's used, it could actually be detrimental to performance.

The JVM may optimize many short-lived objects better than a pool of objects with longer, less predictable lifetimes.


This is the second time this week on HN that I've seen people suggesting object pools to solve memory pressure problems.

I generally think it's because people aren't experienced with diagnosing and fixing memory pressure. It's one of the things I do pretty frequently for my day job. I'm fortunate enough to be the "performance" guy at work :).

It'll always depend on what the real issue is, but generally speaking the problem to solve isn't reinventing garbage collection, but rather to eliminate the reason for the allocation.

For example, a pretty common issue I've seen is copying a collection to do transformations. Switching to streams, combining transformation operations, or in an extreme case, I've found passing around a consumer object was the way to avoid a string of collection allocations.

Even the case where small allocations end up killing performance, for example like the autoboxing example of the OP, often the solution is to either make something mutable that isn't, or to switch to primitives (Valhalla can't come soon enough).

Heck, sometimes even an object cache is the right solution. I've had good success reducing the size of objects on the heap by creating things like `Map<String, String>` and then doing a `map.computeIfAbsent(str, Function.identity());` (Yes, I know about string interning, no I don't want these added to the global intern cache).
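That dedup cache can be sketched in a few lines (hypothetical helper class, not a library API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Collapse equal strings to one canonical instance, without touching
// the global intern pool. Duplicates become collectable garbage.
final class Deduper {
    private final Map<String, String> cache = new HashMap<>();

    String canon(String s) {
        return cache.computeIfAbsent(s, Function.identity());
    }
}
```

After canonicalization, N equal-but-distinct strings on the heap collapse to one instance plus N references, which is often a large win when the same values recur across many records.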

Regardless, the first step is profiling (JFRs and heap dumps) to see where memory is spent and what is dominating the allocation rate. That's a first step that people often skip and jump straight to fixing what they think is broken.


Java doesn't steer you into object pools. I wrote Java code for 20 years and never used a cache to avoid allocating objects, and never saw a colleague use one. The person you were talking to doesn't know what he's doing.

> So just manual memory management with extra steps

This is actually the perfect situation: you are allowed to do it carefully and manually for 1% of code on the hot path, but you don't have to worry about it for the 99% of the code that's not.


> At that point why not use a better language for the task?

Such as?


This comment assumes game companies throw away all their code and start from scratch on their next title. Which is completely untrue, games are built on decades old code, like most software. There is absolutely a need for maintainable code.


I think that there might be a distinction on indie vs big publishers (like EA) - I would expect someone like EA to enforce coding standards, but they do so whilst living off the income of previous hits.


> we had an unusual discussion about a Python oddity

There are so many discussions about "X language is so weird about how it handles numbers!" and it's just IEEE 754 floats.


The oddity here is not the float itself, it's that Python provided a default hash implementation for floats


Yeah IEEE 754 floating point numbers should probably not be hashable, and the weird (but standard-defined) behaviour with respect to NaN equality is one good reason for this.


Python supports arithmetic on mixed numeric types, so it makes sense that floats and ints should have a hash function that behaves somewhat consistently. I don't write a lot of python, but having used other scripting languages it wouldn't surprise me if numeric types get mixed up by accident often enough. You probably want int(2) and float(2) to be considered the same key in a dictionary to avoid surprises.

See: https://docs.python.org/3/library/stdtypes.html#hashing-of-n...
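A quick check of this behavior:

```python
# Python defines hash() so that values that compare equal across numeric
# types also hash equal, so int 2 and float 2.0 collide as dict keys.
assert 2 == 2.0
assert hash(2) == hash(2.0)

d = {2: "int"}
d[2.0] = "float"  # overwrites the existing entry rather than adding one
assert len(d) == 1
```

The same rule extends to Fraction and Decimal, which is why the hash of a float is not simply a hash of its bit pattern.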


> You probably want int(2) and float(2) to be considered the same key in a dictionary to avoid surprises

There are a variety of problems here:

- floats can't represent whole numbers exactly as they get further from zero.

- error accumulates in floating point arithmetic, so calculated keys may not match constant keys

- Do you really want 2.0 and 2.00001 to refer to different objects?


>floats can't represent whole numbers exactly as they get further from zero.

>error accumulates in floating point arithmetic, so calculated keys may not match constant keys

Yes, but the parent used a toy example. It's a programming fundamental that you shouldn't be converting magic numbers to floats or doing math with floats without considering inaccuracy. These problems apply everywhere programmers use floats - they must understand them, regardless of whether they are using them as hash keys or any other purpose.

>Do you really want 2.0 and 2.00001 to refer to different objects?

Yes, very much so. They are different values. (Assuming you are using 2.00001 as shorthand for "a float close to 2")

The advantage of common primitives being hashable is theoretically very high. In Swift (where NaN hashes to a different value every time, since that makes sense), hashable primitives make hashability composable, so anything composed of hashable parts is hashable. And hashability is important not just for storage in maps, but for all sorts of things (e.g. set membership, comparison for diffs, etc).

The upsides of floats being hashable is potentially much higher than the downsides, which mostly stem from a programmer not understanding floats, not from floats being hashable.


> It's a programming fundamental that you shouldn't be converting magic numbers to floats or doing math with floats without considering inaccuracy.

This is all well and good in a language that makes you declare distinct floating point types, but in python things are maybe not so clear cut - the language uses arbitrary precision integers by default, and it's not always crystal clear when they are going to be converted to a (lossy) floating point representation

Yeah, the programmer should probably be aware of the ins and out here, but python folks often aren't all that in the weeds with the bits and the bytes


For a graphics programmer acos(dot(x, y)) always raises an eyebrow. Since most of the time you actually want cos(theta) and even when you think you need the angle you probably don’t.
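A sketch of the alternative (hypothetical helper, assuming unit-length vectors): since cos is monotonically decreasing on [0, pi], an angle threshold can be tested in cosine space without ever calling acos:

```python
import math

# Compare an angle against a threshold using the dot product directly:
# for unit vectors, angle(x, y) <= max_angle  <=>  dot(x, y) >= cos(max_angle).
def within_angle(dot_xy: float, max_angle: float) -> bool:
    return dot_xy >= math.cos(max_angle)
```

In practice cos(max_angle) is a constant you precompute once, so the per-test cost is a single comparison instead of an inverse trig call.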


Not in this case because the dependencies are the same:

Naive: https://godbolt.org/z/Gzf1KM9Tc

Horner's: https://godbolt.org/z/jhvGqcxj1


Compilers cannot do this optimization for floating point [1] unless you're compiling with -ffast-math. In general, don't rely on compilers to optimize floating point sub-expressions.

[1]: https://godbolt.org/z/8bEjE9Wxx
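The underlying issue is that float addition is not associative, so reassociating terms (as Horner's method does) can change the result; Python shown here just to demonstrate the float behavior, since the constraint is language-independent:

```python
# Floating point addition is not associative: grouping changes the result,
# which is why compilers may not reorder float expressions without
# -ffast-math (or similar flags).
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
assert a != b
```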


Right, I totally forgot about floating point non-associativity.


> After all of the above work and that talk in mind, I decided to ask an LLM.

Impressive that an LLM managed to produce the answer from a 7-year-old Stack Overflow answer all on its own! [1] This would have been the first search result for "fast asin" before this article was published.

[1]: https://stackoverflow.com/a/26030435


I did see that, but isn't the vast majority of that page talking about acos() instead?


That's equivalent, right? acos(x) = pi/2 - asin(x)

So if you’ve got one that’s fast you have them both.
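Right, and the identity is easy to sanity-check numerically:

```python
import math

# asin(x) + acos(x) = pi/2 on [-1, 1], so a fast acos gives a fast asin.
def asin_from_acos(x: float) -> float:
    return math.pi / 2 - math.acos(x)
```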

