
I can only offer guesses, but I think some kind of false-sharing bug is going on. It also doesn't help that LLVM hasn't released a target for my CPU yet (I'm on Zen 4), and who knows what bugs come from targeting the wrong CPU.
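For readers unfamiliar with false sharing, here is a minimal sketch of the usual mitigation: give each thread's hot data its own cache line. It is not the poster's code. The `Padded` type and `count_padded` function are hypothetical names, and the 64-byte line size is an assumption that holds for typical x86-64 parts.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Assumption: 64-byte cache lines. The alignment forces each counter onto
// its own line, so writes from one thread do not invalidate a neighbour's line.
#[repr(align(64))]
struct Padded(AtomicU64);

// Hypothetical helper: every thread increments only its own padded slot.
fn count_padded(threads: usize, iters: u64) -> u64 {
    let slots: Vec<Padded> = (0..threads).map(|_| Padded(AtomicU64::new(0))).collect();
    thread::scope(|s| {
        for slot in &slots {
            s.spawn(move || {
                for _ in 0..iters {
                    slot.0.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    // All threads have joined once the scope ends, so the sum is complete.
    slots.iter().map(|p| p.0.load(Ordering::Relaxed)).sum()
}

fn main() {
    println!("total = {}", count_padded(4, 1_000_000));
}
```

Removing the `#[repr(align(64))]` attribute and timing both variants is one rough way to see whether false sharing matters on a given machine.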

In a single-threaded version, it beats C#, though not by as much as I would have expected. The essence is that I have to run the same calculation over a large array of doubles, producing another array of doubles, so I parallelize it with SIMD and threads. In C# I max out at AVX2 for instruction-level parallelism, but for Rust, I use AVX-512, and it's not even 2x faster, though it is faster--it should be more than 2x faster because AVX-512 has better instructions to work with. But when I combine this with doing the calculation in threaded parallel chunks on the array, it goes far slower than it should.
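The chunked scheme described above might look roughly like the sketch below. The actual calculation is not given in the thread, so `kernel` is a hypothetical stand-in, and `parallel_map` and `DOUBLES_PER_LINE` are illustrative names. Chunk lengths are rounded up to a multiple of an assumed 64-byte cache line so that two threads rarely write to the same line.

```rust
use std::thread;

// Assumption: 64-byte cache lines, i.e. 8 f64 values per line.
const DOUBLES_PER_LINE: usize = 8;

// Hypothetical stand-in for the real per-element calculation.
fn kernel(x: f64) -> f64 {
    x * x + 1.0
}

// Splits the work into one contiguous chunk per thread.
fn parallel_map(input: &[f64], output: &mut [f64], threads: usize) {
    assert_eq!(input.len(), output.len());
    if input.is_empty() {
        return;
    }
    let per_thread = input.len().div_ceil(threads.max(1));
    // Round up to whole cache lines. Chunk boundaries only coincide with
    // line boundaries if the output buffer itself starts on one.
    let chunk = per_thread.div_ceil(DOUBLES_PER_LINE) * DOUBLES_PER_LINE;
    thread::scope(|s| {
        for (src, dst) in input.chunks(chunk).zip(output.chunks_mut(chunk)) {
            s.spawn(move || {
                // A plain loop; the compiler may or may not autovectorize it.
                for (o, &i) in dst.iter_mut().zip(src) {
                    *o = kernel(i);
                }
            });
        }
    });
}

fn main() {
    let input: Vec<f64> = (0..1000).map(|i| i as f64).collect();
    let mut output = vec![0.0; input.len()];
    parallel_map(&input, &mut output, 4);
    println!("output[3] = {}", output[3]);
}
```

With one large contiguous chunk per thread, at most the lines at chunk boundaries can be shared, so if a layout like this still scales badly, something other than false sharing on the output array is a likely suspect.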



> I use AVX-512, and it's not even 2x faster, though it is faster--it should be more than 2x faster because AVX-512 has better instructions to work with. But when I combine this with doing the calculation in threaded parallel chunks on the array, it goes far slower than it should.

You might be saturating your memory bandwidth to the point where it just can't go any faster. Since it seems your problem is easy to parallelize, you might want to experiment with the rust-gpu ecosystem.
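One rough way to check the bandwidth theory is to compute the effective throughput of a run and compare it with what your memory system can sustain. The sketch below assumes each element costs one 8-byte read and one 8-byte write and ignores cache effects and write-allocate traffic. `effective_bandwidth_gbs` is a hypothetical helper, and the doubling loop is only a placeholder for the real calculation.

```rust
use std::time::Instant;

// Hypothetical helper: GB/s moved, assuming one f64 read and one f64 write
// per element (16 bytes of traffic per element).
fn effective_bandwidth_gbs(elements: usize, seconds: f64) -> f64 {
    let bytes = (elements * 2 * std::mem::size_of::<f64>()) as f64;
    bytes / seconds / 1e9
}

fn main() {
    let n = 1 << 22;
    let input = vec![1.5f64; n];
    let mut output = vec![0.0f64; n];

    let start = Instant::now();
    // Placeholder workload; substitute the real kernel here.
    for (o, &i) in output.iter_mut().zip(&input) {
        *o = i * 2.0;
    }
    let secs = start.elapsed().as_secs_f64();
    // Keep the compiler from discarding the loop as dead code.
    std::hint::black_box(&output);

    println!("{:.2} GB/s", effective_bandwidth_gbs(n, secs));
}
```

If the multithreaded number is already close to what the platform can sustain, wider vectors or more threads will not help much, since the cores are waiting on memory.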


I will say that when I use the same parallelization scheme with non-AVX operations, it scales properly and runs far faster than the AVX versions. One interesting caveat: when the compiler autovectorizes non-intrinsic code, the problem persists.


Is it not because AVX-512 sets your CPU frequency to be lower?


Unclear, but the sources I've read say that's just an Intel issue.

https://www.phoronix.com/review/amd-zen4-avx512/6



