
It’s surprisingly difficult to emulate the `vcvtps2ph` instruction in software. I recently tried, and also tested a few open-source libraries. My test converted all 4 billion possible FP32 values and compared the results against the instruction. None of the libraries matched on every input.

I gave up, and that software now requires a CPU with F16C. Fortunately that’s not so bad anymore: the ISA extension was introduced in AMD Bulldozer (2011) and Intel Ivy Bridge (2012). Most people are using computers less than 10 years old, so I think in 2023 that’s a reasonable requirement.



Assuming that you're after "round to nearest with ties toward even", then the quoted numpy code gets very close to `vcvtps2ph`, and one minor tweak gets it to bitwise identical: replace `ret += (ret == 0x7c00u)` with `ret |= 0x200`. Alternatively, the quoted Maratyszcza code gets to the same place if you replace `& 0x7c00u` with `& 0x7dffu`.
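
For anyone who wants to see the rounding and NaN handling spelled out, here is a minimal Python sketch of an FP32-to-FP16 bit conversion with round to nearest, ties toward even. It is not the numpy or Maratyszcza code; the function name `f32_bits_to_f16_bits` is made up for illustration, and I have not run it against hardware, so treat it as a starting point for an exhaustive comparison rather than a verified emulation. The NaN branch applies the `|= 0x200` idea from above: keep the top payload bits and force the quiet bit.

```python
def f32_bits_to_f16_bits(bits: int) -> int:
    """Convert a float32 bit pattern to a float16 bit pattern (hypothetical helper).

    Rounds to nearest with ties toward even.
    """
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    mant = bits & 0x7FFFFF

    if exp == 0xFF:
        if mant == 0:
            return sign | 0x7C00                      # infinity
        # NaN: keep the top 10 payload bits and force the quiet bit
        return sign | 0x7C00 | 0x200 | (mant >> 13)

    half_exp = exp - 112                              # rebias: 127 -> 15
    if half_exp >= 0x1F:
        return sign | 0x7C00                          # too large: infinity
    if half_exp <= 0:
        if half_exp < -10:
            return sign                               # below half of 2**-24: zero
        mant |= 0x800000                              # make the implicit bit explicit
        shift = 14 - half_exp                         # 14..24
        half_mant = mant >> shift
        rem = mant & ((1 << shift) - 1)
        halfway = 1 << (shift - 1)
        if rem > halfway or (rem == halfway and (half_mant & 1)):
            half_mant += 1                            # may become the smallest normal
        return sign | half_mant

    half = (half_exp << 10) | (mant >> 13)
    rem = mant & 0x1FFF
    if rem > 0x1000 or (rem == 0x1000 and (half & 1)):
        half += 1                                     # a carry into the exponent is intended
    return sign | half
```

Looping this over every 32-bit pattern and comparing against the instruction's output is the same kind of exhaustive test described at the top of the thread, though pure Python will be slow at that scale.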



