> Sadly, at some point, new hardware makes software tricks obsolete. On x86, said new hardware was the F16C instruction set, which (amongst other things) adds a VCVTPS2PH instruction for converting (a vector of) FP32 to FP16.
"Sadly"? That's subjective, I guess: I personally feel that the less "softfloat" code there is in use around, the better. All this twisted bitshifting/juggling with conditional propagation is pretty well suited for baking into sillicon.
In my case it wasn't exactly conversion, but I had pages of a FORTRAN algorithm built on precise FP32 numerical constants, and the outcome needed to be mimicked in 8-bit BASIC using 0.5K of memory for the program plus data.
Not enough memory to even store the text of the FORTRAN code.
The converse is also deserving of a 'sadly': new hardware features can be widely available and yet still hard to use, because the default compilation targets on most platforms won't emit them when you build your program.
F16C has been available for well over a decade (it was introduced with the Ivy Bridge microarchitecture), and the equivalent conversions have been supported on all ARMv7 CPUs since around the same time (and on all ARMv8 CPUs unconditionally); most software should be able to safely assume their availability by now.
There exist some niches where this doesn’t hold, but that’s why we have software fallbacks.
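For what it's worth, the runtime dispatch doesn't have to be elaborate. A minimal sketch, assuming GCC/Clang on x86 (`soft_cvtss_sh` is a hypothetical stand-in for whatever softfloat fallback you'd use):

```c
#include <immintrin.h>
#include <stdint.h>

uint16_t soft_cvtss_sh(float f);   /* hypothetical software fallback */

/* Compiled for F16C regardless of the global -m flags, so it must only be
   called after the runtime check below succeeds. */
__attribute__((target("f16c")))
static uint16_t hw_cvtss_sh(float f)
{
    return (uint16_t)_mm_extract_epi16(
        _mm_cvtps_ph(_mm_set_ss(f), _MM_FROUND_TO_NEAREST_INT), 0);
}

uint16_t cvtss_sh(float f)
{
    /* __builtin_cpu_supports reads CPU feature data set up at startup */
    if (__builtin_cpu_supports("f16c"))
        return hw_cvtss_sh(f);
    return soft_cvtss_sh(f);
}
```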
The first niche that came to mind was x86 code running under Rosetta 2; despite ARM having an equivalent to F16C, Rosetta 2 doesn’t translate AVX, and F16C doesn’t have a non-AVX encoding.
Indeed. Worth noting that Accelerate.framework provides fast and correct bulk f16 <-> f32 conversions as `vImageConvert_Planar16FtoPlanarF` and `vImageConvert_PlanarFtoPlanar16F`, and that the arm conversion instructions are unconditionally available for apps that compile for arm64 (they're part of the base ARMv8 ISA), so any _new_ code shouldn't need to worry about this.
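For the curious, using those vImage calls is pretty direct; a minimal sketch (the wrapper name and the one-row buffer layout are mine, not Apple's):

```c
#include <Accelerate/Accelerate.h>
#include <stdint.h>

/* Bulk float -> half conversion via vImage, treating the arrays as
   one-row planar images. */
vImage_Error floats_to_halves(const float *src, uint16_t *dst, size_t count)
{
    vImage_Buffer in  = { (void *)src, 1, count, count * sizeof(float)    };
    vImage_Buffer out = { dst,         1, count, count * sizeof(uint16_t) };
    return vImageConvert_PlanarFtoPlanar16F(&in, &out, kvImageNoFlags);
}
```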
It’s surprisingly difficult to emulate the `vcvtps2ph` instruction. I recently tried to, and tested a few open-source libraries; none of them did the trick, they all failed. I made a test that converts all 4 billion possible FP32 values and compares the outcomes.
I gave up, and that software now requires a CPU with F16C. Fortunately it’s not that bad anymore: the ISA extension was introduced in AMD Bulldozer (2011) and Intel Ivy Bridge (2012). Most people are using computers newer than 10 years old; I think in 2023 that’s a reasonable requirement.
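For anyone wanting to reproduce that kind of exhaustive check, it's only a few lines. A sketch (compile with -mf16c, run on an F16C-capable CPU); the deliberately naive truncating conversion here just stands in for whichever software routine is under test:

```c
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Deliberately naive truncating conversion, standing in for the
   software routine being validated; expect it to report mismatches. */
static uint16_t soft_cvtss_sh(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    int32_t  e    = (int32_t)((x >> 23) & 0xffu) - 127 + 15;
    uint32_t m    = (x >> 13) & 0x3ffu;
    if (e <= 0)  return (uint16_t)sign;               /* flush to zero */
    if (e >= 31) return (uint16_t)(sign | 0x7c00u);   /* collapse to Inf */
    return (uint16_t)(sign | ((uint32_t)e << 10) | m);
}

int main(void)
{
    uint64_t mismatches = 0;
    for (uint64_t bits = 0; bits <= 0xffffffffull; ++bits) {
        float f;
        uint32_t u = (uint32_t)bits;
        memcpy(&f, &u, sizeof f);

        /* reference result from VCVTPS2PH, round to nearest even */
        __m128i  v  = _mm_cvtps_ph(_mm_set_ss(f), _MM_FROUND_TO_NEAREST_INT);
        uint16_t hw = (uint16_t)_mm_extract_epi16(v, 0);
        uint16_t sw = soft_cvtss_sh(f);

        if (hw != sw && ++mismatches <= 10)
            printf("input %08x: hardware %04x, software %04x\n", u, hw, sw);
    }
    printf("%llu mismatches\n", (unsigned long long)mismatches);
    return mismatches != 0;
}
```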
Assuming that you're after "round to nearest with ties toward even", then the quoted numpy code gets very close to `vcvtps2ph`, and one minor tweak gets it to bitwise identical: replace `ret += (ret == 0x7c00u)` with `ret |= 0x200`. Alternatively, the quoted Maratyszcza code gets to the same place if you replace `& 0x7c00u` with `& 0x7dffu`.
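For reference, here's an independent scalar sketch of that behaviour (my own, not the numpy or Maratyszcza code quoted in the article): round to nearest even for finite values, overflow to infinity, and NaNs quieted by forcing the 0x0200 bit as described above. I believe this matches `vcvtps2ph`, but treat it as a starting point rather than a verified drop-in:

```c
#include <stdint.h>
#include <string.h>

/* float32 -> float16, round to nearest even; Inf/NaN/subnormals handled */
static uint16_t f32_to_f16_rne(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);

    uint32_t sign = (x >> 16) & 0x8000u;   /* sign bit in half position */
    uint32_t abs  = x & 0x7fffffffu;

    if (abs >= 0x7f800000u) {              /* Inf or NaN */
        if (abs == 0x7f800000u)
            return (uint16_t)(sign | 0x7c00u);
        /* NaN: keep the top 10 payload bits, force the quiet bit 0x0200 */
        return (uint16_t)(sign | 0x7c00u | 0x0200u | ((abs >> 13) & 0x03ffu));
    }
    if (abs >= 0x47800000u)                /* >= 65536.0f overflows to Inf */
        return (uint16_t)(sign | 0x7c00u);

    if (abs >= 0x38800000u) {              /* normal half range (>= 2^-14) */
        uint32_t v   = abs - 0x38000000u;  /* re-bias exponent 127 -> 15 */
        uint32_t h   = v >> 13;
        uint32_t rem = v & 0x1fffu;        /* the 13 dropped mantissa bits */
        h += (rem > 0x1000u) || (rem == 0x1000u && (h & 1u)); /* ties to even */
        return (uint16_t)(sign | h);       /* carry may round up to Inf */
    }
    if (abs >= 0x33000000u) {              /* rounds to a subnormal half */
        uint32_t shift = 126u - (abs >> 23);                /* 14..24 */
        uint32_t mant  = (abs & 0x007fffffu) | 0x00800000u; /* implicit 1 */
        uint32_t h     = mant >> shift;
        uint32_t rem   = mant & ((1u << shift) - 1u);
        uint32_t halfw = 1u << (shift - 1);
        h += (rem > halfw) || (rem == halfw && (h & 1u));   /* ties to even */
        return (uint16_t)(sign | h);
    }
    return (uint16_t)sign;                 /* underflows to +/-0 */
}
```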
Having done some work in the formal programming language space, I can say without equivocation that floating point is a hot mess. I’m not saying it’s bad, I’m saying it’s basically impossible to formally reason about in any kind of useful way.
"Sadly"? That's subjective, I guess: I personally feel that the less "softfloat" code there is in use around, the better. All this twisted bitshifting/juggling with conditional propagation is pretty well suited for baking into sillicon.