We do use the host FPU for a subset of floating-point operations now. However, it only really works for clean 32/64-bit IEEE FP, so anything that goes through the x87 still needs software emulation.
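To make the "subset" concrete, here is a rough sketch of how a host-FPU fast path with a software fallback can be structured. It is only loosely modelled on the idea and is not QEMU's actual code: `FpStatus`, `soft_add` and `hybrid_add` are hypothetical names, and `soft_add` is a placeholder for a real integer-arithmetic implementation. The assumption is that the host is only trusted when the guest rounds to nearest-even, the sticky inexact flag is already set (so there is nothing new to record), and the operands are zero or normal.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a guest FP status word. */
typedef struct {
    bool inexact_already_set;  /* sticky inexact flag is already raised */
    bool round_nearest_even;   /* guest rounding mode matches host default */
    bool overflow;             /* sticky overflow flag */
    unsigned soft_calls;       /* counts slow-path fallbacks (illustration) */
} FpStatus;

/* Placeholder slow path. A real emulator would do the addition in
 * integer arithmetic and compute every exception flag itself. */
static float soft_add(float a, float b, FpStatus *s)
{
    s->soft_calls++;
    return a + b; /* placeholder only, not a real soft-float add */
}

static bool zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

/* Use the host FPU only when its result and flags can be trusted. */
float hybrid_add(float a, float b, FpStatus *s)
{
    assert(s != NULL);
    if (!s->inexact_already_set || !s->round_nearest_even ||
        !zero_or_normal(a) || !zero_or_normal(b)) {
        return soft_add(a, b, s);
    }
    float r = a + b; /* host FPU does the work */
    if (isinf(r)) {
        s->overflow = true; /* normal inputs, infinite result: overflow */
        return r;
    }
    /* Tiny results may need underflow handling, so recheck in software
     * (0 + 0 is exact and can stay on the fast path). */
    if (r <= FLT_MIN && r >= -FLT_MIN && !(a == 0.0f && b == 0.0f)) {
        return soft_add(a, b, s);
    }
    return r;
}
```

Anything the checks cannot vouch for (NaNs, infinities, denormals, other rounding modes) drops to the slow path, which is why formats like the x87 80-bit type do not benefit from this kind of shortcut.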
This has been in place since v4.0.0 (see https://git.qemu.org/?p=qemu.git;a=commitdiff;h=a94b783952cc...). We are always up for improving the code generation quality, but of course there is a trade-off with JITs, given that we are not compilers. I suspect there are still big wins if we can come up with a reasonable solution for re-generating a hot path of basic blocks with much better optimisation.
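For illustration, the first half of that problem (noticing that a block is hot) can be sketched with a per-block execution counter. This is a minimal sketch under assumptions, not something QEMU does today as far as this thread says: `BlockProfile`, `block_executed` and the `HOT_THRESHOLD` value are all hypothetical, and the hard part, actually re-translating a chain of blocks with a better optimiser, is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed threshold; a real value would need tuning. */
enum { HOT_THRESHOLD = 1000 };

typedef enum { TIER_BASELINE, TIER_OPTIMISED } Tier;

/* Hypothetical per-block profiling record. */
typedef struct {
    uint64_t guest_pc;    /* guest address the block was translated from */
    uint32_t exec_count;  /* executions seen while in the baseline tier */
    Tier tier;            /* which translation is currently installed */
} BlockProfile;

/* Call on every execution of a block. Returns true exactly once, at the
 * moment the block crosses the threshold and should be queued for
 * re-translation by the slower, better-optimising tier. */
bool block_executed(BlockProfile *b)
{
    assert(b != NULL);
    if (b->tier == TIER_OPTIMISED) {
        return false; /* already promoted, stop counting */
    }
    if (++b->exec_count < HOT_THRESHOLD) {
        return false; /* still cold, keep the cheap translation */
    }
    b->tier = TIER_OPTIMISED;
    return true;
}
```

The counter keeps the cost on cold code to one increment and compare, so the expensive optimisation work is only spent on blocks that have already shown they run often.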
Thank you. A tiered JIT would indeed make many more optimisations possible.
I once wrote a QEMU backend targeting LLVM, but it turned out to be far too heavy to be usable. I wonder if it's worth revisiting that idea nowadays...