That's quite interesting and a huge work has been done here, respect for that.
Here's what has jumped out at me: `noexcept` qualifier is not free in some cases, particularly, when a qualified function could actually throw, but is marked `noexcept`. In that case, a compiler still must set something up to fulfil the main `noexcept` promise - call `std::terminate()` if an exception is thrown. That means, that putting `noexcept` on each and every function blindly without any regard to whether the function could really throw or not (for example, `std::vector::push_back()` could throw on reallocation failure, hence if a `noexcept` qualified function call it, a compiler must take into account) doesn't actually test/benchmark/prove anything, since as the author correctly said, - you won't ever do this in a real production project.
It would be really interesting to take a look into a full code of cases that showed very bad performance, however, here we're approaching the second issue: if that's the core benchmark code: https://github.com/define-private-public/PSRayTracing/blob/a... then unfortunately it's totally invalid since it measures time with the `std::chrono::system_clock` which isn't monotonic. Given how long the code required to run, it's almost certain that the clock has been adjusted several times...
> in that case, a compiler still must set something up to fulfil the main `noexcept` promise - call `std::terminate()`
This is actually something that has been more of a problem in clang than gcc due to LLVM IR limitations... but that is being fixed (or maybe is already?) There was a presentation about it at the 2023 LLVM Developer's meeting which was recently published on their youtube channel https://www.youtube.com/watch?v=DMUeTaIe1CU
The short version (as I understand) is that you don't really need to produce any code to call std::terminate, all you need is tell the linker it needs to leave a hole in the table which maps %rip to the required unwind actions. If the unwinder doesn't know what to do, it will call std::terminate per the standard.
IR didn't have a way of expressing this "hole", though, so instead clang was forced to emit an explicit "handler" to do the std::terminate call
In MSVC we've also pretty heavily optimized the whole function case such that we no longer have a literal try/catch block around it (I think there is a single bit in our per function unwind info that the unwinder checks and kills the program if it encounters while unwinding). One extra branch but no increase in the unwind metadata size
The inlining case was always the hard problem to solve though
> then unfortunately it's totally invalid since it measures time with the `std::chrono::system_clock` which isn't monotonic. Given how long the code required to run, it's almost certain that the clock has been adjusted several times
monotonic clocks are mostly useful for short measurement periods. for long-term timing wall-time clocks (with their adjustments) are more accurate because they will drift less.
Ah, that's a great correction, thank you!
Yes, indeed, due to a drift, in order to discern second+ (?) differences on different machines (or same machines, but different OSes?), one definitely needs to use a wall-clock time, otherwise it's comparing apples to oranges. There's a lot of interesting questions related to that, but they out of the scope of the thread. If I'm not mistaken the author has also timed some individual small functions, which, if correct, still poses a problem to me, but for measuring huge long running tasks like a full suite running 10+ hours, they are probably right in choosing wall-clock timer indeed.
However, before researching into results any further (for example, -10% difference for `noexcept` case is extremely interesting to debug up to the root cause), I'd still like to understand how the code was run and measured exactly. I didn't find a plausible looking benchmark runner in their code base.
Here's what has jumped out at me: `noexcept` qualifier is not free in some cases, particularly, when a qualified function could actually throw, but is marked `noexcept`. In that case, a compiler still must set something up to fulfil the main `noexcept` promise - call `std::terminate()` if an exception is thrown. That means, that putting `noexcept` on each and every function blindly without any regard to whether the function could really throw or not (for example, `std::vector::push_back()` could throw on reallocation failure, hence if a `noexcept` qualified function call it, a compiler must take into account) doesn't actually test/benchmark/prove anything, since as the author correctly said, - you won't ever do this in a real production project. It would be really interesting to take a look into a full code of cases that showed very bad performance, however, here we're approaching the second issue: if that's the core benchmark code: https://github.com/define-private-public/PSRayTracing/blob/a... then unfortunately it's totally invalid since it measures time with the `std::chrono::system_clock` which isn't monotonic. Given how long the code required to run, it's almost certain that the clock has been adjusted several times...