That's quite interesting and a huge amount of work has been done here, respect for that. H...

bodyfour · on Aug 5, 2024

> in that case, a compiler still must set something up to fulfil the main `noexcept` promise - call `std::terminate()`

This is actually something that has been more of a problem in clang than in gcc due to LLVM IR limitations... but that is being fixed (or maybe already is?). There was a presentation about it at the 2023 LLVM Developers' Meeting, which was recently published on their YouTube channel: https://www.youtube.com/watch?v=DMUeTaIe1CU

The short version (as I understand it) is that you don't really need to produce any code to call std::terminate; all you need is to tell the linker to leave a hole in the table which maps %rip to the required unwind actions. If the unwinder doesn't know what to do, it will call std::terminate per the standard.

IR didn't have a way of expressing this "hole", though, so instead clang was forced to emit an explicit "handler" to do the std::terminate call.
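For anyone who wants to see the promise being discussed, here is a minimal sketch. The names `may_throw` and `checked_call` are made up for illustration and are not from the article or the talk. It only shows what the language guarantees: a `noexcept` function may call something that throws, and if an exception tries to escape it, the program has to end up in `std::terminate()`. How the compiler arranges that (explicit handler or a hole in the unwind table) is not visible at this level.

```cpp
#include <stdexcept>

// Hypothetical helper: it may throw, so it is not declared noexcept.
int may_throw(int x) {
    if (x < 0) {
        throw std::invalid_argument("negative input");
    }
    return x + 1;
}

// Declared noexcept even though it calls a potentially throwing function.
// If may_throw() throws here, the exception cannot leave checked_call():
// the runtime must call std::terminate() instead of unwinding further.
int checked_call(int x) noexcept {
    return may_throw(x);
}

// The noexcept operator reports the declared promise at compile time,
// regardless of what the function body actually does.
static_assert(noexcept(checked_call(0)), "checked_call promises not to throw");
static_assert(!noexcept(may_throw(0)), "may_throw makes no such promise");
```

Calling `checked_call(1)` returns normally; calling `checked_call(-1)` would terminate the program rather than propagate the exception to the caller.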

terrymah · on Aug 6, 2024

In MSVC we've also pretty heavily optimized the whole-function case such that we no longer have a literal try/catch block around it (I think there is a single bit in our per-function unwind info that the unwinder checks, and it kills the program if it encounters that bit while unwinding). One extra branch, but no increase in the unwind metadata size.

The inlining case was always the hard problem to solve though

zokier · on Aug 5, 2024

> then unfortunately it's totally invalid since it measures time with the `std::chrono::system_clock` which isn't monotonic. Given how long the code required to run, it's almost certain that the clock has been adjusted several times

Monotonic clocks are mostly useful for short measurement periods. For long-term timing, wall-time clocks (with their adjustments) are more accurate because they drift less.
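To make the distinction concrete, here is a rough sketch of the two kinds of measurement in standard C++. The helpers `time_call` and `wall_clock_seconds` are hypothetical names for illustration, not something taken from the benchmark under discussion. The first uses `std::chrono::steady_clock`, which the standard guarantees to be monotonic, for a short interval; the second reads `std::chrono::system_clock`, the adjustable wall clock, as one would to timestamp the start and end of a run lasting hours.

```cpp
#include <chrono>

// steady_clock is required by the standard to be monotonic.
static_assert(std::chrono::steady_clock::is_steady,
              "steady_clock must never go backwards");

// Hypothetical helper: time a single call to fn with the monotonic clock.
// Suited to short intervals, where a wall-clock adjustment would distort
// the result.
template <typename Fn>
std::chrono::nanoseconds time_call(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// Hypothetical helper: current wall-clock time as whole seconds since the
// system_clock epoch. Subtracting two such readings gives the elapsed wall
// time of a long run, including any clock adjustments made in between.
long long wall_clock_seconds() {
    const auto since_epoch = std::chrono::system_clock::now().time_since_epoch();
    return std::chrono::duration_cast<std::chrono::seconds>(since_epoch).count();
}
```

Something like `time_call([] { /* small function under test */ })` covers the short case, while two `wall_clock_seconds()` readings taken hours apart cover the long one.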

Arech · on Aug 6, 2024

Ah, that's a great correction, thank you! Yes, indeed, due to drift, in order to discern second+ (?) differences on different machines (or same machines, but different OSes?), one definitely needs to use wall-clock time, otherwise it's comparing apples to oranges. There are a lot of interesting questions related to that, but they are out of the scope of the thread. If I'm not mistaken, the author has also timed some individual small functions, which, if correct, still poses a problem to me, but for measuring huge long-running tasks like a full suite running 10+ hours, they are probably right in choosing a wall-clock timer indeed.

However, before digging into the results any further (for example, the -10% difference for the `noexcept` case is extremely interesting to debug down to the root cause), I'd still like to understand how exactly the code was run and measured. I didn't find a plausible-looking benchmark runner in their code base.