This is the wisdom from the old days, and it's still true in many cases.
But I believe compiler developers are trying to keep -O3 and -Os working as their names advertise, and especially when they're combined with -march and -mtune, cases of performance degradation should be fewer by now. For example, by using knowledge of the subarchitecture, a compiler can optimize the code for Intel's micro-fusion rather than performing useless loop unrolling that actually degrades performance.
In all the Phoronix benchmarks since GCC 4.9, -Os has almost always been slower than -O2 and -O3.
https://www.phoronix.com/scan.php?page=article&item=gcc_49_o...
The Linux kernel used to prefer -Os everywhere, but now it has -O2 and -O3 as well. I think there is a measurable performance improvement in benchmarks in some cases.
https://github.com/torvalds/linux/blob/15f5db60a13748f44e5a1...