So why does conventional wisdom say that compilers will, the vast majority of the time, outperform programmers writing assembly by hand? It seems contradictory to me.
In that context, it's not very small, it's 20% (all instructions are register-to-register instructions, so they all have the same weight). It's huge.
Yes, it's possible that ecx is used elsewhere, and in that case my second comment is irrelevant, because I was responding to the suggestion that such a big wart is to be expected from compilers because warts like it crop up regularly.
But then again, it's unlikely that it's used elsewhere, because eax has the return value of the C snippet, there's nothing else to do, the function can return. So the original question remains: did this come from a C compiler? If yes, it's crappy code.
> In that context, it's not very small, it's 20% (all instructions are register-to-register instructions, so they all have the same weight). It's huge.
Do they? I put together two quick-and-dirty nonsense test programs. This is option2:
int main (void) {
    for (int i = 0; i < 1000000000; ++i) {
        asm volatile (
            ".intel_syntax\n"
            "mov eax, edi\n"
            "sar eax, 31\n"
            "add edi, eax\n"
            "xor eax, edi\n"
            :::);
    }
    return 0;
}
option1 has the extraneous mov ecx, eax, and then add with ecx.
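For readers who don't want to trace the registers: the four instructions in the loop above appear to be the usual branchless absolute-value idiom (build a sign mask, add it, xor with it). Below is a minimal C sketch of what each variant computes. The function names are made up for illustration, and the exact option1 sequence is an assumption based only on the description "mov ecx, eax, and then add with ecx". An optimizing compiler would fold the extra copy away, so this shows the semantics only and is not a reproduction of the benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for "sar eax, 31": all ones if x is negative, else 0. */
int32_t sign_mask(int32_t x) {
    return -(int32_t)(x < 0);
}

/* option2: mov eax, edi / sar eax, 31 / add edi, eax / xor eax, edi */
int32_t abs_option2(int32_t x) {
    int32_t mask = sign_mask(x);                  /* eax = sign mask of x */
    uint32_t sum = (uint32_t)x + (uint32_t)mask;  /* edi += eax, wrapping like the hardware */
    /* eax ^= edi; the cast back is implementation-defined only for INT32_MIN */
    return (int32_t)(sum ^ (uint32_t)mask);
}

/* option1 (assumed shape): same as option2, but the add goes through a copy. */
int32_t abs_option1(int32_t x) {
    int32_t mask = sign_mask(x);                  /* eax = sign mask of x */
    int32_t copy = mask;                          /* the extraneous mov ecx, eax */
    uint32_t sum = (uint32_t)x + (uint32_t)copy;  /* add edi, ecx */
    return (int32_t)(sum ^ (uint32_t)mask);       /* xor eax, edi */
}

/* Small self-check: both variants agree and behave like abs() on these inputs. */
void self_check(void) {
    assert(abs_option2(-5) == 5);
    assert(abs_option2(7) == 7);
    assert(abs_option1(-5) == abs_option2(-5));
}
```

Both functions return the same value for every input. The only difference is the redundant copy, which is the move whose cost is being measured here.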
I confirmed with objdump -d that the assembly hadn't been touched and that the loops were the same. On my otherwise mostly idle dual L5640 system and pinned to a single cpu (just in case), option1 consistently runs in 3.14 seconds and option2 consistently runs in 3.15 seconds.
With an extra zero added to the loop count, both option1 and option2 run in 30.94-30.95 user seconds. The extraneous move doesn't seem to cost any actual time.
But if you look at your program that must go faster, and you see unnecessary moves in the hot section(s), go ahead and remove them, but don't be surprised if it doesn't change much.
If you went and did your whole program by hand, the debloating might also not change much. That's why there's a rule of thumb.
If you have the skill to change the compiler so it outputs a better sequence of instructions, that may make enough of a difference over a large number of programs to be worthwhile, though I suspect it's pretty difficult.