This is a great question, but I think you’ve fallen victim to the compiler’s dependency analysis.
The compiler only has to clear the high bits of eax once, and they remain clear for the second version. The second version would have to pay for its own xor eax, eax, except that the compiler’s analysis proved the register was already left cleared by the first version.
The second version is able to “cheat” by taking advantage of work the compiler did in the first version.
How are you measuring times? Is it “(version one, followed by version two) in a loop”, or “(version one in a loop) followed by (version two in a loop)”?
Don’t do both tests in the same program (instead recompile for each version), or if you do, test both “version A first” and “version B first” and see if whichever comes first is paying a penalty.
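To make the "test both orders" suggestion concrete, here is a minimal sketch of such a harness. Everything in it is an assumption for illustration: time_loop, time_in_order, version_a and version_b are hypothetical names, and the two version functions are stand-ins for whatever code you are actually comparing.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>

// Hypothetical helper: run `work` `reps` times and return elapsed seconds.
double time_loop(const std::function<double(double)>& work, int reps) {
    assert(reps > 0);
    volatile double sink = 0.0;  // volatile store keeps the loop from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        sink = sink + work(static_cast<double>(i));
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-ins for the two versions under test (hypothetical).
double version_a(double x) { return 2 * std::sqrt(x + 37.0); }
double version_b(double x) { return 31 * std::sqrt(x + 37.0); }

struct OrderResult {
    double first;   // seconds for whichever version ran first
    double second;  // seconds for whichever version ran second
};

// Time `f` in its own loop, then `g` in its own loop.
OrderResult time_in_order(const std::function<double(double)>& f,
                          const std::function<double(double)>& g,
                          int reps) {
    OrderResult r;
    r.first = time_loop(f, reps);
    r.second = time_loop(g, reps);
    return r;
}
```

Call time_in_order(version_a, version_b, reps) and then time_in_order(version_b, version_a, reps). If whichever version runs first is the slower one in both orders, the difference comes from the ordering rather than from the code being compared. The std::function wrapper adds call overhead, but it adds it equally to both versions.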
Illustration of the cheating:
    timer1.start();
    double x1 = 2 * sqrt(n + 37 * y + exp(z));
    timer1.stop();
    timer2.start();
    double x2 = 31 * sqrt(n + 37 * y + exp(z));
    timer2.stop();
If the timer2 duration is less than the timer1 duration, we don’t conclude that multiplying by 31 is faster than multiplying by 2. Instead, we realize that the compiler performed common subexpression elimination, and the code became:
    timer1.start();
    double common = sqrt(n + 37 * y + exp(z));
    double x1 = 2 * common;
    timer1.stop();
    timer2.start();
    double x2 = 31 * common;
    timer2.stop();
And the only thing this proves is that multiplying by 31 is faster than computing common. That is hardly surprising at all: multiplication is far, far faster than