I was once told that, inside for loops, I should prefer comparing with zero over comparing with other values whenever possible, because it’s faster. I never knew why, so I decided to find out what actually happens.
Here are two programs that compute the same result, along with their assembly code. The first one is a normal for loop; the other one is a reversed for loop (counting from n down to 0).
    int main() {
        int n = 10;
        int s = 0;
        for (int i = 1; i <= n; ++i) {
            s += i;
        }
    }
    main:
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR [rbp-12], 10
            mov     DWORD PTR [rbp-4], 0
            mov     DWORD PTR [rbp-8], 1
    .L3:
            mov     eax, DWORD PTR [rbp-8]
            cmp     eax, DWORD PTR [rbp-12]
            jg      .L2
            mov     eax, DWORD PTR [rbp-8]
            add     DWORD PTR [rbp-4], eax
            add     DWORD PTR [rbp-8], 1
            jmp     .L3
    .L2:
            mov     eax, 0
            pop     rbp
            ret
    int main() {
        int n = 10;
        int s = 0;
        for (int i = n; i > 0; --i) {
            s += i;
        }
    }
    main:
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR [rbp-12], 10
            mov     DWORD PTR [rbp-4], 0
            mov     eax, DWORD PTR [rbp-12]
            mov     DWORD PTR [rbp-8], eax
    .L3:
            cmp     DWORD PTR [rbp-8], 0
            jle     .L2
            mov     eax, DWORD PTR [rbp-8]
            add     DWORD PTR [rbp-4], eax
            sub     DWORD PTR [rbp-8], 1
            jmp     .L3
    .L2:
            mov     eax, 0
            pop     rbp
            ret
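Since s is never used in either program, an optimizing compiler is free to drop the loops altogether. A minimal sketch that wraps each loop in a function and returns the sum keeps the work observable and makes it easy to check that both directions agree. The names sum_up and sum_down are made up for this example; they are not part of the original listings.

```c
/* Hypothetical helpers wrapping the two loops from the listings above. */

/* Counts up: i is compared against the variable n on every iteration. */
int sum_up(int n) {
    int s = 0;
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
    return s;
}

/* Counts down: i is compared against the constant 0 on every iteration. */
int sum_down(int n) {
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
    return s;
}
```

For n = 10 both functions return 55, so the only difference is the order in which the terms are added.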
In both listings, the for loop starts at the .L3 label, and .L2 marks the exit from the loop.
In the normal for loop, the i variable (rbp-8) is loaded into the eax register, and then the register is compared to n (rbp-12). The copy is needed because the x86 cmp instruction cannot take two memory operands.
    mov     eax, DWORD PTR [rbp-8]
    cmp     eax, DWORD PTR [rbp-12]
In the reversed for loop, i is always compared to 0. Because 0 is encoded as an immediate value, the comparison can be done directly against memory, without first copying i into the eax register.
cmp DWORD PTR [rbp-8], 0
So the difference is one instruction per iteration: the first for loop does an extra copy.
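With the O3 optimization level, comparing to 0 may not need the cmp instruction at all: the sub or dec that updates the counter already sets the flags that the conditional jump tests.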
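To illustrate the point about optimized builds, here is a minimal sketch of a count-down loop. On x86, sub and dec set the zero flag themselves, so a compiler may branch on the result of the decrement (for example with jne) instead of emitting a separate cmp. The function name sum_backwards is made up for this example, and the exact instructions depend on the compiler, its version and the flags used.

```c
/* Hypothetical example: sum the first n elements of a, walking backwards. */
long sum_backwards(const int *a, unsigned n) {
    long s = 0;
    while (n != 0) {
        --n;          /* the decrement itself updates the CPU flags on x86 */
        s += a[n];    /* visits a[n-1], a[n-2], ..., a[0] */
    }
    return s;
}
```

Compiling this with and without optimization and comparing the generated assembly is a quick way to see whether the cmp is still there.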
Does this matter? I know too little assembly to have a good opinion on this, but it could matter in “When a Microsecond Is an Eternity” situations. Otherwise, it would be premature optimization, and possibly confusing for others.
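One way to find out whether it matters on a particular machine is to measure it. The following is only a rough sketch using the standard clock() function. The names sum_down and time_loop are made up for this example, and a real measurement would use a proper profiler or benchmarking tool.

```c
#include <time.h>

/* Hypothetical sample workload: the reversed loop, returning its sum. */
int sum_down(int n) {
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
    return s;
}

/* Hypothetical helper: CPU seconds spent on `reps` calls of f(n). */
double time_loop(int (*f)(int), int n, int reps) {
    volatile unsigned sink = 0;   /* uses every result so the calls are not trivially dead */
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        sink += (unsigned)f(n);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop once per loop variant with a large reps gives a rough comparison. An optimizer may still simplify the work, so it is worth checking the generated assembly alongside the numbers.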
It probably only has some significance if that load causes cache misses, but even then it’s a micro-optimization just for super critical systems.
Readability and expressing intent through “normal” comparisons are of greater value, imho. These sorts of things shouldn’t be your concern when writing code; optimize only if the profiler indicates a problem in that particular area.
Sane thinking. It’s at least fun to dig into these cases.