Once I was told I should prefer comparing with zero whenever possible inside for loops, instead of comparing to other values, because it’s faster. But I never knew why, so I decided to understand what actually happens.
These are two programs with the same result, along with their assembly code. The first one is a normal for loop; the other one is a reversed for loop (going from n down to 0).
    int main() {
        int n = 10;
        int s = 0;
        for (int i = 1; i <= n; ++i) {
            s += i;
        }
    }
    main:
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR [rbp-12], 10
            mov     DWORD PTR [rbp-4], 0
            mov     DWORD PTR [rbp-8], 1
    .L3:
            mov     eax, DWORD PTR [rbp-8]
            cmp     eax, DWORD PTR [rbp-12]
            jg      .L2
            mov     eax, DWORD PTR [rbp-8]
            add     DWORD PTR [rbp-4], eax
            add     DWORD PTR [rbp-8], 1
            jmp     .L3
    .L2:
            mov     eax, 0
            pop     rbp
            ret
    int main() {
        int n = 10;
        int s = 0;
        for (int i = n; i > 0; --i) {
            s += i;
        }
    }
    main:
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR [rbp-12], 10
            mov     DWORD PTR [rbp-4], 0
            mov     eax, DWORD PTR [rbp-12]
            mov     DWORD PTR [rbp-8], eax
    .L3:
            cmp     DWORD PTR [rbp-8], 0
            jle     .L2
            mov     eax, DWORD PTR [rbp-8]
            add     DWORD PTR [rbp-4], eax
            sub     DWORD PTR [rbp-8], 1
            jmp     .L3
    .L2:
            mov     eax, 0
            pop     rbp
            ret
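Before comparing instructions, it helps to confirm that the two loop directions really are interchangeable here. The sketch below wraps each loop in a function; the names sum_up and sum_down are hypothetical and not part of the original programs, and the functions return s so the result can be checked.

```c
#include <stdio.h>

/* Hypothetical helper: the "normal" loop, comparing i against n. */
int sum_up(int n) {
    int s = 0;
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
    return s;
}

/* Hypothetical helper: the reversed loop, comparing i against 0. */
int sum_down(int n) {
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
    return s;
}

/* Prints both sums side by side so they can be compared by eye. */
void print_both(int n) {
    printf("up: %d, down: %d\n", sum_up(n), sum_down(n));
}
```

Calling print_both(10) from main shows the same sum for both directions. This only works because addition does not care about order; a loop body that depends on the iteration order cannot simply be reversed.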
The for loops are represented by the .L3 labels.
For the normal for loop, the i variable (rbp-8) is loaded into the eax register, then the register is compared to n (rbp-12).
    mov     eax, DWORD PTR [rbp-8]
    cmp     eax, DWORD PTR [rbp-12]
As for the reversed for loop, i is always compared to 0, and this can be done directly on the memory operand, without first copying it into the eax register.
cmp DWORD PTR [rbp-8], 0
So the difference is one instruction per iteration: the first for loop does an extra copy.
With the -O3 optimization level, comparing to 0 does not need the cmp instruction.
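One way to see why the cmp can disappear: on x86, the instruction that decrements i also sets the zero flag, so a conditional jump can follow it directly. The sketch below writes the reversed loop in the shape that makes this visible in the source, with the decrement and the test in one expression. The function name sum_down_rotated is hypothetical, and whether a given compiler actually emits a decrement followed by a jump depends on the compiler, its version, and the flags; this is an assumption, not something measured here. Note also that in the original programs s is never used, so an optimizer may remove the loop entirely; returning s avoids that.

```c
/* Hypothetical helper: reversed loop with the test at the bottom. */
int sum_down_rotated(int n) {
    int s = 0;
    int i = n;
    if (i <= 0) {
        return 0;            /* nothing to add for n <= 0 */
    }
    do {
        s += i;
    } while (--i != 0);      /* decrement and test against zero together */
    return s;
}
```

For positive n this gives the same sum as the reversed for loop; the guard at the top covers the case where the loop body should not run at all.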
Does this matter? I know too little assembly to have a good opinion on this, but it could matter in cases like those covered in "When a Microsecond Is an Eternity". Otherwise, it would be premature optimization and possibly confusing for others.
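The honest way to answer the question for a particular machine is to measure. Below is a rough timing sketch using clock() from the standard library; the names sum_up_timed, sum_down_timed, and seconds_for are hypothetical. The accumulator is volatile so the compiler cannot fold the loop away, which also means memory traffic on s may dominate and hide any difference in the comparison. Treat the numbers as a rough indication, not a benchmark.

```c
#include <time.h>

/* Hypothetical helper: counts up, comparing i against n each iteration. */
long long sum_up_timed(int n) {
    volatile long long s = 0;   /* volatile keeps the loop in the binary */
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
    return s;
}

/* Hypothetical helper: counts down, comparing i against 0 each iteration. */
long long sum_down_timed(int n) {
    volatile long long s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
    return s;
}

/* Runs fn(n) once, stores its result in *out, and returns elapsed CPU seconds. */
double seconds_for(long long (*fn)(int), int n, long long *out) {
    clock_t start = clock();
    *out = fn(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, call seconds_for(sum_up_timed, n, &result) and seconds_for(sum_down_timed, n, &result) with a large n, several times each, and compare the timings at the optimization level you actually ship with.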