I was once told that, inside for loops, I should prefer comparing with zero over comparing with other values whenever possible, because it’s faster. I never knew why, so I decided to find out what actually happens.
Below are two programs that compute the same result, each followed by its assembly code. The first uses a normal for loop; the second uses a reversed for loop (counting i down from n for as long as i > 0).
int main() {
    int n = 10;
    int s = 0;
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
}
main:
        push rbp
        mov rbp, rsp
        mov DWORD PTR [rbp-12], 10
        mov DWORD PTR [rbp-4], 0
        mov DWORD PTR [rbp-8], 1
.L3:
        mov eax, DWORD PTR [rbp-8]
        cmp eax, DWORD PTR [rbp-12]
        jg .L2
        mov eax, DWORD PTR [rbp-8]
        add DWORD PTR [rbp-4], eax
        add DWORD PTR [rbp-8], 1
        jmp .L3
.L2:
        mov eax, 0
        pop rbp
        ret
int main() {
    int n = 10;
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
}
main:
        push rbp
        mov rbp, rsp
        mov DWORD PTR [rbp-12], 10
        mov DWORD PTR [rbp-4], 0
        mov eax, DWORD PTR [rbp-12]
        mov DWORD PTR [rbp-8], eax
.L3:
        cmp DWORD PTR [rbp-8], 0
        jle .L2
        mov eax, DWORD PTR [rbp-8]
        add DWORD PTR [rbp-4], eax
        sub DWORD PTR [rbp-8], 1
        jmp .L3
.L2:
        mov eax, 0
        pop rbp
        ret
In both listings, the for loop starts at the .L3 label.
In the normal for loop, the variable i (at rbp-8) is first loaded into the eax register, and the register is then compared with n (at rbp-12).
mov eax, DWORD PTR [rbp-8]
cmp eax, DWORD PTR [rbp-12]
In the reversed for loop, i is always compared with 0, and this can be done directly on the value in memory, without first copying it into the eax register.
cmp DWORD PTR [rbp-8], 0
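So the difference is one instruction per iteration: the first loop needs an extra copy. The reason is that the x86 cmp instruction cannot take two memory operands, so i has to be moved into a register before it can be compared with n, whereas a memory operand can be compared directly with a constant such as 0.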
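To check that the two loop shapes really are interchangeable, here is a minimal sketch that wraps each one in a function. The names sum_up and sum_down are hypothetical and not part of the original programs; taking n as a parameter and returning s is also an assumption, made so that the result can be observed.

```c
/* Hypothetical helper: counts up from 1 to n, comparing i with n
   on every iteration. */
int sum_up(int n) {
    int s = 0;
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
    return s;
}

/* Hypothetical helper: counts down from n to 1, comparing i with the
   constant 0 on every iteration. */
int sum_down(int n) {
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
    return s;
}
```

For any n, both functions should return the same value, for example 55 for n = 10 and 0 when n is zero or negative. Only the order in which the terms are added differs.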
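At the -O3 optimization level, comparing to 0 may not need a cmp instruction at all: on x86, the instruction that decrements the counter already sets the CPU flags, so the conditional jump can test them directly.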
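As a sketch of a loop shape where this can apply, the function below counts down and uses the result of the decrement itself as the loop test. The name sum_array_down and the array parameter are assumptions for illustration: the original programs never use s, so an optimizing compiler would likely remove their loops entirely, and reading from an array and returning the sum keeps the work observable. Whether a particular compiler emits a decrement followed directly by a conditional jump is not guaranteed; at -O3 it may vectorize or restructure the loop, so the output of something like gcc -O3 -S is worth inspecting.

```c
#include <stddef.h>

/* Hypothetical helper: sums the first n elements of a, walking from the
   last element down to the first. The loop condition tests the value
   produced by --n, so no comparison against a second variable is needed. */
long sum_array_down(const int *a, size_t n) {
    long s = 0;
    if (n == 0) {
        return s;          /* nothing to sum; also keeps a[n - 1] in bounds */
    }
    do {
        s += a[n - 1];     /* visit elements n-1, n-2, ..., 0 */
    } while (--n != 0);    /* the decrement result is the loop test */
    return s;
}
```

Calling it on an array holding 1, 2, 3, 4 with n = 4 adds the elements in reverse order and returns 10.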
Does this matter? I know too little assembly to have a firm opinion on this, but it could matter in the kind of situation described in "When a Microsecond Is an Eternity". Otherwise, it would be premature optimization, and it might confuse other people reading the code.