Comparing to zero - Andrei Avram

I was once told that, inside for loops, I should prefer comparing with zero over comparing with other values whenever possible, because it’s faster. I never knew why, so I decided to find out what actually happens.

Here are two programs that compute the same result, each followed by its x86-64 assembly code, compiled without optimization. The first is a normal for loop; the second is a reversed for loop that counts down from n and stops when i reaches 0.

int main() {
    int n = 10;
    int s = 0;
    for (int i = 1; i <= n; ++i)  {
        s += i;
    }
}

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-12], 10
        mov     DWORD PTR [rbp-4], 0
        mov     DWORD PTR [rbp-8], 1
.L3:
        mov     eax, DWORD PTR [rbp-8]
        cmp     eax, DWORD PTR [rbp-12]
        jg      .L2
        mov     eax, DWORD PTR [rbp-8]
        add     DWORD PTR [rbp-4], eax
        add     DWORD PTR [rbp-8], 1
        jmp     .L3
.L2:
        mov     eax, 0
        pop     rbp
        ret

int main() {
    int n = 10;
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
}

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-12], 10
        mov     DWORD PTR [rbp-4], 0
        mov     eax, DWORD PTR [rbp-12]
        mov     DWORD PTR [rbp-8], eax
.L3:
        cmp     DWORD PTR [rbp-8], 0
        jle     .L2
        mov     eax, DWORD PTR [rbp-8]
        add     DWORD PTR [rbp-4], eax
        sub     DWORD PTR [rbp-8], 1
        jmp     .L3
.L2:
        mov     eax, 0
        pop     rbp
        ret

In both listings, the loop starts at the .L3 label.

For the normal for loop, the i variable (rbp-8) is loaded into the eax register, and then the register is compared to n (rbp-12). The copy is needed because cmp cannot take two memory operands.

mov     eax, DWORD PTR [rbp-8]
cmp     eax, DWORD PTR [rbp-12]

In the reversed for loop, i is always compared to 0, and this can be done directly on the memory location, without first copying it into the eax register, because 0 is encoded as an immediate constant in the instruction itself.

cmp     DWORD PTR [rbp-8], 0

So the difference is one instruction: the first loop does an extra copy on every iteration.
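To reproduce the comparison, a minimal sketch is to wrap each loop in a function that returns the sum, so the result is actually used and the two variants can be checked against each other. The names sum_up and sum_down are my own, not part of the original programs.

```c
/* Hypothetical helper names; not from the original post. */

/* Counts up from 1 to n; the condition compares i against the bound n. */
int sum_up(int n) {
    int s = 0;
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
    return s;
}

/* Counts down from n to 1; the condition compares i against 0. */
int sum_down(int n) {
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
    return s;
}
```

Compiling a file like this with gcc -S -masm=intel -O0 should give listings comparable to the ones above, though the exact output depends on the compiler and its version.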

At the -O3 optimization level, comparing to 0 typically does not need a separate cmp instruction, because the decrement already sets the flags that the conditional jump tests.
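As an illustration, here is a sketch of a countdown loop written so that the decrement and the test are a single expression, which is the shape that maps naturally onto a sub followed by a conditional jump on x86. The function name is hypothetical, and whether a given compiler emits exactly that sequence is an assumption: an optimizer may just as well vectorize the loop or replace it with a closed-form computation.

```c
/* Hypothetical helper; sums n + (n-1) + ... + 1 using a do-while that
   decrements and tests in one expression. */
int sum_down_fused(int n) {
    int s = 0;
    if (n <= 0) {
        return s;          /* nothing to add; also guards the do-while */
    }
    int i = n;
    do {
        s += i;
    } while (--i != 0);    /* decrement, then test the result against 0 */
    return s;
}
```

The guard on n is needed because a do-while always runs its body once; without it, a non-positive n would start counting down through negative values.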

Does this matter? I know too little assembly to have a strong opinion on this, but it could matter “When a Microsecond Is an Eternity”. Otherwise, it would be premature optimization, and possibly confusing for others.
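One way to get a rough answer on a particular machine is to time both loops. The following is only a sketch under a few assumptions: the helper names are hypothetical, the accumulator is declared volatile so an optimizing compiler cannot delete or precompute the loop, and clock() from the standard library is used as a coarse CPU timer.

```c
#include <time.h>

/* Hypothetical helpers, not from the original post. The accumulator is
   volatile so the compiler has to execute every iteration. */
long long sum_up(int n) {
    volatile long long s = 0;
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
    return s;
}

long long sum_down(int n) {
    volatile long long s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
    return s;
}

/* Calls f(n) reps times, stores the last result in *out, and returns
   the CPU time used, in seconds. */
double time_loop(long long (*f)(int), int n, int reps, long long *out) {
    long long r = 0;
    clock_t start = clock();
    for (int k = 0; k < reps; ++k) {
        r = f(n);
    }
    clock_t end = clock();
    *out = r;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop(sum_up, n, reps, &result) and time_loop(sum_down, n, reps, &result) with a large n and comparing the returned times gives a first impression. The volatile accumulator adds memory traffic of its own, so any difference measured this way may be smaller than the noise.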

3 thoughts on “Comparing to zero”

It probably only has some significance if that load causes cache misses, but even then it’s a micro-optimization just for super critical systems.

Readability and expressing intent through “normal” comparisons are of greater value, imho. These sorts of things shouldn’t be your concern when writing code; rather, optimize when the profiler indicates a problem in that particular area.

Andrei says:

February 12, 2021 at 21:41

Sane thinking. It’s at least fun to dig into these cases.


This is compiler- and hardware-dependent. Modern systems usually have plenty of registers for this, so they don’t reload the max value every time; they just keep it in a register for the comparison.
One can argue that comparison with 0 is faster, especially when the hardware has a special implementation for that purpose.
