## Comparing to zero

I was once told that, inside `for` loops, I should prefer comparing with zero whenever possible instead of comparing to other values, because it’s faster. I never knew why, so I decided to find out what actually happens.

Below are two programs with the same result, each followed by its assembly code (compiled without optimizations). The first one is a normal `for` loop; the other one is a reversed `for` loop (going from `n` down to 0).

```
int main() {
    int n = 10;
    int s = 0;
    for (int i = 1; i <= n; ++i) {
        s += i;
    }
}
```

```
main:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-12], 10
    mov     DWORD PTR [rbp-4], 0
    mov     DWORD PTR [rbp-8], 1
.L3:
    mov     eax, DWORD PTR [rbp-8]
    cmp     eax, DWORD PTR [rbp-12]
    jg      .L2
    mov     eax, DWORD PTR [rbp-8]
    add     DWORD PTR [rbp-4], eax
    add     DWORD PTR [rbp-8], 1
    jmp     .L3
.L2:
    mov     eax, 0
    pop     rbp
    ret
```

```
int main() {
    int n = 10;
    int s = 0;
    for (int i = n; i > 0; --i) {
        s += i;
    }
}
```

```
main:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-12], 10
    mov     DWORD PTR [rbp-4], 0
    mov     eax, DWORD PTR [rbp-12]
    mov     DWORD PTR [rbp-8], eax
.L3:
    cmp     DWORD PTR [rbp-8], 0
    jle     .L2
    mov     eax, DWORD PTR [rbp-8]
    add     DWORD PTR [rbp-4], eax
    sub     DWORD PTR [rbp-8], 1
    jmp     .L3
.L2:
    mov     eax, 0
    pop     rbp
    ret
```

The `for` loops start at the `.L3` labels.

For the normal `for` loop, the `i` variable (`rbp-8`) is loaded into the `eax` register, then the register is compared to `n` (`rbp-12`).

```
mov     eax, DWORD PTR [rbp-8]
cmp     eax, DWORD PTR [rbp-12]
```

As for the reversed `for` loop, `i` is always compared to 0, and this can be done directly, without first copying it into the `eax` register.

```
cmp     DWORD PTR [rbp-8], 0
```

So the difference is one instruction: the first loop does an extra copy into `eax` before comparing.

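With the `-O3` optimization level, comparing to 0 does not need the `cmp` instruction: the instruction that decrements `i` already sets the CPU flags that the conditional jump can check.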
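To inspect the optimized output yourself, note that with `n` fixed at 10 and `s` never used, an optimizing compiler may drop the loop entirely. One option, sketched below, is to move each loop into a function that takes `n` as a parameter and returns `s`. The function names `sum_up` and `sum_down` are hypothetical; compile with something like `g++ -O3 -S` and compare the two loops. At `-O3` the compiler may also restructure the loops substantially (for example by vectorizing them), so the output can look quite different from the listings above.

```cpp
// Hypothetical helpers: n is a parameter so the loop cannot be folded into a
// constant, and s is returned so the loop is not removed as dead code.
int sum_up(int n)
{
    int s = 0;
    for (int i = 1; i <= n; ++i) {  // each iteration compares i with n
        s += i;
    }
    return s;
}

int sum_down(int n)
{
    int s = 0;
    for (int i = n; i > 0; --i) {  // each iteration compares i with 0
        s += i;
    }
    return s;
}
```

Both functions compute the same sum, so any difference in the generated code comes only from the loop direction and the comparison.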

Does this matter? I know too little assembly to have a good opinion on this, but it could matter “when a microsecond is an eternity”. Otherwise, it would be premature optimization and possibly confusing for others.

## A more decoupled approach on static polymorphism

This is a follow-up to the Executing tasks based on static polymorphism article, which I recommend reading for the full picture of what follows: it explains why I study this approach and how I implemented it (compile-time iteration over a tuple).

My first attempt at C++ compile-time polymorphism was designed around a `task` struct. A task is required to implement an `Execute` method that performs some work, which means the task struct must be mine. If some information is provided by another library through some struct, I have to wrap it in a task that has the required `Execute` method.

Inspired by some of Sean Parent’s talks about runtime polymorphism and some of its issues, I found another way of implementing static polymorphism: one that places no requirements on the input structs. They neither need to implement a method nor have to be wrapped in other structs.

In addition to the requirements from the previous article, I add these:

• A container with multiple objects of different types so I have a list of items
• For each object in the container, something different must be performed depending on its type (by iterating over all objects, not by handling each one manually)
• Objects in the container are provided by someone else and cannot be changed
• C++11/14 compatible

This approach starts with some input structs in a tuple (the container). The tuple holds references to objects of those structs, to prevent copies:

```
#include <tuple>

namespace input {
struct A {
    int a;
};

struct B {
    int b;
};
}

using Objects = std::tuple<input::A&, input::A&, input::B&, input::A&>;
```
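The per-type behavior can then live outside the structs. Below is a minimal C++14 sketch, assuming one overloaded free function per input type and a tuple walk based on `std::index_sequence`. The names `process`, `for_each_in_tuple`, and `for_each_in_tuple_impl` are hypothetical, and this is not necessarily how the original article iterates the tuple.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

namespace input {
struct A {
    int a;
};

struct B {
    int b;
};
}

using Objects = std::tuple<input::A&, input::A&, input::B&, input::A&>;

// Hypothetical per-type behavior, written as free-function overloads.
// The input structs are left untouched.
int process(const input::A& object) { return object.a; }
int process(const input::B& object) { return object.b * 10; }

// Calls f on every element of the tuple, in order.
template <typename Tuple, typename F, std::size_t... I>
void for_each_in_tuple_impl(Tuple& objects, F& f, std::index_sequence<I...>)
{
    // Elements of a braced list are evaluated left to right.
    int expand[] = {0, (static_cast<void>(f(std::get<I>(objects))), 0)...};
    static_cast<void>(expand);
}

template <typename... Ts, typename F>
void for_each_in_tuple(std::tuple<Ts...>& objects, F f)
{
    for_each_in_tuple_impl(objects, f, std::index_sequence_for<Ts...>{});
}
```

A generic lambda such as `[&total](const auto& object) { total += process(object); }` then lets overload resolution pick the right `process` for each element at compile time.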

## Understanding reinterpret_cast

I recently needed to properly understand `reinterpret_cast`, which is a way of converting between types. This cast reinterprets the bits of a value of one type as a value of a different type. When used on pointers, as in the examples below, it is efficient because it does not copy the pointed-to data; only the pointer is copied. So the two pointers can be seen as different projections over the same memory area.

### The good

A use case for `reinterpret_cast` is transporting data through a buffer. The data model is a well-defined struct that could be transferred between different systems as a byte buffer (which could be a char array).

```
struct Input {
    int a;
    int b;
};

using Buffer = char[8];
```

The `Input` struct can be cast to the `Buffer` before being sent over the wire.

```
Input in{};
in.a = 5;
in.b = 7;

auto buffer = reinterpret_cast<Buffer*>(&in);
```

Then the buffer, when received, can be converted back to the `Input` struct.

```
auto input = reinterpret_cast<Input*>(buffer);
assert(input->a == 5);
assert(input->b == 7);
```

Update: As I was told, the two types should have the same size. Checking this at compile time prevents possible data loss.

```
static_assert(sizeof(Input) == sizeof(Buffer), "input and buffer size mismatch");
```
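Putting the pieces together, a round trip could look like the sketch below. The helper names `as_buffer` and `from_buffer` are hypothetical. On the receiving side, the sketch swaps in `std::memcpy` into a real `Input` object instead of casting: reading bytes that were never an `Input` object through an `Input*` can run into strict-aliasing undefined behavior, while `std::memcpy` is well defined for trivially copyable types like `Input`.

```cpp
#include <cstring>

struct Input {
    int a;
    int b;
};

using Buffer = char[8];

// Fails to compile if Input is not exactly 8 bytes (e.g. with a non-4-byte int).
static_assert(sizeof(Input) == sizeof(Buffer), "input and buffer size mismatch");

// Sender side: view the struct's bytes through a Buffer pointer; nothing is copied.
const Buffer* as_buffer(const Input& in)
{
    return reinterpret_cast<const Buffer*>(&in);
}

// Receiver side: copy the received bytes into a new Input object.
Input from_buffer(const Buffer& buffer)
{
    Input in{};
    std::memcpy(&in, buffer, sizeof(Input));
    return in;
}
```

The sender still avoids any copy; the single copy on the receiving side is the price of staying within well-defined behavior.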

Casting a pointer only copies the pointer, which is very cheap. Given a cast from a buffer to a struct:

```
struct Input {
    int a;
    int b;
};

int main()
{
    int buffer[2] = {5, 7};

    auto input = reinterpret_cast<Input*>(buffer);
}
```

The generated assembly is:

```
main:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-16], 5
    mov     DWORD PTR [rbp-12], 7
    lea     rax, [rbp-16]
    mov     QWORD PTR [rbp-8], rax
    mov     eax, 0
    pop     rbp
    ret
```

## Executing tasks based on static polymorphism

If you want to skip reading and go straight to the code, see some kind of C++ static polymorphism in the GitHub repository.

I’m a fan of tasks. I enjoy every time I implement any kind of system that executes tasks, from very simple ones (a list of tasks executed on a single thread) to multithreaded ones with features like cancellation and a dynamic concurrency level.

My first encounter with the idea of having a list of objects that I iterate over, calling a function for each one of them, was when I had to implement a feature that dynamically restricted users’ access to some content based on various conditions. You can imagine a lot of `if` statements. I don’t like `if` statements and I try hard to avoid them, because each one brings new responsibility, test cases, and maintenance costs. And I really enjoy seeing product and management people happy when their requirements are implemented in reasonable time. Being productive is a lot about code.

A way to decouple code is, of course, to separate concerns. And if you have multiple concerns serving the same concept, you could use a polymorphic approach: an interface for a task, some task implementations, and something to run all the tasks when needed. Everything you’ll see in this article is a light version, because I’m focusing on the idea, not on details.

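```
#include <array>
#include <iostream>
#include <memory>

class Task {
public:
    // Virtual destructor: tasks are deleted through Task pointers
    virtual ~Task() = default;
    virtual void Execute() = 0;
};

template <typename T>
struct Executor {
    T tasks;

    void Execute()
    {
        for (auto& task : tasks) {
            task->Execute();
        }
    }
};

class A : public Task {
public:
    void Execute() override { std::cout << 1; }
};

class B : public Task {
public:
    void Execute() override { std::cout << 2; }
};

int main()
{
    Executor<std::array<std::unique_ptr<Task>, 2>> executor{{
        std::make_unique<A>(),
        std::make_unique<B>(),
    }};

    executor.Execute(); // prints 12
}
```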

Now I’ll add some constraints for learning purposes. Sometimes performance matters and you can’t always use the latest C++ standards, so I’ll go with these:

• C++11
• No heap allocations
• Minimum stack allocations
• No virtual methods

And I’m entering the world of templates, because they can help me achieve compile-time polymorphism (C++ static polymorphism), which is what I’m aiming for.
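Under those constraints, one possible shape is to store the tasks by value in a `std::tuple` and walk it with recursive templates, so every `Execute` call is resolved at compile time, with no heap and no virtual methods. Below is a minimal C++11 sketch, assuming tasks only need an `Execute()` method. The names `ExecuteAll` and `StaticExecutor` are hypothetical and not necessarily what the repository uses; to keep it checkable, the tasks append to a string instead of printing.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical tasks: no base class and no virtual methods.
struct A {
    std::string* output;
    void Execute() { *output += '1'; }
};

struct B {
    std::string* output;
    void Execute() { *output += '2'; }
};

// End of the recursion: every task has been executed.
template <std::size_t I = 0, typename... Tasks>
typename std::enable_if<(I == sizeof...(Tasks))>::type
ExecuteAll(std::tuple<Tasks...>&)
{
}

// Execute task I, then continue with task I + 1.
template <std::size_t I = 0, typename... Tasks>
typename std::enable_if<(I < sizeof...(Tasks))>::type
ExecuteAll(std::tuple<Tasks...>& tasks)
{
    std::get<I>(tasks).Execute();
    ExecuteAll<I + 1>(tasks);
}

template <typename... Tasks>
struct StaticExecutor {
    std::tuple<Tasks...> tasks;  // tasks stored by value, no allocations

    void Execute() { ExecuteAll(tasks); }
};
```

Usage mirrors the virtual version: `StaticExecutor<A, B> executor{std::make_tuple(A{&output}, B{&output})};` followed by `executor.Execute();`. The trade-off is that the set of task types is fixed at compile time.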

## Explaining memory alignment and undefined behavior to myself

Memory alignment is important for the CPU to be able to read data. But it’s not always critical that the alignment is optimal.

A struct like the one below is aligned but comes with a size cost: an int is larger than a short, so padding is added after the short to align the following int (and at the end, after the char).

```
struct Model {
    int a;
    short b;
    int c;
    char d;
};
```

In critical real-time software, I would probably align it better by moving the second int above the short. The struct gets smaller, because less padding is needed.

```
struct Model {
    int a;
    int c;
    short b;
    char d;
};
```

I see a size of 16 bytes for the first version and 12 bytes for the second (with a 4-byte int and a 2-byte short).
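These sizes can be checked at compile time with `sizeof` and `offsetof`. A sketch with the two versions renamed `Padded` and `Reordered` (hypothetical names, since both are called `Model` above), assuming a typical platform with a 4-byte int and a 2-byte short; on other platforms the `static_assert`s would fail to compile.

```cpp
#include <cstddef>

// First layout: 2 padding bytes after b, 3 after d.
struct Padded {
    int a;    // offset 0
    short b;  // offset 4, followed by 2 padding bytes
    int c;    // offset 8
    char d;   // offset 12, followed by 3 padding bytes
};

// Reordered layout: only 1 padding byte, at the end.
struct Reordered {
    int a;    // offset 0
    int c;    // offset 4
    short b;  // offset 8
    char d;   // offset 10, followed by 1 padding byte
};

// Assumes 4-byte int and 2-byte short.
static_assert(offsetof(Padded, c) == 8, "padding expected after b");
static_assert(sizeof(Padded) == 16, "unexpected size for Padded");
static_assert(sizeof(Reordered) == 12, "unexpected size for Reordered");
```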

Another special case is serialization. When I want to transport the data of a struct through a buffer, alignment is important because different compilers can handle padding in different ways. If I have a short (2 bytes) followed by an int (4 bytes), 2 bytes of padding will typically be added between them. But padding varies among compilers, so I should set the alignment of the struct to 1 byte; this way the memory is contiguous, with no padding. Therefore the data is read as 2 bytes for the short followed directly by 4 bytes for the int.

```
#pragma pack(push, 1)
struct Model {
    short exists;
    int items[2];
} model;
#pragma pack(pop)
```
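The effect of the pragma can be checked with `sizeof` and `offsetof`. A sketch comparing the packed struct with an unpacked copy (`PackedModel` and `UnpackedModel` are hypothetical names), assuming a 2-byte short and a 4-byte int. `#pragma pack` is a compiler extension supported by MSVC, GCC, and Clang, not standard C++.

```cpp
#include <cstddef>

#pragma pack(push, 1)
struct PackedModel {
    short exists;   // offset 0
    int items[2];   // offset 2: no padding before it
};
#pragma pack(pop)

struct UnpackedModel {
    short exists;   // offset 0, followed by 2 padding bytes
    int items[2];   // offset 4
};
```

Packed members can end up misaligned, which may make accessing them slower on some CPUs, so packing is usually reserved for the wire format rather than for structs used in hot code.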

## Compile-time array with runtime variable size

The code in this post is not good code; its only purpose is to illustrate the idea.

An array has a fixed size known at compile time and does not allocate on the heap, which is why it’s preferred in some real-time situations. If you want a variable number of elements, you could choose a vector. A vector allocates on the heap at runtime, which cannot always offer the predictability and performance required by some applications.

But what if you need both? Storage sized at compile time, and a different number of elements each time a function is called with some data from a source. You can use an array and a variable for the number of elements:

```
#include <array>
#include <cstddef>
#include <stdexcept>

// Declare the maximum number of elements
constexpr std::size_t max_size = 10;

int main()
{
    // Declare the array
    std::array<int, max_size> elements{4, 5};

    // Each call, you will get the elements array and
    // the number of elements sent
    std::size_t number_of_elements = 2;

    // You should make sure the number of elements is not greater
    // than the maximum
    if (number_of_elements > max_size) {
        throw std::out_of_range{"too many elements"};
    }

    auto work = [](const std::array<int, max_size>& elements, const std::size_t number_of_elements) {
        // Then iterate through all the elements passed this time
        for (auto it = elements.begin(); it != elements.begin() + number_of_elements; ++it) {
            // ...
        }
    };

    work(elements, number_of_elements);
}
```

Some problems with this approach:

• all functions that need this array must include the number of elements in their signature
• the number of elements could be altered by mistake
• each time you iterate the array you must specify which is its end

It would be great to have a container similar to an array/vector that prevents the issues above. Maybe there is already an implementation for this, but I didn’t find it. So I wrote a few lines that partially implement this container. The idea is to have perfect integration with all STD algorithms, to be able to use it as if it were an array.

My container is far from perfect; it just shows some features. I’m actually wrapping an STD array and using two iterators: one for writing data into the array and one for reading. The usage is pretty simple.

## Polymorphism

Polymorphism is one of the principles I always guide myself by. It is, for me, a way of thinking. It always reminds me that any piece of code will be replaced one day by another. Or there will be other similar pieces that will be used in some cases.

Things will always change and many times I have no idea what will come. I could spend time trying to guess new situations (which can be a waste of resources) or I could be prepared for when the time comes.

I always think about each entity/object/model/structure of my application. What does it represent? Could there be other similar entities? Could there be more entities exactly like it, not just one? What is its relation to other entities?

Let’s say I got to the need of more similar entities. How will I pass them to functions? What code will I change if I want to change only one of them, add a new one or remove an old one? How could I implement the behavior differences between them? If X then do this, if Y then do this, if Z then do this?

I know it’s easy to just throw in some code, some if statements, and duplicate some code because I need just one quick thing to work a little differently. Why should I think of design and architecture? I just need some code to do something. And this is how projects end up, in weeks, months, or years, being very hard to maintain and understand. It’s always “just this one thing”, but 1 + 1 + 1 + 1 + 1 + 1 is 5. Oh, no, it’s 6.

It took me a lot of time to see these things and the learning never stops, but it pays off. I often read and practice to find better ways of understanding my data. How data is modeled is one of the most important aspects, because it affects the entire project. The extra time invested now saves much more time later, each time I need to change something.

Even small things can and should be prepared for the future and, if I have a keep-things-simple mindset, I don’t let myself fall into over-engineering. I don’t implement the future, I’m just ready for it. Are you?

Thread synchronization is a common task in multithreaded applications. You cannot get away without some form of protection for data that is accessed from multiple threads. Common concepts for protecting data are mutexes and atomic variables, and they exist in most programming languages that support multithreading.

There is another concept that offers the same features in a different way. Instead of requiring you to explicitly protect data, it makes you think about how data “flows” through your application and, implicitly, across the threads you’re using. This is what a channel is for, and one of the languages that offer it is Go. A channel feels very natural to use, hiding a lower-level implementation of sharing data across threads.

And this is something I wanted to implement in C++ when I found out what the standard library offers for multithreading. Why? Just to practice thread-safe C++. A real channel has a far more complex and different implementation than what you’ll find here. What I built is a synchronized queue with a channel feeling.

I wanted to have a container that is very easy to use. Data should get in on some threads and come out on some other threads, and the operations must be thread-safe:

```
int in = 1;
in >> channel;

int out = 0;
out << channel; // out is 1
```

This is the most common and simple use case.

Another common situation is to continuously read data from a channel:

```
while (true) {
    out << channel;
}
```

Or, more idiomatically for C++, with a range-based for loop:

```
for (auto out : channel) {
    // do something with "out"
}
```
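A minimal sketch of a synchronized queue that supports these three usages, assuming a `std::queue` guarded by a `std::mutex` and a `std::condition_variable`. The class name `Channel` and its `push`, `pop`, and `close` members are hypothetical, not the API of the implementation described here. `close()` is an addition the sketch needs so that the range-based for loop can end once the queue is closed and drained.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class Channel {
public:
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lock{mutex_};
            queue_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Blocks until a value is available; returns false only when the
    // channel is closed and empty.
    bool pop(T& out)
    {
        std::unique_lock<std::mutex> lock{mutex_};
        cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) {
            return false;
        }
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

    void close()
    {
        {
            std::lock_guard<std::mutex> lock{mutex_};
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Each increment pops the next value; the iterator becomes equal to
    // end() once the channel is closed and drained.
    class iterator {
    public:
        iterator() = default;
        explicit iterator(Channel* channel) : channel_{channel} { next(); }
        const T& operator*() const { return value_; }
        iterator& operator++() { next(); return *this; }
        bool operator!=(const iterator& other) const { return channel_ != other.channel_; }

    private:
        void next()
        {
            if (channel_ != nullptr && !channel_->pop(value_)) {
                channel_ = nullptr;
            }
        }
        Channel* channel_ = nullptr;
        T value_{};
    };

    iterator begin() { return iterator{this}; }
    iterator end() { return iterator{}; }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// "in >> channel" sends a value.
template <typename T>
void operator>>(const T& in, Channel<T>& channel) { channel.push(in); }

// "out << channel" receives a value, blocking until one is available.
template <typename T>
void operator<<(T& out, Channel<T>& channel) { channel.pop(out); }
```

In a real program, a producer thread would send values and call `close()` when done, while a consumer thread iterates with the range-based for loop.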

## GCC bug in noexcept operator

When something goes wrong, the first thing I do is ask myself what I did wrong. I don’t like just throwing the blame. And when I saw the “not declared in this scope” error related to the noexcept operator, I was sure I was missing something. Maybe I had accidentally used a compiler extension, or maybe I had used something that’s undefined behavior. But I didn’t think it could be a GCC compiler bug in the noexcept operator.

I have a container that encapsulates a vector, for which I wanted a noexcept swap method. Ignore the implementation itself; the idea is that I wanted to declare the method noexcept only if the swap operations on the members being swapped are also noexcept. The same applies to any other case that relies on the noexcept operator.

```
#include <vector>
#include <utility>
#include <cassert>

struct Container {
    std::vector<int> elements{};
    std::size_t count{};

    void swap(Container &c) noexcept(noexcept(elements.swap(c.elements)) && noexcept(std::swap(count, c.count))) {
        elements.swap(c.elements);
        std::swap(count, c.count);
    }
};

int main() {
    Container a{{1, 2}, 2};
    Container b{{3, 4, 5}, 3};

    a.swap(b);

    assert(a.elements == (std::vector<int>{3, 4, 5}));
    assert(a.count == 3);

    assert(b.elements == (std::vector<int>{1, 2}));
    assert(b.count == 2);
}
```

This declares Container’s swap as noexcept if both the swap of `elements` and the swap of `count` are noexcept:

`void swap(Container &c) noexcept(noexcept(elements.swap(c.elements)) && noexcept(std::swap(count, c.count)));`

MSVC and Clang compile this, but GCC needs a newer version: older versions have the bug [DR 1207] “this” not being allowed in noexcept clauses, which I found in this discussion.

If you see one of the following errors, try to update your compiler:

• ‘elements’ was not declared in this scope
• invalid use of incomplete type ‘struct Container’
• invalid use of ‘this’ at top level
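If updating the compiler is not an option, a possible workaround (an assumption, not something taken from the bug discussion) is to express the same condition through the member types with `std::declval` instead of through the members themselves, so the noexcept clause does not need an implicit `this`. This is a sketch that has not been verified against every affected GCC version:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Container {
    std::vector<int> elements{};
    std::size_t count{};

    // Same condition as before, written against the member types via
    // std::declval, so the noexcept clause never names the members.
    void swap(Container &c) noexcept(
        noexcept(std::declval<std::vector<int>&>().swap(std::declval<std::vector<int>&>())) &&
        noexcept(std::swap(std::declval<std::size_t&>(), std::declval<std::size_t&>())))
    {
        elements.swap(c.elements);
        std::swap(count, c.count);
    }
};
```

The drawback is duplication: if a member’s type changes, the noexcept clause has to be updated by hand.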

While studying STD algorithms in C++, one simple exercise I did was masking an email address: turning `johndoe@emailprovider.tld` into `j*****e@emailprovider.tld`, while considering various cases like very short emails and invalid ones. (One could impose a precondition that the input must be a valid email address in order to get a valid output, but for this exercise I wanted some edge cases.)

To know what kinds of inputs I’m dealing with and what the corresponding valid outputs should be, I’ll start with the test data:

```
#include <map>
#include <string>

const std::map<std::string, std::string> tests{
    {"johndoe@emailprovider.tld", "j*****e@emailprovider.tld"},
    {"jde@emailprovider.tld",     "j*e@emailprovider.tld"},
    {"jd@emailprovider.tld",      "**@emailprovider.tld"},
    {"j@emailprovider.tld",       "*@emailprovider.tld"},
    {"@emailprovider.tld",        "@emailprovider.tld"},
    {"wrong",                     "w***g"},
    {"wro",                       "w*o"},
    {"wr",                        "**"},
    {"w",                         "*"},
    {"",                          ""},
    {"@",                         "@"},
};
```

Besides solving the task itself, I was also curious about one aspect: what are the differences between an implementation that uses no STD algorithms and one that uses several of them? I compared how the code looks and how it performs.
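To make the expected behavior in the test table concrete, here is a minimal sketch of an STD-algorithm version that satisfies all the cases listed. It is an illustration only (the name `mask_email` is hypothetical), not necessarily either of the implementations compared in this post.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical STD-algorithm version of the masking function.
std::string mask_email(const std::string& email, const char mask)
{
    // The part to mask ends at the first '@', or at the end if there is none.
    const auto at = std::find(email.begin(), email.end(), '@');
    const auto local_size = std::distance(email.begin(), at);

    std::string masked = email;
    auto first = masked.begin();
    auto last = masked.begin() + local_size;

    // Only parts longer than 2 characters keep their first and last character.
    if (local_size > 2) {
        ++first;
        --last;
    }
    std::fill(first, last, mask);
    return masked;
}
```

Everything from the `@` onward is copied unchanged, which also covers the `"@emailprovider.tld"` and `"@"` cases without special handling.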

The first approach was the classic one: a single iteration over the input string, during which each character is checked to decide whether it should be copied to the output as is or masked. After the iteration, if the character `@` was not found, the proper transformation is done.

```
std::string mask(const std::string &email, const char mask) {
if (email[0] == '@') {
return email;
}

bool hide = true;
bool is_email = false;

for (size_t i = 0; i < email.size(); ++i) {
if (email[i] == '@') {
is_email = true;
hide = false;

if (i > 2) {
masked[i - 1] = email[i - 1];
}
}