Never trust user input. But who is the user?

Never trust user input

Never trust user input (or “Never trust your users”) is a well-known statement in software engineering. It’s about making sure that whatever information gets into your application/service/library/system will not cause you any issues (data validation).

Nobody can guarantee what you will be sent. Data can be intentionally or unintentionally broken, leading to anything from inconvenient situations to absolute madness with services being down for a long time (e.g. the 2024 CrowdStrike incident; see CrowdStrike's technical root cause analysis).

But who is the user?

Often, the user is considered to be someone outside your project. Someone who is using your project. The client who:

    • makes an HTTP request to your web server
    • or passes a file path as an argument to your CLI application
    • or makes a call to one of your APIs’ functions.

Imagine the following situation:

    • Your application/service/library/system has multiple components that communicate with each other.
    • Not all of them are facing the end user.
    • Given
      • Two components A (user-facing/public) and B (internal/private).
    • When
      • A uses B
      • and B gets input from A.
    • Then
      • A is the user of B, not your end user who uses the application
      • and B does not know where the input is coming from.

You, as the engineer who wrote these components, know how they are used. But you are a human and mistakes are just around the corner. Most of the time, B must validate the input as if it were a public component because you must…
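The A/B scenario above can be sketched in a few lines. This is only an illustration under assumptions: the namespaces `a` and `b`, the functions `parse_port` and `handle_request`, and the choice of a port number as the input are all hypothetical and do not come from the original post. The point it shows is that B checks its input even though its only caller is another internal component.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical internal component B: parses a port number handed over by
// another component. B cannot know whether the caller already validated
// the text, so it validates it again.
namespace b {

inline std::optional<int> parse_port(std::string_view text) {
    if (text.empty() || text.size() > 5) {
        return std::nullopt;  // empty, or too long to be a valid port
    }
    int port = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            return std::nullopt;  // not a decimal digit
        }
        port = port * 10 + (c - '0');
    }
    if (port < 1 || port > 65535) {
        return std::nullopt;  // outside the valid port range
    }
    return port;
}

}  // namespace b

// Hypothetical public component A: forwards end-user input to B.
namespace a {

inline std::string handle_request(std::string_view user_supplied_port) {
    const auto port = b::parse_port(user_supplied_port);
    if (!port) {
        return "error: invalid port";
    }
    return "listening on " + std::to_string(*port);
}

}  // namespace a
```

If A is later changed, or a second caller of B appears, B still rejects bad input on its own.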

To break or not to break… encapsulation

“The devil is in the details” 

One particular feature of the C++ data validation concept I wrote about is passing the wrapped_value object as an argument to a function. The wrapped value should behave like the type it wraps so that using it feels natural. I should not have to know that the actual value is hidden behind a level of indirection, and that indirection is what makes it easy to control any mutation.

To achieve that, I used a user-defined conversion function and it went smoothly. I can have a wrapped value and pass it as an argument (by value or const reference) to a function:

int inc_by_value(int v) { return v + 1; }
int inc_by_const_ref(const int& a) { return a + 1; }

msd::wrapped_value<int, UpperBoundLimiter<int, 10>> value = 2;
assert(inc_by_value(value) == 3);
assert(inc_by_const_ref(value) == 3);
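For readers who want to compile the snippet above without the real library, here is a minimal stand-in. It is a sketch under assumptions, not the actual msd::wrapped_value: the `sketch` namespace, the single-limiter template parameter, and the `apply` function name are all hypothetical. It only shows how a const conversion operator lets the wrapper be read as a plain int.

```cpp
namespace sketch {

// Hypothetical limiter: clamps a value to at most Max.
template <typename T, T Max>
struct UpperBoundLimiter {
    static constexpr T apply(T v) noexcept { return v > Max ? Max : v; }
};

// Hypothetical wrapper: every construction and assignment goes through
// Limiter::apply, so the stored value always respects the limit.
template <typename Value, typename Limiter>
class wrapped_value {
public:
    // Implicit on purpose, so that `wrapped_value<...> v = 2;` compiles.
    wrapped_value(Value v) : value_{Limiter::apply(v)} {}

    wrapped_value& operator=(Value v) {
        value_ = Limiter::apply(v);
        return *this;
    }

    // User-defined conversion: read-only view of the wrapped value.
    operator const Value&() const noexcept { return value_; }

private:
    Value value_;
};

}  // namespace sketch

inline int inc_by_value(int v) { return v + 1; }
inline int inc_by_const_ref(const int& a) { return a + 1; }
```

Because the only conversion is to a const reference, callers can read the value but cannot modify it behind the wrapper's back.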

Pass by non-const reference

The user-defined conversion I initially implemented is the const reference overload:

operator const Value&() const noexcept { return value_; }

That’s why I can pass the wrapped value by value and const reference. To pass it as a non-const reference, I have to implement the non-const reference overload:

template <typename Value, typename... Wrappers>
class wrapped_value {
    // ...
    
    operator Value&() noexcept { return value_; }

    // ...
};

And I can pass by reference and mutate the value:

void inc_by_ref(int& a) { ++a; }

msd::wrapped_value<int, UpperBoundLimiter<int, 10>> value = 2;
inc_by_ref(value);
assert(value == 3);

or

void update(int& a) { a = 20; }

update(value);
assert(value == 10); // should be 10 because of the UpperBoundLimiter

But the last assertion fails: Assertion `value == 10' failed.
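The failure can be reproduced with a small stand-in for the wrapper. Everything in the `sketch` namespace, the `get` accessor, and the `bypass_demo` function are hypothetical names used for illustration, not the original implementation. The sketch shows that a write through the reference returned by the non-const conversion never reaches the limiter.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical limiter: clamps a value to at most Max.
template <typename T, T Max>
struct UpperBoundLimiter {
    static constexpr T apply(T v) noexcept { return v > Max ? Max : v; }
};

template <typename Value, typename Limiter>
class wrapped_value {
public:
    wrapped_value(Value v) : value_{Limiter::apply(v)} {}

    wrapped_value& operator=(Value v) {
        value_ = Limiter::apply(v);
        return *this;
    }

    operator const Value&() const noexcept { return value_; }

    // Exposes the raw storage: writes through this reference skip the limiter.
    operator Value&() noexcept { return value_; }

    const Value& get() const noexcept { return value_; }

private:
    Value value_;
};

}  // namespace sketch

inline void update(int& a) { a = 20; }

inline int bypass_demo() {
    sketch::wrapped_value<int, sketch::UpperBoundLimiter<int, 10>> value = 2;

    value = 20;  // goes through operator=, so the limiter clamps it
    assert(value.get() == 10);

    update(value);  // binds int& to the raw member, so nothing clamps it
    return value.get();
}
```

In this sketch `bypass_demo()` returns 20, not 10: once a caller holds an `int&` to the member, the wrapper no longer sees the assignment.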

Another type of data validation in C++

One day I needed to make sure that some values are properly controlled no matter who changes them.

struct Input {
    int a;
    int b;
};

    • a must be maximum 50 – if a greater value is assigned, 50 must be set
    • b must be minimum 50 – if a lower value is assigned, 50 must be set
    • if b is greater than 100, the flow cannot continue, the execution must be stopped

The fields in the struct can be changed by multiple layers of the application, so their values must be checked after each change. Possible solutions:

    • an API that assigns and verifies the values: each layer must use the API
    • an API that only verifies the values: after each layer updates the values, the API must be used by the caller
    • setters defined on the struct: SetA(int), SetB(int)
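As an illustration of the second option, a "verify only" API could look roughly like the sketch below. The function name `verify` and the use of an exception to stop the flow are assumptions; the original only says that execution must be stopped.

```cpp
#include <stdexcept>

struct Input {
    int a;
    int b;
};

// Hypothetical "verify only" API. Every layer that changes an Input has to
// remember to call it afterwards.
inline void verify(Input& in) {
    if (in.b > 100) {
        // Assumption: "stopping the execution" is modelled as an exception.
        throw std::out_of_range("b is greater than 100");
    }
    if (in.a > 50) {
        in.a = 50;  // a must be at most 50
    }
    if (in.b < 50) {
        in.b = 50;  // b must be at least 50
    }
}
```

Nothing forces a caller to invoke `verify` after `in.a = 70;`, which is the weakness of the API-based options.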

The API solutions require extra work: someone must remember to use the API, otherwise bugs could be introduced. The setters solution forces the usage of those methods, but I don’t want to rely on OOP here; instead, I want to go for a data-oriented approach and keep my struct as simple as possible.

I would like a property of the struct to be configured in such a way that every time it is assigned a value, that value is verified against some requirements. In larger projects with layers that need to mutate some data passed around, it might be safer to go this way instead of relying on people to remember to explicitly do something.

How it looks

Someone told me they would like to see something like this:

struct Input {
    wrapped_value<int> a;
    wrapped_value<int> b;
};

wrapped_value is a wrapper that receives any value assigned to the property it wraps and makes sure it’s valid. The type of the property is passed as a template argument to the wrapper.

a and b should behave like their original types. Once wrapped, they are no longer integers, but wrapped_values.
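One possible shape for such a wrapper, covering the three rules for a and b, is sketched below. It is an assumption-heavy illustration and not the design from the full post: the `sketch` namespace, the validator names `AtMost`, `AtLeast` and `FailAbove`, passing validators as extra template arguments, and throwing an exception to stop the flow are all hypothetical. It needs C++17 for the fold expression.

```cpp
#include <stdexcept>

namespace sketch {

// Hypothetical validators; each one receives a candidate value and returns
// the value that should actually be stored.
template <int Max>
struct AtMost {
    static int apply(int v) { return v > Max ? Max : v; }
};

template <int Min>
struct AtLeast {
    static int apply(int v) { return v < Min ? Min : v; }
};

template <int Max>
struct FailAbove {
    static int apply(int v) {
        if (v > Max) {
            throw std::out_of_range("value above hard limit");
        }
        return v;
    }
};

template <typename Value, typename... Validators>
class wrapped_value {
public:
    wrapped_value(Value v) : value_{validate(v)} {}

    wrapped_value& operator=(Value v) {
        value_ = validate(v);  // if a validator throws, the old value is kept
        return *this;
    }

    // Lets the wrapper be read like the type it wraps.
    operator const Value&() const noexcept { return value_; }

private:
    static Value validate(Value v) {
        // Apply each validator in declaration order (C++17 fold expression).
        ((v = Validators::apply(v)), ...);
        return v;
    }

    Value value_;
};

struct Input {
    wrapped_value<int, AtMost<50>> a;
    wrapped_value<int, FailAbove<100>, AtLeast<50>> b;
};

}  // namespace sketch
```

With this sketch, `sketch::Input in{70, 10};` stores 50 in both fields, and `in.b = 101;` throws while leaving the previous value of b untouched.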