Andrei Avram

Pass function to thread (job scheduler update)

A simplification to the job scheduler from the previous post is to pass the job function to the thread managing the job call, instead of making a shared pointer and capture it in the lambda.

The schedule method changes to:

void Scheduler::schedule(const Job f, long n) {
    std::unique_lock<std::mutex> lock(this->mutex);
    condition.wait(lock, [this] { return this->count < this->size; });
    count++;

    std::thread thread{
            [this](const Job f, long n) {
                std::this_thread::sleep_for(std::chrono::milliseconds(n));

                try {
                    (*f)();
                } catch (const std::exception &e) {
                    this->error(e);
                } catch (...) {
                    this->error(std::runtime_error("Unknown error"));
                }

                condition.notify_one();
                count--;
            }, f, n
    };
    thread.detach();
}

A job scheduler in C++

Not long ago I started writing some C++ code, and a task that I enjoyed implementing was a very basic job scheduler (idea from dailycodingproblem.com). I’m sure there are “holes” to be filled in my implementation regarding performance, concurrency, and general correctness. This is an early dive into the language and this post is mostly for me, to explain some things to myself.

The scheduler takes a job (function) and a time (in milliseconds) and runs the job after the given time.

The first thing was to define a function pointer as the job type.

using Job = void (*)();

Another callback that I’ll be using is one to report job errors, which takes an exception as an argument.

using Error = void (*)(const std::exception &);

The scheduler class constructor takes a size and an error callback (to report errors). The size is the maximum number of jobs accepted until scheduling blocks and waits for a job to be finished.
An error callback is required. The first measure to ensure this is to delete the constructor that takes null for the error callback, which performs a compile-time check. Explicitly passing null will not be allowed, but a pointer that is null will be checked at runtime.

Scheduler(size_t size, Error error);
Scheduler(size_t size, nullptr_t) = delete;

Continue reading A job scheduler in C++

Docker and iptables

The firewall must be configured properly to prevent unwanted access which could lead to data loss and exploits. UFW allows you to quickly close and open ports. The following configuration closes all incoming ports and opens 22 (SSH), 80 (HTTP) and 443 (HTTPS):

ufw default deny incoming
ufw allow OpenSSH
ufw allow http
ufw allow https
ufw enable

But Docker will update iptables when you bind a container port to the host, opening the port for public access. To prevent this, you could bind the port to an internal address (private or 127.0.0.1). Another way is telling Docker to never update iptables by setting the “iptables” option to “false” in /etc/docker/daemon.json. This file should contain a JSON string: { “iptables”: false }.

This can be automated as:

apt install -y ufw
ufw default deny incoming
ufw allow OpenSSH
ufw allow http
ufw allow https
ufw --force enable # --force prevents interaction

apt install -y jq
touch /etc/docker/daemon.json
[[ -z $(cat /etc/docker/daemon.json) ]] && echo "{}" > /etc/docker/daemon.json
echo $(jq '.iptables=false' /etc/docker/daemon.json) > /etc/docker/daemon.json

But there are cases when preventing Docker from manipulating iptables can be too much and DNS won’t be resolved to some containers. The issue I met was when building a container from the NGINX Alpine image:

---> Running in 71130dd103f3
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.9/main: temporary error (try again later)
WARNING: Ignoring APKINDEX.b89edf6e.tar.gz: No such file or directory

For containers like this one you could use –network host or manually add iptables rules, which probably is not the best idea because they could change in the future. These are the rules I’ve seen Docker add:

iptables -N DOCKER
iptables -N DOCKER-ISOLATION-STAGE-1
iptables -N DOCKER-ISOLATION-STAGE-2
iptables -A FORWARD -j DOCKER-USER
iptables -A FORWARD -j DOCKER-ISOLATION-STAGE-1
iptables -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -o docker0 -j DOCKER
iptables -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
iptables -A FORWARD -i docker0 -o docker0 -j ACCEPT
iptables -A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
iptables -A DOCKER-ISOLATION-STAGE-1 -j RETURN
iptables -A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
iptables -A DOCKER-ISOLATION-STAGE-2 -j RETURN
iptables -A DOCKER-USER -j RETURN
iptables -t nat -A POSTROUTING ! -o docker0 -s 172.17.0.0/16 -j MASQUERADE

I did not specifically need to stop Docker from updating iptables as I was just experimenting, so I just let it do its job. I’m publishing (-p) the ports I need for public access and exposing (- -expose) the private ones.

Go bookmarks

I like reading again and again some resources which make me understand Go better each time. Some of them are worth being saved as bookmarks, which is what I’m doing right here. I wish I had known some of them a few years ago.

Why:

Intro:

Concurrency and memory model:

Channels:

Profiling:

Scheduling, stacks, pointers, memory:

Algorithms, patterns, tools:

People for people:

PHP 7.4 was released today

I was waiting for PHP 7.4 to be released. The most exciting feature for me is Typed Properties. Being released today, I was eager to compile it just to see that feature in action (and I’m also curious about Opcache Preloading).

I compiled it inside a Docker container and wrote a small test file.

Dockerfile:

FROM ubuntu:18.04

ARG VERSION=7.4.0

RUN apt update
RUN apt install -y wget gcc make automake pkgconf libxml2-dev libsqlite3-dev

RUN wget https://www.php.net/distributions/php-$VERSION.tar.gz
RUN tar xzfv php-$VERSION.tar.gz

WORKDIR php-$VERSION

RUN ./configure \
    --with-pdo-mysql \
    --with-mysqli

RUN make -j $(nproc)
#RUN make test # probably fails for now

RUN make install
RUN php -v

Compile PHP:

docker build . -t php740build

Save this as test.php:

<?php declare(strict_types=1);

class Calc
{
    /**
     * @var float[]
     */
    private array $nums;

    /**
     * Calc constructor.
     * @param float $n
     * @param float[] $nums
     */
    public function __construct(float $n, float ...$nums)
    {
        $this->nums = [$n, ...$nums];
    }

    /**
     * @return float
     */
    public function sum(): float
    {
        return array_sum($this->nums);
    }
}

$calc = new Calc(1, 2, 3);

echo $calc->sum();
echo "\n";

Run the test:

docker run --rm -v $PWD/test.php:/test.php -ti php740build php /test.php

It’s probably better to wait a while for some patches after being tested in the wild wild web.

Go Future with Reflection

While exercising Go with various Future implementations, at one moment reflection hit me. Using reflection to inspect types beats the point of strong types, but it never hurts to play.

I want to use any function within a Future and then call it asynchronous and wait for results. I could have a function to create the Future which holds the user function. Not having generics yet, any function could be passed by interface{}, then called by reflection.

So the user function is passed to a function which returns another function that can be called with the user function arguments. The user function will be executed on a new routine.

// Callable is the function to call in order to start running the passed function.
type Callable func(args ...interface{}) *Future

// New returns a Callable.
func New(function interface{}) Callable {

For a sum function, the usage is:

sum := func(a, b int) (int, error) {
   return a + b, nil
}
future := futurereflect.New(sum)(1, 2)

result, err := future.Result()

The Future should allow waiting for the user function to be executed and get its result.

type Future struct {
   functionType  reflect.Type
   functionValue reflect.Value
   args          []interface{}
   wait          chan struct{}
   result        interface{}
   err           error
}

// Wait blocks until function is done.
func (f *Future) Wait() {
   <-f.wait
}

// Result retrieves result and error. It blocks if function is not done.
func (f *Future) Result() (interface{}, error) {
   <-f.wait
   return f.result, f.err
}

The rest is just implementation, but it feels like pure madness. I did it for fun, to experiment, there are cases missing. And a language should not be pushed where it doesn’t belong. Use each language in its style.

Run the code on Go Playground.

Continue reading Go Future with Reflection

CGO examples

Nothing new or special regarding Golang CGO. I have been exploring it lately and it took me some time to understand its deep aspects, mostly on passing Go callbacks to C. And I wanted to keep close some full examples.

I created a small repository on GitHub which shows interactions of Go code with a C library (included) by passing and/or receiving numbers, strings, byte arrays, structs, enums, and calling Go functions from the C library.

Sources of inspiration and learning:

See the GitHub repository with the CGO examples and compile the code yourself on your machine or inside the Docker environment you’ll find there.

Custom Go package manager

I was working on a Go project with no modules support and introducing the modules was not an option at the time. All dependencies were installed by go get. When breaking changes were introduced in some packages, the builds started failing.

A simple way to quickly fix the issue was to write a very basic package manager which could get dependencies by the specified version (which go get does not support in GOPATH mode).

A package list can be a simple file which declares packages line by line. If no version specified, go get will be used. If a release version, commit hash or branch is specified, a git clone and checkout on the specified version will be invoked.

github.com/gorilla/mux@v1.7.2
github.com/go-playground/validator
github.com/apixu/apixu-go@0fe1e52

I wrote a Bash script which parses the file and handles the process for each of the two cases.

#!/usr/bin/env bash

FILE=$PWD/packages

[[ -z "${GOPATH}" ]] && echo "GOPATH not set" && exit 1;

GOSRC=${GOPATH}/src
[[ ! -d "${GOSRC}" ]] && echo "${GOSRC} directory does not exist" && exit 1;

while IFS= read -r line
do
    IFS=\@ read -a fields <<< "$line"
    pkg=${fields[0]}
    v=${fields[1]}

    echo ${GOSRC}/${pkg}

    if [[ -z ${v} ]]
    then
        go get -u ${pkg}
    else
        PKGSRC=${GOSRC}/${pkg}
        rm -rf ${PKGSRC}
        mkdir -p ${PKGSRC}
        cd ${PKGSRC}
        git clone https://${pkg} .
        git checkout ${v}
        cd - > /dev/null
    fi

    echo
done < "${FILE}"

Updates on PHP performance increase

While waiting for PHP 7.4 production release and its strong type new feature, I was wondering how the performance increased. A while ago I wrote a small stress test which is not the most reliable in all cases, it just points out some major changes that started occurring in PHP 7.

Since that test, PHP 7.3 was released and now there is a release candidate for 7.4. While the memory usage is the same for my test case since 7.0, the execution time looks better with every new version.

#!/usr/bin/env bash

test=$(cat << 'eot'
$time = microtime(true);

$array = [];
for ($i = 0; $i < 10000; $i++) {
    if (!array_key_exists($i, $array)) {
        $array[$i] = [];
    }

    for ($j = 0; $j < 1000; $j++) {
        if (!array_key_exists($j, $array[$i])) {
            $array[$i][$j] = true;
        }
    }
}

echo sprintf(
    "Execution time: %f seconds\nMemory usage: %f MB\n\n",
    microtime(true) - $time,
    memory_get_usage(true) / 1024 / 1024
);
eot
)

versions=( 5.6 7.0 7.1 7.2 7.3 7.4-rc )

for v in "${versions[@]}"
do
    cmd="docker run --rm -ti php:${v}-cli-alpine php -d memory_limit=2048M -r '$test'"
    sh -c "echo ${v} && ${cmd}"
done

Results (ignore the absolute values, just watch the differences):

5.6
Execution time: 4.573347 seconds
Memory usage: 1379.250000 MB

7.0
Execution time: 1.464059 seconds
Memory usage: 360.000000 MB

7.1
Execution time: 1.315205 seconds
Memory usage: 360.000000 MB

7.2
Execution time: 0.653521 seconds
Memory usage: 360.000000 MB

7.3
Execution time: 0.614016 seconds
Memory usage: 360.000000 MB

7.4-rc
Execution time: 0.528052 seconds
Memory usage: 360.000000 MB

Merge JSON arrays by key in PostgreSQL

UPDATE:

While testing more cases, I’ve come up with a solution that covers my needs better. Instead of taking distinct values, I used a FULL JOIN between the two data sets:

SELECT json_agg(data) FROM (
	SELECT coalesce(new.id, old.id) AS id, coalesce(new.value, old.value) AS value
	FROM jsonb_to_recordset('[{"id": 1, "value": "A"}, {"id": 2, "value": "B"}, {"id": 3, "value": "C"}, {"id": 6, "value": "6"}]') AS old(id int, value text)
	FULL JOIN jsonb_to_recordset('[{"id": 1, "value": "NEW A"}, {"id": 2, "value": "NEW B"}, {"id": 5, "value": "5"}]') AS new(id int, value text)
	ON old.id = new.id
) data

Given a JSON column which holds an array of objects, each object having “id” and “value” properties, I wanted to update the array with a new one, by merging the values by the “id” property.
If same id is present in an object of both arrays, the new one will be used.

Old:

[
    {"id": 2, "value": "B"},
    {"id": 1, "value": "A"}
]

New:

[
    {"id": 1, "value": "X"},
    {"id": 3, "value": "C"},
    {"id": 4, "value": "C"}
]

Result:

[
    {"id":1, "value":"X"},
    {"id":2, "value":"B"},
    {"id":3, "value":"C"},
    {"id":4, "value":"C"}
]

And I wanted to do this in one database roundtrip, not take the data in back end, merge it, and send it back to the database.

First, I transformed the array objects into records and prioritized the new array by giving its values a position; and I put both sets together:

SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
UNION ALL
SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new

Then I extracted only the id and the value, the position being used only to sort data:

SELECT id, value FROM (
	SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
	UNION ALL
	SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
) AS ordered ORDER BY position

Next step, I took the distinct rows by id:

SELECT DISTINCT ON (id) id, value FROM (
	SELECT id, value FROM (
		SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
		UNION ALL
		SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
	) AS ordered ORDER BY position
) data

And finally, the result needs to be converted to JSON:

SELECT json_agg(uniqueid) FROM (
	SELECT DISTINCT ON (id) id, value FROM (
		SELECT id, value FROM (
			SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
			UNION ALL
			SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
		) AS ordered ORDER BY position
	) data
) uniqueid