CGO examples

Nothing new or special regarding Golang CGO. I have been exploring it lately and it took me some time to understand its deeper aspects, mostly around passing Go callbacks to C, so I wanted to keep some full examples close at hand.

I created a small repository on GitHub which shows how Go code interacts with a C library (included): passing and receiving numbers, strings, byte arrays, structs, and enums, and calling Go functions from the C library.
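As a quick taste of the callback part, here is a minimal sketch (not the repository's code; the function names are made up): a Go function exported with //export is called from a C function defined in the cgo preamble, and a Go string is copied into C memory before being passed down.

package main

/*
#include <stdio.h>
#include <stdlib.h>

// Declaration of the Go function exported below.
extern void goCallback(int value);

// A C function that prints a string and then calls back into Go.
// It is static so its definition can live in this preamble.
static void callFromC(const char *msg, int value) {
    printf("C received: %s\n", msg);
    goCallback(value);
}
*/
import "C"

import (
    "fmt"
    "unsafe"
)

//export goCallback
func goCallback(value C.int) {
    fmt.Println("Go callback received:", int(value))
}

func main() {
    msg := C.CString("hello from Go") // copies the Go string into C-allocated memory
    defer C.free(unsafe.Pointer(msg)) // C memory is not garbage collected, free it yourself

    C.callFromC(msg, 42)
}

The repository has fuller examples for the other cases (byte arrays, structs, enums).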

See the GitHub repository with the CGO examples and compile the code yourself, either on your machine or inside the Docker environment you’ll find there.

Custom Go package manager

I was working on a Go project without modules support, and introducing modules was not an option at the time. All dependencies were installed with go get. When breaking changes were introduced in some packages, the builds started failing.

A simple way to quickly fix the issue was to write a very basic package manager which could fetch dependencies at a specified version (something go get does not support in GOPATH mode).

A package list can be a simple file which declares packages line by line. If no version is specified, go get is used. If a release tag, commit hash, or branch is specified, the package is cloned with git and the specified version is checked out.

github.com/gorilla/mux@v1.7.2
github.com/go-playground/validator
github.com/apixu/apixu-go@0fe1e52

I wrote a Bash script which parses the file and handles the process for each of the two cases.

#!/usr/bin/env bash

FILE=$PWD/packages

[[ -z "${GOPATH}" ]] && echo "GOPATH not set" && exit 1;

GOSRC=${GOPATH}/src
[[ ! -d "${GOSRC}" ]] && echo "${GOSRC} directory does not exist" && exit 1;

while IFS= read -r line
do
    # Split "package@version" into package name and version.
    IFS=\@ read -a fields <<< "$line"
    pkg=${fields[0]}
    v=${fields[1]}

    echo ${GOSRC}/${pkg}

    if [[ -z ${v} ]]
    then
        # No version specified: plain go get.
        go get -u ${pkg}
    else
        # Version specified: clone the repository and check out that version.
        PKGSRC=${GOSRC}/${pkg}
        rm -rf ${PKGSRC}
        mkdir -p ${PKGSRC}
        cd ${PKGSRC}
        git clone https://${pkg} .
        git checkout ${v}
        cd - > /dev/null
    fi

    echo
done < "${FILE}"

Updates on PHP performance increase

While waiting for the PHP 7.4 production release and its new typed properties feature, I was wondering how performance has improved. A while ago I wrote a small stress test which is not the most reliable in all cases; it just points out some major changes that started with PHP 7.

Since that test, PHP 7.3 was released and now there is a release candidate for 7.4. While memory usage has stayed the same for my test case since 7.0, the execution time looks better with every new version.

#!/usr/bin/env bash

test=$(cat << 'eot'
$time = microtime(true);

$array = [];
for ($i = 0; $i < 10000; $i++) {
    if (!array_key_exists($i, $array)) {
        $array[$i] = [];
    }

    for ($j = 0; $j < 1000; $j++) {
        if (!array_key_exists($j, $array[$i])) {
            $array[$i][$j] = true;
        }
    }
}

echo sprintf(
    "Execution time: %f seconds\nMemory usage: %f MB\n\n",
    microtime(true) - $time,
    memory_get_usage(true) / 1024 / 1024
);
eot
)

# Run the same snippet against each version's official CLI Alpine image.
versions=( 5.6 7.0 7.1 7.2 7.3 7.4-rc )

for v in "${versions[@]}"
do
    cmd="docker run --rm -ti php:${v}-cli-alpine php -d memory_limit=2048M -r '$test'"
    sh -c "echo ${v} && ${cmd}"
done

Results (ignore the absolute values, just watch the differences):

5.6
Execution time: 4.573347 seconds
Memory usage: 1379.250000 MB

7.0
Execution time: 1.464059 seconds
Memory usage: 360.000000 MB

7.1
Execution time: 1.315205 seconds
Memory usage: 360.000000 MB

7.2
Execution time: 0.653521 seconds
Memory usage: 360.000000 MB

7.3
Execution time: 0.614016 seconds
Memory usage: 360.000000 MB

7.4-rc
Execution time: 0.528052 seconds
Memory usage: 360.000000 MB

Merge JSON arrays by key in PostgreSQL

UPDATE:

While testing more cases, I’ve come up with a solution that covers my needs better. Instead of taking distinct values, I used a FULL JOIN between the two data sets:

SELECT json_agg(data) FROM (
	SELECT coalesce(new.id, old.id) AS id, coalesce(new.value, old.value) AS value
	FROM jsonb_to_recordset('[{"id": 1, "value": "A"}, {"id": 2, "value": "B"}, {"id": 3, "value": "C"}, {"id": 6, "value": "6"}]') AS old(id int, value text)
	FULL JOIN jsonb_to_recordset('[{"id": 1, "value": "NEW A"}, {"id": 2, "value": "NEW B"}, {"id": 5, "value": "5"}]') AS new(id int, value text)
	ON old.id = new.id
) data

Given a JSON column which holds an array of objects, each object having “id” and “value” properties, I wanted to update the array with a new one by merging the values on the “id” property.
If the same id is present in an object of both arrays, the new one wins.

Old:

[
    {"id": 2, "value": "B"},
    {"id": 1, "value": "A"}
]

New:

[
    {"id": 1, "value": "X"},
    {"id": 3, "value": "C"},
    {"id": 4, "value": "C"}
]

Result:

[
    {"id":1, "value":"X"},
    {"id":2, "value":"B"},
    {"id":3, "value":"C"},
    {"id":4, "value":"C"}
]

And I wanted to do this in a single database round trip, not fetch the data into the back end, merge it there, and send it back to the database.

First, I transformed the array objects into records, prioritized the new array by giving its rows a lower position, and put both sets together:

SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
UNION ALL
SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new

Then I selected only the id and the value, using the position only to sort the data:

SELECT id, value FROM (
	SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
	UNION ALL
	SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
) AS ordered ORDER BY position

Next, I took the distinct rows by id:

SELECT DISTINCT ON (id) id, value FROM (
	SELECT id, value FROM (
		SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
		UNION ALL
		SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
	) AS ordered ORDER BY position
) data

And finally, the result needs to be converted to JSON:

SELECT json_agg(uniqueid) FROM (
	SELECT DISTINCT ON (id) id, value FROM (
		SELECT id, value FROM (
			SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
			UNION ALL
			SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
		) AS ordered ORDER BY position
	) data
) uniqueid

PostgreSQL: cannot call json_to_recordset on a scalar

I have a nullable JSON column in a PostgreSQL table. The data stored in the column is an array of key/value objects which have to be converted into records at some point. It’s a job for json_to_recordset, which takes a JSON array as input and returns a set of records.

create table people (
	properties json
);

insert into people values (null), ('[{"key": "a"}, {"key": "b"}]');

select * from people;

select key
from
	people,
	json_to_recordset(properties) as properties(key text);

But JSON also allows ‘null’ as a value. And I don’t mean SQL NULL values because the column is nullable; the column actually accepts the string ‘null’, which is recognized as the JSON null value.

insert into people values ('null');

The properties column in the new row is not SQL NULL, so json_to_recordset will now complain about being passed a scalar.

Be careful when updating a JSON column: if you want no value, make sure you really set it to SQL NULL, not to the JSON null value.

Abstract dependencies

Projects depend on packages, internal ones or third-party ones. In all cases, your architecture should dictate what it needs and not evolve around a package; otherwise, swapping the dependency for another package or mocking it in tests can take unnecessary effort and time.

While it’s very easy to include a package in any file you need and start using it, time will show it’s painful to have dependencies spread all over. Someday you’ll want to replace a package because it’s no longer maintained, or you discover security issues that make you lose trust in it, or maybe you just want to experiment with an alternative.

This is where abstraction comes in: it decouples the implementation from your project and sets the rules a dependency must follow. Each package has its own API, so you need to wrap it with your own.

As an example, I’ve chosen to integrate a validation package into the Echo framework. The validator package requires you to tag your structs with its validation rules.
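A minimal sketch of the idea, assuming go-playground/validator as the third-party package (the adapter and struct names below are made up for illustration, and the v10 import path is an assumption): the project owns a small interface, a thin adapter wraps the package, and Echo only ever sees the adapter.

package main

import (
    "net/http"

    "github.com/go-playground/validator/v10"
    "github.com/labstack/echo/v4"
)

// StructValidator is the abstraction owned by the project.
// Handlers and services only ever depend on this interface.
type StructValidator interface {
    Validate(i interface{}) error
}

// playgroundValidator adapts the third-party package to our interface.
type playgroundValidator struct {
    validate *validator.Validate
}

// Compile-time check that the adapter satisfies our abstraction.
var _ StructValidator = (*playgroundValidator)(nil)

func (p *playgroundValidator) Validate(i interface{}) error {
    return p.validate.Struct(i)
}

type user struct {
    Email string `json:"email" validate:"required,email"`
}

func main() {
    server := echo.New()

    // Echo's own Validator interface has the same shape, so the adapter
    // plugs in directly; replacing the underlying package later only
    // means writing another adapter.
    server.Validator = &playgroundValidator{validate: validator.New()}

    server.POST("/users", func(c echo.Context) error {
        u := new(user)
        if err := c.Bind(u); err != nil {
            return err
        }
        if err := c.Validate(u); err != nil {
            return echo.NewHTTPError(http.StatusBadRequest, err.Error())
        }
        return c.JSON(http.StatusCreated, u)
    })

    server.Logger.Fatal(server.Start(":8088"))
}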

Error handling in Echo framework

If misused, error handling and logging in Go can become too verbose and tangled. If the two are decoupled, you can just pass the error forward (adding some context to it if necessary, or using specific error structs) and have it logged or sent to external systems (like New Relic) somewhere else in the code, not at the place where you receive it from a function call.

One of the features I appreciate most in the Echo framework is the HTTPErrorHandler, which can be customized to any need. Combined with the Recover middleware, error management becomes an easy task.

package main

import (
   "errors"
   "fmt"
   "net/http"

   "github.com/labstack/echo/v4"
   "github.com/labstack/echo/v4/middleware"
   "github.com/labstack/gommon/log"
)

func main() {
   server := echo.New()
   server.Use(
      middleware.Recover(),   // Recover from all panics to always have your server up
      middleware.Logger(),    // Log everything to stdout
      middleware.RequestID(), // Generate a request id on the HTTP response headers for identification
   )
   server.Debug = false
   server.HideBanner = true
   server.HTTPErrorHandler = func(err error, c echo.Context) {
      // Take required information from error and context and send it to a service like New Relic
      fmt.Println(c.Path(), c.QueryParams(), err.Error())

      // Call the default handler to return the HTTP response
      server.DefaultHTTPErrorHandler(err, c)
   }

   server.GET("/users", func(c echo.Context) error {
      users, err := dbGetUsers()
      if err != nil {
         return err
      }

      return c.JSON(http.StatusOK, users)
   })

   server.GET("/posts", func(c echo.Context) error {
      posts, err := dbGetPosts()
      if err != nil {
         return err
      }

      return c.JSON(http.StatusOK, posts)
   })

   log.Fatal(server.Start(":8088"))
}

func dbGetUsers() ([]string, error) {
   return nil, errors.New("database error")
}

func dbGetPosts() ([]string, error) {
   panic("an unhandled error occurred")
   return nil, nil
}

Avoid logging inside handler functions:

server.GET("/users", func(c echo.Context) error {
   users, err := dbGetUsers()
   if err != nil {
      log.Errorf("cannot get users: %s", err)
      return err
   }

   return c.JSON(http.StatusOK, users)
})

Keep the handler functions clean: check your errors and pass them forward.

Golang error handling can be clean if you treat it as a decoupled component instead of mixing it with the main logic.
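To illustrate the "specific error structs" idea mentioned above, here is a hedged sketch (the notFoundError type and the route are made up, and errors.As requires Go 1.13+): the handler simply returns a domain error and the central HTTPErrorHandler decides which HTTP status it becomes.

package main

import (
    "errors"
    "fmt"
    "net/http"

    "github.com/labstack/echo/v4"
)

// notFoundError is a hypothetical domain error carrying context about
// what was requested; handlers return it instead of logging it.
type notFoundError struct {
    resource string
    id       int
}

func (e *notFoundError) Error() string {
    return fmt.Sprintf("%s %d not found", e.resource, e.id)
}

func main() {
    server := echo.New()

    // The central error handler maps domain errors to HTTP responses,
    // keeping handlers free of status-code logic.
    server.HTTPErrorHandler = func(err error, c echo.Context) {
        var nf *notFoundError
        if errors.As(err, &nf) {
            err = echo.NewHTTPError(http.StatusNotFound, nf.Error())
        }
        server.DefaultHTTPErrorHandler(err, c)
    }

    server.GET("/users/42", func(c echo.Context) error {
        return &notFoundError{resource: "user", id: 42}
    })

    server.Logger.Fatal(server.Start(":8088"))
}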

Automated Jenkins CI setup for Go projects

I was helping a friend with some tasks on a project, among them code quality. I set up some tools (gometalinter first, then golangci-lint) and integrated them with Travis CI.

Then GitHub started offering free private repositories and my friend made the project private. Travis and other CI systems require paid plans for private repositories, so I decided to set up a Jenkins environment. I had used Jenkins before, but had never configured it myself. I thought it would be a quick click-click install process, but I found myself in front of various plugins, configurations, credentials, and all sorts of requirements.

One of the first things that comes to mind when I have to do something is whether I’ll have to do it again later and whether I should automate the process. And that’s how a pet project was born. My goal was to write a configuration file, install the environment, then configure jobs for the repositories I need.

I created two job templates: one that triggers a build on a branch push, and one for pull requests (the required webhooks are created automatically). Based on the templates, I just wanted to create a new job, insert the GitHub repository URL, and start using the job.

The job templates are aimed at Go projects and run tests and code quality tools (all required tools are installed automatically). Other configured actions include automatic backups and cleanups. You’ll also find some setup scripts for basic server security and requirements.

There are things that could be improved, but for now I have a click-click CI setup called Go Jenkins CI.

Protect Docker secrets files

I have a Docker container managed with Docker Compose, which defines the unless-stopped restart policy. Yet the container never starts after I reboot the machine, although it does restart if I simply restart the Docker service. I have a similar setup on another machine with no issues at all.

I kept searching for the cause until it hit me all of a sudden: I’m using secrets read from files, and the files are in the /tmp directory, which gets cleared on reboot, so the container fails to start.

When I defined the secrets files, I just moved them out of the source code repository without thinking too much about where to put them. I wasn’t even sure I was going to use them for long.

Dev testing

A user story is not done after the code has been written. Neither is the developer’s part. I strongly believe in dev testing: after implementing, the developer should once again read each line of the requirements (they had already done this at least once before implementing, right?!?!) and manually test the feature (even if they’ve written automated tests). At the very least a sanity test, to make sure the QA colleagues won’t get back to them after a few minutes of testing, or every few minutes, with another issue.

Interaction with the QA colleagues should mostly happen during breaks. I want to see those people get bored like hell because all they need to do is check the implementation against the requirements and then get back to chatting because everything is OK. If they come back with special situations, corner cases, and weird exceptions, everyone’s doing a great job.

We’re human, we forget things while developing, and that’s OK. But if the number of things is too big to fit in memory and it grows with every feature, maybe we should read the requirements a few more times and test each one of them. And we should ask questions. The code is done after we understand, implement, and test.

And we should always ask ourselves what happens to the other parts of the app when we make even a small change to old code. Then test. Yes, we, the developers, must test!