Merge JSON arrays by key in PostgreSQL

UPDATE:

While testing more cases, I’ve come up with a solution that covers my needs better. Instead of taking distinct values, I used a FULL JOIN between the two data sets:

SELECT json_agg(data) FROM (
	SELECT coalesce(new.id, old.id) AS id, coalesce(new.value, old.value) AS value
	FROM jsonb_to_recordset('[{"id": 1, "value": "A"}, {"id": 2, "value": "B"}, {"id": 3, "value": "C"}, {"id": 6, "value": "6"}]') AS old(id int, value text)
	FULL JOIN jsonb_to_recordset('[{"id": 1, "value": "NEW A"}, {"id": 2, "value": "NEW B"}, {"id": 5, "value": "5"}]') AS new(id int, value text)
	ON old.id = new.id
) data

Given a JSON column which holds an array of objects, each object having “id” and “value” properties, I wanted to update the array with a new one, by merging the values by the “id” property.
If same id is present in an object of both arrays, the new one will be used.

Old:

[
    {"id": 2, "value": "B"},
    {"id": 1, "value": "A"}
]

New:

[
    {"id": 1, "value": "X"},
    {"id": 3, "value": "C"},
    {"id": 4, "value": "C"}
]

Result:

[
    {"id":1, "value":"X"},
    {"id":2, "value":"B"},
    {"id":3, "value":"C"},
    {"id":4, "value":"C"}
]

And I wanted to do this in one database roundtrip, not take the data in back end, merge it, and send it back to the database.

First, I transformed the array objects into records and prioritized the new array by giving its values a position; and I put both sets together:

SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
UNION ALL
SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new

Then I extracted only the id and the value, the position being used only to sort data:

SELECT id, value FROM (
	SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
	UNION ALL
	SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
) AS ordered ORDER BY position

Next step, I took the distinct rows by id:

SELECT DISTINCT ON (id) id, value FROM (
	SELECT id, value FROM (
		SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
		UNION ALL
		SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
	) AS ordered ORDER BY position
) data

And finally, the result needs to be converted to JSON:

SELECT json_agg(uniqueid) FROM (
	SELECT DISTINCT ON (id) id, value FROM (
		SELECT id, value FROM (
			SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
			UNION ALL
			SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
		) AS ordered ORDER BY position
	) data
) uniqueid

PostgreSQL: cannot call json_to_recordset on a scalar

I have a nullable JSON column in a PostgreSQL table. Data stored in the column is an array with key/value objects which have to be converted into records at some point. It’s a job for json_to_recordset, which takes a JSON array as input and returns the records.

create table people (
	properties json
);

insert into people values (null), ('[{"key": "a"}, {"key": "b"}]');

select * from people;

select key
from
	people,
	json_to_recordset(properties) as properties(key text);

But JSON also allows ‘null’ as value. And I don’t mean it allows null values because it’s a nullable column, it actually allows the string ‘null’, which is recognized as a null JSON value.

insert into people values ('null');

The value of the properties column in the new row is not null, so now json_to_recordset will complain about being passed a scalar.

Be careful when updating a JSON column, make sure you really set it to null if you want no value.

Encrypt column in PostgreSQL

PostgreSQL has a nice encryption (and hashing) module called pgcrypto which is easy to use. Using a provided key, you can quickly encrypt a column which contains sensitive information.

While the setup is fast and the usage is simple, there could be some disadvantages in some contexts:

  • be careful how you send the encryption key to the database server (if public, use SSL for transport, else keep it in a private network at least)
  • data is encrypted/decrypted in the database, so the transport is in plain (watch out for memory dump attacks)
  • some queries are very slow, as the decrypt operation is performed on the entire table if you want to sort or filter by encrypted columns

Continue reading Encrypt column in PostgreSQL

PostgreSQL NOTNULL vs IS NOT NULL vs NOT ISNULL

Today I’ve found out a flavor of PostgreSQL when checking if a composite type column is not null. There are different ways of checking for not null columns:

  • NOTNULL
  • IS NOT NULL
  • NOT ISNULL

For primitive values any of the above works, while for the composite types there is a particularity.

Given a table of users with a deleted column of a composite type action, and some users marked as deleted, I need all users that are deleted. Continue reading PostgreSQL NOTNULL vs IS NOT NULL vs NOT ISNULL

PostgreSQL batch operations in Go

Consider the following case: When creating a user (database insert) with their profile (another insert), other users must be updated (database update) with a new score value. Score is just a float for which a dummy formula will be used. And then an action record is needed (insert), which marks the fact that a user was created.

The tech context is PostgreSQL in Go with pgx as database driver and Echo framework for the HTTP server. The database setup is straight forward using Docker; it also includes a database management interface which will be available at http://localhost:54321. If you clone the sample repository, and start the setup with Docker Compose (docker compose up -d), when the PostgreSQL Docker container is built, a database is created with the schema used in this post.

CREATE TABLE "users" (
  "id" serial NOT NULL,
  "username" CHARACTER VARYING (100) NOT NULL,
  "score" DECIMAL NOT NULL DEFAULT 0,
  "created" TIMESTAMP(0) WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "updated" TIMESTAMP(0) WITH TIME ZONE
);

CREATE TABLE "user_profile" (
  "user_id" INTEGER NOT NULL,
  "firstname" CHARACTER VARYING (100) NOT NULL,
  "lastname" CHARACTER VARYING (100) NOT NULL
);

CREATE TABLE "actions" (
  "id" serial NOT NULL,
  "description" text NOT NULL,
  "created" TIMESTAMP(0) WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Data integrity is of interest, so all the queries will be sent on a database transaction. And because there are multiple user update queries, they will be sent all at the same time in a batch of operations. Continue reading PostgreSQL batch operations in Go