postgresql Archives - Andrei Avram

Merge JSON arrays by key in PostgreSQL

UPDATE:

While testing more cases, I’ve come up with a solution that covers my needs better. Instead of taking distinct values, I used a FULL JOIN between the two data sets:

SELECT json_agg(data) FROM (
	SELECT coalesce(new.id, old.id) AS id, coalesce(new.value, old.value) AS value
	FROM jsonb_to_recordset('[{"id": 1, "value": "A"}, {"id": 2, "value": "B"}, {"id": 3, "value": "C"}, {"id": 6, "value": "6"}]') AS old(id int, value text)
	FULL JOIN jsonb_to_recordset('[{"id": 1, "value": "NEW A"}, {"id": 2, "value": "NEW B"}, {"id": 5, "value": "5"}]') AS new(id int, value text)
	ON old.id = new.id
) data

Given a JSON column which holds an array of objects, each object having “id” and “value” properties, I wanted to update the array with a new one, by merging the values by the “id” property.
If same id is present in an object of both arrays, the new one will be used.

Old:

[
    {"id": 2, "value": "B"},
    {"id": 1, "value": "A"}
]

New:

[
    {"id": 1, "value": "X"},
    {"id": 3, "value": "C"},
    {"id": 4, "value": "C"}
]

Result:

[
    {"id":1, "value":"X"},
    {"id":2, "value":"B"},
    {"id":3, "value":"C"},
    {"id":4, "value":"C"}
]

And I wanted to do this in one database roundtrip, not take the data in back end, merge it, and send it back to the database.

First, I transformed the array objects into records and prioritized the new array by giving its values a position; and I put both sets together:

SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
UNION ALL
SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new

Then I extracted only the id and the value, the position being used only to sort data:

SELECT id, value FROM (
	SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
	UNION ALL
	SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
) AS ordered ORDER BY position

Next step, I took the distinct rows by id:

SELECT DISTINCT ON (id) id, value FROM (
	SELECT id, value FROM (
		SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
		UNION ALL
		SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
	) AS ordered ORDER BY position
) data

And finally, the result needs to be converted to JSON:

SELECT json_agg(uniqueid) FROM (
	SELECT DISTINCT ON (id) id, value FROM (
		SELECT id, value FROM (
			SELECT 1 AS position, * FROM json_to_recordset('[{"id": 2, "value": "B"}, {"id": 1, "value": "A"}]') AS src(id int, value text) -- old
			UNION ALL
			SELECT 0 AS position, * FROM json_to_recordset('[{"id": 1, "value": "X"}, {"id": 3, "value": "C"}, {"id": 4, "value": "C"}]') AS dst(id int, value text) -- new
		) AS ordered ORDER BY position
	) data
) uniqueid

PostgreSQL: cannot call json_to_recordset on a scalar

I have a nullable JSON column in a PostgreSQL table. Data stored in the column is an array with key/value objects which have to be converted into records at some point. It’s a job for json_to_recordset, which takes a JSON array as input and returns the records.

create table people (
	properties json
);

insert into people values (null), ('[{"key": "a"}, {"key": "b"}]');

select * from people;

select key
from
	people,
	json_to_recordset(properties) as properties(key text);

But JSON also allows ‘null’ as value. And I don’t mean it allows null values because it’s a nullable column, it actually allows the string ‘null’, which is recognized as a null JSON value.

insert into people values ('null');

The value of the properties column in the new row is not null, so now json_to_recordset will complain about being passed a scalar.

Be careful when updating a JSON column, make sure you really set it to null if you want no value.

PostgreSQL NOTNULL vs IS NOT NULL vs NOT ISNULL

Today I’ve found out a flavor of PostgreSQL when checking if a composite type column is not null. There are different ways of checking for not null columns:

NOTNULL
IS NOT NULL
NOT ISNULL

For primitive values any of the above works, while for the composite types there is a particularity.

Given a table of users with a deleted column of a composite type action, and some users marked as deleted, I need all users that are deleted. Continue reading PostgreSQL NOTNULL vs IS NOT NULL vs NOT ISNULL

A few things I’ve enjoyed reading lately

PostgreSQL batch operations in Go

Consider the following case: When creating a user (database insert) with their profile (another insert), other users must be updated (database update) with a new score value. Score is just a float for which a dummy formula will be used. And then an action record is needed (insert), which marks the fact that a user was created.

The tech context is PostgreSQL in Go with pgx as database driver and Echo framework for the HTTP server. The database setup is straight forward using Docker; it also includes a database management interface which will be available at http://localhost:54321. If you clone the sample repository, and start the setup with Docker Compose (docker compose up -d), when the PostgreSQL Docker container is built, a database is created with the schema used in this post.

CREATE TABLE "users" (
  "id" serial NOT NULL,
  "username" CHARACTER VARYING (100) NOT NULL,
  "score" DECIMAL NOT NULL DEFAULT 0,
  "created" TIMESTAMP(0) WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "updated" TIMESTAMP(0) WITH TIME ZONE
);

CREATE TABLE "user_profile" (
  "user_id" INTEGER NOT NULL,
  "firstname" CHARACTER VARYING (100) NOT NULL,
  "lastname" CHARACTER VARYING (100) NOT NULL
);

CREATE TABLE "actions" (
  "id" serial NOT NULL,
  "description" text NOT NULL,
  "created" TIMESTAMP(0) WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Data integrity is of interest, so all the queries will be sent on a database transaction. And because there are multiple user update queries, they will be sent all at the same time in a batch of operations. Continue reading PostgreSQL batch operations in Go