Setting up a Docker environment
The easiest way to get up and running with these simple hacks is to install Docker and run a postgres container, we can then connect to the container using a simple IDE - personally I use PSequel, its free and easy to use with my Mac.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
export POSTGRES_USER="user"
export POSTGRES_PASSWORD="password123"
export POSTGRES_PORT=5432
docker run -p $POSTGRES_PORT:5432 -d --name postgres -e POSTGRES_USER=$POSTGRES_USER -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD postgres
echo "
Connection to postgres:
Host: localhost
Port: $POSTGRES_PORT
User: $POSTGRES_USER
Password: $POSTGRES_PASSWORD
"
|
Postgres hacks
1 - plpgsql function for when GreenPlum doesn't like Postgres
While developing a pipeline we wanted to keep all of our hard codes dependencies in one table which we could access as developers and the business could access through automated documentation that is produced from the table itself.
This way we can have a quality assurance that goes along with our output and make the development more transparent. To do this we thought we should make some useful SQL functions that make the script less verbose and save our fingers!
First we created a table to accept the data we wanted to store in it:
1
2
3
4
5
6
7
|
DROP TABLE IF EXISTS dependancies;
CREATE TABLE dependancies (
name text;
descr text;
codes text[];
codes_descr text[];
);
|
The dummytable we will use in this example is:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
-- Creating a table for our dummy data
DROP TABLE IF EXISTS dummytable;
CREATE TABLE dummytable (
value TEXT,
code TEXT,
descr TEXT
);
--Adding some dummy data into the table
INSERT INTO dummytable (value, code, descr) VALUES
('CHI', 'ANIMAL', 'Chickens are on the run'),
('CHI', 'FOOD', 'Chick pea curry'),
('BA', 'FOOD', 'Bananas'),
('BA', 'ANIMAL', 'The bats have escaped'),
('ORA', 'FOOD', 'Oranges');
|
The initial function was as follows:
1
2
3
4
5
6
7
8
9
10
|
CREATE or replace FUNCTION dep_record(name text, descr text, codes text[])
RETURNS TABLE(name text, descr text, codes text[], codes_descr text[])
AS
$$
SELECT $1::text as name, $2::text as descr, array_agg(t1.value::text) as codes, array_agg(t2.descr::text) as codes_descr
FROM (SELECT unnest($3) as value) t1
LEFT JOIN dummytable t2
ON t1.value=t2.value AND t2.code = 'ANIMAL'
$$
LANGUAGE SQL;
|
Now if we try to use this function (using GreenPlum):
1
2
|
INSERT INTO dependancies(name, descr, codes, codes_descr)
SELECT * FROM dep_record('animal_codes','Codes of the animals used in the Zoo pipeline', ARRAY['CHI', 'BA', 'COW', 'MON']);
|
We would expect this table:
name |
descr |
codes |
codes_descr |
animal_codes |
Codes of the animals used in the Zoo pipeline |
{"CHI", "BA", "COW", "MON"} |
{"Chickens are on the run", "The bats have escaped", NULL, NULL} |
But instead we get an error:
1
|
ERROR function cant execute on segment because it accesses relation "dummytable"
|
Solution
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
CREATE or replace FUNCTION dep_record(nsname text, nsdescr text, nscodes text[])
RETURNS void
AS
$$
BEGIN
INSERT INTO dependancies(name, descr, codes, codes_descr)
SELECT nsname::text as name, nsdescr::text as descr, array_agg(t1.value::text) as codes, array_agg(t2.descr::text) as codes_descr
FROM (SELECT unnest(nscodes) as value) t1
LEFT JOIN dummytable t2
ON t1.value=t2.value AND t2.code = 'ANIMAL';
RETURN;
END;
$$
LANGUAGE plpgsql;
-- How to use:
SELECT dep_record('animal_codes','Codes of the animals used in the Zoo pipeline', ARRAY['CHI', 'BA', 'COW', 'MON']);
SELECT * FROM dependancies;
|
Now we get the required solution, simply by adding the insert into the function!
name |
descr |
codes |
codes_descr |
animal_codes |
Codes of the animals used in the Zoo pipeline |
{"CHI", "BA", "COW", "MON"} |
{"Chickens are on the run", "The bats have escaped", NULL, NULL} |
2 ) One case statement with multiple columns on output
Say for example, we have a single condition where we want to display information in our table as follows:
A timestamp of the application, a description of the application outcome and the number of days since that outcome was decided. This is easy enough to do in something like Python where we would just give a conditional statement and make all the changes we needed to for multiple columns. However in Postgres this is a little tricky, the solution comes in the form of making our own data type with the data types of the new columns we want to output as a record type.
So for our example, we would want a new data type that holds a timestamp, an integer and text. We can our new data type a very imaginitive name of floatytextdate. Lets also make some dummy data to do with applications of visas and passports:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
-- Creating our new data type
DROP TYPE IF EXISTS floatytextdate;
CREATE TYPE floatytextdate AS (t1 INTEGER, t2 TEXT, t3 TIMESTAMP);
-- Creating a table for our dummy data
DROP TABLE IF EXISTS application_details;
CREATE TABLE application_details (
dte_of_app TIMESTAMP,
exp_dte_of_app TIMESTAMP,
type_of_app TEXT
);
--Adding some dummy data into the table
INSERT INTO application_details (dte_of_app, exp_dte_of_app, type_of_app) VALUES
('2003-2-1', '2013-2-1', 'Passport'),
('2017-5-23', '2027-5-23', 'Passport'),
('2012-9-16', '2022-9-16', 'Passport'),
('2003-2-1', '2013-2-1', 'Visa'),
('2017-5-23', '2027-5-23', 'Visa');
|
Now time for the cheeky hack!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
DROP TABLE IF EXISTS application_verbose;
CREATE TABLE application_verbose AS
SELECT (x.temp).t1 AS days_since, (x.temp).t2 AS days_since_reason, (x.temp).t3 AS relevent_date
FROM (
SELECT CASE
WHEN type_of_app = 'Passport' AND exp_dte_of_app < now() THEN (date_part('day', now() - exp_dte_of_app), 'Days since most recent passport expired', exp_dte_of_app)::floatytextdate WHEN type_of_app = 'Passport' AND exp_dte_of_app >= now()
THEN (date_part('day', now() - dte_of_app), 'Days since most recent passport has been valid', dte_of_app)::floatytextdate
WHEN type_of_app = 'Visa' AND exp_dte_of_app < now() THEN (date_part('day', now() - exp_dte_of_app), 'Days since most recent visa expired', exp_dte_of_app)::floatytextdate WHEN type_of_app = 'Visa' AND exp_dte_of_app >= now()
THEN (date_part('day', now() - dte_of_app), 'Days since most recent visa has been valid', dte_of_app)::floatytextdate
END AS temp
FROM application_details
) x;
|
Notice how we only used a single case statement but ended up with three columns with different data types outputed! Pretty nifty.
Result
days_since |
days_since_reason |
relevent_date |
251 |
Days since most recent passport has been valid |
2017-05-23 00:00:00 |
251 |
Days since most recent visa has been valid |
2017-05-23 00:00:00 |
1823 |
Days since most recent passport expired |
2013-02-01 00:00:00 |
1823 |
Days since most recent visa expired |
2013-02-01 00:00:00 |
1961 |
Days since most recent passport has been valid |
2012-09-16 00:00:00 |