PostgreSQL’s coolest features and how you can apply them

A popular database management system, PostgreSQL is an open source and robust piece of software that is also very versatile. During a recent brown bag session, Adrian Serafin introduced us to some of his favorite features in PostgreSQL with working examples of how to apply them.

Here are the features Adrian covered:

Table inheritance

A relatively new feature of PostgreSQL, introduced in version 9.4, is the table inheritance feature, offering some interesting options for database design. Let’s say you’re working on a piece of software to track employee time across different projects and you create a database of all the employees in a company. However, the company also has contractors who need to be included in the system. You might want to display similar information on both full-time employees and contractors, but with an additional column for invoices for contractors. With table inheritance, your contractors table can inherit details from the original employees table while having an additional column.


create table employees (
  name varchar,
  email varchar
);

create table contractors (
  invoice_prefix varchar
) inherits (employees);

insert into employees values ('Jonh Employee', 'employee@example.com');
insert into contractors values('John Contractor', 'contractor@example.com', 'C');

In this example, the contractor table inherits details from the original but with an additional invoice_prefix column.

Now, to list all employees of a company you would just do a simple query:


select * from employees;

      name       |         email
-----------------+------------------------
 John Employee   | employee@example.com
 John Contractor | contractor@example.com

This then shows all the records from both the employees and contractors tables.

However, to present only information from the contractors table you can specify:



select * from contractors;

      name       |         email          | invoice_prefix
-----------------+------------------------+----------------
 John Contractor | contractor@example.com | C

However, what if you just want to present information on the employees and not contractors? There is a simple fix using one additional word: only.


select * from only employees

     name      |        email
---------------+----------------------
 John Employee | employee@example.com

As you can see, this then just displays the employees minus the contractors. Pretty nifty eh?

Foreign data wrappers

The next useful feature identified by Adrian is the ability for PostgreSQL to access data from other sources. Known as ‘foreign data wrappers’, this feature means that you can query data from other systems: a remote SQL server, NoSQL server or even CSV, XML or JSON files amongst many others

As an example, if you imagine you’re building a new system in Rails using PostgreSQL and you need to connect to a legacy database that is in MySQL to access some data. Using foreign data wrappers means that you don’t need to expose your new application to the old database, instead making your application think it’s dealing with the same database. First, you’ll need to install a driver, which in this case is for MySQL, and enable the extension:


CREATE EXTENSION IF NOT EXISTS mysql_fdw;
You then need to let PostgreSQL know how to connect to the specified server:

CREATE SERVER local_mysql_server
     FOREIGN DATA WRAPPER mysql_fdw
     OPTIONS (host '127.0.0.1', port '3306');

CREATE USER MAPPING FOR aserafin
SERVER local_mysql_server
OPTIONS (username 'aserafin', password 'test123test');

And then create mapping for a table on your source database:


CREATE FOREIGN TABLE mysql_users(
     name varchar,
     email varchar)
SERVER local_mysql_server
     OPTIONS (dbname 'postgresql_bb', table_name 'users');

Once this is set up, you’re then able to query information from the table in the source location, as if it were just another local table.


select * from mysql_users;

We can then update, insert and modify data on the MySQL server from within PostgreSQL.


insert into mysql_users values('John Smith', 'smith@example.com');
udpate mysql_users set name = 'Mark Smith' where email = 'smith@example.com';

We can also use external tables in join queries:


select * from mysql_users, invoices where mysql_users.email = invoices.email

The performance of select operations depends on the wrapper implementation. For example the MySQL driver implementation is passing the where conditions in the command above to the MySQL server and fetching back already filtered data. But this may not be the case for other drivers.

You should always check the driver behaviour, be aware of it’s performance limitations and run your own tests to avoid issues with production.

View the full list of available data wrappers here.

Common table expressions

CTE, or common table expressions provide a way to extract statements for a larger query into a named subquery. This makes them easier to read and understand. They can be thought of as temporary tables that exist just for this specific query. It works together with a select statement to display information from a database as required.

Let’s assume we have a table with a product, and we want to generate a weekly sales report. First we create the table:


create table if not exists sales
(
  date date,
  sku_code varchar,
  quantity integer,
  price numeric,
  id serial not null,
  constraint sales_pkey primary key (id)
);

We then want to generate a sales report, even though on some days there have been no sales. First of all, we want to create the time frame that the sales will be measured against, in this case days of the week. This is done using the generate_series function:


select date(day) from generate_series('2019-01-01', '2019-01-07', interval '1 day') as day;

    day
------------
 2019-01-01
 2019-01-02
 2019-01-03
 2019-01-04
 2019-01-05
 2019-01-06
 2019-01-07

Having generated our table, we can use this query as a CTA and left join the sales table to it:


with days as (
  select date(day) as day from generate_series('2019-01-01', '2019-01-07', interval '1 day') as day
)
select days.day, coalesce(sales.quantity * sales.price, 0) as total from days left join sales on (days.day = sales.date);

    day     | total
------------+-------
 2019-01-01 |  1000
 2019-01-02 |   100
 2019-01-03 |   300
 2019-01-04 |     0
 2019-01-05 |   300
 2019-01-06 |     0
 2019-01-07 |     0

Using common table expressions, we have specified ‘days’ at the top which has helped to present the query in a more readable way.

As Adrian noted in his presentation, the CTEs could become a double edged sword. The CTE data is evaluated only one per parent query execution, which on one hand allows us to cache expensive calculations. But on the other hand it can force PostgreSQL to materialise the query resulting in much worse performance.

Window Functions

Window function is used to perform a calculation on a set of records that are somehow related to the current record. In a way this is similar to what aggregate functions do (like max, min, sum etc) and in fact almost all aggregate functions can be used in a window context. The key difference though is that using a window function does not cause rows to be grouped into a single output row like the aggregate functions do. Instead every row will retain its individual identity.

With the exception of standard functions like min, max, sum etc, this feature provides additional functionality - for example presenting data that is sensitive to row ordering like first, last, rank etc.

Take for example organising a running event, and you have the times for each participant stored in a table including their age category.


create table participants (
  name varchar,
  finished_at timestamp with time zone,
  age_category varchar
);

If you want to display the winners of each category, by displaying their finishing time and age category, you would use window functions.


with ranked_participants as (
  select
    name,
    finished_at,
    age_category,
    rank() over w
  from
    participants
  window w as (partition by age_category order by finished_at asc)
)
select * from ranked_participants where rank = 1;

 name |      finished_at       | age_category | rank
------+------------------------+--------------+------
 Jane | 2019-01-01 08:01:12+01 | adults       |    1
 John | 2019-01-01 08:00:12+01 | kids         |    1

There are lots of different applications for window functions, which you can view here

Storing or querying JSON

PostgreSQL also allows you to store JSON, which is stored as plain text making it faster to insert but slower to modify. However, you can also store in JSONB format, which is the binary representation of JSON which is slower to modify but faster to query.

As an example, we will create an orders table, but instead of having a separate table for order items we will create a jsonb column info in the orders table that will contain order items among some other data.


CREATE TABLE IF NOT EXISTS orders (
  id integer,
  info jsonb
);

insert into orders values(1, '{"price": 100, "tax": 10, "items": [{"sku_code": "A001", "quantity": 1}, {"sku_code": "A003", "quantity": 2}]}');
insert into orders values(2, '{"price": 200, "tax": 20, "items": [{"sku_code": "A002", "quantity": 5}]}');

We can select just one property of the info column as json


select info->'price' as price, info->'tax' as tax from orders;

 price | tax
-------+-----
 100   | 10
 200   | 20

Or as text


select info->>'price' as price, info->>'tax' as tax from orders;

 price | tax
-------+-----
 100   | 10
 200   | 20

We can also select the first item of each order


select info#>'{items, 0}' as item from orders;

                item
-------------------------------------
 {"quantity": 1, "sku_code": "A001"}
 {"quantity": 5, "sku_code": "A002"}

We can select all sku_codes for every order


select orders.id, array_agg(items.value->>'sku_code') from orders, jsonb_array_elements(orders.info->'items') as items group by orders.id;

 id |  array_agg
----+-------------
  1 | {A001,A003}
  2 | {A002}

We can see how many items are in every order


select id, jsonb_array_length(info->'items') from orders;

 id | jsonb_array_length
----+--------------------
  1 |                  2
  2 |                  1

We can also select all orders containing some sku_code


select * from orders where info->'items' @> '[{"sku_code": "A001"}]'

id |                                                      info
----+----------------------------------------------------------------------------------------------------------------
  1 | {"tax": 10, "items": [{"quantity": 1, "sku_code": "A001"}, {"quantity": 2, "sku_code": "A003"}], "price": 100}

We can sum quantity of items in every order


select sum(cast(items.value->>'quantity' as integer)) from orders, jsonb_array_elements(orders.info->'items') as items group by orders.id;


 sum
-----
   3
   5

Adrian recommends using JSONB due to it’s superior querying and indexing capabilities. As Adrian points out, storing as JSON limits your potential to edit and change things in the future.

Generating JSON

As well as storing JSON data on the database, you can also return JSON. Adrian explains how you can use row_to_json together with the function json_agg to transform the results of the runners query from the window functions example to json.


with ranked_participants as (
  select
    name,
    finished_at,
    age_category,
    rank() over w
  from
    participants
  window w as (partition by age_category order by finished_at asc )
)
select json_agg(row_to_json(ranked_participants.*))  from ranked_participants where rank = 1;

ON Conflict

Another of the neat features of PostgreSQL include ON conflict, or upsert, which is another new addition. Currently only available for users of PostgreSQL 9.5 and up, ON Conflict allows you to avoid raising an exception and performing a different action in the event that a row already exists in a given table.

If for example you have a table which includes a unique email column, like this:


create table users (
  email varchar UNIQUE,
  duplicated_attempts integer not null default 0
);


insert into users values('test@email.com');

You can create an upsert in ON Conflict which means that if a user with a duplicate email is added to the table, rather than raising an exception it simply increases the duplicated attempts counter.


insert into users values('test@email.com') on conflict (email) do update duplicates_counter = duplicates_counter + 1;
insert into users values('test@email.com') on conflict (email) do update set duplicated_attempts = users.duplicated_attempts + 1;

select * from users;

     email      | duplicated_attempts
----------------+---------------------
 test@email.com |                   2

There is also support for custom functions in plpgSQL and other languages, such as pl-ruby.

Adrian also highlights the listen/notify feature which allows you to send messages to clients in case of updates or changes to the data.

For extensive tables there is also an option to partition tables and divide up data as needed. If you want to separate data into specific tables, such as monthly ranges or client types, then you are able to create segregated tables.

The Wrap up

Because it’s open source, PostgreSQL is always evolving with new features being added continuously. You can find their latest release information right on their homepage , which also provides links to all their documentation and tutorials.