PostgreSQL’s coolest features and how you can apply them
A popular database management system, PostgreSQL is an open source and robust piece of software that is also very versatile. During a recent brown bag session, Adrian Serafin introduced us to some of his favorite features in PostgreSQL with working examples of how to apply them.
Here are the features Adrian covered:
- Table inheritance
- Foreign data wrappers
- Common table expressions
- Window functions
- Storing or querying JSON
- Generating JSON
- ON Conflict
A relatively new feature of PostgreSQL, introduced in version 9.4, is the table inheritance feature, offering some interesting options for database design. Let’s say you’re working on a piece of software to track employee time across different projects and you create a database of all the employees in a company. However, the company also has contractors who need to be included in the system. You might want to display similar information on both full-time employees and contractors, but with an additional column for invoices for contractors. With table inheritance, your contractors table can inherit details from the original employees table while having an additional column.
createtable employees (
name varchar,
email varchar
);
createtable contractors (
invoice_prefix varchar
) inherits (employees);
insertinto employees values ('Jonh Employee', 'employee@example.com');
insertinto contractors values('John Contractor', 'contractor@example.com', 'C');
In this example, the contractor table inherits details from the original but with an additional invoice_prefix column.
Now, to list all employees of a company you would just do a simple query:
select*from employees;
name | email
-----------------+------------------------
John Employee | employee@example.com
John Contractor | contractor@example.com
This then shows all the records from both the employees and contractors tables.
However, to present only information from the contractors table you can specify:
select*from contractors;
name | email | invoice_prefix
-----------------+------------------------+----------------
John Contractor | contractor@example.com |C
However, what if you just want to present information on the employees and not contractors? There is a simple fix using one additional word: only.
select*fromonly employees
name | email
---------------+----------------------
John Employee | employee@example.com
As you can see, this then just displays the employees minus the contractors. Pretty nifty eh?
The next useful feature identified by Adrian is the ability for PostgreSQL to access data from other sources. Known as ‘foreign data wrappers’, this feature means that you can query data from other systems: a remote SQL server, NoSQL server or even CSV, XML or JSON files amongst many others
As an example, if you imagine you’re building a new system in Rails using PostgreSQL and you need to connect to a legacy database that is in MySQL to access some data. Using foreign data wrappers means that you don’t need to expose your new application to the old database, instead making your application think it’s dealing with the same database. First, you’ll need to install a driver, which in this case is for MySQL, and enable the extension:
CREATE EXTENSION IFNOTEXISTS mysql_fdw;
You then need to let PostgreSQL know how toconnectto the specified server:
CREATE SERVER local_mysql_server
FOREIGNDATA WRAPPER mysql_fdw
OPTIONS (host'127.0.0.1', port '3306');
CREATEUSER MAPPING FOR aserafin
SERVER local_mysql_server
OPTIONS (username 'aserafin', password 'test123test');
And then create mapping for a table on your source database:
CREATEFOREIGNTABLE mysql_users(
name varchar,
email varchar)
SERVER local_mysql_server
OPTIONS (dbname 'postgresql_bb', table_name'users');
Once this is set up, you’re then able to query information from the table in the source location, as if it were just another local table.
select*from mysql_users;
We can then update, insert and modify data on the MySQL server from within PostgreSQL.
insertinto mysql_users values('John Smith', 'smith@example.com');
udpate mysql_users set name ='Mark Smith'where email ='smith@example.com';
We can also use external tables in join queries:
select*from mysql_users, invoices where mysql_users.email = invoices.email
The performance of select operations depends on the wrapper implementation. For example the MySQL driver implementation is passing the where conditions in the command above to the MySQL server and fetching back already filtered data. But this may not be the case for other drivers.
You should always check the driver behaviour, be aware of it’s performance limitations and run your own tests to avoid issues with production.
View the full list of available data wrappers here.
CTE, or common table expressions provide a way to extract statements for a larger query into a named subquery. This makes them easier to read and understand. They can be thought of as temporary tables that exist just for this specific query. It works together with a select statement to display information from a database as required.
Let’s assume we have a table with a product, and we want to generate a weekly sales report. First we create the table:
createtableifnotexists sales
(
date date,
sku_code varchar,
quantity integer,
price numeric,
id serial notnull,
constraint sales_pkey primarykey (id)
);
We then want to generate a sales report, even though on some days there have been no sales. First of all, we want to create the time frame that the sales will be measured against, in this case days of the week. This is done using the generate_series function:
select date(day) from generate_series('2019-01-01', '2019-01-07', interval '1 day') asday;
day------------
2019-01-012019-01-022019-01-032019-01-042019-01-052019-01-062019-01-07
Having generated our table, we can use this query as a CTA and left join the sales table to it:
with days as (
select date(day) asdayfrom generate_series('2019-01-01', '2019-01-07', interval '1 day') asday
)
select days.day, coalesce(sales.quantity * sales.price, 0) as total from days leftjoin sales on (days.day= sales.date);
day| total
------------+-------
2019-01-01|10002019-01-02|1002019-01-03|3002019-01-04|02019-01-05|3002019-01-06|02019-01-07|0
Using common table expressions, we have specified ‘days’ at the top which has helped to present the query in a more readable way.
As Adrian noted in his presentation, the CTEs could become a double edged sword. The CTE data is evaluated only one per parent query execution, which on one hand allows us to cache expensive calculations. But on the other hand it can force PostgreSQL to materialise the query resulting in much worse performance.
Window function is used to perform a calculation on a set of records that are somehow related to the current record. In a way this is similar to what aggregate functions do (like max, min, sum etc) and in fact almost all aggregate functions can be used in a window context. The key difference though is that using a window function does not cause rows to be grouped into a single output row like the aggregate functions do. Instead every row will retain its individual identity.
With the exception of standard functions like min, max, sum etc, this feature provides additional functionality - for example presenting data that is sensitive to row ordering like first, last, rank etc.
Take for example organising a running event, and you have the times for each participant stored in a table including their age category.
createtable participants (
name varchar,
finished_at timestampwith time zone,
age_category varchar
);
If you want to display the winners of each category, by displaying their finishing time and age category, you would use window functions.
with ranked_participants as (
select
name,
finished_at,
age_category,
rank() over w
from
participants
window w as (partition by age_category orderby finished_at asc)
)
select*from ranked_participants where rank =1;
name | finished_at | age_category | rank
------+------------------------+--------------+------
Jane |2019-01-0108:01:12+01| adults |1
John |2019-01-0108:00:12+01| kids |1
There are lots of different applications for window functions, which you can view here
As an example, we will create an orders table, but instead of having a separate table for order items we will create a jsonb column info in the orders table that will contain order items among some other data.
CREATETABLEIFNOTEXISTS orders (
id integer,
info jsonb
);
insertinto orders values(1, '{"price": 100, "tax": 10, "items": [{"sku_code": "A001", "quantity": 1}, {"sku_code": "A003", "quantity": 2}]}');
insertinto orders values(2, '{"price": 200, "tax": 20, "items": [{"sku_code": "A002", "quantity": 5}]}');
We can select just one property of the info column as json
select info->'price'as price, info->'tax'as tax from orders;
price | tax
-------+-----
100|10200|20
Or as text
select info->>'price'as price, info->>'tax'as tax from orders;
price | tax
-------+-----
100|10200|20
We can also select the first item of each order
select info#>'{items, 0}'as item from orders;
item
-------------------------------------
{"quantity": 1, "sku_code": "A001"}{"quantity": 5, "sku_code": "A002"}
We can select all sku_codes
for every order
select orders.id, array_agg(items.value->>'sku_code') from orders, jsonb_array_elements(orders.info->'items') as items groupby orders.id;
id | array_agg
----+-------------
1|{A001,A003}2|{A002}
We can see how many items are in every order
select id, jsonb_array_length(info->'items') from orders;
id | jsonb_array_length
----+--------------------
1|22|1
We can also select all orders containing some sku_code
select*from orders where info->'items'@>'[{"sku_code": "A001"}]'
id | info
----+----------------------------------------------------------------------------------------------------------------
1|{"tax": 10, "items": [{"quantity": 1, "sku_code": "A001"}, {"quantity": 2, "sku_code": "A003"}], "price": 100}
We can sum quantity of items in every order
selectsum(cast(items.value->>'quantity'as integer)) from orders, jsonb_array_elements(orders.info->'items') as items groupby orders.id;
sum-----
35
Adrian recommends using JSONB due to it’s superior querying and indexing capabilities. As Adrian points out, storing as JSON limits your potential to edit and change things in the future.
As well as storing JSON data on the database, you can also return JSON. Adrian explains how you can use row_to_json together with the function json_agg to transform the results of the runners query from the window functions example to json.
with ranked_participants as (
select
name,
finished_at,
age_category,
rank() over w
from
participants
window w as (partition by age_category orderby finished_at asc )
)
select json_agg(row_to_json(ranked_participants.*)) from ranked_participants where rank =1;
Another of the neat features of PostgreSQL include ON conflict, or upsert, which is another new addition. Currently only available for users of PostgreSQL 9.5 and up, ON Conflict allows you to avoid raising an exception and performing a different action in the event that a row already exists in a given table.
If for example you have a table which includes a unique email column, like this:
createtable users (
email varchar UNIQUE,
duplicated_attempts integer notnulldefault0
);
insertinto users values('test@email.com');
You can create an upsert in ON Conflict which means that if a user with a duplicate email is added to the table, rather than raising an exception it simply increases the duplicated attempts counter.
insertinto users values('test@email.com') on conflict (email) doupdate duplicates_counter = duplicates_counter +1;
insertinto users values('test@email.com') on conflict (email) doupdateset duplicated_attempts = users.duplicated_attempts +1;
select*from users;
email | duplicated_attempts
----------------+---------------------
test@email.com |2
There is also support for custom functions in plpgSQL and other languages, such as pl-ruby.
Adrian also highlights the listen/notify feature which allows you to send messages to clients in case of updates or changes to the data.
For extensive tables there is also an option to partition tables and divide up data as needed. If you want to separate data into specific tables, such as monthly ranges or client types, then you are able to create segregated tables.