Some time ago I was involved in a project aiming to systemize a pandemonium of databases. Especially hard were the toponym ones. The source data was a deep XML tree with regions and cities at the top and streets and houses at the bottom.

| id         | parent_id  | name       |
| ---------- |------------|------------|
| 1          | 2          | Kremlin    |
| 2          |            | Moscow     |
| 3          | 2          | Red Square |
| …   (1M+ unsorted records)         … |

I had to put this tree into PostgreSQL to allow queries. A typical query was, for example, to fetch all the streets of the cities within a specific region or to fetch the whole path from a particular street at the bottom up to the root node, etc.

Since there were quite a few queries to make, it was a must to process them as fast as possible. Therefore, I needed a hierarchical storage pattern to speed them up.

So I did the following:

  • Imported the table;
  • Chose the suitable hierarchical pattern;
  • Calculated the extra reference data used by the chosen pattern.

At each of the three stages, it was the speed that mattered. I wanted to do everything in next to no time, because there are not many things worse than having your server not responding, and downtimes are unacceptable.

Importing the initial data

In PostgreSQL, a table is best populated by the COPY FROM statement. It works significantly faster than multiple INSERT statements. Ensure to prepare the input data for COPY FROM. It accepts CSV or a binary file/string at the input. Binary is faster.


In the same orbit

Explore more open source projects

Contact us

We’d love to hear from you! We’re not really all that evil, and we love discussing potential projects, intriguing ideas, and new opportunities. Complete the form below or drop us a line at

Martians at a glance
years in business

A product development consultancy that works with startups and established businesses, while also creating open source-based products and services