PgClosureTreeRebuild

Some time ago I was involved in a project aiming to systemize a pandemonium of databases. Especially hard were the toponym ones. The source data was a deep XML tree with regions and cities at the top and streets and houses at the bottom.

| id         | parent_id  | name       |
| ---------- |------------|------------|
| 1          | 2          | Kremlin    |
| 2          |            | Moscow     |
| 3          | 2          | Red Square |
| …   (1M+ unsorted records)         … |

I had to put this tree into PostgreSQL to allow queries. A typical query was, for example, to fetch all the streets of the cities within a specific region or to fetch the whole path from a particular street at the bottom up to the root node, etc.

Since there were quite a few queries to make, it was a must to process them as fast as possible. Therefore, I needed a hierarchical storage pattern to speed them up.

So I did the following:

  • Imported the table;
  • Chose the suitable hierarchical pattern;
  • Calculated the extra reference data used by the chosen pattern.

At each of the three stages, it was the speed that mattered. I wanted to do everything in next to no time, because there are not many things worse than having your server not responding, and downtimes are unacceptable.

Importing the initial data

In PostgreSQL, a table is best populated by the COPY FROM statement. It works significantly faster than multiple INSERT statements. Ensure to prepare the input data for COPY FROM. It accepts CSV or a binary file/string at the input. Binary is faster.

Author

In the same orbit

Explore more open source projects

Let's solve your hard problems

Martians at a glance
18
years in business

We're experts at helping developer products grow, with a proven track record in UI design, product iterations, cost-effective scaling, and much more. We'll lay out a strategy before our engineers and designers leap into action.

If you prefer email, write to us at surrender@evilmartians.com