Implementing a depth-first search in a PostgreSQL stored procedure using Perl

In this article I'm going to show you how do implement a depth-first search using a stored procedure written in Perl inside the PostgreSQL database server. Depth-first search is an advanced algorithm for quickly traversing all nodes in a tree (like, e.g., a filesystem). In this example we're going to store our tree in a database using a very simple structure. The method I'm going to show you will also allow you to use separate sort order for each node in the tree. As you might know, databases that support SQL99 recursive queries (WITH keyword) can probably do all of this with regular SQL, but the query is not very easy to read and understand. The problem I'm trying to solve is basically to indent each record as we print it out with the required number of spaces so that it looks like a tree instead of a flat table. Because this is insanely time consuming to do client-side as the tree increases in size, I'm going to show you how to do it with just one single SQL query. The database structure we're going to use is very simple: CREATE TABLE tree ( id UUID NOT NULL DEFAULT uuid_generate_v4(), name VARCHAR(100) NOT NULL, url VARCHAR(255), created TIMESTAMP NOT NULL DEFAULT current_timestamp, parent_id UUID ); Let's assume that you have a Ubuntu or Debian machine running that has PostgreSQL 8.3 available in the repositories. The reason I'm suggesting version 8.3 is that it has built-in support for the UUID datatype and a contrib module for generating them. This way we don't need to muck around with sequences for our primary keys. Let's assume you're logged in as a user with full sudo access. Run this code to install the necessary packages needed for the examples: $ sudo aptitude install postgresql-8.3 postgresql-plperl-8.3 postgresql-contrib-8.3 libdbd-pg-perl gidentd This is what the different packages do, in case you need to look for them on other OSes:

  • postgresql-8.3: The database server itself and the client libraries.
  • postgresql-plperl-8.3: Support for perl stored procedures in postgresql.
  • postgresql-contrib-8.3: Has the uuid_generate_v4() function.
  • libdbd-pg-perl: DBD::Pg cpan module, why not use the included one?
  • gidentd: With this one we can use the default ident authentication for local postgresql users, quite handy on a development box.

After this you need to create a database user (role) for you to create databases. This command will make a user with the same name as the user you're logged in as. This is just a very quick way to enable ident authentication for your user with minimum hassle. The arguments mean: no-superuser (-S), create-database (-d) and no-roles (-R). Invert the case if you want to give yourself more/less rights. $ sudo -H -u postgres createuser -S -d -R $(whoami) After that we're ready to create our database and populate it with necessary defaults. Then we install the procedural language into our database, and finally install the uuid generator functions. The UUID functions must be added to the database as a database superuser because they're written in C. $ createdb -E UNICODE test $ createlang plperl test $ sudo -H -u postgres psql -1 test -f /usr/share/postgresql/8.3/contrib/uuid-ossp.sql In case you're wondering what that -1 argument is, it means to run everything in a transaction. Next thing is to create our test database. I'm going to assume that you put the above mentioned table definition in the file schema.sql. $ psql -1 test -f schema.sql Okay, so now we have a nice little table that we can fill with some data. Since you're probably as lazy as I am and don't want to create a lot of data yourself I've prepared a small chunk of map data you can play with. It's available here: http://files.smidsrod.no/example_data.sql Import it quickly like this: $ wget -q -O - http://files.smidsrod.no/example_data.sql | psql -a -1 test The -a option is there so that you can quickly see the results from the import. If you've looked closely at the data you will see that it implements a simple tree structure with several roots (all the records with empty parent_id). In this example I won't make use of the url attribute, but it's in there for a future article I'm planning to write. I've also included some UTF-8 data just for fun. By now you're probably getting a little bit bored with all of this mundane database stuff. Let's get on with what I really wanted to show you! To be able to return data in a smart way we need a custom datatype. This is trivially simple to create in postgresql. Save it as node.sql and import (you should be getting the hang of this now): CREATE TYPE "public"."node" AS ( node_id UUID, node_type VARCHAR(20), node_number INTEGER, node_level INTEGER ); The DFS (Depth-First-Search) algorithm is a bit large, so just install it quickly from http://files.smidsrod.no/depth_first_search.sql. $ wget -q -O - http://files.smidsrod.no/depth_first_search.sql| psql -1 test Now you should be able to run this simple query to get a set of data suitable for creating a tree. $ psql -1 test -c 'select tree.name,tt.node_id,tt.node_level from traverse_tree(null) tt JOIN tree ON node_id=tree.id;'

name node_id node_level
America 33c70c03-fddb-4c0e-b275-b33cd4c54c8a 0
United States f2880759-6b8e-494f-add9-69edfad07fab 1
Mexico 8a2dae2e-102c-40b6-a82c-5d525729c0f6 1
Brazil e41cc9bb-4bc4-41d6-9cb8-7f3aa1fd6217 1
Asia d435ab7f-b47e-4ebc-9fd6-0a0d48e46190 0
China 728bac87-9b3f-4e91-b76b-00a185555efb 1
Japan a64255cb-d1b2-41ad-a053-b38b305244ec 1
Europe 2fce9c47-936b-41c4-8b96-6b1e0c05ba1d 0
Norway 1ebeb84c-38de-4748-badb-ef726680eb1a 1
Vestfold 8c971775-edf7-4884-ad39-e9a58fd172c8 2
Sandefjord b7e39443-16d6-4a99-911f-af330e9ded7d 3
Tønsberg 7b3c12cf-ec01-4304-8d77-16e1c81c1168 3
Hordaland 0384b300-e638-4f88-802a-69762dd08e00 2
Nordland 5bc7c3d6-f23e-4848-864e-b7bd148e154b 2
France ef19a5ca-01fa-405a-a437-3d1a54570fbf 1
Spain 43e9f802-e1e7-4a9c-9d0a-35fbc2b1baa6 1

Just to demonstrate how useful this is you can try to run traverse_tree.pl and look at the output. $ wget -q -O - http://files.smidsrod.no/traverse_tree.pl.txt | perl 33c70c03-fddb-4c0e-b275-b33cd4c54c8a America f2880759-6b8e-494f-add9-69edfad07fab   United States 8a2dae2e-102c-40b6-a82c-5d525729c0f6   Mexico e41cc9bb-4bc4-41d6-9cb8-7f3aa1fd6217   Brazil d435ab7f-b47e-4ebc-9fd6-0a0d48e46190 Asia 728bac87-9b3f-4e91-b76b-00a185555efb   China a64255cb-d1b2-41ad-a053-b38b305244ec   Japan 2fce9c47-936b-41c4-8b96-6b1e0c05ba1d Europe 1ebeb84c-38de-4748-badb-ef726680eb1a   Norway 8c971775-edf7-4884-ad39-e9a58fd172c8     Vestfold b7e39443-16d6-4a99-911f-af330e9ded7d       Sandefjord 7b3c12cf-ec01-4304-8d77-16e1c81c1168       Tønsberg 0384b300-e638-4f88-802a-69762dd08e00     Hordaland 5bc7c3d6-f23e-4848-864e-b7bd148e154b     Nordland ef19a5ca-01fa-405a-a437-3d1a54570fbf   France 43e9f802-e1e7-4a9c-9d0a-35fbc2b1baa6   Spain Voila! Problem solved with only one query transmitted to the database that does all the work. Next week I'll show you how to convert this dataset to an HTML unordered list.