Recently I have been analysing text stored in a Postgres database to determine whether it matches certain keywords and patterns. The built-in FTS (full text search) capabilities in Postgres are quite powerful, but unfortunately they are not easy to adapt to custom word/token patterns. This is where a custom FTS dictionary called dict_regex comes in handy. It allows you to specify token parsing criteria using Perl-compatible regular expressions.
Unfortunately, the module is not part of the standard contrib and has to be built from source. The following documents my attempts to build and use the module on Ubuntu 9.10.
First, make a scratch directory and download the source for Postgres. Fetching the source does not need root, and running it under sudo would leave the tree owned by root:
$ mkdir postgres_build
$ cd postgres_build
$ apt-get source postgresql
Now download all the build dependencies for postgresql
$ sudo apt-get build-dep postgresql
You will also need to install the following additional dependencies
$ sudo apt-get install libpcre3 libpcre3-dev libreadline-dev \
    postgresql-client-common postgresql-common
Enter the postgres source code tree
$ cd postgresql-8.4-8.4.2/contrib
Download and unpack the source for the dict_regex contrib module
$ wget http://vo.astronet.ru/arxiv/dict_regex.tgz
$ tar -zxvf dict_regex.tgz
Patch the contrib/Makefile and add dict_regex as shown
--- Makefile.old 2010-01-25 02:21:23.415229876 -0700
+++ Makefile 2010-01-25 02:21:51.122731769 -0700
@@ -14,6 +14,7 @@
 		cube \
 		dblink \
 		dict_int \
+		dict_regex \
 		dict_xsyn \
 		earthdistance \
 		fuzzystrmatch \
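If you would rather script the Makefile change than edit it by hand, something along these lines should work. This is only a sketch: the function name add_dict_regex is made up for illustration, and it assumes the module list has one entry per line ending in a backslash, with dict_int present as in the 8.4 contrib Makefile.

```shell
# Hypothetical helper (not part of Postgres or dict_regex): insert a
# "dict_regex \" entry right after the "dict_int \" entry of a contrib Makefile.
add_dict_regex() {
    makefile="$1"
    # Do nothing if the module is already listed, so re-running is harmless.
    if grep -q 'dict_regex' "$makefile"; then
        return 0
    fi
    # Keep a backup under the same name the diff uses (Makefile.old).
    cp "$makefile" "$makefile.old"
    # Copy every line through; after the dict_int entry, emit the new entry.
    # Assumption: entries look like "<whitespace>dict_int<whitespace>\".
    awk '{ print }
         /^[ \t]*dict_int[ \t]*\\$/ { print "\t\tdict_regex \\" }' \
        "$makefile.old" > "$makefile"
}
```

From the contrib directory this would be called as `add_dict_regex Makefile`. It is worth running `diff -u Makefile.old Makefile` afterwards to confirm that exactly one line was added.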
Patch the debian/postgresql-contrib-8.4.install file as well and add a line for dict_regex
--- postgresql-contrib-8.4.install.old 2010-01-25 02:30:31.892731877 -0700
+++ postgresql-contrib-8.4.install 2010-01-25 02:30:52.925231039 -0700
@@ -35,6 +35,7 @@
 usr/lib/postgresql/8.4/lib/uuid-ossp.so
 usr/lib/postgresql/8.4/lib/test_parser.so
 usr/lib/postgresql/8.4/lib/dict_int.so
+usr/lib/postgresql/8.4/lib/dict_regex.so
 usr/lib/postgresql/8.4/lib/dict_xsyn.so
 usr/lib/postgresql/8.4/lib/auto_explain.so
 usr/lib/postgresql/8.4/lib/pg_stat_statements.so
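Since the rebuild takes a while, it can be worth checking that both files really mention dict_regex before starting it. A rough sketch follows. The function name check_dict_regex_patches is hypothetical, and it assumes that contrib/ and debian/ both sit directly under the top-level source directory you pass in.

```shell
# Hypothetical helper: confirm that both patched files mention dict_regex.
# $1 is the top of the Postgres source tree (assumed to hold contrib/ and debian/).
check_dict_regex_patches() {
    src="$1"
    status=0
    for f in "$src/contrib/Makefile" \
             "$src/debian/postgresql-contrib-8.4.install"; do
        # A missing file is reported the same way as a missing entry.
        if grep -q 'dict_regex' "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "missing dict_regex entry: $f"
            status=1
        fi
    done
    # Non-zero exit status if either file still needs patching.
    return "$status"
}
```

From the contrib directory, `check_dict_regex_patches ..` would check the tree you have just patched. It only looks for the string, so it will not catch an entry that is in the wrong place.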
Rebuild the modified source (make sure you cd to the main postgres source directory)
$ dpkg-buildpackage -rfakeroot -b