Among the long list of new features of PostgreSQL 9.1 the new SQL/MED implementation is probably of the most underrated. SQL/MED is an extension of the SQL standard defines foreign data wrappers (FDW) that allow you to reach data located outside of your database, with regular SQL request (MED stands for Management of External Data). In a nutshell, SQL/MED is cool.

As far as i know, PostgreSQL and DB2 are the only major RDBMS providing an implementation of this standard. The beauty with PostgreSQL is that you can write your own data wrappers to connect your database to any storage you want ; other RDBMS, NoSQL storages, web services, whatever system that holds data and can deliver it.

Basically to Write a Foreign Data Wrapper you need to code six callback routines in C to explain how to plan and execute queries toward your new source of data.

PostgreSQL 9.1 has been released a few month ago, but there's already a bunch of FDW available (even though most of them are still beta). At the time, I'm writing this we already have http://wiki.postgresql.org/wiki/For..., so you can connect your PostgreSQL server to ; Oracle, MySQL, SQLlite, anything providing an ODBC driver, LDAP, couchdb, redis, Amazon S3 storage, CSV files, Twitter, Google Search, HTML pages, RSS feeds, ....

For an almost complete list, check out the PostgreSQL wiki ; http://wiki.postgresql.org/wiki/For...

And of course these wrappers are packaged as extension and most of them are available on PGXN and quite easy to install : http://pgxn.org/tag/fdw/

But there's more ! Among these FDW there's one called Multicorn. This particular extension allows you to write FDW in Python. A wrapper over another wrapper. Basically this means you don't have to know C to write your own FDW and you can use high-level libraries from the Python world.

Ok let's take a simple example. I'd like to query data contained in a web calendar like Google Calendar. Those data are available in ical files, with a specific format called iCalendar. So want i need to do is explain PostgreSQL how to reach and parse the files.

With Python and Multicorn, this is done with 20 lines of code : see icalfdw.py

And that's it.

Of course this is a basic implementation. We can do better with some optimizations and a few log handling. It's also important to keep in mind that foreign data wrappers have strong limitations... I'll talk about all that and other things in my next blog post.

In the meantime, if you're in Brussels this week-end for FOSDEM you can come to the PostgreSQL Devroom, I'll make a presentation of Multicorn along with its creator.



blog comments powered by Disqus