RDF and Perl

January 5th, 2008 11:00 PM

I’ve been committing a lot of code recently to the perlrdf project, and thought an update was in order.

RDF::Query development has moved to the perlrdf Google Code SVN repository, and the code has been substantially reorganized since the last release. Unfortunately, some of the changes introduce backward incompatibilities (discussed in the next post), so this work may lead to a 2.0 release sometime in the future. Most of the changes involve breaking logically independent code out into separate packages and defining clean APIs between them (all packages mentioned are part of the perlrdf project):

  • RDF::Trine is my new umbrella package for RDF toolkit code. This includes the usual classes for nodes, statements, namespaces, iterators, parsers, triplestores, etc. Some of these are discussed below.

  • RDF::Trine::Iterator - a set of iterator classes for representing graphs, bindings, and boolean results (the booleans are pretty much solely for SPARQL’s benefit). This was initially meant to be strictly a set of iterators for SPARQL results, but I found it generally useful for the whole stack, from the triplestore all the way up to the query results.

  • RDF::Trine::Store::DBI - an RDBMS-based triplestore supporting MySQL or SQLite. It uses the Redland schema and borrows heavily from the RDF::Query::Compiler::SQL class to allow querying for BasicGraphPatterns in one database roundtrip. The interface also allows result ordering to be done by the database, resulting in much faster execution for many simple SPARQL queries.

    In the future, I hope to roll more of the SQL compiler into this package, to allow more complex all-at-once queries (e.g. supporting some GroupGraphPatterns, FILTERs, etc.). I also hope to take a cue from rdflib by allowing predicates’ rdfs:range to restrict the number of JOINs needed in the underlying SQL generation.

  • RDF::Trine::Parser - a pure-Perl Turtle parser (with RDF/XML and TriX parsers hopefully on the way).

  • RDF::Trine::Namespace - a class for namespace-based syntax shortcuts (like $foaf->name).

  • RDF::Endpoint - a SPARQL endpoint using RDF::Query and RDF::Trine, able to run as a standalone server or as a CGI script (with Apache mod_perl support planned).
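
To make the "one database roundtrip" idea from the RDF::Trine::Store::DBI item a bit more concrete, here is a minimal sketch of how a BasicGraphPattern might be turned into a single SQL query. It is written in Python with the standard sqlite3 module purely for illustration; it is not Perl, and it is not the actual RDF::Trine or RDF::Query::Compiler::SQL code. The three-column triples table, the compile_bgp function, and the sample data are all assumptions made up for this example (the real store uses the Redland schema). Each triple pattern becomes one alias of the table, a variable shared between patterns becomes a join condition, and the optional ORDER BY shows how result ordering can be pushed down to the database.

```python
import sqlite3


def compile_bgp(patterns, order_by=None):
    """Compile triple patterns into one SELECT with one table alias per pattern.

    Terms starting with "?" are variables; everything else is a constant.
    Hypothetical helper for illustration only.
    """
    select = {}   # variable name -> first column that binds it
    where = []    # SQL conditions, in order
    params = []   # values for the "?" placeholders, in the same order
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in select:
                    # Variable seen before: join on the earlier column.
                    where.append(f"{ref} = {select[term]}")
                else:
                    select[term] = ref
            else:
                where.append(f"{ref} = ?")
                params.append(term)
    cols = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in select.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {cols or '1'} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order_by:
        # Let the database sort the results instead of the caller.
        sql += f" ORDER BY {select[order_by]}"
    return sql, params


# Assumed toy schema and data (not the Redland schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
    ("ex:alice", "foaf:name", "Alice"),
])

# Roughly: { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }
bgp = [
    ("ex:alice", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?name"),
]
sql, params = compile_bgp(bgp, order_by="?name")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Both triple patterns are answered by a single SELECT, so the database does the join and the sort in one roundtrip rather than the caller issuing one query per pattern and merging the results itself.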

All of this code is under development, but I’d welcome anyone to give it a try and provide feedback. I’m hoping that by moving the development to a public SVN server, more people will be able to see and contribute to the code. Finally, I’m hoping to get the code into shape for release to CPAN soon.