Thoughts on Named Graph Queries in SPARQL

November 30th, 2007 3:58 PM

I was talking with Li a while back about complex SPARQL queries, and a few interesting issues were raised that I thought I’d write down for future reference.

We agreed that neither of us was aware of a SPARQL package designed specifically to handle a large number of FROM NAMED clauses (hundreds or thousands), although many packages seem to handle this OK when the URIs are all on the local network. I haven’t looked into the issue in any depth, but I’d be interested in seeing some test results on how the various implementations handle this. (Mine would certainly perform miserably at this task.) If the future holds lots of small RDF documents (from, for example, the linking-open-data project), then SPARQL engines will have to be able to handle queries with a large number of external data sources. An interesting approach (maybe somebody is already doing it?) might be to heavily parallelize the model construction of named graphs.
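To make the parallel-loading idea a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: `fetch_graph` is a hypothetical stand-in that returns canned triples rather than actually dereferencing a URI, and `load_named_graphs` and the example.org URIs are made-up names, not part of any existing SPARQL package.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for dereferencing and parsing a remote RDF document.
# A real engine would issue an HTTP GET and parse the response here.
def fetch_graph(uri):
    return {(uri, "http://example.org/p", "value")}

def load_named_graphs(uris, fetch=fetch_graph, max_workers=32):
    """Build a {graph name: set of triples} dataset, fetching graphs concurrently."""
    dataset = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so each name pairs with its own result.
        for uri, triples in zip(uris, pool.map(fetch, uris)):
            dataset[uri] = triples
    return dataset

# One entry per FROM NAMED clause in the (hypothetical) query.
uris = ["http://example.org/doc/%d" % i for i in range(1000)]
dataset = load_named_graphs(uris)
```

Threads seem like a reasonable fit here because the work would be dominated by waiting on the network; whether this helps in practice would depend on the remote servers and on how expensive parsing turns out to be.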

The other issue that arose was the need for deeper support of named graphs. SPARQL endpoints are useful not just as a direct mediator between triplestore and client, but also as data sources for other SPARQL endpoints (via CONSTRUCT). The problem is that named graphs are only really useful within one query. Once you CONSTRUCT a result graph, you lose the names. What might be useful is native support for a serialization format that supports named graphs (TriG, TriX), allowing CONSTRUCTion of named result graphs using syntax similar to:


CONSTRUCT { GRAPH ?g { ?s ?p ?o } } WHERE { ... }

This would allow multiple named graphs to be loaded from a single call to a SPARQL endpoint, although there would need to be clarification on the semantics of FROM NAMED <uri-with-named-graphs>. Does <uri-with-named-graphs> disappear as a graph name, replaced by its sub-graph names? Do the constituent triples belong to both <uri-with-named-graphs> and one or more of its sub-graphs? Might we need a new variant of FROM NAMED to distinguish this type of named-graph-serialized data? Perhaps it could just be used with a FROM clause, the named serialization being enough to indicate that graph names should be preserved in the query.
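The first two of these questions can be sketched in Python as two different ways of loading the same quads into a dataset. This is only meant to show the difference between the interpretations, not to propose an implementation: the quads, the `ex:` names, and both function names are invented for illustration, and a real engine would get the quads from a TriG or TriX parser.

```python
from collections import defaultdict

# Quads (subject, predicate, object, graph name) as they might come out of a
# TriG or TriX parser; the data and names are made up for illustration.
DOC = "http://example.org/uri-with-named-graphs"
quads = [
    ("ex:a", "ex:p", "ex:b", "ex:g1"),
    ("ex:b", "ex:p", "ex:c", "ex:g2"),
]

def load_replacing(doc_uri, quads):
    """Interpretation 1: the document URI disappears as a graph name;
    only the sub-graph names remain in the dataset."""
    dataset = defaultdict(set)
    for s, p, o, g in quads:
        dataset[g].add((s, p, o))
    return dict(dataset)

def load_keeping_both(doc_uri, quads):
    """Interpretation 2: each triple belongs to its sub-graph and also
    to a graph named after the document it was loaded from."""
    dataset = load_replacing(doc_uri, quads)
    dataset[doc_uri] = {(s, p, o) for s, p, o, _ in quads}
    return dataset

replaced = load_replacing(DOC, quads)
both = load_keeping_both(DOC, quads)
```

Under the first interpretation a GRAPH pattern could never match the document URI itself; under the second it would match every triple in the document, which may or may not be what a query author expects.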

Comments

Have you ever heard about Networked Graphs? Maybe it solves your issue.

Posted by: Josef Petrák on November 30th, 2007 5:39 PM