Feature for SPARQL 1.2

July 6th, 2017 6:15 PM

JindÅ™ich Mynarz recently posted a good list of “What I would like to see in SPARQL 1.2” and I thought I’d add a few comments as well as some of my own wished-for features.

Explicit ordering in GROUP_CONCAT, and quads support for both the HTTP Graph Store Protocol and CONSTRUCT queries (items 2, 5, and 8 in JindÅ™ich’s list) seem like obvious improvements to SPARQL with a clear path forward for semantics and implementation.

Here are the some of the other wished-for features:

  • Explicitly specify the REDUCED modifier (#1)

    As an implementor, I quite like the fact that REDUCED is “underspecified.” It allows optimization opportunities that are much cheaper than a full DISTINCT would be, while still reducing result cardinality. I think it’s unfortunate that REDUCED hasn’t seen much use over the years, but I’m not sure what a better-specified REDUCED operator would do different from DISTINCT.

  • Property path quantifiers (#3)

    The challenge of supporting path quantifiers like elt{n,m} is figuring out what the result cardinality should be. The syntax for this was standardized during the development of SPARQL 1.1, but we couldn’t find consensus on whether elt{n,m} should act like a translation to an equivalent BGP/UNION pattern or like the arbitrary length paths (which do not introduce duplicate results). For small values of n and m, the translation approach seems natural, but as they grow, it’s not obvious that use cases would only want the translation semantics and not the non-duplicating semantics.

    Perhaps a new syntax could be developed which would allow the query author to indicate the desired cardinality semantics.

  • Date time/duration arithmetic functions (#6)

    This seems like a good idea, and very useful to some users, though it would substantially increase the size and number of the built-in functions and operators.

  • Support for non-scalar-producing aggregates (#9)

    I’m interested to see how this plays out as a SPARQL extension in systems like Stardog. It likely has a lot of interesting uses, but I worry that it would greatly complicate the query and data models, leading to calls to extend the semantics of RDF, and add new query forms, operators, and functions.

  • Structured serialization format for SPARQL queries (#10)

    I’m indifferent to this. I suspect some people would benefit from such a format, but I don’t think I’ve ever had need for one (where I couldn’t just parse a query myself and use the resulting AST) and it would be another format to support for implementors.

Beyond that, here are some other things I’d like to see worked on (either standardization, or cross-implementation support):

  • Support for window functions

  • Explicit support for named graphs in SERVICE blocks

    This can be partially accomplished right now for hard-coded graphs by using an endpoint url with the default-graph-uri query parameter, but I’d like more general support that could work dynamically with the active graph when the SERVICE block is evaluated.

  • Structured errors for use in the SPARQL Protocol

    My preference for this would be using the RFC7807 “Problem Details” JSON format, with a curated list of IRIs and associated metadata representing common error types (syntax errors, query-to-complex or too-many-requests refusals, etc.). There’s a lot of potential for smarter clients if errors contain structured data (e.g. SPARQL editors can highlight/fix syntax issues; clients could choose alternate data sources such as triple pattern fragments when the endpoint is overwhelmed).