Property Path use in Wikidata Queries

September 28th, 2018 9:06 AM

I recently began taking a look at the Wikidata query logs that were published a couple of months ago and wanted to look into how some features of SPARQL were being used on Wikidata. The first thing I’ve looked at is the use of property paths: how often paths are used, what path operators are used, and with what frequency.

Using the “interval 3” logs (2017-08-07–2017-09-03 representing ~78M successful queries1), I found that ~25% of queries used property paths. The vast majority of these use just a single property path, but there are queries that use as many as 19 property paths:

Pct. Count Number of Paths
74.3048% 58161337 0 paths used in query
24.7023% 19335490 1 paths used in query
0.6729% 526673 2 paths used in query
0.2787% 218186 4 paths used in query
0.0255% 19965 3 paths used in query
0.0056% 4387 7 paths used in query
0.0037% 2865 8 paths used in query
0.0030% 2327 9 paths used in query
0.0011% 865 6 paths used in query
0.0008% 604 11 paths used in query
0.0006% 434 5 paths used in query
0.0005% 398 10 paths used in query
0.0002% 156 12 paths used in query
0.0001% 110 15 paths used in query
0.0001% 101 19 paths used in query
0.0001% 56 13 paths used in query
0.0000% 12 14 paths used in query

I normalized IRIs and variable names used in the paths so that I could look at just the path operators and the structure of the paths. The type of path operators used skews heavily towards * (ZeroOrMore) as well as sequence and inverse paths that can be rewritten as simple BGPs. Here are the structures representing at least 0.1% of the paths in the dataset:

Pct. Count Path Structure
49.3632% 10573772 ?s <iri1> * ?o .
39.8349% 8532772 ?s <iri1> / <iri2> ?o .
4.6857% 1003694 ?s <iri1> / ( <iri2> * ) ?o .
1.8983% 406616 ?s ( <iri1> + ) / ( <iri2> * ) ?o .
1.4626% 313290 ?s ( <iri1> * ) / <iri2> ?o .
1.1970% 256401 ?s ( ^ <iri1> ) / ( <iri2> * ) ?o .
0.7339% 157212 ?s <iri1> + ?o .
0.1919% 41110 ?s ( <iri1> / ( <iri2> * ) ) / ( ^ <iri3> ) ?o .
0.1658% 35525 ?s <iri1> / <iri2> / <iri3> ?o .
0.1496% 32035 ?s <iri1> / ( <iri1> * ) ?o .
0.1124% 11889 ?s ( <iri1> / <iri2> ) / ( <iri3> * ) ?o .

There are also some rare but interesting uses of property paths in these logs:

Pct. Count Path Structure
0.0499% 5274 ?s ( ( <iri1> / ( <iri2> * ) ) / ( <iri3> / ( <iri2> * ) ) ) / ( <iri4> / ( <iri2> * ) ) ?o .
0.0015% 157 ?s ( <iri1> / <iri2> / <iri3> / <iri4> / <iri5> / <iri6> / <iri7> / <iri8> / <iri9> ) * ?o .
0.0003% 28 ?s ( ( ( ( <iri1> / <iri2> / <iri3> ) ? ) / ( <iri4> ? ) ) / ( <iri5> * ) ) / ( <iri6> / ( <iri7> ? ) ) ?o .

Without further investigation it’s hard to say if these represent meaningful queries or are just someone playing with SPARQL and/or Wikidata, but I found them curious.

  1. These numbers don’t align exactly with the Wikidata query dumps as there were some that I couldn’t parse with my tools. ↩︎