February 8th, 2004 5:05 PM

The Getty does some strange work. See search results for abandoned complexes in the United States from the Getty Thesaurus of Geographic Names as an example.

The TGN is a structured, world-coverage vocabulary of 1.3 million names, including vernacular and historical names, coordinates, place types, and descriptive notes, focusing on places important for the study of art and architecture.

It’s fascinating. And there’s a lot of it. You could lose yourself for days performing bizarre searches.


this is awesome! and exactly the type of thing that can be very helpful in geo-tagging photos. like i was telling you earlier, with a database like this you can just stick in a city or geographic feature name into your RDF rather than have to figure out GPS coordinates for every photo. with getty’s hierarchy information, you can even try implementing hierarchical searches (i.e. search for “california” and get back photos tagged as “los angeles”)… now we just need a consistent rdf vocabulary for this dataset (a la wordnet).
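
a rough sketch of the hierarchical search idea, in python. everything here is made up for illustration: the parent links only imitate the kind of hierarchy the getty publishes, and the photo tags are placeholders, not real TGN records.

```python
# hypothetical parent links, in the spirit of the getty hierarchy;
# the names and structure are illustrative, not real TGN records
PARENT = {
    "los angeles": "california",
    "san francisco": "california",
    "california": "united states",
    "portland": "oregon",
    "oregon": "united states",
}

# hypothetical photo tags: filename -> place name
PHOTOS = {
    "beach.jpg": "los angeles",
    "bridge.jpg": "san francisco",
    "roses.jpg": "portland",
}


def ancestors(place):
    """Yield the place itself, then each enclosing place up the hierarchy."""
    seen = set()  # guards against accidental cycles in the parent links
    while place is not None and place not in seen:
        seen.add(place)
        yield place
        place = PARENT.get(place)


def search(query):
    """Return photos tagged with the query place or with anything inside it."""
    return sorted(
        name for name, place in PHOTOS.items() if query in ancestors(place)
    )


print(search("california"))
```

searching for "california" walks up from each photo's tag, so photos tagged "los angeles" or "san francisco" match even though neither is tagged "california" directly.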

Posted by: gary on February 9th, 2004 7:51 PM

Except it’s never that simple. If you drink the RDF/SemWeb kool-aid, you can’t “just stick in a city or geographic feature name,” because that is a dead end as far as RDF goes. Before you can do that (“a la wordnet”), you need an RDF equivalent of the Getty database that defines URIs for each location, annotated with geographic data. (The order in which this happens is important.)
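
To make the dead end concrete, here is a minimal sketch in plain Python, with triples as tuples. The `example.org` namespaces, the `takenAt` and `name` properties, and the photo URIs are all hypothetical; only the W3C Basic Geo namespace (`lat`, `long`) is a real vocabulary, and the coordinates are approximate.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C Basic Geo vocabulary
PLACE = "http://example.org/places/"  # hypothetical namespace
EX = "http://example.org/terms/"      # hypothetical namespace

LA = PLACE + "los-angeles"  # hypothetical URI for the location


def literal(text):
    """Mark a value as a plain string literal rather than a URI."""
    return ("literal", text)


# Step 1: the location gets a URI and is annotated with geographic data.
# Step 2: only then can a photo usefully point at it.
TRIPLES = [
    (LA, EX + "name", literal("Los Angeles")),
    (LA, GEO + "lat", literal("34.05")),     # approximate
    (LA, GEO + "long", literal("-118.24")),  # approximate
    ("http://example.org/photos/1", EX + "takenAt", LA),
    ("http://example.org/photos/2", EX + "takenAt", literal("Los Angeles")),
]


def objects(subject, predicate):
    """Return every object of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def coordinates(photo):
    """Follow takenAt to a place URI and read its coordinates, if any."""
    for place in objects(photo, EX + "takenAt"):
        if isinstance(place, tuple):
            continue  # a literal has no triples of its own: the dead end
        lat = objects(place, GEO + "lat")
        lon = objects(place, GEO + "long")
        if lat and lon:
            return (float(lat[0][1]), float(lon[0][1]))
    return None


print(coordinates("http://example.org/photos/1"))
print(coordinates("http://example.org/photos/2"))
```

The first photo resolves to coordinates because its tag is a URI that other triples describe. The second photo carries the same name as a string, and there is nothing further to follow.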

Just like a lot of things, the Getty provides the raw data, but getting it into a usable form for the Semantic Web takes a lot of effort.

Posted by: kasei on February 9th, 2004 8:04 PM

I was going to say that you could probably mine the TGN site for this data in an annotation tool, and then drop it into triple-space, but since the TGN site doesn’t validate as any version of HTML, that might be a risky thing to rely on.

Posted by: kasei on February 9th, 2004 8:14 PM

hence i said, “now we just need a consistent rdf vocabulary for this dataset.” i realize that it’s useless to put the place name into the RDF as a string. the getty dataset needs an RDF-defined vocabulary so that we can identify places with URIs. while it will definitely take some thought, i don’t think a first version would be too difficult.

Posted by: gary on February 9th, 2004 8:16 PM

while the getty’s web interface isn’t very conducive to extracting information, a first attempt could be to just define an rdf vocabulary that maps to specific records in the getty database, without necessarily having an rdf representation of the actual records. there’s also the problem that the getty’s terms of use might not be too friendly to constant querying against their database.
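
something like this, as a sketch only. the base namespace is a placeholder i made up (not a real getty URI), the function names are hypothetical, and i’m assuming each record has a numeric id.

```python
BASE = "http://example.org/tgn/"  # hypothetical namespace, not a real getty URI


def place_uri(record_id):
    """Build a stable URI for a record id (assumed to be numeric)."""
    record_id = str(record_id)
    if not record_id.isdigit():
        raise ValueError("expected a numeric record id, got %r" % record_id)
    return BASE + record_id


def record_id_of(uri):
    """Recover the record id from a URI built by place_uri."""
    if not uri.startswith(BASE):
        raise ValueError("not a place URI in this vocabulary: %r" % uri)
    return uri[len(BASE):]


uri = place_uri(1234567)  # placeholder id, not a real record
print(uri)
print(record_id_of(uri))
```

the point is that the URI only names the record. no data about the place has to be copied out of the getty database for the identifier to be usable in a photo annotation.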

Posted by: gary on February 9th, 2004 8:20 PM

Before you try to figure out how to interface with their web site, what about just asking Getty if they will let you use the data, and possibly receive it in database form?

Posted by: Traveler on February 10th, 2004 1:47 PM

there’s a good chance they will. here’s what their website says about it:

The Getty vocabularies are made available on the Web to support limited research and cataloging efforts. Companies and institutions interested in regular or extensive use of the vocabularies should explore licensing options by contacting the Vocabulary Program. The data of the vocabularies may be licensed to both commercial and non-profit organizations. Data are available in flat files (ASCII or USMARC) and in XML. Note that these are data files only—there is no user interface.

however, while better than nothing, that’s a static view of the data. their database is being updated weekly, so it’d be nice to be able to interface to the live data directly.

Posted by: gary on February 10th, 2004 8:53 PM