Museum Web Developers

March 10, 2011

Update on collections data and geocoded NRM data

Filed under: collections,data,requestforcomment — mia @ 6:05 pm

I’m glad to see the news about the release of objects from the collections of the Science Museum, the National Media Museum and the National Railway Museum has spread so far and wide already.

A few people have commented on the licence (Creative Commons Attribution-NonCommercial-ShareAlike, CC BY-NC-SA) and on the format (CSV). As tomorrow is my last day, I can't really speak for the museum, but the intention is to learn from how people use the data (the things they make, the barriers they face, etc.) and to iterate, as resources allow, until we reach an optimal solution (or solutions). So please get in touch if you have requests or think you can help clear up some of the issues these kinds of projects face, because there's a good chance you'll help make a difference.

The licence is a pragmatic solution: it's a clarification of our existing terms rather than a change to them, because that avoided the need for legal advice, policy review, etc., which would have added several months to the process.

And yes, I know CSV is quick and dirty, but it’s effective. The museum sector is still working out how to match the resources available with the needs of mash-up type developers who work best with JSON and those who are aiming for linked open data; my hope is that your feedback on this will help museums figure out how to support people using open data in various forms. A simple solution like this also means it’s easy for the museum to re-run the export to update the data as time goes on, and that anyone, geek or not, can open the files without being startled by angle brackets and acronyms. Also, did I mention it was quick?
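For developers who would rather work with JSON, turning one of the CSV files into a JSON array takes only a few lines. This is a minimal sketch, assuming a UTF-8 file with a header row; it doesn't depend on any particular column names, and the `PLACE_MADE` column in the test is only illustrative.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Each row becomes an object keyed by the header names; quoted fields
    containing commas are handled by the csv module, and empty cells are
    kept as empty strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [dict(row) for row in reader]
    return json.dumps(records, ensure_ascii=False, indent=2)
```

To use it on a downloaded file, read the file as text (`open(path, newline="", encoding="utf-8").read()`) and pass the string in.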

We've already had some useful feedback, and even some improved files. Richard Light sent us a geocoded version of the records from the National Railway Museum (NRM): an index of locations, http://api.sciencemuseum.org.uk/collections/updates_from_other_people/Richard_Light/nrm-geo-sort.xml (63 KB), and the full file, http://api.sciencemuseum.org.uk/collections/updates_from_other_people/Richard_Light/nrm-geo.xml (20 MB, so browser beware).

I’ll let Richard explain in his own words:

I converted the source CSV to XML using my CSV Converter program, which is a home-made program I wrote to do a “mail-merge” on CSV data, with the aim of easily generating other formats such as XML.
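Richard's CSV Converter is his own program and its template syntax isn't described here, so the following is not his code. As a rough illustration of the "mail-merge" idea only, this Python sketch fills a per-row XML template from each CSV record. The `{FIELD}` placeholder syntax, the `records`/`record` element names and the column names in the test are all hypothetical.

```python
import csv
import io
from xml.sax.saxutils import escape


def mail_merge(csv_text: str, row_template: str, root: str = "records") -> str:
    """Fill row_template once per CSV row and wrap the results in a root element.

    row_template uses str.format placeholders named after CSV columns,
    e.g. "<record><name>{NAME}</name></record>". Values are XML-escaped
    before substitution so characters like & and < stay well-formed.
    A placeholder with no matching column raises KeyError.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    parts = [f"<{root}>"]
    for row in reader:
        # Skip the None key that DictReader uses for surplus cells in ragged rows.
        escaped = {key: escape(value or "") for key, value in row.items() if key is not None}
        parts.append(row_template.format_map(escaped))
    parts.append(f"</{root}>")
    return "\n".join(parts)
```

Keeping the element names in the template, rather than deriving them from the CSV headers, means a header that isn't a valid XML name can't break the output.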

The geocoding was carried out by calls to my place URL-ifier program. This uses the standard Geonames query API, but splits a place description into its component place names (e.g. “Swindon, Wiltshire, England” becomes three place names) and searches for a “Swindon” contained within places “Wiltshire” and “England”.
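The splitting-and-containment idea can be sketched in Python. This is not Richard's URL-ifier. It works offline on a small list of candidate places, each with the names of the places that contain it (in practice those would come from Geonames), and picks the first candidate whose ancestors include every qualifier. The `Candidate` type and the sample IDs in the test are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A possible match for a place name, with the names of its ancestors."""
    geoname_id: int
    name: str
    ancestors: list[str] = field(default_factory=list)


def split_place(description: str) -> list[str]:
    """Split 'Swindon, Wiltshire, England' into ['Swindon', 'Wiltshire', 'England']."""
    return [part.strip() for part in description.split(",") if part.strip()]


def resolve(description: str, candidates: list[Candidate]) -> Candidate | None:
    """Return the first candidate named like the first component whose
    ancestors include all remaining components (case-insensitive)."""
    parts = split_place(description)
    if not parts:
        return None
    target = parts[0].lower()
    qualifiers = [p.lower() for p in parts[1:]]
    for cand in candidates:
        if cand.name.lower() != target:
            continue
        ancestor_names = {a.lower() for a in cand.ancestors}
        if all(q in ancestor_names for q in qualifiers):
            return cand
    return None
```

The containment check is what tells two places with the same name apart. For example, a Swindon in Staffordshire is rejected when the description says "Wiltshire".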

I wrote an XSLT transform which copied the source document, and each time it found a place field, it called out to my URL-ifier using the document() function:

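<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copies the source document unchanged
       (assumed; the post shows only the PLACE_MADE template) -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For each non-empty PLACE_MADE, ask the URL-ifier for a Geonames ID -->
  <xsl:template match="PLACE_MADE[text()!='']">
    <xsl:variable name="geonames"
        select="document(concat('http://light.demon.co.uk/scripts/getPlaceURL.exe?q=', text()))/*/text()"/>
    <xsl:copy>
      <xsl:if test="$geonames!=''">
        <xsl:attribute name="geonamesId">
          <xsl:value-of select="$geonames"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>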

Where this succeeded in inferring a Geonames identifier, it added a "geonamesId" attribute to the PLACE_MADE field, so the result is a copy of the source data with added geocoding.
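Once the transform has run, you can pick out the geocoded records with any XML library. Here's a sketch using Python's standard ElementTree. It assumes only what is described above: PLACE_MADE elements, some of them carrying a geonamesId attribute. The wrapper element names in the test are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def geocoded_places(xml_text: str) -> list[tuple[str, str]]:
    """Return (place text, geonamesId) pairs for every PLACE_MADE element
    that the transform managed to geocode; ungeocoded ones are skipped."""
    root = ET.fromstring(xml_text)
    results = []
    for place in root.iter("PLACE_MADE"):
        geonames_id = place.get("geonamesId")
        if geonames_id:
            results.append(((place.text or "").strip(), geonames_id))
    return results
```

This loads the whole document into memory. That's fine for the 63 KB index, but for the 20 MB full file `ET.iterparse` would be the gentler choice.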

All of the NRM data was geocoded in a single XSLT operation, but this operation had to call my URL-ifier, and hence the Geonames API, many times. There are limits on how hard you can hit this service, so take care! (If you want to use this service in a serious way, you can register for a free Geonames username, which gives you your own allocation of API calls.)
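Because every PLACE_MADE value can trigger a remote call, two cheap precautions help you stay within the limits. First, look up each distinct place string only once. Second, pause between real requests. The sketch below is minimal: `lookup` stands in for whatever function actually queries the service (hypothetical here), and the one-second default interval is an arbitrary illustrative value, not Geonames' published limit.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledCache:
    """Wrap a lookup function so each distinct query is sent at most once
    and real calls are spaced at least `min_interval` seconds apart."""

    def __init__(self, lookup: Callable[[str], Optional[str]], min_interval: float = 1.0):
        self._lookup = lookup
        self._min_interval = min_interval
        self._cache: Dict[str, Optional[str]] = {}
        self._last_call: Optional[float] = None
        self.calls = 0  # number of real (uncached) lookups made

    def __call__(self, query: str) -> Optional[str]:
        # Repeat queries, including ones that found nothing, come from the cache.
        if query in self._cache:
            return self._cache[query]
        # Sleep just long enough to respect the minimum spacing between calls.
        if self._last_call is not None:
            wait = self._min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        result = self._lookup(query)
        self._last_call = time.monotonic()
        self.calls += 1
        self._cache[query] = result
        return result
```

Caching failed lookups as well as successes matters here: the same unmatched place string is likely to recur across many records, and there's no point asking about it again.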

Now that the data contains Geonames URLs, you have access to all the background information about each place. All Geonames entries have lat/long co-ordinates (which is what you need to stick a pin on a map in your browser, using e.g. KML markup), but in addition will often have info such as population. You just need to make an HTTP request for the Geonames URL, specifying that you want RDF back, e.g.: http://light.demon.co.uk/scripts/cgiforwarder.exe?url=http://sws.geonames.org/2633352/&accept=rdf and process the RDF/XML which comes back.
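Here's a sketch of that last step: pulling the coordinates out of the RDF/XML and writing a KML placemark. It assumes the response uses the W3C WGS84 vocabulary (`wgs84_pos:lat` / `wgs84_pos:long`) and the Geonames ontology's `gn:name`, which is how Geonames RDF has typically been structured. Check a live response before relying on it. The sample document in the test is synthetic, not real Geonames output.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "gn": "http://www.geonames.org/ontology#",
    "wgs84_pos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def rdf_to_point(rdf_xml: str) -> tuple[str, float, float]:
    """Return (name, lat, long) from the first feature in a Geonames-style RDF document."""
    root = ET.fromstring(rdf_xml)
    name = root.findtext(".//gn:name", default="", namespaces=NS)
    lat = root.findtext(".//wgs84_pos:lat", namespaces=NS)
    lon = root.findtext(".//wgs84_pos:long", namespaces=NS)
    if lat is None or lon is None:
        raise ValueError("no wgs84_pos:lat/long found")
    return name, float(lat), float(lon)


def kml_placemark(name: str, lat: float, lon: float) -> str:
    """Build a minimal KML document with one placemark.
    KML lists coordinates as longitude,latitude."""
    return (
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Placemark>'
        f"<name>{escape(name)}</name>"
        f"<Point><coordinates>{lon},{lat}</coordinates></Point>"
        "</Placemark></kml>"
    )
```

Fetching the RDF itself is a plain HTTP GET (for example with `urllib.request`) against a URL like the one above. That step is left out so the sketch runs offline.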

Personally, this kind of thing makes it all worthwhile: we can't easily export our entire geographical hierarchy, so being able to geocode the imperfect data we have is really useful.

If you’ve done something interesting with our data we’d love to feature it. We’re also curious to know who’s having a look at it, even if you’re not at the point of having something to share.

Finally, I'd almost forgotten to thank the many wonderful people who have contributed to the Museums and the machine-processable web site or come along to #linkingmuseums meetups to work out how to get to re-usable museum data. I'll be keeping up the wiki in future, and can be contacted @mia_out.

September 10, 2009

Working out collections online – your questions?

Filed under: collections,design,requestforcomment — mia @ 8:59 pm

I've been slowly putting together a list of research questions to try and tackle as I re-work our collections online (with our very own blogger, the transport curator David Rooney) and the ‘Online Stuff’ section. I'll write up the process and my ideas as I go, but in the meantime: what's your number one question about presenting museum collections online? It could be ‘does x work better than y’, or ‘do people really want z’, or anything that's been hanging around at the back of your brain. Leave a comment below or tweet @mia_out.

And speaking of collections online, check out the V&A’s Collections Search, just out in beta today.  There’s so much to explore and the interface is a delight.  Congratulations to all involved!

April 24, 2009

Some on-going work on museum APIs

Filed under: API,design,requestforcomment — mia @ 10:20 pm

Just a quick note to say we haven't abandoned this blog. For the moment, I've been concentrating on working through issues around schemas/formats, content and functions for re-usable and interoperable cultural heritage data on a wiki.

There’s a list of things you can do if you work in a museum or are a developer interested in using museum data – jump in!

March 5, 2009

The great API challenge

Filed under: API,crowdsourcing,design,requestforcomment — mia @ 7:12 pm

Another MCG (Museums Computer Group) discussion list post repurposed as a blog post… In a discussion about the Brooklyn Museum API, following on from a discussion of the NMOLP ‘Creative Spaces’ project, Richard Light asked:

Don’t we need a standard for what a museum API looks like, and what it delivers? Even better, shouldn’t we stop thinking that we need to invent everything we use, and just adopt something like the Linked Data paradigm?

I quickly checked with Daniel, our head of web, that it was ok for me to throw this open to the world, and posted in response:

Science Museum is looking at releasing an API soon – project-specific to start with, but with the intention of using that as an iterative testing and learning process, and I’d be happy to talk to other museums about what they’re doing to try and come up with something with at least some core similarities in the schema and functionality. Anyone up for it?

So, are you up for it? I’ve had a few good responses already. My vague idea is maybe using digitalheritage.ning.com to share data schemas, API functionality, discuss the various acronyms we’re using, etc.

You can leave a comment here, join the Ning, or contact @miaridge on Twitter.

Competitions using APIs – any resources?

Filed under: competition,mashups,requestforcomment — mia @ 7:07 pm

The original impetus for creating this blog was to provide somewhere to talk about our plans, ask for feedback, and generally make really transparent the process of running a mashup competition based on a set of object data created for an exhibition.

The project is close to being signed off, and I'll go into more detail once it is, but in the meantime, here's a post I sent to the MCG (Museums Computer Group) email list:

Does anyone have good examples, bad examples, personal experience, whatever, on competition models, licensing, preservation, timelines, platforms, other public domain data sources, visualisation tools, etc.? You can email me off-list if that's easier; I can post a compiled list back here.

I was at JISC’s recent dev8D event and got some good ideas there, and I’m happy to share the research I’ve already done if anyone is interested.
