Museum Web Developers

March 9, 2011

Collections data published

Filed under: collections,data — mia @ 8:10 pm

I’m very excited about sharing this with you – we’ve just released 218,822 records about objects from the collections of the Science Museum, the National Media Museum and the National Railway Museum.

The collections include objects relating to aeronautics, agriculture, astronomy, cinematography, medicine, materials, space, television, time measurement, transport and more. They range in size from contact lenses to Concorde 002.

We’ve released the files as a lightweight experiment – we’d like to understand whether, and if so how, people would use our data. We’d also like to explore the benefits for the museum and for programmers using our data – your feedback will inform decisions about future investment in more structured data, and will help shape our understanding of what those users need. The files are in CSV format – it’s a really simple format, viewable in a text editor, so we hope it will be usable by most people.

We’ve published three data sets:

  • 218,822 object records
  • 40,596 media records
  • 173 event records

The files are released under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) licence. Please get in touch if you’ve got ideas that require a commercial licence.

The files are available at
Documentation for collections data from Science Museum, National Media Museum, National Railway Museum (NMSI) released as CSV. This page includes information about the fields available and the collections included.

The documentation page includes contact addresses, or you can leave a comment below.

1 Comment »

  1. Some notes on my experiences on importing the data:

    * The CSV files appear to be valid (yay) – the Ruby FasterCSV library (which is apparently fairly strict) didn’t seem to choke on them at all.
    * If you’re importing them as UTF-8, there seem to be some invalid byte sequences (which caused me an issue when importing into a PostgreSQL database). I passed them through the Iconv conversion library to convert them to valid UTF-8.
    * The occasional field is really long, so if you set up your database using string (VARCHAR) fields you may end up either truncating them or hitting errors (depending on your DB behaviour). I ended up switching to ‘text’ fields for all the fields apart from collection and ‘whole_part’.
    * The name is often blank, and the title is often blank, so you can’t rely on either being present in your views. I wrote a quick helper which displays title if present, otherwise name. There are also a small number of records with neither a name nor a title! In these cases, you may want to show a truncated form of the description.
    * I didn’t find any duplicate object_ids, but I didn’t enforce this uniqueness in the database either (just in case).
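
The notes above could be sketched roughly as follows. This is an illustration under assumptions, not the code actually used for the import: the column names `object_id`, `name`, `title` and `description` are inferred from the notes, the sample rows are made up, and the helper names are hypothetical. It uses Ruby's standard `CSV` library (which FasterCSV became in Ruby 1.9) and `String#scrub` in place of Iconv, which is no longer part of Ruby's standard library.

```ruby
require "csv"

# Replace invalid byte sequences so the string is valid UTF-8.
# String#scrub stands in here for the Iconv step described in the notes.
def clean_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

# Prefer title, then name, then a truncated description.
# The column names are assumptions based on the notes.
def display_label(row, max_length = 60)
  title = row["title"].to_s.strip
  return title unless title.empty?

  name = row["name"].to_s.strip
  return name unless name.empty?

  description = row["description"].to_s.strip
  description.length > max_length ? "#{description[0, max_length]}..." : description
end

# Longest value seen per column, to help choose between VARCHAR and text.
def max_field_lengths(rows)
  rows.each_with_object(Hash.new(0)) do |row, lengths|
    row.each do |field, value|
      lengths[field] = [lengths[field], value.to_s.length].max
    end
  end
end

# Ids that occur more than once (the key name is an assumption).
def duplicate_ids(rows, key = "object_id")
  rows.group_by { |row| row[key] }.select { |_id, group| group.size > 1 }.keys
end

# Made-up sample data; "\xFF" is a byte that is invalid in UTF-8.
sample = "object_id,name,title,description\n" \
         "1,lens,,A contact lens\n" \
         "2,,Concorde 002,Supersonic airliner\n" \
         "3,,,Bad byte \xFF here\n".b

rows = CSV.parse(clean_utf8(sample), headers: true).map(&:to_h)
rows.each { |row| puts "#{row['object_id']}: #{display_label(row)}" }
puts "longest fields: #{max_field_lengths(rows).inspect}"
puts "duplicates: #{duplicate_ids(rows).inspect}"
```

Replacing bad bytes with `?` is only one option; if the files turn out to be in a known legacy encoding, transcoding with `String#encode` from that encoding would preserve the original characters instead.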

    Frankie

    Comment by Frankie Roberto — March 11, 2011 @ 12:51 pm
