Here on the blog, we’ve featured a series of datasets that might interest you, such as the British Library’s British National Bibliography. We’ve also looked at the work of our Data Team in finding and improving published data and making it available through Kasabi. One thing I haven’t talked about here, though, is why it matters to publish quality data, and to make it not only available but useful.
Making data available online is something many organisations have been exploring, for quite diverse reasons, but one question surfaces in almost every discussion about publishing data: how useful will this data be? Developers discuss the schemas and intricacies, project managers ask about the measurable benefits, and civil servants try to work out how to meet their remit by opening up the public data they look after. But every discussion comes back to the reuse of the information.
One way to frame this problem is Sir Tim Berners-Lee’s star-rating system for good practice in publishing data. Sir Tim discussed the idea at the Gov 2.0 Expo in 2010, and the video of his talk is well worth watching.
The Talis Consulting blog introduced the star ratings last spring, and I’ve borrowed their summary almost verbatim to give you a quick overview of the five stars of publishing data:
- On the web, openly licensed: get your data out there, in any form, for others to use under a clear and unambiguous open licence, such as the Open Government Licence for Public Sector Information. For many, this is one of the most significant steps, because it often involves convincing others that it is a good idea.
- Machine-readable data: make the data you have just published readable by software. If you previously published a spreadsheet as a nicely formatted PDF, make the Excel file available as well.
- Non-proprietary format: publish a CSV file as well, so the data can be used in software and applications other than Microsoft’s.
- Linked Data: start using URIs as identifiers and publishing in RDF. This step also needs a bit more thought about how you are going to describe your data.
- Do the linking: link your data to identifiers published on the wider web of data, or reuse those identifiers directly. For example, if you are using a UK postcode, why not use the Ordnance Survey URI (http://data.ordnancesurvey.co.uk/id/postcodeunit/WR112RE)?
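To make the later stars a little more concrete, here is a minimal sketch using only Python’s standard library. It takes one hypothetical record and emits it twice: as CSV (the non-proprietary step) and as RDF in N-Triples form, using URIs as identifiers and linking the postcode to an Ordnance Survey postcode-unit URI built on the pattern in the example above. The record itself, the `example.org` namespace and the `postcode` predicate are assumptions for illustration only; a real publisher would use its own URIs and, ideally, an established vocabulary.

```python
import csv
import io

# Hypothetical namespace for the publisher's own identifiers (illustration only).
BASE = "http://example.org/id/branch/"
# Hypothetical predicate linking a record to its postcode unit (illustration only).
HAS_POSTCODE = "http://example.org/def/postcode"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Follows the pattern of the Ordnance Survey postcode-unit URI quoted above.
OS_POSTCODE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"

# A made-up record standing in for a row of a published spreadsheet.
records = [
    {"id": "branch-1", "name": "Example Library, High Street", "postcode": "WR11 2RE"},
]


def to_csv(rows):
    """Serialise rows as CSV, quoting fields that contain commas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "postcode"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def postcode_uri(postcode):
    """Build an OS postcode-unit URI: strip spaces and upper-case the postcode."""
    return OS_POSTCODE + postcode.replace(" ", "").upper()


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(rows):
    """Emit two triples per row: a label, and a link to the OS postcode unit."""
    lines = []
    for r in rows:
        subject = f"<{BASE}{r['id']}>"
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(r["name"])}" .')
        lines.append(f"{subject} <{HAS_POSTCODE}> <{postcode_uri(r['postcode'])}> .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_csv(records))
    print(to_ntriples(records))
```

The point of the second output is the last step on the list: instead of a bare "WR11 2RE" string, the record now points at a URI that anyone else on the web of data can also use, which is what makes the data linkable.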
The star-rating system applies very readily to the public sector, where bodies have a responsibility to make their information available and accessible online. This is an area we’d love to spend more time exploring, because Kasabi has been built to make data as usable as possible, and the platform is designed for publishing 5* data.
A good example of this is Ordnance Survey, which has been a pioneer in publishing its information as Linked Data. Some of my colleagues worked with OS on that project (more information here), and several OS datasets are also available in Kasabi. They have proven highly linkable and usable, as John Goodwin demonstrates with his “linksets”, which match up OS data with other information to provide a reusable hub of data about a specific area.
Have Data? Drop Us a Line
My Kasabi colleagues Tim, Alison and Rob have all worked on public-sector data projects, and we are looking for more to get our teeth into: more data to work with, and more original sources of data to talk to about their own efforts in publishing public information. In particular, if you are wondering how to get your data up to Sir Tim’s five stars, drop Alison an email at firstname.lastname@example.org, or leave a comment below!