Last week, I had the fun of giving a very quick talk and a demo of Kasabi in Nottingham. I was at the Open Source GIS conference, and the track covered open government data. I also had the pleasure of being able to listen to the session ahead of my talk.

Dr Mooney and Professor David Martin covered the broad topic of opening up data. We got a full view, from data being made public and the ramifications of that, to how it’s being used. Several difficulties with using data also emerged, which handily gave me quite a bit to ponder for my talk.

Some of the problems seemed to centre on data that can be found but whose terms of use are unclear. As a researcher or developer, it can be difficult to work out whether published information can be used (or reused), which seriously hinders the projects, findings, and results that build on this published data. This seemed particularly relevant to governmental data: which bits of this information are part of an “Open Government” initiative, and which have I just stumbled across and am technically not allowed to use? The other topic coalesced into discussing data that is published but difficult to use. The structure, provenance, completeness, time of publishing and even simple formats all become problems when they compromise the developer’s plans.

This set the context for my talk on curation and publication very nicely. Because the talk time was very short, I wanted to cover just one idea: data is more useful when it’s looked after. Curating data requires forethought, updating, and consideration of how it will be used. Below are a few slides which outline my thinking on this. The demo of Kasabi also slotted into this context, as I was able to show, very quickly, some of our tools and concepts for making the publishing and curation of data easier.

My next few weeks seem to follow this line of thought. I am on my way, as I write, to the Open Knowledge Conference in Berlin to catch up with the Open Knowledge Foundation community. Leigh recently wrote a piece for the OKF blog which provides a clear picture of our perspective on open data, so it’s worth a read.

Also, as blogged about here earlier, the first public event we’re hosting will be hacking on top of Open Government data. There’s info in that piece about the event (27 July in London), but I’d be interested in hearing more about this topic beforehand too.

What data do you see as missing that you would want to build on? What data, by not being published, hinders your project or plans?

Also, if you’re in Berlin over the next few days, drop me a line and we’ll find some Kaffee und Kuchen or Weissbier. :)

