An interview with John Goodwin
John has been working with semantic web technologies for a number of years, and is deeply involved in the Ordnance Survey’s recent adoption of Linked Data. Recently, John has been exploring the features in the Kasabi beta, so we caught up with him to find out more about his experience.
Can you tell us about your involvement with Linked Data at the Ordnance Survey?
Some years ago, my line manager told me he’d heard of this thing called the ‘semantic web’ and asked me to have a look to see if it was of any potential relevance to Ordnance Survey.
Initially we weren’t sure how useful RDF (Resource Description Framework) would be for spatial data, because there were no mechanisms at the time for performing the sort of quantitative queries that could be done in everyday GIS software. However, we realised that you could still do a lot of useful things in small to mid-scale data using qualitative queries—such as creating statements about the explicit spatial relationships between different regions.
The administrative geography of Great Britain offered an interesting set of data with which to experiment. It was a well defined geography containing regions with well defined boundaries. We made an initial RDF dataset that contained explicit containment information like: “Hampshire contains Winchester”. We provided a static RDF file on the research website that was freely available for download and non-commercial use.
On the 1st of April last year, after some time down the triple mines, OS OpenData™ was launched, replacing the simple static RDF file.
In what way does the OS location data add useful context to other datasets?
This offers great potential to data publishers. By linking to identifiers for places and postcodes in your data, you can enrich the information you hold.
Imagine you have a list of schools and their postcodes. By connecting to the URIs for those postcodes, you have a whole new way to view and analyse your data. Through that link, you now know the ward, district and county those schools are in. This offers the potential to, for example, aggregate statistics about a fine-grained geography like postcodes up to coarser-grained geographies like wards or districts. This is a very simple example of how merging two sets of Linked Data can deliver benefits.
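As a rough illustration of the aggregation described above, the sketch below counts schools per ward by following a postcode-to-ward lookup. The lookup table, the school list and the function name `count_by_ward` are all hypothetical placeholders; in practice the ward for each postcode would come from the linked postcode data rather than from a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical placeholder data: which ward each postcode falls in.
# In a real setting this would be derived from the linked postcode URIs.
postcode_to_ward = {"PC1": "Ward A", "PC2": "Ward A", "PC3": "Ward B"}

# Hypothetical list of (school name, postcode) pairs held by a publisher.
schools = [
    ("School 1", "PC1"),
    ("School 2", "PC2"),
    ("School 3", "PC3"),
    ("School 4", "PC3"),
]


def count_by_ward(schools, postcode_to_ward):
    """Aggregate a postcode-level list up to ward level."""
    counts = defaultdict(int)
    for _name, postcode in schools:
        ward = postcode_to_ward.get(postcode)
        if ward is not None:  # skip postcodes with no known ward
            counts[ward] += 1
    return dict(counts)


ward_counts = count_by_ward(schools, postcode_to_ward)
```

The same pattern would extend to districts or counties by swapping in a different lookup.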
Another benefit is that “everything happens somewhere,” and a lot of data has some kind of location element to it. This makes location one of the key data integration hubs. The combination of Linked Data technology and location as an information hub should enable serendipitous discovery of new information that would previously have been very difficult with more traditional technologies.
How do you see platforms like Kasabi helping to make data more useful or accessible?
Some developers can be put off using Linked Data because they consider SPARQL to be a steep learning curve, though in reality it is no harder, and possibly even easier, than SQL. So Kasabi provides a neat way to create simple RESTful APIs that can be mapped onto potentially complex SPARQL queries. This means we can provide ‘web 2.0 developers’ with RESTful access to Linked Data through a fairly simple API.
Another benefit is for Linked Data users. Linked Data can sometimes be hard to consume—you either have a choice of downloading a dump of the whole dataset, or programmatically accessing individual URIs. Kasabi’s custom APIs provide a useful way of pulling out subsets of data that might be needed for particular applications.
You’ve been very active in exploring the features in the early Kasabi beta. Can you tell us about what you’ve been working on?
I started simply by experimenting with the custom APIs. I then started to experiment with more interesting (and arguably more useful) ones. The first of these was ‘select postcode by region’. Here the OS region identifier is the parameter in the API call, and a list of postcodes for that region is returned in KML. Kasabi provides the ability to transform results returned in RDF/XML to a given schema or standard (e.g. KML). This is very neat.
I then worked with spatial queries. I created an API that would take a region identifier and a spatial relation, returning the identifiers of regions related to the original by that spatial relation. For example: ‘return me everything that touches Southampton’.
It then occurred to me that these latter types of queries could be of use to Linked Data developers as well. For example, we could provide API shortcuts that ‘describe’ everything in Southampton and return RDF. This provides a neat way of ‘cookie cutting’ the Ordnance Survey Linked Data by region, allowing developers to pull out just that data.
You’ve created two new datasets, containing hyperlocal information about Southampton and Hampshire. How did you go about creating them?
Following on from these experiments, I wondered how easy it would be to ‘cookie-cut’ other datasets (some in Kasabi, and some not) to create integrated views of particular regions… or hyperlocal datasets. The simple combination of links to OS data with custom APIs is a big win for Linked Data consumers.
The initial data gathering provided me with a list of the things ‘within’ Southampton, and a list of triples stating that information. There was also a link from some of those things to their postcode. In order to add some extra polish to the data, I had to compute reciprocal relations (e.g. say that Southampton ‘contains’ various objects) and also provide outward links from postcodes. So where we have a triple that states that a school has a particular postcode, I added in a triple to say that that postcode was the postcode of the school.
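The reciprocal-relation step described above could be sketched along these lines. This is only a minimal illustration, not the method actually used: triples are modelled as plain Python tuples, and the predicate names `hasPostcode` and `postcodeOf` are hypothetical stand-ins (only ‘within’ and ‘contains’ come from the description above).

```python
# Map each predicate to its assumed inverse. "hasPostcode"/"postcodeOf"
# are hypothetical names chosen for illustration.
INVERSE = {
    "within": "contains",
    "hasPostcode": "postcodeOf",
}


def add_reciprocals(triples, inverse=INVERSE):
    """Return the input triples plus an inverse triple for each known predicate."""
    triples = list(triples)  # allow any iterable, including generators
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in inverse:
            result.add((obj, inverse[predicate], subject))
    return result


# Placeholder identifiers standing in for real URIs.
original = [
    ("school1", "within", "Southampton"),
    ("school1", "hasPostcode", "postcode1"),
]
enriched = add_reciprocals(original)
```

With the inverse triples in place, a consumer can start from Southampton or from a postcode and navigate outwards, rather than only from the school.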
So, for each of the regions, we have a list of: airports, bus stops, stations, schools, GPs, hospitals, renewable energy generators, heritage sites, postcodes, councillors, crime statistics and administrative regions… and I could list more! Each important element, like schools or hospitals, is linked to a postcode, district, ward and, in the case of Hampshire, county.
Hyperlocal sets like this make the potential to create the (in)famous postcode paper a reality. So I really can put my postcode in, and a variety of data comes out.
Many of the applications are simple, such as: ‘find me my nearest school’, and most can be done in the original datasets. Where it gets more interesting is when we mix data from various sources: ‘find me GPs in my ward, and all the bus stops within a 100 metre radius of those GPs’.
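A query such as ‘bus stops within a 100 metre radius of those GPs’ can be approximated with a simple distance filter once the data from both sources has been merged. The sketch below assumes each item carries projected grid coordinates in metres (for example eastings and northings), so straight-line distance is adequate at this scale; the coordinates, names and the function `within_radius` are hypothetical.

```python
import math


def within_radius(origin, candidates, radius_m=100.0):
    """Return the names of candidates no further than radius_m from origin.

    origin is an (x, y) pair in metres; candidates maps a name to an
    (x, y) pair in the same projected coordinate system.
    """
    ox, oy = origin
    return [
        name
        for name, (x, y) in candidates.items()
        if math.hypot(x - ox, y - oy) <= radius_m
    ]


# Hypothetical coordinates, relative to the GP surgery for simplicity.
gp_location = (0.0, 0.0)
bus_stops = {
    "Stop A": (30.0, 40.0),    # 50 m away
    "Stop B": (300.0, 400.0),  # 500 m away
}
nearby = within_radius(gp_location, bus_stops)
```

For latitude and longitude rather than grid coordinates, a great-circle distance would be needed instead of `math.hypot`.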
These are just two possibilities, and no doubt other people will come up with far more interesting use cases.
From Leigh Dodds, Kasabi’s project manager
John’s experience illustrates some of the essential features of Kasabi. The ability to share expertise by publishing useful queries and creating APIs to offer access to data in new ways has the potential to make high-quality data even more valuable. Kasabi is not just a data publishing platform, but also a way for a community to find and explore new ways to interact with data.
Mixing together data from different sources, e.g. to create the “hyperlocal” datasets as John has done, is a powerful feature of the underlying Linked Data platform upon which Kasabi is built.