There was an interesting post published on Programmable Web yesterday about how Foursquare are developing their platform. The most important aspect of that is this goal:
to make Foursquare the missing “Rosetta Stone for location, allowing you to link information about a real-world place from one database to any other.” Now it’s not just about using Foursquare, but connecting it to other services.
This is interesting as it’s another example of a trend that I’ve been expecting for some time: that the current crop of API/data providers will recognise the power of being able to connect different databases through shared identifiers. Yahoo Geoplanet took a similar step in this direction in April last year, as Programmable Web also reported. And in October the Guardian announced that they were connecting their platform to MusicBrainz.
The utility of shared identifiers, to enable cross-walking between different datasets, is well understood. Traditional data suppliers like Experian and Hoovers, for example, take great care to ensure that their databases carry all the different identifiers for a business. Supporting shared identifiers allows a data provider to push their own identifiers and data as the definitive source without locking out developers using data from other systems. It’s important to minimize friction for developers moving between platforms.
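To make the cross-walking idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the `provider_a` and `provider_b` identifier schemes, the records, and the function names are assumptions, not the format of any real supplier's data.

```python
# Hypothetical crosswalk table: each row lists the identifiers that two
# imaginary providers use for the same business.
CROSSWALK = [
    {"provider_a": "A-1001", "provider_b": "B-77"},
    {"provider_a": "A-1002", "provider_b": "B-12"},
]

# Hypothetical dataset keyed by provider B's identifiers.
COMPANY_PROFILES = {
    "B-77": {"name": "Example Widgets Ltd", "employees": 120},
    "B-12": {"name": "Sample Holdings plc", "employees": 15},
}


def build_index(rows, from_scheme, to_scheme):
    """Map identifiers in one scheme to identifiers in another."""
    return {
        row[from_scheme]: row[to_scheme]
        for row in rows
        if from_scheme in row and to_scheme in row
    }


def crosswalk_lookup(identifier, index, target_dataset):
    """Translate an identifier, then fetch the record from the target dataset."""
    target_id = index.get(identifier)
    if target_id is None:
        return None  # no shared identifier, so no way across
    return target_dataset.get(target_id)


a_to_b = build_index(CROSSWALK, "provider_a", "provider_b")
profile = crosswalk_lookup("A-1001", a_to_b, COMPANY_PROFILES)
```

A developer holding only provider A's identifier can reach provider B's record in one hop, which is the whole point of the shared identifiers.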
The Reconciliation API, used by Google Refine to resolve identifiers and labels in a CSV file into strong identifiers in a dataset, is another example of this linking process becoming part of the dataset curation workflow. It’s one reason we’ve integrated it into Kasabi.
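The sketch below illustrates the general idea of reconciliation: taking loose labels from a CSV column and resolving them to strong identifiers. It is not the Reconciliation API protocol itself, just a simple in-process label match using Python's standard library. The reference dataset, the `example.org` URIs and the function names are all hypothetical.

```python
import csv
import difflib
import io

# Hypothetical reference dataset: strong identifier -> preferred label.
REFERENCE = {
    "http://example.org/place/1": "Tate Modern",
    "http://example.org/place/2": "British Museum",
}


def normalise(label):
    """Lower-case the label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def reconcile(label, reference=REFERENCE, cutoff=0.8):
    """Return the identifier whose label best matches, or None."""
    by_label = {normalise(name): uri for uri, name in reference.items()}
    matches = difflib.get_close_matches(
        normalise(label), list(by_label), n=1, cutoff=cutoff
    )
    return by_label[matches[0]] if matches else None


def reconcile_csv(text, column):
    """Pair each value in a CSV column with its reconciled identifier."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row[column], reconcile(row[column])) for row in reader]


rows = reconcile_csv("name\nTate Modern\nUnknown Venue\n", "name")
```

A real reconciliation service would typically return several scored candidates for a human to review rather than a single best guess.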
Linking Reduces Friction
What’s interesting to me is that this process of connecting up different datasets is the essence of the Linked Data movement. Right now, there’s a different set of technologies being used not just in the publishing of the raw data, but also in how shared identifiers can be resolved into additional data. But it seems likely that there will be some convergence, if not on specific technologies then around the approach: because there is still some friction in accessing the reference data, when we’re only dealing with shared identifiers.
The Programmable Web article notes that:
…there’s no straightforward way to extract much more information from these particular partner sites beyond the link to their listing page.
It doesn’t seem likely that things will stay this way though. There are plenty of sites that deal with location, have their own APIs, and stand to benefit from having their index of places harmonized with Foursquare’s. And then, of course, there may be enterprising mash-up builders who work out ways to extract information directly from linked pages, even if they are intended for browser display and not application parsing.
The Linked Data approach is to integrate data directly with the web, so it has a web address, a URL, just like everything else. So instead of having to turn a shared identifier into an API call, we instead add links directly to the external data. Developers and, importantly, applications, can then just follow the links to find and use additional third-party data.
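As a rough sketch of what "just follow the links" means for an application, the Python below walks from one description to the data it links to and merges what it finds. To keep it self-contained, a dictionary stands in for the web and `fetch` stands in for an HTTP GET. The URLs, field names and the `links` key are assumptions made for the example.

```python
# A tiny in-memory stand-in for the web: URL -> data published at that URL.
WEB = {
    "http://places.example.org/venue/42": {
        "name": "Example Cafe",
        "links": ["http://reviews.example.com/listing/abc"],
    },
    "http://reviews.example.com/listing/abc": {
        "rating": 4.5,
        "links": [],
    },
}


def fetch(url):
    """Stand-in for an HTTP GET that returns parsed data, or None."""
    return WEB.get(url)


def follow_links(start_url, max_hops=2):
    """Breadth-first walk over linked data, merging the fields found."""
    merged = {}
    seen = set()
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in seen or depth > max_hops:
            continue
        seen.add(url)
        doc = fetch(url)
        if doc is None:
            continue
        for key, value in doc.items():
            if key != "links":
                merged.setdefault(key, value)  # first value found wins
        queue.extend((link, depth + 1) for link in doc.get("links", []))
    return merged


venue = follow_links("http://places.example.org/venue/42")
```

There is no per-provider API client here: the only thing the application needs to know is how to dereference a URL.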
By using technologies like RDFa to add machine-readable data directly to web pages, we can further streamline the process of publishing and finding data.
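To give a flavour of what machine-readable data in a page looks like to an application, here is a deliberately simplified sketch that pulls `property` values out of an HTML fragment using Python's standard `html.parser`. It is not a conforming RDFa processor (it ignores vocabularies, prefixes, nesting and subjects), and the markup, URL and class name are made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical page fragment carrying RDFa-style attributes.
PAGE = """
<div vocab="http://schema.org/" typeof="Place">
  <span property="name">Example Cafe</span>
  <a property="sameAs" href="http://places.example.org/venue/42">Listing</a>
  <meta property="description" content="A hypothetical venue">
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect property values from content, href, or element text."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._pending = None  # property waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            self.properties[prop] = attrs["content"]
        elif "href" in attrs:
            self.properties[prop] = attrs["href"]
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


extractor = PropertyExtractor()
extractor.feed(PAGE)
found = extractor.properties
```

The same page serves both the browser and the application, and the `sameAs` link it carries is exactly the kind of link an application can follow to find more data.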
Why Context Remains King
Sharing identifiers and linking between datasets is useful not just because it helps any individual dataset owner become the “Rosetta Stone” for their specific domain. It’s useful because we live in a Long Tail world.
As a data provider, no matter how much energy you put into curation to make your data more comprehensive there will always be some additional external data, some additional context, that can add value. That value may be incremental to the majority of users, but it will be important to someone.
Additional context unlocks value by providing additional ways to access, interpret or navigate a dataset. Additional context allows us to ask more questions of the data.
The ability to contribute and make use of context by annotating, linking and combining datasets together will be a key part of the value of Kasabi.