What’s in the Kasabi Beta?

Posted on 03/22/2011 by

Last week we held our first Kasabi hack day and yesterday we started sending out a wider set of invites to developers to let them take an early look at the site. We’ve spent a great deal of time over the last few months exploring lots of different areas of functionality in Kasabi. But we’ve ended up releasing a cut-down view of the application to get what any start-up craves: feedback.

The vision for Kasabi is to provide a data marketplace that will let all kinds of users easily find, share and remix data from a number of different sources. Kasabi will support a range of different publishing and curation models for data, ranging from true public domain data through to data that is commercially licensed. That data might be collected and curated by a large organization, an enthusiastic community, or a passionate domain expert.

There are some exciting times ahead for exploring new approaches to data curation, and the business models that will make those processes and the applications built upon them sustainable. We think we’ve got some unique ideas around how to enable that in Kasabi, and we’re excited to be working in the same areas as a lot of other great products.

But we need to start somewhere, and the foundation of any good platform is its developer tools. In our first release we’ve decided to populate the product with open data to allow our beta testers to play with a number of new and existing datasets. We’re keen to get feedback on the developer experience before we start rolling out more features, although we have a number in the pipeline that we know will provide some useful extra tooling and support. We had a great discussion at our hack day that reassured us that we’re on the right track.

We’ve initially populated Kasabi with a small selection of datasets (55 at the time of writing). We’ve got existing datasets such as DBpedia, GeoNames and the BBC Linked Data, as well as some new data, e.g. from English Heritage, the Prelinger Archives, the Foodista recipe wiki and even a Lego part catalogue!

Each of those datasets has a minimum of 5 APIs. I say “minimum” because one of the core features of Kasabi is that the community can create and curate their own APIs over the data we’re hosting. This means you can use a stock API auto-deployed by the system itself, an API created by another community member, or create your own pathway into the data which can then be shared with others.

Why deal with the frustrations of constantly changing APIs, or wait for data publishers to add a new type of output format or query type, when you can create something yourself?

All of the data in Kasabi is stored as RDF, i.e. in a graph database, which provides us with a great underlying infrastructure for managing a variety of different types of data. It’s also providing the foundation for building out the range of APIs, data curation and analysis tools that we have planned. Use of RDF means that the data we are hosting is also available as Linked Data, providing yet another way to access the data.
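
For readers new to RDF, the graph model boils down to a set of subject, predicate, object statements ("triples") that can be matched by pattern. Here is a minimal Python sketch of that idea. The identifiers and sample data are made up for illustration and are not taken from Kasabi, and a real triple store would use indexes rather than a linear scan.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample data, loosely modelled on a parts catalogue.
TRIPLES: List[Triple] = [
    ("ex:brick3001", "rdf:type", "ex:LegoPart"),
    ("ex:brick3001", "rdfs:label", "Brick 2 x 4"),
    ("ex:plate3020", "rdf:type", "ex:LegoPart"),
    ("ex:plate3020", "rdfs:label", "Plate 2 x 4"),
    ("ex:recipe42", "rdf:type", "ex:Recipe"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def labels_of_type(triples: List[Triple], rdf_type: str) -> List[str]:
    """Find all subjects of a given type, then look up their labels."""
    subjects = [subj for subj, _, _ in match(triples, p="rdf:type", o=rdf_type)]
    return sorted(
        label
        for subj in subjects
        for _, _, label in match(triples, s=subj, p="rdfs:label")
    )


if __name__ == "__main__":
    print(labels_of_type(TRIPLES, "ex:LegoPart"))
```

Because every fact has the same three-part shape, new kinds of data can be added without changing a schema, which is what makes the model convenient for hosting very different datasets side by side.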

But we recognise that not all developers are happy with using RDF, SPARQL, etc. So we have a number of ways that data can be accessed with more familiar technologies, e.g. simple RESTful APIs. We’ve spent a lot of time listening to and talking to developers over the last couple of years and think we have a handle on how to make these technologies more accessible.
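
To give a feel for what a simple RESTful API call might look like from a client, here is a small Python sketch that only builds a request URL. Everything specific in it is an assumption made for illustration: the host name, the path layout and the parameter names are hypothetical and do not describe the actual Kasabi API.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; substitute the real API host.
BASE_URL = "https://api.example.org"


def build_search_url(dataset: str, query: str, output: str = "json", max_results: int = 10) -> str:
    """Build a full-text search URL for a dataset.

    The path layout and the parameter names ("query", "output", "max")
    are assumptions for this sketch, not a documented interface.
    """
    params = {"query": query, "output": output, "max": max_results}
    return f"{BASE_URL}/dataset/{quote(dataset, safe='')}/apis/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("foodista", "apple pie"))
```

The point is simply that a developer needs nothing beyond an HTTP client and a URL to get at the data, with no knowledge of RDF or SPARQL required.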

If we can help make those technologies more mainstream along the way, then great. But our real goal is to drive innovation around data curation and usage and, fundamentally, to support the creation of some great products. We’re using semantic web technologies because we think they’ll help us deliver on that goal.

For those of you who like feature lists, here’s a quick rundown of features you can expect from the first Kasabi beta:

  • Discovery tools to help you search, browse and learn about datasets in the system
  • Quick access to data using a simple click-through process
  • Experimental API Explorer to quickly allow you to get hands-on with the data
  • Linked Data views for all hosted datasets (we’re also mirroring a number which have their own Linked Data at source)
  • A range of standard APIs, including full-text search, the Google Refine Reconciliation API, and SPARQL endpoints
  • Every API supports a range of output formats including XML and JSON
  • API building tools using the Linked Data API specification and something we’re calling “SPARQL Stored Procedures”
  • Ability to share and document SPARQL queries with the community
  • Initial client libraries in Ruby and PHP
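
As an illustration of the SPARQL endpoint and JSON output items above, here is a minimal Python sketch that prepares a query request following the standard SPARQL protocol (a `query` parameter on a GET request) and flattens a response in the standard SPARQL JSON results format. The endpoint URL, the query and the sample response are assumptions for the example and are not taken from Kasabi. No network call is made.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute a real SPARQL endpoint URL.
ENDPOINT = "https://sparql.example.org/query"

QUERY = "SELECT ?part ?label WHERE { ?part <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 1"

# Hand-written sample response in the SPARQL JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["part", "label"]},
    "results": {"bindings": [{
        "part": {"type": "uri", "value": "http://example.org/part/3001"},
        "label": {"type": "literal", "value": "Brick 2 x 4"},
    }]},
})


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL protocol GET request asking for JSON."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(body: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one plain dict per result row."""
    doc = json.loads(body)
    names = doc["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.full_url)
    print(rows_from_results(SAMPLE_RESPONSE))
```

To run it against a live endpoint, the prepared request could be passed to `urllib.request.urlopen` and the response body handed to `rows_from_results`.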

We’ll be opening up the ability to add new datasets shortly, allowing users to begin populating the system with their own data. This will also provide tools for managing basic branding, categorization, and documentation for each dataset. We’re already using those tools ourselves to curate the initial datasets.

The hosting of data in Kasabi will be free. We will be providing free Linked Data hosting for all natively hosted data. Kasabi is intended to provide a sustainable home for the publishing and management of public domain data. Our intention is to focus our business model on transactional uses of data and a number of value-adds around data hosting; more on that at a later date.

To return to my opening statements, what we’re looking for now is feedback. If you’re a developer and you’re interested in taking a look at Kasabi, we’re keen to hear from you. We’re planning to run a private beta through until early May. To request access in the interim, sign up on Kasabi.com. If you’re a data owner and want to learn more about our plans, then please get in touch too. We’d love to hear from you.

Posted in: Beta