In the announcement last week that we’re shutting down Kasabi, we said that we would make an archive of the datasets available.
There is a list of download links now available. The spreadsheet lists each of the datasets, their license and a download link for the data. The spreadsheet contains nearly 200 datasets that were publicly available in Kasabi.
The list doesn’t contain any unpublished (private) datasets. It also doesn’t include a few datasets that Talis was hosting but which are still available elsewhere, e.g. those from data.gov.uk or those that were straight mirrors of other sources. VoID descriptions of each of the datasets, including their title, etc., are harvestable from data.kasabi.com.
If you have a private dataset that you need to have exported, then please get in touch.
edsu
07/16/2012
I just noticed that there are a few datasets that look to have a very small size, often 20 bytes. Did those archive properly? Here is a list of the dataset names and their size.
Leigh Dodds
07/16/2012
Some of these datasets may be spam, or cases where the owner published without really adding any content. All the export processes were successful.
L.
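One way to check a suspiciously small archive file yourself, sketched under the assumption that the download is gzip-compressed (the post does not state the format): an empty gzip stream is about 20 bytes of header and trailer with no payload, so decompressing it should yield zero bytes. The function names and the file path below are illustrative only.

```python
import gzip


def gzip_payload_size(raw):
    """Return the number of bytes obtained by decompressing gzip data."""
    return len(gzip.decompress(raw))


def is_empty_gzip_file(path):
    """Return True if the gzip file at `path` decompresses to nothing.

    Reads a single byte rather than the whole file, so it is also
    cheap to run against large archives.
    """
    with gzip.open(path, "rb") as fh:
        return fh.read(1) == b""


# Example (hypothetical file name):
# print(is_empty_gzip_file("some-dataset.nq.gz"))
```

If the function returns True, the export ran but the dataset simply held no data, which would match the explanation above.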
Dan
07/20/2012
OK, thanks Leigh for the advice.
Now my question is: How can I use these datasets?
Do I need a triplestore db (Jena, Sesame…)? Suggestions?
Philip John (@_philjohn)
07/21/2012
Dan – there are several options: something like Fuseki (serverised Jena TDB), 4store, or the open source version of OpenLink Virtuoso.
All three are simple to get running and provide a SPARQL endpoint.
Dan
07/22/2012
Could you please give me some directions on how to use your datasets? I was building my MSc project relying on your food dataset and APIs. Now I’d like to go on using it, but I have no idea how to do so.
edsu
07/23/2012
I just thought I’d let you know that I archived these datasets at the Internet Archive.
Leigh Dodds
07/30/2012
Thanks Ed. Really appreciate you taking the time out to do that.
Dan
07/23/2012
Which triplestore do you use to load nquads?
Leigh Dodds
07/30/2012
Hi Dan,
The majority of triple stores will cope with both ntriples and nquads. We’ve been using the TDB store that comes with Apache Jena. I’d recommend you start there. This would give you a SPARQL endpoint to use.
For help on loading data files, configuring the software, etc I suggest you take a look through the Apache Jena documentation and then post any unanswered questions to the Jena User mailing list.
The rest of the APIs (search, reconciliation, etc.) were all custom built for Kasabi, so they won’t be supported by a triple store out of the box. I’m afraid you’ll need to look at implementing those yourself, should you need them.
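Once the data is loaded and a SPARQL endpoint is running, a query can be sent over plain HTTP. The following is a minimal sketch using only the Python standard library; the endpoint URL and function names are assumptions for illustration, and the real URL depends on how your store is configured. It relies only on the standard SPARQL Protocol behaviour of passing the query in a `query` parameter.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical local endpoint; adjust to match your own setup.
ENDPOINT = "http://localhost:3030/kasabi/sparql"

# Counts statements per named graph, a quick check that an
# N-Quads load put data into the graphs you expect.
COUNT_BY_GRAPH = """
SELECT ?g (COUNT(*) AS ?n)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query):
    """Send a SELECT query and return the result bindings as a list."""
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]


# Example (needs a running endpoint):
# for row in run_select(ENDPOINT, COUNT_BY_GRAPH):
#     print(row["g"]["value"], row["n"]["value"])
```

This covers only the SPARQL side. The Kasabi search and reconciliation APIs have no equivalent here.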
Cy Shak
07/30/2012
I can’t seem to find the dataset http://kasabi.com/dataset/brandweer-amsterdam-amstelland-dispatch-messages/schema in the archive. Please advise. Thanks.
Leigh Dodds
07/30/2012
Row 58?
Cy Shak
07/30/2012
That’s just the instance data. The schema information (class descriptions) was captured in http://kasabi.com/dataset/brandweer-amsterdam-amstelland-dispatch-messages/schema
Leigh Dodds
07/30/2012
The VoID descriptions, which include a summary of the schema, aren’t part of the dataset. The archive files contain all the data submitted by the owners. The VoID descriptions were generated automatically and not stored in the triple store managed by the user.
The data was separately harvestable from the Kasabi Linked Data views. But as of today this is no longer available.
Ed Summers has already archived both the datasets & their metadata to the Internet Archive. You can find what you need there:
http://archive.org/details/kasabi
HtH,
L.