One of the things I’ve mentioned in a few featured dataset posts is Pytassium. Pytassium is listed alongside the other client libraries in the Kasabi docs; it’s a Python library, packaged as an egg for easy installation. What I’ve mostly been using it for, though, is the cool trick of working as a command line over Kasabi.
Scripting for APIs
Pytassium can be used from your own Python scripts. For example, looking up a resource in the NASA dataset:

import pytassium

dataset = pytassium.Dataset('nasa', 'put-your-api-key-here')

# --------------------------
# Use the lookup API
# --------------------------
response, data = dataset.lookup('http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan')
if response.status in range(200, 300):
    # data now contains an rdflib.Graph
    print(data.serialize(format="turtle"))
else:
    # assumption: on failure, data holds the raw response body
    print("Oh no! %d %s - %s" % (response.status, response.reason, data))

There are usage patterns for each of the core APIs in Kasabi, with the current exception of augmentation (which is still on the “to do” list).
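If you find yourself repeating that status check, it can be wrapped in a small function. This is only a sketch: lookup_turtle is a hypothetical helper name, not part of Pytassium, and it assumes nothing beyond what the example above shows (a lookup method returning a response and the data, with status and reason on the response).

```python
def lookup_turtle(dataset, uri):
    """Return the Turtle serialisation for `uri`, or raise on a non-2xx status.

    `dataset` is any object with a lookup(uri) method that returns
    (response, data), as pytassium.Dataset does in the example above.
    lookup_turtle itself is a hypothetical helper, not a Pytassium function.
    """
    response, data = dataset.lookup(uri)
    if 200 <= response.status < 300:
        # On success, data is expected to be an rdflib.Graph.
        return data.serialize(format="turtle")
    raise RuntimeError("Lookup failed: %d %s" % (response.status, response.reason))
```

Calling lookup_turtle(dataset, 'http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan') with the dataset object from the example would then either hand back the Turtle text or raise an error you can catch.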
The other feature of Pytassium is the command line utility. I use this quite a bit to explore a dataset, and it begins with the terminal.
So, since I’m already at my terminal, I get to simply type pytassium and get the lovely >>> prompt, reminding me, as a noob, that I’m at the wheel of Python here. To explore a set, I simply tell it which dataset I’m looking for with the “use” command (the name comes from the dataset’s URL path), then give it my API key (make sure you’re subscribed to the dataset first).
Now that we’re inside the Prelinger dataset, we can start poking around. If you’re after a quick sample of the data, simply type: sample. If you’ve got a piece of data that you want a description of, type in:
describe <uri>. If you run a sample, or already have a list of results back, you can also describe by index (e.g. “describe 0”).
For a bigger picture of the dataset,
count will display the number of triples in the dataset. You can also count other things, such as the number of classes, or the number of items of a particular class or using a particular predicate (e.g. “count foaf:Person”).
show will give you elements of the data’s model such as the vocabularies used (“show schema”) or a description of the whole set (“show void”).
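As a rough picture of what those counts mean, here is a minimal sketch over a handful of made-up triples held as plain Python tuples. It is not how Pytassium works out its numbers (that happens against the dataset in Kasabi); the example.org data and the function names are hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Made-up (subject, predicate, object) triples, for illustration only.
triples = [
    ("http://example.org/a", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/a", FOAF_NAME, "Alice"),
    ("http://example.org/b", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/film1", RDF_TYPE, "http://example.org/Film"),
]

def count_by_class(triples):
    """Count how many rdf:type statements point at each class."""
    return Counter(o for s, p, o in triples if p == RDF_TYPE)

def count_predicate(triples, predicate):
    """Count the triples that use the given predicate."""
    return sum(1 for s, p, o in triples if p == predicate)

total = len(triples)                # every triple in the set
by_class = count_by_class(triples)  # instances per class
```

Here total plays the part of a bare count, by_class[FOAF_PERSON] is a count for one class, and count_predicate(triples, FOAF_NAME) is a count for one predicate.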
Another great tool is accessing the Search API via the command line by intuitively typing in “search” and the word you’re looking for. I’m pleased to say that the Prelinger Archives contain coffee:
>>> search coffee
0. Sanka Coffee Commercial (score: 1.0)
1. 2007-07-30-001118 (score: 0.79477257)
2. Coffee (score: 0.79324174)
3. 2005-07-19-041738 (score: 0.72796285)
4. 2010-02-06-044138 (score: 0.6798459)
5. 2007-10-03-192352 (score: 0.67383134)
6. 2009-03-27-025959 (score: 0.6686716)
7. 2006-03-25-091700 (score: 0.666953)
8. 2010-04-30-131055 (score: 0.63529885)
9. Folgers Coffee Commercial (score: 0.5960149)
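If you paste search output like this into a script of your own, each line can be split into its index, title and score with a regular expression. A minimal sketch, assuming lines look exactly like the ones above; parse_search_line is a hypothetical helper and not something Pytassium provides.

```python
import re

# Matches lines such as "0. Sanka Coffee Commercial (score: 1.0)".
LINE_RE = re.compile(r"^(\d+)\.\s+(.*)\s+\(score:\s*([0-9.]+)\)$")

def parse_search_line(line):
    """Return (index, title, score) for a result line, or None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2), float(match.group(3))
```

Lines that aren’t results, such as the “>>> search coffee” prompt itself, simply come back as None, so they are easy to skip.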
Kasabi’s Reconciliation API is also available from the command line, as is the attribution information for a dataset. You can follow the ReadMe for some information on using all the APIs, and on the batch-scripting options too.
Pytassium also has some tools for managing your own data, so if you have created a set in Kasabi, you can use Pytassium for loading and management. Just “use your-dataset”, and don’t forget your API key.
Data can be loaded from your machine to the dataset using the “store” command (‘store yourdata.nt’). A quick “status” command will give you a notice of whether your dataset’s been published or not, and you can also reset a dataset, deleting all the data in the set.
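If your data isn’t in N-Triples yet, a file for “store” can be put together with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, it only handles URIs and plain string literals (no language tags, datatypes or blank nodes), and it escapes just the characters that N-Triples does not allow raw inside a literal.

```python
def nt_literal(value):
    """Quote a plain string as an N-Triples literal, escaping \\, ", LF and CR."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"%s"' % escaped

def nt_line(subject, predicate, obj, obj_is_uri=False):
    """Build one N-Triples line; the object is a URI or a plain literal."""
    o = "<%s>" % obj if obj_is_uri else nt_literal(obj)
    return "<%s> <%s> %s ." % (subject, predicate, o)

def write_nt(path, rows):
    """Write (subject, predicate, object, obj_is_uri) rows to `path`, one triple per line."""
    with open(path, "w") as handle:
        for subject, predicate, obj, obj_is_uri in rows:
            handle.write(nt_line(subject, predicate, obj, obj_is_uri) + "\n")
```

Something like write_nt('yourdata.nt', rows) would then leave a file ready for “store yourdata.nt”. For anything beyond simple ASCII data, a proper RDF library such as rdflib is the safer route.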
So that’s a quick and dirty post covering Pytassium, which has been useful for exploring datasets and getting a good picture of them. Pytassium’s ReadMe on GitHub goes into detail on all the commands and gives some outlines for using API calls, so it’s well-documented so far. I’ve also enjoyed the fact that it’s very intuitive, especially for my exploration tasks, thanks to its clear words and simple commands. I haven’t used it directly for scripting, but I have played with SPARQL from the command line, and loaded data from my hard drive to Kasabi with a few keystrokes.
I’m sure there are some Python folk out there, and I’d be interested in any scripts you might use with Pytassium, so feel free to comment (maybe point at GitHub?).