Recommended allowance of… Pytassium

Posted on 08/11/2011



One of the things I’ve mentioned in a few featured dataset posts is Pytassium. Pytassium is listed alongside the other client libraries in the Kasabi docs, and it’s a Python library you can install and script with. But what I’ve mostly been using it for is the cool trick it does as a command line over Kasabi.

It’s up on github, and was built by Ian Davis. To get up and running, I used setuptools and the easy_install option:

>sudo easy_install pytassium

Scripting for APIs

Pytassium ties in with most of Kasabi’s APIs, and the ReadMe walks through getting started with scripting against them. A short example uses the Lookup API (for the NASA dataset in this case):

import pytassium

dataset = pytassium.Dataset('nasa', 'put-your-api-key-here')

# --------------------------
# Use the lookup API
# --------------------------
response, data = dataset.lookup('http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan')
if response.status in range(200, 300):
  # data now contains an rdflib.Graph
  print data.serialize(format="turtle")
else:
  print "Oh no! %d %s - %s" % (response.status, response.reason, data)

There are usage patterns for each of the core APIs in Kasabi, with the current exception of the Augmentation API (which has its own home under “to do” :) )
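To sketch what scripting against another of those APIs might look like, here’s a hypothetical SPARQL example. It follows the same response-and-data pattern as the lookup code above, but the select method name and the shape of its results are my assumptions, so check the ReadMe before leaning on it:

```python
# Hypothetical sketch: querying a Kasabi dataset with SPARQL via pytassium.
# The Dataset.select call mirrors the lookup pattern above, but its exact
# signature and return shape are assumptions -- see the pytassium ReadMe.

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?person a foaf:Person ; foaf:name ?name }
LIMIT 10
"""

def names_from(dataset_name, api_key):
    import pytassium  # deferred so the query above is reusable on its own
    dataset = pytassium.Dataset(dataset_name, api_key)
    response, results = dataset.select(QUERY)
    if response.status in range(200, 300):
        return results
    raise RuntimeError("%d %s" % (response.status, response.reason))
```

You’d call names_from('nasa', 'your-api-key') with a key for a dataset you’re subscribed to.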

Exploring Data

The other feature of Pytassium is its command line utility. I use this quite a bit to explore a dataset, and it all happens in the terminal.

So, since I’m already at my terminal, I simply type pytassium and get the lovely >>> prompt, reminding me, as a noob, that I’m at the wheel of Python here. To explore a dataset, I just have to tell the command which one I’m looking for (using the URL path), then give it my API key (make sure you’re subscribed):


>>> use prelinger-archives
>>> apikey ***

Now that we’re inside the Prelinger dataset, we can start poking around. If you’re after a quick sample of the data, simply type sample. If you’ve got a piece of data that you want a description of, type describe followed by its URI. If you run a sample, or already have a list of results back, you can also describe by index (e.g. “describe 0”).

For a bigger picture of the dataset, count will display the number of triples in the dataset. You can count by class and other elements too, looking for the number of instances of a class or the number of items using a particular predicate (e.g. “count foaf:Person”). show will give you elements of the data’s model, such as the vocabularies used (“show schema”) or a description of the whole set (“show void”).

Another great tool is accessing the Search API via the command line by intuitively typing in “search” and the word you’re looking for. I’m pleased to say that the Prelinger Archives contain coffee:


>>> search coffee
0. Sanka Coffee Commercial (score: 1.0)
1. 2007-07-30-001118 (score: 0.79477257)
2. Coffee (score: 0.79324174)
3. 2005-07-19-041738 (score: 0.72796285)
4. 2010-02-06-044138 (score: 0.6798459)
5. 2007-10-03-192352 (score: 0.67383134)
6. 2009-03-27-025959 (score: 0.6686716)
7. 2006-03-25-091700 (score: 0.666953)
8. 2010-04-30-131055 (score: 0.63529885)
9. Folgers Coffee Commercial (score: 0.5960149)
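If you wanted that same numbered, scored listing in your own scripts, the formatting side at least is plain Python. The (title, score) pair shape here is my assumption about what the Search API hands back (via something like response, hits = dataset.search('coffee')):

```python
# Render search hits the way the pytassium prompt does. The (title, score)
# pair shape is an assumption about the Search API's results, not confirmed.

def format_hits(hits):
    """Turn [(title, score), ...] into a numbered, scored listing."""
    return ["%d. %s (score: %s)" % (i, title, score)
            for i, (title, score) in enumerate(hits)]
```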

Kasabi’s Reconciliation API is also available from the command line, as is the attribution information for a dataset. You can follow the ReadMe for information on using all the APIs, and on Python batch-scripting options too.

Pytassium also has some tools for managing your own data, so if you have created a dataset in Kasabi, you can use Pytassium for loading and management. Just “use your-dataset”, and don’t forget your API key.

Data can be loaded from your machine to the dataset using the “store” command (‘store yourdata.nt’). A quick “status” command will tell you whether your dataset has been published or not, and you can also reset a dataset, deleting all the data in it.
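In script form, that loading-and-checking routine might look roughly like this. The store_file and status method names are guesses extrapolated from the prompt commands above, so verify them against the ReadMe:

```python
# Hypothetical sketch: loading a local file into your own Kasabi dataset and
# checking its status. Method names (store_file, status) are assumptions
# based on the "store" and "status" prompt commands -- confirm in the ReadMe.

def publish(dataset_name, api_key, path):
    import pytassium
    dataset = pytassium.Dataset(dataset_name, api_key)
    response, body = dataset.store_file(path)  # like the prompt's "store yourdata.nt"
    if response.status not in range(200, 300):
        raise RuntimeError("store failed: %d %s" % (response.status, response.reason))
    return dataset.status()                    # like the prompt's "status"
```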

So that’s a quick and dirty post covering Pytassium, which has been useful for my exploring and getting a good picture of datasets. Pytassium’s ReadMe on github goes into detail on all the commands and gives some outlines for using the API calls, so it’s well documented so far. I’ve also enjoyed the fact that it’s very intuitive, especially for my exploration tasks, with clear words and simple commands. I haven’t used it directly for scripting, but I have played with SPARQL from the command line, and loaded data from my hard drive to Kasabi with a few keystrokes.

I’m sure there are some Python folk out there, and I’d be interested in any scripts you might use with Pytassium, so feel free to comment (maybe point at github?).

Posted in: Technology