Featured Dataset: MTA New York City Transit

Posted on 07/22/2011 by


Today’s featured dataset is full of transit data from New York City, which is handy, because the Metropolitan Transportation Authority (MTA) is looking for apps to be built on its data!

According to Programmable Web’s blog post, the MTA is offering a $15k prize for the best app, and the post has more details about the contest too.

So, a quick look at the dataset:

It was put together by Ian, who developed it alongside a vocabulary:

“The data is structured according to the schema at http://vocab.org/transit/terms/ (which was co-developed with this data)”

The vocabulary itself is open and hosted on GitHub, with the aim of exploring how well it works to open up the co-development of a dataset schema too.

Have a look at the depth of the schema, and you can start to glean the kinds of topics covered in the dataset, along with a full description of the way the data has been modelled. This schema gives detail on every descriptive property in the data. For example, the term Arrival Time is defined:

The time of day at which the service arrives at the stop. The time is measured from “noon minus 12h” (effectively midnight, except for days on which daylight savings time changes occur) at the beginning of the service date. For times occurring after midnight on the service date, the time will be a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins. Services that span multiple dates will have stop times greater than 24:00:00. For example, if a service begins at 10:30:00 p.m. and ends at 2:15:00 a.m. on the following day, the stop times would be 22:30:00 and 26:15:00.
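
To make the "greater than 24:00:00" convention concrete, here is a minimal Python sketch that turns such a stop time into a calendar date and time. The function names are made up for illustration, and it assumes offsets are measured from plain midnight, so it ignores the "noon minus 12h" adjustment on days when daylight saving time changes.

```python
from datetime import date, datetime, time, timedelta


def parse_stop_time(value: str) -> timedelta:
    """Parse an HH:MM:SS stop time, where HH may be 24 or more."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def resolve_stop_time(service_date: date, value: str) -> datetime:
    """Turn a stop time into a datetime for the given service date.

    Simplification (an assumption of this sketch): the offset is added to
    midnight, which is only correct on days without a daylight saving change.
    """
    midnight = datetime.combine(service_date, time.min)
    return midnight + parse_stop_time(value)


# The example from the definition: a service running 10:30 p.m. to 2:15 a.m.
service_date = date(2011, 7, 22)
departure = resolve_stop_time(service_date, "22:30:00")
arrival = resolve_stop_time(service_date, "26:15:00")

print(departure.isoformat())  # 2011-07-22T22:30:00
print(arrival.isoformat())    # 2011-07-23T02:15:00
print(arrival - departure)    # 3:45:00
```

Keeping the value as an offset from the service date, rather than wrapping it back to 02:15:00, means stop times within one trip always sort in travel order.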

Another good place to start is the set of sample SPARQL queries supplied: http://beta.kasabi.com/api/sparql-endpoint-mta-new-york-city-transit#Sample Queries. These give you an idea of some of the contextual views of the dataset and show how to “ask questions” of it:

  • Departure and arrival times for a particular day
  • Find all routes operated by MTA New York City Transit
  • Find rail routes operated by MTA New York City Transit
  • Find transit agency details

We can take a look at the final one’s query:

prefix transit: <http://vocab.org/transit/terms/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?agency ?name where {
  ?agency a transit:Agency
    ; foaf:name ?name
}
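
If you would rather run that query from code, a rough Python sketch might look like the following. It only builds the request URL (using the standard `query` parameter for SPARQL over HTTP GET) and reads a response in the SPARQL JSON results format. The endpoint address, the `apikey` parameter, and the sample response are placeholders invented for this example, so check the dataset's API page for the real values.

```python
import json
from urllib.parse import urlencode

AGENCY_QUERY = """\
prefix transit: <http://vocab.org/transit/terms/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?agency ?name where {
  ?agency a transit:Agency
    ; foaf:name ?name
}
"""


def build_query_url(endpoint: str, query: str, **extra_params: str) -> str:
    """Build a GET URL that passes the query in the 'query' parameter."""
    params = {"query": query, **extra_params}
    return f"{endpoint}?{urlencode(params)}"


def rows_from_results(document: str) -> list:
    """Flatten a SPARQL JSON results document into simple dicts."""
    data = json.loads(document)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Hypothetical endpoint and key, used here only to show the URL shape.
url = build_query_url("http://example.org/sparql", AGENCY_QUERY, apikey="YOUR_KEY")

# Illustrative response; the agency URI is invented for this sketch.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["agency", "name"]},
    "results": {"bindings": [{
        "agency": {"type": "uri", "value": "http://example.org/agency/nyct"},
        "name": {"type": "literal", "value": "MTA New York City Transit"},
    }]},
})

for row in rows_from_results(SAMPLE_RESPONSE):
    print(row["agency"], row["name"])
```

To send the request for real, you could pass `url` to `urllib.request.urlopen` with an `Accept: application/sparql-results+json` header, assuming the endpoint supports that format.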

Ian’s also created a customised API for this set, which is a stored SPARQL procedure. This API produces a list of transit stops near a point as a GeoRSS feed.
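
As a rough idea of how an app might consume such a feed, the Python sketch below pulls stop names and coordinates out of a GeoRSS document. It assumes an RSS 2.0 feed that uses GeoRSS-Simple `georss:point` elements ("latitude longitude"); the actual feed layout may differ, and the sample feed and stop names here are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by GeoRSS-Simple elements such as georss:point.
GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Illustrative feed only; not real output from the API.
SAMPLE_FEED = """\
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Stops near a point (sample)</title>
    <item>
      <title>Example Stop A</title>
      <georss:point>40.7527 -73.9772</georss:point>
    </item>
    <item>
      <title>Example Stop B</title>
      <georss:point>40.7506 -73.9935</georss:point>
    </item>
  </channel>
</rss>
"""


def stops_from_georss(feed_xml: str) -> list:
    """Return (title, latitude, longitude) for each item with a georss:point."""
    root = ET.fromstring(feed_xml)
    stops = []
    for item in root.iter("item"):
        point = item.findtext("georss:point", namespaces=GEORSS_NS)
        if point is None:
            continue  # skip items without coordinates
        lat, lon = (float(part) for part in point.split())
        stops.append((item.findtext("title", default=""), lat, lon))
    return stops


stops = stops_from_georss(SAMPLE_FEED)
for title, lat, lon in stops:
    print(f"{title}: {lat}, {lon}")
```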

This set has been designed to be built on, and it doesn’t take much effort to generate ideas for applications built on it. With New York’s transit agency putting up a reward for the best app, it seems like a good one to start on!

If you’ve got ideas, or would like some additional discussion about building around it, you can drop a line to the Kasabi developer network’s Google group (kasabi-dev@googlegroups.com), which is even more useful if you sign up as a member: http://groups.google.com/group/kasabi-dev.

Posted in: Datasets