House-hunting with data

Posted on 02/07/2012 by


I’ve been a follower of Anna Powell-Smith on Twitter for a while now (mainly because of her work on the Open Domesday Project), but I somehow missed her fantastic life-hack linking house prices with distance from London. On one hand, it shows what coders do to solve real-life problems with multiple sources of data and straightforward visualisations. On the other, it answers a series of questions I’ve recently been asking, in a way I can’t easily find elsewhere.

Anna explained:

We’re house-hunting. And for me, like most coders, house-hunting involves lots and lots and lots of screen-scraping.

… I’ve been looking at transport and house-price data. Specifically, I’ve scraped travel times to London by train versus house prices, to examine the theory that houses get much cheaper once you escape the commuter belt.

This graph is exactly my kind of tool, because it leads to better questions to ask. I had no idea how far from London some places are by train, and some time spent highlighting bubbles with affordable houses got me thinking about what else I’d want to know. I can start with train times and house prices, and I’m now wondering what other lifestyle elements to consider: town population, access to areas of outstanding natural beauty, school rating (you know, just in case some day :) )…

Coming from a Kasabi perspective, I’m being quite geeky about this and composing additional layers in my head. Can I find places which are also near airports that fly to Colorado to visit my family? How do the hospitals rate? What if I mixed this graph with a map of places of historical interest, like those in Kasabi’s sets from English Heritage?

Anna also discovered things she wasn’t looking for, by exposing a bump in house prices influenced by Edinburgh. This kind of serendipity is pretty exciting when it ends up uncovering things that weren’t on the agenda. Also interesting are the outliers. There might be gems of places which would be commutable to London but aren’t hideously expensive, which a detailed enquiry into the data could uncover. Granted, these may often be caused by something odd, such as abnormal properties skewing the results of a smaller town, as Anna pointed out.
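If you wanted to go hunting for those outliers yourself, a first pass might look something like the sketch below. It is not Anna’s code: the station names, prices, thresholds and the find_candidates helper are all invented for illustration. It uses the median rather than the mean so that one odd property doesn’t skew a small town, and it flags stations with only a handful of listings so you know to treat them with suspicion.

```python
from statistics import median

# Invented example data: travel time to London in minutes and a list of
# asking prices per station. None of these figures are real.
stations = [
    {"name": "Town A", "minutes": 45,
     "prices": [180_000, 200_000, 210_000, 220_000, 900_000]},
    {"name": "Town B", "minutes": 55, "prices": [190_000, 195_000]},
    {"name": "Town C", "minutes": 120, "prices": [100_000] * 6},
    {"name": "Town D", "minutes": 30,
     "prices": [400_000, 450_000, 480_000, 500_000, 520_000]},
]


def find_candidates(stations, max_minutes=60, max_price=250_000, min_listings=5):
    """Return commutable stations whose median price is under max_price.

    All three thresholds are arbitrary assumptions; tune them to taste.
    """
    candidates = []
    for s in stations:
        # Skip stations that are too far away or have no price data.
        if s["minutes"] > max_minutes or not s["prices"]:
            continue
        typical = median(s["prices"])  # robust to a single abnormal property
        if typical <= max_price:
            candidates.append({
                "name": s["name"],
                "minutes": s["minutes"],
                "median_price": typical,
                # Too few listings means the figure may not be trustworthy.
                "small_sample": len(s["prices"]) < min_listings,
            })
    # Cheapest first.
    return sorted(candidates, key=lambda c: c["median_price"])


for c in find_candidates(stations):
    print(c)
```

In this made-up data, Town A survives despite its one 900,000 listing because the median ignores it, while Town B makes the list but carries a small-sample warning.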

I also think this lifehack points out places where data could be made more accessible. Anna used an API from the accommodation search site Nestoria (which interviewed her later), but had to screen-scrape for train times. It would help to know how data can be used, so providers could make their licensing for reuse clearer. Looking at this from the perspective of Kasabi, I’m wondering what data would have been more useful to pull directly from APIs? Also, the datasets didn’t seem to mesh easily, as Anna pointed out:

For each station, find the mean asking price for a 3-bed house within 2km in the past 6 months, from the Nestoria API. (Nestoria shows listing prices, rather than transaction prices like Zoopla, so it may contain duplicates and is probably less accurate – but Zoopla isn’t granular enough to search just for 3-bed houses.)
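To make that step concrete, here is a minimal sketch of the “mean asking price within 2km” calculation. It assumes the listings have already been fetched and look like simple dictionaries with a price, a bedroom count and coordinates. That shape, the sample figures and the mean_price_near helper are my own inventions, not the Nestoria API’s actual response format, and the “past 6 months” filter is left out for brevity.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import mean

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def mean_price_near(station, listings, radius_km=2.0, bedrooms=3):
    """Mean asking price of matching listings within radius_km of a station.

    Returns None when no listing matches, rather than guessing a value.
    """
    prices = [
        item["price"]
        for item in listings
        if item["bedrooms"] == bedrooms
        and haversine_km(station["lat"], station["lon"],
                         item["lat"], item["lon"]) <= radius_km
    ]
    return mean(prices) if prices else None


# Invented example data: a station and a few listings around it.
station = {"name": "Example Station", "lat": 51.0, "lon": 0.0}
listings = [
    {"price": 200_000, "bedrooms": 3, "lat": 51.005, "lon": 0.0},  # ~0.6 km away
    {"price": 300_000, "bedrooms": 3, "lat": 51.010, "lon": 0.0},  # ~1.1 km away
    {"price": 900_000, "bedrooms": 3, "lat": 51.500, "lon": 0.0},  # far outside 2 km
    {"price": 100_000, "bedrooms": 2, "lat": 51.001, "lon": 0.0},  # wrong size
]

print(mean_price_near(station, listings))
```

Only the first two listings count here: the third is too far away and the fourth has the wrong number of bedrooms. De-duplicating listings, which Anna flags as a weakness of asking-price data, would need an extra step before the averaging.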

So, what do you hack to answer your life questions? The data team is working on better travel data—as blogged earlier—and we all want to know how Kasabi can make this kind of hack better for you. What data do you need? Please drop me a line or leave a comment below.

Some existing datasets that might help:

Image: “For Sale” by Ian Muttoo via flickr CC: by-sa 2.0
