Featured Datasets: NHS Organizations and Performance Data

Posted on 07/26/2011 by


Tomorrow we’re hosting our first themed hack day, on the topic of Open Government Data. For the event I wanted to gather some new datasets that could be used to build interesting applications.

I decided to focus on the NHS and have just published three new datasets:

  • NHS Organization: a dataset containing the identifier, name, address and hierarchy for all of the NHS-related organizations. The dataset provides URIs for all of the NHS trusts and their related sites, right down to individual General Practices, Dispensaries and even GPs. The data has been enhanced to include links to homepages and to Twitter, Flickr, and Facebook pages where available.
  • NHS A&E Activity Statistics: weekly NHS provider-based statistics on A&E activity from 7th November 2010 to 3rd July 2011. Use this to find the number of people attending each of the different types of A&E departments, and the number of emergency admissions.
  • NHS Hospital Activity Statistics: quarterly statistics, ranging from 2007-2008 through to 2010-2011, on numbers of GP referrals and of first and subsequent outpatient appointments, including numbers of patients who did not attend.

The datasets link up: all of the statistical reports relate to NHS trusts that are defined in the organization dataset, which provides an important new way to begin aggregating NHS statistics from a number of different sources. While there is a vast amount of statistical data available from the Department of Health, there is as yet no “dashboard” that aggregates and presents statistics at the provider level. These datasets mark the first step in that direction.

Using these datasets it’s already possible to find general practices by postcode, or the busiest A&E departments in the UK. We’ve created some example SPARQL queries that begin to show the potential, e.g. organization queries or A&E data queries.
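To give a flavour of the postcode lookup mentioned above, here is a minimal Python sketch that builds a SPARQL query string. It is not one of the published example queries. The org: and vcard: vocabularies are real, but the assumption that organizations carry rdfs:label and vcard:postal-code directly is mine, so check the developer documentation for the properties the dataset actually uses. The function name is hypothetical.

```python
import re

# Real, published vocabularies. Whether the NHS Organization dataset uses
# exactly these properties is an assumption made for this sketch.
PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
"""


def practices_by_postcode_query(postcode_prefix: str, limit: int = 25) -> str:
    """Build a SPARQL query for organizations whose postcode starts with a prefix."""
    prefix = postcode_prefix.strip().upper()
    # Only letters, digits and spaces are accepted, so the value can be
    # placed inside the query text without further escaping.
    if not re.fullmatch(r"[A-Z0-9 ]{2,8}", prefix):
        raise ValueError(f"not a plausible postcode prefix: {postcode_prefix!r}")
    return PREFIXES + f"""\
SELECT ?organization ?name ?postcode
WHERE {{
  ?organization a org:Organization ;
                rdfs:label ?name ;
                vcard:postal-code ?postcode .
  FILTER regex(str(?postcode), "^{prefix}", "i")
}}
ORDER BY ?name
LIMIT {int(limit)}
"""
```

Calling practices_by_postcode_query("BA1") returns query text that can be pasted into a SPARQL endpoint for the dataset. The input check is deliberately strict, because the prefix is interpolated straight into the query.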

But to provide even greater flexibility in querying, we’ve created a fourth, aggregate dataset called NHS Performance Data.

The performance dataset aggregates the three new datasets along with the Ordnance Survey Linked Data. The OS and NHS Organization datasets act as foundational data layers that offer richer means of querying statistics, e.g. by UK administrative geography or by the structure of the NHS. Over time, as we add new NHS datasets, this layered dataset will be updated to include new sources.

The datasets draw on the Organization ontology for describing organizations and their members, and the Data Cube vocabulary for publishing statistical data. Where possible, links have been made to existing government sources including the Ordnance Survey and reference.data.gov.uk. Consult the developer documentation for each dataset for more details on the modelling.
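In the Data Cube vocabulary each statistic is an observation that ties a measured value to its dimensions, here a provider and a reporting period. As a rough sketch of why that shape makes provider-level aggregation easy, the Python below sums weekly attendances per provider and ranks them. The field names, provider identifiers and figures are all invented for illustration and are not taken from the datasets.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class Observation(NamedTuple):
    """One statistical observation: a measure plus its two dimensions."""
    provider: str      # hypothetical provider identifier
    week_ending: str   # ISO date of the reporting week
    attendances: int   # the measured value


def busiest_providers(observations: Iterable[Observation],
                      top: int = 3) -> List[Tuple[str, int]]:
    """Sum attendances per provider and return the largest totals first."""
    totals = defaultdict(int)
    for obs in observations:
        totals[obs.provider] += obs.attendances
    # Sort by total descending, then by provider name to keep ties stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:top]


# Invented figures, for illustration only.
sample = [
    Observation("provider-a", "2010-11-07", 1200),
    Observation("provider-b", "2010-11-07", 950),
    Observation("provider-a", "2010-11-14", 1100),
    Observation("provider-b", "2010-11-14", 1000),
    Observation("provider-c", "2010-11-14", 2000),
]

print(busiest_providers(sample, top=2))
# [('provider-a', 2300), ('provider-c', 2000)]
```

Against the real data the same grouping would more naturally be done in the query itself. The point here is only that shared provider URIs are what make the totals line up across reports.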

The statistical datasets were converted using Google Refine and the RDF extension, and linked to the NHS Organization data using the Kasabi reconciliation API for that dataset. This was a straightforward process and hints at the potential for crowd-sourcing the effort of converting, linking, and aggregating NHS statistics. I plan to blog more about that workflow in the coming weeks.
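For anyone curious what the reconciliation step looks like outside of Google Refine, here is a small Python sketch of the batch request format that Refine-style reconciliation services generally accept, plus a helper for picking the best candidate from a response. I'm assuming the Kasabi API follows that general shape. The helper names and the sample names are hypothetical, and no endpoint URL is shown because that depends on the dataset and your API key.

```python
import json
from typing import Dict, Iterable, Optional


def build_reconciliation_queries(names: Iterable[str],
                                 type_uri: Optional[str] = None,
                                 limit: int = 3) -> Dict[str, dict]:
    """Map each name to a keyed query object: q0, q1, ..."""
    queries = {}
    for index, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_uri is not None:
            query["type"] = type_uri  # optionally restrict candidates by type
        queries[f"q{index}"] = query
    return queries


def encode_queries(queries: Dict[str, dict]) -> Dict[str, str]:
    """Form parameters for the request: one 'queries' field holding JSON."""
    return {"queries": json.dumps(queries)}


def best_match(response: Dict[str, dict], key: str,
               min_score: float = 0.0) -> Optional[str]:
    """Return the id of the highest-scoring candidate for one query key."""
    candidates = [
        candidate
        for candidate in response.get(key, {}).get("result", [])
        if candidate.get("score", 0) >= min_score
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.get("score", 0))["id"]
```

The dictionary from encode_queries would be sent as form data to the reconciliation endpoint. Each key in the JSON response then lines up with a row in the spreadsheet, which is what lets Refine attach an organization URI to each trust name.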

I’m excited to see what people can start to produce using this data. Should be a great hack day!

Posted in: Datasets, Events