We are now a few months into phase 2 of the Greater Manchester Data Synchronisation Programme (GMDSP), and I figured now would be a good time to reflect on what has happened so far, what we (Trafford) are getting out of it, and where I see the programme going.
If you already know about the GMDSP, skip this bit. GMDSP, or DSP for short (I know!), is a programme designed to synchronise the release of data across all ten local authorities in Greater Manchester (clockwise, from the bottom: Trafford, Salford, Bolton, Wigan, Bury, Oldham, Rochdale, Tameside, Stockport, Manchester). The idea is that if we all release the same data, to the same definition, we will create datasets that cover a larger geographical area and a larger population base, and are therefore of more use to more people. These datasets were to be made available as linked data, a format that allows other computers to read and understand the data, and match it with other datasets, such as those held by the Office for National Statistics or Ordnance Survey. To help us do this, ‘Code Fellows’ (civic-minded hackers with expertise in data and systems) worked with each authority to standardise the data.
The Programme is funded by the Department for Business, Innovation and Skills (BIS) through two catapults – the Connected Digital Economy Catapult and the Future Cities Catapult. These two catapults fund the programme in the hope that it will i) stimulate business growth (through app development, data services and data use) and ii) strengthen relationships between the participating local authorities, so that we share data, knowledge and experience.
GMDSP is co-ordinated by Future Everything, an innovation lab for digital culture.
Phase 1 of the programme involved three local authorities in Greater Manchester – Trafford, Salford and Manchester. Swirrl, a Manchester-based company, set up a special database called a quad store, which can hold a LOT of data.
We then worked together to identify some datasets for release – after a period of negotiation, we selected:
- Recycling sites
- Grit bins
- Planning applications
- Council expenditure
Some of these seemed to be of little interest to the public and developer community, but Phase 1 was mainly about testing the modelling process and proving we could work together.
So we worked on getting the data out of the various systems it lives in – the Code Fellows were instrumental in this. Modelling the data, in particular, would have been near-impossible for us to do on our own. Steven Flower, the Code Fellow who worked with us in Trafford, introduced us to OpenRefine as the tool to sort our data out.
The 24h Coding Challenge
On 29th March 2014, the first hack was held: approximately 25 developers came to Tech Hub Manchester for 24 hours to see what they could do with the data. There were groups from across Manchester, and even a team from Istanbul.
It turned out that most of the developers who had come to the hack weren’t familiar with the format the data was in, so the first hour or so was spent teaching them how to use linked data, and how to query and retrieve it with SPARQL. This was actually a very good thing to have done – an opportunity to show the developer community of Greater Manchester how to use linked data, which is gathering pace as the way in which public organisations make their data available.
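To give a flavour of what the developers were taught, a SPARQL query against a linked data store looks something like the sketch below. The class URI and endpoint are illustrative placeholders, not the actual GMDSP vocabulary; the rdfs and WGS84 geo prefixes are standard W3C vocabularies.

```
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>

# Find the label and coordinates of grit bins in the store
SELECT ?bin ?label ?lat ?long
WHERE {
  ?bin a <http://example.org/def/GritBin> ;   # illustrative class URI
       rdfs:label ?label ;
       geo:lat    ?lat ;
       geo:long   ?long .
}
LIMIT 10
```

Because the data is expressed as triples against shared vocabularies, the same query shape works across all of the participating authorities’ data – that is the practical payoff of linked data over ten separate CSV files.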
You can read more about the Coding Challenge on the GMDSP blog.
There were two things that came out of the Coding Challenge that were of particular interest to us as a local authority. Light Raider is an app which turns streetlights into virtual collectable items, encouraging people to move around more by walking past them. You can read more about that in this Manchester Evening News article. This is a really exciting development, and something we are looking to support in Trafford, hopefully using Trafford Council staff to pilot the app.
The second thing of interest was developed by a data scientist from Salford, who took 30 years of planning application data and used natural language processing on the text of each application to test how much influence certain words (e.g. “oak” and “conservatory”) had on the likelihood of a planning application being approved or rejected. This is interesting because this level of analysis could indicate whether a planning application is likely to succeed, prior to submission. It could also be used to identify trends, especially where there are many years’ worth of data – for example, increasing use of the word “porch” could be correlated with changing crime rates, or education levels (note: this is a dreamt-up example, not fact). It would also be very interesting to apply the same methodology to other datasets where there is lots of text with a clearly categorised outcome. I think it’s worth exploring whether this processing could be used against structural surveys, risk assessments, or social care case notes, for example.
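The simplest version of this kind of analysis is just a conditional approval rate: of the applications mentioning a given word, what fraction were approved? Here is a minimal sketch in Python – the applications below are invented examples for illustration, not real GMDSP data, and a real analysis would of course need far more data and proper statistical controls.

```python
# Sketch: relate word presence in application text to planning outcomes.
# Each application is a (text, approved) pair; the data is invented.

def approval_rate_by_word(applications, word):
    """Approval rate among applications whose text mentions `word`,
    or None if no application mentions it."""
    matching = [approved for text, approved in applications
                if word in text.lower()]
    return sum(matching) / len(matching) if matching else None

applications = [
    ("Single-storey conservatory to rear of dwelling", True),
    ("Removal of protected oak tree", False),
    ("Conservatory extension with new porch", True),
    ("Fell one oak tree within conservation area", False),
    ("Erection of front porch", True),
]

print(approval_rate_by_word(applications, "conservatory"))  # 1.0
print(approval_rate_by_word(applications, "oak"))           # 0.0
```

The Salford analysis went further than word counting (natural language processing of the full text), but even this crude version shows why decades of categorised planning records are such a promising dataset.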
Phase 2 of the programme began in the summer of 2014. There were some subtle changes to the design of the programme – the Code Fellows were restructured into two teams, one data team and one systems team. We also added two more local authorities, Stockport and Tameside, as well as data from Greater Manchester Fire and Rescue Service.
We identified a further two datasets that we could model and release – business rates, and libraries data. We have also decided, based on feedback from Phase 1, to use OpenRefine to process all the data. This means that all five authorities are using the same tool, and we have already started to have sessions where the data people from the authorities work together, in the same room, on different datasets. Crucially, we are able to share the processes: one authority deals with one dataset, then makes the files available on GitHub. The others can then tweak and localise them, massively reducing the net time and effort we spend processing data.
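What makes this sharing practical is that OpenRefine records every transformation as a JSON operation history, which can be exported, committed to GitHub, and re-applied to another authority’s copy of the same dataset. A minimal illustrative recipe might look like the following – the column name is hypothetical, not taken from the actual GMDSP files:

```json
[
  {
    "op": "core/text-transform",
    "engineConfig": { "mode": "row-based", "facets": [] },
    "columnName": "Site name",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Trim whitespace from cells in column Site name"
  }
]
```

Another authority applies this via OpenRefine’s “Apply” option in the Undo/Redo tab, tweaks the column names to match their own export, and gets the same cleaned output without redoing the work.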
The authorities now also have access to the staging area of the quad store, which means we can upload data ourselves, and test it in a safe environment.
Phase 2 has also seen collaboration with Leeds Data Mill, where GMDSP is co-hosting events with Leeds, and helping build connections across the Pennines.
We anticipate that more and more of our data will be released through the quad store. Assets such as parks, community buildings, and defibrillators will all be added. We can then pull the data back into our own maps and analyses whenever we need to – as can anyone else. We will also look at publishing quantitative data through the quad store – area-based counts of referrals to children’s social care, for example.
We will also continue to work together as a group of authorities. We can learn from each other as we look to make better use of our own data. The principles of linked data mean that we could carry out complex, multi-authority analyses, pulling in data from other sources such as the census.
Finally – longer-term aspirations for the programme include drawing sensor data (such as that from smart citizen monitors) into the quad store, increasing the number of partners, and repeating the model in other regions and countries.