Recently, I had the great fortune of attending Urban Data Hack and the opportunity to follow, from afar, Flood Hackathon. It was the first time I've attended/observed hack days focused on the intersection of open datasets(*) and social good.
Both events were amazing on various levels, but above all they were further examples of the humanity in the tech community, and of the entrepreneurship and generosity that exist in it.
So here are a few observations...
The amount of data that is available out there and the rate at which it is growing are simply mind-blowing (to me anyway... data scientists eat this kind of stuff for breakfast).
The cost of collecting data is also dropping like crazy - anyone can hook up a sensor to practically anything for the price of a day out at the cinema.
And opening up access to data is one of the best things data owners can do to encourage innovation. There is a large community of users and developers who are just hungry for original sources of data to use and integrate in order to create useful services.
However, there seem to be challenges in making all that data open. And providing open data costs money. Costs involve time in creating the APIs, hosting the data, and running the service.
Whether to charge people for access to data is a dilemma that data sources face. Paid access does raise the bar for innovation with your data, so that may lead to a "what's the point" moment.
My recommendation, if you're a data source, would be to make sure you play an active part in how your data is being used. That's one way to help with your return on investing in open data. If you are providing your data to hackdays like the ones mentioned earlier, for example, try and be present and work with the teams so they also get some input on how you can benefit from the apps they are building using your data.
Be careful of where you host your data though. Some commercial platforms will charge you (a ransom) for access to your own data. I can think of at least one mapping company that is doing this, which is preventing various government bodies from opening up our data. Check your hosting provider's policy.
There is a lot of creativity within the tech community and there is now an amazing ecosystem of tools and services that makes it possible for that creativity to be expressed and tested very very quickly. 24 hours to build an app that could save lives is now not an unimaginable timescale.
There seem to be a few factors to think about when it comes to using open data to build applications for social good:
Analysis: using existing data to understand a particular state of the world. Every attempt at solving a problem must first begin with an understanding of the world within which that problem exists. The more you know about the problem and its surrounding context, the more effective your solution will be.
Modelling: reshaping existing data into a format that can be used to solve problems within a specific domain. The original data would have been collected without any particular solution in mind, so it will have to be reformatted into a new structure that is most suitable for the solution that is being built. In a lot of cases, there will be holes in the new data set. These holes can sometimes be filled by integrating data from different sources.
Action: what the application allows people to do. All that analysis and modelling become simply academic if they do not lead to action. This is where the role of engineers comes to the fore -- combining data and requirements to create applications that can be used to solve real-world problems. Here, great user experience will go a long way towards an effective problem-solving tool -- it requires empathy with your users and how they can best use the app for action, given their situation (e.g. in a crisis).
Prediction: using learning algorithms on existing data to attempt to predict outcomes in future, given a similar set of circumstances. As they say, prevention is always better than cure. Historical datasets give us a fantastic opportunity to understand trends and predict them. While it is important to ensure help is provided promptly and effectively in times of crises, I'd wager my last Rolo that crisis victims would prefer not to have to ask for that help in the first place.
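To make the modelling step a bit more concrete, here is a minimal sketch of filling holes in one data set with readings from another. Everything in it is made up for illustration: the river-level numbers, the two sources, and the `fill_gaps` helper are all assumptions, not anything that came out of the hack days.

```python
# Hypothetical hourly river-level readings in metres, keyed by hour of day.
# None marks a hole in the primary source.
primary = {0: 1.2, 1: None, 2: 1.6, 3: None}

# A second, equally hypothetical source that happens to cover the missing hours.
secondary = {1: 1.4, 3: 1.9}


def fill_gaps(primary, secondary):
    """Return a copy of primary with missing values taken from secondary."""
    merged = {}
    for hour, level in primary.items():
        if level is None:
            # Fall back to the second source; stays None if it has no reading either.
            level = secondary.get(hour)
        merged[hour] = level
    return merged


filled = fill_gaps(primary, secondary)
print(filled)
```

In real life the two sources will rarely line up this neatly (different stations, units, and timestamps), which is exactly why the modelling step takes effort.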
Data aggregators and directories play important roles in the ecosystem. There is so much data out there that it's hard to keep up with what is available. Additionally, you want to ensure that the data feeds you want can be integrated with your existing data with minimal effort.
Directories are useful for discovering data sources, and aggregators help reduce the cost of providing open data for the data sources by redistributing data feeds in various consistent standards.
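As a rough illustration of what an aggregator does, the sketch below takes two feeds that describe the same kind of thing in different shapes and republishes them in one consistent format. The field names, units, and station IDs are all hypothetical; real feeds and real standards will look different.

```python
def normalise_feed_a(record):
    # Hypothetical source A reports the level in centimetres under "lvl_cm".
    return {"station": record["id"], "level_m": record["lvl_cm"] / 100}


def normalise_feed_b(record):
    # Hypothetical source B already uses metres, but with different field names.
    return {"station": record["station_name"], "level_m": record["level_metres"]}


feed_a = [{"id": "A1", "lvl_cm": 150}]
feed_b = [{"station_name": "B7", "level_metres": 2.25}]

# One combined feed in a single, consistent shape.
combined = [normalise_feed_a(r) for r in feed_a] + [normalise_feed_b(r) for r in feed_b]
print(combined)
```

The point is that each developer consuming the combined feed no longer has to write (and maintain) this conversion themselves.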
My observations above are from the perspective of an "outsider", someone not from the data science community, and more from the perspective of a developer and user. Corrections and comments please below!
* Let's just leave the "big data" buzzword to corporate salespeople and "consultants", shall we ;)
** This post originally appeared on Medium.