Big Data Requires Big Brains - Number Crunchers Get Competitive

21/05/2012 12:34 | Updated 21 July 2012

In the same week that Google Analytics used Tweets to predict that London Mayoral election incumbent Boris Johnson was holding a four-point lead over long-time Labour rival Ken Livingstone, IBM also rolled out a law enforcement analytics package aimed at helping to predict and ultimately prevent criminal and terrorist activity. Big data, big results.

But no matter how good their predictive power, these algorithmic systems could always be better. At their heart are algorithms designed by people. And if you want to get the best out of people you need to turn to the crowd. Take dark matter, for example. Last year NASA, the European Space Agency and Britain's Royal Astronomical Society held a competition encouraging people from other disciplines to cast their fresh eyes on the challenging task of mapping this mysterious cosmological stuff, which is believed to make up the vast majority of matter in the universe. In what proved to be a powerful example of high-end crowdsourcing, data geeks worldwide competed for cash and kudos - a $3,000 prize and the promise of an all-expenses-paid trip to NASA's Jet Propulsion Laboratory.

In this case the first breakthrough came within a week of the competition's launch, when Martin O'Leary, who was doing his PhD in glaciology at Cambridge University in the UK at the time, created an algorithm which, according to the White House, "outperformed the state-of-the-art algorithms most commonly used in astronomy for mapping dark matter". O'Leary's glaciological research involved algorithmically finding the edges of glaciers, and that method proved to be a useful way to measure galaxies millions of light years from the Earth.

O'Leary's methods were then improved upon by other competitors including Eu Jin Lok, a graduate at Deloitte Analytics in Australia, and Ali Hassaine, a professor at Qatar University who specialises in building algorithms to analyse handwritten signatures in order to detect fraud. In the end the prize went to cosmology professor David Kirkby and graduate student Daniel Margala at UC Irvine, who refined their statistical model by developing an artificial neural network that could learn to recognize patterns in the galaxy images.

Besides achieving the goal of improving our dark matter mapping capabilities, the competition also demonstrated that good old-fashioned rivalry can produce outstanding results. The same sort of approach gave us Lindbergh's solo transatlantic flight, accurate time-keeping for navigation and, more recently, the first privately funded, manned space venture. But in today's data-driven, connected world the digitization of this competitive spirit allows for even more to be squeezed out of people. So while a lone number cruncher might try a few techniques in isolation, the design of the competition positively encouraged data scientists to keep scrambling, by having a real-time leaderboard display the ranking of all teams at any given time.

In this age of big data, companies and researchers have piles of data but struggle to get access to the best analysts. Even if they do have access, it's hard to identify who (or which approach) might do best for a given problem. But putting it out to the crowd in this way overcomes these problems. In short, it is a highly effective way of crowdsourcing genius.

And despite often being complete unknowns, these 'mathletes' have already found better ways to predict which borrowers will default on a loan, which vehicles are likely to be bad buys at a used-car auction and the likelihood that an HIV patient's infection will become less severe given their genetic markers. In return they get the chance to compete for anything from cash prizes - sometimes worth millions of dollars - to nothing more than kudos.

Although it may seem like a lot of work for those competitors who walk away at the end of a contest empty-handed, most of them don't take part for the money. It is the competitive dynamic that motivates entrants to push the limits of their capabilities in a way that publishing papers and writing patents simply do not. And we would be crazy not to take advantage of this. So as enterprises, governments and institutions increasingly turn to Big Data solutions, whether it's to cure cancer or our crumbling economy, doesn't it make sense to use the best brains we've got?

