In May 2015, Labour leader Ed Miliband lost the General Election. His party diminished, he resigned. Yet the political press wildly exclaimed that he had 'won Twitter'. For much of the punditry, this signified once and for all that "a tweet does not a vote make".
Fast-forward a year and the campaign to leave the European Union won the UK-EU referendum. The campaign also 'won Twitter'. Now, the same pundits who downplayed the role of social media in the General Election became excited again that social media could and would one day be a tool for electoral predictions.
The unreliability of these results demonstrates one central problem with Twitter analysis that greatly reduces its credibility in predictive science: its selection bias.
In the U.K., about 33 million people are on Facebook, but the number actively posting political opinions and views on Twitter is much, much lower. So low, in fact, that many have accused the platform of being an "unrepresentative urban liberal dreamland".
However, the problem with this conclusion is its simplicity.
Despite its obvious bias towards the intelligentsia, Twitter remains the largest experiment in public opinion ever, the largest 'coffee house' of conversation ever.
Every second, on average, around 6,000 tweets are posted on Twitter. Some of these, if not most, are trivial, but amid the noise are countless valuable and deeply personal insights into people's opinions. Ignoring such a large data set would be ridiculous in any science.
The trick is to make this data useful and relevant.
There is absolutely no reason why people should not analyse Twitter users in exactly the same way they analyse respondents in the polls: by building representative samples. To be representative, the characteristics (demographic, attitudinal and behavioural) of the people analysed should, as far as possible, match those of the entire voting population. This is simple logic, yet it is completely ignored in academic and consultancy research into social media use during elections.
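One common way pollsters make a skewed sample resemble the wider population is post-stratification weighting, and the same idea could in principle be applied to a pool of Twitter users. The sketch below is illustrative only: the age bands, population shares and support flags are invented for the example, and a real study would need to weight on many more characteristics than one.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per sampled user so that the weighted share of
    each group matches its share of the target population."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_support(supports, weights):
    """Weighted proportion of users flagged as supporting a position."""
    supporting = sum(w for s, w in zip(supports, weights) if s)
    return supporting / sum(weights)


# Hypothetical sample of ten users that over-represents the youngest band.
sample_groups = ["18-34"] * 6 + ["35-64"] * 3 + ["65+"] * 1
# Hypothetical shares of each band in the voting population (assumption).
population_shares = {"18-34": 0.3, "35-64": 0.5, "65+": 0.2}
# Hypothetical flags: does each user express support for the campaign?
supports = [True] * 5 + [False] + [True, False, False] + [False]

weights = poststratification_weights(sample_groups, population_shares)
raw = sum(supports) / len(supports)
adjusted = weighted_support(supports, weights)
print(f"raw support: {raw:.2f}, weighted support: {adjusted:.2f}")
```

In this invented example the raw figure overstates support because younger users are both over-represented and more favourable; weighting pulls the estimate back towards what a representative sample would show.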
Once we have this sample, which could in theory far exceed poll samples, we can start testing sentiment.
Sentiment analysis is the crucial advantage Twitter has over opinion polls. Polls suffer, and have always suffered, because people often respond with the answers they think pollsters want to hear, suppressing extreme or marginalised opinions. This happens less online.
Yet sentiment analysis is also in its infancy: most studies simply measure the number of hashtag reposts or likes in support of, or against, a campaign. That's where artificial intelligence comes in, allowing computers to recognise genuine sentiment through Natural Language Processing, the analysis of language as humans actually write and speak it.
It is an exciting area, but costly, research-intensive and complex.
In September this year, a team of British researchers and data scientists will come together to see whether the use of this technology can correctly predict a major UK political event: the election of a new Labour leader.
The project, called Deep Listen, will involve building complex data sets of Labour party members (who elect their leader) and analysing the sentiment of their posts using artificial intelligence, over time and on a massive scale. Finally, the research will use regression analysis to extrapolate and predict the eventual result. Everything will be published live and online ahead of time.
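To give a sense of what extrapolating by regression means in practice, here is a minimal sketch that fits a straight line to a daily sentiment share and projects it forward. It is not the Deep Listen method, which is not described here; the function names and the figures are hypothetical, and a real model would need to account for uncertainty and for trends that are not linear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Value of the fitted line at x."""
    return slope * x + intercept


# Hypothetical data: day of campaign and the share of sampled members
# whose posts were scored as favourable to one candidate (assumption).
days = [0, 1, 2, 3, 4]
favourable_share = [0.50, 0.51, 0.52, 0.53, 0.54]

slope, intercept = fit_line(days, favourable_share)
projected = predict(slope, intercept, 10)
print(f"projected share on day 10: {projected:.2f}")
```

A straight line is the simplest possible choice; it will happily project a share above 100 per cent if pushed far enough, which is one reason real forecasts use more careful models.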
This is important and timely work. The failure of pollsters and bookmakers to accurately predict the outcome of Brexit cost the global economy over $1 trillion. It has pushed the financial sector, and other industries that rely on accurate information, towards alternatives.
Whether these alternatives stack up is an important next step in our battle to predict the next major political shocks.