There's a growing misconception that, given enough data, all questions can be answered and all problems solved. But helping an organisation make the most of its data doesn't begin with collecting a lot of it.
If you don't know what question you're asking, or what kind of data will answer it, all you're doing is creating 'noise'. When it comes to big data, companies get carried away with size before considering actual value.
To make your organisation data driven, you need a strategy in place and a clear understanding of what you're trying to achieve.
Size doesn't matter, it's what you do with it that counts
It's not the 'big' that's driving change in organisations, it's the ability to analyse a broad variety of data sources in real time.
In fact, according to a study from NewVantage, only 28 per cent of enterprises say volume is the primary driver for big data projects. To put data at the heart of a business, it's much more important to think about what data the company will be processing, and why, than about how much data it is dealing with. Size matters only insofar as you need a statistically valid data set and a scalable infrastructure. As Microsoft Research finds, "the majority of real-world analytic jobs process less than 100 GB of input," including those at Google, Yahoo!, Facebook and Microsoft.
I'll give two examples, the first from the public sector and the second from financial services. The City of Chicago wanted to build a predictive analytics system called Windy Grid that would pull data from 26 different agencies. By combining disparate, unstructured data from 311 calls (street lights broken, road damage, etc.), 911 reports (i.e., calls for emergency services), gun sales, public transportation, and more, and enabling it to be normalised and queried in real time, the City is able to anticipate where crime, disease outbreaks, or other public safety problems might occur, and respond accordingly. The City tried unsuccessfully to get relational database technology to work, then turned to MongoDB for its ability to ingest a variety of data, apply geospatial properties, and query it in real time.
The second example is MetLife, a large, global insurance provider. For years MetLife tried to use traditional relational databases to attain a holistic view of its customers. Unfortunately, it ended up with 70+ data silos, so that when a customer called to inquire about a product, the customer service representative had only an isolated picture of the customer's profile, making it hard to satisfy her needs and to cross-sell other products. After investing a great deal of money and two years trying, without success, to crack the problem with an RDBMS, MetLife managed to create a common schema across the diverse data sources in just two weeks with MongoDB, and was in production within three months. As is usually the case within an enterprise, MetLife wasn't struggling to handle petabytes of data, but rather to manage gigabytes or terabytes of data from disparate sources in real time.
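To make the idea of a common schema across silos a little more concrete, here is a minimal sketch in plain Python. It is not MetLife's implementation and it does not use MongoDB; the three silos, their field names, and the `build_customer_view` helper are all hypothetical, chosen only to show how rows keyed differently in each source can be folded into one document per customer.

```python
from collections import defaultdict

# Hypothetical rows from three separate silos. Note that each source
# names the customer key differently, which is typical of siloed data.
policy_rows = [
    {"cust_id": "C1", "policy": "life", "premium": 120.0},
    {"cust_id": "C2", "policy": "auto", "premium": 80.0},
]
claims_rows = [
    {"customer": "C1", "claim_no": "K-9", "amount": 450.0},
]
contact_rows = [
    {"id": "C1", "email": "ana@example.com"},
    {"id": "C2", "email": "ben@example.com"},
]


def build_customer_view(policies, claims, contacts):
    """Fold per-silo rows into one nested document per customer."""
    view = defaultdict(lambda: {"policies": [], "claims": [], "contact": {}})
    for row in policies:
        view[row["cust_id"]]["policies"].append(
            {"type": row["policy"], "premium": row["premium"]}
        )
    for row in claims:
        view[row["customer"]]["claims"].append(
            {"claim_no": row["claim_no"], "amount": row["amount"]}
        )
    for row in contacts:
        view[row["id"]]["contact"] = {"email": row["email"]}
    return dict(view)


customers = build_customer_view(policy_rows, claims_rows, contact_rows)
```

A customer service representative looking up `customers["C1"]` would see policies, claims, and contact details together rather than in three places. The data volume here is trivial, which is the point: the hard part is variety, not size.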
What questions are you trying to answer?
In big data, the winner isn't whoever collects the most data. An organisation that collects vast amounts of data and then scratches its head over what to do with it is not a company with a wise data strategy. It's essentially just hoarding, filling digital landfills. Be smart and be scientific: what questions does the organisation need answering? Have a hypothesis, then use the available data sets to test it and act on the result. Anything else risks making data an expensive distraction rather than the robust and transformational tool it should be.
Get the right people involved
Yes, this may mean the CEO, but it probably also requires you to reach deeper into the organisation to find the people who can open access to the requisite data and others who can help you understand it. That said, you're going to want a senior executive on board to help navigate the organisation and get approvals where necessary.
There's been plenty of chat about 'data scientists' lately, and I assume there are some smart people getting rich by consulting in this role. However, I would argue that it's much harder to learn one's business than to learn how to use big data technologies, particularly since technologies like MongoDB are already so pervasive within the enterprise. Far better, then, to turn existing employees into data scientists than to hire outsiders who will spend months getting up to speed on the business.
Organisations that want to become data driven should start small, iterate and fail fast. Big data is all about asking the right questions and then moving your project on as you learn which data sources are valuable and which questions yield real insights. Many organisations have allowed the hype of big data to motivate them to move forward, but have been left confused as to what they are actually getting out of it.
Big data is nothing on its own, but used correctly by the right people it can be a powerful tool for your business.