Why is everyone so hyped over Big Data?
Possibly it's because people are now realising the power of Big Data as a rich source of information for detecting security intrusions, and have since developed a taste for more and more logs.
Log Correlation followed as IT professionals realised that individual log entries by themselves meant very little, but when placed into context against one another illustrated more than just system-level events. They illustrated behavioral context -- clusters of individual log lines which could be translated into records of human-readable actions.
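To make the idea concrete, here is a minimal sketch of one such correlation: several failed logins followed by a success for the same user inside a short window, reported as a single human-readable finding. Everything in it is an assumption made for illustration: the LogEntry fields, the event names "login_failed" and "login_ok", the five-minute window and the threshold of three. It is not a description of how any particular product correlates logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    # Hypothetical, already-parsed log record; real logs need parsing first.
    timestamp: datetime
    user: str
    event: str  # assumed values: "login_failed" or "login_ok"


def correlate_logins(entries, window=timedelta(minutes=5), threshold=3):
    """Turn clusters of login lines into human-readable findings.

    A finding is emitted when a successful login is preceded by at least
    `threshold` failed logins for the same user within `window`.
    """
    findings = []
    recent_failures = {}  # user -> timestamps of recent failed logins
    for entry in sorted(entries, key=lambda e: e.timestamp):
        failures = recent_failures.setdefault(entry.user, [])
        # Forget failures that have fallen outside the time window.
        failures[:] = [t for t in failures if entry.timestamp - t <= window]
        if entry.event == "login_failed":
            failures.append(entry.timestamp)
        elif entry.event == "login_ok":
            if len(failures) >= threshold:
                findings.append(
                    f"{entry.user}: {len(failures)} failed logins "
                    f"followed by a successful login"
                )
            failures.clear()
    return findings


# Made-up sample data: alice fails three times and then succeeds,
# bob mistypes once and then succeeds.
base = datetime(2024, 1, 1, 9, 0, 0)
entries = [
    LogEntry(base + timedelta(seconds=10 * i), "alice", "login_failed")
    for i in range(3)
]
entries += [
    LogEntry(base + timedelta(seconds=40), "alice", "login_ok"),
    LogEntry(base, "bob", "login_failed"),
    LogEntry(base + timedelta(seconds=5), "bob", "login_ok"),
]
findings = correlate_logins(entries)
for finding in findings:
    print(finding)
```

No single line above says anything interesting; the finding only exists once the lines are read against one another.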
Security is still in the early days of this science and practice of event correlation: Methods and results are rarely shared with the community, the target for what is effective keeps moving, and yet we're already talking about Big Data.
Terror and Possibility
The land is populated by people who have been doing this stuff for a long time before us.
Vast databases of information being mined for emergent patterns and used to process simulations over and over are hardly new to the world -- the finance, medical and aerospace industries have spent years in this realm. How is it, then, that the security world has not tapped into this pool of expertise before now to help us glean the knowledge lying dormant within our vast supplies of data? Quite simply, it's because we still don't know what questions to ask in the first place.
What Do You Want to Know?
We still aren't very good at asking the right questions of our data.
In security analytics, it's often the relations between the data (not the data itself) that are important. Just as detective work is a matter of "connecting the dots," so it is the relations between our data points that hold the true information (Log Correlation itself is about looking for and exposing those relations).
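As a rough sketch of what "connecting the dots" can look like in practice, the snippet below treats each event as a link between entities and then asks which entities are reachable from a starting point. The entity labels, the sample events and the function names are all hypothetical, and a plain breadth-first search stands in for whatever relation analysis a real system would use.

```python
from collections import defaultdict, deque


def build_relations(events):
    """Build an undirected graph linking entities seen in the same event."""
    graph = defaultdict(set)
    for event in events:
        for a in event:
            for b in event:
                if a != b:
                    graph[a].add(b)
    return graph


def connected_entities(graph, start):
    """Return every entity reachable from `start` by following relations."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Made-up events, each naming the entities it involves.
events = [
    ("ip:203.0.113.7", "user:alice"),
    ("user:alice", "host:db01"),
    ("ip:198.51.100.2", "user:bob"),
]
graph = build_relations(events)
cluster = connected_entities(graph, "ip:203.0.113.7")
print(sorted(cluster))
```

The address and the database host never appear in the same event; the link between them only shows up in the relations.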
No amount of Big Data is going to save us until we can learn to formulate better questions for that data. Information security as a discipline may have much to learn from other technology fields.
I'll cut to the chase here: BioInformatics.
Look into that field and it won't take long before you find a plethora of advanced (and aesthetically pleasing) visualization techniques being used to present and explore data relations, like the CIRCOS system.
Ask better questions, discover relationships, create hypotheses and test them against more data; rinse, repeat -- the scientific method.
Big Data will not magically enable us to discern better answers until we come up with better questions to explore the relationships between our data more thoroughly.
The field of log correlation could make great strides if we were to establish an open format for exchanging ideas for correlations in a vendor-neutral manner and collectively discuss what is effective within the field instead of how we operate today.
Information security is evolving into areas well explored within other fields. Our issues with discovering relations and implications from our oceans of unstructured data are at the heart of the field of complex event processing.
If we are going to reap the benefits that Big Data promises and not let this become another failed fad, then we have to start overcoming our isolationist attitude and start inviting experts from other disciplines to join us and teach us how to use this new toolset.