Blue String Pudding: September 2016

Since it's heading inexorably towards autumn, as evidenced by the squally rain outside my window, I thought that I would choose a suitably seasonal topic.

There's a point of view which argues that there is no such thing as "too much data". Eventually it will be useful. Tools will be invented to trawl through the data automatically, and computers will get faster. Why not hang onto it? So people turn into squirrels and store more and more log files, potentially including personal data (there's an ongoing discussion as to whether IP addresses constitute personal data, which makes things more interesting).

I can immediately think of three reasons this is a bad argument:

Legal compliance: holding onto personal data "because it might come in handy" isn't exactly "keep for only as long as required for the purposes for which it was collected", which is a rough recast of Principle 5 of the UK Data Protection Act.
As speed of processing increases, the rate of data collection will also increase. It's likely that the one will never catch up with the other.
By the time you can trawl the data, you may not need it any more.

An information security product which really encourages squirreling - or at least the acquisition of incredibly large quantities of data - is a SIEM (security information and event management system). It takes server log files and network traffic data (and anything else you can find) and correlates them to produce event/incident alerts.
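To make "correlates" a little more concrete, here is a minimal sketch in Python. It assumes two hypothetical feeds, one of source IPs taken from failed logins and one of source IPs taken from firewall denies, and flags addresses that show up in both. The function name, the feeds and the threshold are all my own illustration, not how any particular SIEM does it.

```python
from collections import Counter


def correlate_sources(auth_failures, firewall_denies, min_failures=3):
    """Return source IPs seen in both feeds, sorted.

    auth_failures:   iterable of source IPs, one per failed login (hypothetical feed)
    firewall_denies: iterable of source IPs, one per denied connection (hypothetical feed)
    min_failures:    illustrative threshold, not a recommended value
    """
    failure_counts = Counter(auth_failures)
    denied = set(firewall_denies)
    return sorted(
        ip
        for ip, count in failure_counts.items()
        if count >= min_failures and ip in denied
    )


# Example data uses addresses reserved for documentation.
auth_failures = ["192.0.2.10"] * 3 + ["192.0.2.20"] * 5 + ["192.0.2.30"]
firewall_denies = ["192.0.2.10", "198.51.100.7"]
suspects = correlate_sources(auth_failures, firewall_denies)
```

Neither feed says much on its own here; it is the overlap that is interesting, and somebody had to decide that the overlap was worth looking for.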

But a SIEM isn't a magic box which solves all your problems. It needs someone to tell it what to look for - what an event is, and how to correlate data. If you go for "collect first, ask questions afterwards", you are very likely to have a tonne of data and nothing meaningful. Often, people confronted with this unpalatable truth go for a rather odd solution: collect more data. Upgrade the system to a bigger one. Get more log feeds from more systems, or from different types of sources, and eventually it will all make sense.

That's equivalent to collecting pieces of many jigsaw puzzles for years, storing them in a big box, and never actually putting them together. So you decide to collect more pieces, maybe from even more puzzles, and buy a bigger box. How will this help you? Now imagine that some of these pieces are contaminated with plutonium. You REALLY don't want those around unless absolutely necessary. That's what squirreling personal data is like.

Yes, I hear you saying, but the lovely people who created my SIEM have pre-configured it to look for things I need to care about. They already thought of the incidents I need to be told about, and it's all fine. They are the experts.

Not necessarily. Those fine professionals might know their product inside out, but there is one thing they know pretty much nothing about, even if they have had ten meetings with you. Your business. They can look for generic events, like "someone keeps trying and failing to log in", but how useful is that to you?

To answer that question, I bet you started to think about business impact, maybe "If it was repeated login attempts on a system holding payment records, then I'd be worried". So knowledge of your business is key to useful incident notifications. CAVEAT: Some pre-configured alerts may be suitable if you are in a very regulated business and are very typical of that business.
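As a rough sketch of the difference business knowledge makes, the Python below only raises a failed-login alert when the system concerned is one the business has flagged as important. Everything named here is hypothetical: the host name "payments-db", the threshold of 5, the ten-minute window and the event tuple layout are assumptions for illustration only.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical values: which systems matter is a business decision,
# not something a vendor default can know.
CRITICAL_HOSTS = {"payments-db"}
THRESHOLD = 5
WINDOW = timedelta(minutes=10)


def failed_login_alerts(events):
    """Yield (host, user, time) for repeated failures on a critical host.

    events: iterable of (timestamp, host, user, success) tuples in time order.
    """
    recent = defaultdict(deque)  # (host, user) -> timestamps of recent failures
    for ts, host, user, success in events:
        if success or host not in CRITICAL_HOSTS:
            continue  # ignore successes and systems the business has not flagged
        window = recent[(host, user)]
        window.append(ts)
        # Drop failures that have fallen out of the time window.
        while window and ts - window[0] > WINDOW:
            window.popleft()
        if len(window) >= THRESHOLD:
            yield host, user, ts
            window.clear()  # start counting afresh after alerting


start = datetime(2016, 9, 16, 9, 0)
events = [(start + timedelta(minutes=i), "payments-db", "alice", False) for i in range(5)]
events += [(start + timedelta(minutes=i), "wiki", "bob", False) for i in range(5)]
events.sort()
alerts = list(failed_login_alerts(events))
```

The same five failures happen on both systems, but only the one holding payment records produces an alert. The generic rule can't make that distinction; you can.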

Without understanding which events are of relevance, you get the second SIEM problem: overload. Hundreds or thousands of cries of "WOLF!" every second. Imagine if every failed login generated an alert. Or if it told you about every time someone connected to Facebook (which might work for you, but some companies rely on Facebook).

You'll also have spent a great deal of money for little return.

However, it's not all bad news. Squirreling away log files isn't a brilliant plan, but tactical acquisition of a subset of that same ocean of data might just work.

If you have a SIEM, or are considering getting one, and are of the squirreling persuasion, stop and think:

What are you really going to use the SIEM for? Business reasons, and make them SPECIFIC. Not that "to help us detect incidents" stuff.
Think of specific, clearly defined use cases where correlation and alerting will help you (more on this in a later entry)
Work out the minimum data you need for exactly those use cases
Work out where you can get it from and how (politics may happen here)
Make sure the SIEM can understand it (format)
Work out how it can be connected together to identify the specific incident you are trying to detect
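The checklist above can be written down as data rather than as good intentions. The sketch below, in Python, assumes a hypothetical UseCase record and a minimise helper which throws away every field the use case hasn't asked for. The field names and the example use case are invented for illustration; your log formats will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A specific, business-led reason for collecting data (hypothetical structure)."""
    name: str
    business_reason: str
    sources: tuple
    required_fields: frozenset


def minimise(record, use_case):
    """Keep only the fields the use case needs; complain if any are missing."""
    missing = use_case.required_fields - set(record)
    if missing:
        raise ValueError(f"{use_case.name}: record lacks {sorted(missing)}")
    return {k: v for k, v in record.items() if k in use_case.required_fields}


# Illustrative use case and log record; every name here is an assumption.
payment_logins = UseCase(
    name="payment-login-bruteforce",
    business_reason="repeated login failures on the system holding payment records",
    sources=("payments-db auth log",),
    required_fields=frozenset({"timestamp", "host", "user", "outcome"}),
)
raw = {
    "timestamp": "2016-09-16T09:00:00",
    "host": "payments-db",
    "user": "alice",
    "outcome": "fail",
    "source_ip": "192.0.2.10",
    "user_agent": "example-client",
}
kept = minimise(raw, payment_logins)
```

In this example the source IP never reaches the box because the use case didn't ask for it. If a later use case does need it, you add it then, with a reason attached.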

And, for bonus points:

Agree with the relevant parties how the notification will be responded to (and how it will be verified, especially during early days where it might be a false positive).

The third major surprise resulting from turning on a SIEM, even when it is well configured, is that you suddenly get reports of incidents and they leap out at the incident response team. Yes, these things were happening before the SIEM told you about them. But once they have been reported, you actually have to do something about them - even if it is only recording them and producing a nice little graph which you can attach to your new homework on "Why we need more budget/a different policy/more kit".

Once planned, put your plan into action. Basic project management stuff. Be ready to revise your plans to meet your overall goal. Take advantage of interesting benefits along the way, but don't lose sight of the objectives.

So, in summary, you don't need to squirrel, but you can use some of the data very profitably, if you know what you need it for. Be business-led, not data-driven.

Friday, 16 September 2016

Autumn topic - Log squirreling