By: Oren Falkowitz

Big data, the mountains of information spinning through warehouses of humming, blinking computers, promises to give us the fresh insights we need to understand and deal with everything from weather patterns to buying patterns. Everybody loves the idea of big data.

Its value is irrefutable. And now that we have the computing horsepower to crunch through it, we’re just beginning to scratch the surface of its potential impact on humankind. However, there’s another aspect of big data that flies under the radar, untapped, but just as important in serving up new insights and understanding. It’s called “small pattern recognition.”

At the TED 2016 talks in Vancouver, scientist Riccardo Sabatini presented a particularly amazing use of big data. In his talk, he wheeled out 175 huge books, in which each of the 262,000 pages was filled with the letters A, G, T and C, in very small type. This was the entire DNA sequence of geneticist and entrepreneur Craig Venter. 262,000 pages!

And then, Professor Sabatini pointed out two pages he had marked with yellow post-its. On the first one was a sequence of eight letters, whose order and placement indicated that Venter had blue eyes. The second showed two letters, which, if they were in the wrong order, would have meant that Venter had cystic fibrosis.

The 262,000 pages? Big data.

The 8-letter blue eye sequence and the 2-letter cystic fibrosis marker? Those are small patterns.

Hiding in Big Data, Small Patterns Tell Big Stories

Big data hides millions of small patterns, each one meaning something. Big data can make millions, even billions of correlations and turn them into a trend, a movement, or a prediction. Small patterns, on the other hand, provide a specific insight derived from something that occurs infrequently.

Those little patterns often lead to something much, much bigger. Like a warm breeze from the wrong direction that presages the onset of a hurricane.

It is precisely because these small patterns tell so much that sometimes we go to extraordinary lengths to hide them. For instance, small pattern recognition is the reason some poker players wear sunglasses, or try to cultivate their “poker face,” so as not to give away the “tell,” some minuscule movement, glance, or facial expression that reveals how a player feels about the cards he’s holding. But even this doesn’t work all that well (hence the sunglasses), because there’s nothing more obviously unnatural than someone trying to act natural. And it doesn’t matter whether we’re at a green felt table, or in cyberspace.

In his 1989 book, The Cuckoo’s Egg, Clifford Stoll gives a first-person account of what may be the first computer break-in in history, where a hacker infiltrated the Lawrence Berkeley National Laboratory and was eventually caught, thanks to Stoll’s documentation and persistence. The story begins when Stoll, who managed some computers at the lab, was asked to resolve a $0.75 accounting anomaly in the computer usage logs. He traced the charge to an unauthorized user who had used nine seconds of computing time and hadn’t paid for it. Nine misplaced seconds were the only clue that a top-secret government laboratory had been breached.

It turns out that this kind of seemingly innocuous charge, almost too small to notice or care about, is a favorite of those who commit credit card fraud. A $1 charge for gas is enough to confirm that the card is still active, and yet on a statement or transaction log, it looks like some sort of accounting error. It’s not.
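As a rough illustration of what looking for that kind of small charge might involve, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Transaction` record, the `find_probe_charges` helper, and the $1.00 cutoff are hypothetical and do not describe any real fraud-detection system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction record; the field names are assumptions."""
    card_id: str
    merchant: str
    amount: Decimal


def find_probe_charges(transactions, max_amount=Decimal("1.00")):
    """Return charges small enough to be a possible card-testing probe.

    The cutoff is illustrative. A tiny charge is only a candidate for
    review, not proof of fraud.
    """
    return [
        t for t in transactions
        if Decimal("0") < t.amount <= max_amount
    ]
```

A charge flagged this way is only a starting point. It becomes meaningful when it is correlated with what happens on the same card afterward.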

The use of big data analytics is obviously valuable when it comes to improving cybersecurity. Big data can find trends and uncover vulnerabilities. But looking at the small patterns in that big data may prove to be even more valuable. What kinds of small patterns, exactly? Something that’s just odd enough to be noticed. Anomalies. Things that at first glance look normal, but aren’t. The improbable. The suspicious, the conspicuous, even the absence of something that should be there.

Eventually, a lot of these little events add up to a pattern. That pattern becomes an identity by which hackers unwittingly reveal themselves. And here’s the most important aspect of those identities. They are indelible. Hackers try to hide them or vary their approach, but even the ways they do that give them away.

For example, one of the more revealing small patterns is a request in the log that says, “What’s my IP address?” That’s innocent enough, right? Except most users don’t care what their IP address is, and IT people working on the network usually already know theirs. So now we’ve identified someone who may be poking around, trying to understand where they are in a network to make a lateral move, or looking for a vulnerability. And we can begin to track that activity and correlate it with other similar kinds of activity that become a pattern, which eventually weeds out the innocent user and points us to a bad actor.
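To make the idea concrete, the sketch below counts “What’s my IP address?” style lookups per internal host. It is only an illustration under stated assumptions: the three-field log format (timestamp, source, destination) is invented for the example, the `count_ip_lookups` helper is hypothetical, and the short list of public IP-echo services is a sample rather than a complete or authoritative list.

```python
from collections import Counter

# Illustrative sample of public IP-echo services; a real list would be longer.
IP_LOOKUP_HOSTS = {"ifconfig.me", "api.ipify.org", "icanhazip.com"}


def count_ip_lookups(log_lines):
    """Count IP-lookup requests per source host.

    Assumes each log line has exactly three whitespace-separated fields:
    timestamp, source host, destination host. Other lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # not in the assumed format
        _timestamp, source, destination = parts
        if destination.lower() in IP_LOOKUP_HOSTS:
            counts[source] += 1
    return counts
```

A single lookup proves nothing on its own. The count is just one signal to correlate with other odd behavior from the same host.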

This is important because we have to realize that the big breaches we hear about (the Office of Personnel Management, Sony Pictures, Target) don’t just happen all of a sudden. They take months and months, sometimes years, to play out. While the attack patterns in the earliest stages are sparse and infrequent, they are very recognizable once they’ve begun to be correlated. We need to focus on finding those small datasets, establishing a credible reference set, and using that set as a launching point to look for similar patterns.

Every cyber criminal has a particular way of operating defined by personal tics and nuances. It may be some peculiar way in which they write code, a favorite probe, or a payload that’s used over and over. Taken by themselves, these are just isolated anomalies. When you aggregate a lot of them, they form a pattern. Cluster and correlate enough of those patterns, and you start to reveal an identity that’s impossible to cover or fake, catching even the most careful cyber thief.
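One simple way to picture “cluster and correlate” is to link anomalies that share an indicator, such as a reused payload or a favorite probe, and see which ones end up connected. The Python sketch below does that with a small union-find structure. It is a toy illustration, not a description of any real product: the `cluster_by_shared_indicators` name, the event IDs, and the indicator strings are all hypothetical.

```python
from collections import defaultdict


def cluster_by_shared_indicators(events):
    """Group events that are linked, directly or indirectly, by shared indicators.

    `events` maps an event ID to a set of indicator strings (hypothetical
    values such as "payload:abc"). Returns a list of sets of event IDs.
    """
    parent = {event_id: event_id for event_id in events}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # indicator -> first event that carried it
    for event_id, indicators in events.items():
        for indicator in indicators:
            if indicator in first_seen:
                union(first_seen[indicator], event_id)
            else:
                first_seen[indicator] = event_id

    clusters = defaultdict(set)
    for event_id in events:
        clusters[find(event_id)].add(event_id)
    return list(clusters.values())
```

In this sketch, two anomalies that never share an indicator directly can still land in the same cluster through a third one that overlaps with both, which is the sense in which isolated oddities start to add up to an identity.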

In the world of cybersecurity, it is most effective to stop an attack when it’s just a blip on the radar, a few letters out of place hidden among a few hundred thousand pages. Yet it is currently common for an attack to develop unnoticed for months or a year until it’s a full-blown breach on a massive scale. To move past this outdated approach and identify attacks early, we must consistently and vigorously watch for those small but recognizable patterns at their onset.

We may live in a world of big data, but for the early stages of cyber breaches, we have to think small. You don’t take down an organization with thousands or even millions of people in it without creating an infrastructure equal to the task. It’s a big job, but it can’t be done without creating small patterns that can tip off security pros.

Big data is transformative. But the bigger it gets, the more the small stories have to be mined out of it. When it comes to cybersecurity and protecting your data, small is big.