ISTR Insights: Sizing up Data Breaches 

May 13, 2016 01:14 PM

Data breaches have become an almost daily occurrence. It may not seem like it on the surface, but according to the 2016 Internet Security Threat Report (ISTR), the number of publicly disclosed data breaches has risen steadily over the last several years to reach 318 in 2015. That’s almost one data breach per day.

However, it often seems that data breaches only make the news when the number of impacted individuals reaches into the millions, or even the tens of millions: what we’ve come to call “mega breaches.” These breaches have a far-reaching impact on the business that suffers one. A large company can watch its stock value drop at the same time as consumer trust erodes. And mega breaches were up in 2015; there were nine reported during the year.

Yet for all the attention-grabbing headlines, mega breaches are still relatively rare in the greater scheme of things. These types of breaches make up only around three percent of those reported in 2015. The fact is that most data breaches look quite different. So what do these data breaches look like?

Let’s start with a general overview of all data breaches this year. The average number of identities stolen per breach was 1.3 million, but averages tend to get skewed by large numbers, which is exactly what mega breaches are in this case. In contrast the median, or the mid-point when all the breaches are lined up by size, has been trending downwards: from 8,350 identities per breach in 2012 to 4,885 in 2015. The median has fallen by roughly 40 percent over four years, which suggests that smaller breaches make up a growing share of the total.
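To see why a handful of very large breaches pulls the average so far from the median, here is a minimal Python sketch. The breach sizes are hypothetical values chosen for illustration, not figures from the ISTR, and the names `breach_sizes`, `avg`, and `mid` are our own.

```python
from statistics import mean, median

# Hypothetical identities-exposed counts for seven breaches.
# Illustrative only; these are NOT figures from the ISTR.
breach_sizes = [800, 2_000, 3_500, 5_000, 12_000, 40_000, 78_000_000]

# The mean adds everything up, so one mega breach dominates it.
avg = mean(breach_sizes)

# The median is the middle value once the breaches are sorted,
# so a single extreme value barely moves it.
mid = median(breach_sizes)

print(f"mean:   {avg:,.0f} identities per breach")
print(f"median: {mid:,.0f} identities per breach")
```

In this made-up sample, six of the seven breaches are at or below 40,000 identities, yet the mean lands in the millions because of the single mega breach. The median stays with the typical case.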

There’s no question mega breaches have a significant impact on the overall number of identities exposed, and this year’s total was 429 million. However, with an 85 percent increase in the number of breaches that did not report how many identities were exposed, we believe the true number is much higher. At the very least, we estimate that half a billion identities were exposed in 2015.

It’s worth noting that this is a conservative estimate; in fact, there are other organizations that have reported much higher numbers for 2015 than Symantec has. However, we hold our count to a fairly strict methodology. For example, if a breach was reported this year, but took place during the previous year, we don’t add it to this year’s total. We also only count breaches that have been publicly reported, either by a press release from the breached organization or a reliable news source. We don’t count records found exclusively on data dump sites or hacker “stolen identity collections” unless the source the data has come from is clear (these are often duplicates or old caches). That’s not to say some of these incidents aren’t legitimate breaches. We simply aim for accuracy over inclusion. Thus, while we estimate that there were at least half a billion identities exposed in 2015, it’s possible that this number is even higher, based on underreporting in the public sphere.

To get a better understanding of the size of most data breaches, let’s look at what statisticians call a boxplot. This will allow us to discard “outliers,” or unusual cases, in the data and give us further insight into what most data breaches look like, as opposed to all data breaches. (A deep understanding of boxplots isn’t necessary, but more information on them can be found here.)


It turns out that most data breaches contain under 60,000 identities, with three quarters having fewer than 25,000 identities. Any data breach over 60,000 is actually an outlier, an irregular occurrence that falls outside the norm.
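For readers curious how a boxplot decides what counts as an outlier, the sketch below uses the common Tukey rule, which flags anything more than 1.5 times the interquartile range beyond the quartiles. This is an assumption on our part: the report does not spell out which rule produced the 60,000 threshold. The breach sizes are hypothetical, and `tukey_fences` and `split_outliers` are illustrative helper names.

```python
from statistics import quantiles

def tukey_fences(sizes, k=1.5):
    """Return the (low, high) fences: quartiles widened by k * IQR."""
    q1, _, q3 = quantiles(sizes, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(sizes, k=1.5):
    """Split sizes into (typical, outliers) using the Tukey fences."""
    low, high = tukey_fences(sizes, k)
    typical = [s for s in sizes if low <= s <= high]
    outliers = [s for s in sizes if s < low or s > high]
    return typical, outliers

# Hypothetical identities-exposed counts; NOT figures from the ISTR.
breach_sizes = [500, 1_200, 2_000, 3_500, 5_000, 8_000,
                12_000, 25_000, 40_000, 1_500_000, 78_000_000]

low_fence, high_fence = tukey_fences(breach_sizes)
typical, outliers = split_outliers(breach_sizes)

print(f"upper fence: {high_fence:,.0f}")
print(f"typical breaches: {len(typical)}, outliers: {outliers}")
```

With this sample, the two largest breaches fall above the upper fence and are set aside, leaving the nine smaller breaches to describe what a typical incident looks like.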

In terms of the data being exposed, looking at these more common data breaches also paints a slightly different picture. Save a small amount of shuffling in the order, the types of data stolen are largely the same. The most obvious difference is that medical and insurance information both jump up in the rankings, indicating these breaches are more likely to contain these highly sensitive pieces of personal information.


What’s interesting is the overall percentages we see in the following table. It’s concerning that the percentages rise in every instance of our top ten list. What this means is that these breaches are more likely to contain a larger variety of data about the individuals exposed.

When looking at how these breaches take place, the order of causes changes when comparing all data breaches to the most common. Overall, attackers were responsible for the largest percentage of identities exposed. This remains true for the most common breaches; however, their overall share declines. Theft or loss climbs to second place as well, dropping the share of breaches that were the result of accidental disclosure significantly. Insider theft also increases when looking at most data breaches, in comparison to all data breaches.


So why do most data breaches appear so much smaller when compared to mega breaches? It could be that most attackers are going after “soft targets,” or smaller organizations that may not have a lot of data, but also may not have strong defenses in place to protect against a data breach. The attackers get in and steal the data, but the size of the cache is about the size you would expect in a small- to medium-sized business. The data set is also richer, with more diverse types of data points.

As for the reasons most data breaches occur, the answers tend to lead to speculation, given the nature of the topic. Naturally those behind such attacks work diligently to mask their identities, which makes painting such a picture challenging. However, there have been rare cases where the motivation has come to light. These cases point to data breach goals rooted in identity theft, blackmail, cyberespionage, and even cyberactivism.

Ultimately, a data breach is the end result of a larger security issue. Attackers can get in through a variety of ways, from misconfigured or unpatched servers to socially engineered phishing attacks that include malicious payloads. To avoid becoming the victim of a data breach, businesses should carry out regular security audits and employ defense-in-depth strategies that can detect and prevent intrusion attempts. Employing encryption can prevent attackers from siphoning off sensitive information that is in transit, while data loss prevention (DLP) solutions can prevent the exfiltration of data if an attacker manages to make it into the internal network.

Regardless, every data breach is a serious incident. You can liken a mega breach to a plane crash, with the loss of identity being widespread and at times shocking. Meanwhile, most data breaches are more akin to car crashes: far, far more frequent, and still events that lead to significant losses of identities.

These are just a few of the data breach subjects covered in the Symantec 2016 Internet Security Threat Report. Interested in what industries are at risk or what’s at play in the growing cyber insurance market?

Download the full 2016 Internet Security Threat Report
