Risk Optimization in Information Centric Analytics
The mission of Symantec Information Centric Analytics is to help enterprises make the most of their limited resources by automating as much of the data analysis and threat hunting process as possible. Symantec Information Centric Analytics is a highly configurable platform that performs automated threat hunting using proprietary statistical and machine learning algorithms. It provides Level 1 analysts with a pre-vetted list of top threats and vulnerabilities, including insider threats, compromised accounts, vulnerable or infected machines, and exposed data. This vetted list is the result of the platform’s data ingestion, enrichment, and analytics process, which performs the initial threat hunting automatically. That relieves Level 1 analysts from having to pore through data from many different sources to figure out who should be investigated, allowing them to focus on vetting and on escalation or resolution.
Information Centric Analytics’ out-of-the-box Scenarios and Risk Models provide a baseline for identifying different kinds of insider threats and cyber breaches, based on common data sources found in enterprise environments. These defaults are a starting point from which risk models and risk vectors should be optimized and augmented to best reflect the organization’s available data sources and business goals.
The technical objective of the administrator performing the optimization is to align risk models with the organization’s desired use cases, its available data, and the types of threats prevalent in its environment. For example, if an environment does not include a Cloud Access Security Broker (CASB), or CASB data is not being fed into Information Centric Analytics, there is little point in including CASB-related scenarios in a Risk Model. Conversely, if the company’s customer list is managed in a cloud application for which no scenario is currently configured, one should certainly be created and included in the relevant risk models.
On a macro level, any given Risk Model should tell the analyst a story without too much complexity, and should produce a consumable list of Risk Model instances (the number of people/users triggering the model). Too complex and/or too many instances means the Risk Model is casting too wide a net and will produce more false positives. Too simple and/or too few instances means the Risk Model has too narrow a view and will produce more false negatives (i.e., missed threats). The “goldilocks” just-right Risk Model typically triggers 2-4 “cards” per stage of the model, across no more than 6 stages, with any one model matching fewer than 1% of the total population of people being analyzed. For example, in an enterprise with 50,000 people being analyzed, no single risk model should identify more than 500 people as matching the model.
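The sizing rules of thumb above can be expressed as a simple check. The 1% population cap, the 2-4 cards per stage, and the 6-stage limit come from this section; the function itself and its names are illustrative, not part of the product:

```python
def risk_model_sizing(population, instance_count, cards_per_stage):
    """Check a Risk Model against the rules of thumb above: match < 1% of
    the analyzed population, fire 2-4 cards per stage, and span no more
    than 6 stages. Returns a list of warnings (empty means "just right")."""
    warnings = []
    if instance_count > population * 0.01:
        warnings.append("too broad: expect more false positives")
    if len(cards_per_stage) > 6:
        warnings.append("too many stages: the model's story is too complex")
    for stage, cards in enumerate(cards_per_stage, start=1):
        if cards < 2:
            warnings.append(f"stage {stage}: too narrow, risks false negatives")
        elif cards > 4:
            warnings.append(f"stage {stage}: too wide, risks false positives")
    return warnings

# 50,000 people analyzed: a model matching 800 of them casts too wide a net
warnings = risk_model_sizing(50_000, 800, [3, 2, 4])
```

A healthy model returns no warnings; each warning points at the specific stage or threshold worth revisiting.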
Just as risk models tell a story about a particular sequence of activities indicating that a person or user is a threat, risk vectors tell the story of a user’s or person’s overall set of activities that may indicate they are a risk. Out-of-the-box risk vectors align user and person risk scores to common data sources and risk factors. However, if a company’s environment does not include the relevant data sources, or they are not applicable to the business’s use cases, the vectors should be adjusted to best tell the story of user/person risk. Nothing is more of a letdown than drilling into a user or person, only to find a risk radar diagram and risk vectors that mostly show zeros or factors irrelevant to the environment. Conversely, a well-populated risk radar diagram with risk vectors that align to prioritized use cases shows value and accelerates threat hunting by elevating the users and persons that need to be investigated.
Ensuring risk vectors are functioning as expected is critical to accurately identifying which persons, users and computers should garner the most attention for investigation and remediation. Depending on the quality of the data being ingested, risk vectors may be adversely affected, throwing off risk scores and producing inconsistent results. The following process provides a simple and straightforward approach to analyzing risk vector results to easily pinpoint potential trouble spots and ensure a healthy environment.
1. From within Information Centric Analytics, create an Analyzer view to review Risk Vectors and how prominent they are, individually and relative to each other. This view will show how well the Risk Vectors are balanced and configured relative to the available data sources and the customer’s requirements. Drag in the following:
- Risk Vector Entity Type (Dimension on rows)
- Risk Vector (Dimension on rows)
- Risk Vector Count (Measures)
- Raw Score Max (Measures)
- Raw Score Sum (Measures)
2. For each Risk Vector Entity Type:
- For Risk Vectors with a Risk Vector Count of 0, or a count that is a small percentage of the entity type’s total count:
- For Risk Vector Count = 0, verify if the data source for the Vector is present, and if not, remove it from the entity’s risk scoring
- If the data source is present, review the parameters of the vector to see if the ranges are out of line with the customer’s environment
- If the parameters appear in line with the organization’s environment and goals, then leave as is. The other consideration is to adjust the Vector’s weighting up or down, based on the perceived riskiness of the activity.
3. For those Risk Vectors that have a Risk Vector count in line with the total count of the entity type:
- Review the Raw Score Max relative to Raw Score Sum. In general, high risk vector counts where the Raw Score Max is more than 5% of the Raw Score Sum (assuming a large sample size) indicate that one account is dominating, and the vector probably needs to be adjusted. Add the User – Account Name dimension to help determine which account is skewing the data.
- If the Raw Score Max or Raw Score Sum is very low compared to the Risk Vector Count, then the purpose and effectiveness of the Vector should be reviewed.
- If the Raw Score Max or Raw Score Sum are extraordinarily high for the event type, then the purpose and effectiveness of the Vector should be reviewed, perhaps as a candidate to be split into multiple vectors (beware of the event type – if the Vector is web hits, that will naturally be a high number compared to DIM policy violations, which should be by nature a much lower order of magnitude).
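The triage above can be scripted against an export of this Analyzer view. The 5% dominance threshold comes from the text; the row fields and the low-score cutoff below are illustrative assumptions, not a documented ICA interface:

```python
def review_risk_vector(count, raw_max, raw_sum, source_present):
    """Return a recommendation for one Risk Vector row, following the
    review rules above."""
    if count == 0:
        if not source_present:
            return "remove: data source absent from the environment"
        return "review parameters: source present but the vector never fires"
    # One account dominating: Raw Score Max exceeds 5% of Raw Score Sum
    if raw_sum > 0 and raw_max / raw_sum > 0.05:
        return "adjust: one account is likely skewing this vector"
    if raw_sum < count:  # illustrative cutoff for "very low" scores
        return "review purpose: scores are very low relative to the count"
    return "looks healthy"

recommendation = review_risk_vector(count=1200, raw_max=900, raw_sum=10_000,
                                    source_present=True)
```

Remember the event-type caveat above: what counts as “extraordinarily high” for web hits is a different order of magnitude than for DIM policy violations, so any cutoffs should be set per vector.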
The process of optimizing risk models is part art and part science: the goal is to ensure Information Centric Analytics presents a picture that reflects the organization’s target use cases and effectively catches the highest-risk activities, while minimizing false positives and negatives. Before beginning the optimization process, it is important to understand the desired outcomes, the organization’s priorities for measuring the risk of people, users, endpoints, etc., and the data sources integrated. Having this knowledge ahead of time will drive an informed optimization process and ensure it hits the mark for the organization. The optimization process uses Symantec Information Centric Analytics’ scenarios, risk models, and Analyzer ad-hoc analysis capability.
Using the built-in Analyzer capabilities within Symantec Information Centric Analytics, we can inventory the available Event Scenarios and their Instance Counts to get a better idea of which scenarios are being triggered, and identify any that may be producing too many results – making them good candidates for tuning.
1. Open two instances of Symantec Information Centric Analytics in two different browser tabs
2. In the first tab, create an Analyzer view showing the event scenario created date range, event scenario names, and their instance counts. This view will show how long each scenario has existed and how often it is triggered, informing your tuning decisions. Drag in the following:
- Event Scenario Instance Count (Measures, sorted descending)
- Event Scenario Name (Dimension on Rows)
- Event Scenario Instance Created Date Range (Dimension on Rows)
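Instance counts are easier to compare when normalized by scenario age. This small helper (illustrative, not part of the product) turns the view’s columns into a triggers-per-day rate:

```python
from datetime import date

def scenario_trigger_rate(instance_count, created_on, as_of):
    """Average Event Scenario instances per day since the scenario was created."""
    days = max((as_of - created_on).days, 1)  # avoid division by zero on day one
    return instance_count / days

# A scenario created 100 days ago with 90,000 instances fires ~900 times a day -
# a likely tuning candidate.
rate = scenario_trigger_rate(90_000, date(2023, 1, 1), as_of=date(2023, 4, 11))
```

Comparing rates rather than raw counts keeps a long-lived, quiet scenario from being confused with a young, noisy one.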
3. In the second tab, create an Analyzer view showing how many people each risk model has triggered for, how many cards have triggered, and how often. To create this view, drag in the following:
- Risk Model (Dimension on Rows)
- Stage (Dimension on Rows)
- Card Title (Dimension on Rows)
- Instance Count (Measures)
- Card Count (Measures)
- Card Event Count (Measures)
4. Note those Risk Models that have an Instance Count > 1% of the number of People/Users/Computers being analyzed, based on each Risk Model’s focus entity
5. Expand all Risk Models, Stages, and Card Titles
6. Look for Card Titles with a Card Count of 0
- Note those Card Titles with data types that are not being ingested (note the containing Risk Model and Stage as well, and consider removing or replacing these cards)
- Note those Card Titles with data types that are being ingested (these Cards should be reviewed to ensure data is coming in correctly)
7. Note Risk Model + Stage combinations that have fewer than 2 or more than 4 Card Titles with a Card Instance Count > 0
- Note the Card Instance Counts and Card Event Counts
- Consider parameter adjustments on Cards with high counts (specifically instance counts greater than 10,000) that would decrease the Card Counts while not reducing the Card’s efficacy
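The Risk Model checks above can likewise be automated from an export of this view. The `stages` structure below is an assumed shape for exported data, not an ICA API; the 1%, 2-4 card, and 10,000-count thresholds come from the text:

```python
def review_risk_model(population, instance_count, stages):
    """Flag tuning candidates for one Risk Model.

    `stages` maps stage name -> {card_title: card_count}, mirroring the
    Risk Model / Stage / Card Title rows of the Analyzer view.
    """
    findings = []
    if instance_count > population * 0.01:
        findings.append("model matches more than 1% of the analyzed population")
    for stage, cards in stages.items():
        firing = sum(1 for c in cards.values() if c > 0)
        if not 2 <= firing <= 4:
            findings.append(f"{stage}: {firing} firing cards (2-4 is the target)")
        for title, count in cards.items():
            if count == 0:
                findings.append(f"{stage} / {title}: never fires; verify its data source")
            elif count > 10_000:
                findings.append(f"{stage} / {title}: very high count; consider tightening parameters")
    return findings
```

Running this per focus entity (People, Users, Computers) gives a worklist of exactly the cards and stages the manual review would surface.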
When reviewing risk models, it is useful to consider the organization’s fundamental use cases and data sources currently being integrated, as this will ultimately dictate what actions should be taken for tuning.
1. Identify risk models that are applicable to the company’s use cases
2. For Risk Models that are aligned to the company’s use cases and have the right sources currently integrated:
- Iteratively adjust cards and parameters for appropriate hit rates
3. For Risk Models aligned to the company’s use cases that are missing the listed sources:
- If alternative sources are available, create cards with parameters to utilize alternatives
- If alternative sources are NOT available, remove the cards from the risk models and reconsider them if the sources are integrated in the future
4. For Risk Models NOT aligned to the company’s use cases that have the right sources:
- Iteratively adjust cards and parameters for appropriate hit rates
- For example, the “Compromised User” risk model requires some indicators of attack/compromise and/or threat intelligence. Even if the organization is focused on malicious insiders, if endpoint events and/or threat intel are available, leave the model on to take advantage of the added value it provides out of the box (assuming the model populates properly from these sources). Otherwise, disable the scenarios associated with the risk model, or delete the risk model.
5. For Risk Models NOT aligned to the company’s use cases that do NOT have the right sources:
- Disable scenarios associated to the risk model or delete the Risk Model
- For example, the “Cyber Breach Data Loss prediction” risk model is focused on data at rest (DAR); if that is not the focus of the implementation and there is no DAR data, it should be shut off.
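The four alignment cases above reduce to a small decision table. This sketch encodes them; the function and its inputs are illustrative, not a product feature:

```python
def risk_model_action(aligned, sources_integrated, alternatives_available=False):
    """Recommend an action for a Risk Model given use-case alignment and
    data-source availability, per the four cases above."""
    if aligned and sources_integrated:
        return "tune: iteratively adjust cards and parameters for appropriate hit rates"
    if aligned and alternatives_available:
        return "adapt: create cards whose parameters use the alternative sources"
    if aligned:
        return "trim: remove the cards; revisit if the sources are integrated later"
    if sources_integrated:
        return "keep: the model still adds out-of-the-box value; tune hit rates"
    return "disable: turn off the associated scenarios or delete the risk model"
```

The “Compromised User” example above falls in the keep bucket (not the primary use case, but the sources exist), while “Cyber Breach Data Loss prediction” without DAR data falls in the disable bucket.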
Security is a process, not an end result, and as such there is no perfect configuration. Getting to Wow in Information Centric Analytics is all about the out-of-the-box configuration telling users a balanced story that is relevant to their role and drives decisions. The process above explains the moving parts, and as experience is gained through continued optimization, the elements that require “tweaking” will stand out naturally, because “it just doesn’t look right.” Experimentation is encouraged to strike the right balance and ensure Information Centric Analytics becomes a finely tuned machine that maximizes the value of risk analysis.
For more best-practice articles on Symantec Information Centric Analytics, see the following posts: