Information Centric Analytics Best Practices - Using the Integration Wizard 

Aug 27, 2019 07:49 PM

Integration Wizard Best Practices

With the Integration Wizard in Symantec Information Centric Analytics, users have the flexibility to import almost anything and everything from different data sources. Though that flexibility has many benefits, it also poses a risk for an ICA implementation. Importing too much data may cause all sorts of problems, the most obvious of which is filling up the ICA database unnecessarily. On the other hand, importing too little may make the risk analysis results that ICA produces less accurate. This article explains what and how much data should be imported, along with other related considerations.

What to import

While it is obvious that data being imported should have a purpose in ICA, deciding which specific kinds of data serve that purpose is less straightforward. There are a few questions you should ask to help answer the question of “Should I import this data?”

  • Can I map this data to an ICA entity?
  • Does the data add value to determining the risk of entities in ICA?
  • Does the data contain elements that can associate it to a source and/or destination?

The questions above are the first step in deciding which pieces of data to import. Once those questions have been answered, you can proceed with the specifics for each type of data. Below are some guidelines and best practices collected from the various implementations we have done. Note that this information is most useful when you have to configure a custom Integration Wizard import; it may not apply to imports done through out-of-the-box integration packs.

Computer Endpoints, IPs, Users/Persons/Organizations, Applications

With any of these base entities in ICA, always consider going to an authoritative data source. For example, for computer endpoints, good candidate sources include Active Directory, asset management databases, and CMDBs; for users/persons and organizations, HR databases are a strong choice, especially for organization hierarchy.

Authentication Events

IMPORTANT NOTE: All authentication events, failed and successful, may contribute to an entity’s risk rating. However, capturing all authentication data, especially if you have several sources to pull from, may result in excessive data import. If the authentication data being imported needs to be limited, start by filtering out generic local user accounts that generate a lot of noise, as in the sketch below. Next, look for events that may be authentication-related but are not strictly login successes and failures. If you need to trim the authentication events further and have several different sources and authentication types, consider limiting the events to only those coming from Windows and/or Unix and Linux systems.
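
A hedged illustration of this initial filtering: the data source query below drops generic local accounts and non-login events before they reach ICA. The table and column names (Stg_AuthEvents, UserName, EventType) are hypothetical placeholders for your own source schema.

    SELECT  EventDate, UserName, SourceHostName, DestinationHostName
    FROM    Stg_AuthEvents                                            -- hypothetical staging table
    WHERE   UserName NOT IN ('Administrator', 'Guest',
                             'LOCAL SERVICE', 'NETWORK SERVICE')      -- generic local accounts
        AND EventType IN ('LogonSuccess', 'LogonFailure')             -- keep only login successes/failures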

Endpoint Protection Events

For endpoint protection events, make sure that the data imported relates to detections, infections, and the like. Administrative events, such as virus definition updates, should be excluded, as in the sketch below.
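
A minimal sketch of that exclusion; the staging table Stg_EPEvents, the EventCategory column, and its values are hypothetical and will vary by endpoint protection product:

    SELECT  EventDate, SourceEventID, SecurityRiskName, DestinationHostName
    FROM    Stg_EPEvents                                  -- hypothetical staging table
    WHERE   EventCategory IN ('Detection', 'Infection')   -- detections/infections only; administrative
                                                          -- events (e.g., definition updates) fall out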

DLP Incidents

DLP incident data generally includes only violations, or anything related to whatever triggered a DLP solution to generate an incident. Consequently, there is not much filtering that can be done to DLP incident data without losing information significant to the risk assessment and scoring functions of ICA. However, any test incidents or incidents related to policy testing are good candidates to exclude from DLP data imports, as in the sketch below.
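
A minimal sketch of excluding test incidents; the table Stg_DLPIncidents, the IncidentStatus column, and the naming patterns are assumptions to adjust to your environment:

    SELECT  IncidentDate, SourceIncidentID, SourcePolicyName, MatchCount
    FROM    Stg_DLPIncidents                      -- hypothetical staging table
    WHERE   SourcePolicyName NOT LIKE '%test%'    -- exclude policy-testing incidents
        AND IncidentStatus <> 'Test'              -- exclude explicitly flagged test incidents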

Web Activity Events

IMPORTANT NOTE: As with authentication events, limiting incoming data is key. As a rule of thumb, confine the data to blocked web activities plus any web activity that exceeds 10 MB of outgoing traffic (bytes out), regardless of the action taken, i.e., permitted or blocked. The sketch below shows one way to express this filter.
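
A minimal sketch of this rule of thumb; the staging table Stg_WebActivity and the ActionTaken/BytesOut column names are hypothetical placeholders for your proxy log source:

    SELECT  ActivityDate, SourceActivityID, URL, ActionTaken, BytesOut
    FROM    Stg_WebActivity                -- hypothetical staging table
    WHERE   ActionTaken = 'Blocked'        -- all blocked activity
        OR  BytesOut > 10485760            -- anything over 10 MB outbound, permitted or blocked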

Integration Wizard Recommendations

1. Review the column names from the integration mapping entity prior to developing the Data Source Query for the integration. This will give you a better sense of what data the integration entity will accept for the integration mapping.

2. When creating the data source query, alias the column names within the query to align the column names from the data source with the column names defined for the entity. When column names are aliased, the Integration Wizard automatically associates the source to the target using the name, which eliminates the need to manually map source columns to target columns. For example, if you are loading computer endpoints and your source column is called AssetTag, you should alias the column to SourceComputerKey, as in the example below.
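
A short sketch of that aliasing; AssetTag and SourceComputerKey come from the example above, while HostName, ComputerName, and the CMDB_Assets table are hypothetical:

    SELECT  AssetTag AS SourceComputerKey,   -- alias matches the entity column name
            HostName AS ComputerName         -- HostName/CMDB_Assets are hypothetical
    FROM    CMDB_Assets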

3. To prevent data type errors from occurring during nightly processing, use CAST statements in your SQL to ensure there are no data type conflicts and to minimize the risk of data size conflicts when loading data into ICA.

  • Sample CAST statement that casts the column sAMAccountName to an NVARCHAR(256) field and aliases the column to AccountName. This ensures the data selected is capped at 256 characters, and because the alias AccountName matches the entity column name, the mapping will be automated when the query is referenced in an integration mapping.
    • CAST([sAMAccountName] AS NVARCHAR(256)) AS AccountName

4. A number of formulas are shipped out of the box that can be used to supplement your integration. The most common one converts EPOCH time to SQL Server time when building a Splunk-based integration; a sketch of that conversion follows.
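
For reference, the kind of conversion such a formula performs looks like the following. This sketches the conversion only, not the exact out-of-the-box formula definition; {_time} stands in for Splunk's epoch-seconds timestamp column:

    -- Convert epoch seconds (e.g., Splunk's _time field) to a SQL Server datetime:
    DATEADD(SECOND, CAST({_time} AS INT), '1970-01-01')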

5. When creating a formula, enclose each variable in ‘{ }’. Doing so allows you to specify a column value to pass at run time; see the example below. Formulas are applied when the data is going into the Stg_Preprocess tables, not the staging table used when extracting data out of the source into staging.
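
For example, a formula body that trims an incoming value to 256 characters might look like this; the column name AccountName is illustrative:

    LEFT({AccountName}, 256)   -- {AccountName} is replaced with the mapped column's value at run time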

6. When defining the integration mapping you can specify a fixed value, a source column from a query or you can use a pre-existing formula / create a new custom formula. 

Recommended IW Entities Data Load Order into ICA

1. Organizations

  • When loading Organizations, there are three required fields for the Organizations entity in the IW (see the sample query after this list):
    • Organization Abbreviation – a free-form text field that serves as the abbreviation of the organization. The data type for Organization Abbreviation is nvarchar(10).
    • Organization Name – a free-form text field that allows you to specify an organization name.
    • Organization SubOrgName – a free-form text field that allows you to specify a sub-organization of the organization, if one exists.
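
A minimal sketch of an Organizations data source query; the source table HR_OrgHierarchy and its columns are hypothetical, and the aliases assume the entity column names match the field labels above with spaces removed:

    SELECT  CAST(OrgCode    AS NVARCHAR(10))  AS OrganizationAbbreviation,
            CAST(OrgName    AS NVARCHAR(256)) AS OrganizationName,
            CAST(SubOrgName AS NVARCHAR(256)) AS OrganizationSubOrgName
    FROM    HR_OrgHierarchy   -- hypothetical source table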

2. Regions

  • Regions are associated to countries; a region can be associated to one or many countries.
  • Use a standardized listing of countries and regions, if necessary, to supplement incomplete country information.

3. Countries

  • A country can only be associated to one region.
  • Countries are most commonly associated to Users and ComputerEndpoints. They will also be associated to other entities like Authentication Events, Web Activity, and DIM Incidents.

4. Users

  • The primary key for Users is Account Name and NetBIOS Domain. In the event you have the same account name under different domains, multiple user accounts will be created for the user.
  • When attempting to associate a user to another entity, like computer endpoints, authentication events, and DIM/DAR Incidents, you will need to provide a combination of Account Name and NetBIOS Domain to link the user to the record.
  • Users will generate people records if the user record contains an email address and a manager.

5. Vendors

  • The primary key for a Vendor is the Vendor Name. Prior to uploading vendor names, evaluate the data to ensure that each vendor is named consistently; there can be inconsistencies in the way a vendor is named in the source system.
  • Vendors can be associated to many users; the vendor-to-user information is stored in an object entitled LDW_VendorsToUsers.
  • Vendors can be categorized by Industry, associated to Vectors, and assigned Vector grades.

6. Applications

  • The primary key columns for the Applications entity are Application Name and Source Application ID.
  • Users can be associated with an application via email address, or you can create and associate users by providing an owner account name and owner NetBIOS domain.
  • You can create and associate compliance scopes to an application, and an application can be associated with one to many compliance scopes.
  • Applications can also be associated to application categories.

7. Application Contacts

  • The primary key column for application contacts is the Source Application ID from the external source system.
  • To look up users, you need only provide an email address. Optionally, you can configure the IW to create and associate users through this feed by providing a Contact Account Name and a Contact NetBIOS Domain.
  • When creating application contacts, application contact roles can also be created.

8. ComputerEndpoints

  • The primary key columns for a computer endpoint are the Computer Name and the Source Computer Key. The Source Computer Key should be the primary key of the computer endpoint in the source system. The NetBIOS Domain can also be associated to a computer endpoint, but it is an optional field.
  • Applications can be associated to ComputerEndpoints, and an Application Assignment Tier is also associated to the Computer Endpoint.
  • You can also associate a country to a computer endpoint using the country name, and you can look up an organization using the Organization Abbreviation or create and associate organizations by providing an Organization Name and an Organization SubOrgName. A sample query follows this list.
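
A minimal sketch of a ComputerEndpoints data source query; the source table AD_Computers and its columns are hypothetical:

    SELECT  CAST([Name]      AS NVARCHAR(256)) AS ComputerName,
            CAST(ObjectGUID  AS NVARCHAR(256)) AS SourceComputerKey,
            CAST(DomainName  AS NVARCHAR(256)) AS NetBIOSDomain,   -- optional field
            CAST(Country     AS NVARCHAR(256)) AS CountryName      -- associates a country
    FROM    AD_Computers   -- hypothetical source table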

9. Authentication Events

  • For Windows authentication events, a good place to start is by filtering to include only the following security event IDs: 528, 529, 530, 532, 533, 534, 535, 539, 540, 682, 4624, 4625, 4648, 4768, 4769, 4771, 4776
  • Consider excluding the following information:
    • Authentication coming from SYSTEM
    • Hostnames that end in $
    • Logon types 0 and 3
  • When loading authentication events, the success character is a required field; pass a value of 1 for successful and a value of 0 for unsuccessful events.
  • A watermark should be used when loading authentication events to ensure old authentication events are not reloaded.
  • The Logon Type ID is not a required field, but it is highly recommended that you include it when loading Authentication Events.
  • To associate users to an authentication event, you can provide an email address; to create and associate users to an authentication event, you should provide Account Name and NetBIOS Domain information.
  • When associating a computer endpoint, it is a best practice to provide the destination hostname and the source hostname. The sketch after this list puts these recommendations together.
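
A minimal sketch combining the points above; the staging table Stg_WinSecurityEvents, its column names, the watermark placeholder, and the success/failure event-ID mapping are assumptions to verify against your source:

    SELECT  EventDate,
            AccountName,
            NetBIOSDomain,
            SourceHostName,
            DestinationHostName,
            LogonTypeID,                                          -- recommended though optional
            CASE WHEN EventID IN (528, 540, 4624)
                 THEN 1 ELSE 0 END AS Success                     -- assumed success/failure mapping
    FROM    Stg_WinSecurityEvents                                 -- hypothetical staging table
    WHERE   EventID IN (528, 529, 530, 532, 533, 534, 535, 539,
                        540, 682, 4624, 4625, 4648, 4768, 4769,
                        4771, 4776)
        AND AccountName <> 'SYSTEM'                               -- exclude SYSTEM
        AND DestinationHostName NOT LIKE '%$'                     -- exclude hostnames ending in $
        AND LogonTypeID NOT IN (0, 3)                             -- exclude logon types 0 and 3
        AND EventDate > @Watermark                                -- watermark placeholder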

10. DIM Incidents

  • The following fields are required when loading a DIM Incident (see the sample query after this list):
    • Incident Date
    • Match Count
    • Recipient Identifier
    • Sender Identifier
    • Source Incident ID
    • Source Policy ID
    • Source Policy Name
    • Source Rule ID
    • Source Rule Name
  • Users are associated to a DIM Incident by providing a Source Account Name and a Source NetBIOS Domain.
  • Computer Endpoints are associated to a DIM Incident via Source Hostname.
  • DIM Incident Statuses and Severities can also be associated to a DIM Incident.
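
A minimal sketch covering the required DIM Incident fields plus the user and endpoint associations; the staging table Stg_DLPIncidents and its source column names are hypothetical:

    SELECT  IncidentDate,
            MatchCount,
            RecipientID   AS RecipientIdentifier,
            SenderID      AS SenderIdentifier,
            IncidentID    AS SourceIncidentID,
            PolicyID      AS SourcePolicyID,
            PolicyName    AS SourcePolicyName,
            RuleID        AS SourceRuleID,
            RuleName      AS SourceRuleName,
            SenderAccount AS SourceAccountName,    -- associates the user
            SenderDomain  AS SourceNetBIOSDomain,
            SenderHost    AS SourceHostname        -- associates the computer endpoint
    FROM    Stg_DLPIncidents                       -- hypothetical staging table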

11. Endpoint Protection Events

  • The primary key columns for endpoint protection events are the Event Date and the Source Event ID from the external system.
  • IP addresses can be associated to endpoint protection events by providing a Destination IP Address and a Source IP Address.
  • Look up and associate users by providing a Destination Email Address and a Source Email Address. Alternatively, users can be created and associated to EP events by passing Destination Account Name & Destination NetBIOS Domain and Source Account Name & Source NetBIOS Domain.
  • Computer Endpoints can be associated by providing a Destination Host Name and a Source Host Name for the computer endpoint.
  • Security Risks can also be associated to endpoint protection events.

12. Web Activity Events

  • The primary key columns for loading Web Activities are the Activity Date, Source Activity ID, and the URL for the web activity.
  • A Destination IP Address and Source IP Address can be associated to a web activity.
  • Look up and associate users by providing a Source Email Address. Alternatively, users can be created and associated to Web Activities by passing Source Account Name & Source NetBIOS Domain.
  • Web Activities can be categorized by providing a Category Name for the web activity entity.
  • Severities can be associated to a Web Activity by providing a Severity Name.
  • The action taken for a web activity can be tracked by providing the Action Taken and the Disposition. After loading, the action-taken information is stored in the object LDW_WebActivityActionTaken. If you are providing new actions that are synonyms for blocked actions, you will be required to update the action to have IsBlocked = 1 when uploading the Web Activity, as in the sketch below.
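
A hedged sketch of that follow-up update. LDW_WebActivityActionTaken is named above, but the ActionTakenName column and the sample value are assumptions to verify against your schema:

    UPDATE LDW_WebActivityActionTaken
    SET    IsBlocked = 1                 -- mark the synonym as a blocked action
    WHERE  ActionTakenName = 'Dropped'   -- hypothetical synonym for a blocked action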

Analyzing the Data

When dealing with data imported from sources other than those covered by our out-of-the-box integration packs, the data may turn out to be unpredictable. Prior to importing data into ICA, and more specifically into the logical data warehouse (LDW) tables, you should take the time to analyze the data. There are two primary things to determine when analyzing the data:

  • Are there rows of data that I can filter further?
  • Are there any fields that need to be manipulated?

If you do find data that requires manipulation, you have two main options:

  • Use a formula
    Formulas are good for short, non-complex data manipulations. For example, formulas are good for making sure that string data types stay within the allotted number of characters for the destination columns. To do that, simply use the LEFT function, e.g., LEFT({sourceColumn}, 256).
  • Use a secondary staging table
    To use a secondary staging table for data manipulation, define a second data source with a data source type of SQL Server IW and the ICA database as the database. When defining the data source query, SELECT from the staging table into which the data was first imported, as sketched below.
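
A minimal sketch of such a secondary staging query defined against the ICA database itself; the first-pass staging table Stg_WebActivity_Raw and the cleanup logic are hypothetical:

    SELECT  ActivityDate,
            SourceActivityID,
            LEFT(URL, 2048)                  AS URL,                -- enforce destination column length
            UPPER(LTRIM(RTRIM(AccountName))) AS SourceAccountName   -- normalize account names
    FROM    Stg_WebActivity_Raw              -- hypothetical first-pass staging table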

Following the above recommendations will help to ensure you do not run into a "garbage in, garbage out" scenario with ICA.
