Data Loss Prevention

Feb 04, 2019 11:40 AM

Christopher Bradburn

Introduction

Symantec DLP provides a wealth of information in regards to scanning Data at Rest (DAR) and Data In Motion (DIM). However, accessing this information for reporting purposes can be difficult as Symantec does not include reference documentation for how the Oracle database stores and uses the data.

The following showcases a SQL query which was written with the intention of showcasing the pertinent information for a Discover Scan history. Obtaining the raw information allows it to be manipulated in a more efficient manner. I have also added in several different columns which aid in reporting duties.

The full query is at the bottom of the article. I will go through each part of the query and break down the function. An illustration of what the report looks like is below.

An Illustration of how the report looks when the query is ran.

The advantage of this is that this report could be used as a stored procedure and ran as a scheduled task which then is emailed to relevant stakeholders, enabling to see the state of their environment on a regular basis. It can also be used to keep track of large discovery projects.

The Select Query

The first part of the query deals with selecting and ordering the columns we want to show and aliasing them for UX. Illustrated as follows.

SELECT 
C.NAME Scanner_Name 
,D.STATE 
,A.STARTDATE Start_Date 
,A.INCIDENTCOUNT
,Time_Taken_hh_mm
,GBS_PER_HOUR
,A.NUMBERRETRIEVEDFILES Number_Of_Retrieved_Files
,A.LASTSTATECHANGEDATE Last_State_Change_Date
,TB_Scanned
,GB_Scanned
,A.ERRORCOUNT Error_Count

FROM 
  WALK A

The Calculations

Note in the above snippet, there are some columns that are aliased but don't provide the name of the column they have been pulled from. This is because the raw data is extremely raw. There is a certain amount of manipulation involved to make it readable to the average user. There are certain calculations carried out which improve the usability. These calculations are as follows:

Calculating the time a scan takes
The Gigabytes scanned per hour (data scanned divided by the amount of time taken)
The total amount of Gigabytes scanned
The total amount of Terabytes scanned (GB and TB are counted separately so usually pick one or the other, depending on estate size)

The time elapsed during a scan is counted in milliseconds. Therefore to get the time in a readable format, we convert them to hours and minutes. The data calculations are carried out by multiplying the amount of bytes scanned by the appropriate formula. Bringing the two together, we add in a calculation to divide the total Gigabytes scanned, by the amount of time that has passed in hours.

A side note is that this can sometimes produce the result 0, if there is a scan paused etc. Therefore we provide a command to turn the 0 into a null. This prevents the report from quitting unexpectedly as it tries to divide by 0.

Making the above changes and adding to our query, produces the following:

SELECT 
C.NAME Scanner_Name 
,D.STATE 
,A.STARTDATE Start_Date 
,A.INCIDENTCOUNT 

,TO_CHAR(TRUNC(A.ELAPSEDTIME/3600000),'FM9900') || ':' || TO_CHAR(TRUNC(MOD(A.ELAPSEDTIME/60000,60),2),'FM00') Time_Taken_hh_mm 

,TRUNC((TRUNC(A.BYTESSCANNED*1e-9,2) ) / NULLIF(A.ELAPSEDTIME, 0) * 3600000,2) GBS_PER_HOUR ,A.NUMBERRETRIEVEDFILES Number_Of_Retrieved_Files 
,A.LASTSTATECHANGEDATE Last_State_Change_Date 
,TRUNC(A.BYTESSCANNED*1e-12, 2) TB_Scanned 
,TRUNC(A.BYTESSCANNED*1e-9,2) GB_Scanned ,A.ERRORCOUNT Error_Count 

FROM WALK A

The Joining

The final step is to add in the additional tables that are required. The table that stores the majority of information we need is the WALK table. However, we need to pull in certain aspects from other tables. This can be seen in the query. The other two tables needed are; SCANASSIGNMENT and DI_DATAFRESH_TRK. Left outer joins provide us with the information we need.

The final query then becomes as seen below. An important note is that this can be customised to a high degree, depending on what is needed. For example, a WHERE clause can be added to specify date ranges.

The Conclusion

The reasoning behind this article is that there isn't a great deal of documentation (or indeed any I could find), that explains the layout of DLP components within Oracle. It is my hope that this prompts the forthcoming of more information on the subject that will enable more progress and utilisation of the product. Below is the result in excel for this query, followed by the full SQL query.

I hope this helps.

Thanks

Chris


SELECT
  C.NAME Scanner_Name
  ,D.STATE
  ,A.STARTDATE Start_Date
  ,A.INCIDENTCOUNT
    
  ,TO_CHAR(TRUNC(A.ELAPSEDTIME/3600000),'FM9900') || ':' ||
   TO_CHAR(TRUNC(MOD(A.ELAPSEDTIME/60000,60),2),'FM00') 
  Time_Taken_hh_mm

  ,TRUNC((TRUNC(A.BYTESSCANNED*1e-9,2) ) / NULLIF(A.ELAPSEDTIME, 0) * 3600000,2) GBS_PER_HOUR
  ,A.NUMBERRETRIEVEDFILES Number_Of_Retrieved_Files
  ,A.LASTSTATECHANGEDATE Last_State_Change_Date
  ,TRUNC(A.BYTESSCANNED*1e-12, 2) TB_Scanned
  ,TRUNC(A.BYTESSCANNED*1e-9,2) GB_Scanned
  ,A.ERRORCOUNT Error_Count
    
FROM 
  WALK A
LEFT OUTER JOIN SCANASSIGNMENT C ON C.COURSEID = A.COURSEID
LEFT OUTER JOIN DI_DATAREFRESH_TRK D ON D.STATEID = A.STATE
  
ORDER BY Start_Date DESC;

Creating a SQL Report for DLP Scan History.

Introduction

The Select Query

The Calculations

The Joining

The Conclusion

Tags and Keywords

Related Entries and Links