The Brief
This article builds on a query covered in one of my previous pieces. As usual, the full query is at the bottom if you want to skip right to it. If not, I’ll go over the context and walk through why I’m doing what I’m doing. If you're interested in seeing a SQL query to show the full DLP Scan History, click here.
This task arose from a requirement from our client to have their whole estate scanned. An illustration of the previous query dealing with this is shown below:
As consultants, we provide evaluations of the client estate and from this, estimate timescales, costs, scan speeds etc. Therefore, being able to proactively monitor the client environment and tackle issues before they arise is of paramount importance. Nobody wants to feel as if their time/resources are being under-utilised.
In our case, we had undertaken a project to use DLP to scan a large client estate (Petabytes and petabytes). The original query I created for this would be fine, if not for one problem. The client has multiple scanners in use simultaneously. Therefore, a further solution was needed.
This query is the result of that process. A wordy way of putting it would be; This query provides an average of the average speeds per hour, in GBs across an entire estate.
Here is what the query looks like when it has been ran:
It may not look like much. That being said, if you calculated the project timescales, resources needed and costs based on 40GB/hour, seeing this could set off a lot of alarm bells for you right now.
The Maths
The meat of this query revolves around calculations. The first calculation is to transform bytes into GBs, the second is to transform the milliseconds into hours and the final calculation is to divide the data transferred by the amount of time taken. This is all wrapped up nicely in a SUM function. As before, we're pulling from the WALK table.
The syntax is:
AVG(TRUNC((TRUNC(BYTESSCANNED*1e-9,2) ) / NULLIF(ELAPSEDTIME, 0) * 3600000,2)) AVG_AVG
As you can see, I’ve also had to implement NULLIF. The reasoning behind this is that when a scan is queued, even if not doing anything, DLP commits this to the database. Naturally, scans that haven’t done anything yet won’t have values against columns such as BYTESSCANNED or ELAPSEDTIME.
I'm also fairly certain your aliases will be better than mine.
Following that thought, if we’re doing calculations, we don’t want to break the universe by dividing by 0. Hence the error handling.
Last thing to mention on this point is the results are truncated to two decimal places. This is mainly to ensure the data is readable in practice by the recipients of the report.
The WHERE Clause
In this query, the where clause is almost as important as the calculation itself. As we’re dealing with an aggregate function, any rows which don’t add (pun absolutely intended) to the result in a meaningful way need to be discarded, else the results will be distorted.
You’ll see below that I remove any results which are 0 from the result set from the BYTESSCANNED, ELAPSED TIME, and result of the AVG_AVG columns.
Syntax:
WHERE BYTESSCANNED NOT LIKE '0'
AND ELAPSEDTIME NOT LIKE '0'
AND TRUNC((TRUNC(BYTESSCANNED*1e-9,2) ) / NULLIF(ELAPSEDTIME, 0) * 3600000,2) NOT LIKE '0'
The Conclusion
The full query is below as promised. What we’ve seen from the above is that the exploration of SQL in relation to the DLP product is consistently yielding more powerful tools which can be used to enhance the product. In this case it has led to a direct way of monitoring a project in practical terms.
I hope this article has helped.
Thanks
Chris
SELECT
AVG(TRUNC((TRUNC(BYTESSCANNED*1e-9,2) ) / NULLIF(ELAPSEDTIME, 0) * 3600000,2)) AVG_AVG
FROM WALK
WHERE BYTESSCANNED NOT LIKE '0'
AND ELAPSEDTIME NOT LIKE '0'
AND TRUNC((TRUNC(BYTESSCANNED*1e-9,2) ) / NULLIF(ELAPSEDTIME, 0) * 3600000,2) NOT LIKE '0';
Before You Go
If you would like to read my other articles on SQL-DLP you can find them here: