Data Loss Prevention

  • 1.  Incident queue backlogged - Alert 1814

    Posted May 04, 2012 07:40 AM

    I am getting alert 1814 "Incident queue backlogged" from Enforce (version 10.0.0.17013).

    The alerts are generated at roughly the same time of the day, but are not generated every day.  The last 4 alerts are:

    May    4, 2012 3:56:26 AM SAST There are 1119 incidents in this server's queue.
    May    2, 2012 3:29:19 AM SAST There are 1040 incidents in this server's queue.
    April 30, 2012 3:31:35 AM SAST There are 1092 incidents in this server's queue.
    April 27, 2012 3:36:26 AM SAST There are 1014 incidents in this server's queue.
     

    I have checked the /var/Vontu/incidents directory on the Enforce server and all detection servers and found no files, i.e. the directory is empty.

    I found nothing noteworthy in the IncidentPersister logs.

    Incidents are being inserted into the Oracle database, because I can see new incidents for today, in the Enforce web interface.

    Searching for "1814 incident queue backlog" (all words, not phrase) on https://kb-vontu.altiris.com returns 1 hit, for the DLP 11.1 Admin Guide.  Within the admin guide there is a single hit for "1814":  It simply lists the message code and description but does not provide any useful information on the cause of the problem or resolution.

    Has anyone encountered the same issue?  If so, what was the cause and the resolution?  Is this alert something to be concerned about?

    Thanks
    Stephen



  • 2.  RE: Incident queue backlogged - Alert 1814

    Broadcom Employee
    Posted May 04, 2012 10:49 AM

    Have you tried restarting the Oracle DB service?



  • 3.  RE: Incident queue backlogged - Alert 1814

    Posted May 04, 2012 04:57 PM

    No, I have not tried restarting the DB service.  Restarting the DB service has to follow my company's change control process and I need approvals for this.  Restarting the DB service might clear the problem *now*, but I consider it a short-term fix as it doesn't help me understand the origin of the problem and how to avoid it in future.

    Restarting the DB is my last resort, if this forum isn't able to help me.



  • 4.  RE: Incident queue backlogged - Alert 1814

    Posted May 08, 2012 10:08 AM

    Stephen,

    I think Yang's suggestion above is one of the best ways to check whether this is just some sort of hiccup in the service.

    The other approach is to identify the incidents that are being created around the time of the backlog. The issue may simply be a spike in traffic that in turn creates a large batch of incidents. The fact that the alerts all arrive in the early morning points further in this direction. Some steps I would try:

    1. Find out if there are any batch processes that run overnight (more specifically around 3-4AM) that may contain sensitive information (e.g. an FTP upload/download).
    2. Look into the incidents that are being created around the time of the messages. This should help identify the quantity of incidents and what they are related to.
    3. If your Oracle is hosted on your own Oracle farm, inquire about any other processes that may have priority over your DLP DB. Compute resources could limit this.
    4. If you are running any portions of the DLP product on Virtual systems, inquire if there are any resource pools that could be limiting resource availability to the Enforce system specifically.
    5. Ensure no other applications are running on the DLP servers that may be competing for resources.
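
    For step 2, one rough way to see whether incidents cluster around 3-4AM is to bucket their creation times by hour. The sketch below assumes you have already exported the incident timestamps from an Enforce incident report as strings of the form YYYY-MM-DD HH:MM:SS. That format, the function names and the sample values are assumptions for illustration only, so adjust them to whatever your export actually contains.

```python
from collections import Counter
from datetime import datetime


def incidents_per_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count incidents per (date, hour) bucket from timestamp strings."""
    counts = Counter()
    for text in timestamps:
        moment = datetime.strptime(text, fmt)
        counts[(moment.date().isoformat(), moment.hour)] += 1
    return counts


def busiest(counts, top=3):
    """Return the `top` busiest (date, hour) buckets, largest first."""
    return counts.most_common(top)


if __name__ == "__main__":
    # Made-up sample values, only to show the call; replace with your export.
    sample = [
        "2012-05-04 03:10:00",
        "2012-05-04 03:20:00",
        "2012-05-04 03:45:00",
        "2012-05-04 14:05:00",
    ]
    for (day, hour), count in busiest(incidents_per_hour(sample)):
        print(day, f"{hour:02d}:00", count)
```

    If one early-morning hour dominates on the alert days, that supports the scheduled-job theory rather than a fault in the servers.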

    I think these are some good starting points to try and identify the core of the issue. Let us know what you find out.



  • 5.  RE: Incident queue backlogged - Alert 1814

    Posted May 10, 2012 02:08 PM

    Stephen,

    Shawn's suggestion to look for things that touch DLP and are scheduled to run around 3-4AM is good advice given the consistent time of day and the budding pattern of days (M/W/F)... it definitely smells like a scheduled job.
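
    As a quick sanity check of that pattern, here is a minimal sketch that parses the four alert timestamps quoted in the first post and reports the weekday and hour of each. The function names are my own, and the SAST label is simply dropped because only the local day and hour matter here.

```python
from datetime import datetime

# Alert timestamps copied from the first post (spacing normalised, SAST dropped).
ALERTS = [
    "May 4, 2012 3:56:26 AM",
    "May 2, 2012 3:29:19 AM",
    "April 30, 2012 3:31:35 AM",
    "April 27, 2012 3:36:26 AM",
]

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")


def parse_alert(text):
    """Parse an alert timestamp such as 'May 4, 2012 3:56:26 AM'."""
    return datetime.strptime(text, "%B %d, %Y %I:%M:%S %p")


def summarize(alerts):
    """Return (weekday name, hour) for each alert, oldest first."""
    parsed = sorted(parse_alert(a) for a in alerts)
    return [(DAYS[d.weekday()], d.hour) for d in parsed]


if __name__ == "__main__":
    for weekday, hour in summarize(ALERTS):
        print(weekday, hour)
```

    For these four alerts the result is Friday, Monday, Wednesday, Friday, all in the 3AM hour.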

    If nothing turns up on that front, it would be good to narrow down whether the issue is happening on the detection server, Enforce or Oracle.

    First, what server is associated with the error you are seeing? If it's being generated by the detection server you should focus your attention there:

    1. Check the /var/Vontu/incidents folder on the local detection server for .idc or .bad files.
    2. Look in /var/log/Vontu/debug/BoxMonitorX.log (whichever one covers the time period you got the alert) and look for the IP/hostname of your Enforce server to see if there have been problems connecting or if it's connecting frequently (the connection should be fairly persistent, and constant reconnects would indicate it's getting dropped often).
    3. If you see any connection issues, shift your attention to MonitorController on Enforce - it receives the connections from the detection servers and will give you better information.
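
    Step 1 is easy to script if you want to repeat it on several servers or run it right after an alert. This is only a sketch: the helper name is made up, and it checks nothing more than the file extensions mentioned above.

```python
from pathlib import Path


def find_stuck_incident_files(incident_dir, suffixes=(".idc", ".bad")):
    """Return sorted paths in incident_dir whose extension is in suffixes."""
    root = Path(incident_dir)
    if not root.is_dir():
        # Missing or unreadable directory: nothing to report.
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    # Directory taken from the posts above; change it if your install differs.
    for path in find_stuck_incident_files("/var/Vontu/incidents"):
        print(path.stat().st_size, path)
```

    An empty result matches what Stephen already reported; a growing list of files during the 3-4AM window would point at the detection server or its connection to Enforce.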

    If the errors are coming from Enforce or if there are no apparent issues on the detection server, I'd actually focus your attention on Oracle - when MonitorController (managing connection to the detection server) or Incident Persister (parsing incidents and inserting into the DB) are having an issue, they usually let you know and it sounds like you've done the initial research on them.

    For Oracle the best thing you could do in this case (short of filing change control paperwork to get it restarted) would be to turn on JDBC logging until the issue happens again:

    1. Log in to the Enforce console and then manually replace the 'Dashboard.do' in the URL with 'SystemDiagnostics.do', so your URL will look like: https://<my Enforce hostname>/ProtectManager/SystemDiagnostics.do
    2. Click 'Toggle JDBC Logging' so that it shows JDBC Logging: Enabled
    3. Once the issue happens again, go back to this page and toggle it back off
    4. Collect logs from Enforce (System > Logs) and open Enforce/logs/debug/manager_jdbc_0.log
    5. Look for long query times. Most queries should be <10ms, but anything with 4 or more digits (1000ms and up) warrants a second look. If you have a text editor capable of doing RegEx searches (EditPad Lite/Pro or TextPad++ are great) you can search for the following string to find all queries 1000ms or larger: [0-9]{4,99}ms
    6. If you see any long queries, especially where it seems to be touching Incident tables, the issue is almost certainly a delay from Oracle.
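
    If you'd rather not do the search in step 5 by hand, the same regular expression can be applied with a short script. This is a sketch under assumptions: I'm not relying on any particular layout of manager_jdbc_0.log beyond timings appearing as digits followed by "ms", and the function name is made up.

```python
import re
import sys

# Same pattern as in step 5: a run of 4 or more digits followed by "ms".
SLOW_QUERY = re.compile(r"([0-9]{4,99})ms")


def slow_lines(lines, pattern=SLOW_QUERY):
    """Yield (line_number, milliseconds, line) for lines with a slow timing."""
    for number, line in enumerate(lines, start=1):
        durations = [int(m) for m in pattern.findall(line)]
        if durations:
            # Report the largest timing found on the line.
            yield number, max(durations), line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_jdbc.py Enforce/logs/debug/manager_jdbc_0.log
    with open(sys.argv[1], errors="replace") as log:
        for number, ms, line in slow_lines(log):
            print(f"{number}: {ms}ms  {line[:120]}")
```

    Sorting the output by the millisecond value makes it easier to see whether the slow statements cluster around the time of the alert.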

    If there isn't anything abnormally long in the JDBC log, and your /var/Vontu/incidents folders are empty on both Enforce and detection server(s), then you are likely getting a spike in incidents that - unless they are false positives - just means your system is doing what it's supposed to and working through the backlog without an issue.

    Hope that helps - let us know if you find anything unusual!

    - Tim