Data Loss Prevention

  • 1.  DLP 15.5 OCR Setup Issues

    Posted Oct 14, 2020 04:39 PM
    Hi,
    I've exhausted both the documentation and Broadcom's support on this issue, and I'm hoping someone here can help. There is very limited documentation on setting up optical character recognition (OCR) servers, as generally seems to be the case for newer capabilities in DLP.

    I have a three-tier DLP setup. We recently purchased the licensing for OCR, and I installed 8 OCR servers (all on Windows Server 2016). I followed the instructions to install the OCR software and then add the OCR engine configuration. The ultimate intent is to put a load balancer in front of these OCR servers, but I'm first trying to get a direct connection working. I've added OCR engine configurations for both the load balancer and each individual OCR server.

    My problem is that when I add an OCR config to the OCR tab of a detection server, I can't get any OCR detections. Once I try using the OCR server, I get overwhelmed by 4800 (OCR Service is busy) and 4803 (OCR request was not successful) events. At first I thought that it might be a TLS connection issue, but it looks like I have good keystores on both the detection server and the OCR server with matching certs. I went into SymantecDLPOCRServer.conf and SymantecDLPOCRServer.conf to increase the Java heap sizes, but the problem is still going on.

    Does anyone have any idea how I can get this working? I have these errors even during off-peak hours when traffic is low. When I used the sizing checksheet for how many OCR servers I'd need, I decided that 7 OCR servers would be enough, but we built 8.


  • 2.  RE: DLP 15.5 OCR Setup Issues

    Broadcom Employee
    Posted Oct 21, 2020 03:51 PM
    Hi,

    If you have added an OCR engine configuration for the load balancer in Enforce, configured your detection server to use that load balancer OCR configuration, and are still seeing Event 4800 (OCR Service is busy), double-check the configuration on the load balancer itself to make sure persistence is disabled. The persistence setting is local to your load balancer. With persistence turned on, the load balancer tries to route traffic to the same OCR server when multiple requests arrive within the configured persistence window. In effect, this can still overwhelm a single OCR server when there is a steady flow of OCR requests. With persistence disabled, you should see more of a round-robin distribution of load across all the OCR servers behind it. Hopefully this helps reduce or eliminate the OCR Service is busy events.
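
    To illustrate the difference, here is a minimal Python simulation, not anything taken from DLP or from a particular load balancer. The server names (ocr1 to ocr8), the client name, and the request count are hypothetical, and the persistence timeout is ignored for simplicity: a pinned client simply stays pinned.

```python
from collections import Counter
from itertools import cycle


def route_round_robin(client_ids, servers):
    """Send each request to the next server in turn, ignoring who sent it."""
    next_server = cycle(servers)
    return Counter(next(next_server) for _ in client_ids)


def route_with_persistence(client_ids, servers):
    """Pin each client to the first server it was assigned (sticky sessions)."""
    next_server = cycle(servers)
    pinned = {}
    load = Counter()
    for client in client_ids:
        if client not in pinned:
            pinned[client] = next(next_server)
        load[pinned[client]] += 1
    return load


# Hypothetical setup: 8 OCR servers, one detection server sending 80 requests.
servers = [f"ocr{i}" for i in range(1, 9)]
requests = ["detection-server-1"] * 80

round_robin_load = route_round_robin(requests, servers)
sticky_load = route_with_persistence(requests, servers)

print("round robin:", dict(round_robin_load))
print("persistence:", dict(sticky_load))
```

    In this sketch round robin spreads the 80 requests evenly over the eight servers, while persistence sends all of them to a single server because they come from the same client.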

    As a sanity check, confirm in the detection server configuration within the Enforce console that you have selected the load balancer OCR configuration rather than an individual OCR server, since you mention that both configurations are present in the console.


  • 3.  RE: DLP 15.5 OCR Setup Issues

    Posted Oct 23, 2020 12:15 PM
    Hi Eric,
    Thanks for the response. I checked with our load balancer admin and confirmed that we're not using any session persistence; everything is set for round robin. I also appreciate the tip to confirm that I'm using the right OCR engine configuration. I've certainly made that sort of mistake before :)

    At this point, I think I'm just under-resourced for OCR. I used the planning spreadsheet to calculate how many servers I'd need, but the result doesn't seem to be anywhere near enough. One odd thing I've noticed: when I had my email and web prevent servers in the OCR statistics-gathering mode, the web servers didn't even have a high enough percentage of PUT and POST requests containing images to enter into the spreadsheet; they were all below 1%. Despite that, when I tried configuring a single web prevent server with a single OCR server, even that led to 4800 errors. So I think I'm at the point of needing more OCR servers.
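
    For a rough sense of why a low image percentage can still keep an OCR server busy, here is a back-of-the-envelope sketch based on Little's law (average jobs in flight = arrival rate x time per job). It is not the formula from the planning spreadsheet, which isn't reproduced here, and every input value below is a made-up placeholder rather than a measurement from this environment.

```python
def busy_ocr_jobs(requests_per_sec, image_fraction, ocr_seconds_per_image):
    """Average number of OCR jobs in flight, by Little's law.

    requests_per_sec      -- requests reaching the detection server (hypothetical)
    image_fraction        -- share of those requests that carry an image (0..1)
    ocr_seconds_per_image -- average time one OCR extraction takes (hypothetical)
    """
    image_rate = requests_per_sec * image_fraction
    return image_rate * ocr_seconds_per_image


# Placeholder inputs: 200 requests/s, 1% with images, 2 s per OCR extraction.
in_flight = busy_ocr_jobs(200, 0.01, 2.0)
print(f"average OCR jobs in flight: {in_flight:.1f}")
```

    This only gives an average. Bursty traffic or a few very slow images would push the peak well above it, which may be one reason a single OCR server can report busy even when the image percentage looks small.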


  • 4.  RE: DLP 15.5 OCR Setup Issues

    Broadcom Employee
    Posted Oct 26, 2020 01:32 PM
    Just curious, what detection server types are you using with OCR? I've worked with customers who used OCR with Network Discover scans and happened to have directories full of image files that kept their OCR servers busy for bursts of time. As an example, think of system backups containing Windows backgrounds, system images, browser cache files, and so on, which may not be excluded from a backup going to a network destination that is configured for a Network Discover scan. For Network Discover, you could consider adding filters to exclude such files or directories (where you trust them to be safe) to reduce OCR traffic. Perhaps there is a pattern in your environment that contributes to a high OCR load and wouldn't otherwise be part of the normal traffic.
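
    As a way to try out candidate exclusions before putting them into a scan target, here is a small Python sketch that checks paths against wildcard patterns. The patterns and paths are hypothetical examples, and the matching uses Python's fnmatch module, which is only an approximation and may not behave the same as the filter syntax in Network Discover.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns: backup and cache directories, icon/bitmap files.
EXCLUDE_PATTERNS = ["*/backups/*", "*/cache/*", "*.bmp", "*.ico"]


def should_scan(path: str) -> bool:
    """Return False when the path matches any exclusion pattern.

    Paths are lowercased and backslashes are converted to forward slashes so
    that Windows-style and UNC-style paths are compared the same way.
    """
    normalized = path.replace("\\", "/").lower()
    return not any(fnmatchcase(normalized, pattern) for pattern in EXCLUDE_PATTERNS)


# Hypothetical sample paths.
samples = [
    "//fs01/share/backups/pc42/wallpaper.jpg",
    "//fs01/share/hr/scan_0001.png",
    "C:\\Users\\a\\Cache\\thumb.png",
]
for sample in samples:
    print(sample, "->", "scan" if should_scan(sample) else "skip")
```

    Running a list of real paths from a noisy share through something like this can show how much image traffic a given exclusion would remove before the OCR servers ever see it.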