NFA Harvester isn't keeping up.
We recently migrated from Reporter/Analyzer to a 2-tier configuration of NFA 9.3.0. The servers are VMs under ESXi 5.5, where the physical hardware is consistent with CA's requirements (dual quad-core Xeons, onboard 12-spindle RAID, significant memory). We have about 275 interfaces and 250 devices.
Our Harvester is configured as an 8x with 8 GB RAM, running on Win2008R2. Task Manager shows 5% CPU and 35% memory usage.
We see the Reaper getting behind by up to 2 hours. ReaperWork shows a steady coming and going of files, but ReaperInput is very slow to empty. The system doesn't appear to be taxed, but data is not getting processed quickly enough.
Is there some performance tweaking we can do to speed up Harvester processing - maybe more java processes or increased cache buffer sizes?
How large is a typical *.TBN file in the ReaperInput directory? The old rule of thumb from the ReporterAnalyzer 9.0 days was that you can estimate the number of flows per minute that your Harvester is processing by taking the size of an average *.TBN file in bytes and dividing it by 64.
Here's the old ReporterAnalyzer 9.0 Tech Doc on that:
One other thing that I should point out is that we recommend 12GB RAM for a Harvester:
Windows Servers - CA Network Flow Analysis - 9.3.1 - CA Wiki
Thanks Kahlil. Typical file size of a .TBN is about 175,000 KB. So (175500 * 1024) / 64 -> 2,808,000? What does this tell me?
2.8 million flows per minute is a little high, but a Harvester that meets our suggested requirements should usually be able to handle that.
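For anyone who wants to repeat the estimate on their own Harvester, here is a minimal sketch of the rule-of-thumb arithmetic from this thread (average *.TBN size in bytes divided by 64). The function names are hypothetical, and the directory you pass in is an assumption: point it at wherever ReaperInput lives on your server.

```python
from pathlib import Path

# Divisor from the old ReporterAnalyzer 9.0 rule of thumb quoted in this thread.
BYTES_PER_FLOW = 64


def flows_per_minute(file_size_bytes: float) -> float:
    """Estimate flows per minute from the size of one *.TBN file in bytes."""
    return file_size_bytes / BYTES_PER_FLOW


def average_tbn_size(directory: str) -> float:
    """Average size in bytes of the *.TBN files directly inside `directory`."""
    sizes = [
        p.stat().st_size
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".tbn"
    ]
    if not sizes:
        raise ValueError(f"no .TBN files found in {directory}")
    return sum(sizes) / len(sizes)


if __name__ == "__main__":
    # Worked example with the figure from this thread: 175,500 KB per file.
    size_bytes = 175_500 * 1024
    print(f"{flows_per_minute(size_bytes):,.0f} flows per minute")
    # prints "2,808,000 flows per minute"
```

To use it against live data, call flows_per_minute(average_tbn_size(<your ReaperInput path>)). This is only an estimate, so treat the result as a rough indicator rather than a measured flow rate.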
Make sure that the ReaperInput folder is excluded from any real-time AV scans or similar 3rd-party utilities that might interfere with the processing of the files. Also, if you check the "harvesterwork" and "reaperwork" directories, do you see any old leftover files with timestamps older than the last time you started the Harvester and Reaper services? A large number of files in the *work directories can sometimes slow down processing.
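The leftover-file check described above can be sketched roughly as follows. This assumes you know when the services were last started (as an epoch timestamp) and where the work directories are. The function name is hypothetical and nothing here is part of NFA itself.

```python
from pathlib import Path


def leftover_files(work_dir: str, service_start_epoch: float) -> list:
    """Return files in `work_dir` last modified before the services started.

    `service_start_epoch` is the service start time in seconds since the
    epoch; you have to supply it yourself (for example from the Windows
    event log), since this sketch does not query the services.
    """
    return sorted(
        p
        for p in Path(work_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < service_start_epoch
    )
```

Run it once for each of the harvesterwork and reaperwork directories. An empty list means there is nothing older than the last service start.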
When using a VM on the ESX server, you want to make sure all resources are dedicated to the VM that NFA is installed on and not shared with other VMs.
Often a huge bottleneck is the disk I/O on VMs, since the disk is often shared with other VMs, and this can impact Reaper performance.
Also check to see what 3rd-party applications are installed on the server, such as backup tools, antivirus, etc.
Symantec Endpoint, for example, is known to cause performance issues with NFA Harvesters.
We considered the VMware disk latency, and we shut down or moved off any non-essential VMs, but we're unable to dedicate the entire machine to this single VM. The vSphere performance graph shows disk Highest Latency averaging about 10, so this could well be a bottleneck.
I have added D:\NetQos as an exception to any Symantec scanning and we’ll see if that has an impact.
Is Symantec Endpoint installed on the server, or is it some other Symantec product?
We have found that with Symantec Endpoint you can't just exclude directories or even shut down services; it actually has to be uninstalled for it to help performance.
With other antivirus programs, you can just exclude the D:\Netqos or \CA\NFA directory and the C:\Windows\temp directory, and you should be okay.
Yes, SEP 12.1.5.
When we ran 3-tier under Win2003 on the CA appliances, we DID have SEP installed on all servers.
There are no leftover files – but I will make sure ReaperInput gets excluded from scans. Thx
Process Monitor can be used to verify that no 3rd-party applications are accessing these folders. Process Monitor is one of the tools in Microsoft's SysInternals suite. Just add a filter: Path, is, <path to directory>, Include. Specific directories need to be added, so for example D:\CA\NFA will usually only pick up accesses to that specific directory and not its subdirectories. In the log, double-click on an entry for more details about the application doing the file access. In one case the application was cmd.exe, and double-clicking on it showed that it was really a script calling a security application.
About Symantec Endpoint Protection: it does not show exclusions. The only way to make sure something is excluded is to have the security team add the exclusion again, because having the list of exclusions recorded is seen as a security risk. Also, even when a directory is excluded, Endpoint Protection will still include it in its initial scan of the drive to identify what directories and files are on the volume.
We have migrated this harvester VM to an enterprise platform (Cisco UCS, Dell PowerVault SAN), allocated 8 CPU and 12 GB memory, and have removed Symantec anti-virus. We also verified (using Process Monitor) that no processes other than the NetQos ones are accessing the datafiles directory.
However, we still see the condition where files in the ReaperInput directory are arriving at a rate faster than they can be processed.
CPU utilization is below 50%, and 3.6 of the 12 GB memory is utilized. According to the flow statistics page, Harvester max flow rate is 2.5 million, and average flow rate is 1.6 million.
Is there some way to allocate more buffer space or more java processes to allow the system to process data more quickly? Or is the only solution to configure additional harvesters?
How many interfaces are sending Netflow to this Harvester?
As mentioned earlier in this thread, the Reaper is disk I/O intensive, and slow drives are often one of the main causes of the Reaper service falling behind. VMs often share the physical disk with other VMs, resulting in slower processing of files. Dedicated disk resources should help.
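To put a number on how far behind the Reaper is before and after a disk change, a small sketch like the one below could be run periodically against ReaperInput. It assumes the age of the oldest waiting file is a fair proxy for the processing lag. The function name and the returned keys are hypothetical, and it only reads file metadata.

```python
import time
from pathlib import Path


def backlog_snapshot(input_dir: str, now: float = None) -> dict:
    """Count files waiting in `input_dir` and report the oldest file's age.

    `now` is seconds since the epoch and defaults to the current time.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(input_dir).iterdir() if p.is_file()]
    if not mtimes:
        return {"file_count": 0, "oldest_age_minutes": 0.0}
    return {
        "file_count": len(mtimes),
        "oldest_age_minutes": (now - min(mtimes)) / 60,
    }
```

If file_count keeps growing between snapshots, files are arriving faster than they are processed. An oldest_age_minutes near 120 would match the two-hour lag described at the start of the thread.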