How does the Spectrum Trap Director mechanism work, and why should it not be used in very high trap rate environments?

The Trap Director checks each incoming trap against its internal cache to see whether the agent IP in the trap is already known. If the IP is unknown, it asks each server in the DSS whether the IP is modeled. Any entries found are added to the cache (if nothing is found, that result is cached too). If the IP is found on a remote server, the trap is forwarded to that server: it is converted to an internal alert object and placed on the alert forwarding queue. This queue is served in order, one item at a time.

Be cautious about using the Trap Director with very high trap rates (as a rule of thumb, a normal rate is 1-10 traps/second per SpectroSERVER). The system forwards a single item at a time and also waits for the remote server to process it. If the remote server is overloaded, this can mean a significant delay, with alerts accumulating in the queue.

When the remote server receives such an alert, it finds the corresponding model and creates an event on it, based on the alert mappings. That event can in turn generate an alarm, if the event actions call for it. You may, for example, see ‘traps from unmanaged SNMP device’ (0x10802) events on the receiving server (on the VNM model).

If you ever see that the alert_remote_fwd_queue_length attribute stays non-zero for an extended period and keeps growing, the Trap Director cannot handle the current incoming trap rate.

To summarize the flow:
Forwarding: Trap -> alert -> alert forwarding -> event -> alarm
Local: Trap -> alert -> event -> alarm

(For usage information on the Trap Director, please refer to the Distributed SpectroSERVER Administrator Guide.)
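To make the "one item at a time" limitation concrete, here is a minimal back-of-the-envelope sketch of a single serial forwarding worker. It assumes a hypothetical fixed 50 ms round trip per alert to the remote SpectroSERVER (not a measured Spectrum figure) and models the queue in one-second steps. It reflects the serial behavior described above; later posts in this thread note that newer releases forward more concurrently.

```python
def simulate_backlog(trap_rate_per_s: int, round_trip_ms: int, seconds: int) -> list:
    """Return the forwarding queue length at the end of each simulated second.

    Assumption: one serial worker, and each forwarded alert blocks it for
    round_trip_ms while the remote server processes the alert.
    """
    # Items a single serial worker can forward per second.
    capacity_per_s = 1000 / round_trip_ms
    queue = 0.0
    history = []
    for _ in range(seconds):
        queue += trap_rate_per_s             # alerts enqueued during this second
        queue -= min(queue, capacity_per_s)  # worker drains at most its capacity
        history.append(queue)
    return history


if __name__ == "__main__":
    print(simulate_backlog(5, 50, 3))   # within the 1-10 traps/s rule of thumb
    print(simulate_backlog(60, 50, 3))  # sustained high trap rate
```

With a 50 ms round trip the worker drains at most 20 alerts per second, so 5 traps/second leaves the queue empty, while 60 traps/second adds 40 alerts to the backlog every second. A slower (overloaded) remote server lowers that capacity further, which is exactly the growing queue length described above.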
Just to add more detail:
Trap Director is a feature you can enable on a landscape that allows it to forward traps to other landscapes. This is useful because, in a large environment, you can configure just two IPs (a primary and a secondary trap receiver) on all your devices. You don't have to manually split your estate into regions and have, e.g., the first 33% of all configured devices send traps to the first landscape's IP, the second 33% to the second landscape's IP, and so forth. All devices are configured with the same two IPs, and when the Trap Director receives their traps, it forwards them to the landscapes on which those devices are modeled. You can also move devices across landscapes and Spectrum will handle this for you. Without a Trap Director, each time you moved a device you would have to reconfigure it to send traps to the relevant landscape.
Also, the communication flow is as follows:
1. Trap is received by Trap Receiver (Trap Director)
2. Trap is converted to an event (internal to Spectrum)
3. Is the IP the trap is coming from (the agent IP) in the Trap Director's cache? If yes, the new event is sent to the relevant landscape.
4. If the IP the trap is coming from is not in the local landscape's cache, the Trap Director asks the MLS for it.
a. If the MLS knows where the IP is, this info is handed over to the Trap Director and then the event is forwarded to the relevant landscape.
b. If the MLS does not have it, it asks all landscapes whether anyone knows of a device with an interface IP of x.x.x.x. Once it obtains this information, it adds it to its local cache (the default age-out timer is 3 hours).
5. If the MLS has a response, the event is forwarded to the relevant landscape.
6. If the MLS does not get any answer, the Trap Director places the events on its VNM model. You will see traps coming from unknown devices (i.e. devices that are not modeled) on the Trap Director.
a. For any IP that Spectrum knows about, the trap information will appear under the device with that IP.
b. Always make sure the traps are coming from the correct IP. If a device has multiple interfaces, you may expect the trap to come from the first IP when in fact the routing setup sends it out of the second. Usually the device configuration allows you to specify the trap-source interface or IP address.
c. If you model a device using SNMP, Spectrum is able to discover the other IPs on that device, so when it gets a trap from any of those IPs it knows the trap belongs to that device. If you model it using ICMP, Spectrum only knows about the IP address you are pinging.
d. If you want to be able to differentiate traps, use different trap community strings when sending them. For example, you could have different community strings for traps coming from voice, data, wireless, etc.
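The lookup order in the steps above can be sketched as follows. This is a hypothetical model for illustration only (the class, method and landscape names are invented, and cache age-out is not modeled); it is not Spectrum's implementation. It follows the numbered steps: local cache first, then the MLS, then the MLS asking every landscape, with unknown IPs ending up on the VNM model.

```python
from typing import Dict, Optional, Set, Tuple


class TrapDirectorSketch:
    """Hypothetical model of the Trap Director lookup order."""

    def __init__(self, landscapes: Dict[str, Set[str]]):
        # Landscape name -> IPs modeled there. An SNMP-modeled device
        # contributes every interface IP; an ICMP-modeled one only the pinged IP.
        self.landscapes = landscapes
        self.mls: Dict[str, str] = {}              # what the MLS already knows
        self.cache: Dict[str, Optional[str]] = {}  # Trap Director's local cache

    def route(self, agent_ip: str) -> Tuple[str, str]:
        """Return (destination, step that answered) for a trap's agent IP."""
        if agent_ip in self.cache:
            target, step = self.cache[agent_ip], "cache"
        elif agent_ip in self.mls:
            target, step = self.mls[agent_ip], "mls"
        else:
            # The MLS asks all landscapes whether any device has this IP.
            target = next((name for name, ips in self.landscapes.items()
                           if agent_ip in ips), None)
            if target is not None:
                self.mls[agent_ip] = target
            step = "all_landscapes"
        self.cache[agent_ip] = target  # "not found" is cached as well
        # Traps from unknown (unmodeled) devices end up on the VNM model.
        return (target if target is not None else "VNM", step)


if __name__ == "__main__":
    td = TrapDirectorSketch({"LS-A": {"10.1.1.1", "10.1.1.2"}, "LS-B": {"10.2.2.2"}})
    print(td.route("10.2.2.2"))  # first trap: full lookup
    print(td.route("10.2.2.2"))  # next trap: answered from the cache
    print(td.route("10.9.9.9"))  # unmodeled device: VNM
```

The expensive path is the first trap from a new IP, which is why cache age-out matters for how much cross-landscape communication happens.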
We have a high volume of traps, and the Trap Director could not keep up (we had peaks of about 100 traps/second and a sustained 60-80 traps/second).
If you need to manipulate traps, I would suggest using TrapEXPLODER, or if you want to 'replicate' traps to other hosts, you can use a tool called samplicate.
Regarding the MLS: whenever a landscape needs to know where devices are, it ALWAYS queries its local cache first, and then the MLS. The MLS is like Active Directory for Spectrum: it is responsible for knowing where devices/IPs are and which landscapes they are on.
You can also set the Trap Director cache age-out to a value other than the default 180 minutes; this is configurable.
As mentioned by nicja04, you can run into performance problems when you receive too many traps. If you find that alarms are being delayed, check the 'Alert forwarding queue length' (see the links below).
Sorry, some of this detail overlaps with nicja04's, but I already had it saved.
Tech Tip: Debugging trap processing in a Spectrum Distributed SpectroSERVER environment with Trap Director enabled
Spec KB: CA Spectrum Trap Director "Alert forwarding queue length" is increasing in the "Trap Management" subview of OneClick
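The warning sign described earlier (alert_remote_fwd_queue_length non-zero for a longer time and growing) can be turned into a simple check over periodic readings. This is a minimal sketch: how the readings are collected is outside its scope, and the function name, the five-sample window and the "non-decreasing and ending higher" rule are all assumptions chosen for illustration.

```python
from typing import Sequence


def queue_is_falling_behind(samples: Sequence[int], min_samples: int = 5) -> bool:
    """Flag a sustained, growing alert forwarding queue.

    samples: periodic readings of alert_remote_fwd_queue_length, oldest first.
    Assumed rule: over the last min_samples readings the queue never emptied,
    never shrank, and ended higher than it started.
    """
    window = list(samples[-min_samples:])
    if len(window) < min_samples:
        return False  # not enough history yet
    if any(v == 0 for v in window):
        return False  # the queue emptied at some point, so forwarding keeps up
    non_decreasing = all(b >= a for a, b in zip(window, window[1:]))
    return non_decreasing and window[-1] > window[0]


if __name__ == "__main__":
    print(queue_is_falling_behind([0, 0, 3, 0, 1, 0]))    # short bursts only
    print(queue_is_falling_behind([10, 20, 35, 50, 80]))  # steadily growing
```

Requiring the whole window to stay non-zero avoids alerting on short bursts that the forwarding queue drains on its own.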
Starting with CA Spectrum R9.4.1, the trap-forwarding implementation was restructured to allow more concurrent processing. In our tests, 200 traps/sec and higher (constant workload) were successfully handled in a DSS (distributed Spectrum) installation. It IS important that the MLS is not heavily loaded (which we recommend for a larger DSS in any case, i.e. 6+ landscapes). As you explained, there is a lot of cross-communication to find the device model for a new IP address. Once the model is found, the landscape processing the initial inbound trap knows where to send the alert/event information directly. This is the best processing case.
Therefore, if your network devices send traps continuously, the default trap_cache_age_out_minutes of 180 minutes is fine.
Keep in mind, however, that many devices do not send a trap at least every 180 minutes, so their cache entries always age out and a re-sync has to be done.
We saw good results from tuning the VNM model attribute 0x12ad5 (trap_cache_age_out_minutes) to 1440 (1 day). That significantly lowers the cross-communication (and won't affect the high-frequency devices).
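The effect of raising trap_cache_age_out_minutes can be estimated with a rough count of cache misses (full cross-landscape lookups) per device. This sketch assumes a device that traps at a fixed interval and an age-out measured from the last trap seen for that IP, which is how the posts above describe it; both are simplifications, and the function name is hypothetical.

```python
def count_lookups(trap_interval_min: int, age_out_min: int, horizon_min: int) -> int:
    """Count cache misses for one device over horizon_min minutes.

    Assumptions: the device sends one trap every trap_interval_min minutes,
    and its cache entry ages out age_out_min minutes after the last trap.
    """
    lookups = 0
    last_seen = None
    for t in range(0, horizon_min, trap_interval_min):
        if last_seen is None or t - last_seen >= age_out_min:
            lookups += 1  # miss: local cache -> MLS -> all landscapes
        last_seen = t
    return lookups


if __name__ == "__main__":
    day = 1440
    print(count_lookups(240, 180, day))   # trap every 4 h, default age-out
    print(count_lookups(240, 1440, day))  # trap every 4 h, 1-day age-out
    print(count_lookups(1, 180, day))     # high-frequency device, default age-out
```

Under these assumptions, a device trapping every 4 hours causes 6 full lookups per day with the 180-minute default but only 1 with 1440 minutes, while a device trapping every minute causes a single lookup either way.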
Good to know about the 9.4.x trap forwarding capacity increase! I imagine with Spectrum 10 they could make it even faster (I'm assuming nothing more has been done to TrapDirector yet, as they focused mainly on other, more requested features such as device/model capacity, etc.).
We have ours set to the default, but you are correct, perhaps 1 day would be better!
I know this is an old post, but I want to reference the updates made to TrapDirector in 9.2.3 H11 https://communities.ca.com/message/109888240#109888240 and note that, based on those, TrapDirector should be capable of significantly higher trap volumes now.
Yes, that is correct. And to add to that, trap rates are even higher in 10.0 and above thanks to 64 bit SS…