
SNMP errors/warning should list device ID

Last activity 06-13-2019 09:41 AM
Hilmar Preuße
05-22-2015 11:57 AM

What is the use case of the new feature?

----------------------------------------

Polling problems caused by slow SNMP agents could be analysed more easily if it were clear which device is responding slowly.

 

Describe the new feature in detail.

-----------------------------------

 

The customer sees messages like the following in the Apache Karaf log (karaf.log) on the DC:

 

015-03-10 09:17:27,429 | WARN  | Timer-2          | SnmpSession                      | m.ca.im.dm.snmp.snmp4j.SnmpProxy  188 | 197 - com.ca.im.data-collection-manager.snmp - 2.4.1.SNAPSHOT |  | Cannot find proxy listener for Response listener:  com.ca.im.dm.snmp.collector.TooBigListener@e437f22 for Request:<snip>

 

We asked CA to explain what this message means (00082410). The answer was:

1. The SNMP response from the device arrives after the timeout has expired.

2. A second SNMP response from the device arrives after the first response has already been processed (in the case of an SNMP retry).

<snip>
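Both cases can be illustrated with a minimal sketch of request/response correlation, assuming (as Support explains further down this thread) that outstanding requests are kept in a map keyed by request ID and removed on response or timeout. The class and method names are hypothetical and are not CAPM or SNMP4J APIs.

```python
class PendingRequests:
    """Hypothetical table of outstanding SNMP requests, keyed by request ID."""

    def __init__(self):
        self.pending = {}   # request ID -> listener
        self.warnings = []  # warnings emitted for unmatched responses

    def send(self, request_id, listener):
        # Register the listener before the request goes out.
        self.pending[request_id] = listener

    def expire(self, request_id):
        # Timeout handler: drop the entry; the request is considered failed.
        self.pending.pop(request_id, None)

    def on_response(self, request_id):
        # Match the response to its listener and remove the entry.
        listener = self.pending.pop(request_id, None)
        if listener is None:
            # Late responses (case 1) and duplicate responses after a
            # retry (case 2) both land here: the entry is already gone.
            self.warnings.append(f"Cannot find listener for request {request_id}")
        return listener


table = PendingRequests()

# Case 1: the response arrives after the timeout has expired.
table.send(1, "listener-1")
table.expire(1)
table.on_response(1)

# Case 2: assuming the retry reuses the request ID, the device answers
# both the original and the retry; only the first answer is matched.
table.send(2, "listener-2")
table.on_response(2)
table.on_response(2)

print(table.warnings)
```

In this sketch, as in the logged warning, nothing in the unmatched branch identifies the device that sent the response.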

 

The customer noticed that the log entry does not list the device ID. The warning message should identify the device to make debugging these problems easier.

 

Describe how you envision this new feature being implemented.

-------------------------------------------------------------

The warning message should contain the device ID.
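One possible approach, sketched under assumptions: even when the request entry is already gone, the incoming response still carries the source address of the agent that sent it, so the warning can be built from that address and mapped to a device ID through the inventory. The `inventory` dictionary, the device IDs, and the message format below are hypothetical, not CAPM's actual data model or log format.

```python
import logging

logging.basicConfig(level=logging.WARNING,
                    format="%(levelname)s | %(name)s | %(message)s")
log = logging.getLogger("SnmpSession")

# Hypothetical inventory: agent IP address -> device ID.
inventory = {"10.0.0.5": "dev-1001", "10.0.0.6": "dev-1002"}


def unmatched_response_warning(request_id, source_ip, source_port):
    """Build the 'no listener' warning, naming the responding device."""
    # Fall back to "unknown" if the address is not in the inventory.
    device_id = inventory.get(source_ip, "unknown")
    return (f"Cannot find listener for request {request_id} "
            f"from device {device_id} ({source_ip}:{source_port})")


msg = unmatched_response_warning(4711, "10.0.0.5", 161)
log.warning(msg)
```

Keeping the raw address in the message alongside the device ID keeps the warning useful even for an agent that is not (or no longer) in the inventory.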

 

What business problem will be solved by adding this new feature?

----------------------------------------------------------------

For some devices, no performance data is collected because of slow SNMP responses. The customer needs an efficient way to identify the affected devices.

 

Describe the importance and urgency.

------------------------------------

Low.


Comments

01-09-2018 11:53 AM

The customer liked the solution, so feel free to mark the idea as delivered. Hilmar.

03-20-2017 07:59 AM

The status is still "Under review". Is this correct? Please review the idea status.

09-14-2015 06:24 PM

Here is a view from 2.6. If you have a chance to upgrade and check it out, please send your feedback on improvements, ideas, etc.

 

09-14-2015 03:15 PM

Sounds promising! My original request was just a quick-and-dirty hack to get basic monitoring. If CA is on its way to implementing a convenient GUI (like in eHealth), that request is fulfilled.

09-14-2015 09:28 AM

Hans, thanks very much for the feedback. Your recommendations are very much in line with our vision as well. In 2.6 we introduce out-of-the-box self-monitoring dashboards with KPIs that let you monitor usage/activity as well as the performance of the CAPM subsystems, including polling. We are also trying to put more information in the event log (which can be filtered by user/role). For example, when you have a slow device and several requests time out, we throttle further requests for the rest of the poll cycle and log an event on that device. We reviewed this enhancement, and R&D suggested that we consider adding a separate event to detect ‘slow responding’ devices by tracking latency on SNMP requests (which we currently don’t track). We will factor in this feedback and any other ideas/requests as we continue to build this out. Thanks again for taking the time to share your ideas!

 

Daniel Holmes

Sr. Advisor, Product Management

 

CA Technologies | 273 Corporate Dr Suite 200 | Portsmouth, NH 03801

Office: +1 603 334 2130 | Mobile: +1 603 502 5004 | Daniel.Holmes@ca.com

 

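The R&D suggestion of tracking SNMP request latency to detect slow-responding devices could look roughly like the sketch below. The class name, the use of a median, the 2-second threshold, and the minimum sample count are assumptions for illustration only; per the comment above, CAPM did not track this latency at the time.

```python
import statistics
from collections import defaultdict


class LatencyTracker:
    """Hypothetical per-device record of SNMP round-trip times (seconds)."""

    def __init__(self, slow_threshold_s=2.0, min_samples=3):
        self.slow_threshold_s = slow_threshold_s  # assumed threshold
        self.min_samples = min_samples            # assumed minimum sample count
        self.samples = defaultdict(list)          # device ID -> latencies

    def record(self, device_id, sent_at, received_at):
        # Latency = time between sending a request and matching its response.
        self.samples[device_id].append(received_at - sent_at)

    def slow_devices(self):
        # Flag devices whose median latency exceeds the threshold; the median
        # keeps a single outlier from marking a device as slow.
        slow = {}
        for device_id, latencies in self.samples.items():
            if len(latencies) >= self.min_samples:
                median = statistics.median(latencies)
                if median > self.slow_threshold_s:
                    slow[device_id] = median
        return slow


tracker = LatencyTracker()
for sent, received in [(0.0, 0.2), (1.0, 1.3), (2.0, 2.1)]:
    tracker.record("dev-1001", sent, received)
for sent, received in [(0.0, 2.5), (1.0, 3.9), (2.0, 2.4)]:
    tracker.record("dev-1002", sent, received)

print(tracker.slow_devices())
```

A "slow responding" event could then be raised for each device returned by `slow_devices()` at the end of a poll cycle.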

09-14-2015 08:38 AM

I would prefer a status page in the CA-PC GUI that shows the polling status, similar to eHealth, where you could see the names of failed objects along with the timestamp and the reason (time-out, overflow, ...).

This page should include:

1. An overview with:
   - total number of polled objects
   - number of polling failures
   - duration of one poll cycle

2. A detailed view with object name, timestamp, and polling failure reason for failed polls.
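The overview and detailed view requested above map naturally onto a small summary computed from per-object poll results. The `PollResult` fields, the `summarize` helper, and the sample objects are hypothetical; they only mirror the items listed in the request.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PollResult:
    """Hypothetical outcome of polling one object in one poll cycle."""
    object_name: str
    timestamp: datetime
    failure_reason: Optional[str] = None  # e.g. "time-out"; None on success


def summarize(results, cycle_start, cycle_end):
    """Return (overview, details) for one poll cycle."""
    failures = [r for r in results if r.failure_reason is not None]
    # Overview: totals and the duration of the poll cycle.
    overview = {
        "polled_objects": len(results),
        "polling_failures": len(failures),
        "cycle_duration_s": (cycle_end - cycle_start).total_seconds(),
    }
    # Detailed view: one row per failed poll.
    details = [(r.object_name, r.timestamp, r.failure_reason) for r in failures]
    return overview, details


start = datetime(2015, 9, 14, 8, 0, 0)
end = datetime(2015, 9, 14, 8, 4, 30)
results = [
    PollResult("router1 ifTable", datetime(2015, 9, 14, 8, 0, 5)),
    PollResult("router2 ifTable", datetime(2015, 9, 14, 8, 1, 10), "time-out"),
    PollResult("switch7 cpu", datetime(2015, 9, 14, 8, 2, 0), "overflow"),
]
overview, details = summarize(results, start, end)
print(overview)
for row in details:
    print(row)
```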

09-07-2015 09:22 PM

Do you think there is a higher-level request behind this one, namely to update our self-monitoring so that it is easier to identify devices that are not responding to our SNMP requests in a healthy way?

05-27-2015 03:43 AM

So chances are good that our idea won't be implemented at all.

05-26-2015 10:05 AM

I agree that all logging should include context identifying which element/device is causing the issue.

 

I had also opened a support ticket about this log message to learn what it meant, and I wanted to share Support's response:

 

"For every SNMP request we send out, it will be put onto a hash map and then will be removed from the map after a response is received or the request is timed out. There are some cases that we tried to remove the entry twice which then caused the warning. You can ignore it."

 

Support said these logs will actually be removed in a future CAPM release.