DX NetOps

Discovery and metrics not being captured

  • 1.  Discovery and metrics not being captured

    Posted Feb 27, 2013 01:13 PM
    When discovering devices in IM 2.1.0, we have noticed that the Availability metric displays 'Not Supported' on 9 out of 10 devices. When we go to the dashboard for those devices, the window shows 'no data'. If we then return to the Monitored Inventory page under Admin and click "Update Metric Families" on the device, Availability shows as supported after the refresh and we start gathering data to display in the dashboards.

    We need to do this for almost every single device, which hinders us tremendously when discovering a collection of devices, as we have to go back to every CI and click 'Update Metric Families'.

    We heard that this 'may' be a known issue but have not heard of a fix for it.

    Has anyone else encountered this issue, and are there any workarounds that folks are aware of?

    thx,
    jacquie


  • 2.  RE: Discovery and metrics not being captured

    Posted Feb 27, 2013 04:00 PM
    Hi Jacquie,

    This is not something that is normal or a known issue. Does it happen for all new devices you discover, or is this something you see on the majority of devices you discovered some time ago?

    Regards,
    Kyle


  • 3.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Feb 27, 2013 07:29 PM
    Hi Kyle,

    This symptom was present in our initial discovery (approx. 150 devices), where perhaps a dozen of them did not have any data for availability. We are going into a soft launch phase and discovering the remaining devices (4500+) and it has progressively gotten worse. At this time, 9 out of 10 devices are displaying this symptom.

    We do have a support ticket open, 21304496-1: METRICS SHOWING NOT SUPPORTED (https://support.ca.com/irj/portal/implsvccasedetails?issueNo=21304496-1), and I posted on the forum to see if anyone else may be experiencing the same thing.

    As you can imagine, this is a painful process for us, as we now have to go back to each device and update the metrics individually.

    I appreciate any feedback on resolving this or working around it.

    Thx,
    jacquie


  • 4.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Feb 27, 2013 07:59 PM
    Haven't had this problem, but I have had issues with discovery not properly using DNS to get the hostname (using the sysName or IP instead).

    You said you were on 2.1.0. I went from 2.0 to 2.1.2 directly (Data Aggregator Version: 2.1.2.33). Not sure if that is the same version you are running, but if not, maybe upgrade and see if the issue still exists. On 2.0 we were mainly experimenting and hadn't released to our server/networking teams, so we didn't really look at problems like that. Now that we have upgraded to 2.1.2 I am looking more closely and haven't noticed your issue. That said, at least two-thirds of the devices would already have been there under 2.0.

    Not much help to you. I do have a subnet I haven't done a discovery on yet. I will create a new discovery profile and give it a go.


  • 5.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Feb 27, 2013 08:27 PM
    I just tested with a subnet not discovered yet and have the same issue.

    It's not clear if the problem is in the discovery or at the host being discovered. I have seen SNMP tree traversal problems in the past, both with CA Spectrum and when doing a plain SNMP walk. I am assuming the Data Aggregator works out which metric families / vendor certs a device supports by traversing the SNMP tree.

    In my case, 3 out of 254 devices came up with "Not Supported" for Availability. They also seem to say "Not Supported" for everything except Reachability. Normally, if a device is pingable but not reachable via SNMP, it only shows Reachability under Polled Metric Families; this time it shows the lot, but with Not Supported next to each. That indicates to me that the SNMP agent is running on the host but tree traversal is not working.


  • 6.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Feb 28, 2013 02:53 PM
    Hi Andrew,

    We are on 2.1.0 – are you on Beta sprints?

    Thx,
    jacquie


  • 7.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Feb 28, 2013 05:15 PM
    Hi Jacquie,

    No, the 2.1.2 I downloaded was from the GA downloads area, not a sprint. I think it was about 2-3 weeks ago. That said, it still seems to have the same issue as yours.

    I still believe there is a timeout issue on the SNMP Get Next request. This can sometimes occur if the host being polled is under high CPU load at the time of the request. Sorry for going all technical. This problem was evident in CA Spectrum and showed up in the form of an "SNMP Get Next Loop Detected" alarm. In basic terms, Spectrum was being impatient in waiting for a response from the host when it sent out an SNMP Get Next request. It was fixed with a hotfix and a customer setting change in 9.2.2.

    I suspect the Data Aggregator is having a similar problem when it does a discovery. To determine which metric families a host supports, it would need to walk the host's SNMP tree to find out which OIDs it supports. If the timeout value on the SNMP requests is too short for a busy host, the request will time out, and as a result the walk will not return the supported OIDs and thus the supported metric families.

    Feel free to chip in here, Kyle: is there a setting that can be changed to increase timeout values?
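
    The hypothesis above can be illustrated with a toy model in Python. This is not Data Aggregator or Spectrum code, and it does not send any SNMP traffic: it only assumes that each step of a walk has a response time and that a step slower than the timeout ends the walk, so later OIDs are never seen. The `walk` function, the response times, and the `busy_host` data are made up for illustration, and real managers usually retry before giving up.

```python
def walk(agent, timeout):
    """Simulate a Get Next walk over (oid, response_seconds) pairs.

    The walk stops at the first response slower than the timeout,
    so every OID after that point goes undiscovered.
    """
    discovered = []
    for oid, response_seconds in agent:
        if response_seconds > timeout:
            break  # request timed out; the rest of the tree is never seen
        discovered.append(oid)
    return discovered


# Hypothetical busy host: the second response is slow because of CPU load.
busy_host = [
    ("1.3.6.1.2.1.1.1.0", 0.2),  # sysDescr
    ("1.3.6.1.2.1.1.3.0", 2.5),  # sysUpTime (slow response)
    ("1.3.6.1.2.1.2.1.0", 0.3),  # ifNumber
]

print(walk(busy_host, timeout=1.0))  # short timeout
print(walk(busy_host, timeout=5.0))  # generous timeout
```

    With the short timeout only the first OID is found, which in this model is what would leave most metric families marked as not supported; with the longer timeout all three are found.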


  • 8.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Mar 01, 2013 07:00 PM
    Hi Andrew,

    We see the issue you described with the “SNMP Get Next Loop Detected” alarm mostly on our Cisco Nexus 7K devices, especially when we are trying to poll CBQoS metrics. We are currently working with both Cisco and CA to resolve this particular issue.

    The number of metrics showing ‘Not Supported’ is painful for us, as our users are used to seeing all the metrics we can gather from eHealth. Also, the self-cert process is not as intuitive as we had hoped for in IM 2.1; however, we have heard that Plato will hopefully make it easier.

    Thx,
    jacquie


  • 9.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Mar 03, 2013 06:10 PM
    Hi Jacquie,

    Getting totally off topic: in relation to your SNMP Get Next Loop Detected alarms in Spectrum, what version are you on? Try 9.2.2.H09, or try 9.2.2 and apply PTF patch Spectrum_09.02.02.PTF_9.2.211. This patch was to fix a problem with the SNMP stack which was causing Management Agent Lost alarms even though the device was being polled correctly. It seems to have also cleared up our Get Next Loop Detected alarms.

    Refer to this thread, which is quite lengthy but worth a read, and also refer it to your support person: 99039947

    In regard to the self-cert process, yes, it could be improved, but I find it a load better than eHealth. Are you a member of the Polaris community? If so, I have posted some of my own certifications there, although not documented very well, around APC environment monitors, and I have some more done for UPS devices (not posted to the community). The UPS ones are probably a good example to use even if you don't have those devices, as they demonstrate using a single metric family for multiple vendor certs. I have commented to CA that more examples are needed, along with better documentation of what each part of the XML is for.

    In regard to the problem this thread is about, yes, the Not Supported problem is painful. In our case it is not coming up as much. I am holding off on a support ticket, as we haven't made IM generally available to staff yet and are waiting to see what 2.2 brings. There is a debug of the discovery process; have you tried that?


  • 10.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Mar 04, 2013 04:25 PM
    Hi Andrew,

    Thanks for the info on Spectrum.

    Coming back on topic to IM 2.x and Availability not supported: you mentioned you discovered 250+ devices and found several of them with the symptoms. Did you go into the console and display them individually, or did you run a report that showed the devices with unsupported metrics at a glance? I'm trying to find an easier and quicker way to track which devices are showing up with unsupported metrics.

    thx,
    jacquie


  • 11.  RE: [CA Infrastructure Management] RE: Discovery and metrics not being capt

    Posted Mar 04, 2013 08:47 PM
    I did it from the console. I had just discovered a subnet I hadn't discovered before and did a search in the Data Aggregator for that subnet, then manually went down through each of the devices. Not ideal for you.

    How good are you at writing Perl? The following web service gives you the metric families supported and not supported for each device:

    "http://data_aggregator_host:8581/rest/devices/mfdiscoveryhistory"

    It's rather long, so I would output it to a file and run a script across it.
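
    A minimal sketch of that approach, in Python rather than Perl. It assumes the web service above returns text (for example XML) in which unsupported metric families carry a recognisable marker string. The `NOT_SUPPORTED` marker, the output file name, and the helper names are placeholders and are not taken from the Data Aggregator documentation; look at the saved file first and adjust the marker to whatever the response actually contains.

```python
import re
import urllib.request

# Placeholder host name, as in the URL above; substitute your own.
URL = "http://data_aggregator_host:8581/rest/devices/mfdiscoveryhistory"

# Assumption: the text that marks an unsupported metric family in the
# response. Check the saved output and change this to match.
MARKER = "NOT_SUPPORTED"


def save_response(url, path, timeout=60):
    """Download the (long) response and write it to a file unchanged."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as fh:
        fh.write(data)


def marker_contexts(text, marker, width=80):
    """Return the text surrounding each case-insensitive marker match.

    Works on the raw text, so it does not matter whether the response
    is one long line or many.
    """
    hits = []
    for match in re.finditer(re.escape(marker), text, re.IGNORECASE):
        lo = max(0, match.start() - width)
        hi = min(len(text), match.end() + width)
        hits.append(text[lo:hi])
    return hits


def main(path="mfdiscoveryhistory.out"):
    """Save the response to a file, then print the context of each hit."""
    save_response(URL, path)
    with open(path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    for snippet in marker_contexts(text, MARKER):
        print(snippet)
        print("-" * 40)
```

    Calling `main()` fetches the data once and prints a snippet around every match. If the response turns out to be well-formed XML, parsing it properly would be more robust than a text search, but that depends on the element names in the actual output.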