DX Unified Infrastructure Management

  • 1.  Alerts re-appearing in CA UIM 8.31

    Posted May 26, 2016 07:07 AM

    Good day,

     

    We are encountering an issue with nas alerts in CA UIM 8.31: after we acknowledge an alert, it re-appears with a current timestamp, even though the metric associated with the alert clearly shows that the system is up and running.

     

    At first we thought it was the probe. We first saw it in websphere_mq, which initially reported that the multi-instance QM was down. In that case we disabled all alerts in the probe and acknowledged the alarm, and it came back as a critical alarm saying the QM is back up. We do not understand why a positive status would raise an alarm at all when the system is up and no alarms are configured as active.

     

    We have seen a similar situation with the sqlserver probe, which kept reporting that the database status is down, even though the metrics clearly show it was up for the whole period specified in the alert itself.

     

    The result is a large number of emailed alerts, which is not what we want. Does the problem lie in the nas probe or in the individual probes where the alarms are configured (or not configured, in the case of websphere_mq)? Any suggestions would be greatly appreciated.

     

    Regards,

    Prisca



  • 2.  Re: Alerts re-appearing in CA UIM 8.31

    Posted May 26, 2016 10:01 AM

    Hello,

     

    I would start with running a sniffer in Dr. NimBUS to see whether or not the probes in question are sending the alarms.

    I would also check whether NAS is reposting any alarms to the hub; filter for all NAS messages.

    Does it work fine with the default nas.cfg?

    If you are not on NAS 4.80, it is worth upgrading.

     

    Thanks



  • 3.  Re: Alerts re-appearing in CA UIM 8.31
    Best Answer

    Broadcom Employee
    Posted May 26, 2016 02:45 PM

    Hi,

     

    If you only have one NAS in your environment, this is most likely coming from the probe itself.

    If you have multiple NAS instances set up with replication and forwarding, there is a chance this alarm got stuck in some sort of loop.

     

    I would suggest updating all NAS instances to 4.80 from our support hotfix page.

    CA Unified Infrastructure Management Hotfix Index - CA Technologies

     

    When this type of thing happens, I generally open Dr. NimBUS and look at the message bus.

    Narrow the filter down to alarms, the probe, and the robot in this case, and see whether the alarm is coming in new.

    If it is, check the probe and the probe logs.

    If it is not, check nas.
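
The decision above can be sketched in a few lines of Python. This is only an illustration of the logic, not a UIM API: the record layout (`subject`, `probe`, `robot`, `time`) and the `triage` function are hypothetical stand-ins for whatever you read off the sniffer by hand.

```python
def triage(sniffed, probe, robot, acknowledged_at):
    """Say where to look next, based on messages seen on the bus.

    `sniffed` is a list of dicts with hypothetical keys:
    subject, probe, robot, time (seconds since some reference point).
    """
    for msg in sniffed:
        is_new_alarm = (
            msg["subject"] == "alarm"
            and msg["probe"] == probe
            and msg["robot"] == robot
            and msg["time"] > acknowledged_at
        )
        if is_new_alarm:
            # The probe itself sent a fresh alarm after the acknowledgement.
            return "check probe and probe logs"
    # Nothing new from the probe, so the re-appearing alarm comes from elsewhere.
    return "check nas"


# Made-up sample data: the alarm was acknowledged at t=100.
sample = [
    {"subject": "alarm", "probe": "sqlserver", "robot": "db01", "time": 90},
    {"subject": "QOS_MESSAGE", "probe": "sqlserver", "robot": "db01", "time": 120},
]
result = triage(sample, probe="sqlserver", robot="db01", acknowledged_at=100)
print(result)
```

With this sample the only alarm from the probe predates the acknowledgement, so the sketch points at nas rather than the probe.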

     

    Sometimes we see clients who do not have alarms broken out into their own queues.

    That is, they are using a * queue or combining alarm with QOS_Messages.

     

    This can cause a problem where alarms get backed up behind QOS messages and arrive much later than when the problem actually happened.

    Because of this delay, an alarm may come in even after the event is over.

     

    So that is just something else to check.
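
To make the queue point concrete, here is a minimal simulation in Python. It assumes a strictly first-in, first-out queue drained at a fixed rate; the backlog size (6000 QOS messages) and the rate (10 messages per second) are made-up numbers, and the `delivery_delays` helper is hypothetical, not part of UIM.

```python
from collections import deque


def delivery_delays(messages, rate_per_sec):
    """Return {subject: [delay_in_seconds, ...]} for a FIFO queue
    that is drained at a fixed number of messages per second."""
    queue = deque(messages)
    delays = {}
    position = 0
    while queue:
        subject = queue.popleft()
        position += 1
        # A message leaves the queue once everything ahead of it has been sent.
        delays.setdefault(subject, []).append(position / rate_per_sec)
    return delays


# Shared queue: one alarm stuck behind a hypothetical backlog of QOS messages.
shared = ["QOS_MESSAGE"] * 6000 + ["alarm"]
shared_delay = delivery_delays(shared, rate_per_sec=10)["alarm"][0]

# Dedicated alarm queue: the same alarm has nothing in front of it.
dedicated_delay = delivery_delays(["alarm"], rate_per_sec=10)["alarm"][0]

print(f"shared queue:    alarm delivered after {shared_delay:.1f} s")
print(f"dedicated queue: alarm delivered after {dedicated_delay:.1f} s")
```

Under these assumptions the alarm in the shared queue arrives about ten minutes late, which is long enough for the underlying event to have cleared before the alarm shows up.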