DX Infrastructure Manager

Alarm policies not clearing alarms

  • 1.  Alarm policies not clearing alarms

    Posted 11-12-2019 04:35 AM

    Hello!

    Since we are monitoring close to 100 robots, we decided to switch to using MCS for CDM to make life a bit easier.

    We've deployed the enhanced profiles:

    Setup cdm, CPU Monitor, Default Disk(s), Disk IO and Memory Monitor.

    We have 4 alarm policies. One for CPU, one for memory and two for disk.

    The problem is that while alarms are being generated by the policies, they are not cleared. 

    If we look at one of the disk policies, we've set up a condition with the disk free % metric.

    The thresholds are set to <= 12% (warning), <= 8% (major) and <= 4% (critical). Alarms are being triggered when these thresholds are reached, but they are not cleared when free space is back to more than 12%.
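
    To make the expected behaviour concrete, here is a minimal sketch of what the three thresholds above should produce. It is only an illustration of the intended outcome, not how MCS or the alarm policy engine is implemented; the names `THRESHOLDS`, `expected_severity` and `expected_events` are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Thresholds from the post, most severe first: (upper bound on free %, severity).
THRESHOLDS = [(4.0, "critical"), (8.0, "major"), (12.0, "warning")]


def expected_severity(free_pct: float) -> Optional[str]:
    """Return the severity we expect for a disk free % sample, or None if no alarm."""
    for limit, severity in THRESHOLDS:
        if free_pct <= limit:
            return severity
    return None  # above every threshold: any open alarm should be cleared


def expected_events(samples: Iterable[float]) -> Iterator[Tuple[float, str]]:
    """Yield (sample, event) each time the expected alarm state changes."""
    current = None
    for sample in samples:
        severity = expected_severity(sample)
        if severity != current:
            yield (sample, severity if severity else "clear")
            current = severity


if __name__ == "__main__":
    for sample, event in expected_events([20, 11, 7, 3, 15]):
        print(f"free={sample}% -> {event}")
```

    In this sketch the last sample (15% free) yields a "clear" event; that is the step that never arrives in our environment.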

    Has anyone experienced similar issues?

    Versions:

    Hub 9.20HF6, Data_engine 9.20HF2, mon_config_service 9.20hf1, nis_server 9.10.

    Regards

    Espen B Hanssen



  • 2.  RE: Alarm policies not clearing alarms

    Posted 11-13-2019 04:42 AM
    I tested this on a similar environment, but here the alarms are cleared.
    The only difference is that I changed the thresholds to test it.


  • 3.  RE: Alarm policies not clearing alarms

    Posted 11-13-2019 06:57 AM

    Hm....
    For a moment I thought the problem could be related to the fact that we had added a prefix to the alarm messages (Disk - for disk-alarms, CPU - for CPU-alarms etc) in order to make sorting easier.

    But even after reverting to the default setup, alarms are generated but not cleared.

    The weird thing is that we have a condition that triggers if processor queue length hits 80% over the baseline. These alarms are cleared automatically.

    I've exhausted all my ideas, so I guess I just have to open a support ticket to see if they have any ideas.

    Regards
    Espen




  • 4.  RE: Alarm policies not clearing alarms

    Posted 11-13-2019 07:07 AM
    You could use DrNimBus to check whether you see a clear (I'm not sure it sees all clears).
    In my environment I changed the nas so that it also shows the clear message in the console (plus a rule that closes these clears after xx minutes).


  • 5.  RE: Alarm policies not clearing alarms

    Posted 11-14-2019 02:35 AM

    Tried to use DrNimBus, and I can only see the alarms, no clears. Going to examine the plugin_metric.cfg a bit closer to see if it can give any clues.

    Have opened a support ticket to see if they have any tips.

    //Espen




  • 6.  RE: Alarm policies not clearing alarms

    Posted 01-15-2020 08:02 AM
    Hi Espen,

    Did you receive any answer from the Support Team?
    I have the same problem in my environment.
    I noticed that this happens when using more than one threshold for the same QoS.
    I opened case 20141321. The Engineering Team is analyzing it.

    Nei.  



  • 7.  RE: Alarm policies not clearing alarms

    Posted 01-15-2020 08:45 AM

    Hi Nei,

    No, they are still working on it.

    But I'm going to try changing to one threshold per QoS and see if I get the same result as you.

    regards

    Espen




  • 8.  RE: Alarm policies not clearing alarms

    Posted 01-15-2020 08:29 AM
    Espen,

    Could you please provide your Support Ticket number for review?


    ------------------------------
    Technical Support Engineer
    Broadcom
    ------------------------------



  • 9.  RE: Alarm policies not clearing alarms

    Posted 01-15-2020 08:34 AM

    Hi,

    Support ticket is # 20104451


    Regards

    Espen




  • 10.  RE: Alarm policies not clearing alarms

    Posted 01-15-2020 08:39 AM
    Thanks. It seems the ticket status is still "work in progress".

    ------------------------------
    Technical Support Engineer
    Broadcom
    ------------------------------



  • 11.  RE: Alarm policies not clearing alarms

    Posted 01-15-2020 09:55 AM

    I removed thresholds from the Free Disk % condition, leaving only one threshold.

    Now alarms are automatically closed.

    So, as Nei suggested, the issue seems to be related to having more than one threshold in a condition.

    I've submitted an update about this on my support ticket.

    Regards

    Espen