DX Infrastructure Management

This is related to top process alarms under cdm probe

    Posted 08-13-2019 05:06 AM

    We are getting two different kinds of alarms from the cdm probe. In Alarm 1, the five listed top-process percentages add up to 67.40%, which is consistent with the reported physical memory usage of 82% (the remainder is used by processes not listed). In Alarm 2, however, total CPU on CHCLPRRPDNAP01 is reported as 95.41%, while the individual top processes are [grep[128837]-(101.00%)];[grep[126788]-(100.00%)];[grep[127023]-(97.30%)];[grep[129056]-(95.50%)];[perl[100798]-(93.00%)]. Each of these is close to or above 100% on its own, and together they add up to 486.80%, far more than 95.41%.

    Why is there this deviation? Is the top-process list in Alarm 2 a false alert, given that the per-process percentages exceed the total CPU percentage? If so, how can this be resolved? Please provide a proper analysis or justification.
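
To make the arithmetic concrete, here is a minimal Python sketch that parses the "Top Processes" part of the two alarm messages quoted in this post and sums the percentages. The helper names (parse_top_processes, total_percent, share_of_host) are made up for illustration and are not part of the cdm probe. The last helper rests on an assumption that this thread does not confirm: that the per-process CPU figures are relative to a single core (as Linux top reports them by default), while the total CPU figure is averaged across all cores. If that holds, a process at 101% is one busy core, not more than the whole host.

```python
import re

# Matches one entry of the form [name[pid]-(12.34%)] in an alarm message.
ENTRY = re.compile(r"\[([^\[\]]+)\[(\d+)\]-\(([\d.]+)%\)\]")

# "Top Processes" text copied from the two alarms in this post.
ALARM_1 = ("Top Processes [java[421184]-(61.90%)];[java[279277]-(2.30%)];"
           "[java[9380]-(1.50%)];[java[231616]-(1.10%)];[java[186058]-(0.60%)]")
ALARM_2 = ("Top Processes [grep[128837]-(101.00%)];[grep[126788]-(100.00%)];"
           "[grep[127023]-(97.30%)];[grep[129056]-(95.50%)];[perl[100798]-(93.00%)]")


def parse_top_processes(message):
    """Return a list of (name, pid, percent) tuples found in the message."""
    return [(name, int(pid), float(pct)) for name, pid, pct in ENTRY.findall(message)]


def total_percent(message):
    """Sum the percentages of all top processes listed in the message."""
    return sum(pct for _, _, pct in parse_top_processes(message))


def share_of_host(pct, cores):
    """Convert a per-core percentage to a share of the whole host.

    ASSUMPTION: pct is relative to one core and the host has `cores` cores.
    The core count of the host is not given in this thread.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return pct / cores


print(f"{total_percent(ALARM_1):.2f}")  # prints 67.40
print(f"{total_percent(ALARM_2):.2f}")  # prints 486.80
```

Under that assumption, dividing each per-process value by the number of cores on CHCLPRRPDNAP01 would give figures that can be compared with the 95.41% total. The actual core count has to be checked on the host.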



    Alarm 1

    Severity  : major
    Host Name : CHEEFRMDNOD02
    IP        : 10.1.114.138
    Element   : Memory
    Message   : Physical memory usage on CHEEFRMDNOD02 is now 82%, which is above the warning threshold (80%). Top Processes [java[421184]-(61.90%)];[java[279277]-(2.30%)];[java[9380]-(1.50%)];[java[231616]-(1.10%)];[java[186058]-(0.60%)]
    Time      : 08/13/19 14:25:31
    Probe     : cdm




    Alarm 2

    Severity  : critical
    Host Name : CHCLPRRPDNAP01
    IP        : 10.200.121.28
    Element   : CPU
    Message   : Total cpu on CHCLPRRPDNAP01 is now 95.41%, which is above the error threshold (90%).Top Processes [grep[128837]-(101.00%)];[grep[126788]-(100.00%)];[grep[127023]-(97.30%)];[grep[129056]-(95.50%)];[perl[100798]-(93.00%)]
    Time      : 08/13/19 14:12:07
    Probe     : cdm


    Regards
    Amar