VMware vSphere

  • 1.  Health Status Memory degraded

    Posted Sep 08, 2010 07:09 AM

    Hi,

    I have a problem with a DL365. VC is reporting that one of the memory chips is degraded under Health Status. When I log in via the iLO I see no errors. Which system should I believe?

    Cheers

    Michael



  • 2.  RE: Health Status Memory degraded

    Broadcom Employee
    Posted Sep 08, 2010 08:18 AM

    The iLO is the one to trust, but it only tells you that there are no DIMM failures.

    VMware requires the memory to be distributed equally between the CPUs (x2, I presume).

    So in essence, ESX might not be happy with the memory setup.

    Each CPU socket has 2 banks, with 2 DIMM slots per bank.

    Make sure the DIMMs are distributed equally. The banks are labelled A, B, C and D on the lid of the server.
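
    To make "distributed equally" concrete, here is a minimal Python sketch that checks a DIMM population for balance across sockets. It assumes the layout described above (2 sockets, 2 banks per socket, 2 slots per bank) and a hypothetical inventory format of (socket, size_gb) pairs. The function names are illustrative, not part of any HP or VMware tool; the real population rules for a given server are in the vendor's documentation.

```python
from collections import defaultdict

# Assumed layout (from the post): 2 CPU sockets, 2 banks per socket, 2 slots per bank.
SOCKETS = 2
SLOTS_PER_SOCKET = 2 * 2  # banks per socket * slots per bank


def socket_summary(dimms):
    """Map socket -> (DIMM count, total GB) from hypothetical (socket, size_gb) pairs."""
    summary = defaultdict(lambda: [0, 0])
    for socket, size_gb in dimms:
        summary[socket][0] += 1
        summary[socket][1] += size_gb
    # Report every socket, including empty ones, so an unpopulated CPU shows up as (0, 0).
    return {s: tuple(summary[s]) for s in range(SOCKETS)}


def is_balanced(dimms):
    """True if every socket holds the same number of DIMMs and the same capacity,
    and no socket has more DIMMs than it has physical slots."""
    summary = socket_summary(dimms)
    if any(count > SLOTS_PER_SOCKET for count, _ in summary.values()):
        return False
    return len(set(summary.values())) == 1
```

    With 4x 4GB per CPU, `is_balanced([(0, 4)] * 4 + [(1, 4)] * 4)` returns True; removing two DIMMs from the second socket makes it return False.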

    Regards

    If you find this or any other answer useful please consider awarding points by marking the answer helpful or correct. Thank you.



  • 3.  RE: Health Status Memory degraded

    Posted Sep 08, 2010 08:33 AM

    The server is NUMA balanced: 4x 4GB DIMMs per CPU. The DIMM in slot 4 is marked as degraded in the health status. I wonder if the HP CIM agents are doing some further diagnostics on the memory.



  • 4.  RE: Health Status Memory degraded

    Broadcom Employee
    Posted Sep 08, 2010 09:11 AM

    So the configuration is correct. 8-)

    > The DIMM in slot 4 in the health status is marked as degraded. I wonder if the HP CIM agents are doing some further diagnostics on the memory.

    If the HP health status shows it as degraded, then there is an issue and the DIMM needs to be replaced.

    HP Systems Insight Manager does a lot of checks on the hardware and monitors it closely. The warning should give you more info on why it shows as degraded, but my guess is that it is a pre-failure warning. In short, that DIMM is on its way out :|

    If at all possible, you can swap 2 DIMMs around to test, but I would just log a call and get it replaced if it is still under warranty.

    If it is out of warranty, I would swap that DIMM with another one to double-check that it is not the slot itself before I spend money on replacement parts.
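
    The swap test boils down to a simple decision rule: if the warning follows the module, the DIMM is bad; if it stays put, suspect the slot. A minimal sketch, assuming you move the suspect DIMM from its original slot to another slot (its partner taking its place) and then read back the set of slots that are flagged. The function name, return labels and slot numbers are hypothetical.

```python
def diagnose_after_swap(original_slot, new_slot, flagged_slots_after):
    """Classify a fault after moving the suspect DIMM from original_slot to new_slot.

    flagged_slots_after is the set of slots reported as degraded after the swap.
    """
    flagged = set(flagged_slots_after)
    if new_slot in flagged and original_slot not in flagged:
        return "dimm"  # the fault moved with the module
    if original_slot in flagged and new_slot not in flagged:
        return "slot"  # the fault stayed with the slot (or board)
    if not flagged:
        return "cleared"  # reseating alone may have helped; keep monitoring
    return "inconclusive"  # both or unrelated slots flagged
```

    For the DIMM in slot 4 swapped with, say, slot 2: `diagnose_after_swap(4, 2, {2})` returns "dimm", while `diagnose_after_swap(4, 2, {4})` returns "slot".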

    Regards




  • 5.  RE: Health Status Memory degraded

    Posted Sep 08, 2010 08:36 PM

    Thanks, as you suggest I will get the memory swapped. Interesting that the iLO doesn't flag the degraded DIMM; it goes to show the CIM agents are helpful!!



  • 6.  RE: Health Status Memory degraded
    Best Answer

    Broadcom Employee
    Posted Sep 09, 2010 08:14 AM

    That could be down to the firmware.

    Actually, I should have mentioned this earlier: try doing a firmware upgrade on that machine and see if the problem persists.

    Latest ProLiant firmware = 9.1

    Latest iLO firmware = 2.00

    Don't be shy with points :-) hehe




  • 7.  RE: Health Status Memory degraded

    Posted Sep 09, 2010 08:17 AM

    Points :-)

    It is running the latest firmware versions. I think the CIM agents must flag memory errors and keep a count. After rebooting the server, the error is gone. I will keep an eye on it and see if it returns.
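
    The "keep a count" theory can be modelled as a per-slot error counter with a threshold. This is a hypothetical sketch of that guess, not HP's actual agent logic: the class name, the threshold value and the assumption that counts are held only in memory (and are therefore cleared by a reboot) are all illustrative.

```python
class ErrorCountTracker:
    """Hypothetical per-slot memory error counter that reports a slot as
    'degraded' once its count reaches a threshold. Counts live only in
    memory, so reboot() clears them."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = {}

    def record_error(self, slot):
        # Each reported memory error bumps the counter for that slot.
        self.counts[slot] = self.counts.get(slot, 0) + 1

    def status(self, slot):
        return "degraded" if self.counts.get(slot, 0) >= self.threshold else "ok"

    def reboot(self):
        # Under this model, nothing survives a restart.
        self.counts.clear()


tracker = ErrorCountTracker(threshold=3)  # threshold chosen arbitrarily
for _ in range(3):
    tracker.record_error(4)
print(tracker.status(4))  # degraded
tracker.reboot()
print(tracker.status(4))  # ok
```

    If the DIMM really is failing, errors would start accumulating again after the reboot and the flag would come back, which is why keeping an eye on it is worthwhile.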

    Cheers