VMware vSphere

System Board 8 Memory - Uncorrectable ECC

  • 1.  System Board 8 Memory - Uncorrectable ECC

    Posted Jan 26, 2011 06:25 PM

    We set up ESXi 4.1 with the latest patches on a brand-new HP DL380 G7 with the latest firmware and the latest ESXi Offline Bundle. It shows the ECC problem you can see in the attached screenshot.

    We opened a case with HP and they told us that none of the HP diagnostics (IML + Survey) show any problems at all. We also swapped the memory modules on bank 8, which didn't change anything. HP said that this seems to be a problem of ESXi displaying wrong information.

    Is there any known problem with ESXi 4.1 showing invalid information?

    Do you have any suggestions?

    Thanks.



  • 2.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 26, 2011 06:48 PM

    I would run an extended Memtest to make sure.



  • 3.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 26, 2011 07:21 PM

    Already done.

    No problem found.



  • 4.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 26, 2011 07:23 PM

    If you have a current VMware Support contract I would give VMware a call.



  • 5.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 26, 2011 09:06 PM

    The problem is that Essential is only available with subscription and not with basic support, so paying $300 to call VMware and being told that it is an HP thing is not the best option.

    Cheers.



  • 6.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 26, 2011 09:29 PM

    Do you have power saving mode enabled in the BIOS? I can't remember the exact wording, but try full power.



  • 7.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 12:17 AM

    We had changed that to custom -> OS controlled.

    I will check whether switching it changes anything.

    Cheers



  • 8.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 12:20 AM

    Good bet that is the problem.



  • 9.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 12:34 AM

    Tried that, but the problem persists.

    To make sure it is not caused by the installation, I reinstalled vanilla ESXi from scratch.

    The same errors are shown in the vSphere Client after installation.

    I ran another Survey: all RAM modules are operating correctly, and neither correctable nor uncorrectable ECC errors have been logged during operation.

    In the revision history of the latest ESXi patches I found that some problems with ESXi showing wrong fan and temperature values were fixed, but nothing is mentioned regarding wrong information about the ECC state.

    Cheers.



  • 10.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 12:59 AM

    You haven't used the HP version of ESXi to install. When you use the HP version, CIM is enabled. When you use the generic install and add the offline bundle, I am pretty sure you must enable the OEM CIM providers. Also make sure that you have upgraded the firmware to the level listed for ESXi 4.1. Just applying the latest may go beyond what is supported for ESXi. I would pay special attention to the iLO firmware.

    Try looking at the system page of the iLO web interface. It could confirm or deny HP's claim that the RAM is OK.



  • 11.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 11:39 AM

    Hi.

    Thanks again for your suggestions, but we have already tried all of this.

    1.) Installing vanilla ESXi 4.1 -> problem present

    2.) Adding HP's latest offline bundle -> problem present (it adds some additional indicators like Disk)

    3.) Applying all patches (currently 2 which are mentioned on the VMware website)

    4.) Checking all diagnostics HP offers (Survey, IML, iLO)

    Running out of ideas.

    Cheers.



  • 12.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 11:50 AM

    Can I just clarify the problem here.. the screenshot shows badly for me but it looks like it says "deassert" after it followed by status: Normal?



  • 13.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 12:39 PM

    J1mbo wrote:

    Can I just clarify the problem here.. the screenshot shows badly for me but it looks like it says "deassert" after it followed by status: Normal?

    Yes.



  • 14.  RE: System Board 8 Memory - Uncorrectable ECC
    Best Answer

    Posted Jan 27, 2011 12:50 PM

    It seems to me that there is no hardware problem here and ESXi is working correctly.

    The sensor name is "System Board 8 Memory - Uncorrectable ECC", its status is "deassert" (i.e. not asserted) and hence the health condition is "normal". If the hardware in the server detects uncorrectable ECC events, the sensor status will change to "assert" or "failure asserted" or similar, and the health would then be degraded or failed (that is, if the server was still running).

    Attached is a screenshot of some other sensors reported in this way, in this case from a PowerEdge.

    Hope that helps.



  • 15.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 27, 2011 01:37 PM

    OK

    So you are saying that the screenshot does not indicate an error condition at all?

    Maybe we are simply interpreting it wrong.

    Can anybody verify that this is shown similarly on other installations?

    And why is it referring to System Board 8 Memory?

    Cheers.



  • 16.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 28, 2011 01:45 PM

    exactly.....

    The uncorrectable ECC entry is just a sensor instance. It's deasserted, and hence the reading is shown as normal (green). If something ever fails on the device monitored by this sensor, the state of the sensor changes to assert. That is when the reading turns red and lets you know it is faulty.

    So there is nothing to worry about as long as the reading is green. I have seen the same on a variety of hardware.

    To confirm, do the following steps:

    1. Install a WBEM client on a Linux machine (wbemcli, a command-line tool, installable with apt-get on Ubuntu).

    2. Run a CIM query against CIM_Sensor and copy the output to a file:

        wbemcli ein -noverify 'https://root:<password>@<hostname>:5989/root/cimv2:CIM_Sensor' ElementName,HealthState | tee SensorList.txt

    3. Open SensorList.txt and search for ECC

    <snip>

    Host:5989/root/cimv2:OMC_DiscreteSensor.DeviceID="201.0.32.1"

    -HealthState=5
    -ElementName="Memory Device 34 MCK Mem DIMM >16 0: Uncorrectable ECC"

    </snip>

    4. If the health state above has the value 5 (OK), you have nothing to worry about.
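
    Steps 3 and 4 can also be scripted. Below is a minimal Python sketch, assuming the file has the layout shown in the snippet above (an instance path line followed by "-Property=value" lines). The function names and the SAMPLE text are made up for illustration; the HealthState names follow the DMTF CIM value map, where 5 means OK.

```python
# DMTF CIM HealthState value map (CIM_ManagedSystemElement.HealthState).
HEALTH_STATES = {
    0: "Unknown",
    5: "OK",
    10: "Degraded/Warning",
    15: "Minor failure",
    20: "Major failure",
    25: "Critical failure",
    30: "Non-recoverable error",
}

# Illustrative input, copied from the snippet quoted in this thread.
SAMPLE = '''Host:5989/root/cimv2:OMC_DiscreteSensor.DeviceID="201.0.32.1"
-HealthState=5
-ElementName="Memory Device 34 MCK Mem DIMM >16 0: Uncorrectable ECC"
'''


def parse_sensor_list(text):
    """Parse the sensor listing into a list of property dicts.

    Assumption: each instance starts with a path line and is followed
    by lines of the form '-Name=value'.
    """
    sensors = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-"):
            if current is None:
                continue  # property line without an instance path; skip it
            name, _, value = line[1:].partition("=")
            current[name] = value.strip('"')
        else:
            current = {"path": line}
            sensors.append(current)
    return sensors


def ecc_report(text):
    """Return (ElementName, health name) for every sensor mentioning ECC."""
    report = []
    for sensor in parse_sensor_list(text):
        element = sensor.get("ElementName", "")
        if "ECC" not in element:
            continue
        raw_state = sensor.get("HealthState", "")
        state = int(raw_state) if raw_state.isdigit() else 0
        report.append((element, HEALTH_STATES.get(state, "Unknown")))
    return report


if __name__ == "__main__":
    for element, health in ecc_report(SAMPLE):
        print(f"{health}: {element}")
```

    To run it against a real listing, pass the contents of SensorList.txt to ecc_report() instead of SAMPLE. Anything other than "OK" for an ECC sensor would be worth a closer look.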



  • 17.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jan 28, 2011 02:34 PM

    The command in step 2 of previous comment should be:

    wbemcli ei -nl -noverify 'https://root:<password>@<hostname>:5989/root/cimv2:CIM_Sensor' ElementName,HealthState | tee SensorList.txt



  • 18.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Feb 02, 2011 08:36 PM

    Hi.

    Sorry for the late feedback, but the customer did not have a Linux box and I did not find a live CD that includes wbemcli, so I had to set up a Linux machine first and install the wbem package.

    I can confirm that the health state of the ECC sensors is 5, so from what I have learned there is no reason to worry. It seems that I was fooled by the somewhat misleading way this information is displayed.

    Thanks again to all for the useful tips to track down the problem.

    I will try to assign points accordingly.

    Cheers.



  • 19.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted May 16, 2011 11:29 AM

    Did you get anywhere with this please?

    I have a DL380 G7 that is showing a "warning", with System Board 8 showing "deassert".

    I have power cycled the server and cleared the IML logs in the iLO, and the server shows a clean bill of health, yet vSphere won't reset the "warning" status on the host hardware tab.

    I can clear the alarm, but that isn't really the point.



  • 20.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 25, 2011 06:46 PM

    Could you show me the screenshot?



  • 21.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 03, 2014 07:52 PM

    All,

    I realize this is an old post I'm bringing back up but it matches exactly a problem I have.

    I understand that HP indicates it is not a problem and VMware says it is safe to ignore. My issue is: how do I reset the red indicator? This is kind of like shutting off the check engine light. Sure, I can safely ignore it, but other people also look at Virtual Center and ask, "What's this red indicator doing on? Looks like we have a problem." Then I have to prove to them that it is not a problem.

    On the server itself I cleared the IML, and I also reset the sensors in vCenter. Yet VMware still indicates there was a problem.

    Regards,

    Jeff



  • 22.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 05, 2014 07:33 AM

    We are experiencing the same issues.

    I've upgraded the iLO firmware to 1.70 on the DL380 G7 (iLO 3) and to 1.50 on the DL380 G8 (iLO 4), and since then 3 hosts show memory errors in VMware:

    System Board 8 Memory: Uncorrectable ECC Current State Assert

    Edit: We use VMware 5.5.0



  • 23.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 06, 2014 03:46 PM

    Same here as well. VMware 5.1.0 (1483097)

    I also upgraded iLO 4 the other day from 1.3 to 1.5. I decided today just to check everything and, lo and behold, all my DL380p servers have this same alert.

    System Board 8 Memory: Uncorrectable ECC           "Alert"        Current State:Assert



  • 24.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 06, 2014 04:22 PM

    I migrated all the VMs to another host, rebooted the host and did other firmware updates. After that, resetting the sensors did the trick. All is fine now.



  • 25.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 06, 2014 04:24 PM

    Same here on VMware 5.1

    We are on iLO 1.32 for DL380p Gen8

    Interestingly, nothing shows up in the IML.

    Which version of the CIM providers are those of you who see this issue running?



  • 26.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 06, 2014 04:30 PM

    Mine has also disappeared after a few days of constantly showing up. I wonder if it didn't just need to wait for a little while to see that the IML had been cleared.



  • 27.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 06, 2014 04:51 PM

    But did it show up in IML?

    I still suspect this is a false positive.

    Normally uncorrectable ECC errors are also written persistently to the SPD of the DIMM module.

    So if the message appears once and is not a false alarm, there should be no way to get rid of it.



  • 28.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Jun 06, 2014 04:55 PM

    There were issues with the memory. I replaced the faulty DIMMs before installing ESX, so it was not a false positive; it just lingered for a while before clearing up.



  • 29.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Oct 23, 2012 12:30 PM

    I believe this can safely be ignored; see the HP notice below:

    SUPPORT COMMUNICATION - CUSTOMER NOTICE

    Document ID: c03478508

    Version: 1

    Notice: VMware ESXi 5.0 - Physical Memory in ProLiant Server Platforms Is Reported as "Deassert" Within the Hardware Tab of VMware vCenter
    NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

    Release Date: 2012-09-04

    Last Updated: 2012-09-04


    DESCRIPTION

    Physical memory used in ProLiant server platforms is reported as "Deassert" on the Hardware tab of VMware vCenter. Under the Details column, the memory modules are reported as "Current State: Deassert."

    DETAILS

    The information to populate the "Current State: Deassert" field is obtained from the standard IPMI Memory sensor supported by ProLiant servers.

    The following messages are reported in VMware vCenter.

    System Board 8 Memory Status: Uncorrectable ECC Current State:Deassert

    System Board 8 Memory Status: Correctable ECC logging limit reached Current State:Deassert

    The reporting of these physical memory messages as "Current State Deassert" on the Hardware tab can be safely ignored and does not indicate that there is reason to take any action. When a "Correctable ECC logging limit reached" or an "Uncorrectable ECC" condition occurs on any DIMM in the server, this sensor will report the appropriate sensor as "asserted" and an entry will be logged into the System Event Log (SEL) and the HP ProLiant Integrated Management Log (IML).

    http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=115&prodSeriesId=4091412&prodTypeId=15351&objectID=c03478508
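
    The rule in the notice comes down to this: "Deassert" means the condition is not present, "Assert" means it is. Below is a small Python sketch of that rule, assuming the lines are copied from the Hardware tab in the "name Current State:state" form quoted above. The function name and the regular expression are hypothetical and only cover the line formats seen in this thread.

```python
import re

# Matches e.g.
# 'System Board 8 Memory Status: Uncorrectable ECC Current State:Deassert'
STATE_RE = re.compile(
    r"^(?P<sensor>.+?)\s+Current State\s*:?\s*(?P<state>\w+)\s*$"
)


def needs_attention(line):
    """Return True if the sensor is asserted, False if it is deasserted.

    Raises ValueError when the line does not look like a sensor row or
    the state is neither Assert nor Deassert.
    """
    match = STATE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sensor line: {line!r}")
    state = match.group("state").lower()
    if state == "deassert":
        return False  # condition not present, nothing to do
    if state == "assert":
        return True   # condition present, check the SEL and IML
    raise ValueError(f"unknown state: {state!r}")
```

    For the two messages quoted in the notice the function returns False. For a line ending in "Current State Assert" it returns True, which is the case where an entry in the SEL and IML would be expected.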



  • 30.  RE: System Board 8 Memory - Uncorrectable ECC

    Posted Mar 03, 2015 07:14 PM

    In my case there was no hardware error. ESXi 5.5 2456374.

    After clearing all logs on the iLO and resetting the sensors on the host, I switched the Sensors drop-down to System event log (SEL) and cleared the SEL too. The yellow bang is now gone.