ESXi

  • 1.  Interpreting PSOD

    Posted Jan 03, 2011 10:56 AM

    Hello Experts,

    Can someone please help me find out what's wrong with my ESXi 4.1 server?

    I attached a photo.

    I cannot find a vmkdump, so I will try to generate one manually.

    Thx,

    Klaus



  • 2.  RE: Interpreting PSOD

    Posted Jan 03, 2011 11:21 AM

    Hi,

    If you refer to the article below, you should definitely be able to find the root cause.

    http://kb.vmware.com/kb/1005184



  • 3.  RE: Interpreting PSOD

    Posted Jan 03, 2011 11:26 AM

    Welcome to the Community,

    I'd suggest you run a hardware diagnostic on your server. If the system manufacturer does not provide such a tool, then at least run a memory check. A lot of errors are caused by defective memory. (http://kb.vmware.com/kb/831)

    André



  • 4.  RE: Interpreting PSOD

    Posted Jan 03, 2011 01:57 PM

    Hi,

    I tried to decode the MCA, but I don't think I got it right. Perhaps someone could help me. I have attached the kernel dump.

    It would be nice if we could do this together. This is the first time I have done this.

    Thx for your help.

    Some System information for you:

    System: ICO (Intel S5000PSL)

    CPU: Dual Intel Xeon E5410 @ 2.33 GHz

    RAM: 16 GB

    Running VMs: 6 (all Windows 2003 R2 32bit)

    I told my colleagues to run Memtest asap.

    Klaus



  • 5.  RE: Interpreting PSOD

    Posted Jan 03, 2011 02:04 PM

    The dumps aren't always the easiest to decode.  MCEs are typically hardware related, usually DIMMs or CPU.  The first thing I would do is ensure all your firmware is current.  Then take the VMkernel dump and send it to your hardware vendor.

    I would also suggest generating a vm-support dump, which will include the VMkernel dump, and sending it to your hardware vendor as well.



  • 6.  RE: Interpreting PSOD

    Posted Jan 10, 2011 08:28 AM

    Hi,

    I ran Memtest86+ for 72 hours. No errors.

    Now I'm searching for a (freeware) CPU stress test utility.

    Do you know a good one?

    Thx,

    Klaus



  • 7.  RE: Interpreting PSOD

    Broadcom Employee
    Posted Jan 21, 2012 01:19 PM

    Hi,

    If you look carefully at the second line of the PSOD screen itself, it explains to a certain degree what the problem is.

    Your server has experienced what is known as a Machine-Check exception (#MC).  Near the bottom of the screen you should see the information from the registers of the Machine-Check Architecture of the CPU that generated the exception.

    Going back to the second line, we also see that VMware ESX has decoded the "MCA Error Code" field of the status register: it says that a Bus and Interconnect error was seen.  Memory tests alone may not help in identifying the problematic hardware.  Use the data from the screen and provide it to your hardware vendor to review so they can take the required action to correct the hardware problem that caused this crash in the first place.
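
    To illustrate what decoding the "MCA Error Code" involves, here is a minimal Python sketch. It assumes the architectural IA32_MCi_STATUS layout described in the Intel SDM (flag bits 63 to 57, the MCA error code in bits 15:0, and the bus and interconnect form 0000 1PPT RRRR IILL). The function name decode_mci_status and the status value in the usage example are hypothetical; the value is not taken from the PSOD photo in this thread, and model-specific fields are not decoded.

```python
# Sketch only: decodes the architectural flag bits of a 64-bit MCi_STATUS value
# and, when the MCA error code has the bus/interconnect form, its sub-fields.
# Codes with any of bits 15:12 set are left undecoded (assumption of this sketch).

PARTICIPATION = {
    0: "SRC (local processor originated request)",
    1: "RES (local processor responded to request)",
    2: "OBS (local processor observed error as third party)",
    3: "generic",
}
REQUEST = {
    0: "ERR (generic error)", 1: "RD (generic read)", 2: "WR (generic write)",
    3: "DRD (data read)", 4: "DWR (data write)", 5: "IRD (instruction fetch)",
    6: "PREFETCH", 7: "EVICT", 8: "SNOOP",
}
MEMORY_IO = {0: "M (memory access)", 1: "reserved", 2: "IO", 3: "other transaction"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG (generic)"}


def decode_mci_status(status: int) -> dict:
    """Return the architectural fields of an MCi_STATUS value as a dict."""
    code = status & 0xFFFF  # MCA error code, bits 15:0
    result = {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC
        "enabled": bool((status >> 60) & 1),          # EN
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": code,
    }
    if code >> 12 == 0 and code & 0x0800:
        # Bus and interconnect error: 0000 1PPT RRRR IILL
        result["class"] = "bus and interconnect"
        result["participation"] = PARTICIPATION[(code >> 9) & 0x3]
        result["timeout"] = bool((code >> 8) & 1)
        result["request"] = REQUEST.get((code >> 4) & 0xF, "reserved")
        result["memory_or_io"] = MEMORY_IO[(code >> 2) & 0x3]
        result["level"] = LEVEL[code & 0x3]
    else:
        result["class"] = "other (not decoded in this sketch)"
    return result


if __name__ == "__main__":
    # Hypothetical status value, for illustration only.
    for field, value in decode_mci_status(0xB200000000000E0F).items():
        print(f"{field}: {value}")
```

    A decode like this only narrows down the class of error; the hardware vendor still needs the full register data from the screen to identify the failing component.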

    I hope this helps.

    Faisal Akber