ESXi


3 times Purple Screen of death

  • 1.  3 times Purple Screen of death

    Posted Jan 04, 2014 01:20 PM

    Hi

    I'm looking for help. For the last few days I've been getting PSODs.

    What is the cause?

    My machine configuration:

    The ESXi host is installed on a USB stick.

    Supermicro X10SLH-F with Intel Xeon E3-1270 v3 on board

    2 x 8GB GOODRAM ECC UNBUFFERED DDR3 1600MHz PC3-12800E UDIMM | W-MEM1600E38G

    Be Quiet! Dark Power PRO 10 650W 80PLUS Gold

    LSI MegaRAID SAS 9271-8i

    6 x Seagate SV35 Series (3TB, 64MB, SATA III-600) (ST3000VX000)

    Please take a look at the screenshots below.

    Best regards, and I'm waiting for a quick reply.

    http://s29.postimg.org/vdx717mo6/ESXi_Purple_Screen_of_death_4_1_2014.jpg

    http://s29.postimg.org/fw9oxig7a/ESXi_Purple_Screen_of_death_2_1_2014.jpg



    http://s29.postimg.org/5jnebfmo6/esxi_purple_screen.jpg



  • 2.  RE: 3 times Purple Screen of death

    Posted Jan 04, 2014 05:51 PM

    It looks like the screenshots you uploaded are broken. Please attach them again.

    In the meantime, have a look at this: VMware KB: Interpreting an ESX/ESXi host purple diagnostic screen



  • 3.  RE: 3 times Purple Screen of death

    Posted Jan 04, 2014 06:03 PM

    I can see the screenshots. What is wrong?



  • 4.  RE: 3 times Purple Screen of death

    Posted Jan 05, 2014 02:59 AM



  • 5.  RE: 3 times Purple Screen of death

    Posted Jan 05, 2014 11:15 AM


  • 6.  RE: 3 times Purple Screen of death



  • 7.  RE: 3 times Purple Screen of death

    Broadcom Employee
    Posted Jan 06, 2014 12:11 AM

    Exception 14 is a page fault, meaning the host tried to access a memory page and the access failed.
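    On x86, exception 14 is the page-fault (#PF) vector, and the CPU also records an error code describing the failed access. As a rough illustration (not a VMware tool; the helper names are hypothetical and the vector table is deliberately partial), this Python sketch names the vector and decodes such an error code, assuming you can obtain one from the PSOD or a core dump:

```python
from __future__ import annotations

# A few x86 exception vectors, as printed in PSOD headers such as "#PF Exception 14".
# This table is deliberately partial and only for illustration.
EXCEPTION_VECTORS = {
    0: "#DE divide error",
    6: "#UD invalid opcode",
    13: "#GP general protection fault",
    14: "#PF page fault",
}

# Page-fault error-code bits defined by the x86 architecture:
# (mask, meaning when the bit is set, meaning when it is clear or None).
PF_ERROR_BITS = [
    (0x01, "protection violation on a present page", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user-mode access", "supervisor-mode access"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]


def describe_exception(vector: int) -> str:
    """Return a short name for an x86 exception vector."""
    return EXCEPTION_VECTORS.get(vector, f"vector {vector} (not in this table)")


def decode_pf_error_code(code: int) -> list[str]:
    """Translate a page-fault error code into human-readable flags."""
    flags = []
    for mask, if_set, if_clear in PF_ERROR_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


if __name__ == "__main__":
    print(describe_exception(14))
    # 0x0 = read of a not-present page in supervisor (kernel) mode
    print(decode_pf_error_code(0x0))
```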

    I have seen this when storage drops off.

    Looking at your logs, there is a heap of storage errors just before the PSOD, like the ones below, which would account for the failed pages:

    2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: Cmd(0x412e80824e00) 0x85, CmdSN 0x9f from world 34427 to dev "naa.600605b005d40d601a29058659cdb9ce" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

    2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: Cmd(0x412e80824e00) 0x4d, CmdSN 0xa0 from world 34427 to dev "naa.600605b005d40d601a29058659cdb9ce" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

    2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: Cmd(0x412e80824e00) 0x1a, CmdSN 0xa1 from world 34427 to dev "naa.600605b005d40d601a29058659cdb9ce" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

    2013-12-19T20:58:43.055Z cpu5:32794)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e80849840, 34427) to dev "naa.600605b005d40d601a29058659cdb9ce" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
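    Each of these lines carries the SCSI opcode, the H: (host), D: (device) and P: (plugin) status, and three bytes of sense data (sense key, ASC, ASCQ). A minimal Python sketch, assuming exactly the two line formats quoted above and using a small, partial table of SCSI standard codes (the helper names are hypothetical), that decodes them:

```python
import re
from typing import Optional

# Partial lookup tables (SCSI standard values) covering the codes in the excerpt.
SCSI_OPCODES = {0x1A: "MODE SENSE(6)", 0x4D: "LOG SENSE", 0x85: "ATA PASS-THROUGH(16)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY", 0x18: "RESERVATION CONFLICT"}
SENSE_KEYS = {0x0: "NO SENSE", 0x2: "NOT READY", 0x3: "MEDIUM ERROR",
              0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE",
            (0x24, 0x00): "INVALID FIELD IN CDB"}

HEX = r"0x[0-9a-fA-F]+"
# "Cmd(0x412e80824e00) 0x85," (ScsiDeviceIO) or "Cmd 0x85 (" (NMP)
OPCODE_RE = re.compile(rf"Cmd(?:\({HEX}\))? ({HEX})")
DEV_RE = re.compile(r'dev "([^"]+)"')
STATUS_RE = re.compile(rf"H:({HEX}) D:({HEX}) P:({HEX})")
SENSE_RE = re.compile(rf"sense data: ({HEX}) ({HEX}) ({HEX})")


def decode_scsi_log_line(line: str) -> Optional[dict]:
    """Decode opcode, status and sense data from one failed-command log line."""
    op, dev, status, sense = (r.search(line) for r in (OPCODE_RE, DEV_RE, STATUS_RE, SENSE_RE))
    if not (op and dev and status and sense):
        return None  # not a failed-command line in the format shown above
    opcode = int(op.group(1), 16)
    host, device, plugin = (int(g, 16) for g in status.groups())
    key, asc, ascq = (int(g, 16) for g in sense.groups())
    return {
        "device": dev.group(1),
        "command": SCSI_OPCODES.get(opcode, hex(opcode)),
        "host_status": host,
        "device_status": DEVICE_STATUS.get(device, hex(device)),
        "plugin_status": plugin,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"ASC 0x{asc:02x} ASCQ 0x{ascq:02x}"),
    }


if __name__ == "__main__":
    sample = ('2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: Cmd(0x412e80824e00) 0x1a, '
              'CmdSN 0xa1 from world 34427 to dev "naa.600605b005d40d601a29058659cdb9ce" failed '
              'H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.')
    print(decode_scsi_log_line(sample))
```

    For the first line of the excerpt this decodes to ATA PASS-THROUGH(16), device status CHECK CONDITION, sense key ILLEGAL REQUEST, ASC/ASCQ INVALID COMMAND OPERATION CODE.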

    Does the server have a power saving option?

    I have noticed that a low C-state can cause the storage to drop. I recently came across identical issues with HP blades that had the HP Dynamic Power mode setting enabled, and changing that setting removed PSODs like the ones you are seeing here. So give that a go. I'm not familiar with Supermicro, but I'm sure it has power-saving features, which do not mix well with ESXi.



  • 8.  RE: 3 times Purple Screen of death

    Posted Jan 06, 2014 09:40 PM

    At the moment I don't have a UPS.

    Any other hints?



  • 9.  RE: 3 times Purple Screen of death

    Broadcom Employee
    Posted Jan 06, 2014 11:48 PM

    Not a UPS, but the CPU power-saving mode; that is normally set in the BIOS.



  • 10.  RE: 3 times Purple Screen of death

    Posted Jan 06, 2014 11:53 PM

    Tomorrow I will check this via IPMI and provide screenshots :)

    I also saw that option in the BIOS.



  • 11.  RE: 3 times Purple Screen of death

    Posted Jan 07, 2014 04:04 AM

    This is actually a known issue with the Intel E1000 NIC that is used by the guest VM. Power down the VM, then change the network adapter to VMXNET3 (for all the VMs on the host), and the host will stop PSODing.

    It can be worked around by following this KB:

    VMware KB: ESXi 5.x host experiences a purple diagnostic screen with errors for E1000PollRxRing and E1000DevRx
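    To find which VMs still use the E1000 adapter, you can look for `ethernetN.virtualDev = "e1000"` entries in each VM's .vmx file. A minimal Python sketch, assuming you have read access to copies of the .vmx files (the function names and directory layout are hypothetical); it only reports, and the actual change should still be made with the VM powered off as described above:

```python
import re
from pathlib import Path

# Matches .vmx lines such as:  ethernet0.virtualDev = "e1000"
VIRTUAL_DEV_RE = re.compile(r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"', re.IGNORECASE)


def find_e1000_adapters(vmx_text: str) -> list:
    """Return the NIC names (e.g. 'ethernet0') explicitly set to the E1000 type."""
    hits = []
    for line in vmx_text.splitlines():
        m = VIRTUAL_DEV_RE.match(line)
        if m and m.group(2).lower() == "e1000":
            hits.append(m.group(1))
    return hits


def scan_vmx_files(root: str) -> dict:
    """Walk a directory tree (e.g. a copy of a datastore) and map each .vmx to its E1000 NICs."""
    report = {}
    for path in Path(root).rglob("*.vmx"):
        nics = find_e1000_adapters(path.read_text(errors="replace"))
        if nics:
            report[str(path)] = nics
    return report


if __name__ == "__main__":
    sample = ('ethernet0.present = "TRUE"\n'
              'ethernet0.virtualDev = "e1000"\n'
              'ethernet1.virtualDev = "vmxnet3"\n')
    print(find_e1000_adapters(sample))
```

    A NIC with no `virtualDev` line may fall back to a default adapter type chosen from the guest OS setting, so this sketch does not flag it; check such VMs in the vSphere Client.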



  • 12.  RE: 3 times Purple Screen of death

    Posted Jan 08, 2014 04:58 PM

    I think this was my issue. I had this yesterday and again today. I just P2Ved a machine and it had an E1000 NIC on it. I will find out if it crashes again, although I have already moved all the production servers off this VM.

    Thanks,

    -Jeff



  • 13.  RE: 3 times Purple Screen of death

    Posted Jan 08, 2014 06:10 PM

    Okay, make sure to change the VM's NIC to VMXNET3, and then you shouldn't have those problems anymore :)