VMware vSphere

  • 1.  Purple screen of death

    Posted Oct 22, 2009 08:48 PM

    Got a surprise today. The ESXi box wasn't responding so I went to check the physical server and found that it had posted a purple error screen.

    I've attached the pics of the error page.

    Any idea what the cause was?

    Whatever the case, it seems to be hardware related. Tried rebooting the system, but now it won't post... I'll have to try some troubleshooting tomorrow.



  • 2.  RE: Purple screen of death

    Posted Oct 22, 2009 08:52 PM

    I will take a stab and say the memory has gone bad.

    Steve Beaver

    VMware Communities User Moderator

    VMware vExpert 2009

    ====

    Co-Author of "VMware ESX Essentials in the Virtual Data Center"

    (ISBN:1420070274) from Auerbach

    Come check out my blog: www.theVirtualBlackHole.com

    Come follow me on twitter

    *Virtualization is a journey, not a project.*



  • 3.  RE: Purple screen of death

    Posted Oct 22, 2009 08:56 PM

    Hello.

    You might be able to pull something out of kb 1005184.

    Good Luck!



  • 4.  RE: Purple screen of death

    Posted Oct 22, 2009 09:02 PM

    Typically when this happens it's a hardware issue. Run diagnostics if you can to see if the memory, hard drive, or other parts went bad.



  • 5.  RE: Purple screen of death

    Posted Oct 23, 2009 06:43 PM

    I got the system to start up, but on the first boot it reported that the BIOS had been corrupted. The motherboard has a built-in recovery ability so after the BIOS was restored, the system posted fine and loaded ESXi.

    The system ran for a little while but then crashed again. Purple screen, but a new crash condition (photos attached).

    Ran a memory test with MemTest86+; after 6 passes there were zero errors. Not an exhaustive test, but usually I've seen errors after one pass. I may try a memory swap with a stable system if I don't find the root problem elsewhere.

    When I had the system running for a while I noticed some problems in vSphere. On the Inventory panel most VMs are now listed as Unknown (see screenshot).

    Also, the Server 2003 that was running fine when I took that screenshot now either blue screens while loading Windows, or crashes at some point after I log in and get to the desktop. On one of the crashes I get an IRQL_NOT_LESS_OR_EQUAL blue screen...

    I also have another VM that's unstable. I wonder if their instability is caused by the same issue, or if they've become corrupt because of the crash.



  • 6.  RE: Purple screen of death

    Posted Oct 24, 2009 04:19 AM

    If the disks are mountable in another system, you might be better off not trying to restart, since that risks more corruption. Set the disks aside and wait until you have a reliable platform to run on. Were you running ESXi on the hard disks or USB?



  • 7.  RE: Purple screen of death

    Posted Oct 24, 2009 10:18 PM

    ESXi was installed on an internal hard drive.

    Are there issues associated with running on a USB drive?



  • 8.  RE: Purple screen of death

    Posted Oct 24, 2009 10:23 PM

    Servers from HP, Dell, IBM, etc can be ordered that way. I use HP and only use USB.



  • 9.  RE: Purple screen of death

    Posted Oct 24, 2009 10:35 PM

    There are many disk cloning tools. You want something that clones sector by sector. There is a downloadable Linux Ghost work-alike (Google search). I would clone your hard drive(s) and work with the clones. If you have irreplaceable data, keep it safe. I don't know whether this is RAID. If you can get yourself another platform to run on, use a USB stick to install ESXi 4. Installing to USB is an install option. Don't put any drives in the machine until you are installed and set up. If you have RAID you will need to find out if the controller is OK: install the controller in your new platform, add a blank drive, and do some tests on the drive. If it works, plug in your clone disk(s) and see if ESXi finds your datastore. You may need to rescan the disks from the 4 client. ?????? from there.
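
    To illustrate what "sector by sector" means here, a rough Python sketch follows. It is not any particular cloning product, just the general idea: read the source as raw bytes in fixed-size chunks, write them out unchanged, and compare checksums afterwards. The names clone_raw and sha256_of, the 512-byte sector size, and the 1 MiB chunk size are all assumptions made for the example. It also assumes the source reads cleanly; on a failing disk a read error would abort the copy, which is why dedicated recovery tools exist.

```python
import hashlib

SECTOR = 512            # assumed sector size, for illustration only
CHUNK = SECTOR * 2048   # copy in 1 MiB chunks (a multiple of the sector size)


def clone_raw(src_path, dst_path, chunk_size=CHUNK):
    """Copy src to dst byte for byte, ignoring any filesystem structure.

    Returns (bytes_copied, sha256 hex digest of the data read from src).
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:          # empty read means end of source
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()


def sha256_of(path, chunk_size=CHUNK):
    """Re-read a file or device and return its sha256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

    Calling clone_raw on a source and a destination and then checking that the returned digest equals sha256_of(destination) confirms the clone matches what was read. Pointing it at a real disk device rather than an image file would need the right permissions and a lot of care about which path is the destination.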



  • 10.  RE: Purple screen of death

    Posted Oct 27, 2009 03:12 AM

    No RAID on the system at the moment.

    I've tested the VMs on another machine and they seem OK.

    I managed to get the system running for a few days without any crashes by reflashing the BIOS to the latest version. For about 3 days it was solid under various loads.

    Yesterday it crashed, and it has crashed two more times since then, presenting a different error message on the purple screen each time, usually to do with memory.

    I gave support a call and they suggested that the motherboard could have a grounding issue. I'll try removing it from the chassis to see if that's the issue. If not I'll RMA it.