ESXi

  • 1.  Purple screen

    Posted Apr 10, 2012 03:55 AM

    I am getting the purple screens below on a system. Is this a CPU problem, since it shows that PCPU 2 didn't have a heartbeat, or could it be a system board problem? This is a white-box system with an i7-2600 CPU.



  • 2.  RE: Purple screen

    Posted Apr 10, 2012 04:11 AM

    It looks more like a problem with a USB device.



  • 3.  RE: Purple screen

    Posted Apr 10, 2012 09:18 AM

    I agree.  Notice the first line of the stack trace (?) mentions ehci_irq and the second line mentions usb.

    You should be able to either pin down which USB device is causing the problem or, if you don't use it, disable the controllers one by one in the BIOS.

    If you only have one controller and need a USB keyboard then that could be tricky 8)

    Cheers

    Jon



  • 4.  RE: Purple screen

    Posted May 19, 2012 11:13 AM

    I've run into a similar problem; see attached. The board is an ASUS P8Z68-V/GEN3 with 32 GB of RAM, also running an i7-2600. ESXi 5.0.0 (Build 469512) is running on local ATA, and the datastores are on an Adaptec 5404 RAID 10. This is a new build, running 3 VMs.

    This has happened on 3 of 6 nights, seemingly when traffic is low. The purple screens are all similar to the attached one, but a different PCPU is affected each time.

    2012-05-14T10:13:11.811Z [2B6C8B90 info 'ha-eventmgr'] Event 129 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 6 didn't have a heartbeat for 8 seconds. *may* be locked up


    2012-05-14T10:27:22.808Z [2B687B90 info 'ha-eventmgr'] Event 154 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 4 didn't have a heartbeat for 8 seconds. *may* be locked up


    2012-05-14T19:27:01.723Z [FFADDA90 info 'ha-eventmgr'] Event 191 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 4 didn't have a heartbeat for 8 seconds. *may* be locked up


    2012-05-18T23:37:22.928Z [FF911A90 info 'ha-eventmgr'] Event 49 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 0 didn't have a heartbeat for 8 seconds. *may* be locked up


    2012-05-19T02:43:26.927Z [FFF03B90 info 'ha-eventmgr'] Event 51 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 6 didn't have a heartbeat for 8 seconds. *may* be locked up

    In my case, unlike in ehinkle's, it lists "vmware.driveAPI" in the first stack trace.

    I'm having trouble figuring out what is going on here. I have a couple of other ESXi 4 machines running similar hardware without issue.
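
    To see at a glance whether one PCPU is hit more often than the others, the event lines quoted above can be tallied with a short script. This is only a sketch: the regular expression is written against the five lines in this post and may need adjusting for other log formats, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern written against the event lines quoted in this post (an assumption:
# other ESXi builds or log files may word the message differently).
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\S+)\s.*PCPU (?P<pcpu>\d+) didn't have a heartbeat for (?P<secs>\d+) seconds"
)


def parse_heartbeat_events(lines):
    """Return a list of (timestamp, pcpu, seconds) for matching lines."""
    events = []
    for line in lines:
        match = HEARTBEAT_RE.search(line.strip())
        if match:
            events.append(
                (match.group("ts"), int(match.group("pcpu")), int(match.group("secs")))
            )
    return events


def count_by_pcpu(events):
    """Count how many missed-heartbeat events each PCPU logged."""
    return Counter(pcpu for _, pcpu, _ in events)


# The five event lines from the post, copied verbatim.
SAMPLE = [
    "2012-05-14T10:13:11.811Z [2B6C8B90 info 'ha-eventmgr'] Event 129 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 6 didn't have a heartbeat for 8 seconds. *may* be locked up",
    "2012-05-14T10:27:22.808Z [2B687B90 info 'ha-eventmgr'] Event 154 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 4 didn't have a heartbeat for 8 seconds. *may* be locked up",
    "2012-05-14T19:27:01.723Z [FFADDA90 info 'ha-eventmgr'] Event 191 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 4 didn't have a heartbeat for 8 seconds. *may* be locked up",
    "2012-05-18T23:37:22.928Z [FF911A90 info 'ha-eventmgr'] Event 49 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 0 didn't have a heartbeat for 8 seconds. *may* be locked up",
    "2012-05-19T02:43:26.927Z [FFF03B90 info 'ha-eventmgr'] Event 51 : Issue detected on vmlocal.stsolo.com in ha-datacenter: Heartbeat: 618: PCPU 6 didn't have a heartbeat for 8 seconds. *may* be locked up",
]

if __name__ == "__main__":
    events = parse_heartbeat_events(SAMPLE)
    for pcpu, count in sorted(count_by_pcpu(events).items()):
        print(f"PCPU {pcpu}: {count} event(s)")
```

    With these five lines the events are spread over PCPUs 0, 4 and 6 rather than concentrated on one core, which fits the observation that a different PCPU is affected each time.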



  • 5.  RE: Purple screen

    Posted May 19, 2012 01:23 PM

    In both PSODs, I suspect the IDT is not able to release a lock. Can either of you confirm whether interrupts are being shared between USB and another device in the first PSOD, and between e1000 and another device in the second PSOD?



  • 6.  RE: Purple screen

    Posted May 20, 2012 01:18 PM

    Thanks zXi_Gamer. That is the case for my P8Z68-V/GEN3 motherboard:

    "The PCIe x16_3 slot shares bandwidth with PCIe x1_1 slot, PCIe x1_2 slot, USB3_34 and eSATA."

    I have removed the e1000 card from the PCIe x16_3 slot and moved it to a different one. I will post if this change is successful.

    ========

    A little more on this for others with the same issue; here is how you check your interrupts:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003710
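
    Once you have the interrupt assignments from the host, spotting shared vectors is just a grouping exercise. The sketch below assumes you have already transcribed the assignments by hand into (vector, device) pairs; it does not read anything from ESXi itself, and the vectors and device names in the example are hypothetical, not taken from either machine in this thread.

```python
from collections import defaultdict


def find_shared_vectors(assignments):
    """Return {vector: [devices]} for vectors used by more than one device.

    `assignments` is an iterable of (vector, device) pairs that you have
    transcribed yourself; this function does not query the host.
    """
    by_vector = defaultdict(list)
    for vector, device in assignments:
        # Ignore repeated rows for the same device on the same vector.
        if device not in by_vector[vector]:
            by_vector[vector].append(device)
    return {v: devs for v, devs in by_vector.items() if len(devs) > 1}


# Hypothetical, hand-made data for illustration only.
EXAMPLE = [
    ("0x89", "ehci_hcd"),
    ("0x89", "e1000"),
    ("0x91", "ahci"),
]

if __name__ == "__main__":
    for vector, devices in find_shared_vectors(EXAMPLE).items():
        print(f"vector {vector} is shared by: {', '.join(devices)}")
```

    Any vector that shows up with both a USB controller and another device (or the e1000 and another device) is a candidate for the sharing asked about above.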



  • 7.  RE: Purple screen

    Posted May 19, 2012 08:03 PM

    I have had the issue.

    abirhasan 


  • 8.  RE: Purple screen

    Posted Jul 20, 2012 02:05 PM

    This is about the same stsolo machine, the second one mentioned in this thread. After I pulled the dual NIC from the system, the machine performed flawlessly from that date until last night, when this purple screen appeared.

    There are no USB devices attached to the machine; the VM OS runs on a SATA drive and the datastore is on a RAID 10. Again, I'm confused as to where to start in identifying the culprit here. Any guidance would be appreciated.



  • 9.  RE: Purple screen

    Posted Jul 20, 2012 02:51 PM

    It looks like the message points to USB again, but you said you have no USB devices attached.

    Does any VM on the machine have a client-connected USB device attached? It shouldn't matter, but it's just a thought...

    Also, did you disable USB in the BIOS as gerdesj suggested?



  • 10.  RE: Purple screen

    Posted Jul 20, 2012 02:59 PM

    Thanks. No, there are no USB connections or devices within any of the VMs. I didn't disable USB in the BIOS because the machine was rock solid after the removal of the dual NIC. I will, however, do that tonight.

    Thanks for the feedback.