ESX Server Hang, best way to investigate

    Posted Apr 15, 2010 10:03 PM

    One of the hosts in my vSphere cluster hung yesterday. I did not even realize it until the guest I was working on stopped responding and became unpingable. When I walked up to the server console I was able to type in the username and hit Enter, but was never prompted for a password. Eventually I just had to power cycle the box. I have spent most of today running Dell hardware diagnostics, but all tests are passing (the server is a PowerEdge 2950). The build of vSphere is current from a patch/build perspective. There are 5 other servers in the cluster that have not exhibited the same behavior, thankfully. Where is the best place to look to try to determine what happened? I've tried to look at some of the logs but I have yet to find anything that screams "error!". Below are the last events from the vmkernel log leading up to the server's sudden lockup and the eventual forced reboot. They look harmless enough, or do they signify a problem?

    Apr 14 11:35:28 advs11 vmkernel: 8:19:21:05.088 cpu2:8589)Tcpip_Vmk: 1107: Affinitizing 10.10.0.92 to world 8606, Success

    Apr 14 11:35:28 advs11 vmkernel: 8:19:21:05.088 cpu2:8589)VMotion: 1808: 1271259282987038 S: Set ip address '10.10.0.92' worldlet affinity to send World ID 8606

    Apr 14 11:35:28 advs11 vmkernel: 8:19:21:05.090 cpu2:4233)MigrateNet: vm 4233: 1130: Accepted connection from <10.10.0.93>
    Apr 14 11:35:28 advs11 vmkernel: 8:19:21:05.090 cpu2:4233)MigrateNet: vm 4233: 1144: dataSocket 0x4100b819ad20 send buffer size is 263536
    Apr 14 11:35:49 advs11 vmkernel: 8:19:21:26.387 cpu2:8590)VMotion: 2617: 1271259282987038 S: Stopping pre-copy: only 1610 pages were modified, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~126.909 MB/s)
    Apr 14 11:35:49 advs11 vmkernel: 8:19:21:26.408 cpu1:8589)VSCSI: 6011: handle 8307(vscsi0:0):Destroying Device for world 8590 (pendCom 0)
    Apr 14 11:35:50 advs11 vmkernel: 8:19:21:27.605 cpu0:8606)VMotionSend: 2913: 1271259282987038 S: Sent all modified pages to destination (network bandwidth ~109.238 MB/s)
    Apr 14 11:42:13 advs11 vmkernel: 8:19:27:50.410 cpu0:7185)<6>megasas_service_aen[4]: aen received
    Apr 14 11:42:13 advs11 vmkernel: 8:19:27:50.410 cpu1:4204)<6>megasas_hotplug_work[4]: event code 0x002c
    Apr 14 11:42:13 advs11 vmkernel: 8:19:27:50.421 cpu1:4204)<6>megasas_hotplug_work[4]: aen registered

    Apr 14 12:48:43 advs11 vmkernel: 0:00:01:33.758 cpu2:4107)World: vm 4243: 1098: Starting world net-cdp with flags 4

    Apr 14 12:48:43 advs11 vmkernel: 0:00:01:33.915 cpu3:4106)World: vm 4244: 1098: Starting world vmkiscsid with flags 4
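One way to narrow down when the host stopped logging is to look for the point where the vmkernel uptime counter (the `days:hh:mm:ss.mmm` field after `vmkernel:`) resets, since that marks a reboot. The following is a minimal Python sketch of that idea, not a VMware tool: the helper names `parse_line` and `find_reboots` are hypothetical, the regular expression is written only against the line format shown in the excerpt above, and the year is an assumption because syslog timestamps omit it.

```python
import re
from datetime import datetime, timedelta

# Written against the excerpt's format, for example:
# Apr 14 11:42:13 advs11 vmkernel: 8:19:27:50.421 cpu1:4204)<6>megasas...
LINE_RE = re.compile(
    r"^(?P<wall>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+vmkernel:\s+"
    r"(?P<d>\d+):(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<ms>\d{3})\s"
)


def parse_line(line, year=2010):
    """Return (wall_clock, uptime) for a vmkernel line, or None if no match.

    The year is an assumption: syslog timestamps do not carry one.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    wall = datetime.strptime(f"{year} {m['wall']}", "%Y %b %d %H:%M:%S")
    uptime = timedelta(
        days=int(m["d"]),
        hours=int(m["h"]),
        minutes=int(m["m"]),
        seconds=int(m["s"]),
        milliseconds=int(m["ms"]),
    )
    return wall, uptime


def find_reboots(lines):
    """Return (last_seen, first_after, silent_gap) for each uptime reset."""
    reboots = []
    prev = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip blank lines and anything in another format
        wall, uptime = parsed
        if prev is not None and uptime < prev[1]:
            # Uptime went backwards, so the host rebooted between these lines.
            reboots.append((prev[0], wall, wall - prev[0]))
        prev = parsed
    return reboots


if __name__ == "__main__":
    import sys

    # Usage: python find_reboots.py /path/to/vmkernel-log
    with open(sys.argv[1], errors="replace") as fh:
        for last_seen, first_after, gap in find_reboots(fh):
            print(f"last entry {last_seen}, first entry after reboot "
                  f"{first_after}, silent for {gap}")
```

Run against the excerpt above, this would report the last entry before the reset at 11:42:13 and the first entry after it at 12:48:43. That only bounds the window in which the host stopped logging; it does not by itself say what caused the hang.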