ESXi

  • 1.  Purple Screen - Requesting help to understand root cause

    Posted Sep 04, 2016 02:20 PM

    Hi Forum, my host experienced a purple screen with a Machine Check Exception (MCE). I took a picture of the screen and collected the zdump, but I am not able to interpret the root cause because of my limited understanding of troubleshooting techniques. I have attached the picture and the extracted log file. I would appreciate your help in understanding the root cause. A portion of the log is below.

    2016-09-04T10:45:22.687Z cpu3:33122)<6>hub 2-0:1.0: suspended

    2016-09-04T10:46:17.228Z cpu0:32782)World: 9740: PRDA 0x418040000000 ss 0x0 ds 0x4018 es 0x4018 fs 0x0 gs 0x0

    2016-09-04T10:46:17.228Z cpu1:33079)World: 9740: PRDA 0x418040400000 ss 0x0 ds 0x4018 es 0x4018 fs 0x0 gs 0x0

    2016-09-04T10:46:17.228Z cpu2:32781)World: 9740: PRDA 0x418040800000 ss 0x0 ds 0x4018 es 0x4018 fs 0x0 gs 0x0

    2016-09-04T10:46:17.228Z cpu3:35426)World: 9740: PRDA 0x418040c00000 ss 0x4018 ds 0x4018 es 0x4018 fs 0x0 gs 0x0

    2016-09-04T10:46:17.228Z cpu1:33079)World: 9742: TR 0x4000 GDT 0xfffffffffc60a000 (0xffff) IDT 0xfffffffffc608000 (0xffff)

    2016-09-04T10:46:17.228Z cpu0:32782)World: 9742: TR 0x4000 GDT 0xfffffffffc60a000 (0xffff) IDT 0xfffffffffc608000 (0xffff)

    2016-09-04T10:46:17.228Z cpu2:32781)World: 9742: TR 0x4000 GDT 0xfffffffffc60a000 (0xffff) IDT 0xfffffffffc608000 (0xffff)

    2016-09-04T10:46:17.228Z cpu1:33079)World: 9743: CR0 0x80050031 CR3 0x17b6bb000 CR4 0x42668

    2016-09-04T10:46:17.228Z cpu0:32782)World: 9743: CR0 0x80050031 CR3 0x17ba56000 CR4 0x42668

    2016-09-04T10:46:17.228Z cpu3:35426)World: 9742: TR 0x4000 GDT 0xfffffffffc60a000 (0xffff) IDT 0xfffffffffc608000 (0xffff)

    2016-09-04T10:46:17.228Z cpu2:32781)World: 9743: CR0 0x80050031 CR3 0x16908a000 CR4 0x42668

    2016-09-04T10:46:17.228Z cpu3:35426)World: 9743: CR0 0x80050031 CR3 0x17c0e3000 CR4 0x42668

    2016-09-04T10:46:17.259Z cpu1:33079)Panic: 634: Panic from another CPU (cpu 1, world 33079): ip=0x418011c780a0 randomOff=0x11c00000:

    Machine Check Exception: Fatal (unrecoverable) MCE on PCPU1 in world 33079:helper39-3

    System has encountered a Hardware Error - Please contact the hardware vendor

    2016-09-04T10:46:17.259Z cpu1:33079)Backtrace for current CPU #1, worldID=33079, rbp=0x410014740008

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9ba80:[0x41801246093b]e1000_intr@<None>#<None>+0x77 stack: 0x0, 0x41801231005d, 0x101c, 0x

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9baa0:[0x41801231005d]Linux_IRQHandler@com.vmware.driverAPI#9.2+0x25 stack: 0x418012310054

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bad0:[0x418011c5a3d6]IntrCookie_DoInterrupt@vmkernel#nover+0x41e stack: 0x780, 0x0, 0x430

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bb80:[0x418011c56940]IDT_IntrHandler@vmkernel#nover+0x104 stack: 0x0, 0x418040400200, 0x0

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bbb0:[0x418011cc7044]gate_entry_@vmkernel#nover+0x0 stack: 0x0, 0x0, 0x0, 0x0, 0x41804040

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bc70:[0x418011f0263a]Power_HaltPCPU@vmkernel#nover+0x1f2 stack: 0x417fd1e83ea0, 0x4180405

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bcc0:[0x418011e0fc68]CpuSchedIdleLoopInt@vmkernel#nover+0x2f8 stack: 0x27f2159ff62b, 0x10

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bd40:[0x418011e133bd]CpuSchedDispatch@vmkernel#nover+0x16b5 stack: 0xffffffffffffffff, 0x

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9be60:[0x418011e13f84]CpuSchedWait@vmkernel#nover+0x240 stack: 0x0, 0x43054b304240, 0x5d01

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bee0:[0x418011e141be]CpuSched_TimedWaitIRQ@vmkernel#nover+0x7e stack: 0x43054b304240, 0x4

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bf30:[0x418011c503ce]helpFunc@vmkernel#nover+0x5f2 stack: 0x0, 0x43054b3037e0, 0x27, 0x0,

    2016-09-04T10:46:17.259Z cpu1:33079)0x4390c9b9bfd0:[0x418011e14c1e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0, 0

    2016-09-04T10:46:17.259Z cpu1:33079)Panic: 769: Halting PCPU 1.

    2016-09-04T10:46:17.290Z cpu3:35426)Panic: 634: Panic from another CPU (cpu 3, world 35426): ip=0x418011c780a0 randomOff=0x11c00000:

    Machine Check Exception: Fatal (unrecoverable) MCE on PCPU3 in world 35426:vmm0:win7

    System has encountered a Hardware Error - Please contact the hardware vendor

    2016-09-04T10:46:17.290Z cpu3:35426)Backtrace for current CPU #3, worldID=35426, rbp=0x0

    2016-09-04T10:46:17.290Z cpu3:35426)0x43911311bcf8:[0x418011f0263a]Power_HaltPCPU@vmkernel#nover+0x1f2 stack: 0x417fd1e83ea0, 0x418040d

    2016-09-04T10:46:17.290Z cpu3:35426)0x43911311bd48:[0x418011e0fc68]CpuSchedIdleLoopInt@vmkernel#nover+0x2f8 stack: 0x27f2159b4ea2, 0x10

    2016-09-04T10:46:17.290Z cpu3:35426)0x43911311bdc8:[0x418011e133bd]CpuSchedDispatch@vmkernel#nover+0x16b5 stack: 0x4391135a7b00, 0x4391

    2016-09-04T10:46:17.290Z cpu3:35426)0x43911311bee8:[0x418011e13f84]CpuSchedWait@vmkernel#nover+0x240 stack: 0x410014a2cde0, 0x0, 0xa000

    2016-09-04T10:46:17.290Z cpu3:35426)0x43911311bf68:[0x418011e140da]CpuSched_VcpuHalt@vmkernel#nover+0x11e stack: 0xffffffff00002001, 0x

    2016-09-04T10:46:17.290Z cpu3:35426)0x43911311bfb8:[0x418011cabe39]VMMVMKCall_Call@vmkernel#nover+0x139 stack: 0x418011cab988, 0x0, 0x4

    2016-09-04T10:46:17.290Z cpu3:35426)Panic: 769: Halting PCPU 3.

    2016-09-04T10:46:17.352Z cpu2:32781)Panic: 634: Panic from another CPU (cpu 2, world 32781): ip=0x418011c780a0 randomOff=0x11c00000:

    Machine Check Exception: Fatal (unrecoverable) MCE on PCPU2 in world 32781:coalesceWorl

    System has encountered a Hardware Error - Please contact the hardware vendor

    2016-09-04T10:46:17.352Z cpu2:32781)Backtrace for current CPU #2, worldID=32781, rbp=0x0

    2016-09-04T10:46:17.352Z cpu2:32781)0x4390c069bbe0:[0x418011f0263a]Power_HaltPCPU@vmkernel#nover+0x1f2 stack: 0x417fd1e83ea0, 0x4180409

    2016-09-04T10:46:17.352Z cpu2:32781)0x4390c069bc30:[0x418011e0fc68]CpuSchedIdleLoopInt@vmkernel#nover+0x2f8 stack: 0x27f2159b79d1, 0x10

    2016-09-04T10:46:17.352Z cpu2:32781)0x4390c069bcb0:[0x418011e133bd]CpuSchedDispatch@vmkernel#nover+0x16b5 stack: 0xef, 0x439080d04001,

    2016-09-04T10:46:17.352Z cpu2:32781)0x4390c069bdd0:[0x418011e13f84]CpuSchedWait@vmkernel#nover+0x240 stack: 0x0, 0x0, 0x80069be78, 0x0,

    2016-09-04T10:46:17.352Z cpu2:32781)0x4390c069be50:[0x418011e144bf]CpuSched_SleepUntilTC@vmkernel#nover+0x8f stack: 0x400002001, 0x4390

    2016-09-04T10:46:17.352Z cpu2:32781)0x4390c069beb0:[0x418011db9b64]NetCoalesceDefaultWorldCB@vmkernel#nover+0x190 stack: 0x0, 0x0, 0x0,

    2016-09-04T10:46:17.352Z cpu2:32781)0x4390c069bfd0:[0x418011e14c1e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0, 0

    2016-09-04T10:46:17.352Z cpu2:32781)Panic: 769: Halting PCPU 2.

    2016-09-04T10:46:17.383Z cpu0:32782)Backtrace for current CPU #0, worldID=32782, rbp=0x0

    2016-09-04T10:46:17.383Z cpu0:32782)0x4390c071bcd0:[0x418011f0263a]Power_HaltPCPU@vmkernel#nover+0x1f2 stack: 0x417fd1e83ea0, 0x4180401

    2016-09-04T10:46:17.383Z cpu0:32782)0x4390c071bd20:[0x418011e0fc68]CpuSchedIdleLoopInt@vmkernel#nover+0x2f8 stack: 0x27f2159bc891, 0x10

    2016-09-04T10:46:17.383Z cpu0:32782)0x4390c071bda0:[0x418011e133bd]CpuSchedDispatch@vmkernel#nover+0x16b5 stack: 0x43018bafc634, 0x4180

    2016-09-04T10:46:17.383Z cpu0:32782)0x4390c071bec0:[0x418011e13f84]CpuSchedWait@vmkernel#nover+0x240 stack: 0x0, 0x0, 0x80071bf68, 0x0,

    2016-09-04T10:46:17.383Z cpu0:32782)0x4390c071bf40:[0x418011e144bf]CpuSched_SleepUntilTC@vmkernel#nover+0x8f stack: 0x20c49ba500002001,

    2016-09-04T10:46:17.383Z cpu0:32782)0x4390c071bfa0:[0x418011dbd5ba]NetCoalesce2WorldCB@vmkernel#nover+0xde stack: 0x4390c00a7100, 0x439

    2016-09-04T10:46:17.383Z cpu0:32782)0x4390c071bfd0:[0x418011e14c1e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0, 0

    2016-09-04T10:46:17.413Z cpu0:32782)VMware ESXi 6.0.0 [Releasebuild-4192238 x86_64]

    Machine Check Exception: Fatal (unrecoverable) MCE on PCPU0 in world 32782:netCoalesce2

    System has encountered a Hardware Error - Please contact the hardware vendor

    2016-09-04T10:46:17.416Z cpu0:32782)cr0=0x8001003d cr2=0x1bc1ef78 cr3=0x8001b000 cr4=0x216c

    2016-09-04T10:46:17.416Z cpu0:32782)Last branch from 0x418011f02533 to 0x418011f025eb

    2016-09-04T10:46:17.417Z cpu0:32782)frame=0x4390c071bc10 ip=0x418011f0263a err=18 rflags=0x10202

    2016-09-04T10:46:17.418Z cpu0:32782)rax=0x0 rbx=0x418040000000 rcx=0x0

    2016-09-04T10:46:17.418Z cpu0:32782)rdx=0x0 rbp=0x0 rsi=0x27f2159bb9c6

    2016-09-04T10:46:17.418Z cpu0:32782)rdi=0x43004d0f51f0 r8=0x15 r9=0x0

    2016-09-04T10:46:17.419Z cpu0:32782)r10=0x0 r11=0x43004d0c8438 r12=0x418040000200

    2016-09-04T10:46:17.419Z cpu0:32782)r13=0x0 r14=0x40 r15=0x0

    2016-09-04T10:46:17.419Z cpu0:32782)pcpu:0 world:32782 name:"netCoalesce2World" (S)

    2016-09-04T10:46:17.420Z cpu0:32782)pcpu:1 world:33079 name:"helper39-3" (SH)

    2016-09-04T10:46:17.420Z cpu0:32782)pcpu:2 world:32781 name:"coalesceWorld-0" (S)

    2016-09-04T10:46:17.420Z cpu0:32782)pcpu:3 world:35426 name:"vmm0:win7" (V)

    2016-09-04T10:46:17.420Z cpu0:32782)@BlueScreen: Machine Check Exception: Fatal (unrecoverable) MCE on PCPU0 in world 32782:netCoalesce2

    System has encountered a Hardware Error - Please contact the hardware vendor

    2016-09-04T10:46:17.420Z cpu0:32782)Code start: 0x418011c00000 VMK uptime: 0:04:53:32.503

    2016-09-04T10:46:17.421Z cpu0:32782)0x4390c071bcd0:[0x418011f0263a]Power_HaltPCPU@vmkernel#nover+0x1f2 stack: 0x417fd1e83ea0

    2016-09-04T10:46:17.422Z cpu0:32782)0x4390c071bd20:[0x418011e0fc68]CpuSchedIdleLoopInt@vmkernel#nover+0x2f8 stack: 0x27f2159bc891

    2016-09-04T10:46:17.423Z cpu0:32782)0x4390c071bda0:[0x418011e133bd]CpuSchedDispatch@vmkernel#nover+0x16b5 stack: 0x43018bafc634

    2016-09-04T10:46:17.424Z cpu0:32782)0x4390c071bec0:[0x418011e13f84]CpuSchedWait@vmkernel#nover+0x240 stack: 0x0

    2016-09-04T10:46:17.425Z cpu0:32782)0x4390c071bf40:[0x418011e144bf]CpuSched_SleepUntilTC@vmkernel#nover+0x8f stack: 0x20c49ba500002001

    2016-09-04T10:46:17.426Z cpu0:32782)0x4390c071bfa0:[0x418011dbd5ba]NetCoalesce2WorldCB@vmkernel#nover+0xde stack: 0x4390c00a7100

    2016-09-04T10:46:17.427Z cpu0:32782)0x4390c071bfd0:[0x418011e14c1e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0

    2016-09-04T10:46:17.435Z cpu0:32782)base fs=0x0 gs=0x418040000000 Kgs=0x0

    2016-09-04T10:46:17.435Z cpu0:32782)3 other PCPUs are in panic.

    2016-09-04T10:46:17.228Z cpu3:35426)MC:PCPU3 B:5 S:0xb200000080200e0f M:0x0 A:0x0 5

    2016-09-04T10:46:17.228Z cpu0:32782)MC:PCPU0 B:5 S:0xb200001044100e0f M:0x0 A:0x0 4

    2016-09-04T10:46:17.228Z cpu2:32781)MC:PCPU2 B:5 S:0xb200000084200e0f M:0x0 A:0x0 5

    2016-09-04T10:46:17.228Z cpu1:33079)MC:PCPU1 B:5 S:0xb200001040100e0f M:0x0 A:0x0 4


    Thanks in advance!!



  • 2.  RE: Purple Screen - Requesting help to understand root cause

    Posted Sep 04, 2016 04:42 PM

    Hi,

    I checked the logs. The Machine Check Exception and the "Hardware Error" message indicate that the cause of this problem is a hardware fault. Can you check the hardware?

    Thanks.



  • 3.  RE: Purple Screen - Requesting help to understand root cause

    Posted Sep 04, 2016 04:59 PM

    Thanks for the reply! Is it possible to pinpoint the component, such as RAM, CPU, or storage?



  • 4.  RE: Purple Screen - Requesting help to understand root cause

    Posted Sep 04, 2016 05:20 PM

    I think this is a CPU issue, but you will have to run the hardware diagnostics on the server to confirm it.
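
    The "MC:PCPUn B:5 S:0x..." lines at the end of the posted log appear to carry the raw machine-check bank status for each CPU. As a rough aid for narrowing down the component, the sketch below decodes such a value, assuming the S: field is the IA32_MCi_STATUS register as laid out in the Intel SDM and that B: is the bank number. The function names are hypothetical, and only the coarse compound error classes are recognized; this is not a VMware tool.

```python
import re

# Architectural flag bits of IA32_MCi_STATUS (assumed layout, per the Intel SDM).
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}

# Matches e.g. "MC:PCPU3 B:5 S:0xb200000080200e0f"
MC_LINE = re.compile(r"MC:PCPU(\d+) B:(\d+) S:(0x[0-9a-fA-F]+)")


def classify_mca_code(code: int) -> str:
    """Return a coarse class for the 16-bit MCA error code."""
    code &= 0xEFFF  # ignore bit 12 (correction-report filtering flag)
    if (code & 0xF800) == 0x0800:
        return "bus/interconnect"
    if (code & 0xFC00) == 0x0400:
        return "internal"
    if (code & 0xFF00) == 0x0100:
        return "cache hierarchy"
    if (code & 0xFF80) == 0x0080:
        return "memory controller"
    if (code & 0xFFF0) == 0x0010:
        return "TLB"
    return "other"


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit status value into flags and error-code fields."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    decoded["mca_code"] = status & 0xFFFF
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    decoded["error_class"] = classify_mca_code(decoded["mca_code"])
    return decoded


def parse_mc_lines(log_text: str) -> list:
    """Decode every MC: line found in a vmkernel log excerpt."""
    rows = []
    for pcpu, bank, status in MC_LINE.findall(log_text):
        row = {"pcpu": int(pcpu), "bank": int(bank)}
        row.update(decode_mci_status(int(status, 16)))
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = """
    cpu3:35426)MC:PCPU3 B:5 S:0xb200000080200e0f M:0x0 A:0x0 5
    cpu0:32782)MC:PCPU0 B:5 S:0xb200001044100e0f M:0x0 A:0x0 4
    """
    for row in parse_mc_lines(sample):
        print(row["pcpu"], row["bank"], hex(row["mca_code"]), row["error_class"],
              "UC" if row["UC"] else "", "PCC" if row["PCC"] else "")
```

    Under these assumptions, all four status values in the log decode to a valid, uncorrected error with processor context corrupt and a bus/interconnect class error code on bank 5, with no address recorded. Which hardware unit bank 5 maps to is model-specific, so the CPU vendor's documentation or the server's diagnostics are still needed to confirm the failing part.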



  • 5.  RE: Purple Screen - Requesting help to understand root cause

    Posted Sep 05, 2016 09:10 PM

    Thanks, one more question: what does "world 33079:helper39-3" mean in "Fatal (unrecoverable) MCE on PCPU1 in world 33079:helper39-3"? Did this error originate from a virtual machine or from the host itself?
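
    In these messages, "world" is ESXi's term for a schedulable execution context, and the text after "world" has the form ID:name. A minimal sketch for reading that field follows, assuming that world names beginning with "vmm" are vCPU worlds of a virtual machine (as in vmm0:win7 in the log) and that any other name is a host-side vmkernel world. The function name and the returned labels are hypothetical.

```python
import re

# Matches e.g. "MCE on PCPU1 in world 33079:helper39-3"
MCE_LINE = re.compile(r"MCE on PCPU(\d+) in world (\d+):(\S+)")


def describe_world(line: str):
    """Extract PCPU, world ID and world name from an MCE message line."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    name = match.group(3)
    info = {
        "pcpu": int(match.group(1)),
        "world_id": int(match.group(2)),
        "name": name,
    }
    if name.startswith("vmm"):
        # Assumed convention: "vmm<N>:<vm name>" is vCPU N of that VM.
        info["kind"] = "VM vCPU world"
        info["vm"] = name.split(":", 1)[1] if ":" in name else ""
    else:
        info["kind"] = "host (vmkernel) world"
    return info


if __name__ == "__main__":
    lines = [
        "Fatal (unrecoverable) MCE on PCPU1 in world 33079:helper39-3",
        "Fatal (unrecoverable) MCE on PCPU3 in world 35426:vmm0:win7",
    ]
    for line in lines:
        print(describe_world(line))
```

    The world named in the message is the one that was scheduled on that PCPU when the exception was raised. In the posted log, all four PCPUs report the MCE, three of them in host worlds, and the backtraces show them in the idle loop, so the world name by itself probably does not identify where the fault came from.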