VMware vSphere

  • 1.  PSOD ESXi 6 - PCPU 1 locked up. Failed to ack TLB invalidate

    Posted Dec 24, 2017 10:55 AM

    Hello everyone,

    On my whitebox ESXi 6.0 U3 host in my home lab, I'm getting a PSOD. This is the second time it has happened; the last time was more than a month ago.

    I looked at the vmkernel log file from the diagnostic dump, but I don't understand what could be causing this.
    I hope someone here can shed some light on the situation. Thanks!

    2017-12-24T05:40:41.850Z cpu2:32999)VMware ESXi 6.0.0 [Releasebuild-6765062 x86_64]

    PCPU 1 locked up. Failed to ack TLB invalidate (total of 2 locked up, PCPU(s): 0,1). 

    2017-12-24T05:40:41.850Z cpu2:32999)cr0=0x8001003d cr2=0x1c2f8740080 cr3=0xcd83f000 cr4=0x216c

    Log and photo attached.
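For readers wondering what the message itself refers to: on x86, when one CPU changes a page-table mapping that other CPUs may have cached in their TLBs, it typically sends them an inter-processor interrupt asking them to invalidate the stale entries and then waits for each one to acknowledge (a "TLB shootdown"). If a target CPU never responds, for example because it is stuck with interrupts disabled or has a hardware fault, the initiator cannot safely continue. The Python sketch below is a toy model of that handshake under those assumptions, not ESXi's implementation; `Cpu`, `tlb_shootdown`, and the 0.1 s timeout are hypothetical names and values. In the log above, the report came from cpu2, and PCPUs 0 and 1 were the ones listed as locked up.

```python
import threading
import time


class Cpu:
    """Toy model of a target CPU (hypothetical; not an ESXi data structure)."""

    def __init__(self, cpu_id, responsive=True):
        self.cpu_id = cpu_id
        # responsive=False models a CPU that never services the request,
        # e.g. one that is hung with interrupts disabled.
        self.responsive = responsive
        self.acked = threading.Event()

    def handle_invalidate_ipi(self):
        """Runs when the initiator 'sends' this CPU an invalidation request."""
        if self.responsive:
            # A real CPU would flush the stale TLB entries here before acking.
            self.acked.set()
        # An unresponsive CPU simply never sets its ack flag.


def tlb_shootdown(targets, timeout_s=0.1):
    """Send an invalidation request to every target and wait for all acks.

    Returns the ids of CPUs that failed to acknowledge before the deadline.
    """
    for cpu in targets:
        cpu.acked.clear()
        threading.Thread(target=cpu.handle_invalidate_ipi).start()

    deadline = time.monotonic() + timeout_s
    failed = []
    for cpu in targets:
        remaining = max(0.0, deadline - time.monotonic())
        if not cpu.acked.wait(remaining):
            failed.append(cpu.cpu_id)
    return failed


if __name__ == "__main__":
    # Initiator plays the role of "cpu2"; PCPUs 0 and 1 are modeled as hung.
    targets = [Cpu(0, responsive=False), Cpu(1, responsive=False), Cpu(3)]
    print("PCPUs that failed to ack:", tlb_shootdown(targets))  # [0, 1]
```

The model only shows why a single unresponsive CPU is fatal to the handshake; it says nothing about why a CPU stopped responding, which is the hardware-versus-software question raised in this thread.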



  • 2.  RE: PSOD ESXi 6 - PCPU 1 locked up. Failed to ack TLB invalidate

    Posted Dec 24, 2017 03:41 PM

    First, read and understand this KB if you haven't already. Second, understand that with whitebox servers (i.e. unsupported hardware) your results may be unpredictable and stability is not guaranteed. This is one of many possible side effects.



  • 3.  RE: PSOD ESXi 6 - PCPU 1 locked up. Failed to ack TLB invalidate

    Posted Dec 25, 2017 05:43 PM

    I did read the article beforehand, but could not extract any useful information other than:

    The "Failed to ack TLB invalidate" error is caused by either a hardware or a software issue.

    I'd just like to know whether someone can pull relevant information from the log to determine whether hardware or software is at fault.

    I understand the consequences of running ESXi on a whitebox, but I have been running them in my home lab for years.

    Thanks



  • 4.  RE: PSOD ESXi 6 - PCPU 1 locked up. Failed to ack TLB invalidate

    Posted Dec 28, 2017 06:19 PM

    Let's hope it was a software bug.

    I updated ESXi 6.0 U3 to version 6.0.0-3.79.6921384.
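Whether the PSOD recurs after an update, as asked in the next reply, can be checked by comparing build numbers: each PSOD banner line records the running build, and the version string above appears to end in the build number. The Python sketch below extracts both. The helper names are hypothetical, and treating the last dot-separated field of the version string as the build number is an assumption based on the numbers quoted in this thread.

```python
import re


def banner_build(line):
    """Return the build number from a PSOD banner line, or None if absent."""
    match = re.search(r"Releasebuild-(\d+)", line)
    return int(match.group(1)) if match else None


def version_build(version):
    """Return the trailing build number from a version string such as
    '6.0.0-3.79.6921384' (assumption: the last dot-separated field is the build)."""
    match = re.search(r"\.(\d+)$", version.strip())
    return int(match.group(1)) if match else None


# Strings taken from this thread: the Dec 24 banner, the Dec 28 update,
# and the banner of the later Jan 22 PSOD.
first_psod = banner_build("VMware ESXi 6.0.0 [Releasebuild-6765062 x86_64]")
installed = version_build("6.0.0-3.79.6921384")
second_psod = banner_build("VMware ESXi 6.0.0 [Releasebuild-7504637 x86_64]")

print(first_psod, installed, second_psod)  # 6765062 6921384 7504637
print("second PSOD on a newer build than the Dec 28 update:", second_psod > installed)  # True
```

Applied to this thread, the January 22 PSOD (build 7504637) happened on a newer build than both the December 24 PSOD (build 6765062) and the 6921384 update, so the crash did recur after upgrading.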



  • 5.  RE: PSOD ESXi 6 - PCPU 1 locked up. Failed to ack TLB invalidate

    Posted Dec 30, 2017 07:44 AM

    You could check firmware and driver compatibility. Can you try upgrading the ESXi build and see whether the PSOD recurs?

    Regards,

    Randhir



  • 6.  RE: PSOD ESXi 6 - PCPU 1 locked up. Failed to ack TLB invalidate

    Posted Jan 22, 2018 10:01 PM

    Another PSOD today. #PF Exception 14

    It all seems to point toward a hardware issue. I'm going to need to run some memory and CPU tests :(

    2018-01-22T20:00:01.318Z cpu0:422337)World: 9762: PRDA 0x418040000000 ss 0x0 ds 0x10b es 0x10b fs 0x0 gs 0x13b

    2018-01-22T20:00:01.318Z cpu0:422337)World: 9764: TR 0x4020 GDT 0x43944e0a1000 (0x402f) IDT 0x4180310ca000 (0xfff)

    2018-01-22T20:00:01.318Z cpu0:422337)World: 9765: CR0 0x80010031 CR3 0x16da26000 CR4 0x42768

    2018-01-22T20:00:01.322Z cpu0:422337)Backtrace for current CPU #0, worldID=422337, rbp=0x4308c7b5ac70

    2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bbf0:[0x418031347e7f]PT_GetNextLevel@vmkernel#nover+0x1b stack: 0x4308c7b5ac70, 0x43944e0

    2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc20:[0x418031347f78]PT_GetL1Table@vmkernel#nover+0x24 stack: 0x0, 0x1d, 0x0, 0x3ffffffff

    2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc30:[0x418031648446]UserPT_LookupPageTable@<None>#<None>+0x4e stack: 0x0, 0x3fffffffff,

    2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc80:[0x4180315e09e1]UserMem_HandleMapFault@<None>#<None>+0x865 stack: 0x418040901e00, 0x

    2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bec0:[0x4180315c6f82]User_Exception@<None>#<None>+0x126 stack: 0x0, 0x43944e09bf30, 0x439

    2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bf10:[0x418031055953]Int14_PF@vmkernel#nover+0x17f stack: 0x0, 0x4180310c8067, 0x0, 0x13b

    2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bf30:[0x4180310c8067]gate_entry_@vmkernel#nover+0x0 stack: 0x0, 0xa5a5c8b, 0xfff35d94, 0x

    2018-01-22T20:00:01.323Z cpu0:422337)VMware ESXi 6.0.0 [Releasebuild-7504637 x86_64]

    #PF Exception 14 in world 422337:hostd-probe IP 0x418031347e7f addr 0x6e2e8d

    PTEs:0x6e2f04027;0x24ae88027;0x0;

    2018-01-22T20:00:01.323Z cpu0:422337)cr0=0x80010031 cr2=0x6e2e8d cr3=0x16da26000 cr4=0x42768

    2018-01-22T20:00:01.323Z cpu0:422337)frame=0x43944e09bb30 ip=0x418031347e7f err=2 rflags=0x10297

    2018-01-22T20:00:01.323Z cpu0:422337)rax=0x6e2f04 rbx=0xa5a5 rcx=0xffff81016da26001

    2018-01-22T20:00:01.323Z cpu0:422337)rdx=0xa5a5 rbp=0x4308c7b5ac70 rsi=0x6e2f04

    2018-01-22T20:00:01.323Z cpu0:422337)rdi=0x3 r8=0x43006200e180 r9=0xffff8101c8ea2

    2018-01-22T20:00:01.323Z cpu0:422337)r10=0xffff8101c8ea2d28 r11=0x0 r12=0x43944e09be58

    2018-01-22T20:00:01.323Z cpu0:422337)r13=0x3fffffffff r14=0x0 r15=0x4308c7b5ac70

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:0 world:422337 name:"hostd-probe" (U)

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:1 world:35576 name:"vmm0:BackupSvr" (V)

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:2 world:35599 name:"vmm0:ARES" (V)

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:3 world:35580 name:"vmm3:BackupSvr" (V)

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:4 world:422336 name:"python" (U)

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:5 world:35579 name:"vmm2:BackupSvr" (V)

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:6 world:35578 name:"vmm1:BackupSvr" (V)

    2018-01-22T20:00:01.323Z cpu0:422337)pcpu:7 world:35647 name:"vmm1:vCenterApp" (V)

    2018-01-22T20:00:01.323Z cpu0:422337)@BlueScreen: #PF Exception 14 in world 422337:hostd-probe IP 0x418031347e7f addr 0x6e2e8d

    PTEs:0x6e2f04027;0x24ae88027;0x0;

    2018-01-22T20:00:01.324Z cpu0:422337)Code start: 0x418031000000 VMK uptime: 7:08:07:11.719

    2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bbf0:[0x418031347e7f]PT_GetNextLevel@vmkernel#nover+0x1b stack: 0x4308c7b5ac70

    2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc20:[0x418031347f78]PT_GetL1Table@vmkernel#nover+0x24 stack: 0x0

    2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc30:[0x418031648446]UserPT_LookupPageTable@<None>#<None>+0x4e stack: 0x0

    2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc80:[0x4180315e09e1]UserMem_HandleMapFault@<None>#<None>+0x865 stack: 0x418040901e00

    2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bec0:[0x4180315c6f82]User_Exception@<None>#<None>+0x126 stack: 0x0

    2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bf10:[0x418031055953]Int14_PF@vmkernel#nover+0x17f stack: 0x0

    2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bf30:[0x4180310c8067]gate_entry_@vmkernel#nover+0x0 stack: 0x0

    2018-01-22T20:00:01.326Z cpu0:422337)base fs=0x0 gs=0x418040000000 Kgs=0x0
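The `err=2` field on the frame line is the x86 page-fault error code, and `cr2` holds the faulting address (0x6e2e8d, matching the `addr` in the #PF line). The Python sketch below decodes the error code using the bit layout from the Intel/AMD architecture manuals, and decodes the three `PTEs:` values as x86-64 paging-structure entries. Reading that list as the page-table walk for the faulting address, top level first, is an assumption, and the helper names are hypothetical. Under that reading, err=2 means a supervisor-mode write to a non-present page, and the walk reaches a zero (not-present) entry after two present ones, which is consistent with the "non-present" bit. By itself this decode does not tell hardware and software apart.

```python
# x86-64 paging-structure entry flag bits. Bits 6-8 mean slightly different
# things at different paging levels; this sketch uses their leaf-entry names.
ENTRY_FLAGS = [
    (0, "P"),       # present
    (1, "R/W"),     # writable
    (2, "U/S"),     # user-accessible
    (3, "PWT"),     # page-level write-through
    (4, "PCD"),     # page-level cache disable
    (5, "A"),       # accessed
    (6, "D"),       # dirty (leaf entries only)
    (7, "PS/PAT"),  # page size in upper levels, PAT in a 4 KiB PTE
    (8, "G"),       # global (leaf entries only)
    (63, "XD"),     # execute disable
]
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # physical address bits 12..51


def decode_pf_error(err):
    """Decode the low bits of an x86 #PF (vector 14) error code."""
    return {
        "cause": "protection violation" if err & 0x1 else "non-present page",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "supervisor",
        "reserved_bit_set": bool(err & 0x8),
        "instruction_fetch": bool(err & 0x10),
    }


def decode_entry(entry):
    """Split a paging-structure entry into its set flag names and physical address."""
    flags = [name for bit, name in ENTRY_FLAGS if entry & (1 << bit)]
    return flags, entry & ADDR_MASK


print(decode_pf_error(0x2))
# {'cause': 'non-present page', 'access': 'write', 'mode': 'supervisor',
#  'reserved_bit_set': False, 'instruction_fetch': False}
for raw in (0x6E2F04027, 0x24AE88027, 0x0):  # the "PTEs:" values from the log
    flags, phys = decode_entry(raw)
    print(hex(raw), flags, hex(phys))
# 0x6e2f04027 ['P', 'R/W', 'U/S', 'A'] 0x6e2f04000
# 0x24ae88027 ['P', 'R/W', 'U/S', 'A'] 0x24ae88000
# 0x0 [] 0x0
```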