VMware vSphere

  • 1.  ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Sep 27, 2018 04:01 PM

    Hey guys, hopefully somebody can help with this. I lost one of my datastores after rebooting an ESXi 6.7.0 host (the VMs were shut down and the host was in maintenance mode), and it no longer shows up in the storage/datastore tab of ESXi.

    However, the VMFS partition is still displayed when viewing the storage device structure. VOMA shows the output below; I assume the ON-DISK ERROR is the culprit. Manually mounting the datastore by UUID doesn't work, and VOMA doesn't have a fix option for VMFS-6 yet, so I'm not sure where to go from here. Hopefully someone can point me in the right direction. Thanks in advance.

    Phase 1: Checking VMFS header and resource files
       Detected VMFS-6 file system (labeled:'Primary') with UUID:5b0440a2-7dbb4c4b-de69-a0369fe03066, Version 6:82
       Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
             gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00008 offset 16195584 v 6, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 81
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00002 offset 9928704 v 6, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 35
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00002 offset 16392192 v 6, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 29
             num 0 gblnum 0 gblgen 0 gblbrk 0]
      Cluster 785 unmap lock set while no pending unmaps, stale lock
    ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
       Found stale lock [type 10c00002 offset 16465920 v 4, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 32
             num 0 gblnum 0 gblgen 0 gblbrk 0]
    Phase 2: Checking VMFS heartbeat region
    Marking Journal addr (14, 0) in use
    Phase 3: Checking all file descriptors.
    Phase 4: Checking pathname and connectivity.
    Phase 5: Checking resource reference counts.
    Total Errors Found:           1

    The vmkernel log also shows this warning several times:

    2018-09-26T17:13:18.685Z cpu2:2097320)WARNING: Vol3: 3102: Primary/5b0440a2-7dbb4c4b-de69-a0369fe03066: Invalid physDiskBlockSize 512
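
    For anyone triaging a similar report, here is a rough sketch of how a saved copy of the VOMA output could be summarized with standard shell tools. The sample lines are an excerpt copied from the output above, and the scratch file is just a temporary stand-in for wherever you saved the report. It only counts lines; it does not interpret or repair anything.

```shell
#!/bin/sh
# Sketch only: tally stale locks, ON-DISK errors and distinct lock owners
# in a saved VOMA report. The excerpt below is copied from this thread.
report=$(mktemp)
cat > "$report" <<'EOF'
Phase 1: Checking VMFS header and resource files
   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
   Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
         gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
   Found stale lock [type 10c00008 offset 16195584 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 81
ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
Total Errors Found:           1
EOF

# Number of stale-lock records and ON-DISK errors in the excerpt.
stale_locks=$(grep -c 'Found stale lock' "$report")
on_disk_errors=$(grep -c '^ *ON-DISK ERROR' "$report")

# Distinct owner UUIDs named in the lock records.
owners=$(sed -n 's/.* owner \([0-9a-f-]*\) mtime.*/\1/p' "$report" | sort -u | wc -l | tr -d ' ')

echo "stale locks: $stale_locks, on-disk errors: $on_disk_errors, distinct owners: $owners"
rm -f "$report"
```

    Run against the full output, the same commands make it easy to see whether the stale locks come from one owner or from several.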



  • 2.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Sep 27, 2018 07:18 PM

    Hello

    Have a look at "Locked files with VMFS 6 | VM-Sickbay".
    If you want me to take a closer look, create a VMFS header dump; see
    "Create a VMFS-Header-dump using an ESXi-Host in production | VM-Sickbay".
    Ulli



  • 3.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Sep 28, 2018 01:40 AM

    I've made a header backup and uploaded it here and attached it. Replacing the heartbeat section with a clean one did not resolve the issue. This header dump was taken before I overwrote the corrupted partition's heartbeat section. Thanks for your help so far.

    Edit: here's the new VOMA output as well.

    Checking if device is actively used by other hosts
    Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
    Running VMFS Checker version 2.1 in default mode
    Initializing LVM metadata, Basic Checks will be done
    Phase 1: Checking VMFS header and resource files
       Detected VMFS-6 file system (labeled:'Primary') with UUID:5b0440a2-7dbb4c4b-de69-a0369fe03066, Version 6:82
       Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
             gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00008 offset 16195584 v 6, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 81
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00002 offset 9928704 v 6, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 35
             num 0 gblnum 0 gblgen 0 gblbrk 0]
       Found stale lock [type 10c00002 offset 16392192 v 6, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 29
             num 0 gblnum 0 gblgen 0 gblbrk 0]
      Cluster 785 unmap lock set while no pending unmaps, stale lock
    ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
       Found stale lock [type 10c00002 offset 16465920 v 4, hb offset 3837952
             gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 32
             num 0 gblnum 0 gblgen 0 gblbrk 0]
    Phase 2: Checking VMFS heartbeat region
    Phase 3: Checking all file descriptors.
    Phase 4: Checking pathname and connectivity.
    Phase 5: Checking resource reference counts.
    ON-DISK ERROR: JBC inconsistency found: (14,0) allocated in bitmap, but never used
    Total Errors Found:           2



  • 4.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Sep 28, 2018 02:56 AM

    Just downloaded the dump ...
    This is a tough one ...
    OSF-Windows-Server-2016 seems readable; OSF-CentOS-Plesk has a problem.
    I will definitely need more time for this.
    Ulli



  • 5.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Sep 28, 2018 11:23 PM

    The Plesk VM is not strictly necessary; I have a fairly recent complete backup of it.



  • 6.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Sep 30, 2018 11:51 PM

    Please run the command
    dd if=/dev/disks/device bs=1M count=10 skip=278540 of=/tmp/test.bin
    where device is the same device you used to create the VMFS header dump.
    Then download /tmp/test.bin.
    Compress the file and attach it to your next reply.
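
    As a side note on what the dd command above reads: with bs=1M, skip=278540 skips that many 1 MiB blocks of the input (roughly 272 GiB into the device) and count=10 then copies 10 MiB. The sketch below works out the byte offsets and shows the same skip/count behaviour on a tiny scratch file, so it is safe to run anywhere. The scratch files and the 4-byte block size are made up for illustration and have nothing to do with the real datastore.

```shell
#!/bin/sh
# Byte arithmetic for: dd bs=1M count=10 skip=278540
bs=$((1024 * 1024))      # 1 MiB block size
skip=278540              # input blocks skipped before reading
count=10                 # blocks copied

offset=$((skip * bs))    # first byte read from the device
length=$((count * bs))   # total bytes copied
echo "read starts at byte $offset and copies $length bytes"

# Same semantics on a 16-byte scratch file with 4-byte blocks:
# skip=2 jumps over "AAAABBBB", count=1 copies the next block.
src=$(mktemp)
out=$(mktemp)
printf 'AAAABBBBCCCCDDDD' > "$src"
dd if="$src" of="$out" bs=4 skip=2 count=1 2>/dev/null
chunk=$(cat "$out")
echo "$chunk"            # prints CCCC
rm -f "$src" "$out"
```

    Because skip applies to the input side only, the output file always starts at byte 0 and ends up exactly count blocks long.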



  • 7.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Oct 01, 2018 12:35 AM

    Here you go. Thanks again



  • 8.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Oct 01, 2018 04:20 AM

    Please look at this partition table. Is this the Windows boot disk you need?

    If yes, install AnyDesk and call me or send a message via Skype.
    Ulli



  • 9.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Oct 01, 2018 05:17 PM

    Please let me know if you are still interested.
    The success rate of such operations is much better if there is no unnecessary delay between steps.



  • 10.  RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

    Posted Oct 01, 2018 07:21 PM

    Yes, I am. The partition table looks about right for the Windows disk. I'll contact you on Skype shortly.