VMware vSphere


ESXi VMFS-6 Datastore Corruption After Host Reboot

1. ESXi VMFS-6 Datastore Corruption After Host Reboot

ic2018
Posted Sep 27, 2018 04:01 PM

Hey guys, hopefully somebody can help with this. I lost one of my datastores after rebooting an ESXi 6.7.0 host (the VMs were shut down and the host was in maintenance mode), and it no longer shows up in the storage/datastore tab of ESXi.
However, the VMFS partition is still displayed when viewing the storage device structure. VOMA shows the output below; I assume the ON-DISK ERROR is the culprit. Manually mounting the datastore by UUID doesn't work, and VOMA doesn't have a fix option for VMFS-6 yet, so I'm not sure where to go from here. Hopefully someone can point me in the right direction. Thanks in advance.
Phase 1: Checking VMFS header and resource files
   Detected VMFS-6 file system (labeled:'Primary') with UUID:5b0440a2-7dbb4c4b-de69-a0369fe03066, Version 6:82
   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
         gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00008 offset 16195584 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 81
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 9928704 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 35
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 16392192 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 29
         num 0 gblnum 0 gblgen 0 gblbrk 0]
Cluster 785 unmap lock set while no pending unmaps, stale lock
ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
   Found stale lock [type 10c00002 offset 16465920 v 4, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 32
         num 0 gblnum 0 gblgen 0 gblbrk 0]
Phase 2: Checking VMFS heartbeat region
Marking Journal addr (14, 0) in use
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found:           1
The vmkernel log also shows this warning several times:
2018-09-26T17:13:18.685Z cpu2:2097320)WARNING: Vol3: 3102: Primary/5b0440a2-7dbb4c4b-de69-a0369fe03066: Invalid physDiskBlockSize 512
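
To see at a glance which owners and heartbeat offsets the stale locks in the VOMA output point at, the lock entries can be tallied. This is a minimal Python sketch, assuming the output has been saved as text and that every entry follows the three-line layout shown above. The regular expression and the function name `summarize_stale_locks` are illustrative and inferred from this sample only; they are not part of VOMA.

```python
import re
from collections import Counter

# Matches one "Found stale lock [...]" entry as printed in this thread.
# The field layout is inferred from the sample output (an assumption).
LOCK_RE = re.compile(
    r"Found stale lock \[type (?P<type>\w+) offset (?P<offset>\d+) v (?P<ver>\d+), "
    r"hb offset (?P<hb>\d+)\s+gen (?P<gen>\d+), mode (?P<mode>\d+), "
    r"owner (?P<owner>[0-9a-f-]+) mtime (?P<mtime>\d+)"
)

def summarize_stale_locks(voma_text):
    """Count stale locks per (owner UUID, heartbeat offset) pair."""
    counts = Counter()
    for m in LOCK_RE.finditer(voma_text):
        counts[(m.group("owner"), int(m.group("hb")))] += 1
    return counts

# Two entries copied from the Phase 1 output above.
sample = """\
   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
         gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
         num 0 gblnum 0 gblgen 0 gblbrk 0]
"""

for (owner, hb), n in summarize_stale_locks(sample).items():
    print(f"owner {owner} hb offset {hb}: {n} stale lock(s)")
```

Run against the full Phase 1 output, this would show five of the six stale locks under owner 5baba25d-063a88f4-62a5-a0369fe03066 (hb offset 3837952) and one under owner 5bab9ade-3cf65242-a144-a0369fe03066 (hb offset 3833856).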

2. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

continuum
Posted Sep 27, 2018 07:18 PM

Hello,
Have a look at "Locked files with VMFS 6" on VM-Sickbay.
If you want me to take a closer look, create a VMFS header dump; see "Create a VMFS-Header-dump using an ESXi-Host in production" on VM-Sickbay.
Ulli

3. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

ic2018
Posted Sep 28, 2018 01:40 AM

I've made a header backup, uploaded it here, and attached it. Replacing the heartbeat section with a clean one did not resolve the issue. This header dump was taken before I overwrote the corrupted partition's heartbeat section. Thanks for your help so far.
Edit: here is a new VOMA output as well:
Checking if device is actively used by other hosts
Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
Running VMFS Checker version 2.1 in default mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
   Detected VMFS-6 file system (labeled:'Primary') with UUID:5b0440a2-7dbb4c4b-de69-a0369fe03066, Version 6:82
   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
         gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00008 offset 16195584 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 81
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 9928704 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 35
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 16392192 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 29
         num 0 gblnum 0 gblgen 0 gblbrk 0]
Cluster 785 unmap lock set while no pending unmaps, stale lock
ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
   Found stale lock [type 10c00002 offset 16465920 v 4, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 32
         num 0 gblnum 0 gblgen 0 gblbrk 0]
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
ON-DISK ERROR: JBC inconsistency found: (14,0) allocated in bitmap, but never used
Total Errors Found:           2

4. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

continuum
Posted Sep 28, 2018 02:56 AM

I just downloaded the dump.
This is a tough one.
OSF-Windows-Server-2016 seems readable; OSF-CentOS-Plesk has a problem.
I will definitely need more time for this.
Ulli

5. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

ic2018
Posted Sep 28, 2018 11:23 PM

The Plesk VM is not strictly necessary; I have a fairly recent complete backup of it.

6. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

continuum
Posted Sep 30, 2018 11:51 PM

Please run this command:
dd if=/dev/disks/device bs=1M count=10 skip=278540 of=/tmp/test.bin
Here, device is the same device you used to create the VMFS header dump.
Download /tmp/test.bin, compress it, and attach it to your next reply.
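
For reference, the dd command above copies a 10 MiB window that begins 278540 MiB from the start of whichever device path is used. The arithmetic can be sketched in Python, assuming dd interprets bs=1M as 1,048,576 bytes; the helper name `dd_window` is illustrative.

```python
MIB = 1024 * 1024  # assumed meaning of dd's bs=1M

def dd_window(skip_blocks, count_blocks, block_size=MIB):
    """Return (start_byte, end_byte_exclusive) of the region dd reads."""
    start = skip_blocks * block_size
    return start, start + count_blocks * block_size

start, end = dd_window(skip_blocks=278540, count_blocks=10)
print("start byte:", start)        # 292070359040
print("end byte:  ", end)          # 292080844800
print("length:    ", end - start)  # 10485760 (10 MiB)
```

The resulting test.bin should therefore be exactly 10,485,760 bytes if the read completes.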

7. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

ic2018
Posted Oct 01, 2018 12:35 AM

Here you go. Thanks again.

8. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

continuum
Posted Oct 01, 2018 04:20 AM

Please look at this partition table. Is this the Windows boot disk you need?
If yes, install AnyDesk and call me or send me a message via Skype.
Ulli

9. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

continuum
Posted Oct 01, 2018 05:17 PM

Please let me know if you are still interested.
The success rate of such operations is much better if there is no unnecessary delay between the steps.

10. RE: ESXi VMFS-6 Datastore Corruption After Host Reboot

ic2018
Posted Oct 01, 2018 07:21 PM

Yes, I am. The partition table looks about right for the Windows disk. I'll contact you on Skype shortly.
