ESXi

  • 1.  After recovery of a failed raid array my datastore is missing

    Posted Sep 03, 2014 08:18 PM

    Hi all,

    I'm running ESXi at home and have run into a problem I need some help with :)

    First off, let me start with the specs:

    ESXi: 5.1.0, build 799733

    RAID Controller: Adaptec 6805 (Community supported driver)

    RAID Configuration: RAID 5, 12.71 TB disk

    I had some problems with my RAID controller, which I resolved using the Adaptec KB article "Can a failed array be recovered?". I ended up deleting the array and recreating it.

    The RAID array is now back in the OPTIMAL state, so I booted ESXi and logged in with the vSphere Client, only to find all my VMs listed as Unknown (inaccessible).

    I started googling and found some information, but so far nothing that fixes this issue.

    I tried the information from these links:

    - After RAID5 problems which are now solved, I am not able to mount VMFS

    - VMware KB: Recreating a missing VMFS datastore partition in VMware vSphere 5.0/5.1/5.5

    ~ #  df -k

    Filesystem 1k-blocks      Used Available Use% Mounted on

    VMFS-5     483131392 319600640 163530752  66% /vmfs/volumes/500GB

    vfat         4192960    141952   4051008   3% /vmfs/volumes/51f45583-c7047877-9ff7-0015178fcc59

    vfat          255716    138872    116844  54% /vmfs/volumes/55f80475-edb62e49-366c-1f1975e107c2

    vfat          255716    135420    120296  53% /vmfs/volumes/8367f77f-57cb0e3e-888b-046f29890a95

    vfat          292688    206792     85896  71% /vmfs/volumes/51f4556a-e6c74124-5a63-0015178fcc59

    ~ # fdisk -l

    ***

    *** The fdisk command is deprecated: fdisk does not handle GPT partitions.  Please use partedUtil

    ***

    Disk /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000: 500.1 GB, 500107862016 bytes

    255 heads, 63 sectors/track, 60801 cylinders

    Units = cylinders of 16065 * 512 = 8225280 bytes

                                                                                         Device Boot      Start         End      Blocks  Id System

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p1               1         115      917504   5 Extended

    Partition 1 does not end on cylinder boundary

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p2             115         637     4193280   6 FAT16

    Partition 2 does not end on cylinder boundary

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p3             637       60802   483271704  fb VMFS

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p4   *           1           1        4080   4 FAT16 <32M

    Partition 4 does not end on cylinder boundary

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p5               1          33      255984   6 FAT16

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p6              33          65      255984   6 FAT16

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p7              65          79      112624  fc VMKcore

    /dev/disks/t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000p8              79         115      292848   6 FAT16

    Partition table entries are not in disk order

    fdisk: device has more than 2^32 sectors, can't use all of them

    Found valid GPT with protective MBR; using GPT

    Disk /dev/disks/mpx.vmhba1:C0:T0:L0: 4294967295 sectors, 4095M

    Logical sector size: 512

    Disk identifier (GUID): 0652d5d8-10f3-4264-9d35-1e12b2d23b76

    Partition table holds up to 128 entries

    First usable sector is 34, last usable sector is 27304898526

    Number  Start (sector)    End (sector)  Size       Code  Name

       1            2048     27304898526       25.4G   0700

    ~ # esxcfg-volume -l

    <No results>

    # ls /vmfs/devices/disks/

    mpx.vmhba1:C0:T0:L0                                                               vml.0100000000343130313131435043353336363200000000000053414d53554e

    mpx.vmhba1:C0:T0:L0:1                                                             vml.0100000000343130313131435043353336363200000000000053414d53554e:1

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000    vml.0100000000343130313131435043353336363200000000000053414d53554e:2

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:1  vml.0100000000343130313131435043353336363200000000000053414d53554e:3

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:2  vml.0100000000343130313131435043353336363200000000000053414d53554e:4

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:3  vml.0100000000343130313131435043353336363200000000000053414d53554e:5

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:4  vml.0100000000343130313131435043353336363200000000000053414d53554e:6

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:5  vml.0100000000343130313131435043353336363200000000000053414d53554e:7

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:6  vml.0100000000343130313131435043353336363200000000000053414d53554e:8

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:7  vml.01000000003630356634396164524149443520

    t10.ATA_____SAMSUNG_HD501LJ_________________________410111CPC53662000000000000:8  vml.01000000003630356634396164524149443520:1
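    `ls` lays the names out in alphabetically sorted columns, so a vml name is not necessarily the link for the device printed next to it. The hex part of a vml name appears to embed readable ASCII from the device's identifiers, so decoding it is a quick way to tell which vml entry belongs to which device (`ls -l /vmfs/devices/disks/` shows the link targets directly, since the vml entries are symlinks). A minimal sketch that decodes two of the names above, treating everything after `vml.` as raw hex bytes; that is an assumption about the naming format, not a documented layout, and `hex_to_text` is a hypothetical helper:

```shell
#!/bin/sh
# Two vml names copied from the /vmfs/devices/disks listing above (without "vml.").
samsung_vml="0100000000343130313131435043353336363200000000000053414d53554e"
raid_vml="01000000003630356634396164524149443520"

# Hypothetical helper: decode a hex string two digits at a time and print the
# bytes as ASCII, showing non-printable bytes as ".".
hex_to_text() {
  printf '%s\n' "$1" | awk '{
    digits = "0123456789abcdef"; hex = tolower($0); out = ""
    for (i = 1; i < length(hex); i += 2) {
      n = (index(digits, substr(hex, i, 1)) - 1) * 16 + index(digits, substr(hex, i + 1, 1)) - 1
      c = (n >= 32 && n < 127) ? sprintf("%c", n) : "."
      out = out c
    }
    print out
  }'
}

samsung_text=$(hex_to_text "$samsung_vml")
raid_text=$(hex_to_text "$raid_vml")
echo "$samsung_text"
echo "$raid_text"
```

    Decoded this way, the first name contains the Samsung drive's serial (410111CPC53662, the same one in the t10 name) and the second contains "605f49adRAID5", which looks like the Adaptec RAID 5 logical drive.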

    # partedUtil getptbl /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0

    gpt

    1699651 255 63 27304898560

    1 2048 27304898526 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
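    The `getptbl` output can be checked by hand: the first line is the disk geometry (cylinders, heads, sectors per track, total sectors) and each following line is a partition (number, start sector, end sector, type GUID, type name, attribute). A small sketch using the values above, assuming 512-byte sectors and a 128-entry GPT (both match the fdisk output), that checks whether the VMFS partition still runs to the last usable sector, i.e. whether the partition table still spans the whole array:

```shell
#!/bin/sh
# Geometry and partition lines copied from the partedUtil getptbl output above.
geometry="1699651 255 63 27304898560"
partition="1 2048 27304898526 AA31E02A400F11DB9590000C2911D1B8 vmfs 0"

# Geometry line: cylinders, heads, sectors per track, total sectors.
set -- $geometry
total_sectors=$4

# Partition line: number, start sector, end sector, type GUID, type name, attribute.
set -- $partition
part_num=$1 first=$2 last=$3 type_guid=$4 type_name=$5

# The backup GPT (one header sector plus 32 sectors holding 128 entries of
# 128 bytes) fills the last 33 sectors, so the last usable sector is total - 34.
last_usable=$(( total_sectors - 34 ))

echo "partition $part_num: sectors $first-$last, type $type_name ($type_guid)"
if [ "$last" -eq "$last_usable" ]; then
  echo "partition ends at the last usable sector ($last_usable)"
else
  echo "partition ends at $last, but the last usable sector is $last_usable"
fi
```

    Here the end sector equals the "last usable sector is 27304898526" reported by fdisk, so the partition table itself looks intact.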

    [EDIT] I hadn't noticed before that the mpx.vmhba1 device had an additional entry (mpx.vmhba1:C0:T0:L0:1) in /vmfs/devices/disks:

    #  partedUtil getptbl /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1

    unknown

    1699651 255 63 27304896479
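    The `:1` entry is not an extra disk: its size, 27304896479 sectors, is exactly the length of partition 1 in the table above, so it is the partition device of the same LUN. "unknown" just means partedUtil found no partition table inside the partition, which is expected when the partition holds VMFS directly. A quick check of the arithmetic, assuming 512-byte sectors as reported by fdisk:

```shell
#!/bin/sh
# Values from the partedUtil output above.
first=2048
last=27304898526          # end sector of partition 1 (inclusive)
sector_size=512           # "Logical sector size: 512" in the fdisk output

part_sectors=$(( last - first + 1 ))
part_bytes=$(( part_sectors * sector_size ))

# Size in TiB with two decimals, using integer arithmetic only.
hundredths=$(( part_bytes * 100 / 1099511627776 ))
printf 'partition 1: %s sectors, %s bytes, %d.%02d TiB\n' \
  "$part_sectors" "$part_bytes" $(( hundredths / 100 )) $(( hundredths % 100 ))
```

    The partition comes out at about 12.71 TiB, in line with the array size listed in the specs (the controller figure appears to be in binary terabytes).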

    I attached vmkernel.log

    I hope someone can point me in the right direction. I have a backup of my critical/personal data, but it would be nice to recover the rest as well...

    Thanks in advance for the help!

    Message was edited by: Sebastiaan Grob (added info about /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1)



  • 2.  RE: After recovery of a failed raid array my datastore is missing

    Posted Sep 05, 2014 03:15 PM

    Yesterday I found out that the newly created array lists the hard disks in a different order than before, which means the RAID array segments and the disks are no longer in the same order as they were.

    I tried to fix that by checking the RAID controller connectors (CN0/CN1) and the device connectors (Dev00..Dev03).

    After I reconnected the disks to match the new order, I ran esxcfg-volume -l, with the following result:

    ~ #  esxcfg-volume -l

    VMFS UUID/label: n.a./n.a.

    Can mount: No (some extents missing)

    Can resignature: No (some extents missing)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 524288 - 2094079 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 2094080 - 3664895 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 4189184 - 4711423 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 5767168 - 6284287 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 6284288 - 7330815 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 9425920 - 9433087 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 9433088 - 10996735 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 11521024 - 12045311 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 12569600 - 13091839 (MB)

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 13107200 - 13616127 (MB)
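    The listed ranges are not contiguous, which fits the "some extents missing" message: with the disks in the wrong order, LVM only recognizes some pieces of the volume. A small sketch that lists the holes between the ranges (values in MB, copied from the output above); reading each hole as an address range for which no extent was found is my interpretation of the output, and the script does not reconstruct anything:

```shell
#!/bin/sh
# Extent ranges in MB (start end), copied from the esxcfg-volume -l output above.
extents='524288 2094079
2094080 3664895
4189184 4711423
5767168 6284287
6284288 7330815
9425920 9433087
9433088 10996735
11521024 12045311
12569600 13091839
13107200 13616127'

# Walk the ranges in order, starting at 0 MB, and print every hole
# between the end of one range and the start of the next.
gaps=$(printf '%s\n' "$extents" | awk '
  BEGIN { next_start = 0 }
  {
    if ($1 > next_start) printf "%d-%d\n", next_start, $1 - 1
    next_start = $2 + 1
  }')

printf '%s\n' "$gaps"
```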

    I then installed a Windows 7 virtual machine on my server's boot disk (also a datastore, which I use for ESXi, ISOs and so on) and installed remote arcconf on it.

    The output of arcconf getconfig 1 differed from what I got before the problems started.

    So I sorted out the RAID config and ran the command again:

    ~ #  esxcfg-volume -l

    VMFS UUID/label: 51f4625a-6482a0da-c112-0015178fcc59/Adaptec RAID 1

    Can mount: Yes

    Can resignature: Yes

    Extent name: mpx.vmhba1:C0:T0:L0:1      range: 0 - 13332223 (MB)

    But when I run Rescan All in vSphere, it does not find the datastore.

    I'll keep googling, but if somebody can point me in the right direction, that would be nice ;)



  • 3.  RE: After recovery of a failed raid array my datastore is missing

    Posted Sep 05, 2014 03:25 PM

    From /var/log/vmkernel.log after running "vmkfstools -V":

    2014-09-05T16:25:04.888Z cpu2:8694)LVM: 8315: Device mpx.vmhba1:C0:T0:L0:1 detected to be a snapshot:

    2014-09-05T16:25:04.888Z cpu2:8694)LVM: 8322:   queried disk ID: <type 1, len 14, lun 0, devType 0, scsi 0, h(id) 15725092533628176734>

    2014-09-05T16:25:04.888Z cpu2:8694)LVM: 8329:   on-disk disk ID: <type 1, len 14, lun 0, devType 0, scsi 0, h(id) 6163779895204625397>

    2014-09-05T16:25:04.909Z cpu2:8694)Vol3: 692: Couldn't read volume header from control: Not supported

    2014-09-05T16:25:04.909Z cpu2:8694)Vol3: 692: Couldn't read volume header from control: Not supported

    2014-09-05T16:25:04.909Z cpu2:8694)FSS: 4972: No FS driver claimed device 'control': Not supported

    2014-09-05T16:25:04.916Z cpu2:8694)VC: 1547: Device rescan time 51 msec (total number of devices 6)

    2014-09-05T16:25:04.916Z cpu2:8694)VC: 1550: Filesystem probe time 20 msec (devices probed 4 of 6)
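    The first three lines are the key ones: ESXi's LVM compares the disk ID recorded in the VMFS metadata ("on-disk") with the ID of the device currently presenting the volume ("queried"), and when they differ it treats the volume as a snapshot and does not mount it automatically. Recreating the array most likely gave the LUN a new ID. A sketch that pulls both IDs out of the log lines above (copied verbatim; on a live host the same lines would come from /var/log/vmkernel.log); `extract_id` is a hypothetical helper:

```shell
#!/bin/sh
# Log lines copied from the vmkernel.log excerpt above.
log=$(cat <<'EOF'
2014-09-05T16:25:04.888Z cpu2:8694)LVM: 8315: Device mpx.vmhba1:C0:T0:L0:1 detected to be a snapshot:
2014-09-05T16:25:04.888Z cpu2:8694)LVM: 8322:   queried disk ID: <type 1, len 14, lun 0, devType 0, scsi 0, h(id) 15725092533628176734>
2014-09-05T16:25:04.888Z cpu2:8694)LVM: 8329:   on-disk disk ID: <type 1, len 14, lun 0, devType 0, scsi 0, h(id) 6163779895204625397>
EOF
)

# Hypothetical helper: print the "h(id)" number from the "<kind> disk ID:" line.
extract_id() {
  printf '%s\n' "$log" | sed -n "s/.*$1 disk ID:.*h(id) \([0-9]*\)>.*/\1/p"
}

queried=$(extract_id queried)
ondisk=$(extract_id on-disk)

# Compare as strings: 15725092533628176734 does not fit in a signed 64-bit integer.
if [ "$queried" != "$ondisk" ]; then
  echo "device ID changed: on-disk $ondisk, now $queried (volume treated as snapshot)"
fi
```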

    I found some information about this snapshot behavior on Google; looking into that now :)



  • 4.  RE: After recovery of a failed raid array my datastore is missing
    Best Answer

    Posted Sep 05, 2014 07:44 PM

    Running some more commands:

    ~ # esxcli storage vmfs snapshot list

    51f4625a-6482a0da-c112-0015178fcc59

       Volume Name: Adaptec RAID 1

       VMFS UUID: 51f4625a-6482a0da-c112-0015178fcc59

       Can mount: true

       Reason for un-mountability:

       Can resignature: true

       Reason for non-resignaturability:

       Unresolved Extent Count: 1

    Then I tried the Add Storage option as described in the KB article "VMware KB: vSphere handling of LUNs detected as snapshot LUNs", and my datastore is back, with no data loss!!
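    For reference, the same fix can probably also be done from the ESXi shell. A hedged sketch, assuming ESXi 5.x esxcli syntax: `esxcli storage vmfs snapshot mount` mounts the snapshot volume while keeping its existing signature, which corresponds to the Add Storage wizard's "Keep the existing signature" option (`esxcfg-volume -M <UUID>` is the older equivalent). Keeping the signature only works here because no other volume with the same UUID is mounted on the host. The `run` wrapper and `DRY_RUN` variable are my own additions: the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# VMFS UUID as reported by "esxcli storage vmfs snapshot list" above.
VMFS_UUID="51f4625a-6482a0da-c112-0015178fcc59"

# Print each command; execute it only when DRY_RUN=0 is set, so the
# commands can be reviewed before anything touches the host.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Mount the snapshot volume, keeping its existing signature (no resignature).
run esxcli storage vmfs snapshot mount --volume-uuid="$VMFS_UUID"
# Rescan VMFS volumes so the datastore shows up.
run vmkfstools -V
# List mounted filesystems to confirm the datastore is back.
run esxcli storage filesystem list
```

    Running it as `DRY_RUN=0 sh mount-snapshot.sh` (file name hypothetical) would execute the commands for real.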