vSAN1

  • 1.  Missing objects on vSAN

    Posted Jan 08, 2024 03:57 PM

    Hi

    Recently my lab vSAN suffered a power loss and simultaneous disk failures. I had backups for all my VMs but one. This VM's files cannot be found in the vSAN file list. I ran RVC and got this result:

    | Docker                                                                                  | 5 |     |
    | [vsanDatastore] cbad4460-12cf-702e-9b59-5cb9018acd30/Docker Server 2.0.vmx | | 0/4 |
    | [vsanDatastore] cbad4460-12cf-702e-9b59-5cb9018acd30/Docker Server 2.0-000001.vmdk | | 4/4 |
    | [vsanDatastore] cbad4460-12cf-702e-9b59-5cb9018acd30/Docker Server 2.0.vmdk | | 3/3 |
    | [vsanDatastore] cbad4460-12cf-702e-9b59-5cb9018acd30/Docker Server 2.0_1.vmdk | | 4/4 |
    | [vsanDatastore] cbad4460-12cf-702e-9b59-5cb9018acd30/Docker Server 2.0-15d3ae19.vswp | | 4/4 |

    I think that means that all vmdk files are fine.

    Next I got the UUID of one of the files by running esxcli vsan debug object list. Then I ran cmmds-tool find -t DOM_OBJECT -u 5ff54460-c4f4-c451-d0ca-5cb9018acd30 and got this:

    owner=5a679d85-2799-7550-1548-5cb9018ac7bc(Health: Healthy) uuid=5ff54460-c4f4-c451-d0ca-5cb9018acd30 type=DOM_OBJECT rev=535 minHostVer=3 [content = ("Configuration" (("CSN" l13114) ("SCSN" l12991) ("addressSpace" l42949672960) ("scrubStartTime" l+1698943167434358) ("objectVersion" i10) ("highestDiskVersion" i10) ("muxGroup" l217506247220509020) ("groupUuid" cbad4460-12cf-702e-9b59-5cb9018acd30) ("raidFact" i133) ("scrubEta" i8966287) ("compositeUuid" 5ff54460-c4f4-c451-d0ca-5cb9018acd30)) ("RAID_5" (("stripeBlockSize" l1048576) ("scope" i3)) ("Component" (("capacity" (l0 l14316208128)) ("addressSpace" l14316208128) ("componentState" l5) ("componentStateTS" l1702904798) ("faultDomainId" 5ea342a1-7b04-83a5-8a45-2c44fd958dd0) ("nVotes" i2) ("lastScrubbedOffset" l13281918976) ("subFaultDomainId" 5ea342a1-7b04-83a5-8a45-2c44fd958dd0)) 17976562-9ee0-158c-240b-001b21bbca50 528c01e7-1203-055a-c7c3-00848c29fa6e) ("Component" (("capacity" (l0 l14316208128)) ("addressSpace" l14316208128) ("componentState" l5) ("componentStateTS" l1702904798) ("faultDomainId" 61c8b1c6-eb3b-2ba4-7ec6-e8393512b3b5) ("lastScrubbedOffset" l13281918976) ("subFaultDomainId" 61c8b1c6-eb3b-2ba4-7ec6-e8393512b3b5)) ad8b6562-05a0-569b-12d1-001b21bbca50 52291d42-21f7-83b1-b615-1b0d3a989843) ("Component" (("capacity" (l0 l14317256704)) ("addressSpace" l14317256704) ("componentState" l5) ("componentStateTS" l1702921350) ("faultDomainId" 643c1dd3-7706-9e78-b158-5cb9018acd30) ("lastScrubbedOffset" l13281918976) ("subFaultDomainId" 643c1dd3-7706-9e78-b158-5cb9018acd30)) ca458065-53c2-2401-f0f0-5cb9018ac7bc 5216468f-2498-1e07-83bb-d738501dcd56) ("Component" (("capacity" (l0 l14317256704)) ("addressSpace" l14317256704) ("componentState" l5) ("componentStateTS" l1702904798) ("faultDomainId" 5a679d85-2799-7550-1548-5cb9018ac7bc) ("lastScrubbedOffset" l13281918976) ("subFaultDomainId" 5a679d85-2799-7550-1548-5cb9018ac7bc)) 2ee37b62-96d6-945c-666a-001b21bbca50 5233886f-675e-7ed1-92ed-9e9c790a052a)))], 
errorStr=(null)

    This claims that the file is healthy.

    However, I cannot get access to that file. I tried to create a new directory and change the file path attribute to point to that directory, but the directory remains empty.

    My question is, how can I get access to these files or can I?



  • 2.  RE: Missing objects on vSAN

    Posted Jan 08, 2024 04:25 PM

    You can try the following to access the missing file:

    • In vCenter, right-click the vSAN datastore and select "Rescan for Updates." This might refresh the inventory and reveal the files.
    • Use vdq -q object list to list objects and their health status. Pay close attention to the affected VM's objects.
    • Use cmmds-tool find -t DOM_OBJECT -u <UUID> to confirm component states for each VMDK. Ensure they're all "Healthy."


  • 3.  RE: Missing objects on vSAN

    Posted Jan 08, 2024 04:47 PM

    cmmds-tool find -t DOM_OBJECT -u <UUID> claims that all three vmdk files are healthy. The rescan did not help.



  • 4.  RE: Missing objects on vSAN

    Posted Jan 10, 2024 08:24 PM

    Sorry, but it is frowned upon to give blatantly incorrect information on tech forums. If you give someone a command without knowing what it does and it ends up causing more harm than good, that is not helping:


    "In vCenter, right-click the vSAN datastore and select "Rescan for Updates." This might refresh the inventory and reveal the files."
    No, that's not how vsanDatastore works.
    "Use vdq -q object list to list objects and their health status. Pay close attention to the affected VM's objects."
    'vdq' queries physical disks and their partitions; it has nothing to do with objects or VMs.
    "Use cmmds-tool find -t DOM_OBJECT -u <UUID> to confirm component states for each VMDK. Ensure they're all "Healthy.""
    No, it doesn't do this.

     

    | [vsanDatastore] cbad4460-12cf-702e-9b59-5cb9018acd30/Docker Server 2.0.vmx | | 0/4 |

    This indicates all 4 components of the namespace object (UUID: cbad4460-12cf-702e-9b59-5cb9018acd30 - the folder object where the .vmx and .vmdk descriptors reside) are inaccessible.
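
    The healthy/total column can also be checked mechanically. The following is a minimal sketch, assuming the RVC rows have been saved to a text file in the pipe-delimited layout shown above; flag_unhealthy is a hypothetical helper name, not part of RVC.

```shell
# Hypothetical helper: print rows of saved RVC output whose last non-empty
# column ("healthy/total" components) shows fewer healthy components than total.
flag_unhealthy() {
    awk -F'|' '
        {
            # Find the last non-empty field, e.g. " 0/4 ".
            ratio = ""
            for (i = NF; i > 0; i--) {
                f = $i
                gsub(/^[ \t]+|[ \t]+$/, "", f)
                if (f != "") { ratio = f; break }
            }
            # Skip rows that do not end in an N/M ratio (e.g. the VM name row).
            if (ratio !~ /^[0-9]+\/[0-9]+$/) next
            split(ratio, p, "/")
            if (p[1] + 0 < p[2] + 0) {
                name = $2
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                print ratio "\t" name
            }
        }
    ' "$1"
}
```

    Run against the rows from the first post, this would list only the .vmx row, since it is the only one where the first number is lower than the second.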

     

    "This claims that file is healthy. "

    No - this indicates that the CMMDS DOM_OBJECT entry itself is 'healthy'; this has zero bearing on the actual health of the object in question.
    That being said, yes, this object is healthy and all of its components are Active (componentState l5).
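
    To pull the per-component states out of a long cmmds-tool entry without reading it by eye, something like the sketch below can be used. It assumes the cmmds-tool output was saved to a file; count_component_states is a hypothetical helper. Only state 5 (Active) is explained in this thread, so any other value it reports would need to be looked up separately.

```shell
# Hypothetical helper: tally the "componentState" values found in saved
# cmmds-tool output. Prints one line per distinct value, e.g. "l5 4".
count_component_states() {
    grep -o '"componentState" l[0-9]*' "$1" |
        awk '{ n[$2]++ } END { for (s in n) print s, n[s] }' |
        sort
}
```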

     

    "However I cannot get access to that file. I tried create a new directory and change file path attribute to point in that directory but directory remains empty."
    Specifically what did you change/try?

     

    I would start by trying to determine why object cbad4460-12cf-702e-9b59-5cb9018acd30 has zero of its 4 components available. Look it up using:

    # esxcli vsan debug object list -u cbad4460-12cf-702e-9b59-5cb9018acd30


    This entry will also show which node is the current DOM-Owner of that object. Try abdicating the owner; this is basically a method of attempting to force a recheck of the object's component states.
    (Run from the node that is DOM-Owner of the object):

    # vsish -e set /vmkModules/vsan/dom/ownerAbdicate cbad4460-12cf-702e-9b59-5cb9018acd30


    Now recheck the object's component state using debug object list again - if the components are all now Active (or at least enough of them to make the object accessible again) then validate you can access the namespace object, register the VM and power it on.

     

    If that doesn't change anything ping back here and I will tell you plan B.
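
    The look-up, abdicate, recheck sequence above can be wrapped so that the commands are printed for review before anything is executed. This is only a sketch: run and recheck_object are hypothetical helper names, DRY_RUN is an assumed variable, and the commands themselves are the ones given in this post. It does not check that it is being run on the DOM-Owner node.

```shell
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Look up the object, abdicate its DOM owner, then look it up again.
# The abdicate step is meant to be run on the node that owns the object.
recheck_object() {
    uuid=$1
    run esxcli vsan debug object list -u "$uuid"
    run vsish -e set /vmkModules/vsan/dom/ownerAbdicate "$uuid"
    run esxcli vsan debug object list -u "$uuid"
}
```

    Calling recheck_object cbad4460-12cf-702e-9b59-5cb9018acd30 prints the three commands; setting DRY_RUN=0 first would execute them instead.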



  • 5.  RE: Missing objects on vSAN

    Posted Jan 11, 2024 04:57 PM

    Hi  

    I did what you asked me to do, but unfortunately it did not help. Here is the output of the last command you instructed me to run:

    Object UUID: cbad4460-12cf-702e-9b59-5cb9018acd30
       Version: 10
       Health: inaccessible - Lost data availability.
       Owner: cloud1.sveok.lan
       Size: 0.00 GB
       Used: 1.21 GB
       Policy:
       Configuration:
    
          RAID_5
             Component: d1d76562-8a7b-302d-19b1-001b21bbca50
               Component State: ACTIVE,  Address Space(B): 91268055040 (85.00GB),  Disk UUID: 52291d42-21f7-83b1-b615-1b0d3a989843,  Disk Name: naa.5002538f31189435:2
               Votes: 2,  Capacity Used(B): 666894336 (0.62GB),  Physical Capacity Used(B): 658505728 (0.61GB),  Host Name: cloud1.sveok.lan
             Component: 28abeb61-85aa-ac58-dd45-001b21bbca50
               Component State: ACTIVE,  Address Space(B): 91268055040 (85.00GB),  Disk UUID: 529dc36f-0c88-4f03-1830-23fbcf0f98f6,  Disk Name: naa.600508b1001ccb2f7e7c048e2f0f0fd2:2
               Votes: 1,  Capacity Used(B): 650117120 (0.61GB),  Physical Capacity Used(B): 641728512 (0.60GB),  Host Name: vsan2.sveok.lan
             Component: 3c013d64-853b-6d76-54f0-e8393512b1dd
               Component State: ABSENT,  Address Space(B): 91268055040 (85.00GB),  Disk UUID: 527bf033-f7d2-7fe1-d37a-f02a1c0f4035,  Disk Name: N/A
               Votes: 1,  Host UUID: None
             Component: 33774c62-877a-2191-ce4e-5cb9018acd30
               Component State: ABSENT,  Address Space(B): 91268055040 (85.00GB),  Disk UUID: 52518767-a0c0-6ef1-4f47-3c47da81d4aa,  Disk Name: N/A
               Votes: 1,  Host UUID: None
    
       Type: N/A
       Path: N/A
       Group UUID: cbad4460-12cf-702e-9b59-5cb9018acd30
       Directory Name: N/A

    To answer your earlier question: I created a new hard disk and edited its descriptors and attributes as you instructed in this message.
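
    For longer listings, the component states and votes in output like the above can be summarized with a short script. This is a sketch that assumes the esxcli vsan debug object list output for a single object was saved to a file and follows the "Component State: ..." / "Votes: ..." layout shown here; summarize_components is a hypothetical helper.

```shell
# Hypothetical helper: per component state, count components and sum votes.
summarize_components() {
    awk '
        /Component State:/ {
            # Keep only the word after "Component State:", e.g. ACTIVE.
            state = $0
            sub(/.*Component State: */, "", state)
            sub(/,.*/, "", state)
            count[state]++
        }
        /Votes:/ {
            # The Votes line follows its Component State line.
            v = $0
            sub(/.*Votes: */, "", v)
            sub(/,.*/, "", v)
            votes[state] += v
            total += v
        }
        END {
            for (s in count)
                printf "%s components=%d votes=%d/%d\n", s, count[s], votes[s], total
        }
    ' "$1" | sort
}
```

    For the object above this gives two ABSENT components holding 2 of 5 votes and two ACTIVE components holding 3 of 5, which matches the "Lost data availability" health line: half of the RAID_5 components are missing.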



  • 6.  RE: Missing objects on vSAN

    Posted Jan 12, 2024 06:40 AM

    Hi TheBobkin,

    Apologies for the delayed response. I replied earlier, but for some unknown reason, that response was marked as spam and removed.

    Anyway, I followed your instructions, but it didn't help. Two out of the four components changed status from absent to active, but the object is still missing. vSAN shows it as an unknown object.

    Regarding your earlier question about what I did: I created a dummy VM with the exact same disk size as the original disk. Then, I edited the vmdk file descriptors to point to the original vmdk files. I also edited the attributes of the original files to point to these dummy files. You suggested this process in one of your messages in 2017



  • 7.  RE: Missing objects on vSAN

    Posted Jan 12, 2024 08:03 PM

     

    "Regarding your earlier question about what I did: I created a dummy VM with the exact same disk size as the original disk. Then, I edited the vmdk file descriptors to point to the original vmdk files. I also edited the attributes of the original files to point to these dummy files. You suggested this process in one of your messages in 2017"

    Yes, this was plan B but it should work so we should check exactly what you did to ensure it is correct - can you send the link of the Communities thread you are referencing?

     

    You mentioned "I tried create a new directory and change file path attribute to point in that directory but directory remains empty."
    The new directory should just be the namespace of the donor-VM you created, what do you mean this is empty?

     

    Can you attach (or PM me if you don't want to share online) the donor-VM vmdk descriptors, the objtool getAttr output of the original vmdk objects and the debug object list output of the 3 vmdks? ('Docker Server 2.0.vmdk', 'Docker Server 2.0-000001.vmdk' and 'Docker Server 2.0_1.vmdk')
    e.g.:

    # esxcli vsan debug object list --all > /tmp/objout
    # less /tmp/objout


    Then type 'Esc', '/', '12cf' to go to the first object, copy the full output of its entry, then press 'n' to go to the next one and do the same for all 3 vmdk objects.
    This will give you the object UUIDs of the 3 vmdk objects ('Docker Server 2.0.vmdk', 'Docker Server 2.0-000001.vmdk' and 'Docker Server 2.0_1.vmdk'), do a lookup of these from any node in the cluster and save the output:

    # /usr/lib/vmware/osfs/bin/objtool getAttr -u UUIDGoesHere


    cd to the namespace directory of the donor-VM and cat each of the 3 replacement vmdk descriptors you should have and save the output.
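
    As a non-interactive alternative to paging through /tmp/objout with less, the entries belonging to one namespace can be filtered out with awk. This is a sketch that assumes every entry in the saved output starts with an "Object UUID:" line and contains a "Group UUID:" line, as in the single-object output posted earlier in this thread; objects_in_group is a hypothetical helper.

```shell
# Hypothetical helper: print every object entry whose Group UUID matches.
# $1 = file with saved "esxcli vsan debug object list --all" output
# $2 = group (namespace) UUID to look for
objects_in_group() {
    awk -v group="$2" '
        # Print the buffered entry if it belongs to the requested group.
        function emit() {
            if (block != "" && index(block, "Group UUID: " group))
                printf "%s", block
            block = ""
        }
        /Object UUID:/ { emit() }        # a new entry starts here
        { block = block $0 "\n" }        # buffer the current line
        END { emit() }
    ' "$1"
}
```

    For example, objects_in_group /tmp/objout cbad4460-12cf-702e-9b59-5cb9018acd30 would print the namespace object and every vmdk object that lists it as its group.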

     

    Just to re-iterate the correct steps for doing this:
    1. Identify the object UUIDs and sizes (SIZE, not USED space) from esxcli vsan debug object list for all vmdks belonging to the VM (including snapshots).
    2. Create a donor-VM to make creating the vmdk descriptors easier (this can be done manually or by other means, but this is simplest). It should have the same number of disks as there are base vmdks, and each disk should be exactly the size shown in debug object list. Here you have a single snapshot on one of the disks but not on the other; a descriptor for just that snapshot can be created by detaching the vmdk that has no snapshot, taking a snapshot of the VM and then re-adding the other disk. Generally one can guess by size which is the boot vmdk and thus which should be attached to vscsi0:0.
    3. Go to the namespace of the donor-VM and edit each of the vmdk descriptors using vi, replacing the object UUID (which currently points to a new, empty object that can be deleted later) with the original vmdk object UUID. For example, if the original object UUID is 'a1c26155-5678-1012-a1fb-ecf4bbcfca20', you would replace 'a5c26155-1234-b111-bc9c-ecf4bbcfca20' with 'a1c26155-5678-1012-a1fb-ecf4bbcfca20' in this line of the vmdk:
    RW 46800640 VMFS "vsan://a5c26155-1234-b111-bc9c-ecf4bbcfca20"
    4. The original objects still try to resolve at their original vmdk location. This needs to be changed to the new vmdks in the namespace of the donor-VM, for each vmdk object, using:

    # /usr/lib/vmware/osfs/bin/objtool setAttr -u <Object UUID> -d <Path to VMDK>


    e.g.:

    # /usr/lib/vmware/osfs/bin/objtool setAttr -u a1c26155-5678-1012-a1fb-ecf4bbcfca20 -d "/vmfs/volumes/vsan:1234abcd56781234-1234abcd56781234/DonorVMNamespaceUUID/Donor-VM.vmdk"
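
    Steps 3 and 4 can be sketched as two small shell helpers instead of hand edits in vi. Both function names are hypothetical, the descriptor path and UUIDs are whatever applies in your environment, and the second helper only prints the objtool command from step 4 so it can be reviewed before being run.

```shell
# Step 3 (sketch): swap the donor object's UUID for the original object's UUID
# in a vmdk descriptor, keeping a .bak copy of the unmodified file.
# $1 = descriptor file, $2 = donor object UUID, $3 = original object UUID
repoint_descriptor() {
    descriptor=$1
    donor_uuid=$2
    original_uuid=$3
    cp "$descriptor" "$descriptor.bak" || return 1
    sed "s|vsan://$donor_uuid|vsan://$original_uuid|" "$descriptor.bak" > "$descriptor"
}

# Step 4 (sketch): print the objtool setAttr command rather than running it.
# $1 = original object UUID, $2 = full path to the donor-VM vmdk descriptor
print_setattr() {
    printf '%s setAttr -u %s -d "%s"\n' /usr/lib/vmware/osfs/bin/objtool "$1" "$2"
}
```

    Keeping the .bak copy makes it easy to diff the descriptor afterwards and confirm that only the vsan:// line changed.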

     

    This KB covers most of this and can be used as a reference: https://kb.vmware.com/s/article/70774



  • 8.  RE: Missing objects on vSAN

    Posted Jan 14, 2024 07:04 PM

     

    I managed to get the disks to appear. I even succeeded in rebooting the virtual machine twice. Unfortunately, it only contained a portion of the files and not a single important file. I suspect that the issue is with the disk named 000001. I'm not sure if it automatically connects to the correct file extension. Additionally, mounting the second disk was unsuccessful.

    Now I don't know what to do next. It feels like with every attempt, I'm just digging myself into a deeper hole. Any suggestions on what to try next?