vSAN

  • 1.  VSAN health check - component metadata health

    Posted Aug 31, 2015 11:20 AM

    Hi,

    I just installed the vSAN health plugin on a vCenter Server Appliance and four brand-new servers. Each server contains a total of 8 SSDs.

    Right after running the basic tests in vCenter, I saw this metadata error on one of the hosts:

    Has anyone encountered this error before and found a solution? Looking at VMware's pages and FAQs, it seems the error can be related to the RAID controller, an SSD failure, or other issues.

    Any other suggestions for pinpointing the source of this metadata health error?

    Thanks and regards :)

    BT



  • 2.  RE: VSAN health check - component metadata health

    Posted Mar 15, 2016 10:19 PM

    In my environment, I got this message directly after upgrading everything from 6.0 U1 to 6.0 U1b.

    I have no idea how to solve it. There is an unhelpful KB article (VMware KB: Virtual SAN Health Service - Physical Disk Health - Metadata Health).

    If only I could find out which disk that component sits on, and what that component is.

    I have looked everywhere and anywhere; all the disks, VMs, etc. are fine, so I have no idea what this message means.

    I tried to track down that object with RVC, to no avail (I blame my limited experience with RVC for that).



  • 3.  RE: VSAN health check - component metadata health

    Broadcom Employee
    Posted Mar 16, 2016 09:00 AM

    I agree - this is a real drag. Unfortunately for you, it will have to be done via RVC.

    First, search on the component UUID to get the disk UUID:

    /localhost/Cork-Datacenter/computers> vsan.cmmds_find 0 -u dc3ae056-0c5d-1568-8299-a0369f56ddc0
    +---+-------------+--------------------------------------+-------------------------+---------+-----------------------------------------------------------+
    | # | Type        | UUID                                 | Owner                   | Health  | Content                                                   |
    +---+-------------+--------------------------------------+-------------------------+---------+-----------------------------------------------------------+
    | 1 | LSOM_OBJECT | dc3ae056-0c5d-1568-8299-a0369f56ddc0 | esxi-hp-05.rainpole.com | Healthy | {"diskUuid"=>"52e5ec68-00f5-04d6-a776-f28238309453",      |
    |   |             |                                      |                         |         |  "compositeUuid"=>"92559d56-1240-e692-08f3-a0369f56ddc0", |
    |   |             |                                      |                         |         |  "capacityUsed"=>167772160,                               |
    |   |             |                                      |                         |         |  "physCapacityUsed"=>167772160,                           |
    |   |             |                                      |                         |         |  "dedupUniquenessMetric"=>0,                              |
    |   |             |                                      |                         |         |  "formatVersion"=>1}                                      |
    +---+-------------+--------------------------------------+-------------------------+---------+-----------------------------------------------------------+
    /localhost/Cork-Datacenter/computers>

    Now that you have the diskUuid, you can use that in the next command:

    /localhost/Cork-Datacenter/computers> vsan.cmmds_find 0 -t DISK -u 52e5ec68-00f5-04d6-a776-f28238309453
    +---+------+--------------------------------------+-------------------------+---------+-------------------------------------------------------+
    | # | Type | UUID                                 | Owner                   | Health  | Content                                               |
    +---+------+--------------------------------------+-------------------------+---------+-------------------------------------------------------+
    | 1 | DISK | 52e5ec68-00f5-04d6-a776-f28238309453 | esxi-hp-05.rainpole.com | Healthy | {"capacity"=>145303273472,                            |
    |   |      |                                      |                         |         |  "iops"=>100,                                         |
    |   |      |                                      |                         |         |  "iopsWritePenalty"=>10000000,                        |
    |   |      |                                      |                         |         |  "throughput"=>200000000,                             |
    |   |      |                                      |                         |         |  "throughputWritePenalty"=>0,                         |
    |   |      |                                      |                         |         |  "latency"=>3400000,                                  |
    |   |      |                                      |                         |         |  "latencyDeviation"=>0,                               |
    |   |      |                                      |                         |         |  "reliabilityBase"=>10,                               |
    |   |      |                                      |                         |         |  "reliabilityExponent"=>15,                           |
    |   |      |                                      |                         |         |  "mtbf"=>1600000,                                     |
    |   |      |                                      |                         |         |  "l2CacheCapacity"=>0,                                |
    |   |      |                                      |                         |         |  "l1CacheCapacity"=>16777216,                         |
    |   |      |                                      |                         |         |  "isSsd"=>0,                                          |
    |   |      |                                      |                         |         |  "ssdUuid"=>"52bbb266-3a4e-f93a-9a2c-9a91c066a31e",   |
    |   |      |                                      |                         |         |  "volumeName"=>"NA",                                  |
    |   |      |                                      |                         |         |  "formatVersion"=>"3",                                |
    |   |      |                                      |                         |         |  "devName"=>"naa.600508b1001c5c0b1ac1fac2ff96c2b2:2", |
    |   |      |                                      |                         |         |  "ssdCapacity"=>0,                                    |
    |   |      |                                      |                         |         |  "rdtMuxGroup"=>80011761497760,                       |
    |   |      |                                      |                         |         |  "isAllFlash"=>0,                                     |
    |   |      |                                      |                         |         |  "maxComponents"=>47661,                              |
    |   |      |                                      |                         |         |  "logicalCapacity"=>0,                                |
    |   |      |                                      |                         |         |  "physDiskCapacity"=>0,                               |
    |   |      |                                      |                         |         |  "dedupScope"=>0}                                     |
    +---+------+--------------------------------------+-------------------------+---------+-------------------------------------------------------+
    /localhost/Cork-Datacenter/computers>

    The devName field above gives you the NAA id (the SCSI id) of the disk.
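    To script the two-step lookup above, you only need the diskUuid and devName fields from each cmmds entry, plus a small step to strip the partition suffix (the ":2") off devName to get the bare NAA id. A minimal sketch in Python - the dictionary values are copied from the output above; in practice you would feed it the Content column returned by vsan.cmmds_find:

```python
# Sketch: resolve a vSAN component UUID to its disk's NAA id.
# The two dicts below are copied from the vsan.cmmds_find output above;
# in practice they would come from the Content column of each query.

lsom_object = {
    "diskUuid": "52e5ec68-00f5-04d6-a776-f28238309453",
    "compositeUuid": "92559d56-1240-e692-08f3-a0369f56ddc0",
}

disk_entry = {
    "devName": "naa.600508b1001c5c0b1ac1fac2ff96c2b2:2",
    "isSsd": 0,
}

def naa_id(dev_name: str) -> str:
    """Strip the partition suffix (e.g. ':2') to get the bare NAA id."""
    return dev_name.rsplit(":", 1)[0]

disk_uuid = lsom_object["diskUuid"]      # step 1: component entry -> disk UUID
device = naa_id(disk_entry["devName"])   # step 2: disk entry -> NAA id

print(disk_uuid)  # 52e5ec68-00f5-04d6-a776-f28238309453
print(device)     # naa.600508b1001c5c0b1ac1fac2ff96c2b2
```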

    I will leave some feedback on the KB on how to determine the disk through RVC.

    A word of caution, however: this health check is transitory in nature. A failure does not mean that there is anything inherently wrong with the device; it could be that some peak load is running on the system temporarily and the threshold set for the health check has been exceeded. I would revisit the health check regularly and periodically re-test to see whether it is still failing. If you are still concerned, please discuss it with our support organisation.



  • 4.  RE: VSAN health check - component metadata health

    Posted Apr 04, 2016 01:56 PM

    In my case, I kept seeing the error for weeks, but everything ran perfectly fine and the device was good.

    The error went away when I upgraded to 6.2 and the filesystem was upgraded from v2 to v3. Because the upgrade re-allocates everything off a node, upgrades the filesystem, and then moves it all back, the data was apparently rebuilt "correctly" and the stale entries were deleted. Something along those lines.



  • 5.  RE: VSAN health check - component metadata health

    Broadcom Employee
    Posted Apr 05, 2016 12:55 PM

    Yes - we are aware of a cosmetic issue around this health check where it can give a false positive with a status of "invalid state", not "failed". We're working to have that addressed.

    In the meantime, if anyone sees this error and wants to check whether it is a false positive, open an SR with our support folks and they can verify it for you once they have the logs.



  • 6.  RE: VSAN health check - component metadata health

    Posted Sep 29, 2016 12:48 PM

    Hi

    If I may, I'd like to share my workaround for this issue. Please correct me if I am wrong, but this is what resolved the "wrong" status for me. It may be a one-in-a-hundred solution for this little problem.

    1. I put the host shown in the Host field (Component of issues) into maintenance mode and chose "Full Data Migration". I am not sure whether this step is necessary.

    2. Once the data was fully migrated, I rebooted the affected host, because I remembered there being a vSAN preparation process (or something similar) during which the vSAN components for the host are reinitialized.

    After the reboot, the health check was completely green and successful.

    I hope this can be useful input for others, and perhaps someone can verify this procedure?
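    For anyone who prefers the command line, the two steps above can also be sketched with esxcli run on the affected host. This is only an assumed sketch - verify the flags against your ESXi build before use; the commands are wrapped in a dry-run helper so the block just prints them unless DRY_RUN=0:

```shell
# Sketch of the workaround above via esxcli (assumed syntax; verify on your build).
# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# 1. Enter maintenance mode, evacuating all vSAN data from the host first.
run esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData

# 2. Reboot the host once the evacuation completes.
run esxcli system shutdown reboot --reason "vSAN metadata health workaround"

# 3. After the host rejoins the cluster, exit maintenance mode.
run esxcli system maintenanceMode set --enable false
```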



  • 7.  RE: VSAN health check - component metadata health

    Posted Oct 02, 2016 04:31 PM

    Hi, Nocturne

    It seems this problem has been solved recently. If not, there is a solution: Component metadata health check fails with invalid state error (2145347) | VMware KB

    Per that article, you have to remove the disk from the disk group, or destroy and recreate the entire disk group.
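    Removing a disk from a disk group can also be done from the host's command line. Again only an assumed sketch - the NAA id below is a placeholder taken from the earlier post, and the block is wrapped in a dry-run helper so it prints the command unless DRY_RUN=0:

```shell
# Sketch: remove a capacity disk from its vSAN disk group by device name
# (assumed esxcli syntax; the NAA id is a placeholder - use your own disk's id).
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run esxcli vsan storage remove --disk naa.600508b1001c5c0b1ac1fac2ff96c2b2
```

    Note that removing the disk discards the data on it, so evacuate the host (or at least ensure object accessibility) first.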