vSAN

  • 1.  VSAN health check - component metadata health

    Posted Aug 31, 2015 11:20 AM

    Hi,

    I just installed the vSAN health plugin on a vCenter Server Appliance and four brand-new servers. Each server contains a total of 8 SSDs.

    Right after running the basic tests in vCenter, I saw this metadata error on one of the hosts:

    Has anyone encountered this error before and found a solution? Looking at VMware's pages and FAQs, it seems the error can be related to the RAID controller, an SSD failure, or other issues.

    Any other suggestions for pinpointing the source of this metadata health error?

    Thanks and regards :)

    BT



  • 2.  RE: VSAN health check - component metadata health

    Posted Mar 15, 2016 10:19 PM

    In my environment, I got this message directly after upgrading everything from 6.0 U1 to 6.0 U1b.

    I have no idea how to solve it. There is an unhelpful KB article (VMware KB: Virtual SAN Health Service - Physical Disk Health - Metadata Health).

    If only I could find out which disk that component sits on, and what that component is.

    I have looked everywhere and anywhere; all the disks, VMs, etc. are fine, so I have no idea what this message means.

    I tried to track down that object with RVC, to no avail (I blame my limited experience with RVC for that).



  • 3.  RE: VSAN health check - component metadata health

    Broadcom Employee
    Posted Mar 16, 2016 09:00 AM

    I agree - this is a real drag. Unfortunately for you, it will have to be done via RVC.

    First, search on the component UUID to get the disk UUID:

    /localhost/Cork-Datacenter/computers> vsan.cmmds_find 0 -u dc3ae056-0c5d-1568-8299-a0369f56ddc0
    +---+-------------+--------------------------------------+-------------------------+---------+-----------------------------------------------------------+
    | # | Type        | UUID                                 | Owner                   | Health  | Content                                                   |
    +---+-------------+--------------------------------------+-------------------------+---------+-----------------------------------------------------------+
    | 1 | LSOM_OBJECT | dc3ae056-0c5d-1568-8299-a0369f56ddc0 | esxi-hp-05.rainpole.com | Healthy | {"diskUuid"=>"52e5ec68-00f5-04d6-a776-f28238309453",      |
    |   |             |                                      |                         |         |  "compositeUuid"=>"92559d56-1240-e692-08f3-a0369f56ddc0", |
    |   |             |                                      |                         |         |  "capacityUsed"=>167772160,                               |
    |   |             |                                      |                         |         |  "physCapacityUsed"=>167772160,                           |
    |   |             |                                      |                         |         |  "dedupUniquenessMetric"=>0,                              |
    |   |             |                                      |                         |         |  "formatVersion"=>1}                                      |
    +---+-------------+--------------------------------------+-------------------------+---------+-----------------------------------------------------------+
    /localhost/Cork-Datacenter/computers>

    Now that you have the diskUuid, you can use that in the next command:

    /localhost/Cork-Datacenter/computers> vsan.cmmds_find 0 -t DISK -u 52e5ec68-00f5-04d6-a776-f28238309453
    +---+------+--------------------------------------+-------------------------+---------+-------------------------------------------------------+
    | # | Type | UUID                                 | Owner                   | Health  | Content                                               |
    +---+------+--------------------------------------+-------------------------+---------+-------------------------------------------------------+
    | 1 | DISK | 52e5ec68-00f5-04d6-a776-f28238309453 | esxi-hp-05.rainpole.com | Healthy | {"capacity"=>145303273472,                            |
    |   |      |                                      |                         |         |  "iops"=>100,                                         |
    |   |      |                                      |                         |         |  "iopsWritePenalty"=>10000000,                        |
    |   |      |                                      |                         |         |  "throughput"=>200000000,                             |
    |   |      |                                      |                         |         |  "throughputWritePenalty"=>0,                         |
    |   |      |                                      |                         |         |  "latency"=>3400000,                                  |
    |   |      |                                      |                         |         |  "latencyDeviation"=>0,                               |
    |   |      |                                      |                         |         |  "reliabilityBase"=>10,                               |
    |   |      |                                      |                         |         |  "reliabilityExponent"=>15,                           |
    |   |      |                                      |                         |         |  "mtbf"=>1600000,                                     |
    |   |      |                                      |                         |         |  "l2CacheCapacity"=>0,                                |
    |   |      |                                      |                         |         |  "l1CacheCapacity"=>16777216,                         |
    |   |      |                                      |                         |         |  "isSsd"=>0,                                          |
    |   |      |                                      |                         |         |  "ssdUuid"=>"52bbb266-3a4e-f93a-9a2c-9a91c066a31e",   |
    |   |      |                                      |                         |         |  "volumeName"=>"NA",                                  |
    |   |      |                                      |                         |         |  "formatVersion"=>"3",                                |
    |   |      |                                      |                         |         |  "devName"=>"naa.600508b1001c5c0b1ac1fac2ff96c2b2:2", |
    |   |      |                                      |                         |         |  "ssdCapacity"=>0,                                    |
    |   |      |                                      |                         |         |  "rdtMuxGroup"=>80011761497760,                       |
    |   |      |                                      |                         |         |  "isAllFlash"=>0,                                     |
    |   |      |                                      |                         |         |  "maxComponents"=>47661,                              |
    |   |      |                                      |                         |         |  "logicalCapacity"=>0,                                |
    |   |      |                                      |                         |         |  "physDiskCapacity"=>0,                               |
    |   |      |                                      |                         |         |  "dedupScope"=>0}                                     |
    +---+------+--------------------------------------+-------------------------+---------+-------------------------------------------------------+
    /localhost/Cork-Datacenter/computers>

    The devName field above gives you the NAA id (the SCSI id) of the disk.
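    To script the two-step lookup above, you only need the diskUuid and devName fields from each cmmds entry, plus a small step to strip the partition suffix (the ":2") off devName to get the bare NAA id. A minimal sketch in Python - the dictionary values are copied from the output above; in practice you would feed it the Content column returned by vsan.cmmds_find:

```python
# Sketch: resolve a vSAN component UUID to its disk's NAA id.
# The two dicts below are copied from the vsan.cmmds_find output above;
# in practice they would come from the Content column of each query.

lsom_object = {
    "diskUuid": "52e5ec68-00f5-04d6-a776-f28238309453",
    "compositeUuid": "92559d56-1240-e692-08f3-a0369f56ddc0",
}

disk_entry = {
    "devName": "naa.600508b1001c5c0b1ac1fac2ff96c2b2:2",
    "isSsd": 0,
}

def naa_id(dev_name: str) -> str:
    """Strip the partition suffix (e.g. ':2') to get the bare NAA id."""
    return dev_name.rsplit(":", 1)[0]

disk_uuid = lsom_object["diskUuid"]      # step 1: component entry -> disk UUID
device = naa_id(disk_entry["devName"])   # step 2: disk entry -> NAA id

print(disk_uuid)  # 52e5ec68-00f5-04d6-a776-f28238309453
print(device)     # naa.600508b1001c5c0b1ac1fac2ff96c2b2
```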

    I will leave some feedback on the KB on how to determine the disk through RVC.

    A word of caution, however: this health check is transitory in nature. A failure does not mean that there is anything inherently wrong with the device; it could be that some peak load is running on the system temporarily and the threshold set for the health check has been exceeded. I would revisit the health check regularly and periodically re-test to see whether it is still failing. If you are still concerned, please discuss it with our support organisation.



  • 4.  RE: VSAN health check - component metadata health

    Posted Apr 04, 2016 01:56 PM

    In my case, I kept seeing the error for weeks, but everything ran perfectly fine and the device was good.

    The error went away when I upgraded to 6.2 and the filesystem was upgraded from v2 to v3. Because the upgrade re-allocates everything off a node, upgrades the filesystem, and then moves it all back, the data was apparently rebuilt "correctly" and the stale entries were deleted. Something along those lines.



  • 5.  RE: VSAN health check - component metadata health

    Broadcom Employee
    Posted Apr 05, 2016 12:55 PM

    Yes - we are aware of a cosmetic issue around this health check where it can give a false positive with a status of "invalid state", not "failed". We're working to have that addressed.

    In the meantime, if anyone sees this error and wants to check whether it is a false positive, open an SR with our support folks and they can verify it for you once they have the logs.



  • 6.  RE: VSAN health check - component metadata health

    Posted Sep 29, 2016 12:48 PM

    Hi

    If I may, I'd like to share my workaround for this issue. Please correct me if I am wrong, but this is what resolved the "wrong" status for me. It may be a one-in-a-hundred solution for this little problem.

    1. I put the host shown in the Host field (Component of issues) into maintenance mode and chose "Full Data Migration". I am not sure whether this step is necessary.

    2. Once the data was fully migrated, I rebooted the affected host, because I remembered there being a vSAN preparation process (or something similar) during which the vSAN components for the host are reinitialized.

    After the reboot, the health check was completely green and successful.

    I hope this can be useful input for others, and perhaps someone can verify this procedure?
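    For anyone who prefers the command line, the two steps above can also be sketched with esxcli run on the affected host. This is only an assumed sketch - verify the flags against your ESXi build before use; the commands are wrapped in a dry-run helper so the block just prints them unless DRY_RUN=0:

```shell
# Sketch of the workaround above via esxcli (assumed syntax; verify on your build).
# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# 1. Enter maintenance mode, evacuating all vSAN data from the host first.
run esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData

# 2. Reboot the host once the evacuation completes.
run esxcli system shutdown reboot --reason "vSAN metadata health workaround"

# 3. After the host rejoins the cluster, exit maintenance mode.
run esxcli system maintenanceMode set --enable false
```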



  • 7.  RE: VSAN health check - component metadata health

    Posted Oct 02, 2016 04:31 PM

    Hi, Nocturne

    It seems this problem has been solved recently. If not, there is a solution: Component metadata health check fails with invalid state error (2145347) | VMware KB

    Per that article, you have to remove the disk from the disk group, or destroy and recreate the entire disk group.
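    Removing a disk from a disk group can also be done from the host's command line. Again only an assumed sketch - the NAA id below is a placeholder taken from the earlier post, and the block is wrapped in a dry-run helper so it prints the command unless DRY_RUN=0:

```shell
# Sketch: remove a capacity disk from its vSAN disk group by device name
# (assumed esxcli syntax; the NAA id is a placeholder - use your own disk's id).
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run esxcli vsan storage remove --disk naa.600508b1001c5c0b1ac1fac2ff96c2b2
```

    Note that removing the disk discards the data on it, so evacuate the host (or at least ensure object accessibility) first.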