 Patch with vCenter state is authoritative error

BobTheBuilder001 posted Nov 13, 2024 08:29 AM

Hello All!

I have a 3-node vSphere cluster with vSAN. Recently the Unicastagent list was fixed manually, and the vCenter says "vCenter state is authoritative". I looked up the error and found the related KB, which says I have to click "Update ESXi configuration" in Monitoring -> vSAN Health. The problem is that the vCenter version is 6.5U1g and I can only use the vSphere Client (HTML5), in which the vSAN-related functions are not implemented yet.

In another thread I read that clicking "Update ESXi configuration" resyncs the unicastagent data of the nodes with vCenter, and if the settings on the hosts are already correct, it will not change anything. Furthermore, I learned that if the host settings do not match the vCenter settings, in the worst case some nodes get isolated and I have to fix them manually again, but the VMs and data will not be lost or damaged. (Is that correct?)

I double-checked the vSAN IP and UUID pairs with "esxcli vsan cluster unicastagent list" on all nodes, and "esxcli vsan cluster get" shows 3 members on every host.
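The cross-check described above can be sketched in Python. This is a minimal sketch, assuming you have transcribed by hand each host's own NodeUuid and vSAN vmk IP, plus the entries that "esxcli vsan cluster unicastagent list" prints on that host. The host names, UUIDs and IPs are hypothetical placeholders, and the script does not connect to anything. It relies on the rule that each node's unicastagent list should contain every other node exactly once, and never the node itself.

```python
from typing import Dict, List, Set, Tuple

# (NodeUuid, vSAN vmk IP) for one node; all values used below are hypothetical.
Entry = Tuple[str, str]


def check_unicast_lists(
    own: Dict[str, Entry], agents: Dict[str, List[Entry]]
) -> List[str]:
    """Compare each host's unicastagent entries with the expected peers.

    own:    host -> its own (NodeUuid, vSAN IP), transcribed by hand
    agents: host -> entries read from its unicastagent list
    Each host should list every other host exactly once and never itself.
    Returns a list of problems; an empty list means the lists are consistent.
    """
    problems: List[str] = []
    for host, entries in agents.items():
        expected: Set[Entry] = {e for h, e in own.items() if h != host}
        actual: Set[Entry] = set(entries)
        if len(entries) != len(actual):
            problems.append(f"{host}: duplicate entries")
        for missing in sorted(expected - actual):
            problems.append(f"{host}: missing {missing[0]} {missing[1]}")
        for extra in sorted(actual - expected):
            problems.append(f"{host}: unexpected {extra[0]} {extra[1]}")
    return problems


if __name__ == "__main__":
    own = {
        "esx1": ("uuid-1", "192.168.50.11"),
        "esx2": ("uuid-2", "192.168.50.12"),
        "esx3": ("uuid-3", "192.168.50.13"),
    }
    agents = {
        "esx1": [own["esx2"], own["esx3"]],
        "esx2": [own["esx1"], own["esx3"]],
        "esx3": [own["esx1"]],  # esx2 left out on purpose to show a finding
    }
    for line in check_unicast_lists(own, agents) or ["all lists consistent"]:
        print(line)
```

The member count from "esxcli vsan cluster get" is a separate check. This sketch only verifies that the per-host lists agree with each other.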

My questions:
1) Since I cannot use "Update ESXi configuration" in this HTML5 client version, is there any command I can run from the CLI to do the same thing?
2) Is it safe to patch vCenter up to 6.5U3 with this error present? (Then I could remediate the error, because the vSAN-related functions become available.)

Thank you for your answers in advance!

TheBobkin (Best Answer)

Hello @BobTheBuilder001

"Recently the Unicastagent list was fixed manually, and the vCenter says "vCenter state is authoritative"."
The first thing I would be suspicious of here is why. Generally, if that is necessary, either there was some change that caused an outage (e.g. someone changed the vSAN VLAN to one that didn't exist on the vDS, the nodes became isolated and vCenter went down), or there was some issue that caused vCenter to remove some nodes' unicastagent entries from some or all nodes (e.g. an ESXi certificate issue or a vDS sync/entity state issue). If it is the former, there shouldn't be much concern that vC will do the same thing again and cause the same issue, but if it is the latter, you should get its root cause resolved before re-pushing the configuration via 'Update ESXi configuration'.

"Furthermore, I learned that if the host settings do not match the vCenter settings, in the worst case some nodes get isolated and I have to fix them manually again, but the VMs and data will not be lost or damaged. (Is that correct?)"
Correct, but note that you can and should avoid breaking the unicastagent list(s) again. Some causes of that are fixed in certain vC versions, but these are the 3 things I check before ever re-pushing the list:
1) The vSphere Client shows every vmk that should be tagged for vsan-traffic as having the tag (i.e. it matches the output of 'esxcli vsan network list' on the node). If it doesn't, there is an issue with the vC vDS entries/entities.
2) vsanmgmtd is running on all nodes.
3) All hosts show a valid, non-expired host certificate in the vSphere Client.
Note as well that if /VSAN/IgnoreClusterMemberListupdates is enabled on a node (check with 'esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListupdates'), applying 'Update ESXi configuration' won't do anything or clear the health alert state, as the node has been set to ignore any update/check/change.
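The 3 checks above can be kept as a small pre-flight checklist. This is a minimal Python sketch, assuming the values are copied by hand from the vSphere Client and from each host's shell. The HostStatus class and its field names are hypothetical. Check 3 only compares the certificate expiry date, so other certificate problems are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set


@dataclass
class HostStatus:
    """Hypothetical record for one host; every field is filled in by hand."""
    name: str
    vc_vsan_vmks: Set[str]      # vmks the vSphere Client shows with vSAN enabled
    esxcli_vsan_vmks: Set[str]  # vmks listed by `esxcli vsan network list`
    vsanmgmtd_running: bool     # whether vsanmgmtd is running on the host
    cert_not_after: datetime    # host certificate expiry date


def preflight(hosts: List[HostStatus], now: datetime) -> List[str]:
    """Return blocking issues; only re-push when this returns an empty list."""
    issues: List[str] = []
    for h in hosts:
        # 1) vCenter's view of the vsan-traffic tags must match the host's view.
        if h.vc_vsan_vmks != h.esxcli_vsan_vmks:
            issues.append(
                f"{h.name}: vCenter shows {sorted(h.vc_vsan_vmks)}, "
                f"host shows {sorted(h.esxcli_vsan_vmks)}"
            )
        # 2) vsanmgmtd must be running.
        if not h.vsanmgmtd_running:
            issues.append(f"{h.name}: vsanmgmtd is not running")
        # 3) The host certificate must not be expired (only expiry is checked).
        if h.cert_not_after <= now:
            issues.append(f"{h.name}: host certificate expired")
    return issues


if __name__ == "__main__":
    now = datetime(2024, 11, 13)
    hosts = [
        # Hypothetical case: vCenter lost the vSAN tag on esx1 vmk2.
        HostStatus("esx1", set(), {"vmk2"}, True, datetime(2026, 1, 1)),
        HostStatus("esx2", {"vmk2"}, {"vmk2"}, True, datetime(2026, 1, 1)),
    ]
    for issue in preflight(hosts, now) or ["ok to re-push"]:
        print(issue)
```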

"1) Since I can not use the "Upgrade ESXi configuration" in this HTML client version, is there any command that I can execute to do the same thing from CLI?"
No, I'm not aware of any (RVC doesn't have such a function, and this version predates the vSAN MOB).

"2) Is it safe to patch the vCenter up to 6.5U3 with this error present? (Then I can remediate the error because the vSAN related functions become available)"
It should be, yes. You can enable /VSAN/IgnoreClusterMemberListupdates on all nodes to prevent any further node isolation in the interim, then disable it once you have the 'Update ESXi configuration' option in Skyline Health, and use that.
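The interim workaround boils down to one gate: do not click 'Update ESXi configuration' until the flag is back to 0 on every node, otherwise the push is ignored by the nodes that still have it set. This is a minimal Python sketch, assuming the per-host flag values are recorded by hand from 'esxcfg-advcfg -g'. The host names are hypothetical, and the code only reports which hosts still block the push; it changes nothing on the hosts.

```python
from typing import Dict, List, Tuple

FLAG = "/VSAN/IgnoreClusterMemberListupdates"


def hosts_still_ignoring(flags: Dict[str, int]) -> List[str]:
    """Hosts whose recorded flag value is non-zero (still ignoring vCenter)."""
    return sorted(h for h, value in flags.items() if value != 0)


def can_update_esxi_configuration(flags: Dict[str, int]) -> Tuple[bool, str]:
    """Gate for the final step of the interim plan."""
    blocked = hosts_still_ignoring(flags)
    if blocked:
        return False, f"disable {FLAG} first on: {', '.join(blocked)}"
    return True, "all nodes accept updates; run 'Update ESXi configuration'"


if __name__ == "__main__":
    # Interim phase: flag set to 1 on every node while vCenter is patched.
    flags = {"esx1": 1, "esx2": 1, "esx3": 1}
    print(can_update_esxi_configuration(flags))
    # After the patch the flag is set back to 0 node by node.
    flags.update(esx1=0, esx2=0)
    print(can_update_esxi_configuration(flags))  # esx3 still blocks the push
    flags["esx3"] = 0
    print(can_update_esxi_configuration(flags))
```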

Any reason you are running such an old ESXi + vC build here? If you are bound to 6.x for hardware or other reasons, you should at minimum update to the latest 6.7 U3 build.

BobTheBuilder001

Hello @TheBobkin!

Thank you for your answer!
The cluster was recovered manually after all ESXi hosts' certificates expired and the nodes were isolated.

And you hit the nail on the head: on the first host (let's call it ESX1), vCenter shows no 'Enabled service' for vmk2, but it should be the vSAN vmk. On the other 2 hosts, vmk2 is set as the vSAN service adapter. The other settings (IP, VLAN ID, etc.) are okay. However, the 'esxcli vsan network list' command shows vmk2 as the vSAN adapter on ESX1, just like on the other hosts. I think this discrepancy is what caused the error message. Am I thinking right?

If I edit vmk2 and set the service back to vSAN in the vCenter UI, could that be the right solution?
Do I need to enable "/VSAN/IgnoreClusterMemberListupdates" on all nodes before the vCenter setting change and then disable it again, or is that not necessary?
FYI: vCenter is running on the ESX1 host, but on a dedicated VMFS datastore, not on the vSAN datastore.

I have also checked the vsanmgmtd service and the certificates: the former is running on all hosts, and the latter are valid on all hosts.

Thank you for your help!

BobTheBuilder001

Hello @TheBobkin!

Finally, I did the vC patch upgrade after enabling "/VSAN/IgnoreClusterMemberListupdates" on the nodes. Now I have Skyline Health, and surprisingly it shows ESX2 and ESX3 as out of sync. And the last update was in July?? Updated by the current vC :)

My plan is to edit vmk2 in vCenter for the ESX1 host and set it as a vSAN adapter (as it should be), then disable "/VSAN/IgnoreClusterMemberListupdates" on the nodes, and finally initiate 'Update ESXi configuration'.
Is there any glitch in this plan that I should pay attention to?


Thank you for your answer in advance!

BobTheBuilder001

Hello @TheBobkin!

I tried to change the vmk2 adapter tag to vSAN in vCenter, and although the task shows as completed, the Enabled service is still empty for the adapter. I checked the vmk2 tag on the ESXi host with "esxcli network ip interface tag get -i vmk2" and it shows vSAN.

I also tried it with "/VSAN/IgnoreClusterMemberListupdates" disabled, but it still doesn't change in vCenter, even though the task completes successfully.

Do you have any idea?

TheBobkin

Hello @BobTheBuilder001

"However, the 'esxcli vsan network list' command also shows the vmk2 as the vSAN adapter on the ESX1 similar to the other hosts. I think this is the discrepancy that caused the error message. Am I thinking right?"
Yes, that is the source of the issue. Due to the problem with the vDS/vmk entry in vCenter, it considers this vmk not tagged for vsan-traffic and reacts by removing that node's unicastagent entry from the other nodes (just as would happen if you actually untagged vsan-traffic on it).
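The mechanism can be illustrated with a simplified model. This is not vCenter's actual code, only a sketch under the assumption that vCenter builds each node's list from the nodes it believes have a vmk tagged for vsan-traffic. Host names, UUIDs and IPs are hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (NodeUuid, vSAN IP); hypothetical placeholder values


def desired_lists(
    nodes: Dict[str, Entry], vc_vsan_tagged: Dict[str, bool]
) -> Dict[str, List[Entry]]:
    """Simplified model: per host, the peers a vCenter push would leave listed.

    Only nodes whose vmk vCenter *believes* carries vsan-traffic count as
    members; each host is given every other member, never itself.
    """
    members = {h: e for h, e in nodes.items() if vc_vsan_tagged.get(h, False)}
    return {
        host: sorted(e for peer, e in members.items() if peer != host)
        for host in nodes
    }


if __name__ == "__main__":
    nodes = {
        "esx1": ("uuid-1", "192.168.50.11"),
        "esx2": ("uuid-2", "192.168.50.12"),
        "esx3": ("uuid-3", "192.168.50.13"),
    }
    # vCenter's inventory lost the vSAN tag on esx1 vmk2; the host itself did not.
    tagged = {"esx1": False, "esx2": True, "esx3": True}
    for host, peers in desired_lists(nodes, tagged).items():
        print(host, peers)
```

In this model esx2 and esx3 end up without an entry for esx1, so esx1 becomes isolated even though its own vmk2 is still tagged for vSAN on the host.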

"I tried to change the vmk2 adapter tag to vSAN in the vCenter, and although it shows the task completed the Enabled service is still empty for the adapter. I checked the vmk2 tag on the ESXi host with "esxcli network ip interface tag get -i vmk2" and it shows vSAN."
Re-tagging vsan-traffic (or any other traffic type, e.g. vmotion) won't work here. You can try putting the node in Maintenance Mode, removing the vmk completely, re-adding it, and validating that it can be configured for vsan-traffic. If it still can't, then with the node in Maintenance Mode, remove it from the vSphere inventory and re-add it, then recheck that traffic types can be tagged on it correctly.

BobTheBuilder001

Hello @TheBobkin!

In the meantime, there was a hardware issue on ESX3, so it was restarted. Surprisingly, after the restart, vmk2 on ESX1 showed the vSAN tag as expected! So I did a quick recheck of the settings and finally clicked "Update ESXi configuration". That solved the "vCenter state is authoritative" error, and now the cluster runs without any errors.

Thank you for being so helpful!

Best regards!