After a VxRail update I am unable to put a node into maintenance mode (MM). I am getting the error "not allowed in current state".
I tried to put the node into MM from the CLI with "esxcli system maintenanceMode set -m ensureObjectAccessibility -e true". At first it looked like it was working, but the task never finished (I waited until the next day).
- All VMs had already been migrated off the node successfully, so it is unlikely that a VM that could not be migrated was blocking the node from entering MM.
- VSANmgmt.log was checked but nothing found
- VSANsystem.log was checked but nothing found
- services.sh was used to restart the services on the node => same behavior afterwards
Hostd shows the host starting to enter MM, but the task neither completes nor fails:
2020-11-02T12:39:33.933Z info hostd[2103296] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 79104 : The host has begun entering maintenance mode.
2020-11-02T12:39:33.934Z info hostd[2100959] [Originator@6876 sub=Vimsvc.TaskManager opID=f04735b4 user=vpxuser] Task Created : haTask--vim.event.EventHistoryCollector.readNext-372223905
2020-11-02T12:39:33.935Z info hostd[2103296] [Originator@6876 sub=Vimsvc.TaskManager opID=f04735b4 user=vpxuser] Task Completed : haTask--vim.event.EventHistoryCollector.readNext-372223905 Status success
2020-11-02T12:39:33.971Z info hostd[2101423] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=vim-cmd-f3-35a9 user=root] Event 79105 : Host xxx.xxx in ha-datacenter has started to enter maintenance mode
2020-11-02T12:39:33.971Z info hostd[2101423] [Originator@6876 sub=Hostsvc opID=vim-cmd-f3-35a9 user=root] Message bus proxy is stopped already.
2020-11-02T12:39:33.971Z info hostd[2100957] [Originator@6876 sub=Vimsvc.TaskManager opID=04ba010d-35b2 user=dcui:vsanmgmtd] Task Created : vmodlTask-ha-host-372223906
2020-11-02T12:39:33.971Z info hostd[2100957] [Originator@6876 sub=Vimsvc.TaskManager opID=04ba010d-35b2 user=dcui:vsanmgmtd] Task Completed : haTask--vim.TaskManager.createTask-372223904 Status success
2020-11-02T12:39:33.973Z info hostd[2103354] [Originator@6876 sub=Vimsvc.TaskManager opID=f04735b5 user=vpxuser] Task Created : haTask--vim.event.EventHistoryCollector.readNext-372223907
2020-11-02T12:39:33.973Z info hostd[2103296] [Originator@6876 sub=Vimsvc.TaskManager opID=f04735b5 user=vpxuser] Task Completed : haTask--vim.event.EventHistoryCollector.readNext-372223907 Status success
2020-11-02T12:39:33.976Z info hostd[2100957] [Originator@6876 sub=Vimsvc.TaskManager opID=f04735ba user=dcui:vsanmgmtd] Task Created : haTask-ha-host-vim.Task.UpdateDescription-372223908
The vobd.log shows that the host entered MM at 12:39 on 02/11 and exited at 07:14 on 03/11. Between those times the log was spammed with "Firewall configuration has changed. Operation 'enable' for rule set esxupdate succeeded. Firewall configuration has changed. Operation 'disable' for rule set esxupdate succeeded."
Just before the host exited MM there was an alert that vmnic2 and vmnic3 were down:
2020-11-02T12:39:14.987Z: [UserLevelCorrelator] 2775057927891us: [esx.audit.ssh.session.opened] SSH session was opened for 'root@xxx.xxx.xxx'.
2020-11-02T12:39:33.933Z: [GenericCorrelator] 2775076873416us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-11-02T12:39:33.933Z: [UserLevelCorrelator] 2775076873416us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-11-02T12:39:33.933Z: [UserLevelCorrelator] 2775076873812us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.
2020-11-02T12:44:29.351Z: [GenericCorrelator] 2775372291176us: [vob.user.ssh.session.opened] SSH session was opened for 'root@xxx.xxx.xxx
***
2020-11-03T07:11:27.937Z: [netCorrelator] 23239165us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2020-11-03T07:11:27.944Z: [netCorrelator] 23246255us: [vob.net.vmnic.linkstate.up] vmnic vmnic1 linkstate up
2020-11-03T07:11:27.946Z: [netCorrelator] 23248365us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2020-11-03T07:11:27.948Z: [netCorrelator] 23250489us: [vob.net.vmnic.linkstate.down] vmnic vmnic3 linkstate down
2020-11-03T07:11:28.002Z: [netCorrelator] 23303941us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic0 linkstate is up
2020-11-03T07:11:28.002Z: An event (esx.clear.net.vmnic.linkstate.up) could not be sent immediately to hostd; queueing for retry.
2020-11-03T07:11:28.002Z: [netCorrelator] 23304011us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic1 linkstate is up
2020-11-03T07:11:28.002Z: An event (esx.clear.net.vmnic.linkstate.up) could not be sent immediately to hostd; queueing for retry.
2020-11-03T07:11:28.002Z: [netCorrelator] 23304035us: [esx.problem.net.vmnic.linkstate.down] Physical NIC vmnic2 linkstate is down
2020-11-03T07:11:28.002Z: An event (esx.problem.net.vmnic.linkstate.down) could not be sent immediately to hostd; queueing for retry.
2020-11-03T07:11:28.002Z: [netCorrelator] 23304058us: [esx.problem.net.vmnic.linkstate.down] Physical NIC vmnic3 linkstate is down
2020-11-03T07:11:28.002Z: An event (esx.problem.net.vmnic.linkstate.down) could not be sent immediately to hostd; queueing for retry.
2020-11-03T07:11:29.664Z: [netCorrelator] 24965794us: [vob.net.vmnic.linkstate.up] vmnic vusb0 linkstate up
***
2020-11-03T07:12:59.447Z: [UserLevelCorrelator] 114269950us: [vob.user.host.boot] Host has booted.
2020-11-03T07:12:59.447Z: [GenericCorrelator] 114269950us: [vob.user.host.boot] Host has booted.
2020-11-03T07:12:59.447Z: [UserLevelCorrelator] 114270270us: [esx.audit.host.boot] Host has booted.
***
2020-11-03T07:14:28.003Z: Successfully sent event (esx.audit.net.firewall.config.changed) after 1 failure.
2020-11-03T07:14:28.003Z: Successfully sent event (esx.audit.dcui.enabled) after 1 failure.
2020-11-03T07:14:28.003Z: Successfully sent event (esx.audit.shell.enabled) after 1 failure.
2020-11-03T07:14:28.003Z: Successfully sent event (esx.problem.clock.correction.adjtime.sync) after 1 failure.
2020-11-03T07:14:42.043Z: [GenericCorrelator] 216865712us: [vob.user.maintenancemode.exited] The host has exited maintenance mode
2020-11-03T07:14:42.043Z: [UserLevelCorrelator] 216865712us: [vob.user.maintenancemode.exited] The host has exited maintenance mode
2020-11-03T07:14:42.043Z: [UserLevelCorrelator] 216866113us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.
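For anyone who wants to pull the same timeline out of a full vobd.log, here is a minimal Python sketch. It assumes every line follows the layout of the excerpt above (timestamp, correlator, uptime counter, event ID). The function name key_events and the choice of these three event IDs are my own for illustration, not anything that ships with ESXi. The sample lines are copied from the excerpt.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# <timestamp>Z: [<correlator>] <uptime>us: [<event id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z: "
    r"\[(?P<correlator>\w+)\] \d+us: \[(?P<event>[\w.]+)\]"
)

# Event IDs that appear in the excerpt and mark the MM timeline.
KEY_EVENTS = {
    "vob.user.maintenancemode.entering",
    "vob.user.host.boot",
    "vob.user.maintenancemode.exited",
}


def key_events(lines):
    """Return (timestamp, event id) pairs for the key events, in log order.

    The same event is logged by more than one correlator, so identical
    (timestamp, event id) pairs are reported only once.
    """
    seen = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group("event") not in KEY_EVENTS:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
        item = (ts, match.group("event"))
        if item not in seen:
            seen.append(item)
    return seen


sample = [
    "2020-11-02T12:39:33.933Z: [GenericCorrelator] 2775076873416us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode",
    "2020-11-02T12:39:33.933Z: [UserLevelCorrelator] 2775076873416us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode",
    "2020-11-03T07:12:59.447Z: [GenericCorrelator] 114269950us: [vob.user.host.boot] Host has booted.",
    "2020-11-03T07:14:42.043Z: [GenericCorrelator] 216865712us: [vob.user.maintenancemode.exited] The host has exited maintenance mode",
]

events = key_events(sample)
for ts, event in events:
    print(ts.isoformat(), event)

# Time between the first "entering" event and the last "exited" event.
stuck_for = events[-1][0] - events[0][0]
print("time between entering and exited:", stuck_for)
```

To run it against a real log, replace sample with the lines of the file, for example open("vobd.log").read().splitlines().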
I thought that disabling HA before remediation would be enough for the process to work, as per this doc: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.update_manager.doc/GUID-90AA4FDB-B29B-4F7F-A400-38EFC4110024.html
I did disable HA on the cluster, but the behavior is still the same. I am not able to place a node into MM using vCenter (it results in "Operation not allowed in current state"), and the CLI behaves the same as before: the node starts to enter MM with ensure accessibility, but the job never finishes.
There are some errors mentioning 'firewall'. Could this be the issue?
Any help is appreciated, thank you