@mpac, this is typically caused by one of two things:
1. You have included the Witness Appliance VM in scheduled snapshot-based backups. Taking snapshots of the Witness Appliance is not supported: the stun during snapshot create/consolidate can cause its disks to go offline, and it can irreparably damage the Witness, so don't do this.
If this is the case it is easy to confirm: check the recent tasks on the VM and correlate the timing of the disks going offline with the backup/snapshot tasks (see the first sketch after this list).
2. You have flaky backing storage where the VM's data is stored, and it is intermittently going into APD/PDL. This could have so many different causes that I won't go into them here, but it is simple to confirm: determine whether other VMs on the same datastore/array are impacted at the same time, and check what appears in vmkernel.log at the time of occurrence (see the second sketch below).
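For the first check, here is a minimal sketch using pyVmomi ("pip install pyvmomi") that pulls the task history recorded against the Witness Appliance VM, so the snapshot/backup task timestamps can be lined up with the disk-offline events. The vCenter address, credentials, and VM name are placeholders for your environment:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",          # placeholder vCenter
                  user="administrator@vsphere.local",  # placeholder user
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the Witness Appliance VM by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "vsan-witness-01")
view.DestroyView()

# Collect only the tasks that ran against this one VM.
spec = vim.TaskFilterSpec(
    entity=vim.TaskFilterSpec.ByEntity(
        entity=vm, recursion=vim.TaskFilterSpec.RecursionOption.self))
collector = content.taskManager.CreateCollectorForTasks(spec)
try:
    collector.SetCollectorPageSize(50)
    for task in collector.latestPage:
        # Snapshot tasks show up with descriptionIds such as
        # "VirtualMachine.createSnapshot"; match their times against the
        # vSAN device-offline timestamps.
        print(task.queuedTime, task.descriptionId, task.state)
finally:
    collector.DestroyCollector()
    Disconnect(si)
```

If any snapshot create/remove/consolidate tasks line up with the device-offline events, cause #1 is confirmed.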
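For the second check, a minimal sketch, assuming you have copied vmkernel.log off the ESXi host that runs the Witness Appliance (e.g. from a support bundle). It prints lines containing the usual APD/PDL markers so they can be matched against the device-offline timestamps; the file path and marker strings are illustrative, not an exhaustive list:

```python
import re

# Illustrative markers for storage-path problems in vmkernel.log.
MARKERS = re.compile(r"APD|PDL|ScsiDeviceIO|permanently inaccessible",
                     re.IGNORECASE)

with open("vmkernel.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if MARKERS.search(line):
            print(line.rstrip())
```

Hits at the same timestamps as the vSAN device errors, especially alongside other impacted VMs, point to cause #2.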
Original Message:
Sent: Aug 21, 2024 10:14 AM
From: mpac
Subject: Not assignable vsan device error only on witness appliance
I have some errors on a witness appliance, version 7.0.3 build 23794027.
vSAN device 52ab0112-7544-104f-2bc0-378c1cb3d174 is being repaired due to I/O failures, and will be out of service until the repair is complete. If the device is part of a dedup disk group, the entire disk group will be out of service until the repair is complete.
vSAN device 52ab0112-7544-104f-2bc0-378c1cb3d174 has gone offline.
The vSAN hosts have no alarms or messages, but the vSAN health monitor starts to report object problems because the witness appliance apparently stops working when this happens. A reboot of the appliance repairs all problems and the system then works for some weeks. The out-of-band management of the hardware host shows no problems, and the ESXi host that runs the appliance has no problems either.
Has anybody experienced similar problems?