vSAN1

  • 1.  Not assignable vsan device error only on witness appliance

    Posted Aug 21, 2024 10:15 AM

    I am seeing some errors on a witness appliance, version 7.0.3, build 23794027.

    vSAN device 52ab0112-7544-104f-2bc0-378c1cb3d174 is being repaired due to I/O failures, and will be out of service until the repair is complete. If the device is part of a dedup disk group, the entire disk group will be out of service until the repair is complete.

    vSAN device 52ab0112-7544-104f-2bc0-378c1cb3d174 has gone offline.

    The vSAN hosts show no alarms or messages, but the vSAN health monitor starts to report object problems, apparently because the witness appliance stops working when this happens. A reboot of the appliance clears all problems and the system then works for some weeks. The out-of-band management of the hardware host shows no problems, and the ESXi host that runs the appliance shows no problems either.

    Has anybody experienced similar problems?



  • 2.  RE: Not assignable vsan device error only on witness appliance

    Posted Aug 22, 2024 07:07 AM

    I had a similar issue in my vSAN cluster last week. The host showed no problem, but Skyline Health reported a disk issue. I could see the physical disk issue by clicking on the cluster and going to Configure, then Disk Management under vSAN, which showed an unhealthy host. When I selected that host and went to storage devices, I saw a "missing\dead" disk. I contacted Broadcom support because the DRAC reported the hardware as fine. They insisted it was a hardware issue and told me to contact the vendor. Of course, Dell made me upgrade the DRAC, BIOS, array controller, and disk firmware. I will say that when I checked storage and disks, the drive showed as good but with 0 capacity. Dell ended up replacing the drive and the error went away.




  • 3.  RE: Not assignable vsan device error only on witness appliance
    Best Answer

    Posted Aug 22, 2024 04:38 PM
    @mpac, this is typically caused by one of two things:
     
    1. You have included the Witness Appliance VM in scheduled snapshot-based backups. Taking snapshots of the Witness Appliance is not supported, and the stun from snapshot create/consolidate can cause the disks to go offline. It can also irreparably damage Witnesses, so don't do this.
    If this is the case, it is easy to confirm: check the tasks on the VM and correlate the timing of the disks going offline with the backup/snapshot tasks.
     
    2. You have flaky backing storage where the VM's data is stored, and it is going APD/PDL intermittently. This could have so many different causes that I won't go into them here, but it is pretty simple to confirm: determine whether other VMs on the same datastore/array are impacted at the same time, and check what is logged in vmkernel.log at the time of occurrence.
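
    The timing check from point 1 can be sketched in a few lines of Python. This is only an illustration under assumptions: the timestamps and task names below are made-up sample data that you would copy by hand from the event log and the VM's task list, and `events_near_tasks` and the 10-minute slack are hypothetical choices, not part of any VMware tool.

```python
from datetime import datetime, timedelta


def events_near_tasks(offline_events, snapshot_tasks, slack=timedelta(minutes=10)):
    """Return (event_time, task_name) pairs where a disk-offline event
    falls inside a snapshot task's window, widened by `slack` on both sides."""
    matches = []
    for event in offline_events:
        for name, start, end in snapshot_tasks:
            if start - slack <= event <= end + slack:
                matches.append((event, name))
    return matches


# Hypothetical sample data, NOT taken from the thread:
# times at which "vSAN device ... has gone offline" was logged.
offline_events = [
    datetime(2024, 8, 21, 2, 7),
    datetime(2024, 8, 14, 15, 30),
]

# Hypothetical (task name, start, end) entries from the witness VM's task list.
snapshot_tasks = [
    ("Create virtual machine snapshot", datetime(2024, 8, 21, 2, 0), datetime(2024, 8, 21, 2, 1)),
    ("Remove snapshot", datetime(2024, 8, 21, 2, 5), datetime(2024, 8, 21, 2, 9)),
]

matches = events_near_tasks(offline_events, snapshot_tasks)
for event, name in matches:
    print(f"{event.isoformat()} offline event is close to task: {name}")
```

    If every offline event lines up with a snapshot task, cause 1 is the likely one. Events with no nearby task, such as the second sample event here, point more toward the backing storage in cause 2.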