Hi Bob,
Good point - no, the logs don't show UNDERRUN.
I restarted one of the hosts and ran HCIbench to give the vSAN some work, rather than leaving it idling overnight (it would also fail that way). Attached is the vmkernel.log; the highlights below are from towards the end, at the point where the disks were marked as failed.
AHCI is really unhappy:
2020-06-09T20:49:48.922Z cpu0:526070)osfs: OSFS_GetMountPointList:3696: mountPoints[0] inUse pid [ vsan], cid 527d400a075db425-2a34056e0cf09036
2020-06-09T20:49:48.922Z cpu0:526070)osfs: OSFS_GetMountPointList:3696: mountPoints[0] inUse pid [ vsan], cid 527d400a075db425-2a34056e0cf09036
2020-06-09T20:50:00.247Z cpu10:530622)osfs: OSFS_GetMountPointList:3696: mountPoints[0] inUse pid [ vsan], cid 527d400a075db425-2a34056e0cf09036
2020-06-09T20:50:18.751Z cpu10:526031)vmw_ahci[00000017]: CompletionBottomHalf:Error port=2, PxIS=0x08000000, PxTDF=0xc0,PxSERR=0x00400100, PxCI=0x000001c0, PxSACT=0x000001f8, ActiveTags=0x000001f8
2020-06-09T20:50:18.751Z cpu10:526031)vmw_ahci[00000017]: CompletionBottomHalf:SCSI cmd 0x2a on slot 6 lba=0x895280, lbc=0x80
2020-06-09T20:50:18.751Z cpu10:526031)vmw_ahci[00000017]: CompletionBottomHalf:cfis->command= 0x61
2020-06-09T20:50:18.751Z cpu10:526031)vmw_ahci[00000017]: LogExceptionSignal:Port 2, Signal: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2020-06-09T20:50:18.751Z cpu8:524909)vmw_ahci[00000017]: LogExceptionProcess:Port 2, Process: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020)
2020-06-09T20:50:18.751Z cpu8:524909)vmw_ahci[00000017]: ExceptionHandlerWorld:Performing device reset due to Task File Error.
2020-06-09T20:50:18.751Z cpu8:524909)vmw_ahci[00000017]: ExceptionHandlerWorld:hardware stop on slot 0x6, activeTags 0x000001f8
2020-06-09T20:50:18.752Z cpu8:524909)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2020-06-09T20:50:18.827Z cpu8:524909)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2020-06-09T20:50:18.835Z cpu8:524909)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 4
2020-06-09T20:50:18.835Z cpu8:524909)vmw_ahci[00000017]: IssueCommand:tag: 3 already active during issue, reissue_flag:1
2020-06-09T20:50:18.835Z cpu7:525052)NMP: nmp_ThrottleLogForDevice:3856: Cmd 0x2a (0x453ffb9d6580, 0) to dev "t10.ATA_____Samsung_SSD_860_QVO_1TB_________________S4CZNF0N368707N_____" on path "vmhba0:C0:T2:L0" Failed:
2020-06-09T20:50:18.835Z cpu8:524909)vmw_ahci[00000017]: IssueCommand:tag: 5 already active during issue, reissue_flag:1
2020-06-09T20:50:18.835Z cpu7:525052)NMP: nmp_ThrottleLogForDevice:3865: H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0. Act:NONE. cmdId.initiator=0x430366aae5c0 CmdSN 0xcd95
2020-06-09T20:50:18.835Z cpu8:524909)vmw_ahci[00000017]: IssueCommand:tag: 6 already active during issue, reissue_flag:1
2020-06-09T20:50:18.835Z cpu7:525052)ScsiDeviceIO: 4062: Cmd(0x453ffb9d6580) 0x2a, CmdSN 0xcd95 from world 0 to dev "t10.ATA_____Samsung_SSD_860_QVO_1TB_________________S4CZNF0N368707N_____" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.
2020-06-09T20:50:18.835Z cpu8:524909)vmw_ahci[00000017]: IssueCommand:tag: 7 already active during issue, reissue_flag:1
2020-06-09T20:50:18.835Z cpu8:524909)vmw_ahci[00000017]: IssueCommand:tag: 8 already active during issue, reissue_flag:1
2020-06-09T20:50:18.835Z cpu8:524909)vmw_ahci[00000017]: ProcessActiveCommands:Commands completed: 0, re-issued: 5
2020-06-09T20:50:18.835Z cpu7:525052)WARNING: PLOG: PLOGPropagateErrorInt:3955: Permanent error event on 52dc7fd0-9c95-5521-eb69-9b82b448d9a2
2020-06-09T20:50:18.835Z cpu11:526001)LSOM: LSOMLogDiskEvent:7628: Disk Event permanent error for MD 52dc7fd0-9c95-5521-eb69-9b82b448d9a2 (t10.ATA_____Samsung_SSD_860_QVO_1TB_________________S4CZNF0N368707N_____:2)
2020-06-09T20:50:18.835Z cpu11:526001)WARNING: LSOM: LSOMEventNotify:7877: vSAN device 52dc7fd0-9c95-5521-eb69-9b82b448d9a2 is under permanent error.
2020-06-09T20:50:18.835Z cpu7:525052)LSOMCommon: IORETRYCompleteIO:483: Throttled: 0x4540016e7940 IO type 304 (WRITE) isOrdered:NO isSplit:YES isEncr:NO since 85 msec status I/O error
2020-06-09T20:50:18.836Z cpu7:525052)WARNING: LSOMCommon: IORETRYParentIODoneCB:2219: Throttled: split status I/O error
2020-06-09T20:50:18.836Z cpu7:525052)WARNING: PLOG: PLOGElevWriteMDCb:746: MD UUID 52dc7fd0-9c95-5521-eb69-9b82b448d9a2 write failed I/O error
2020-06-09T20:50:19.836Z cpu4:525951)PLOG: PLOGElevHandleDeviceError:1024: Elevator for t10.ATA_____Samsung_SSD_860_QVO_1TB_________________S4CZNF0N368707N_____:2 UUID 52dc7fd0-9c95-5521-eb69-9b82b448d9a2 moving to cleanup state
2020-06-09T20:50:19.922Z cpu11:525951)PLOG: PLOGElevTaskComplete:3442: PLOG Elevator exited
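
For anyone reading along, here is a rough sketch of how the status fields in the lines above can be decoded. It is not a VMware tool: the function names and lookup tables are my own, the tables are deliberately partial, and the AHCI bit names are how I read the AHCI 1.3.1 register definitions, so they are worth double-checking against the spec.

```python
import re

# SCSI codes per the SAM/SPC standards (subset only).
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
SENSE_KEY = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND",
}
ASC_ASCQ = {(0x44, 0x00): "INTERNAL TARGET FAILURE"}

# AHCI port register bits (subset; assumption: taken from AHCI 1.3.1).
PXIS_BITS = {
    24: "OFS (overflow)",
    26: "INFS (interface non-fatal error)",
    27: "IFS (interface fatal error)",
    28: "HBDS (host bus data error)",
    29: "HBFS (host bus fatal error)",
    30: "TFES (task file error)",
}
PXSERR_BITS = {
    8: "ERR.T (transient data integrity error)",
    9: "ERR.C (persistent communication or data integrity error)",
    10: "ERR.P (protocol error)",
    11: "ERR.E (internal error)",
    19: "DIAG.B (10b to 8b decode error)",
    20: "DIAG.D (disparity error)",
    21: "DIAG.C (CRC error)",
    22: "DIAG.H (handshake error)",
    23: "DIAG.S (link sequence error)",
}

HEX = r"(0x[0-9a-f]+)"
SENSE_RE = re.compile(
    rf"H:{HEX} D:{HEX} P:{HEX} Valid sense data: {HEX} {HEX} {HEX}", re.I
)
REG_RE = re.compile(rf"(PxIS|PxSERR)={HEX}")


def decode_sense(line):
    """Decode the H:/D:/P: status and sense triplet from an NMP or ScsiDeviceIO line."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    host, dev, plugin, key, asc, ascq = (int(x, 16) for x in m.groups())
    return {
        "host_status": host,
        "device_status": DEVICE_STATUS.get(dev, hex(dev)),
        "plugin_status": plugin,
        "sense_key": SENSE_KEY.get(key, hex(key)),
        "additional_sense": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
    }


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


def decode_ahci(line):
    """Decode PxIS and PxSERR from a vmw_ahci CompletionBottomHalf error line."""
    regs = {name: int(val, 16) for name, val in REG_RE.findall(line)}
    return {
        "PxIS": decode_bits(regs.get("PxIS", 0), PXIS_BITS),
        "PxSERR": decode_bits(regs.get("PxSERR", 0), PXSERR_BITS),
    }


if __name__ == "__main__":
    print(decode_sense("H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0."))
    print(decode_ahci("PxIS=0x08000000, PxTDF=0xc0,PxSERR=0x00400100"))
```

Fed the lines from this log, the sense data decodes to CHECK CONDITION with sense key HARDWARE ERROR and additional sense INTERNAL TARGET FAILURE, and the two AHCI registers point at an interface-level error rather than a media error. For context, SCSI opcode 0x2a is WRITE(10) and ATA command 0x61 is WRITE FPDMA QUEUED, so the failing command was a queued (NCQ) write.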