WSFC Failing when expanded non-shared disk


1. WSFC Failing when expanded non-shared disk

cjscol
Posted Apr 27, 2020 05:07 PM

I have a 2-node Windows Server Failover Cluster (WSFC). The nodes run Windows Server 2012 R2 on different ESXi 6.5 U3 hosts.
Each node is configured with two local disks, C: and D:, and a shared disk, E:.
Disks C: and D: are on a VMFS volume connected via FC and attached to an LSI Logic SAS SCSI controller (SCSI0) set to Not Shared; C: is SCSI0:0 and D: is SCSI0:1.
Disk E: is an RDM accessed over FC and connected to a Paravirtual SCSI controller (SCSI1) configured with Physical SCSI Bus Sharing. The RDM was attached to one node in the cluster, and the other node was configured by adding an existing disk and selecting the RDM pointer from the first node. Disk E: is SCSI1:0 on both VMs.
There is a Role configured on the Windows Cluster containing the E: drive as a Storage resource and a number of Windows Services.
When I increase the size of the non-shared D: drive on the node that is currently running the role, the E: drive resource fails on the cluster, taking the services offline because they depend on the E: drive.
[HKLM]\SYSTEM\CurrentControlSet\Services\disk\TimeoutValue is set to 190 on both nodes.
I also have the following 2 settings on both VMs:
scsi0.returnNoConnectDuringAPD = "TRUE"
scsi0.returnBusyOnNoConnectStatus = "FALSE"
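To confirm that both VMs really carry these two entries, the .vmx files could be compared with a short script. The following is only a sketch: it assumes the usual `key = "value"` layout of a .vmx file, and the names `parse_vmx`, `check_settings`, and `EXPECTED` are hypothetical helpers, not part of any VMware tool.

```python
import re

# One .vmx entry per line, in the form: key = "value"
VMX_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')

# The two settings quoted in the post (values compared case-insensitively).
EXPECTED = {
    "scsi0.returnNoConnectDuringAPD": "TRUE",
    "scsi0.returnBusyOnNoConnectStatus": "FALSE",
}


def parse_vmx(text):
    """Return a dict of key -> value for every well-formed line in the text."""
    settings = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def check_settings(settings, expected=EXPECTED):
    """Return {key: (expected, actual)} for each missing or differing setting."""
    problems = {}
    for key, want in expected.items():
        have = settings.get(key)
        if have is None or have.upper() != want.upper():
            problems[key] = (want, have)
    return problems


if __name__ == "__main__":
    sample = (
        'scsi0.returnNoConnectDuringAPD = "TRUE"\n'
        'scsi0.returnBusyOnNoConnectStatus = "FALSE"\n'
    )
    print(check_settings(parse_vmx(sample)))
```

An empty result means both settings are present with the expected values. Running it against the .vmx text from each node would show whether the two VMs differ.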
The sequence of events is:
Increase the size of Hard Disk 2 (D:) on the VM.
In the vmkernel.log of the ESXi host the VM is running on I see:
2020-04-24T16:38:42.655Z cpu1:73226)VSCSI: 6590: handle 8192(vscsi0:0):Destroying Device for world 73030 (pendCom 0)
2020-04-24T16:38:42.655Z cpu1:73226)VSCSI: 6590: handle 8193(vscsi0:1):Destroying Device for world 73030 (pendCom 0)
2020-04-24T16:38:42.655Z cpu1:73226)VSCSI: 6590: handle 8194(vscsi1:0):Destroying Device for world 73030 (pendCom 0)
2020-04-24T16:38:42.905Z cpu2:73226)VSCSI: 3801: handle 8195(vscsi0:0):Creating Virtual Device for world 73030 (FSS handle 5640191) numBlocks=125829120 (bs=512)
2020-04-24T16:38:42.905Z cpu2:73226)VSCSI: 273: handle 8195(vscsi0:0):Input values: res=0 limit=-1 bw=-1 Shares=-1
2020-04-24T16:38:42.906Z cpu2:73226)VSCSI: 3801: handle 8196(vscsi0:1):Creating Virtual Device for world 73030 (FSS handle 4067328) numBlocks=71303168 (bs=512)
2020-04-24T16:38:42.906Z cpu2:73226)VSCSI: 273: handle 8196(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
2020-04-24T16:38:42.910Z cpu2:73226)VSCSI: 3801: handle 8197(vscsi1:0):Creating Virtual Device for world 73030 (FSS handle 5574657) numBlocks=62926605 (bs=512)
2020-04-24T16:38:42.910Z cpu2:73226)VSCSI: 273: handle 8197(vscsi1:0):Input values: res=0 limit=-1 bw=-1 Shares=-1
2020-04-24T16:38:42.912Z cpu14:73029)VSCSI: 273: handle 8196(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
2020-04-24T16:38:42.914Z cpu14:73029)VSCSI: 273: handle 8196(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
2020-04-24T16:38:42.914Z cpu14:73029)VSCSI: 273: handle 8196(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
2020-04-24T16:38:42.919Z cpu20:73029)VSCSI: 273: handle 8196(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
2020-04-24T16:38:42.920Z cpu20:73029)VSCSI: 273: handle 8196(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
2020-04-24T16:38:42.920Z cpu20:73029)VSCSI: 273: handle 8196(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
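The excerpt above shows all three virtual devices being destroyed and recreated within about 250 ms, including vscsi1:0, the shared RDM, even though only Hard Disk 2 (vscsi0:1) was resized. A small script can pull that out of a longer vmkernel.log. This is a sketch that only understands the two message shapes seen in the excerpt; the names `summarize_vscsi_events` and `devices_recreated` are hypothetical, and whether the reopen of vscsi1:0 is what costs the node its ownership of the cluster disk is an assumption, not something the log proves.

```python
import re

# Matches the "Destroying Device" and "Creating Virtual Device" lines
# from the vmkernel.log excerpt; other VSCSI messages are ignored.
VSCSI_EVENT = re.compile(
    r"^(?P<ts>\S+) \S+VSCSI: \d+: handle (?P<handle>\d+)"
    r"\((?P<dev>vscsi\d+:\d+)\):"
    r"(?P<action>Destroying Device|Creating Virtual Device)"
    r"(?:.*?numBlocks=(?P<blocks>\d+) \(bs=(?P<bs>\d+)\))?"
)


def summarize_vscsi_events(log_text):
    """Return (timestamp, device, action, size_in_bytes_or_None) tuples."""
    events = []
    for line in log_text.splitlines():
        match = VSCSI_EVENT.match(line.strip())
        if not match:
            continue
        size = None
        if match.group("blocks"):
            size = int(match.group("blocks")) * int(match.group("bs"))
        events.append(
            (match.group("ts"), match.group("dev"), match.group("action"), size)
        )
    return events


def devices_recreated(events):
    """Return the devices that were both destroyed and created."""
    destroyed = {d for _, d, action, _ in events if action == "Destroying Device"}
    created = {d for _, d, action, _ in events if action == "Creating Virtual Device"}
    return sorted(destroyed & created)


SAMPLE = """\
2020-04-24T16:38:42.655Z cpu1:73226)VSCSI: 6590: handle 8193(vscsi0:1):Destroying Device for world 73030 (pendCom 0)
2020-04-24T16:38:42.655Z cpu1:73226)VSCSI: 6590: handle 8194(vscsi1:0):Destroying Device for world 73030 (pendCom 0)
2020-04-24T16:38:42.906Z cpu2:73226)VSCSI: 3801: handle 8196(vscsi0:1):Creating Virtual Device for world 73030 (FSS handle 4067328) numBlocks=71303168 (bs=512)
2020-04-24T16:38:42.910Z cpu2:73226)VSCSI: 3801: handle 8197(vscsi1:0):Creating Virtual Device for world 73030 (FSS handle 5574657) numBlocks=62926605 (bs=512)
"""

if __name__ == "__main__":
    events = summarize_vscsi_events(SAMPLE)
    for ts, dev, action, size in events:
        size_text = "" if size is None else f" ({size / 2**30:.2f} GiB)"
        print(f"{ts} {dev} {action}{size_text}")
    print("Recreated:", devices_recreated(events))
```

The numBlocks values multiplied by the 512-byte block size also give the disk sizes after the change: 71303168 blocks is exactly 34 GiB for vscsi0:1, and 125829120 blocks is exactly 60 GiB for vscsi0:0.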
In the Windows System Event Log I see the following:
FailoverClustering Event ID 1038 Physical Disk Resource
Ownership of cluster disk ‘Cluster Disk’ has been unexpectedly lost by this node. Run the Validate a Configuration Wizard to check your storage configuration.
FailoverClustering Event ID 1069 Resource Control Manager
Cluster Resource ‘Cluster Disk’ of type ‘Physical Disk’ in clustered role ‘MyRole’ failed.
All of the services configured on the cluster role stop.
About 30 seconds later I get the following in the Windows System Event Log:
Ntfs (Microsoft-Windows-Ntfs) Event ID 98
Volume E: (\Device\HarddiskVolume5) is healthy. No action needed.
The Cluster Disk resource comes back online and the services start up again, but I have had an outage in my services. I have been able to reproduce this 100% of the time. Any ideas why this is happening? I have two environments, one based on ESXi 6.0 and the other on ESXi 6.5, and I get the same symptoms on both.
2. RE: WSFC Failing when expanded non-shared disk

lsimons77
Posted Jun 13, 2022 03:58 PM

Hi,
Did you solve the problem?
Thanks
Lorenzo
