Sorry it's taking me so long to respond! (Was fighting other fires.)
i. Could you confirm that it was only the switches' firmware that was updated, and that no other part of the infrastructure or configuration was changed (i.e. that no switches were swapped out, and that switch configs, routing, cabling, and any config on the ESXi or iSCSI target were not changed in any way whatsoever)?
To my knowledge, just the firmware - although I can't be 100% sure. The network infra is handled by someone else, and I have limited access to it. On the VMware side I do have full access, and I don't see any changes made to the ESXis or to the iSCSI system.
(One possibly relevant bit of info: the affected ESXis (3 out of 4) pre-date me joining the team, i.e. they were originally configured by someone else. The last one, which is unaffected by the change, was added to the cluster and configured by yours truly. I pored over the network config pages on all ESXis trying to zero in on what could be different between the affected and unaffected hosts - can't find anything. Did the same in Meraki - ditto.)
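(In case it helps anyone comparing the hosts the same way, something along the lines of the pyVmomi sketch below can dump each host's port group VLANs and VMkernel NICs for a side-by-side diff outside the UI - the vCenter address and credentials are placeholders, not our actual setup.)

```python
# Minimal pyVmomi sketch (assumes pyVmomi is installed and vCenter is reachable).
# Dumps each ESXi host's port group VLANs and VMkernel NICs so the affected
# and unaffected hosts can be diffed side by side.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def dump_host_network(host):
    print(f"== {host.name} ==")
    for pg in host.config.network.portgroup:
        print(f"  portgroup {pg.spec.name}: vlan={pg.spec.vlanId} "
              f"vswitch={pg.spec.vswitchName}")
    for vnic in host.config.network.vnic:
        print(f"  {vnic.device}: ip={vnic.spec.ip.ipAddress} "
              f"mtu={vnic.spec.mtu} portgroup={vnic.portgroup}")

ctx = ssl._create_unverified_context()                 # lab use only
si = SmartConnect(host="vcenter.example.local",        # placeholder
                  user="administrator@vsphere.local",  # placeholder
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for esx in view.view:
        dump_host_network(esx)
    view.Destroy()
finally:
    Disconnect(si)
```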
ii. How many switches are involved, and if more than one, which ESXi and iSCSI target is connected to which switch?
Around 4-5: each switch has four 10Gb ports, each connected system uses two of them, and between the 4 ESXis and the 1 iSCSI target, ten 10Gb ports are in use across a number of switches.
iii. Is there only one iSCSI target for all four of the ESXi servers?
Two for the first three ESXis, one for the last one. The second one is a Dell/EMC ME4024 flash array direct-attached (via direct 10Gb links, no switches involved) to the first three ESXis.
iv. Are all of the switches at the same firmware version?
v. Are all the switches the same make/model/version?
vi. Have you reviewed the switches' firmware update notes to determine what was changed?
Yes and yes; and no, I can't find any release notes for the update.
vii. Are you using VLANs?
We do use VLANs, and the ports on the switches are configured the same way across all ESXis and the iSCSI target - at least while we're troubleshooting the issue:
Type: Trunk
Native VLAN: <masked>
Allowed VLANs: all
Access policy: Open
viii. You mentioned that you spun up an 'older standalone ESXi 6.7' and a 'Windows Desktop', both of which could "also see the device". Would I be correct to assume that 'see the device' means they could see the iSCSI target in question and mount the storage?
Correct.
ix. How many network connections does each of the ESXi servers have?
x. How many network connections does the iSCSI target have?
The first three (affected) ESXis: six total:
- two 10Gb NICs for general traffic, vMotion, switched iSCSI
- two 1Gb legacy NICs: still connected but no longer active (no port groups, vSwitches, or VMkernel adapters are attached to them)
- two 10Gb NICs for direct-attached iSCSI (ME4024 mentioned above)
The 4th (unaffected):
- two 10Gb NICs for general traffic, vMotion, switched iSCSI
- (it's not connected to ME4024)
iSCSI target: two 10Gb NICs, only one active now. The second one is connected to a switch port that our network admin disabled because, for some reason, Meraki was raising IP conflict alarms on the target's two ports despite no apparent conflict (the IPs are different).
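(Along the same lines, and purely as a sketch with placeholder connection details, pyVmomi can also list each host's iSCSI HBAs and their configured send targets - a quick way to confirm all four ESXis point at the same portal IPs.)

```python
# Rough pyVmomi sketch (same placeholder vCenter/credentials as the earlier
# sketch): list each host's iSCSI HBAs and configured send targets.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                 # lab use only
si = SmartConnect(host="vcenter.example.local",        # placeholder
                  user="administrator@vsphere.local",  # placeholder
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for esx in view.view:
        print(f"== {esx.name} ==")
        for hba in esx.config.storageDevice.hostBusAdapter:
            if isinstance(hba, vim.host.InternetScsiHba):
                print(f"  {hba.device} ({hba.iScsiName})")
                for tgt in hba.configuredSendTarget or []:
                    print(f"    send target {tgt.address}:{tgt.port}")
    view.Destroy()
finally:
    Disconnect(si)
```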
- I would suggest moving one of the ESXis that cannot connect to the iSCSI target to one of the switch ports that you know is working (either the port used by the ESXi that still works after the switch firmware upgrade, or the one used by the ESXi 6.7 box or the Win Desktop (assuming 'viii' is correct)).
- As an experiment, you could exclude the switches from the equation altogether and make an appropriate direct connection, or alternatively connect via another type/make of switch.
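(Either way, before rescanning from ESXi, a bare TCP probe of the target's iSCSI portal on port 3260 from whichever machine ends up on the re-patched or direct link is a quick sanity check that the path is up. A minimal sketch; the portal IP below is a placeholder:)

```python
# Quick reachability check (not ESXi-specific): probe the iSCSI portal's
# TCP port 3260 from the machine that was moved onto the port/link in question.
import socket

PORTAL = ("192.0.2.10", 3260)   # placeholder iSCSI portal IP

try:
    with socket.create_connection(PORTAL, timeout=5):
        print(f"TCP {PORTAL[0]}:{PORTAL[1]} reachable")
except OSError as exc:
    print(f"TCP {PORTAL[0]}:{PORTAL[1]} NOT reachable: {exc}")
```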
Thank you! I'll check with the network admin on both options.