Hi,
We're upgrading our iSCSI switches to 10Gbps and I can't get the new ones to work properly. I'm looking for tips and advice.
This is the setup we currently have, and it is working:
- DELL EMC ME4024 (10Gbps) SAN
- 2x Cisco 2960X (1Gbps interfaces)
- 2x DELL R730 using Broadcom 5719 and 5720 NICs (both 1Gbps interfaces)
We are replacing the Ciscos with DELL EMC S4128T-ON 10Gbps network switches. Both new switches have the latest firmware. I've configured the new switches, but when I migrate my hosts and SAN to them, iSCSI performance (datastore access) takes a big hit. Switching the iSCSI connections back to the Ciscos fixed everything immediately.
Symptoms:
- CRC errors are slowly increasing on the new switches, on the hosts' interfaces.
- Rescanning all HBAs takes about 10 minutes, while on the Ciscos it takes < 1 minute.
- Powering on a VM (not booting the guest OS, just powering it on in VMware) takes about 2-3 minutes to reach 100%.
- The guest OS in a VM takes a long time to boot.
- ESXi hosts log multiple performance-degradation warnings against the iSCSI datastores.
- If a host boots while connected to the new switches, it takes about 10 minutes to start, while it usually takes < 5 minutes.
- Even after the host is booted, many services are not running properly, including the host's web UI.
So this leads me to believe that with the new switches in place, datastore access through iSCSI isn't working properly. Of course, seeing CRC errors, I first thought about the cables. I'm only getting CRCs at the host level, not on the SAN ports. So I swapped the cables for known-good cables, and that didn't fix it. I tried other interfaces on the switches; no luck. But this happens on both S4128T-ONs for both hosts, so four interfaces... I don't think I'm that unlucky here.
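To cross-check which side of the link is taking the errors, the host-side NIC counters can be compared with the switch counters. A rough sketch (the vmnic name and switch port are placeholders for your actual iSCSI uplinks):

```shell
# On each ESXi host: per-NIC error counters (available in ESXi 6.5+)
# Watch "Receive CRC errors" and "Total receive errors" over time.
esxcli network nic stats get -n vmnic2

# On the S4128T-ON (OS10), counters for the matching port, e.g.:
# show interface ethernet 1/1/1
# and look at the input error / CRC counts there.
```

If the host counts RX CRC errors but the switch counts TX errors on the same link, the corruption is happening on the wire or PHY between them rather than upstream.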
Here are some notes I took over the past days :
- Jumbo frames are enabled and confirmed working with vmkping:
- Switch interfaces and VLAN MTU are set to 9216
- vmkernel and vswitches are using MTU 9000
- Jumbo frames are enabled on the SAN
- vmkping between hosts and SAN works (vmkping -I vmkX x.x.x.x -d -s 8972)
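For reference, the end-to-end MTU checks above can be run like this (vmk1 and the SAN IP are placeholders for your setup):

```shell
# Confirm MTU 9000 on the iSCSI vmkernel ports and their vSwitches
esxcli network ip interface list          # check the MTU column for each vmk
esxcli network vswitch standard list      # check the MTU per standard vSwitch

# Jumbo-frame path test to the SAN, with fragmentation disallowed (-d)
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header)
vmkping -I vmk1 10.0.0.10 -d -s 8972
```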
- Spanning-tree disabled on the switches
- All cables are CAT6.
- I have confirmed with DELL EMC Support that the switches are properly configured according to their recommendations.
- Delayed ACK is disabled, even though it is not mandatory for the ME4024 SAN.
- To rule out a duplex/speed mismatch between the hosts and the new switches, I forced 1000/Full on both the host and switch sides. The link worked, but the iSCSI latency was still there.
- Updated the Broadcom firmware (July 2021) from DELL's website.
- Disabled Energy Efficient/power-saving features on each Broadcom NIC in the BIOS for maximum performance.
- The Broadcoms are using driver ntg3 4.1.5.0-0vmw.702.0.0.17867351 in VMware.
- Confirmed the combination is certified for 7.0U2c in the VMware HCL.
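For completeness, this is roughly how I confirmed the loaded driver and firmware on each uplink (vmnic2 is a placeholder):

```shell
# Show the driver name, driver version, and firmware version in use on a NIC
esxcli network nic get -n vmnic2

# Confirm which ntg3 driver VIB is actually installed
esxcli software vib list | grep ntg3
```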
I am running out of ideas. It has to be a config I'm missing somewhere, or an incompatibility between the Broadcoms (firmware, VMware driver, etc.) and the new DELL switches.
Any ideas?
Thanks!