

 vSAN network on LACP (performance issues)

Arkady Grinshpun posted Oct 07, 2024 03:22 PM

Hi all,

Disclaimer: I am aware of the disadvantages of LACP compared with VMware LBT. Nonetheless, we want to try to make it work in our environment for the advantages that it does provide. The strategy is to use it only for vSAN storage traffic; all other networks (VM, vMotion, Management) use standard teaming. So I am hoping for responses that are less about “you shouldn’t complicate things with LACP” and more about possible reasons for my issues, and about how to talk to my network team or what to ask them so that we get on the same page.

The cluster is made up of all certified hardware: all-flash (AF) with an NVMe cache tier and a SAS capacity tier, one disk group per host.
It runs vSphere 8. The vSAN network is air-gapped and on its own non-routed private VLAN.
HCIBench results show what I view as very high write and read latency, especially write latency. To be honest, I am not well versed enough to judge what is or isn’t high latency; my baseline is a different, similarly configured cluster with similar hardware, minus the NVMe cache and minus the LACP. Both clusters are in the same organization, backed by the same distribution-layer Cisco switches. The cluster with LACP performs 10-20 times worse on the same benchmark tests, with equivalent storage policies. I suspect misconfigured LACP on the VMware side or on the MLAG pSwitch side. Please point me in the right direction; I am afraid to put production VMs on a cluster with NVMe cache that performs slower than a cluster with SAS cache. Between the NVMe cache and the increased bandwidth from LACP, I was expecting this cluster to fly. This is what the LACP configuration looks like on the dvSwitch:
Name: vSAN-LAG
Number of ports: 2
Mode: Active
Timeout: Slow
Load balancing mode: Source and destination IP address, TCP/UDP port and VLAN
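
For context on the load balancing mode listed above: a LAG does not split a single flow across its member ports. Each flow is hashed on the selected header fields and pinned to one uplink, so extra bandwidth only appears when there are many distinct flows. The sketch below is only an illustration of that idea in Python. The hash function is hypothetical (it is not the algorithm ESXi or the physical switch uses), and the IP addresses, VLAN ID, and port numbers are made-up example values.

```python
import hashlib
from collections import Counter

UPLINKS = 2  # matches "Number of ports" in the LAG settings above


def pick_uplink(src_ip, dst_ip, src_port, dst_port, vlan, uplinks=UPLINKS):
    """Map one flow (IPs, TCP/UDP ports, VLAN) to a single uplink index.

    Hypothetical hash for illustration only; real implementations
    use their own vendor-specific hash.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{vlan}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % uplinks


def spread(flows, uplinks=UPLINKS):
    """Count how many flows land on each uplink."""
    return Counter(pick_uplink(*flow, uplinks=uplinks) for flow in flows)


if __name__ == "__main__":
    # One flow between two hosts (example addresses and ports, assumed).
    single = ("192.168.50.11", "192.168.50.12", 51000, 2233, 100)
    # The same flow always maps to the same uplink, so it never
    # exceeds the bandwidth of one physical link.
    print("single flow uplink:", pick_uplink(*single))

    # Many flows that differ only in source port get distributed.
    many = [("192.168.50.11", "192.168.50.12", 50000 + i, 2233, 100)
            for i in range(1000)]
    print("distribution over uplinks:", dict(spread(many)))
```

If the host-side hash and the switch-side hash use different fields, or if traffic between two hosts consists of only a few flows, the distribution can be uneven; that is one thing worth asking the network team about.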

GvinPin

Setting aside the LACP/LBT comparison, please read this article. It may contain some clues and tips on how to troubleshoot.

https://www.vviking.nl/vmware/lacp-and-vsphere-esxi-hosts-not-a-very-good-marriage/

vExpert Duncan Epping

I have not seen this issue myself, and I have seen various customers with LACP. Personally, what I would do is run vSAN on a network that doesn't use LACP, just to test it and see how it behaves. That way you can at least rule out the NVMe devices as the issue. If it does perform well without LACP, you know the network configuration is problematic. It is difficult to say why that is, though; that would require more troubleshooting.