vSAN

  • 1.  RDMA functionality

    Posted Jan 31, 2025 12:33 PM

    Hello,

    Now that I have cleaned up the RDMA errors and seemingly got PFC working, I wanted to test whether traffic is going via TCP or RDMA.

    I found the tool "rdtbench" and ran it with -p tcp from host to host; it produced around 16,000 MHz of CPU usage.

    However, if I try to run it with -p rdma, I get zero traffic.

    I would surmise that RDMA is not working.

    With -p tcp I get a transfer rate of 29,000 Mbit/s (29 Gbit/s), which is more than a single 25G link can carry, so I believe some load balancing is going on.

    The setup is simple: a VLAN-separated network on a BCM57414 (N225P) NIC, 25G dual link with active/active load-based teaming. I also tested without teaming, with a single adapter only.

    Any idea how I can troubleshoot this?
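
    For reference, here is a minimal diagnostic sketch I would run on an ESXi host to check the RDMA side; the device name vmrdma0 is an assumption, substitute whatever `esxcli rdma device list` shows on your hosts:

    ```shell
    # List RDMA-capable devices with their link state and protocol (RoCE v1/v2).
    esxcli rdma device list

    # Per-device RDMA traffic counters; on a working RDMA path these counters
    # should increase during a test run. vmrdma0 is an assumed device name.
    esxcli rdma device stats get -d vmrdma0

    # Confirm which vmknics are tagged for vSAN traffic.
    esxcli vsan network list
    ```

    If the stats counters stay at zero while traffic flows, vSAN is most likely falling back to TCP.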



  • 2.  RE: RDMA functionality

    Posted Feb 01, 2025 08:09 PM
    Edited by st-ops Feb 01, 2025 08:11 PM

    In the meantime, I also installed HCIBench and ran it.

    From the numbers I have off the top of my head, our Azure HCI cluster showed over 1 million IOPS in Windows Admin Center. Whether that is realistic or not, I can't say. According to the charts, we only ever used a maximum of 15,000 IOPS.

    The HCIBench test (fio-8vmdk-100ws-4k-100rdpct-100randompct-4threads), however, showed:

    CPU usage: 66%

    CPU utilization: 45%

    What I am wondering, however: RDMA is enabled, so why is CPU usage still so high?

    My problem is that I can't tell whether these numbers are realistic and correct, or whether something is wrong with the system.

    Given the high CPU usage, is RDMA actually working?

    The hardware is like this:

    3 nodes, each with 16 NVMe drives (Micron 7450 MAX, 6.4 TB)

    I changed the uplink configuration to a single active link, with the second uplink in standby, though still using load-based teaming.

    We run vSAN OSA right now; ESA is not supported on our servers. As far as I've seen, it is configured with disk groups of 8 disks each: 7 capacity and 1 cache per group.
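
    The disk-group layout above works out as follows (a sketch, assuming all 16 NVMe devices per node are claimed by vSAN, which may not match the actual config):

    ```python
    # Sketch of the vSAN OSA disk-group layout described above.
    # Assumption: all 16 NVMe devices per node are claimed by vSAN.
    NODES = 3
    DRIVES_PER_NODE = 16
    GROUP_SIZE = 8              # 1 cache + 7 capacity drives per disk group
    CAPACITY_PER_GROUP = 7
    DRIVE_TB = 6.4              # Micron 7450 MAX raw size

    groups_per_node = DRIVES_PER_NODE // GROUP_SIZE            # 2 disk groups
    capacity_drives = NODES * groups_per_node * CAPACITY_PER_GROUP
    raw_capacity_tb = capacity_drives * DRIVE_TB

    print(groups_per_node, capacity_drives, raw_capacity_tb)
    # → 2 disk groups per node, 42 capacity drives, 268.8 TB raw cluster-wide
    ```

    Raw capacity only; usable capacity after the storage policy (FTT/RAID) overhead is lower.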