  • 1.  Troubleshooting RDMA RoCEv2 setup

    Posted 13 days ago

    Hi everyone,

    I'm currently trying to set up a point-to-point RDMA RoCEv2 connection between two nodes, but I'm hitting some roadblocks. I'm hoping someone can point me in the right direction.


    Issue:

    The rping command fails: the client never reaches the CONNECTED state.

    On the server side:
    sudo rping -s -d -a 30.30.1.1 -Vv -C 3

    validate data
    verbose
    count 3
    created cm_id 0x651d5c7b8c80
    rdma_bind_addr successful
    rdma_listen

    On the client side:
    sudo rping -c -d -a 30.30.1.1 -I 30.30.1.2 -Vv -C 3
    validate data
    verbose
    count 3
    created cm_id 0x5db441510c50
    cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x5db441510c50 (parent)
    cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x5db441510c50 (parent)
    rdma_resolve_addr - rdma_resolve_route successful
    created pd 0x5db4415054b0
    created channel 0x5db441505470
    created cq 0x5db441510fe0
    created qp 0x5db441511180
    rping_setup_buffers called on cb 0x5db4415037c0
    allocated & registered buffers...
    cq_thread started.
    cma_event type RDMA_CM_EVENT_UNREACHABLE cma_id 0x5db441510c50 (parent)
    cma event RDMA_CM_EVENT_UNREACHABLE, error -110
    wait for CONNECTED state 4
    connect error -1
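
The "error -110" in the client log above is a negated Linux errno value. On Linux, 110 is ETIMEDOUT, which suggests the connection request got no reply rather than an explicit reject. A minimal sketch for decoding such values follows; the helper name `errno_name` is hypothetical and the table covers only a few network-related codes.

```shell
# Hypothetical helper: map a (possibly negated) Linux errno to its name.
# Only a handful of network-related values are covered.
errno_name() {
    n=${1#-}    # drop a leading minus sign, as printed by rping
    case "$n" in
        101) echo ENETUNREACH ;;
        110) echo ETIMEDOUT ;;
        111) echo ECONNREFUSED ;;
        113) echo EHOSTUNREACH ;;
        *)   echo "unknown ($n)" ;;
    esac
}
```

Usage: `errno_name -110`.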


    I also tried the ucmatose command, with the same result.

    On both nodes the RDMA link seems active:
    rdma link show
    link roceo12399/1 state ACTIVE physical_state LINK_UP netdev eno12399np0 
    link roceo12409/1 state ACTIVE physical_state LINK_UP netdev eno12409np1 
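
Because two RoCE devices are listed above, it may help to confirm which netdev, and therefore which RDMA device, carries the 30.30.1.x addresses. The sketch below reshapes `rdma link show` output into one "netdev device port state" row per link; the function name `rdma_netdev_map` is an assumption made for illustration.

```shell
# Hypothetical helper: read `rdma link show` output on stdin and print
# "netdev rdma_device port state" for each link line.
rdma_netdev_map() {
    awk '$1 == "link" {
        split($2, a, "/")          # "roceo12399/1" -> device, port
        nd = "-"; st = "-"
        for (i = 3; i < NF; i++) {
            if ($i == "netdev") nd = $(i + 1)
            if ($i == "state")  st = $(i + 1)
        }
        print nd, a[1], a[2], st
    }'
}
```

Usage: `rdma link show | rdma_netdev_map`, then compare the netdev column with `ip -br addr show` to see which interface holds each IP address.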


    Note:

    The ibv_rc_pingpong command works.

     
    On the server side:
    sudo ibv_rc_pingpong -d roceo12409 -i 1 -g 3
     local address:  LID 0x0000, QPN 0x000084, PSN 0x81ec1a, GID ::ffff:30.30.1.1
     remote address: LID 0x0000, QPN 0x00008a, PSN 0x6d2f08, GID ::ffff:30.30.1.2
    8192000 bytes in 0.02 seconds = 4068.79 Mbit/sec
    1000 iters in 0.02 seconds = 16.11 usec/iter

    On the client side:

    sudo ibv_rc_pingpong -d roceo12409 -i 1 -g 3 30.30.1.1
    local address:  LID 0x0000, QPN 0x00008a, PSN 0x6d2f08, GID ::ffff:30.30.1.2
    remote address: LID 0x0000, QPN 0x000084, PSN 0x81ec1a, GID ::ffff:30.30.1.1
    8192000 bytes in 0.02 seconds = 4137.63 Mbit/sec
    1000 iters in 0.02 seconds = 15.84 usec/iter
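
One difference between the two tests: ibv_rc_pingpong is given the GID index explicitly (-g 3), while rping goes through the RDMA CM, which selects the device and GID from the IP addresses. Listing the GID table may therefore show whether the entry for 30.30.1.x is the RoCE version and netdev that are expected. A minimal sketch, assuming the standard sysfs layout under /sys/class/infiniband; the function name `list_gids` and its optional third argument (an alternate sysfs root) are hypothetical.

```shell
# Hypothetical helper: list populated GID entries for one RDMA device port.
# Usage: list_gids DEVICE [PORT] [SYSFS_ROOT]
# Prints: index, GID, RoCE type, netdev (tab-separated).
list_gids() {
    dev=$1
    port=${2:-1}
    root=${3:-/sys/class/infiniband}
    base="$root/$dev/ports/$port"
    for f in "$base"/gids/*; do
        [ -e "$f" ] || continue
        idx=${f##*/}
        gid=$(cat "$f" 2>/dev/null) || continue
        # Skip unused slots, which read back as an all-zero GID.
        [ "$gid" = "0000:0000:0000:0000:0000:0000:0000:0000" ] && continue
        type=$(cat "$base/gid_attrs/types/$idx" 2>/dev/null || echo "?")
        ndev=$(cat "$base/gid_attrs/ndevs/$idx" 2>/dev/null || echo "?")
        printf '%s\t%s\t%s\t%s\n' "$idx" "$gid" "$type" "$ndev"
    done
}
```

Usage: `list_gids roceo12409 1`. In sysfs the GID ::ffff:30.30.1.1 is written as 0000:0000:0000:0000:0000:ffff:1e1e:0101.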

    Hardware/Environment:

    NICs: BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller

    OS/Kernel: Ubuntu 24.04, Kernel 6.8.0-90-generic

    Drivers: bnxt_en

    Firmware version: 227.0.134.0/pkg 22.71.11.13

    Any advice on debugging would be greatly appreciated!



    -------------------------------------------


  • 2.  RE: Troubleshooting RDMA RoCEv2 setup

    Posted 11 days ago
    "I'm currently trying to set up a point-to-point RDMA RoCEv2 connection between two nodes"
     
    RDMA is not supported for vSAN traffic in 2-node clusters:
    https://techdocs.broadcom.com/us/en/vmware-cis/vsan/vsan/8-0/planning-and-deployment/working-with-virtual-san-stretched-cluster/introduction-to-stretched-clusters/vsan-stretched-clusters-networking-design.html

    -------------------------------------------