Troubleshooting RDMA RoCEv2 setup

    Posted Jan 08, 2026 11:00 AM

    Hi everyone,

    I'm currently trying to set up a point-to-point RDMA RoCEv2 connection between two nodes, but I'm hitting some roadblocks. I'm hoping someone can point me in the right direction.

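    Issue:

    rping fails to connect. The server reaches rdma_listen, but the client gets RDMA_CM_EVENT_UNREACHABLE (error -110) after address and route resolution succeed.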

    On the server side:
    sudo rping -s -d -a 30.30.1.1 -Vv -C 3

    validate data
    verbose
    count 3
    created cm_id 0x651d5c7b8c80
    rdma_bind_addr successful
    rdma_listen

    On the client side:
    sudo rping -c -d -a 30.30.1.1 -I 30.30.1.2 -Vv -C 3
    validate data
    verbose
    count 3
    created cm_id 0x5db441510c50
    cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x5db441510c50 (parent)
    cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x5db441510c50 (parent)
    rdma_resolve_addr - rdma_resolve_route successful
    created pd 0x5db4415054b0
    created channel 0x5db441505470
    created cq 0x5db441510fe0
    created qp 0x5db441511180
    rping_setup_buffers called on cb 0x5db4415037c0
    allocated & registered buffers...
    cq_thread started.
    cma_event type RDMA_CM_EVENT_UNREACHABLE cma_id 0x5db441510c50 (parent)
    cma event RDMA_CM_EVENT_UNREACHABLE, error -110
    wait for CONNECTED state 4
    connect error -1
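On Linux, error -110 is -ETIMEDOUT: the client's connection request got no reply. Address and route resolution both succeeded, so the timeout happened only after the connect request was sent. rdma_cm chooses the local netdev, and with it the RDMA device, from the IP addresses and the kernel routing table, so one thing worth ruling out is that the CM traffic leaves through a different port than the one ibv_rc_pingpong was pointed at (roceo12409). The sketch below assumes a POSIX shell, iproute2's `ip` and `rdma` tools, and output laid out as in this post; the helper names `route_dev` and `rdma_dev_for_netdev` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: map the egress netdev for a peer IP to its RDMA device.

# Read `ip route get ...` output on stdin and print the word after "dev".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Read `rdma link show` output on stdin and print the RDMA device name
# (without the /port suffix) whose "netdev" field equals $1.
rdma_dev_for_netdev() {
    awk -v nd="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == "netdev" && $(i + 1) == nd) { split($2, a, "/"); print a[1]; exit }
    }'
}

# Usage on the client (peer and source IPs taken from this post):
#   nd=$(ip route get 30.30.1.1 from 30.30.1.2 | route_dev)
#   rdma link show | rdma_dev_for_netdev "$nd"
```

If this prints a device other than the one ibv_rc_pingpong succeeded with, rping and ibv_rc_pingpong are not exercising the same port.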

    I also tried ucmatose and got the same result.
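ucmatose, like rping, connects through librdmacm and the kernel rdma_cm, whereas ibv_rc_pingpong was given an explicit GID index (-g 3). rdma_cm exposes the RoCE mode it uses for each device through configfs at /sys/kernel/config/rdma_cm/<device>/ports/<port>/default_roce_mode, and a device's directory only appears there after it is created with mkdir. A minimal sketch of reading it, assuming root privileges and configfs mounted at /sys/kernel/config; the function name `rdma_cm_roce_mode` and its optional root argument (there so it can be tried against a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print rdma_cm's default RoCE mode for a device/port.
# Arguments: device (e.g. roceo12409), port (default 1), configfs root.
rdma_cm_roce_mode() {
    dev=$1
    port=${2:-1}
    root=${3:-/sys/kernel/config/rdma_cm}
    # rdma_cm only populates a device's entry once its directory is created.
    if [ ! -d "$root/$dev" ]; then
        mkdir "$root/$dev" || return 1
    fi
    cat "$root/$dev/ports/$port/default_roce_mode"
}

# Usage (as root): rdma_cm_roce_mode roceo12409 1
# The kernel reports either "RoCE v2" or "IB/RoCE v1".
```

Comparing this value with the GID type of the index that worked for ibv_rc_pingpong shows whether rdma_cm and the manual test are using the same RoCE version.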

    On both nodes, the RDMA links appear to be active:
    rdma link show
    link roceo12399/1 state ACTIVE physical_state LINK_UP netdev eno12399np0 
    link roceo12409/1 state ACTIVE physical_state LINK_UP netdev eno12409np1 
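`rdma link show` confirms the ports are up but says nothing about the GID table, which is where the RoCE version of each address is recorded. The kernel publishes that table in sysfs under /sys/class/infiniband/<device>/ports/<port>/gids/, with each entry's type and netdev under gid_attrs/types/ and gid_attrs/ndevs/. A minimal sketch that prints the populated entries, assuming that standard sysfs layout; the function name `list_gids` and its optional sysfs-root argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the non-empty GID table entries of one RDMA port
# as "index<TAB>gid<TAB>type<TAB>netdev", sorted by index.
list_gids() {
    base="${3:-/sys/class/infiniband}/$1/ports/${2:-1}"
    zero="0000:0000:0000:0000:0000:0000:0000:0000"
    for f in "$base"/gids/*; do
        [ -e "$f" ] || continue            # glob matched nothing
        idx=${f##*/}
        gid=$(cat "$f")
        [ "$gid" = "$zero" ] && continue   # unpopulated slot
        # Attribute reads can fail for some slots; print blanks instead.
        gtype=$(cat "$base/gid_attrs/types/$idx" 2>/dev/null)
        ndev=$(cat "$base/gid_attrs/ndevs/$idx" 2>/dev/null)
        printf '%s\t%s\t%s\t%s\n' "$idx" "$gid" "$gtype" "$ndev"
    done | sort -n
}

# Usage: list_gids roceo12409 1
```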

    Note:

    ibv_rc_pingpong, on the other hand, works (GID index 3):

    On the server side:
    sudo ibv_rc_pingpong -d roceo12409 -i 1 -g 3
     local address:  LID 0x0000, QPN 0x000084, PSN 0x81ec1a, GID ::ffff:30.30.1.1
     remote address: LID 0x0000, QPN 0x00008a, PSN 0x6d2f08, GID ::ffff:30.30.1.2
    8192000 bytes in 0.02 seconds = 4068.79 Mbit/sec
    1000 iters in 0.02 seconds = 16.11 usec/iter

    On the client side:
    sudo ibv_rc_pingpong -d roceo12409 -i 1 -g 3 30.30.1.1
    local address:  LID 0x0000, QPN 0x00008a, PSN 0x6d2f08, GID ::ffff:30.30.1.2
    remote address: LID 0x0000, QPN 0x000084, PSN 0x81ec1a, GID ::ffff:30.30.1.1
    8192000 bytes in 0.02 seconds = 4137.63 Mbit/sec
    1000 iters in 0.02 seconds = 15.84 usec/iter
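Two details of this output connect to the rping failure. First, ibv_rc_pingpong exchanges its queue-pair parameters (the LID, QPN, PSN and GID lines above) over an ordinary TCP socket and never goes through rdma_cm, so its success shows that the RoCE data path works with GID index 3 but says nothing about connection management. Second, the GID it used, ::ffff:30.30.1.1, is the IPv4-mapped form of the interface address, which sysfs prints in full as eight 16-bit groups. The hypothetical helper below converts a dotted-quad address into that sysfs form so you can see which GID indices hold it; it assumes well-formed decimal input.

```shell
#!/bin/sh
# Hypothetical helper: convert an IPv4 address (e.g. 30.30.1.1) into the
# IPv4-mapped GID string as printed in sysfs gids/<index> files.
ipv4_to_gid() {
    local IFS=.
    set -- $1    # split the address on dots into $1..$4
    printf '0000:0000:0000:0000:0000:ffff:%02x%02x:%02x%02x\n' "$1" "$2" "$3" "$4"
}

# Usage: which GID slots on roceo12409 port 1 carry 30.30.1.1?
#   grep -l "$(ipv4_to_gid 30.30.1.1)" /sys/class/infiniband/roceo12409/ports/1/gids/*
```

On a port that supports both RoCE versions, the same address typically appears once per version, and the gid_attrs/types entry for each index tells them apart.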

    Hardware/Environment:

    • NICs: BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller

    • OS/Kernel: Ubuntu 24.04, Kernel 6.8.0-90-generic

    • Drivers: bnxt_en

    • Firmware version: 227.0.134.0/pkg 22.71.11.13
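The list above names bnxt_en, which is the Ethernet driver for NetXtreme-E adapters; RDMA on these cards is provided by the separate bnxt_re kernel module, plus the bnxt_re provider library from rdma-core in user space. The roceo* devices in `rdma link show` suggest bnxt_re is already loaded, but it is cheap to confirm. A minimal sketch against the /proc/modules format (module name in the first column); the function name `check_bnxt_modules` and its optional file argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether bnxt_en and bnxt_re are loaded.
# Argument: a file in /proc/modules format (defaults to /proc/modules).
# Returns non-zero if either module is missing.
check_bnxt_modules() {
    modules=${1:-/proc/modules}
    rc=0
    for m in bnxt_en bnxt_re; do
        if awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$modules"; then
            echo "$m: loaded"
        else
            echo "$m: NOT loaded"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_bnxt_modules
```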

    Any advice on debugging would be greatly appreciated!


