Hi everyone,
I'm currently trying to set up a point-to-point RDMA RoCEv2 connection between two nodes, but I'm hitting some roadblocks. I'm hoping someone can point me in the right direction.
Issue:
The rping command fails: the client cannot connect to the server.
On the server side:
sudo rping -s -d -a 30.30.1.1 -Vv -C 3
validate data
verbose
count 3
created cm_id 0x651d5c7b8c80
rdma_bind_addr successful
rdma_listen
On the client side:
sudo rping -c -d -a 30.30.1.1 -I 30.30.1.2 -Vv -C 3
validate data
verbose
count 3
created cm_id 0x5db441510c50
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x5db441510c50 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x5db441510c50 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x5db4415054b0
created channel 0x5db441505470
created cq 0x5db441510fe0
created qp 0x5db441511180
rping_setup_buffers called on cb 0x5db4415037c0
allocated & registered buffers...
cq_thread started.
cma_event type RDMA_CM_EVENT_UNREACHABLE cma_id 0x5db441510c50 (parent)
cma event RDMA_CM_EVENT_UNREACHABLE, error -110
wait for CONNECTED state 4
connect error -1
I also tried the ucmatose command, with the same result.
On both nodes, the RDMA links appear to be active:
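For reference, the "error -110" printed alongside RDMA_CM_EVENT_UNREACHABLE is a negative errno value. The small Python sketch below decodes such values, assuming a Linux host, where errno 110 is ETIMEDOUT. It only helps interpret the log and is not a fix. In the RDMA CM, UNREACHABLE with -ETIMEDOUT typically means that the client's connection request got no reply from the remote side before the CM timeout. The function name decode_cm_status is hypothetical, chosen for illustration.

```python
import errno
import os


def decode_cm_status(status: int) -> tuple[str, str]:
    """Decode a negative errno status, as printed by rping/librdmacm.

    Hypothetical helper for illustration. It uses the host's errno tables,
    so results are platform-dependent (110 is ETIMEDOUT on Linux).
    """
    code = abs(status)  # RDMA CM event status is reported as a negative errno
    name = errno.errorcode.get(code, f"UNKNOWN({code})")
    return name, os.strerror(code)


if __name__ == "__main__":
    # Value taken from the rping client log: "error -110"
    name, message = decode_cm_status(-110)
    print(f"-110 -> {name}: {message}")
```

On a Linux host, this should report ETIMEDOUT ("Connection timed out").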
rdma link show
link roceo12399/1 state ACTIVE physical_state LINK_UP netdev eno12399np0
link roceo12409/1 state ACTIVE physical_state LINK_UP netdev eno12409np1
Note:
However, ibv_rc_pingpong works between the same two nodes, using the GIDs for 30.30.1.1 and 30.30.1.2:
On the server side:
sudo ibv_rc_pingpong -d roceo12409 -i 1 -g 3
local address: LID 0x0000, QPN 0x000084, PSN 0x81ec1a, GID ::ffff:30.30.1.1
remote address: LID 0x0000, QPN 0x00008a, PSN 0x6d2f08, GID ::ffff:30.30.1.2
8192000 bytes in 0.02 seconds = 4068.79 Mbit/sec
1000 iters in 0.02 seconds = 16.11 usec/iter
On the client side:
sudo ibv_rc_pingpong -d roceo12409 -i 1 -g 3 30.30.1.1
local address: LID 0x0000, QPN 0x00008a, PSN 0x6d2f08, GID ::ffff:30.30.1.2
remote address: LID 0x0000, QPN 0x000084, PSN 0x81ec1a, GID ::ffff:30.30.1.1
8192000 bytes in 0.02 seconds = 4137.63 Mbit/sec
1000 iters in 0.02 seconds = 15.84 usec/iter
Hardware/Environment:
NICs: BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller
OS/Kernel: Ubuntu 24.04, Kernel 6.8.0-90-generic
Drivers: bnxt_en
Firmware version: 227.0.134.0/pkg 22.71.11.13
Any advice on debugging would be greatly appreciated!