VMware vSphere

  • 1.  esxcli nvme fabrics discover failed

    Posted Dec 17, 2020 03:10 AM

    Hello, we are testing NVMe-oF with ESXi 7. I am attempting to discover an SPDK NVMe-oF target with the "esxcli nvme fabrics discover" command, but it fails:

    [root@localhost:~] esxcli nvme fabrics discover -a vmhba32 -i 10.251.32.216 -p 44200
    Unable to find transport address

    The "dmesg" shows:

    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)World: 12458: VC opID esxcli-3d-4a53 maps to vmkernel opID 791a7c62
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMEDEV:711 Controller 265 allocated, maximum queues 16
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:110 Controller 265, connecting using parameters:
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:112 kato: 0
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:113 subtype: 1
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:114 vmkParams.asqsize: 16
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:115 vmkParams.cntlid: 0xffff
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:117 vmkParams.hostid: 5fd04702-15a6-343b-3582-6c92bf556abb
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:118 vmkParams.hostnqn: nqn.2014-08.org.nvmexpress:uuid:5fd04702-15a6-343b-3582-6c92bf556abb
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:119 vmkParams.subnqn: nqn.2014-08.org.nvmexpress.discovery
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:121 vmkParams.trType: RDMA
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:123 vmkParams.trsvcid: 44200
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:125 vmkParams.traddr: 10.251.32.216
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:127 vmkParams.tsas:
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)nvmerdma:1020 vmhba32, controller 265
    2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)nvmerdma:814 [ctlr 265, queue 0] cqsize 16
    2020-12-17T03:07:21.768Z cpu2:526767 opID=791a7c62)nvmerdma:1526 [ctlr 265, queue 0]
    2020-12-17T03:07:21.768Z cpu9:527915)nvmerdma:2150 [ctlr 265, queue 0]
    2020-12-17T03:07:21.768Z cpu2:526767 opID=791a7c62)nvmerdma:300 [ctlr 265, queue 0]
    2020-12-17T03:07:21.768Z cpu2:526767 opID=791a7c62)nvmerdma:1126 [ctlr 265, queue 0]
    2020-12-17T03:07:21.768Z cpu5:527916)nvmerdma:693 [ctlr 265, queue 0]
    2020-12-17T03:07:21.788Z cpu3:524823)nvmerdma:734 [ctlr 265, queue 0] event 0
    2020-12-17T03:07:21.788Z cpu3:524823)nvmerdma:1722 [ctlr 265, queue 0]
    2020-12-17T03:07:21.794Z cpu1:524871)nvmerdma:734 [ctlr 265, queue 0] event 2
    2020-12-17T03:07:21.794Z cpu1:524871)nvmerdma:1797 [ctlr 265, queue 0]
    2020-12-17T03:07:21.806Z cpu4:524833)nvmerdma:734 [ctlr 265, queue 0] event 9
    2020-12-17T03:07:21.806Z cpu4:524833)nvmerdma:1839 [ctlr 265, queue 0]
    2020-12-17T03:07:21.806Z cpu4:524833)nvmerdma:1843 [ctlr 265, queue 0] connected
    2020-12-17T03:07:21.806Z cpu2:526767 opID=791a7c62)nvmerdma:259 [ctlr 265, queue 0] connected
    2020-12-17T03:07:21.806Z cpu2:526767 opID=791a7c62)nvmerdma:1113 [ctlr 265] connected successfully

    2020-12-17T03:07:21.806Z cpu2:526767 opID=791a7c62)NVMFDEV:1108 controller 265, queue 0
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMFDEV:1181 Connected to queue 0, successfully
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMFDEV:1068 Controller 0x4313afc18880, set ctlrID to 1
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMFDEV:187 Adding new controller nqn.2014-08.org.nvmexpress.discovery to active list
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2408 disabling controller...
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2417 enabling controller...
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:590 Controller 265, queue 0, set queue size to 16
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2425 reading version register...
    2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2439 get controller identify data...
    2020-12-17T03:07:46.929Z cpu4:524833)nvmerdma:734 [ctlr 265, queue 0] event 10
    2020-12-17T03:07:46.929Z cpu4:524833)nvmerdma:1893 [ctlr 265, queue 0]
    2020-12-17T03:07:46.929Z cpu4:524833)nvmerdma:1897 [ctlr 265, queue 0] disconnected due to CM event 10
    2020-12-17T03:07:46.929Z cpu5:527916)nvmerdma:702 Queue 0 disconnect world wakes up: Success
    2020-12-17T03:07:46.929Z cpu5:527916)nvmerdma:542 [ctlr 265, queue 0]
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038d0 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038e0 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038f0 op 0x80 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03900 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03910 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03920 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03930 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03940 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03850 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03860 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03870 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03880 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03890 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038a0 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038b0 op 0x0 status 0x5
    2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038c0 op 0x0 status 0x5
    2020-12-17T03:07:46.933Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0xfffffffffffffff2 op 0x0 status 0x5
    2020-12-17T03:07:46.933Z cpu5:527916)nvmerdma:521 [ctlr 265, queue 0] cleanup vmkCmd 0x453a44bffb80[0], status 0x80d
    2020-12-17T03:07:46.933Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0xfffffffffffffff1 op 0x0 status 0x5
    2020-12-17T03:07:46.933Z cpu2:526767 opID=791a7c62)WARNING: NVMEDEV:2446 Failed to get controller identify data, status: Failure
    2020-12-17T03:07:46.933Z cpu2:526767 opID=791a7c62)nvmerdma:2582 [ctlr 265, queue 0] cmd 0x453a44bffb80, queue not connected: Failure
    2020-12-17T03:07:46.933Z cpu2:526767 opID=791a7c6



  • 2.  RE: esxcli nvme fabrics discover failed

    Posted Dec 17, 2020 03:14 AM

    And on a Linux host, the discovery succeeds:

    ./nvme discover -t rdma -a 10.251.32.216 -s 44200

    Discovery Log Number of Records 1, Generation counter 3
    =====Discovery Log Entry 0======
    trtype: rdma
    adrfam: ipv4
    subtype: nvme subsystem
    treq: not required
    portid: 0
    trsvcid: 44200
    subnqn: nqn.2016-06.io.spdk:cnode
    traddr: 10.251.32.216
    rdma_prtype: not specified
    rdma_qptype: connected
    rdma_cms: rdma-cm
    rdma_pkey: 0x0000



  • 3.  RE: esxcli nvme fabrics discover failed

    Posted Feb 11, 2021 09:34 AM

    I am experiencing the same issue.

    2021-02-11T09:14:52.635Z cpu8:1051530 opID=860873e4)nvmerdma:1399 vmhba66, controller 298
    2021-02-11T09:14:52.656Z cpu23:1049453)nvmerdma:902 [ctlr 298, queue 0] event 0
    2021-02-11T09:14:52.660Z cpu20:1049573)nvmerdma:902 [ctlr 298, queue 0] event 2
    2021-02-11T09:14:52.664Z cpu20:1049493)nvmerdma:902 [ctlr 298, queue 0] event 9
    2021-02-11T09:14:52.664Z cpu20:1049493)nvmerdma:2457 [ctlr 298, queue 0] connected
    2021-02-11T09:14:52.664Z cpu8:1051530 opID=860873e4)nvmerdma:1492 [ctlr 298] connected successfully
    2021-02-11T09:14:52.664Z cpu8:1051530 opID=860873e4)NVMFDEV:1837 controller 298, queue 0
    2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMFDEV:1910 Connected to queue 0, successfully
    2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMFDEV:1701 Controller 0x431414a15540, set ctlrID to 32769
    2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMFDEV:630 Adding new controller to target active list: nqn.2014-08.org.nvmexpress.discovery
    2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMEDEV:2895 disabling controller...
    2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMEDEV:2391 Controller cannot be disabled, status: Timeout
    2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMEDEV:2899 Failed to disable controller, status: Timeout
    2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMEDEV:4609 Failed to initialize controller, status: Timeout.
    2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMFDEV:846 Failed to register controller 298, status: Timeout
    2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)nvmerdma:1533 [ctlr 298]
    2021-02-11T09:15:00.171Z cpu20:1049493)nvmerdma:902 [ctlr 298, queue 0] event 10
    2021-02-11T09:15:00.171Z cpu0:1286360)nvmerdma:2943 [ctlr 298, queue 0] Beacon completion succeeded, wrid 0xfffffffffffffff2 op 0x80 status 0x5
    2021-02-11T09:15:00.196Z cpu1:1286361)nvmerdma:867 [ctlr 298, queue 0] disconnect world dying, exit.
    2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)nvmerdma:1568 controller 298 disconnected
    2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)NVMEDEV:1095 Ctlr 298 freeing
    2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)NVMEDEV:6117 Cancel requests of controller 298, 0 left.
    2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)WARNING: NVMFDEV:1300 Failed to connect to controller, status: Timeout
    2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)WARNING: NVMFVSI:1074 Failed to discover controllers, status: Timeout
    [root@localhost:~]



  • 4.  RE: esxcli nvme fabrics discover failed

    Posted Apr 30, 2023 06:34 AM

    Were you able to solve this issue?



  • 5.  RE: esxcli nvme fabrics discover failed

    Posted Apr 18, 2023 01:47 PM

    Hi, were you able to solve this issue?

    I'm facing the same issue with ESXi 7.0U3.

    [root@localhost:~] esxcli nvme fabrics discover -a vmhba68 -i 192.168.18.4 -p 8009
    Unable to find transport address
    [root@localhost:~]

    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)World: 12077: VC opID esxcli-17-75c4 maps to vmkernel opID a38002cf
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMEDEV:1393 Ctlr 312 allocated, maximum queues 16
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:159 Controller 312, connecting using parameters:
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:161 kato: 0
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:162 subtype: 1
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:163 cdc: 0
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:166 target type: NVMe
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:174 vmkParams.asqsize: 32
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:175 vmkParams.cntlid: 0xffff
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:177 vmkParams.hostid: 63d7700b-ee42-c554-5e2a-98be942a2a1a
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:178 vmkParams.hostnqn: nqn.2014-08.org.nvmexpress:uuid:63d7700b-ee42-c554-5e2a-98be942a2a1a
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:179 vmkParams.subnqn: nqn.2014-08.org.nvmexpress.discovery
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:196 vmkParams.trType: TCP
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:198 vmkParams.trsvcid: 8009
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:200 vmkParams.traddr: 192.168.18.4
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:202 vmkParams.tsas.digest: 0
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectController:781 vmhba68, controller 312
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectCM:4408 [ctlr 312, queue 0]
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFNET:151 Uplink: vmnic4, portset: vswitch_nvme_TCP1.
    2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectCM:4460 [ctlr 312, queue 0] Using source vmknic vmk1 for socket binding
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_SocketConnect:4339 [ctlr 312, queue 0] Failed to connect socket: Failure
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectQueueInt:4129 [ctlr 312, queue 0] failed to connect: Failure
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_FreeSubmissionResources:5189 [ctlr 312, queue 0]
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectController:860 Failed to connect admin queue: Failure
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFDEV:882 Failed to transport connect controller 312, status: Failure
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)NVMEDEV:1565 Ctlr 312 freeing
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)NVMEDEV:9057 Cancel requests of controller 312, 0 left.
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFDEV:1432 Failed to connect to controller, status: Failure
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFEVT:1773 Failed to discover controllers, status: Failure
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFEVT:1456 Discover and connect controller failed: Failure
    2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFVSI:1300 Failed to discover controllers: Failure

    However, a discovery from a Linux box is successful:

    nvme discover --transport=tcp --traddr=192.168.18.4 --host-traddr=192.168.18.5 --trsvcid=8009

    Discovery Log Number of Records 1, Generation counter 18
    =====Discovery Log Entry 0======
    trtype: tcp
    adrfam: ipv4
    subtype: nvme subsystem
    treq: not specified, sq flow control disable supported
    portid: 1
    trsvcid: 8009
    subnqn: nqn.2014-08.org.nvmexpress:uuid:03000200-0400-0500-0006-000700080009
    traddr: 192.168.18.4
    sectype: none
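
    For the NVMe/TCP variant above, the "nt_SocketConnect ... Failed to connect socket" line means the TCP connection itself never came up, unlike the RDMA case where the connect succeeded. A sketch of checks worth running, assuming vmk1 is the vmkernel NIC behind vmhba68 (names taken from the logs; substitute your own):

    ```shell
    # The vmkernel NIC used for NVMe/TCP must carry the NVMeTCP service tag
    esxcli network ip interface tag get -i vmk1
    esxcli network ip interface tag add -i vmk1 -t NVMeTCP

    # Confirm basic reachability to the target from that vmknic
    vmkping -I vmk1 192.168.18.4

    # Then retry discovery on the NVMe/TCP adapter
    esxcli nvme fabrics discover -a vmhba68 -i 192.168.18.4 -p 8009
    ```

    Since a Linux box on the same network reaches port 8009 fine, a missing service tag, a vmknic on the wrong port group, or a firewall rule on the ESXi side are the likeliest culprits.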