VMware vSphere

HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

  • 1.  HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 15, 2018 09:03 AM

    I have come across an interesting issue with a new HPE platform. The system is running within a C7000 BladeSystem, with BL460c Gen9 blades.

    We have noticed some performance degradation on the iSCSI connection (using the software iSCSI initiator). This traffic runs over vmnic1 and vmnic2; their details from the NIC list are below.

    vmnic1  0000:06:00.1  elxnet  Up        Up       10000  Full  32:a6:05:e0:00:be  1500  Emulex Corporation HPE FlexFabric 20Gb 2-port 650FLB Adapter
    vmnic2  0000:06:00.2  elxnet  Up        Up       10000  Full  32:a6:05:e0:00:bd  1500  Emulex Corporation HPE FlexFabric 20Gb 2-port 650FLB Adapter

    Each NIC reports 10000 Mb full duplex; however, I am not able to set the speed on the ESXi host. vmnic1 reports the following advertised link modes:

    [root@ESX:~] esxcli network nic get -n vmnic1

       Advertised Auto Negotiation: true

       Advertised Link Modes: 1000BaseKR2/Full, 10000BaseKR2/Full, 20000BaseKR2/Full, Auto

       Auto Negotiation: true

    By contrast, vmnic2 reports the following modes:

    [root@ESXi2b-14:~] esxcli network nic get -n vmnic2

       Advertised Auto Negotiation: false

       Advertised Link Modes: 20000None/Full

       Auto Negotiation: false

    This is confusing, as the settings for both NICs are identical within OneView. Both NICs are running firmware 12.0.1110.11 from SPP 2018.06.0. We used the HPE ESXi image, which includes driver version 12.0.1115.0; this driver is listed as compatible in the VMware Compatibility Guide (I/O Device Search).

    Has anyone else seen this issue? If I try to set the speed/duplex manually via esxcli, it fails with the following error in vmkernel.log:

    2018-08-14T23:49:41.361Z cpu20:65677)WARNING: elxnet: elxnet_linkStatusSet:7471: [vmnic2] Device is not privileged to do speed changes

    As a result, when using HCIBench to test storage throughput, the 95%tile_LAT value is excessively high for traffic traversing vmnic2: 95%tile_LAT = 3111.7403 ms.

    Any thoughts?
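For context on the figure above: a 95th-percentile latency such as HCIBench's 95%tile_LAT is the latency that 95% of the sampled I/Os completed at or under. The sketch below illustrates the metric with the simple nearest-rank method on made-up latency samples. It is an assumption for illustration only; HCIBench's underlying workload generator may compute percentiles differently (for example by interpolating between samples).

```python
def percentile_nearest_rank(samples_ms: list[float], p: int) -> float:
    """Return the p-th percentile (1 <= p <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest sample with at least p% of samples at or below it.
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[rank - 1]


# Hypothetical per-I/O latencies in ms: mostly fast I/O plus a slow tail.
latencies = [2.0] * 90 + [50.0] * 5 + [3000.0] * 5

print(percentile_nearest_rank(latencies, 50))  # 2.0
print(percentile_nearest_rank(latencies, 95))  # 50.0
print(percentile_nearest_rank(latencies, 99))  # 3000.0
```

Note how a slow tail that does not show up in the median quickly dominates the upper percentiles.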



  • 2.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 15, 2018 09:59 AM

    Interesting :) Can you share the complete output of the commands below?

    esxcli network nic get -n vmnic1

    esxcli network nic get -n vmnic2

    Cheers,

    Supreet



  • 3.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 15, 2018 10:03 AM

    Sure thing.

    [root@ESX:~] esxcli network nic get -n vmnic1

       Advertised Auto Negotiation: true

       Advertised Link Modes: 1000BaseKR2/Full, 10000BaseKR2/Full, 20000BaseKR2/Full, Auto

       Auto Negotiation: true

       Cable Type:

       Current Message Level: 4631

       Driver Info:

             Bus Info: 0000:06:00:1

             Driver: elxnet

             Firmware Version: 12.0.1110.11

             Version: 12.0.1115.0

       Link Detected: true

       Link Status: Up by explicit linkSet

       Name: vmnic1

       PHYAddress: 1

       Pause Autonegotiate: true

       Pause RX: true

       Pause TX: true

       Supported Ports:

       Supports Auto Negotiation: true

       Supports Pause: true

       Supports Wakeon: true

       Transceiver: external

       Virtual Address: 00:50:56:59:d7:63

       Wakeon: MagicPacket(tm)

    [root@ESX:~] esxcli network nic get -n vmnic2

       Advertised Auto Negotiation: false

       Advertised Link Modes: 20000None/Full

       Auto Negotiation: false

       Cable Type:

       Current Message Level: 4631

       Driver Info:

             Bus Info: 0000:06:00:2

             Driver: elxnet

             Firmware Version: 12.0.1110.11

             Version: 12.0.1115.0

       Link Detected: true

       Link Status: Up by explicit linkSet

       Name: vmnic2

       PHYAddress: 0

       Pause Autonegotiate: true

       Pause RX: true

       Pause TX: true

       Supported Ports:

       Supports Auto Negotiation: false

       Supports Pause: true

       Supports Wakeon: false

       Transceiver: external

       Virtual Address: 00:50:56:58:05:51

       Wakeon: None

    Really hoping that this isn't something simple that I have missed.

    Thanks, Ben.
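Comparing two listings like these by eye is error-prone. Below is a minimal sketch that flattens `esxcli network nic get` output (trimmed from the listings above) into key/value pairs and reports only the fields that differ, ignoring per-NIC identity fields. The parsing rule (split on the first colon, flatten the Driver Info section) is an assumption based on the output format shown in this thread, not a documented esxcli contract; esxcli's `--formatter` option (for example `--formatter=xml`) would give a sturdier input.

```python
def parse_nic_get(text: str) -> dict[str, str]:
    """Flatten `esxcli network nic get -n <vmnic>` output into {field: value}."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "[root@host:~] esxcli ..." prompt line.
        if not line or line.startswith("["):
            continue
        # Split on the first colon only, so values such as MAC addresses
        # or "0000:06:00:1" keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            # Section headers like "Driver Info:" get an empty value; their
            # indented children become ordinary top-level keys.
            fields[key.strip()] = value.strip()
    return fields


def diff_nics(a: dict[str, str], b: dict[str, str],
              ignore: tuple[str, ...] = ("Name", "Bus Info", "Virtual Address")
              ) -> dict[str, tuple[str, str]]:
    """Return {field: (value_a, value_b)} for fields whose values differ."""
    keys = sorted((a.keys() | b.keys()) - set(ignore))
    return {k: (a.get(k, ""), b.get(k, "")) for k in keys if a.get(k, "") != b.get(k, "")}


VMNIC1 = """\
[root@ESX:~] esxcli network nic get -n vmnic1
   Advertised Auto Negotiation: true
   Advertised Link Modes: 1000BaseKR2/Full, 10000BaseKR2/Full, 20000BaseKR2/Full, Auto
   Auto Negotiation: true
   Driver Info:
         Bus Info: 0000:06:00:1
         Firmware Version: 12.0.1110.11
         Version: 12.0.1115.0
   Name: vmnic1
   PHYAddress: 1
   Supports Auto Negotiation: true
   Supports Wakeon: true
"""

VMNIC2 = """\
[root@ESX:~] esxcli network nic get -n vmnic2
   Advertised Auto Negotiation: false
   Advertised Link Modes: 20000None/Full
   Auto Negotiation: false
   Driver Info:
         Bus Info: 0000:06:00:2
         Firmware Version: 12.0.1110.11
         Version: 12.0.1115.0
   Name: vmnic2
   PHYAddress: 0
   Supports Auto Negotiation: false
   Supports Wakeon: false
"""

for field, (v1, v2) in diff_nics(parse_nic_get(VMNIC1), parse_nic_get(VMNIC2)).items():
    print(f"{field}: vmnic1={v1!r} vmnic2={v2!r}")
```

Firmware and driver versions drop out of the diff because they match, which leaves the auto-negotiation support fields as the real difference between the two ports.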



  • 4.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 15, 2018 10:46 AM

    I also tried to set the interface to 10Gb full duplex via esxcli:

    esxcli network nic set -n vmnic2 -S 10000 -D full

    It failed, as expected:

    2018-08-15T10:26:55.023Z cpu17:68364 opID=e4ebaba5)Uplink: 14445: Setting speed/duplex to (10000 FULL) on vmnic2.

    2018-08-15T10:26:55.024Z cpu47:65677)WARNING: elxnet: elxnet_linkStatusSet:7419: [vmnic2] Speed 10000 is not supported on this phy interface (0xc)

    I have a case open with HPE on this too. Interesting indeed.
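When gathering logs for a case like this, it can help to pull out only the elxnet warnings for one uplink. Below is a minimal sketch. The regular expression is an assumption built solely from the vmkernel.log lines quoted in this thread, so other elxnet messages may not match it. On the host itself, the input would typically be the lines of /var/log/vmkernel.log.

```python
import re

# Shape of the lines quoted in this thread, e.g.
# 2018-08-15T10:26:55.024Z cpu47:65677)WARNING: elxnet: elxnet_linkStatusSet:7419: [vmnic2] Speed ...
ELXNET_WARNING = re.compile(
    r"^(?P<timestamp>\S+) cpu\d+:\d+(?: opID=\S+)?\)"
    r"WARNING: elxnet: (?P<function>\w+):\d+: "
    r"\[(?P<nic>vmnic\d+)\] (?P<message>.*)$"
)


def elxnet_warnings(lines, nic=None):
    """Yield (timestamp, function, nic, message) for elxnet warnings, optionally for one vmnic."""
    for line in lines:
        m = ELXNET_WARNING.match(line.strip())
        if m and (nic is None or m["nic"] == nic):
            yield m["timestamp"], m["function"], m["nic"], m["message"]


SAMPLE_LOG = [
    "2018-08-14T23:49:41.361Z cpu20:65677)WARNING: elxnet: elxnet_linkStatusSet:7471: [vmnic2] Device is not privileged to do speed changes",
    "2018-08-15T10:26:55.023Z cpu17:68364 opID=e4ebaba5)Uplink: 14445: Setting speed/duplex to (10000 FULL) on vmnic2.",
    "2018-08-15T10:26:55.024Z cpu47:65677)WARNING: elxnet: elxnet_linkStatusSet:7419: [vmnic2] Speed 10000 is not supported on this phy interface (0xc)",
]

for warning in elxnet_warnings(SAMPLE_LOG, nic="vmnic2"):
    print(warning)
```

The Uplink line is skipped because it is not an elxnet WARNING; only the two driver-side rejections are reported.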



  • 5.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 15, 2018 10:49 AM

    As I understand it, the following could be the issue here:

    esxcli network nic get -n vmnic1          

      Bus Info: 0000:06:00:1 --> PF 1

    esxcli network nic get -n vmnic2

      Bus Info: 0000:06:00:2 --> PF 2

     

    In a multi-channel mode, the same physical port is shared among multiple PFs (PCI physical functions). PF 1 could be the primary PF, and PF 2 could be treated as a non-primary PF.

    The Emulex firmware might not allow non-primary PFs to modify port-level settings such as auto-negotiation.

    This prevents multiple PFs from choosing different settings, which is not possible since they share the same physical port. This may be why we are seeing the following error in the logs:

    2018-08-14T23:49:41.361Z cpu20:65677)WARNING: elxnet: elxnet_linkStatusSet:7471: [vmnic2] Device is not privileged to do speed changes
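The PF numbers above come from the last field of the Bus Info address. A small sketch of reading that address, accepting both the colon form printed by `esxcli network nic get` and the dot form printed in the NIC list, shows that vmnic1 and vmnic2 sit on the same PCI device (0000:06:00) and differ only in function number. This is plain PCI domain:bus:device.function parsing; whether the function number maps one-to-one to an Emulex PF is the hypothesis being discussed here, not something the code establishes.

```python
import re

# Domain:bus:device, then ":" (esxcli "Bus Info") or "." (NIC list) before the function.
BUS_INFO = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})[:.](?P<function>[0-7])$"
)


def parse_bus_info(bus_info: str) -> tuple[int, int, int, int]:
    """Return (domain, bus, device, function) as integers."""
    m = BUS_INFO.match(bus_info.strip())
    if m is None:
        raise ValueError(f"unrecognised bus address: {bus_info!r}")
    return (
        int(m["domain"], 16),
        int(m["bus"], 16),
        int(m["device"], 16),
        int(m["function"]),
    )


vmnic1 = parse_bus_info("0000:06:00:1")  # esxcli "Bus Info" for vmnic1
vmnic2 = parse_bus_info("0000:06:00.2")  # NIC list entry for vmnic2

print(vmnic1[:3] == vmnic2[:3])  # True: same domain, bus and device
print(vmnic1[3], vmnic2[3])      # 1 2: different PCI functions
```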

    Good that you have already involved HPE on this. I would be very eager to know what they have to say about this :)

    Cheers,

    Supreet



  • 6.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 17, 2018 01:48 PM

    Thanks for the input so far, Supreet.

    In our case, vmnic1 and vmnic2 use two different physical ports, as they leave the chassis via different interconnects.

    I'm still chasing HPE on this and am sending them a nice collection of log files now. I'll keep you posted on their response.

    Cheers, Ben.



  • 7.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 17, 2018 06:27 PM

    Ahh! I'll be eagerly waiting to know how this pans out :)

    Cheers,

    Supreet



  • 8.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 18, 2018 12:52 PM

    Just guessing.

    • Which network interconnect modules (type/model) do you use in the C7000 chassis?
    • You mentioned "SPP 2018.06.0". Has Virtual Connect already been updated to firmware 4.62 (if applicable)?
    • Do both ports - to which the BL460c is connected - have the same VC profile assigned?

    André



  • 9.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 20, 2018 08:00 AM

    Morning André

    • We have HP VC FlexFabric-20/40 F8 Module installed in the C7000.
    • Yes, the firmware is running 4.62.
    • The ports have different profiles, assuming I am reading OneView correctly. The profiles are identical except for the member ports in the uplink sets: one profile uses interconnect 1 X8, whereas the other uses interconnect 2 X8.

    Cheers, Ben.



  • 10.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 30, 2018 10:22 AM

    There has been an interesting development in this that I thought I would share.

    HPE are working on this now, though I don't expect a resolution any time soon.

    We are using the HPE customised ESXi 6.5 U2 image, which includes elxnet driver version 12.0.1115.0.

       Driver Info:

             Bus Info: 0000:06:00:2

             Driver: elxnet

             Firmware Version: 12.0.1110.11

             Version: 12.0.1115.0

    With this version of the driver, the NIC does not advertise all of the correct speeds.

    [root@ESXi1a-21:~] esxcli network nic get -n vmnic2

       Advertised Auto Negotiation: false

       Advertised Link Modes: 20000None/Full

       Auto Negotiation: false

    However, other NICs on the same host display the correct speed advertisements.

    [root@ESXi1a-21:~] esxcli network nic get -n vmnic1

       Advertised Auto Negotiation: true

       Advertised Link Modes: 1000BaseKR2/Full, 10000BaseKR2/Full, 20000BaseKR2/Full, Auto

       Auto Negotiation: true

    If I install ESXi 6.5 U2 via a direct download from VMware, this installs elxnet driver version 11.1.91.0.

       Driver Info:

             Bus Info: 0000:06:00:2

             Driver: elxnet

             Firmware Version: 12.0.1110.11

             Version: 11.1.91.0

    With this driver version too, the NIC does not advertise all of the correct speeds.

    [root@localhost:~] esxcli network nic get -n vmnic2

       Advertised Auto Negotiation: false

       Advertised Link Modes: 20000None/Full

       Auto Negotiation: false

    If I use the HPE 6.0 U3 image, it installs elxnet driver version 12.0.1115.0, which exhibits the same issue as the 6.5 U2 image.

    Now for the interesting part: if I install ESXi 6.0 U3 from the standard image downloaded from the VMware website, elxnet driver version 10.2.309.6v is included.

       Driver Info:

             Bus Info: 0000:06:00:2

             Driver: elxnet

             Firmware Version: 12.0.1110.11

             Version: 10.2.309.6v

    This driver version reports the correct available speeds.

    [root@localhost:~] esxcli network nic get  -n vmnic2

       Advertised Auto Negotiation: true

       Advertised Link Modes: 1000baseT/Full, 10000baseT/Full, 20000baseT/Full

       Auto Negotiation: false

    Nothing else has changed at all on the system, other than the ESXi image that has been used.

    I'm curious whether anyone else has ever come across this issue. It looks like a potential driver issue, but if it is, I don't understand how it hasn't been noticed before.
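The isolation test above changes one variable at a time (the ESXi image, and with it the elxnet driver) while the firmware stays at 12.0.1110.11. A small sketch that records those results, using only the images and versions reported in this post, makes it easy to see which driver advertised auto-negotiation on vmnic2 and to append further results as they come in. The record layout and function name are illustrative choices, not part of any tool.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    image: str                       # how ESXi was installed
    elxnet_version: str              # "Version" under "Driver Info"
    vmnic2_advertises_autoneg: bool  # "Advertised Auto Negotiation" on vmnic2


# Results as reported in this post; firmware 12.0.1110.11 throughout.
OBSERVATIONS = [
    Observation("HPE custom ESXi 6.5 U2", "12.0.1115.0", False),
    Observation("VMware ESXi 6.5 U2", "11.1.91.0", False),
    Observation("HPE custom ESXi 6.0 U3", "12.0.1115.0", False),
    Observation("VMware ESXi 6.0 U3", "10.2.309.6v", True),
]


def versions_where(observations, advertises: bool) -> list[str]:
    """Distinct driver versions whose vmnic2 result matched `advertises`, in first-seen order."""
    seen: list[str] = []
    for obs in observations:
        if obs.vmnic2_advertises_autoneg == advertises and obs.elxnet_version not in seen:
            seen.append(obs.elxnet_version)
    return seen


print(versions_where(OBSERVATIONS, True))   # ['10.2.309.6v']
print(versions_where(OBSERVATIONS, False))  # ['12.0.1115.0', '11.1.91.0']
```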



  • 11.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 30, 2018 11:23 AM

    Very interesting :) What if you install the latest VMware native driver on 6.5 U2? Does the issue persist? This is just to isolate whether the problem affects all versions of the elxnet async driver.

    Cheers,

    Supreet



  • 12.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 30, 2018 11:39 AM

    It certainly does, Supreet.

    Playing devil's advocate, I even installed 6.7, and the native driver included there exhibits the same problem.

    HPE are due to get access to lab hardware today/tomorrow to start replicating. I'll keep you posted!



  • 13.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Aug 30, 2018 11:54 AM

    Would love to know the end of this :) Thank you for keeping us posted.

    Cheers,

    Supreet



  • 14.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Sep 13, 2018 01:49 PM

    I am seeing the same thing; let me know what you find out. Not being able to have a consistent host profile is driving me nuts. FYI, I am using:

       Driver: elxnet

             Firmware Version: 12.0.1110.11

             Version: 11.4.1205.0



  • 15.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Sep 20, 2018 11:16 AM

    Interesting, good to hear that we are not alone with this issue. HPE have gone very quiet on this one at the moment, but I will keep the thread up to date as and when updates come through.



  • 16.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Sep 20, 2018 08:34 PM

    I have some progress from HPE!

    They have now been able to replicate the fault and have acknowledged that this could well be a driver issue :)

    The issue has now been escalated from the L2 engineers to the L3 engineers for further testing. They have also said that they will look for other customers globally who have reported this issue. If anyone has this issue, please log a support request with HPE and drop me an email or message on VMTN; I'll pass you my incident number so you can reference it with HPE and they can tie the cases together.

    Cheers, Ben.



  • 17.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Oct 15, 2018 08:27 PM

    Hello, I have the same issue, but we do not use iSCSI; we use FC instead. We experience disconnections on our redundant paths to our SAN, and only the hosts with the hardware from the title are affected. Any updates from HP?



  • 18.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Oct 16, 2018 12:07 PM

    This is still very much a work in progress with HPE. I'm still pushing them; the latest is that they need to work with VMware and the hardware vendor, potentially to develop a new driver.

    Interesting to hear that I am not the only one seeing this issue. If you are able to log this with HPE, more cases with the same issue will strengthen the case.

    I will keep this thread up to date with anything useful if and when it arises.



  • 19.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Nov 16, 2018 02:59 PM

    Any word from HPE on this? We are running into the same issue. An interesting little twist in what we are seeing is that if we put a significant amount of load on vmnic2 or vmnic3, the links drop completely. HPE hasn't been able to pin down the issue for us yet.



  • 20.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Nov 16, 2018 03:46 PM

    I have had some feedback, but nothing of any significance. They then proceeded to close my case, despite my request for more time to test and produce more evidence: while there seems to be some merit to the details below, they don't explain why I see the latency.

    --------------------------------------------------------------------------------------------------------------------------------------------------
    This behaviour is expected in multi-channel modes.

    In a multi-channel mode, the same physical port is shared among multiple logical functions.

    For example, Port #A is associated with even-numbered logical functions (0, 2, 4, 6, etc.), and Port #B is associated with odd-numbered logical functions (1, 3, 5, 7, etc.).

    The Emulex firmware is designed so that only the primary logical functions (logical function 0 for Port #A and logical function 1 for Port #B) are privileged to modify port-level features such as PortSpeed and Autoneg.

    This prevents multiple logical functions from choosing different settings, which is not possible since they share the same physical port.

    That is why the driver does not advertise negotiation for the non-primary logical functions.

    ---------------------------------------------------------------------------------------------------------------------------------------------------
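HPE's rule can be sketched as follows, assuming (which the thread does not confirm) that the PCI function number in esxcli's Bus Info field (1 for vmnic1, 2 for vmnic2) is the same as the logical function number HPE refers to. Under that assumption, vmnic1 is the primary function on Port #B while vmnic2 is a non-primary function on Port #A. That would be consistent both with the "Device is not privileged to do speed changes" warning on vmnic2 and with the two vmnics leaving the chassis via different interconnects.

```python
def port_for_function(lf: int) -> str:
    """Per HPE's explanation: even logical functions on Port #A, odd ones on Port #B."""
    if lf < 0:
        raise ValueError("logical function numbers are non-negative")
    return "A" if lf % 2 == 0 else "B"


def is_primary(lf: int) -> bool:
    """Only logical function 0 (Port #A) and 1 (Port #B) may change port-level settings."""
    return lf in (0, 1)


for lf in range(4):
    role = "primary" if is_primary(lf) else "non-primary"
    print(f"LF {lf}: Port #{port_for_function(lf)}, {role}")

# Assumption: vmnic1 = LF 1 and vmnic2 = LF 2, taken from the Bus Info function numbers.
for nic, lf in (("vmnic1", 1), ("vmnic2", 2)):
    print(nic, "Port #" + port_for_function(lf), "may change speed/autoneg:", is_primary(lf))
```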

    I have pulled a server from the cluster and intend to do some further testing on it. Recent workload has prevented me from focusing on this issue, but I hope to have some time for testing in the next week or so.



  • 21.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Jan 31, 2019 01:43 PM

    Did you ever find a resolution for this issue?  I'm moving some BL460c Gen9 hosts with the 650FLB to enclosures with the VCFF 20/40 F8 modules and am curious if I'll start seeing the same connectivity issues.  I'll be defining (3) connections for Management, vMotion & VM traffic and (2) for FC SAN.



  • 22.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Feb 01, 2019 08:18 PM

    Not the OP, but we did end up resolving the issue by changing the port speed on our enclosure uplinks to auto-negotiate instead of being statically assigned. This was a change our network team made on the virtual port channel configuration. We haven't had any connectivity issues since.



  • 23.  RE: HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

    Posted Feb 28, 2019 11:28 PM

    We started seeing PSODs with the BL460c Gen9 running 6.5 Update 1 and HPE identified an issue with the 650FLB driver in the HPE custom image.  We had only just started migrating to G3 enclosures and leveraging the 650FLB as a CNA.  We got the PSOD with the hosts under a relatively heavy load.  HPE recommended we upgrade to 6.5 Update 2, but confirmed the driver issue was resolved with 6.7 Update 1, so we rebuilt our BL460 Gen9s with HPE custom image VMware-ESXi-6.7.0-Update1-10302608-HPE-Gen9plus-670.U1.10.3.5.12-Oct2018.

    We also fully patched them afterwards, which pulled down an additional 9 patches, so our hosts now show hypervisor VMware ESXi, 6.7.0, 11675023. Knock on wood, we have had no PSODs since then, and I'm just about done migrating the BL460c Gen9 hosts off my original G1 enclosures.