Write latency and network errors

  • 1.  Write latency and network errors

    Posted Mar 08, 2017 07:46 PM

    I'm trying to troubleshoot a write performance issue and noticed that there is VSCSI write latency. We have a 4-node cluster with dual 10GbE links on Brocade VDX switches. The only thing I noticed is that there are some out-of-order RX errors in vSAN Observer during the time of the latency. Has anyone seen this issue before? I'm attaching a screenshot from Observer.



  • 2.  RE: Write latency and network errors

    Posted Mar 09, 2017 02:16 PM

    What exact version of ESXi and vSAN are you running?

    Per



  • 3.  RE: Write latency and network errors

    Posted Mar 09, 2017 02:20 PM

    We are running the following.

    VMware ESXi, 6.0.0, 4600944

    VSAN 6.2



  • 4.  RE: Write latency and network errors

    Broadcom Employee
    Posted Mar 10, 2017 08:39 AM

    For some 10GbE NICs a new driver was recently released that improves performance. Also, 6.0 U3 has a bunch of improvements, so I would recommend upgrading.



  • 5.  RE: Write latency and network errors

    Posted Mar 13, 2017 01:35 PM

    Thank you very much for the advice. I updated to the latest firmware and drivers for the NIC. I also upgraded to 6.0 U3. I'm still seeing the errors and latencies. Overall, performance is good; I'm just trying to understand some inconsistencies and where I may be falling short or have a bottleneck. I'm attaching some more screenshots from Observer taken while doing a large file copy within vSAN. Any advice or direction is greatly appreciated.



  • 6.  RE: Write latency and network errors

    Broadcom Employee
    Posted Mar 15, 2017 02:03 PM

    What does your configuration look like (server)? And how are you testing performance? (A file copy is usually not how people test performance, in my experience.)



  • 7.  RE: Write latency and network errors

    Posted Mar 15, 2017 02:28 PM

    We are running a 4-node hybrid cluster with HP DL380 Gen9s.

    3 disk groups per host. Each group has one 200GB HP MO0200JEFNV SSD (Mainstream Endurance SFF) and four 900GB 10k SAS magnetic disks (EG0900JFCKB), behind a P440 array controller.

    Flow control is enabled. I did try disabling it before and it did not seem to have any effect. I'm not seeing any congestion, and write buffers are not filling up. A colleague is noting a performance inconsistency when doing SQL backups from local disk to local disk on a Windows server. We aren't having any problems with the environment; I'm just trying to see if there's anything that can be done to alleviate these concerns. Is there a recommended performance testing tool for vSAN?

    Thanks,

    Mike



  • 8.  RE: Write latency and network errors

    Posted Mar 22, 2017 03:53 PM

    HCI Bench is recommended for VSAN testing.

    HCIBench
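
    If you want a quick in-guest sanity check while HCIBench is being set up, something along these lines approximates a mixed small-block profile (fio run inside a Linux test VM; the file path and all parameter values are illustrative, not HCIBench defaults):

    # fio --name=vsan-70-30 --filename=/data/fiotest --size=20g --ioengine=libaio --direct=1 --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 --numjobs=4 --runtime=300 --time_based --group_reporting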



  • 9.  RE: Write latency and network errors

    Posted Mar 13, 2017 08:59 PM

    Do you experience any congestion during the high write latency?

    You will also want to see how much of the write cache on the SSD is already used up during the latency event. This is easier to see using SexiGraf (a free tool, just search Google), which will show all the SSD stats for a cluster on one page. I think in vSAN Observer, if you deep-dive into the SSD, one of the graphs will show this too, but only for a single SSD at a time.

    Just a theory, but maybe the write buffer on the SSD fills up and destaging to the capacity disks becomes the bottleneck (this will show as congestion). Congestion artificially introduces latency as a result, which may be what you are seeing.



  • 10.  RE: Write latency and network errors

    Posted Mar 15, 2017 01:27 AM

    Possibly unrelated but maybe worth checking (as without detailed packet analysis I would not make any assumptions)

    - Is RX and TX Flow Control disabled as per best practice?

    p7-8 of:

    http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/whitepaper/products/vsan/vmware-virtual-san-network-design-guide-white-paper.pdf

    Check if it is enabled using:

    # ethtool -a vmnic<#OfNic(s)InUseForVsanTrafficHere>

    Set to off:

    # ethtool --pause VMNic_Name tx off rx off

    More info:

    https://kb.vmware.com/kb/1013413

    (checked and set on every host in the vSAN cluster of course)
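
    On newer builds where ethtool is being phased out, the esxcli equivalent is worth knowing too (a sketch only; confirm the exact set options with --help on your build, and vmnic2 is just an example NIC name):

    # esxcli network nic pauseParams list

    # esxcli network nic pauseParams set -n vmnic2 --rx=false --tx=false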

    Bob



  • 11.  RE: Write latency and network errors

    Posted Aug 04, 2017 03:59 PM

    "Possibly unrelated but maybe worth checking (as without detailed packet analysis I would not make any assumptions)

    - Is RX and TX Flow Control disabled as per best practice?"

    That very same design guide advises keeping it enabled (pages 28 and 138). The document does not mention disabling it anywhere.



  • 12.  RE: Write latency and network errors

    Posted Aug 04, 2017 04:05 PM

    I've tried it both ways and we've still found the performance to be inconsistent. We ended up getting a storage array to use for our high-performance servers.



  • 13.  RE: Write latency and network errors

    Posted Aug 04, 2017 04:14 PM

    Hello Srodenburg,

    That link now points to the new networking guide (which didn't exist at the time of my post). I can't seem to find a copy of the old one locally, but maybe you can find one online somewhere if you want to clarify what it did or didn't say back then.

    Edit: found it; google "VMware® vSAN™ Network Design-OLD - VMware Storage Hub":

    "vSAN manages congestion by introducing artificial latency to prevent cache/buffer exhaustion. Since vSAN has built-in congestion management, disabling flow control on VMkernel interfaces tagged for vSAN traffic is recommended. Note Flow Control is enabled by default on all physical uplinks. For further information on Flow Control see KB1013413. VMware Recommends: Disable flow control for vSAN traffic."

    And yes, good point that the recommendation for this has changed; nowadays we only advise disabling it with the switch vendor's blessing.

    Bob



  • 14.  RE: Write latency and network errors

    Broadcom Employee
    Posted May 15, 2017 01:29 PM

    Hi MichaelGi

    Checking to see if you got to the bottom of this, and curious what the resolution was for your high out-of-order Rx (OOO Rx) errors.



  • 15.  RE: Write latency and network errors

    Posted Jul 17, 2017 01:55 PM

    I never reached a resolution on this.



  • 16.  RE: Write latency and network errors

    Posted Mar 14, 2018 02:56 AM

    Did anyone get a solution for this?



  • 17.  RE: Write latency and network errors

    Posted Mar 23, 2018 02:20 PM

    Just wondering if you have checked the upstream switches and the MTU settings on all the VMkernel NICs. Mismatched MTUs may cause retries and network inconsistencies.
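
    A quick way to rule out an MTU mismatch end-to-end is a don't-fragment vmkping at near-jumbo size from each host's vSAN VMkernel port to its peers (vmk1 and the target IP below are placeholders for your vSAN vmknic and a remote vSAN IP; use -s 1472 instead if you run a standard 1500 MTU):

    # vmkping -I vmk1 -d -s 8972 192.168.100.12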



  • 18.  RE: Write latency and network errors

    Posted Mar 26, 2018 04:48 PM

    We are having the very same issue, albeit much worse. We are seeing latencies surpassing 1400 ms (!) on a relatively empty 12-node vSAN stretched cluster (SR# 18750505903). The link between sites is less than 30% used with >1ms latency. The issue was discovered when a SQL server with a 1.5TB DB was migrated into the cluster and began having major application issues.

    vSAN 6.2, ESXi 6.0.0 build 3620759.

    Cisco UCS C240 M4 hardware with enterprise-grade SAS SSDs/HDDs. The cluster is completely symmetrical. Hosts consist of 2 disk groups of 8 disks: (1) 400GB enterprise SAS SSD / (7) 1.2TB 10K SAS HDDs. The vSAN HCL was validated multiple times for incorrect drivers, firmware and even hardware. All check out.

    I'm not seeing any pause frames on the upstream UCS Fabric Interconnects. Flow Control is not configured either, nor does it appear to be configurable on the VIC 1227:

    [root@-------vsan-06:~] esxcli system module parameters list -m enic

    Name               Type  Value  Description

    -----------------  ----  -----  -------------------------------------------------------------------------

    heap_initial       int          Initial heap size allocated for the driver.

    heap_max           int          Maximum attainable heap size for the driver.

    skb_mpool_initial  int          Driver's minimum private socket buffer memory pool size.

    skb_mpool_max      int          Maximum attainable private socket buffer memory pool size for the driver.

    [root@-------vsan-06:~] ethtool -a vmnic5

    Pause parameters for vmnic5:

    Cannot get device pause settings: Operation not supported

    Per KB2146267 I tried disabling the dedup scanner but this did not improve anything. I also updated the pNIC drivers and that didn't help either.



  • 19.  RE: Write latency and network errors

    Posted Apr 03, 2018 07:43 PM

    Thanks LeslieBNS9. I believe we are experiencing a similar root cause.

    Instead of uplinking our UCS servers directly to switches, they first connect to 6248 Fabric Interconnects, which then uplink to Nexus 7010s via two 40GbE vPCs. The Fabric Interconnects are discarding packets, as evidenced by "show queuing interface" on all active vSAN interfaces. Given the way our vmnics are set up in VMware (Active/Standby), Fabric B is effectively dedicated to vSAN traffic, and the cluster is idle, so it is not a bandwidth issue or even contention; rather, the FI's scrawny buffer assigned to custom QoS System Classes in UCS is not able to handle bursts. We have QoS configured per the Cisco vSAN reference doc. Platinum CoS is assigned qos-group 2, which only has a queue/buffer size of 22720! NX-OS on the UCS FIs is read-only, so this is not configurable.

    I will probably disable the Platinum QoS System Class and assign the vSAN vNICs to Best Effort so we can at least increase the available queue size to 150720.

    Ethernet1/1 queuing information:

      TX Queuing

        qos-group  sched-type  oper-bandwidth

            0       WRR              3   (Best Effort)

            1       WRR             17  (FCoE)

            2       WRR             31  (VSAN)

            3       WRR             25  (VM)

            4       WRR             18  (vMotion)

            5       WRR              6   (Mgmt)

    RX Queuing

        qos-group 0

        q-size: 150720, HW MTU: 1500 (1500 configured)

        drop-type: drop, xon: 0, xoff: 150720

    qos-group 1

        q-size: 79360, HW MTU: 2158 (2158 configured)

        drop-type: no-drop, xon: 20480, xoff: 40320

    qos-group 2

        q-size: 22720, HW MTU: 1500 (1500 configured)

        drop-type: drop, xon: 0, xoff: 22720

        Statistics:

            Pkts received over the port             : 256270856

            Ucast pkts sent to the cross-bar        : 187972399

            Mcast pkts sent to the cross-bar        : 63629024

            Ucast pkts received from the cross-bar  : 1897117447

            Pkts sent to the port                   : 2433368432

            Pkts discarded on ingress               : 4669433

            Per-priority-pause status               : Rx (Inactive), Tx (Inactive)

    Egress buffers were verified to be congested during a large file copy. The following command reveals congestion on the egress (reference):

    nap-FI6248-VSAN-B(nxos)#  show hardware internal carmel asic 0 registers match .*STA.*frh.* | i eg

    Slot 0 Carmel 0 register contents:

    Register Name                                          | Offset   | Value

    car_bm_STA_frh_eg_addr_0                               | 0x50340  | 0x1

    car_bm_STA_frh_eg_addr_1                               | 0x52340  | 0

    car_bm_STA_frh_eg_addr_2                               | 0x54340  | 0

    car_bm_STA_frh_eg_addr_3                               | 0x56340  | 0

    car_bm_STA_frh_eg_addr_4                               | 0x58340  | 0

    car_bm_STA_frh_eg_addr_5                               | 0x5a340  | 0

    car_bm_STA_frh_eg_addr_6                               | 0x5c340  | 0

    car_bm_STA_frh_eg_addr_7                               | 0x5e340  | 0

    nap-FI6248-VSAN-B(nxos)#  show hardware internal carmel asic 0 registers match .*STA.*frh.* | i eg

    Slot 0 Carmel 0 register contents:

    Register Name                                          | Offset   | Value

    car_bm_STA_frh_eg_addr_0                               | 0x50340  | 0x2

    car_bm_STA_frh_eg_addr_1                               | 0x52340  | 0

    car_bm_STA_frh_eg_addr_2                               | 0x54340  | 0

    car_bm_STA_frh_eg_addr_3                               | 0x56340  | 0

    car_bm_STA_frh_eg_addr_4                               | 0x58340  | 0

    car_bm_STA_frh_eg_addr_5                               | 0x5a340  | 0

    car_bm_STA_frh_eg_addr_6                               | 0x5c340  | 0

    car_bm_STA_frh_eg_addr_7                               | 0x5e340  | 0

    nap-FI6248-VSAN-B(nxos)#  show hardware internal carmel asic 0 registers match .*STA.*frh.* | i eg

    Slot 0 Carmel 0 register contents:

    Register Name                                          | Offset   | Value

    car_bm_STA_frh_eg_addr_0                               | 0x50340  | 0

    car_bm_STA_frh_eg_addr_1                               | 0x52340  | 0

    car_bm_STA_frh_eg_addr_2                               | 0x54340  | 0

    car_bm_STA_frh_eg_addr_3                               | 0x56340  | 0x1

    car_bm_STA_frh_eg_addr_4                               | 0x58340  | 0

    car_bm_STA_frh_eg_addr_5                               | 0x5a340  | 0

    car_bm_STA_frh_eg_addr_6                               | 0x5c340  | 0

    car_bm_STA_frh_eg_addr_7                               | 0x5e340  | 0

    I should note we are not seeing discards or drops on any of the 'show interface' counters.



  • 20.  RE: Write latency and network errors

    Posted Apr 11, 2018 10:35 PM

    Subscribing. I have the same issues.



  • 21.  RE: Write latency and network errors

    Posted May 01, 2018 06:52 PM

    We're having the same issue on a new 12-node, all-flash stretched cluster with RAID-5 and encryption. Write latency is very high. We have support tickets open with Dell and VMware. We've done testing with HCIBench and SQLIO using different storage policies. RAID-1 is better but still below what we consider acceptable.

    The out-of-order packets were caused by having dual uplinks to two different top-of-rack switches. We resolved that by changing them to active/passive instead of active/active. We'll convert to LACP when we get a chance. Networking is all 10Gb with <1ms latency between hosts and sites. Top-of-rack switches are Cisco Nexus 5Ks and all error counters are clean. Using iperf from the host shell shows we can easily push greater than 9Gbit/s between hosts and sites with 0.5 to 0.6 ms latency.
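
    For anyone wanting to repeat the host-to-host iperf test mentioned above: recent ESXi builds ship an iperf3 binary alongside vSAN. The path and the copy workaround below are assumptions based on 6.5/6.7-era builds, so verify them on your own hosts, and remember to re-enable the firewall afterwards:

    # cp /usr/lib/vmware/vsan/bin/iperf3 /usr/lib/vmware/vsan/bin/iperf3.copy

    # esxcli network firewall set --enabled false    (temporarily, on both hosts)

    # /usr/lib/vmware/vsan/bin/iperf3.copy -s -B <vSAN vmk IP of host A>    (server side, host A)

    # /usr/lib/vmware/vsan/bin/iperf3.copy -c <vSAN vmk IP of host A> -t 30    (client side, host B)

    # esxcli network firewall set --enabled true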



  • 22.  RE: Write latency and network errors

    Posted Apr 02, 2018 02:19 PM

    We are also seeing a lot of these errors on our All Flash vSAN environment. We've been doing some testing and think we have narrowed down the issue.

    We have 6 hosts with the following configuration..

    SuperMicro 1028U-TR4+

    2xIntel E5-2680v4

    512GB RAM

    X710-DA2 10GB Network Adapters (Dedicated for vSAN, not shared)

    Cisco 3548 Switches (Dedicated for vSAN, not shared)

    We went through different drivers/firmware on our X710s, but so far none of that has made a difference.

    We noticed on our Cisco switch that all of the interfaces connected to our vSAN hosts were showing discards on a regular basis (multiple times every hour). We opened a support case with Cisco to troubleshoot this and found that ALL of our vSAN ports have bursts of traffic that fill up the output buffers on the switch. During these bursts, once the buffers are full, the switch discards the packets.

    So I would check on your switches to see if you are having any packet discards.
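
    If it helps anyone else watching for this: one crude way to catch the intermittent discards is to poll the interface discard counter from any Linux box and watch the deltas over time. A minimal sketch using net-snmp, assuming the IF-MIB is installed; the community string, switch name and ifIndex are placeholders for your environment:

    # while true; do date; snmpget -v2c -c public vsan-switch-01 IF-MIB::ifOutDiscards.436207616; sleep 60; done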

    At this point Cisco is recommending we move to a deep-buffer switch. I spoke with VMware support to see if there is a specific switch (or buffer size) they recommend, but they said they just require a 10Gb switch. I find this frustrating, as we have 2 expensive switches we are only using 6 ports on and may not be able to add any more hosts to.

    Ethernet1/2 queuing information:

        qos-group  sched-type  oper-bandwidth

            0       WRR            100

        Multicast statistics:

            Mcast pkts dropped                      : 0

        Unicast statistics:

        qos-group 0

        HW MTU: 16356 (16356 configured)

        drop-type: drop, xon: 0, xoff: 0

        Statistics:

            Ucast pkts dropped                      : 180616

    Ethernet1/2 is up

    Dedicated Interface

      Hardware: 100/1000/10000 Ethernet, address: 00d7.8faa.cf09 (bia 00d7.8faa.cf09)

      MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec

      reliability 255/255, txload 2/255, rxload 4/255

      Encapsulation ARPA

      Port mode is access

      full-duplex, 10 Gb/s, media type is 10G

      Beacon is turned off

      Input flow-control is off, output flow-control is off

      Rate mode is dedicated

      Switchport monitor is off

      EtherType is 0x8100

      Last link flapped 4d12h

      Last clearing of "show interface" counters 3d23h

      0 interface resets

      Load-Interval #1: 30 seconds

      30 seconds input rate 98177624 bits/sec, 4262 packets/sec

      30 seconds output rate 124356600 bits/sec, 4302 packets/sec

      Load-Interval #2: 5 minute (300 seconds)

        input rate 163.09 Mbps, 6.20 Kpps; output rate 113.03 Mbps, 6.33 Kpps

      RX

        2620601947 unicast packets  5716 multicast packets  335 broadcast packets

        2620612576 input packets  10625804438347 bytes

        1353181073 jumbo packets  0 storm suppression bytes

        0 runts  0 giants  0 CRC  0 no buffer

        0 input error  0 short frame  0 overrun   0 underrun  0 ignored

        0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop

        0 input with dribble  0 input discard

        0 Rx pause

      TX

        2619585440 unicast packets  0 multicast packets  2452 broadcast packets

        2619587892 output packets  9072740199246 bytes

        1162617883 jumbo packets

        0 output errors  0 collision  0 deferred  0 late collision

        0 lost carrier  0 no carrier  0 babble 180616 output discard

        0 Tx pause



  • 23.  RE: Write latency and network errors

    Broadcom Employee
    Posted Apr 02, 2018 02:33 PM

    For NIC issues, here is a typical checklist (see the command sketch below for the firmware/driver check and the LRO/TSO settings):

    • Make sure the NICs are on the vSphere VCG
    • Not only make sure that firmware and drivers are up to date (latest), BUT also that there are no mismatches
      • Mismatches between these two have been known to cause some issues, in particular packet drops, based on my experience
    • For the X710 (X71x & 72x), disabling LRO / TSO has resolved a lot of the issues encountered in the past.


  • 24.  RE: Write latency and network errors

    Posted Apr 02, 2018 03:10 PM

    For the X710 (X71x & 72x) disabling LRO / TSO have resolved a lot of the issues encountered in the past.

    We are aware of the LRO/TSO errors and the firmware/driver version recommendations for the X710's and have already been through all of those settings.



  • 25.  RE: Write latency and network errors

    Posted Apr 02, 2018 03:12 PM

    Also all of our hardware is on the HCL and has matching drivers/firmware.

    I actually posted another thread specific to my issue at All-Flash vSAN Latency & Network Discards (Switching Recommendations).

    I just wanted to give the poster here some reference in case they are seeing the same thing we are seeing.



  • 26.  RE: Write latency and network errors

    Posted Feb 11, 2019 06:57 PM

    LeslieBNS9,

    Did you end up getting a deep-buffer switch? We are having the same issue.



  • 27.  RE: Write latency and network errors

    Posted Mar 10, 2019 10:42 AM

    Hello All,

    We are experiencing the same issue. Any update on a solution?

    My switches are Nexus 5548UPs; a lot of packets are being discarded on the switch ports.



  • 28.  RE: Write latency and network errors

    Posted Mar 10, 2019 11:51 AM

    Meanwhile, you can read more and more about packet discards on the switch side in all-flash vSAN configurations. The cause often seems to be the buffer on the switch side. VMware itself gives little or no information about which switch components to use because they want to be hardware independent and don't favor a vendor. But in my personal opinion, most Nexus switches are crap for use in vSAN all-flash configurations, especially if they're over 5 years old and have a shared buffer.

    However, John Nicholson (Technical Marketing, vSAN) recently published a post on Reddit that summarizes some points to keep in mind (but it's his personal opinion and not an official statement):

    1. Don't use Cisco FEXs. Seriously, just don't. Terrible buffers, no port-to-port capabilities. Even Cisco will tell you not to put storage on them.
    2. Buffers. For a lab, that $1000 4MB-buffer Marvell special might work, but really 12MB is the minimum buffer I want to see. If you want to go nuts, I've heard some lovely things about those crazy 6GB-buffer StrataDNX DUNE ASIC switches (even Cisco carries one, the Nexus 36xx I think). Dropped frames/packets/retransmits rapidly slow down storage. That Cisco Nexus 5500 that's 8 years old and has VoQ stuff? Seriously, don't try running a heavy database on it!
    3. It's 2019. STOP BUYING 10Gbps stuff. 25Gbps costs very little more, and 10Gbps switches that can't do 25Gbps are likely 4-year-old ASICs at this point.
    4. Mind your NIC driver/firmware. The vSphere Health team has even started writing online health checks to KBs on a few. Disable the weird PCIe power saving if using Intel 5xx series NICs. It will cause flapping.
    5. LACP: if you use it, use the vDS and do an advanced hash (SRC-DST) to get proper bang for the buck. Don't use crappy IP hash only. No shame in active/passive; it's simpler to troubleshoot and failure behavior is cleaner.
    6. TURN ON CDP/LLDP in both directions!
    7. The only Arista issue I've seen (another redditor complaining about vSAN performance a while back, whom we helped) was someone who mismatched his LAG policies/groups/hashes.

    Interfaces: I like TwinAx because, unlike 10GBASE-T, you don't have to worry about interference or termination; they are reasonably priced; and as long as you don't need a long run, the passive ones don't cause a lot of compatibility issues.

    https://www.reddit.com/r/vmware/comments/aumhvj/vsan_switches/



  • 29.  RE: Write latency and network errors

    Posted Mar 12, 2019 04:34 AM

    Thank you for the answer.

    My issue is huge ingress packet discarding by the switch. We are now working on the case with Cisco support; to summarize, they recommend applying the following steps:

    - HOLB Mitigation: Enable VOQ Limit

    - HOLB Mitigation: Traffic Classification

    https://www.cisco.com/c/en/us/support/docs/switches/nexus-6000-series-switches/200401-Nexus-5600-6000-Understanding-and-Troub.html

    After applying the steps, I will let you know.



  • 30.  RE: Write latency and network errors

    Posted Mar 15, 2019 01:49 PM

    It still continues; we applied QoS using an ACL, but we couldn't resolve the issue.



  • 31.  RE: Write latency and network errors

    Posted Jul 31, 2019 07:53 PM

    Very similar issue here on an all-flash, 5-node 6.7 U2 cluster. 10Gb dedicated interfaces for vSAN using Cisco 3548 switches.

    As others have reported, we have high write latency. Just a basic copy of a large file between drives within a VM causes write latency to spike. We often see the copy start out fast for a few seconds, then slow. We see output discards on the switch ports increment, and with Observer we can see retransmits, duplicate ACKs/data, and out-of-order packets.

    We have also tried a Cisco 4500-X switch, but the issues remain; it doesn't have much more in the way of buffers than the 3548, though.

    I'm looking to get my hands on a 25Gb switch to test. Has anyone with this issue tried increasing network bandwidth and/or a switch with deeper buffers?