DX NetOps

  • 1.  Interface utilization oddities

    Posted Dec 05, 2019 03:09 AM
    Edited by Marcelo Zacchi Dec 05, 2019 03:13 AM
    Fellow Spectrum admins,

    I have been running an snmpwalk against a device to read the interface input data on a highly utilized port, and I am seeing some weird, damning behavior:
    snmpwalk -r1 -t3 -v 3 -u spectrum -lauthPriv -a SHA -A #### -x AES -X #### 10.10.10.26 1.3.6.1.2.1.31.1.1.1.6.939569152
    Instead of a gradual increase, I am seeing the counter move in bursts:

    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250109384515203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
    IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
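    For reference, the jump between the first two plateaus works out to
        5250124476845203 - 5250109384515203 = 15,092,330,000 bytes (~14 GiB)
    and the later jumps are a similar size, while in between the counter does not move at all for many consecutive polls.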

    The problem is that my Performance graph in Spectrum then looks like this, with peaks of over 200% and valleys of 0%:

    [screenshot: Spectrum port performance graph showing the saw-tooth pattern]
    The weird thing is that, looking at the counters from the device CLI, I see a gradual increase, so I assume the issue is with the SNMP data.

    XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
        4802007060663 input packets  5250155335945793 bytes
    XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
        4802007533795 input packets  5250155906888303 bytes
    XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
        4802008008309 input packets  5250156505085683 bytes
    XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
        4802008466826 input packets  5250157051289196 bytes
    XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
        4802008932035 input packets  5250157620969379 bytes
    XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
        4802009777253 input packets  5250158650663943 bytes
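    Between consecutive CLI reads the byte counter advances by roughly 0.5-0.6 GB each time, for example
        5250155906888303 - 5250155335945793 = 570,942,510 bytes
    so from the CLI side the counter really does increase smoothly rather than in ~14 GiB steps.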


    Have you ever seen this?
    Thanks and regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------


  • 2.  RE: Interface utilization oddities
    Best Answer

    Broadcom Employee
    Posted Dec 05, 2019 06:14 AM
    Edited by Marcelo Zacchi Dec 05, 2019 08:42 AM
    Hi Marcelo,

    The key to debugging this type of issue is to enable level 1 debug for the port. This will cause Spectrum to generate debug event data every time it calculates performance metrics for the port. These events will contain all of the source data read from the device along with information about how those values were used to calculate the performance metrics.

    The procedure is as follows:

    1. Identify which ports exhibit the behavior and make a list of their model handles. Keep this list to fewer than 20 ports; do not enable this debugging on more than 20 ports at a time.

    2. Using CLI, send the 0x10297 action to the port(s) in the list:

    ./update action=0x10297 mh=<port mh>

    This enables debugging on the port for a period of 7 days, after which Spectrum will automatically disable it.

    3. When the issue with the saw-tooth graph behavior occurs again, make a note of the port in question (assuming it is one of those that debugging was turned on for) and the timeframe over which the issue was seen.

    4. Navigate to the port in question and select the Events tab. Export all events to CSV for the timeframe where the issue occurred; starting 5 minutes before and ending 5 minutes after the issue is a good window.

    5. Examine the event report in MS Excel. In the case above (a saw-tooth pattern with lows of zero), the key is to look for and examine instances of the 0x10d81 event where the "CURRENT LOAD_TOTAL" value is zero. Each of these corresponds to a low of zero in the graph.

    For each such instance (a 0x10d81 event with "CURRENT LOAD_TOTAL" equal to zero), the next step is to check whether the "Initial" and "Final" values of the major octet and packet counters are the same, which would mean their delta is zero over the sample period. Pay specific attention to the differences between the "Initial" and "Final" values of the following:

    X_IN_OCTETS and X_OUT_OCTETS

    X_IN_NUCAST_PKTS and X_OUT_NUCAST_PKTS

    X_IN_MCAST_PKTS and X_OUT_MCAST_PKTS

    X_IN_BCAST_PKTS and X_OUT_BCAST_PKTS

    X_IN_UCAST_PKTS and X_OUT_UCAST_PKTS


    • If all of the above counters show a zero delta over the sample period, then the device is not updating its MIB counters often enough. Spectrum polls the source data for the graph every ~10 seconds and, if the device is slow to update its counters, this manifests as a saw-tooth graph with lows of zero and highs that are typically less than 100%. Devices typically update their MIB values once every 2-3 seconds, so a longer update interval implies a problem with the MIB agent on the device. The specific MIB counters Spectrum is reading can be obtained from the "EXTERNAL SOURCE ATTRIBUTES" list, which contains the attribute IDs of all the external attributes Spectrum reads for the port performance calculations on the port in question. With this information the customer can check for themselves, using a sniffer trace (or a simple snmpget loop such as the one sketched after this procedure) to verify the agent behavior while the Spectrum graph is open.

    • If the data does NOT show that the deltas of the major counters are all zero, then another probable cause is that the OC Console is requesting the graph data too often (more often than once every 10 seconds). This can be determined by looking at the time difference between the creation time of the 0x10d81 event where the counter deltas were zero and the previous instance of the 0x10d81 event. If the difference in creation times is less than 10 seconds, then OC Console polling skew could be at play. In this case the next step is to:

      • Add the following entry to the $SPECROOT/SS/.vnmrc:
        port_perf_valid_result_age=9
      • Cycle the SpectroSERVER.
      • Check to see if the sawtooth pattern has disappeared.

    If you see greatly reduced instances of the problem but still see occasional recurrences, repeat these steps for the specific ports that still show the problem. If polling skew is still the cause (at this point the difference in event creation times would need to be less than 9 seconds), reduce the port_perf_valid_result_age setting in the .vnmrc to 8 and cycle the SpectroSERVER.
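    As a quick way to check the agent's counter update interval (the snmpget loop mentioned in the first bullet above), something along these lines can be used. This is only a sketch, assuming the net-snmp tools and reusing the SNMPv3 credentials and OID from the original snmpwalk; the 2-second poll interval is picked purely for illustration:

        OID=1.3.6.1.2.1.31.1.1.1.6.939569152
        prev=0
        while true; do
            # read just the counter value (-Oqv strips the OID and type from the output)
            now=$(snmpget -Oqv -r1 -t3 -v 3 -u spectrum -l authPriv -a SHA -A '####' -x AES -X '####' 10.10.10.26 $OID)
            # delta=0 means the agent has not published a new counter value since the last poll
            # (ignore the very first delta, since prev starts at 0)
            echo "$(date '+%H:%M:%S')  value=$now  delta=$((now - prev))"
            prev=$now
            sleep 2
        done

    If the delta stays at zero for many consecutive iterations and then jumps by several gigabytes, the agent, not Spectrum, is the source of the saw-tooth.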


    I hope that helps.

    ------------------------------
    Technical Support Engineer IV
    Broadcom Inc
    ------------------------------



  • 3.  RE: Interface utilization oddities

    Posted Dec 05, 2019 06:26 AM
    Hi Silvio,

    Thanks for your reply. I have already done that and sent the data to Broadcom through a ticket. The issue, it seems, is not in Spectrum. From Spectrum's perspective it makes total sense: the interface usually responds with an unchanged counter, so the bandwidth utilization is 0, but then the counter suddenly jumps massively (by roughly 14 GB in this case), and that is what generates the peak. After that, the delta drops back to 0 and we get the valleys.
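    To put rough numbers on it (the actual port speed is not shown here, so a 10 Gbit/s port is just an assumption for illustration): if a jump of roughly 15,000,000,000 bytes gets attributed to a single ~10-second sample, that is 15e9 x 8 / 10 ≈ 12 Gbit/s, i.e. about 120% reported utilization, and if even more traffic accumulates before the agent publishes a new value, the spike can easily exceed 200%.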

    My point is that if there were a way to make Spectrum poll the interface only once every 1-2 minutes instead of every few seconds, we would avoid the problem.
    This seems to be a known issue with Nexus boxes (https://community.cisco.com/t5/network-management/cisco-nexus-3548-3524-snmp-interface-traffic-problem/m-p/2781144#M108213), but apparently no solution has been presented by Cisco so far.

    Regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------



  • 4.  RE: Interface utilization oddities

    Posted Dec 05, 2019 07:46 AM
    Hi @Silvio Okamoto,
    I have played around with port_perf_valid_result_age in my test environment, and increasing it to 60 actually seems to have fixed my issue.

    Could you please elaborate a bit further on what this parameter does and what impact setting it to such a large value might have?

    Regards,


    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------



  • 5.  RE: Interface utilization oddities

    Broadcom Employee
    Posted Dec 05, 2019 08:29 AM
    Hi Marcelo,

    By setting port_perf_valid_result_age=60 in the $SPECROOT/SS/.vnmrc file, the poll results remain valid for 60 seconds, and the same data will be reused during that period.
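    For reference, a minimal sketch of how that change looks on the SpectroSERVER host, assuming a default install layout and that the entry is not already present in the file (back it up first):

        cp "$SPECROOT/SS/.vnmrc" "$SPECROOT/SS/.vnmrc.bak"            # keep a backup of the current settings
        echo "port_perf_valid_result_age=60" >> "$SPECROOT/SS/.vnmrc" # append the new entry
        grep port_perf_valid_result_age "$SPECROOT/SS/.vnmrc"         # confirm it (and check for duplicates)

    Then cycle the SpectroSERVER for the setting to take effect.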

    ------------------------------
    Technical Support Engineer IV
    Broadcom Inc
    ------------------------------



  • 6.  RE: Interface utilization oddities

    Posted Dec 05, 2019 08:42 AM
    Hi Silvio,

    OK, seems harmless enough. I will let it run for a while in test to make sure there is no impact.
    Thanks again for your help!

    Regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------