Hi Marcelo,
The key to debugging this type of issue is to enable level 1 debug for the port. This will cause Spectrum to generate debug event data every time it calculates performance metrics for the port. These events will contain all of the source data read from the device along with information about how those values were used to calculate the performance metrics.
The procedure is as follows:
1. Identify which ports exhibit the behavior and make a list of their model handles. Do not enable this debugging for more than 20 ports at a time.
2. Using CLI, send the 0x10297 action to the port(s) in the list:
./update action=0x10297 mh=<port mh>
This enables debugging on the port for a period of 7 days, after which Spectrum automatically disables it.
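If you have more than a couple of ports, the per-port CLI calls can be scripted. A minimal sketch, assuming the model handles live in a plain text file (one handle per line, e.g. "ports.txt") and that an authenticated CLI session is already established; the file name and helper names here are my own, not Spectrum's:

```python
# Sketch: enable level 1 port debug (action 0x10297) for a list of model handles.
# Assumes ports.txt holds one model handle (e.g. 0x580c1) per line and that
# ./update is run from within an established Spectrum CLI session.
import subprocess

def build_update_cmd(mh: str) -> list[str]:
    """Return the argv for enabling level 1 port debug on one model handle."""
    return ["./update", "action=0x10297", f"mh={mh}"]

def enable_debug(mh_file: str = "ports.txt") -> None:
    with open(mh_file) as f:
        handles = [line.strip() for line in f if line.strip()]
    if len(handles) > 20:
        raise ValueError("Do not enable port debug on more than 20 ports at a time")
    for mh in handles:
        subprocess.run(build_update_cmd(mh), check=True)
```

The 20-port guard mirrors the limit in step 1 above.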
3. When the issue with the saw-tooth graph behavior occurs again, make a note of the port in question (assuming it is one of those that debugging was turned on for) and timeframe over which the issue was seen.
4. Navigate to the port in question and select the Events tab. Export all events to CSV format for the timeframe where the issue occurred; starting 5 minutes before the issue and ending 5 minutes after is a good window.
5. Examine the event report in MS Excel. In the case above (a saw-tooth patterned graph with lows of zero), the key is to look for and examine instances of the 0x10d81 event where the "CURRENT LOAD_TOTAL" value is zero. Each of these corresponds to a low of zero in the graph.
For each such instance (a 0x10d81 event with "CURRENT LOAD_TOTAL" equal to zero), the next step is to look at the data to see if the "Initial" and "Final" values for the major octet and packet rate counters are the same, which means their delta is zero over the sample period. Pay specific attention to the differences between the "Initial" and "Final" values of the following:
X_IN_OCTETS and X_OUT_OCTETS
X_IN_NUCAST_PKTS and X_OUT_NUCAST_PKTS
X_IN_MCAST_PKTS and X_OUT_MCAST_PKTS
X_IN_BCAST_PKTS and X_OUT_BCAST_PKTS
X_IN_UCAST_PKTS and X_OUT_UCAST_PKTS
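As a rough illustration of the check, assuming you have already pulled the Initial/Final pairs for one 0x10d81 event out of the CSV into two dicts (the extraction itself is site-specific and omitted here):

```python
# Sketch: decide whether all major counters were flat over the sample period.
COUNTERS = [
    "X_IN_OCTETS", "X_OUT_OCTETS",
    "X_IN_NUCAST_PKTS", "X_OUT_NUCAST_PKTS",
    "X_IN_MCAST_PKTS", "X_OUT_MCAST_PKTS",
    "X_IN_BCAST_PKTS", "X_OUT_BCAST_PKTS",
    "X_IN_UCAST_PKTS", "X_OUT_UCAST_PKTS",
]

def all_deltas_zero(initial: dict, final: dict) -> bool:
    """True when every major counter has identical Initial and Final values,
    i.e. the agent reported no change over the sample period."""
    return all(final[c] - initial[c] == 0 for c in COUNTERS)
```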
- If all of the above counters show a zero delta over the sample period, then the device is not updating its MIB counters often enough. Spectrum polls the source data for the graph every ~10 seconds and, if the device is slow to update its counters, this manifests as a saw-tooth graph with lows of zero and highs that are typically less than 100%. Devices typically update their MIB values once every 2-3 seconds; a longer update interval implies a problem with the MIB agent on the device. The specific MIB counters Spectrum is reading can be obtained from the "EXTERNAL SOURCE ATTRIBUTES" list, which contains the attribute IDs of all the external attributes Spectrum reads for the port's performance calculations. With this information the customer can verify the agent behavior for themselves by running a sniffer trace while the Spectrum graph is open.
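You can also estimate the agent's update interval directly from a polled counter, much like the repeated snmpwalk output in the original message: group consecutive identical readings and measure how long the value stays flat. A sketch over (timestamp, value) samples; collecting the samples themselves (via snmpget in a loop or from a sniffer trace) is left out:

```python
def flat_spans(samples: list[tuple[float, int]]) -> list[float]:
    """Given (unix_timestamp, counter_value) samples in time order, return
    the approximate duration in seconds of each run where the counter did
    not change. Runs much longer than 2-3 s suggest the agent is updating
    its MIB values too slowly."""
    spans = []
    start_t, last_v = samples[0]
    for t, v in samples[1:]:
        if v != last_v:
            spans.append(t - start_t)
            start_t, last_v = t, v
    spans.append(samples[-1][0] - start_t)
    return spans
```

For example, samples polled every 2 seconds where the value only changes every third poll would show flat spans of roughly 6 seconds.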
- If the data does NOT show that the deltas of the major counters are all zero, then another probable cause is that the OC Console is requesting the graph data too often (more often than once every 10 seconds). This can be determined by looking at the time difference between the creation time of the 0x10d81 event where the counter deltas were zero and the previous instance of the 0x10d81 event. If the difference in creation times is less than 10 seconds, then OC Console polling skew could be at play. In this case the next steps are to:
- Add the following entry to the $SPECROOT/SS/.vnmrc:
port_perf_valid_result_age=9
- Cycle the SpectroSERVER.
- Check to see if the saw-tooth pattern has disappeared.
If the problem occurs far less often but has not disappeared entirely, repeat these steps for the specific ports that still show the behavior. If polling skew is still the cause (at this point the difference in event creation times would need to be less than 9 seconds), reduce the port_perf_valid_result_age setting in the .vnmrc to 8 and cycle the SpectroSERVER.
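The creation-time comparison can be scripted against the exported event CSV. A sketch, assuming you have already extracted the creation timestamps of consecutive 0x10d81 events as unix times (the column extraction is site-specific, and the function name is mine):

```python
def skew_suspects(event_times: list[float], min_age: float = 10.0) -> list[int]:
    """Return indices of 0x10d81 events created less than min_age seconds
    after the previous one -- candidates for OC Console polling skew.
    Lower min_age to 9, then 8, as port_perf_valid_result_age is reduced
    in the .vnmrc."""
    return [i for i in range(1, len(event_times))
            if event_times[i] - event_times[i - 1] < min_age]
```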
I hope that helps.
------------------------------
Technical Support Engineer IV
Broadcom Inc
------------------------------
Original Message:
Sent: 12-05-2019 03:09 AM
From: Marcelo Zacchi
Subject: Interface utilization oddities
Fellow Spectrum admins,
I have been running snmpwalk against a device to get the interface input data on a highly utilized port, and I am getting some weird, damning behavior:
snmpwalk -r1 -t3 -v 3 -u spectrum -lauthPriv -a SHA -A #### -x AES -X #### 10.10.10.26 1.3.6.1.2.1.31.1.1.1.6.939569152
Instead of increasing gradually, the counter jumps in bursts:
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250109384515203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250124476845203
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250138966734949
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
IF-MIB::ifHCInOctets.939569152 = Counter64: 5250153796686324
The problem is that my Performance graph in Spectrum then looks like this, with peaks of over 200% and valleys of 0%:
The weird thing is that, looking at the device IOS, I see a gradual increase in the counters, so I assume it is an issue with the SNMP data:
XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
4802007060663 input packets 5250155335945793 bytes
XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
4802007533795 input packets 5250155906888303 bytes
XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
4802008008309 input packets 5250156505085683 bytes
XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
4802008466826 input packets 5250157051289196 bytes
XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
4802008932035 input packets 5250157620969379 bytes
XXXXX1.XX2# sh int eth1/2/2 | i "input packets"
4802009777253 input packets 5250158650663943 bytes
Have you ever seen this?
Thanks and regards,
------------------------------
Marcelo Zacchi
CA Spectrum consultant
Nets Denmark
------------------------------