Layer7 API Management


Unequal CPU-usage across all cores

  • 1.  Unequal CPU-usage across all cores

    Posted 02-07-2019 04:55 AM

When using the "top" command and then pressing "1", you can see the CPU usage of each individual CPU core.
We see the following result:
    top - 17:11:31 up 6 days, 18:02,  1 user,  load average: 2.80, 4.14, 4.48
    Tasks: 299 total,   1 running, 298 sleeping,   0 stopped,   0 zombie
    Cpu0  : 80.0%us,  1.7%sy,  0.0%ni,  9.7%id,  0.0%wa,  0.0%hi,  8.7%si,  0.0%st
    Cpu1  : 37.5%us, 12.6%sy,  0.0%ni, 49.2%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
    Cpu2  : 39.9%us,  1.3%sy,  0.0%ni, 58.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu3  : 41.3%us,  2.0%sy,  0.0%ni, 56.0%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu4  : 34.8%us,  1.3%sy,  0.0%ni, 63.5%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu5  : 36.7%us,  1.7%sy,  0.0%ni, 61.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
    Cpu6  : 37.2%us,  3.3%sy,  0.0%ni, 59.1%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu7  : 40.7%us,  6.0%sy,  0.0%ni, 50.3%id,  3.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu8  : 32.1%us,  3.6%sy,  0.0%ni, 63.6%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
    Cpu9  : 30.8%us,  9.4%sy,  0.0%ni, 59.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu10 : 40.3%us,  1.7%sy,  0.0%ni, 58.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu11 : 27.7%us,  4.7%sy,  0.0%ni, 67.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

     

As you can see, 12 cores are currently assigned to this VM, but the first core has a much higher load than all the others.
The remaining 11 cores carry approximately the same load, so that's fine.
So what is the reason why the first core carries so much more load?
Is this some OS or administrative overhead?
Is this related to the ESX configuration, and if so, how can it be optimized?
Are there any commands or configurations we should check to get further details here?
What is this "si" value? It is between 7-11% just for the first CPU.
    Thank you!

     

    Ciao Stefan



  • 2.  Re: Unequal CPU-usage across all cores

    Posted 02-08-2019 09:09 PM

    Stefan,

     

Good evening. The "si" value stands for time spent servicing software interrupts (softirqs); see the Unix & Linux Stack Exchange question "In Linux 'top' command what are us, sy, ni, id, wa, hi, si and st (for CPU usage)?". A few questions, an observation, and one suggestion:

    Questions:

    1) What version of the gateway is this running?

    2) What version of ESX and the hardware version are in place?

    3) Is this just a processing node or is it handling MySQL traffic as well?

    4) Are you seeing issues with message processing?

     

    Observation:

We have seen occasional spikes in CPU load on one CPU or another, but normally the load is spread out over all the cores.

     

    Suggestion:

We have found that adding the line com.l7tech.server.log.console=false to the /opt/SecureSpan/Gateway/node/default/etc/conf/system.properties file reduces context switching in the gateway (note: restart the gateway for this to take effect). This setting is currently recommended and is being considered for inclusion in the base product in future releases.
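A minimal sketch of applying that change idempotently, assuming the stock file path from above; a temp file (with a purely hypothetical node.id line) stands in for the real one so the sketch is self-contained:

```shell
#!/bin/sh
# Sketch: idempotently set com.l7tech.server.log.console=false in system.properties.
# On a real gateway CONF would be:
#   /opt/SecureSpan/Gateway/node/default/etc/conf/system.properties
# A temp file is used here so the sketch can run anywhere.
CONF=$(mktemp)
printf 'node.id=abc123\n' > "$CONF"     # hypothetical pre-existing content

KEY=com.l7tech.server.log.console
if grep -q "^${KEY}=" "$CONF"; then
    # Key already present: rewrite its value in place (GNU sed).
    sed -i "s/^${KEY}=.*/${KEY}=false/" "$CONF"
else
    # Key missing: append it.
    printf '%s=false\n' "$KEY" >> "$CONF"
fi

grep "^${KEY}=" "$CONF"    # -> com.l7tech.server.log.console=false
```

Restart the gateway afterwards, as noted above, for the setting to take effect.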

     

    Sincerely,

     

    Stephen Hughes

    Broadcom Support



  • 3.  Re: Unequal CPU-usage across all cores

    Posted 02-11-2019 09:22 AM

    Hi Stephen,

    please find my answers below:

    1) 9.1.01

2) I have to double-check this first, as it is not our responsibility. But which versions or settings would be critical here?

3) This is a two-node cluster running in active-active mode with regard to message processing, and in active-standby mode with regard to MySQL; the statistics shown are from the primary MySQL node. The behavior is the same on the secondary node, though: only CPU0 has a much higher value than the other 11 cores.

4) As of now I would say no, but we are getting monitoring alerts several times per day that the average CPU load over the last 5 minutes is higher than 90%. And I don't know how the system will behave if this single core becomes completely saturated.

     

The com.l7tech.server.log.console=false setting is already in place and configured.

     

Again, I'm not talking about spikes; it's really the average load, which is almost twice as high for CPU0 compared to the remaining 11 cores.

    Thank you!

     

    Ciao Stefan



  • 4.  Re: Unequal CPU-usage across all cores

    Posted 02-13-2019 03:22 AM

    Hi again,

I now have the details about the ESX setup:

    • Manufacturer: Cisco Systems Inc.
    • Model: UCSB-B420-M
    • CPUs: 56x @ 2.095 GHz
    • CPU-Type: Intel(R) Xeon(R) CPU E5-4660 v3
    • License: VMware vSphere 6 Enterprise Plus
    • Processor Sockets: 4
    • Processors per Socket: 14
    • Logical Processors: 112
    • Hyperthreading: Active

Do you require any further information, maybe any special settings?

Isn't it possible to verify which processes are running on CPU0?

I also checked CPU affinity, and both main processes (java, mysql) show 0-11. So this seems fine as well.

    Thank you!

     

    Ciao Stefan



  • 5.  Re: Unequal CPU-usage across all cores

    Posted 02-14-2019 01:47 PM

    Stefan,

     

Thank you for all the information. I have been testing this out in our lab to find the best way to see what is occurring. I've not been able to reproduce the issue, so these steps are only to help gather more information. You can start by running top, typing "f", then "j" (P = Last used cpu (SMP)), then Enter to return to the top output. You will now see a P column showing the CPU that each process last ran on. From there you can run taskset -p <pid for the process in question> to see what the CPU affinity is set to for that process.
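The same information is also available non-interactively; a quick sketch using the current shell's PID as a stand-in (substitute the gateway's java or mysqld PID on the live system):

```shell
# Sketch: inspect last-used CPU and CPU affinity without interactive top.
PID=$$                              # stand-in; use the java/mysqld PID in practice
ps -o pid,psr,comm -p "$PID"        # PSR column = CPU the task last ran on
taskset -cp "$PID" 2>/dev/null \
  || echo "taskset not available"   # allowed-CPU list, e.g. "0-11"
```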

     

Additional questions:

    1) Are you running any custom assertions that are not installed by default on the gateway?

2) Is there anything different about what the policies use, such as Siteminder, Kerberos, etc.?

     

    Sincerely,

     

    Stephen Hughes

    Broadcom Support



  • 6.  Re: Unequal CPU-usage across all cores

    Posted 02-15-2019 06:58 AM

    Hi Stephen,

as already mentioned in my previous post, I checked CPU affinity for the main processes and they all show "0-11", so this seems fine.

I adjusted the top output as you described:

    top - 11:42:14 up 15 days, 12:33,  1 user,  load average: 2.81, 3.12, 3.19
    Tasks: 299 total,   1 running, 298 sleeping,   0 stopped,   0 zombie
    Cpu(s): 43.6%us,  1.2%sy,  0.0%ni, 54.1%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
    Mem:  41153656k total, 38314948k used,  2838708k free,   770856k buffers
    Swap:  2097148k total,    88228k used,  2008920k free, 13040228k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   P COMMAND
     3452 gateway   20   0 24.9g  17g  18m S 546.7 45.3  60967:47  8 java
     3378 layer7    20   0 13.6g 304m  16m S  0.3  0.8  30:47.01  5 java
     2218 fndsrv    20   0 8740m 492m  14m S  0.3  1.2  90:06.24  8 java
     3046 mysql     20   0 6497m 308m 5488 S  4.6  0.8   2177:11  0 mysqld

     

The mysqld process is switching fine between different CPUs, but the three Java processes stay on the same CPU all the time (interestingly, it's not CPU0).

Any more ideas or other commands I can execute to see what's running on each CPU?

     

And with regard to non-standard assertions: we only have the Metrics assertion installed, but it is not really used anymore. Besides this, we have no other special assertions like the ones you mentioned in use.

    Thank you!

     

    Ciao Stefan



  • 7.  Re: Unequal CPU-usage across all cores

    Posted 03-13-2019 04:10 AM

    Hi again,

we found new information regarding our issue.

The IRQ handling for the network interfaces is assigned entirely to CPU0. I think this is also the reason why the "si" value in the "top" output is so high (around 7-10%) for CPU0, while all other CPUs show 0%.

Here is a short output from "cat /proc/interrupts":

               CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11     
     56:  228747200          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-0
     57:  212499840          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-1
     58:  171684243          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-2
     59:  146349987          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-3
     60:  146279048          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-4
     61:  147885300          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-5
     62:  144577720          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-6
     63:  145362649          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-7
     64:          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth0-event-8
     65: 1738199044          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-0
     66: 1724994508          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-1
     67: 1730530116          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-2
     68: 1722641672          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-3
     69: 1720193245          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-4
     70: 1932210891          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-5
     71: 1726752158          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-6
     72: 1726421224          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rxtx-7
     73:          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-event-8
     74:  704625987          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-0
     75:  696922069          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-1
     76:  698645849          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-2
     77:  705809163          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-3
     78:  696361203          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-4
     79:  771890633          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-5
     80:  696558172          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-6
     81:  699802843          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-rxtx-7
     82:          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-event-8

     

We already checked SMP affinity, and it looks fine for all the above-mentioned IRQs; all are set to "0-11".
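For reference, the skew is easy to quantify from the counter columns; a self-contained sketch that parses a two-row sample captured from the table above (on the live system, feed it the real output via: grep eth /proc/interrupts):

```shell
# Sketch: count, per IRQ line, how many of the 12 CPU columns are non-zero.
awk '{
    nz = 0
    for (i = 2; i <= 13; i++) if ($i + 0 > 0) nz++   # fields 2..13 = CPU0..CPU11
    printf "%s %s -> non-zero on %d CPU(s)\n", $1, $NF, nz
}' <<'EOF'
 56:  228747200  0 0 0 0 0 0 0 0 0 0 0  PCI-MSI-edge  eth0-rxtx-0
 65: 1738199044  0 0 0 0 0 0 0 0 0 0 0  PCI-MSI-edge  eth1-rxtx-0
EOF
# Output:
# 56: eth0-rxtx-0 -> non-zero on 1 CPU(s)
# 65: eth1-rxtx-0 -> non-zero on 1 CPU(s)
```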

    So do you have any idea what's wrong here?

    Is this more VM or ESX related? And what else should be checked based on these findings?

    Thank you!

     

    Ciao Stefan



  • 8.  Re: Unequal CPU-usage across all cores

    Posted 03-13-2019 02:00 PM

    Stefan,

     

I've been doing some research based on the information you provided about the IRQ handling of the Ethernet adapters. One article mentions that having all networking activity occur on one CPU can provide the best performance (linux - CPU0 is swamped with eth1 interrupts - Server Fault), as the interrupt handler stays in that CPU's cache and can be accessed more quickly. Some things we can look into are which VMware tools are installed within the host and what type of interface is used.

     

    Some additional links:

    https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-cpu-irq

    https://kb.vmware.com/s/article/2058349
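Along the lines of the Red Hat guide above, IRQ affinity can also be narrowed manually by writing a hex CPU mask; a sketch of the mask arithmetic (IRQ 56 is taken from your /proc/interrupts output, and the actual write requires root on the live system, so it is only shown as a comment):

```shell
# Sketch: compute the hex affinity mask that pins an IRQ to a single CPU.
cpu=2
mask=$(printf '%x' $((1 << cpu)))       # CPU2 -> bit 2 -> mask "4"
echo "mask for CPU${cpu}: ${mask}"      # -> mask for CPU2: 4
# On the live system (as root), applying it would look like:
#   echo "$mask" > /proc/irq/56/smp_affinity
```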

     

    Sincerely,

     

    Stephen Hughes

    Broadcom Support



  • 9.  Re: Unequal CPU-usage across all cores

    Posted 04-02-2019 08:34 AM

    Hi Stephen,

yes, I received this information from CA Support in the meantime as well, but I think it mainly explains the higher "si" value, which is between 10-15% on our system. The "us" value, however, is still around 30% higher than on all other CPUs.

But I'm still wondering why it's not possible to identify this high load. The system calculates and sums the usage of the individual processes and displays the result, so either there is some kind of strange bug and the higher value for CPU0 is not correct, or it should be possible to identify the individual processes running on CPU0. Which commands can be used to display all the individual values that sum up to this high number? This MUST be possible somehow, mustn't it?
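One way to attribute the per-CPU load, sketched with standard procps tools (column and sort support may vary by version):

```shell
# Sketch: list processes whose last-used CPU (PSR) is 0, sorted by CPU usage.
ps -eo pid,psr,pcpu,comm --sort=-pcpu | awk 'NR == 1 || $2 == 0'
# For a rolling per-process view, sysstat's pidstat can take timed samples,
# e.g. five one-second samples:
#   pidstat -u 1 5
```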

    Thanks again!

     

    Ciao Stefan