We are running a performance test against around 14 SSG nodes (VMs), and we are seeing high CPU utilization: the gateway service goes over 800% and system CPU reaches 97%. Each server has over 8 cores and 32 GB of RAM, and we have set max IO concurrency to around 950, but we are still not sure what is lacking. For your information, we are using OTK (with an Oracle DB) on SSG 9.1. Would you suggest taking a thread dump and analyzing the process to see what is consuming significant CPU? Please advise on how to nail it down.
I would say it is better to have several small clusters rather than one big one. My suggestion would be to build clusters of 6 nodes each and use a load balancer to direct the traffic. Regarding tuning the Gateway, please take a look at this community post: "How to tweak httpCoreConcurrency and httpMaxConcurrency"
Regarding collecting the data required for troubleshooting performance issues, you can use our DCT tool, which is available here: "CA API Gateway Data Collector Tool is now available!"
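If you do want to look at a thread dump yourself, as you suggested, the usual approach for any Java process is to match the busiest OS threads against the dump. The sketch below is only an illustration and not part of the DCT tool. It assumes you have captured the decimal thread IDs of the hottest threads (for example with `top -H -p <pid>`) and a thread dump in the standard HotSpot format (for example from `jstack <pid>`, assuming a JDK with jstack is available on the node), where each thread header carries a hexadecimal `nid=0x...`. The function name `hot_threads` is hypothetical.

```python
import re

# Matches a HotSpot thread header line such as:
#   "worker-1" #25 daemon prio=5 os_prio=0 tid=0x00007f... nid=0x1a2b runnable
THREAD_HEADER = re.compile(
    r'^"(?P<name>.+?)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)',
    re.MULTILINE,
)


def hot_threads(dump_text, hot_tids):
    """Map decimal OS thread IDs to Java thread names.

    dump_text: full text of a thread dump.
    hot_tids:  decimal thread IDs as reported by the OS (e.g. top -H).
    Returns a dict {tid: thread name, or None if not in the dump}.
    """
    # The dump reports the native thread ID in hex, the OS in decimal,
    # so convert every nid to an int before comparing.
    by_nid = {
        int(m.group("nid"), 16): m.group("name")
        for m in THREAD_HEADER.finditer(dump_text)
    }
    return {tid: by_nid.get(tid) for tid in hot_tids}


if __name__ == "__main__":
    import sys

    # Usage: python hot_threads.py dump.txt 6699 6700 ...
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    for tid, name in hot_threads(text, [int(a) for a in sys.argv[2:]]).items():
        print(f"{tid} ({hex(tid)}): {name or 'not found in dump'}")
```

Taking a few dumps several seconds apart and checking whether the same thread names keep showing up (for example JDBC-related threads) gives a better picture than a single snapshot.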
On a side note, Gateway 9.1 unfortunately had quite a few performance issues, particularly when it came to JDBC calls - which you're going to have a ton of because you are also using OTK with an external database server. I'd suggest upgrading to 9.1.01 or even the latest 9.3. That alone should improve the Gateway performance, without even needing to re-architect anything. But of course, it'll only go so far and you may still need to tweak other items.
The Gateway is a very customizable application, but as such it is not really a one-size-fits-all type of product, so ongoing tweaks are often needed to really take performance to the next level in any given environment. Re-evaluation is mostly needed if your traffic levels change by a wide margin, if you create new heavy-duty services (e.g. using lots of RegEx assertions or Evaluate-type assertions), or if you use something like the OTK, which makes a lot of JDBC calls.
Release notes of resolved issues in 9.1.01 for your reference: https://docops.ca.com/ca-api-gateway/9-1/en/release-notes-9-1-01/resolved-issues-in-9-1-01
Additionally, you noted you are using OTK with an Oracle database. We find that this works without issues, but in most cases only after some tweaking of the default JDBC properties, because Oracle databases are a bit inconvenient with their licensing and configuration limitations and added security such as DDoS detection, which often triggers false positives. While the documentation specifically mentions the DDoS part, the advanced JDBC properties it lists are recommended for any environment where the Gateway is reaching out to an Oracle database, so these may be items you want to review too: https://docops.ca.com/ca-api-gateway/9-1/en/reference/troubleshoot#Troubleshoot-Problem:JDBCconnectionfailures
I hope the above helps.