How many web servers are running on the production env?
Are all the agents hitting the same issue in prod?
Is the load the same across all the agents?
Did you compare the semaphore settings across all the envs?
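For the semaphore comparison, a small script can make the differences easier to spot. This is only a rough sketch, and it assumes the agents run on Linux hosts, where the kernel semaphore limits are the four numbers in /proc/sys/kernel/sem (SEMMSL, SEMMNS, SEMOPM, SEMMNI). The environment names and values below are made up for illustration, and the function names are hypothetical; paste in the output of `cat /proc/sys/kernel/sem` from each of your own hosts.

```python
# Order of the four values in /proc/sys/kernel/sem on Linux.
SEM_FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")


def parse_kernel_sem(text):
    """Parse the contents of /proc/sys/kernel/sem into a dict keyed by field name."""
    values = [int(v) for v in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected {len(SEM_FIELDS)} values, got {len(values)}")
    return dict(zip(SEM_FIELDS, values))


def diff_sem_settings(settings_by_env):
    """Return {field: {env: value}} for each field that differs between environments."""
    diffs = {}
    for field in SEM_FIELDS:
        per_env = {env: s[field] for env, s in settings_by_env.items()}
        if len(set(per_env.values())) > 1:
            diffs[field] = per_env
    return diffs


if __name__ == "__main__":
    # Hypothetical sample values, not taken from any real system.
    captured = {
        "prod": "250 32000 32 128",
        "test": "250 32000 100 128",
    }
    parsed = {env: parse_kernel_sem(text) for env, text in captured.items()}
    for field, per_env in diff_sem_settings(parsed).items():
        print(field, per_env)
```

Any field the script prints is one where the environments disagree, which is worth checking before you start changing cache settings.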
If you are not able to make the changes in production, you could try to replicate the issue in a lower env with load testing (match the load to what you see in production). Once you are able to replicate it, please try the option below to find out whether your system has enough memory.
Set the session and resource cache size parameters to 0:
The idea behind setting those values to 0 is to check whether your system has enough memory. Performance may drop while they are at 0, because nothing will be cached. However, if the issue disappears, it means you should add memory to the system, or set the cache sizes lower than their original values if adding memory isn't possible.