Every now and again, our support team receives a flood of Nimsoft Tunnel Disconnect tickets. We thought these were caused by one particular client flooding the tunnel connection with data and causing it to drop out and restart, because the tunnel disconnects were all happening on one particular hub.
Recently, however, we have been getting Hub Tunnel Disconnect tickets from multiple hubs. I happened to catch the log file (log level 3) before it rolled over, and found the following just before the Hub Tunnel restarted (sensitive data removed):
Feb 23 13:24:43:879  hub: TSESS-P-823-0 sent error message response
Feb 23 13:24:45:469  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110
Feb 23 13:24:45:469  hub: TSESS-P-724-8  could not connect locally to /Gladiator/Hub09/nimhub09/hub (x.x.x.x/48002)
Feb 23 13:24:45:470  hub: TSESS-P-724-8 sent error message response
Feb 23 13:24:45:927  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110
Feb 23 13:24:45:927  hub: TSESS-P-694-10  could not connect locally to /Gladiator/Hub09/nimhub09/hub (x.x.x.x/48002)
Feb 23 13:24:45:927  hub: TSESS-P-694-10 sent error message response
Feb 23 13:24:46:635  hub: EXIT HUB: hub shutdown
Feb 23 13:24:46:636  hub: Waiting for tunnel to terminate ...
Feb 23 13:24:46:882  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110
Feb 23 13:24:46:882  hub: TSESS-P-773-3  could not connect locally to /Gladiator/Hub09/nimhub09/hub (x.x.x.x/48002)
Feb 23 13:24:46:882  hub: TSESS-P-773-3 sent error message response
Feb 23 13:24:47:498  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110
Feb 23 13:24:47:498  hub: TSESS-P-762-4 sent error message response
Feb 23 13:24:49:751  hub: ----------------------------------------------------------------------------------------------------
Feb 23 13:24:49:751  hub: ----- Hub 7.80-jh2 [Build 7.80, Nov 9 2015] initializing -----
Feb 23 13:24:49:752  hub: Tunnel has started in thread [ 47929318897984 ]
Feb 23 13:24:49:788  hub: ----- HUB started -----
As you can see, error code 110 was very prevalent before the hub tunnel dropped out. Any info on the meaning of error 110 would be greatly appreciated.
-We have 14 hub servers, all Linux boxes (CentOS)
-Tunnel Encryption set to Medium
-Hub tunnel SSL Settings Mode set to Normal
-UIM ver 7.5 ---- Will be upgrading this coming March to 8.47
-Hub version 7.80
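To put a number on how prevalent error code 110 is, the failed-connect lines can be tallied per target and error code. This is only a rough sketch in Python: the regular expression is modelled on the log lines quoted above, and the function name count_connect_failures is made up for illustration.

```python
import re
from collections import Counter

# Modelled on hub log lines such as:
#   Feb 23 13:24:45:469  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110
FAILED_CONNECT = re.compile(
    r"failed to connect session to (?P<target>\S+), error code (?P<code>\d+)"
)


def count_connect_failures(lines):
    """Return a Counter keyed by (target, error_code) for failed session connects."""
    counts = Counter()
    for line in lines:
        match = FAILED_CONNECT.search(line)
        if match:
            counts[(match.group("target"), int(match.group("code")))] += 1
    return counts


# Four lines taken from the excerpt above; two of them are failed connects.
sample = [
    "Feb 23 13:24:43:879  hub: TSESS-P-823-0 sent error message response",
    "Feb 23 13:24:45:469  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110",
    "Feb 23 13:24:45:470  hub: TSESS-P-724-8 sent error message response",
    "Feb 23 13:24:45:927  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110",
]

for (target, code), hits in count_connect_failures(sample).items():
    print(f"{target} error {code}: {hits} occurrence(s)")
```

Running it over a whole hub.log (for example, passing an open file object as lines) would show whether the failures cluster on one target or spread across several.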
On Linux, error code 110 is errno ETIMEDOUT (connection timed out) rather than connection refused, which would be 111 (ECONNREFUSED). Please make sure of the following:
1) firewall is disabled
2) no AV scanning/blocking
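For reference, here is a small lookup of the Linux errno values most relevant to a failed connect. It assumes the "error code" in the hub log is the operating system's errno from the connect call, which the thread does not confirm. The numbers are the Linux ones and differ on other platforms, so they are hard-coded rather than taken from the local system.

```python
# Linux errno values for common connect failures (other platforms number these differently).
LINUX_CONNECT_ERRNO = {
    110: "ETIMEDOUT",     # connection timed out: the peer never answered
    111: "ECONNREFUSED",  # connection refused: nothing accepted the connection on that port
    113: "EHOSTUNREACH",  # no route to host
}


def describe_error_code(code):
    """Return a short 'number = NAME' description for a Linux connect errno."""
    return f"{code} = {LINUX_CONNECT_ERRNO.get(code, 'UNKNOWN')}"


print(describe_error_code(110))
```

A timeout suggests the connection attempt got no answer at all (dropped packets, an overloaded or hung listener), whereas a refusal would mean the host answered but nothing was listening on the port.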
Furthermore, for this scenario and any other scenario where you need to save off/dump specific log files when an alarm occurs, please refer to this Article I recently updated and tested:
How to save off (dump) log files when an alarm occurs
That way you can catch it red-handed. In your situation I would set the loglevel to 5 and logsize to 40000 for the hub, controller and data_engine to get a better idea of what's happening. Use a regex for the tunnel disconnect alarm; it's explained in the KB Article.
This Article contains a proven method of capturing essential log files used for analyzing customer problems that are very difficult to catch when they occur. This method includes a nas LUA script to automatically save off (dump) specified log files when an alarm occurs (based on an AO profile that filters on the given alarm message or any other filter you choose to define in the AO profile.) This script may be edited to suit your particular needs for capturing the appropriate log files when a problem occurs.
It is especially useful in situations where the problem is infrequent or inconsistent, occurs randomly, or exhibits a pattern of behavior, such as occurring at a given time on the weekend or during evening hours. Read the article for more excruciating details. Please note that you have to adjust the file paths for Linux.
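The article's script is a nas LUA script and is not reproduced here. Purely to illustrate the idea (match the alarm text, then copy the chosen log files into a timestamped folder), here is a rough Python sketch. The alarm pattern, the function name dump_logs and the example paths are all hypothetical and would need to match your own alarm message and install location.

```python
import re
import shutil
import time
from pathlib import Path

# Hypothetical pattern; the KB article describes the regex to use for the real alarm text.
ALARM_PATTERN = re.compile(r"tunnel.*disconnect", re.IGNORECASE)


def dump_logs(alarm_message, log_files, dest_root):
    """Copy each existing log file into a timestamped folder if the alarm matches.

    Returns the destination folder, or None when the alarm does not match.
    """
    if not ALARM_PATTERN.search(alarm_message):
        return None
    dest = Path(dest_root) / time.strftime("logdump_%Y%m%d_%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    for log_file in map(Path, log_files):
        if log_file.is_file():
            # Prefix with the parent folder name so files named alike do not collide.
            shutil.copy2(log_file, dest / f"{log_file.parent.name}_{log_file.name}")
    return dest


# Example call with made-up paths:
# dump_logs("Hub tunnel disconnect detected",
#           ["/opt/nimsoft/hub/hub.log", "/opt/nimsoft/robot/controller.log"],
#           "/tmp/uim_logdumps")
```

Copying right when the alarm fires matters because, at loglevel 5, the logs roll over quickly even with a larger logsize.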
Many thanks to Jim Christensen for the core script.
Thank you for your suggestions! These hubs are well established, and most definitely do not have a FW turned on, nor is there any AV scanning.
Absolutely love the tutorial on "How to save off (dump) log files when an alarm occurs". I am definitely going to implement this. Thanks for the share.
I have also opened a ticket with CA Support and they recommended:
If "ignore_ip = no" then set to "ignore_ip = yes"
If "check_cn = yes" then set to "check_cn = no"
These changes are still recommended for the latest hub versions.
After a bit of digging, I located the following article, which explains how to properly set up tunnels. I confirmed that our server-side hub.cfg and security.cfg files both have "ignore_ip" set to yes, and our client-side hubs also have "check_cn" set to no.
Tunnel setup link: https://www.ca.com/us/services-support/ca-support/ca-support-online/knowledge-base-articles.tec000002642.html
The link to the script that saves log files when errors are detected is dead. Can anyone post the script here? It sounds great.
I've executed the pu command and written its output (the list of subscribers) to a file:
-- Ask the nexec probe to run a profile, retrying if the request fails
i = 1
probename = "nexec"
probeaddr1 = "/DOMAIN/UIMHUB/UIMHUB/nexec"
args = pds.create()
pds.putString(args,"profile","NEXEC_PROFILE_NAME_TO_EXECUTE")

-- Up to 4 attempts, pausing after each failed one
while i <= 4 do
   reply,rc = nimbus.request(probeaddr1,"run_profile",args)
   if rc == NIME_OK then break end
   i = i + 1
   sleep(15000)
end
pds.delete(args)
Unix command contained in a shell script sitting in /opt/nimsoft/bin:
./pu -u admin_username -p admin_password /DOMAIN/HUB/HUB/hub list_subscribers > output.txt
You also need to set up an AO rule to execute the script on receipt of the alarm you want to act on.
Hope it helps someone!
Sure, here is the url.
I would also highly recommend that you open a support case for this issue.
Have you seen this article on hub and tunnel setting recommendations?
In addition to Steve's suggestions, I recommend looking through that article and applying those settings.
Those are a great help in many environments - particularly in previous versions.
Thank you for the link, sir! We are on a custom hub (7.80-jh2) that was built to address hub tunnel disconnects and worked beautifully for nearly a year, so making changes to the hub config first needs to go through some red tape.
Your suggestions per the link have been discussed with Rito and Jason (CA Support). Since it is Friday, we will likely not implement the changes today, on the eve of a weekend. Once final approval is received, I will get these configs implemented, likely Monday.
Email chain with CA Support that may be useful for some:
I asked Rito the following:
Out of those 6 hub cfg settings within "tec000004536", we have 5 of them in our hubs. Here are the current settings within our hubs:
postroute_interval = 30
postroute_reply_timeout = 60
postroute_passive_timeout = 30
hub_request_timeout = 30
tunnel_hang_timeout = 300
Would you be able to explain why we would need to switch those settings and how each could possibly help?
And if we need to add back in "tunnel_hang_retries = 3", why was it removed and why should we add it back?
These timeouts collectively control the hub's behavior when it encounters slow responses from queues and other hubs. Specifically the postroute_* timeouts have to do with queues. hub_request_timeout has to do with other hub-to-hub requests. Increasing these will help in a busy environment as the hubs will be allowed more time to respond to each other.
tunnel_hang_retries is not present by default, and when not present, defaults to 1. This controls how many times a tunnel connection will be re-tried when unresponsive before the hub simply restarts itself entirely to attempt to self-heal. Setting this higher will allow the hub to internally retry the connection a couple more times before restarting.
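To see at a glance which of these settings a hub is missing, a quick check along these lines may help. It is a sketch only: it treats tunnel_hang_retries as the sixth setting (which the questions above imply), reads plain "key = value" lines and ignores everything else in hub.cfg, and takes the default of 1 from the support reply. The function names are made up.

```python
# The five settings quoted above plus tunnel_hang_retries (assumed to be the sixth).
EXPECTED_KEYS = [
    "postroute_interval",
    "postroute_reply_timeout",
    "postroute_passive_timeout",
    "hub_request_timeout",
    "tunnel_hang_timeout",
    "tunnel_hang_retries",
]


def parse_settings(text):
    """Collect 'key = value' lines into a dict; lines without '=' are skipped."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def effective_tunnel_hang_retries(settings):
    """Per the support reply, the hub uses 1 when the key is absent."""
    return int(settings.get("tunnel_hang_retries", 1))


current = """
postroute_interval = 30
postroute_reply_timeout = 60
postroute_passive_timeout = 30
hub_request_timeout = 30
tunnel_hang_timeout = 300
"""

settings = parse_settings(current)
missing = [key for key in EXPECTED_KEYS if key not in settings]
print("missing:", missing)
print("effective tunnel_hang_retries:", effective_tunnel_hang_retries(settings))
```

With the values posted above, the only missing key is tunnel_hang_retries, so the hub would restart after a single unresponsive tunnel retry.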
What I don't get is the reference to port 48002. Shouldn't that be 48003?
Feb 23 13:24:45:469  hub: nimSession - failed to connect session to x.x.x.x:48002, error code 110
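One way to check from the hub host whether either port accepts connections is a plain TCP connect with a timeout. This is a generic sketch using Python's standard socket module, not a UIM tool; the function name probe is made up, and the host in the example call is a placeholder.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Attempt a TCP connect and return 'ok' or the symbolic errno name."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except socket.timeout:
            # No answer within the timeout we set ourselves.
            return "ETIMEDOUT"
        except OSError as exc:
            # For example ECONNREFUSED when nothing is listening on the port.
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "ok"


# Example call with a placeholder address:
# for port in (48002, 48003):
#     print(port, probe("hub-host.example", port))
```

A result of ETIMEDOUT on 48002 while the hub is running would match the error code 110 lines in the log.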
I found the following in this link which may shed some light on why 48002 is being used: Installation Guide Release 7.1
In the left pane, at the bottom under 'More information', click "Required Ports for SSL Tunnels".
Snippet from link: