We have a customer who is still on gateway version 7.1.1 and they have a recurring problem where one of their clusters fails to do a normal reboot after a patch is applied. Without patching a reboot works fine, but after applying a patch the shutdown procedure seems to hang. Their other clusters are working fine.
It's not high priority for them, and they hope the issue will be resolved once they can finally upgrade to a recent version, but I thought I would ask around whether anyone has seen similar issues and found a fix for them. I have attached a screenshot of where the shutdown hangs. They waited about 30 minutes before finally doing a guest restart from VMware. After that, the gateway runs normally. They couldn't find anything in /var/log/messages.
One of the first things I would ask is to disable Splunk and see if the problem goes away. It looks like the shutdown is hanging on Splunk, so perhaps the Platform Update broke one of Splunk's dependencies.
I was thinking something similar. I find it strange, though, that it does end with the "Shutting down..." line, which would suggest that the shutdown signal was actually sent at the end of the shutdown process. It is also strange that the shutdown process seems so short. Somehow the system seems to be in a slightly different state when rebooting after a patch has been applied, since regular reboots proceed fine. I'll see if I can set up a remote session with the customer the next time they install a patch so I can collect evidence first-hand. But if there are any other suggestions in the meantime, they are of course welcome.
That shutdown sequence is way too short for a Gateway. I suspect that the "Shutting down" message is coming from Splunk and not from init. Check the init scripts for Splunk and see if you can find that message, or try stopping the Splunk forwarder while the operating system is still running. You'll either see that message (which confirms the shutdown is hanging on Splunk) or you won't.
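A minimal sketch of the first check (searching the init scripts for the message), written in Python so it runs without extra tools. The search roots are assumptions, not paths confirmed for this Gateway: `/etc/init.d` is where SysV init scripts usually live, and `/opt/splunkforwarder` is the default Universal Forwarder install directory. Adjust them to wherever Splunk is actually installed on the appliance. The helper name `find_message` is hypothetical.

```python
import os
import sys


def find_message(needle, roots):
    """Return (path, line_number, line) for every line under `roots` containing `needle`.

    Each root may be a single file or a directory (searched recursively).
    Unreadable files are skipped; binary content is decoded with replacement
    characters so the literal string can still be found.
    """
    hits = []
    for root in roots:
        if os.path.isfile(root):
            candidates = [root]
        else:
            # os.walk yields nothing for a missing directory, so no error is raised.
            candidates = [
                os.path.join(dirpath, name)
                for dirpath, _dirnames, filenames in os.walk(root)
                for name in filenames
            ]
        for path in candidates:
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue  # permission denied, broken symlink, etc.
    return hits


if __name__ == "__main__":
    # Assumed default locations; pass other paths on the command line to override.
    roots = sys.argv[1:] or ["/etc/init.d", "/opt/splunkforwarder/bin"]
    for path, lineno, line in find_message("Shutting down", roots):
        print(f"{path}:{lineno}: {line}")
```

If nothing turns up, the second check is to run `splunk stop` from the forwarder's `bin` directory (for example `/opt/splunkforwarder/bin/splunk stop`, again assuming the default install path) on a running system and watch whether the same "Shutting down" text appears on the console.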
That's a good point; I didn't think of that. I was hoping to find something in the messages file, but they no longer have the messages file from the time of this reboot. I'll see if we can get them to stop and start Splunk to check whether they get that message. That's probably the easiest way to verify this for now.
Thanks for the suggestion, Eric.