We are running AE V12.3.0 on Windows servers, with SQL Server on the same servers. We patch the OS on these servers as often as every month, which typically requires a reboot, so we schedule a weekend outage during which we pause operations and stop the client.
However, after the AE comes back up, some of the agents sometimes will not connect, even though ServiceManagerDialog shows them as running. The first time this happened, I contacted support, and they instructed me to delete an orphaned communication row from the mqsrv table with this statement:
delete from mqsrv where mqsrv_name=<agentname>
We have confirmed that deleting this orphaned row allows the agent to connect. Support also recommended that we do a better job of shutting down the AE prior to maintenance. But this suggestion ignores the fact that the problem would be unavoidable after an accidental reboot. It has also bitten us during DR testing, when we restore the server from a point in time when it was active.
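For anyone who wants to script the cleanup that support described, here is a minimal sketch in Python. It is only an illustration, not something support provided. The table name mqsrv and the column mqsrv_name come from the statement above; the function name, the agent names AGENT01 and AGENT02, and the one-column demo table are hypothetical. The demo runs against an in-memory SQLite database as a stand-in, so it does not touch a real AE database, and the real mqsrv table will have more columns than shown here.

```python
import sqlite3


def delete_orphaned_row(conn, agent_name):
    """Delete the mqsrv row(s) for one agent; return how many were removed."""
    cur = conn.cursor()
    # Look first, so a mistyped agent name shows up as "nothing found".
    cur.execute("SELECT COUNT(*) FROM mqsrv WHERE mqsrv_name = ?", (agent_name,))
    found = cur.fetchone()[0]
    if found == 0:
        return 0
    # Same statement support gave us, with the agent name passed as a parameter.
    cur.execute("DELETE FROM mqsrv WHERE mqsrv_name = ?", (agent_name,))
    conn.commit()
    return cur.rowcount


def demo():
    # Stand-in database: a one-column table, NOT the real AE schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mqsrv (mqsrv_name TEXT)")
    conn.executemany(
        "INSERT INTO mqsrv VALUES (?)", [("AGENT01",), ("AGENT02",)]
    )
    conn.commit()
    removed = delete_orphaned_row(conn, "AGENT01")
    remaining = [row[0] for row in conn.execute("SELECT mqsrv_name FROM mqsrv")]
    return removed, remaining


if __name__ == "__main__":
    print(demo())
```

Against SQL Server, the same function should work with a DB-API connection that uses ? placeholders (pyodbc does), but I have only sketched the logic here and the connection details are left out.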
WHY HAS THIS CHANGED?
This was never a problem for us under V11. So in our view, V12 broke something.
REAL BUSINESS IMPACT
From our view, V11 was more resilient to accidental reboots than V12. (Sadly, we've had 2 accidental reboots in the last 5 years when our machine room lost power, so this is not hypothetical for us.)
WHAT ABOUT YOUR STORY?
I'm curious to know whether anyone else has encountered this as a new-to-V12 issue.
I'm not ruling out that it could be unique to how we are installed. Not everyone uses Windows, not everyone uses SQL Server, and not everyone runs the AE and the database on the same server.
------------------------------
Pete
------------------------------