I am really new to Autosys and trying to figure something out that I am sure is most likely easy once you understand it. Our Autosys environment is set up with a primary and a shadow scheduler. If the primary dies, the shadow of course takes over. Looking at an agent's agentparm.txt and the communication.manageraddress configuration, I only see the primary listed and not the shadow.
Is this normal?
Should both be in there?
How else would the agent know to hit the shadow instead of the primary?
    #
    # Added by Manager
    #
    communication.manageraddress_1=autosit1.********.com
    communication.managerid_1=AES_SCH
    communication.managerport_1=7507
    communication.socket_1=plain
It seems like even if you have the scheduler set up for HA with a primary and a shadow, it's useless unless you configure the agent for both. Am I correct?
It is not normal for the Shadow information to be missing after a failover. The Shadow communicates the manager change to the agent after taking over, and the agent's receiver log shows CONTROL MGR AFMs from the Shadow scheduler. One would normally see the following in the agentparm.txt after the failover.
Is the shadow able to execute jobs on these agents?
Thank you for the great information! That does help. I think I should have been more clear on what I am asking though :-) Because our agents are configured for only the primary, I do not understand how the agent knows to start talking to the shadow. Does Autosys actually modify the agent's agentparm.txt file on a failover by changing the host listed in the agentparm.txt to the host of the shadow?
Here is a typical AFM string you can see in the receiver.log file of the System Agent, where xxxxx is the name of the primary or shadow machine, depending on which one is taking control of the system agent.
06/28/2017 03:04:28.878 EDT-0400 2 TCP/IP Controller Plugin.Receiver pool thread <Regular:1>.CybReceiverSession.accept[:276] - Message received: 20170628 03042886+0400 WA_AGENT ORA_APP_********* 8_27724_4136158928_1498633426_796968_CM/WAAE_WF0.1/MAIN CONTROL MGRADDR Port(7500) Address(xxxxxxx.ca.com) SocketFactoryId(plain) User(ORA_APP_******)
Here ORA is the instance name, and ORA_APP means that the request comes from the Application Server of instance ORA.
Therefore, if the communication.manageraddress entry of the agentparm.txt file has not been updated by your shadow scheduler, I would suggest checking whether you see such an AFM in the receiver.log file and looking for any errors in the application server log.
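If it helps, here is a minimal Python sketch for scanning a receiver.log for those CONTROL MGRADDR messages. The regular expression is built only from the single sample line quoted above, so treat the layout as an assumption; other agent versions may format the message differently. The function name `find_mgraddr_messages` is hypothetical and not part of any Autosys tooling.

```python
import re

# Assumption: the AFM looks like the sample line quoted above, i.e.
# "... CONTROL MGRADDR Port(7500) Address(host) SocketFactoryId(plain) ..."
MGRADDR_RE = re.compile(
    r"CONTROL MGRADDR\s+Port\((?P<port>\d+)\)\s+Address\((?P<address>[^)]*)\)"
    r"(?:\s+SocketFactoryId\((?P<socket>[^)]*)\))?"
)


def find_mgraddr_messages(lines):
    """Return one dict per line that contains a CONTROL MGRADDR message."""
    found = []
    for line in lines:
        match = MGRADDR_RE.search(line)
        if match is None:
            continue
        found.append({
            # The first two whitespace-separated fields are the date and time.
            "timestamp": " ".join(line.split()[:2]),
            "port": int(match.group("port")),
            "address": match.group("address"),
            "socket": match.group("socket"),
        })
    return found


# Example use (the path is a placeholder for wherever your agent writes its log):
# with open("receiver.log", encoding="utf-8", errors="replace") as handle:
#     for entry in find_mgraddr_messages(handle):
#         print(entry)
```

If nothing comes back for the time of the failover, the message most likely never reached the agent, which points to name resolution or a firewall rather than the agent itself.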
A typical error could be that the shadow scheduler cannot resolve the node_name in the machine definition of the system agent, or that the AFM port is blocked by a firewall (the default receiving port in the agentparm.txt file is 7520).
Check all of this from the shadow scheduler machine with:
- ping <agent>
- telnet <agent> 7520
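If telnet is not installed on the scheduler host, roughly the same two checks can be sketched in Python with only the standard library. This is an approximation: it tests name resolution and a TCP connection, not ICMP, so it does not replace ping exactly. The function names, the host name in the usage comment and the timeout are assumptions for illustration; 7520 is the default port mentioned above.

```python
import socket


def can_resolve(host):
    """Return True if the host name resolves to an IPv4 address."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolved
        return False


# Example use, run from the shadow scheduler machine
# ("agent.example.com" is a placeholder for your agent's node_name):
# print(can_resolve("agent.example.com"))
# print(can_connect("agent.example.com", 7520))
```

A name that resolves but a connection that fails usually suggests a firewall or an agent that is not listening on that port.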
That's true, and this is what I explained above. Maybe I was not clear enough, but when the shadow scheduler takes over, it sends the above CONTROL MGRADDR AFM to all the agents. On receiving this AFM, the System Agent automatically updates its agentparm.txt file with the details of the new scheduler (which is now the shadow scheduler).
Later on, when the primary scheduler is back to normal, the same handshake will be done again and the System Agent will update the agentparm.txt file with the new scheduler name.
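To see which scheduler an agent currently has on record before and after a failover, a small sketch like the following can list the manager entries from agentparm.txt. It only knows the four key names shown in the question (manageraddress, managerid, managerport, socket) and assumes simple key=value lines with # comments; the function name `read_manager_entries` is hypothetical.

```python
# Key names taken from the agentparm.txt excerpt in the question.
MANAGER_KEYS = ("manageraddress", "managerid", "managerport", "socket")


def read_manager_entries(text):
    """Group communication.<key>_<n> settings by their numeric suffix n."""
    managers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines that are not settings
        key, value = line.split("=", 1)
        key = key.strip()
        if not key.startswith("communication."):
            continue
        name, sep, index = key[len("communication."):].rpartition("_")
        if not sep or not index.isdigit() or name not in MANAGER_KEYS:
            continue
        managers.setdefault(int(index), {})[name] = value.strip()
    return managers


# Example use (adjust the path to your agent's install directory):
# with open("agentparm.txt", encoding="utf-8") as handle:
#     for index, entry in sorted(read_manager_entries(handle.read()).items()):
#         print(index, entry)
```

Comparing the output before and after a failover should show the manageraddress value switching from the primary host to the shadow host, if the handshake described above completed.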
Let me know if it answers your question
Have a nice day
You were most likely clear. I am just really new to it all and did not fully understand but this clears it up! Thank you for the information. I really appreciate it.