Eliza
I was involved in a similar project (11.3.5 -> 11.3.6 SP6, with about 200 agents that had grown organically, on every flavour of OS).
As Mike said, we planned to shut the old scheduler down, then start the new one with the same instance ID (so we didn't have to change a lot of the back-end issue management systems).
In the test environment we had 11.3.5, which we scripted to shut down, rename, and service-disable (everything we could think of to stop an accidental restart). In prod we created a new 11.3.6 SP6 scheduler/WCC infrastructure that knew about only a couple of agents, and tested functionality with existing scripts and processes.
We then got approval for a test in prod at a low-job time (I remember it being around 05:00 - 06:00 on a Tuesday). In the new prod we defined 2 jobs for every type of OS we knew about (one to report the OS + patch level, and one to report the workload agent version). So at test time we did:
- export agent definitions from old 11.3.5 prod
- shutdown old 11.3.5 prod using scripts developed in test
- loaded agent definitions into new 11.3.6sp6 prod and ran our 2 jobs on each node
- removed all agent definitions from new 11.3.6sp6 prod and shut it down
- restarted old 11.3.5 prod with scripts developed in test
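For illustration, the two per-OS inventory jobs mentioned above could look roughly like this in JIL. The job and machine names (inv_linux_os, some_linux_host) and the agent install path are made up for this sketch; the exact commands obviously differ per OS:

```shell
# Hypothetical JIL for the two inventory jobs on one Linux agent.
# Names and paths are placeholders, not the team's real definitions.
cat <<'JIL' > inventory_jobs.jil
/* Job 1: report OS release and patch level */
insert_job: inv_linux_os
job_type: CMD
machine: some_linux_host
command: cat /etc/os-release && uname -r

/* Job 2: report the installed workload agent version
   (agent install path varies per site - adjust to yours) */
insert_job: inv_linux_agent_ver
job_type: CMD
machine: some_linux_host
command: grep -i version /opt/CA/WorkloadAutomationAE/SystemAgent/WA_AGENT/agentparm.txt
JIL
cat inventory_jobs.jil
# Load into the instance with: jil < inventory_jobs.jil
```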
After about a month of doing this each Tuesday, we had a picture of the OS and agent versions (which we pushed into a web site), and all the firewalls etc. sorted out so the jobs ran (I don't remember if we tested port 7507 as well).
Then we defined D-day and a maximum roll-back window (2 days, evaluated every 12 hours), so if we were still running the new instance after 2 days we would not go back (loss of history would start to become a problem). So on D-day we did:
- export agent definitions from old 11.3.5 prod
- export job definitions + global variables
- shutdown old 11.3.5 prod using scripts developed in test
- loaded agent definitions into new 11.3.6sp6 prod and ran our 2 jobs on each node
- loaded prod jobs and global variables into new prod (and waited for the excitement to start)
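The export side of those steps can be sketched with the standard autorep/jil CLI. This is only a hypothetical checklist (paths and the globals behaviour are assumptions, not the team's actual scripts); it prints the commands rather than running them:

```shell
# Hypothetical D-day export checklist - a sketch, not the real runbook.
# It only prints the commands, to be run by hand on the old 11.3.5 instance.
EXPORT_DIR="/tmp/waae_export_$(date +%Y%m%d)"

PLAN=$(cat <<EOF
mkdir -p $EXPORT_DIR
# agent (machine) definitions as JIL
autorep -M ALL -q > $EXPORT_DIR/machines.jil
# job definitions as JIL
autorep -J ALL -q > $EXPORT_DIR/jobs.jil
# global variables (on some releases these print as values rather than JIL
# and have to be recreated with sendevent -E SET_GLOBAL instead - check yours)
autorep -G ALL -q > $EXPORT_DIR/globals.jil
# then, on the new 11.3.6 SP6 instance, load machines before jobs:
jil < $EXPORT_DIR/machines.jil
jil < $EXPORT_DIR/globals.jil
jil < $EXPORT_DIR/jobs.jil
EOF
)
echo "$PLAN"
```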
We nearly rolled back at the second 12-hour checkpoint, but other than that we never considered going back.
I have left the team now, but they still use the 2 jobs to build the status portal, and they are moving agents to the latest supported version on each OS. I believe that is done now, but that part took 18 months. They are also running SP7 (probably SP8 by now) so that they don't get into the nightmare of a huge upgrade step.
Good luck! It was a stressful ride on D-day, but the testing and planning really paid off.
Original Message:
Sent: 07-01-2019 04:44 AM
From: Eliza Narvadez
Subject: WAAE 11.3.5 > 11.3.6 migration question
All,
I have the same problem. We currently have r11.3.5 AutoSys managers. We created an r11.3.6 SP8 instance and now have more than 500 agents, with different agent versions installed, as the OSes
are a variety of Windows (NT4, Win2000, Win2003, Win2008R2, Win2012R2, Win2016), Linux and HP-UX, from the oldest to the newest. How will I point all these agents to the new r11.3.6 SP8 instance in one day, one big cutover, and revert back to 11.3.5 as a rollback procedure if a problem is encountered?
Appreciate your tips and advice. TIA.
Original Message:
Hi James,
This comes up every now and again. While I don't have a quick and easy answer, I have some suggestions that may help.
Do you need to define the agents to the new instance prior to the cutover? Would being able to telnet or nc to the agents on the agent port be enough to satisfy connectivity on the input side? The same should be done from the agent side, which depending on the OS may have a different solution on how to test. This is the safest as no agentparm.txt would be updated, but the most manual.
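One way to script the connectivity test Mike describes without touching any agentparm.txt: bash can open a TCP connection via its built-in /dev/tcp redirection, so neither telnet nor nc has to be installed on the scheduler box. The host list and port below are placeholders (7520 is a common default agent port, but use whatever your agents listen on):

```shell
#!/bin/bash
# Sketch: check TCP reachability of an agent port using bash's /dev/tcp.
# No agent definition is created, so nothing on the agent gets updated.
check_port() {   # usage: check_port <host> <port>
  if timeout 3 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 OPEN"
  else
    echo "$1:$2 CLOSED"
  fi
}

# Example sweep over a made-up agent list on a placeholder port
for h in agent1.example.invalid agent2.example.invalid; do
  check_port "$h" 7520
done
```

Run the same check in reverse from each agent towards the scheduler/app server ports to cover the other direction.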
Another possible way is to insert the agents into the new instance (SCHEDULER MUST BE DOWN) and then do an autoping of the agents. The app server managerid is not persisted and has the hostname included, which prevents agentparm.txt from getting updated. I would switch the config file to have the app server use the scheduler port (7507) and rerun the test to verify that the other port is open as well. That should prevent the agentparm files from getting updated, and it is not a lot of effort to complete. The danger is if the scheduler starts and begins contacting agents. You may want to rename the scheduler binary after shutting it down to prevent even an accidental start, for example if the machine was restarted.
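That second approach could be sketched as the sequence below. The machine name is a placeholder and the script only prints the steps rather than running them, since the safe order (scheduler down and disabled first) matters more than the exact commands:

```shell
# Sketch of the insert + autoping approach (scheduler MUST be down first).
# some_agent_host is a placeholder; this just prints the steps as a checklist.
STEPS=$(cat <<'STEPS'
# 1. stop the scheduler and guard against an accidental restart (e.g. reboot)
sendevent -E STOP_DEMON
mv $AUTOSYS/bin/event_demon $AUTOSYS/bin/event_demon.disabled

# 2. define the agent on the new instance
echo "insert_machine: some_agent_host" | jil

# 3. verify the app server can reach the agent without updating agentparm.txt
autoping -m some_agent_host

# 4. remove the definition again before re-enabling anything
echo "delete_machine: some_agent_host" | jil
STEPS
)
echo "$STEPS"
```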
Regards,
Mike