
Upgrade Experience: ITMS 7.5SP1 to 8.0 

Jan 26, 2017 04:49 AM

We recently executed a major upgrade from ITMS 7.5SP1 to ITMS 8.0. The upgrade was ultimately a success, and both the major and minor issues we experienced were resolved in the 48 hours that followed the go-live.

I'm sharing our experience here to give folks an indication of the planning and testing which many of us undertake when embarking on major upgrades like this. I also thought the major issue we hit was worth disseminating, in the hope that others considering this rollout can avoid it.

If you just want the main upgrade gotcha we suffered, and aren't terribly interested in the planning side, then here it is,

If you are embarking on an upgrade from 7.5SP1, consider changing the policy on your AppID service account so that it does not lock out on multiple failed authentication attempts. We saw a small percentage of agents go rogue and authenticate with incorrect credentials to the SMP during the upgrade process. This eventually locked out the service account and brought down the entire infrastructure.

The PowerShell script later in this article can be executed on your domain controllers to reveal this bad agent behaviour.
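
If you go down that route, one way to exempt just the AppID account (rather than relaxing lockouts domain-wide) is a fine-grained password policy. What follows is only a rough sketch to agree with your AD team, not what we ran verbatim; it assumes the RSAT ActiveDirectory PowerShell module and a 2008-or-later domain functional level, and the policy and account names are placeholders,

# Rough sketch (placeholders throughout): exempt the AppID service account
# from lockouts by applying a fine-grained password policy (PSO) to it.
Import-Module ActiveDirectory

# LockoutThreshold 0 means accounts under this PSO never lock out
New-ADFineGrainedPasswordPolicy -Name "PSO-NoLockout-AppID" `
    -Precedence 10 `
    -LockoutThreshold 0 `
    -MinPasswordLength 15 `
    -ComplexityEnabled $true

# Apply the PSO directly to the AppID service account
Add-ADFineGrainedPasswordPolicySubject -Identity "PSO-NoLockout-AppID" `
    -Subjects "svc_altiris_appid"

# Confirm which password/lockout policy now wins for the account
Get-ADUserResultantPasswordPolicy -Identity "svc_altiris_appid"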

 

What follows now is a discussion of our upgrade plan, its execution, and the issues that followed.

Before I do that, it's pertinent to ask why we were upgrading to ITMS 8 in the first place. What does it offer us over 7.5SP1? Well, the key reasons for us were as follows:

  • Windows 10 client support
  • User searching in the console (very nice)
  • Agent health data presented through the console icons, and in detail in a flipbook (should directly help our IT techs)
  • Improved software reporting
  • Platform scalability improvements and more current SQL Server support (SQL Server 2012 SP2 and SQL Server 2008 R2 SP3)

As you can see, our first item was Windows 10. With our Windows 10 rollouts just around the corner, we wanted to make sure we were prepared for that very busy Windows 10 future.

 

1. Our Environment

Our central ITMS installation is pretty typical - it's a small/medium business setup. We have,

  • 1 NS (serving 3000 clients)
  • 1 SQL Server
  • 1 Site Server (for task offloading)
  • 1 Cloud Gateway


When I plan upgrades, I generally split them into 3 phases:

  • Preparation
  • Upgrade
  • Wash-up/Remediation

I'll talk about each of these here so you can get an idea of our overall experience.

Note: In our environment we have two production ITMS installations - one serving 30 clients and another serving 3000 clients. This enables us to execute these 3 phases on the small client base first, and then on the larger client base. 

2. Preparation (~1 month)

This is where most of your effort in an upgrade should go if you want to minimise the upgrade time and the wash-up. It's a pain to do, but, seriously, it's worth doing. Here's what we did,

  1. Build a virgin virtual server and install our target version of ITMS (v8)
  2. Check basic functionality of the server, and see how stable it is (lots of log checking)
  3. Be prepared to raise Symantec Support cases
  4. Check target version release notes - are there known issues relevant to your environment?

The objective here is to confirm that the target version works with your current processes in principle.

If the testing on the virgin box goes well, then build an ITMS server of your current version (in our case, ITMS 7.5SP1). Configure your core policies to mirror your production box, and then upgrade that. Document what changes, notably in your agent upgrade policies and targets. 

In the weeks before the upgrade also,

  1. Spend more time attending to your live SMP.
    Make sure that you are happy with the state of the event logs, and with the current functioning of the client estate. You want to upgrade the SMP when it's in prime condition (there's a quick event-log sketch after this list).
     
  2. Gather Testing Documentation
    Gather "Acceptance Testing" documentation from those that use the console for their day-to-day tasks. Agree what items are critical and should trigger a back-out if they don't work in the post-upgrade scenario.
     
  3. Build the Upgrade Checklist
    On the day of the upgrade, you don't want to be 'winging it' too much, right? Build a checklist, and think about backout plans for each step that introduces a change. This is laborious, but will serve you well on upgrade day.
     
  4. Schedule Upgrade for Production Server
    Let people know you are planning the upgrade and that there will be downtime. Let folks know that there is a back-out plan. Let them know what the changes will be. Let them know that it will take a few days for the client upgrade process to roll out (we aimed for 90% coverage in 3 days). If you can, plan the upgrade over a weekend (we thought 2 days would just about do it).
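
For item 1, most of this is eyeballing the Altiris Log Viewer and the Windows event logs, but a quick way to spot noisy sources on the SMP is to summarise recent errors and warnings. This is just a rough sketch - the log names and look-back window are examples, so adjust to taste,

# Rough sketch: summarise errors and warnings from the last 7 days on the SMP
# by provider, so noisy sources stand out before you commit to the upgrade.
$since = (Get-Date).AddDays(-7)

Get-WinEvent -FilterHashtable @{
        LogName   = 'Application', 'System'
        Level     = 1, 2, 3            # Critical, Error, Warning
        StartTime = $since
    } -ErrorAction SilentlyContinue |
    Group-Object ProviderName |
    Sort-Object Count -Descending |
    Select-Object Count, Name |
    Format-Table -AutoSize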

 

3. Upgrade (~2 days)

On the morning of the upgrade, confirm the sanity of your checklist one last time. Check the release notes for the version you are upgrading to again (they could have been updated). If all looks good, then continue.
 

3.1 Pre-upgrade Prep

  1. Firewall off the SMP (this helps put the server in a quiet state ready for snapshotting - see the sketch after this list)
  2. Check Event Queues, Windows logs, SMP logs (make sure all is well before the upgrade)
  3. Confirm disk space on SMP and SQL Server (ours is an off-box SQL Server)
  4. Make sure the infrastructure is quiet and then ensure file backups on all servers are current
  5. Reboot the SMP (just to make sure we have a clean slate in terms of the OS)
  6. Snapshot all the Altiris virtual machines. At the very least, we snapshot the SMP and SQL Server
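
On step 1, we 'firewall off' the SMP at the network level, but the same effect can be had on the box itself with the Windows firewall. The following is a rough sketch only (the rule name and ports are placeholders; use whatever ports your agents actually talk on). Block rules win over allow rules, so this quiets agent traffic regardless of the existing IIS allow rules,

# Rough sketch: block inbound agent traffic to the SMP web ports during
# the upgrade window. Block rules override allow rules, so this silences
# the server even if the usual IIS allow rules are still enabled.
New-NetFirewallRule -DisplayName "SMP upgrade - block agent traffic" `
    -Direction Inbound -Action Block -Protocol TCP -LocalPort 80, 443

# When you're ready to start letting clients back in (sections 3.4 onwards),
# remove the block rule again:
# Remove-NetFirewallRule -DisplayName "SMP upgrade - block agent traffic"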

 

3.2 Execute Upgrade

  1. Install SMP8 Upgrade prerequisites on SMP (JRE 8 in our case)
  2. Begin preparing the download of SMP8 in the Symantec Installation Manager (SIM) (~20 mins)
  3. Install SMP8  (~90 mins to install, ~90 mins to configure)
  4. Reboot SMP, Check logs
  5. Install HF5 through SIM (~90 mins)
  6. Apply "Agent Health Reporting" Power Management Fix (TECH234452)
  7. Apply custom fix (similar to above) for PcAnywhere
  8. Check and configure Agent Upgrade Policies (we noticed that many of these were turned off and/or reset; go through each plugin and confirm that the policies and targets are as you'd expect)
  9. Check and confirm Cloud Agent settings are correct (we noticed https redirection was reset to http, so fixed that)
  10. Check logs
  11. Execute relevant portions of "Acceptance Testing" plans logging into the Console with the appropriate group rights
  12. We found at this point that "IT Analytics" was broken. As this had already been flagged as an acceptable temporary casualty of the upgrade, we just logged this and moved on. 

 

3.3 Site Server Upgrade

  1. Install Site Server Upgrade pre-requisites (.NET Framework 4.5.1)
  2. Enable Site Server agent upgrade policy and check that the site server exists in the policy target
  3. Enable Firewall rule to enable site server access to SMP
  4. Confirm Site Server upgrade

 

3.4 Single Client Testing

  1. Enable Firewall rule to enable single-test client access (see the sketch after this list)
  2. Observe client upgrade process
  3. Confirm basic plugin functionality
  4. Execute relevant portions of "Acceptance Testing" plans logging into the Console with the appropriate group rights
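
We staged the client access on our perimeter firewall, but if you were doing the staged opening on the SMP itself, one way is a single allow rule whose remote address scope you widen at each stage. This is only a sketch: it assumes the block rule from section 3.1 has been removed and that no broader allow rules for 80/443 are enabled (block rules always beat allow rules in Windows Firewall), and the addresses are placeholders,

# Rough sketch: one allow rule whose scope is widened as testing progresses.
# Assumes inbound default action is Block and no other rules open 80/443.

# 3.4 Single client testing - allow just the test client
New-NetFirewallRule -DisplayName "SMP upgrade - staged agent access" `
    -Direction Inbound -Action Allow -Protocol TCP -LocalPort 80, 443 `
    -RemoteAddress 10.0.5.50

# 3.6 Multiple client testing - widen to the test subnet
Set-NetFirewallRule -DisplayName "SMP upgrade - staged agent access" `
    -RemoteAddress 10.0.5.0/24

# 3.7 Go-live - open up to all clients
Set-NetFirewallRule -DisplayName "SMP upgrade - staged agent access" `
    -RemoteAddress Any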

 

3.5 Upgrade Cloud Gateway

  1. Enable Firewall on CEM gateway to block all clients
  2. Install Server Prerequisites (.NET 4.5.1 already installed)
  3. Upgrade CEM Internet Gateway Package
  4. Enable Firewall rule to enable single test client access (whatismyip.com is your friend)
  5. Observe client upgrade process
  6. Confirm basic plugin functionality
  7. Confirm cloud agent switching to and from Cloud mode (using VPN client)
  8. Execute relevant portions of "Acceptance Testing" plans logging into the Console with the appropriate group rights

 

3.6 Multiple Client Testing

  1. Enable Firewall rule to enable multiple-test client access
  2. Observe client upgrade process
  3. Confirm basic plugin functionality
  4. Check logs
  5. Confirm again relevant portions of "Acceptance Testing" plans logging into the Console with the appropriate group rights

 

3.7 Go-Live

  1. Commit Virtual Machine snapshots
  2. Enable Firewall Rule for All Clients on SMP
  3. Enable Firewall Rule for All Clients on Cloud Gateway
  4. Monitor logs

 

And, for us, that was it. I went to bed around midnight Sunday 29th with 300 client machines upgrading nicely. All seemed good.

 

4. Wash-Up/Remediation

I began my server checks at 7:30am Monday morning. My expectation was to find a few small, niggly issues, and I had earmarked the following three days to track these down and remediate them. However, when I logged into my work machine, I found the SMP was down. Totally down.

 

4.1 Agent Upgrade DDoS (Major Issue)

So, after allowing myself a good 10 seconds to panic, it was time to track down what had caused this massive systems failure. After all, the system had been health checked just a few hours earlier as being pretty darn stable.

A quick diagnostic revealed that the App ID service account was locked out. Unlocking the account had no effect, as it was locked out again moments later. So what was locking it? Looking at our domain controller logs, we could see that clients were failing to authenticate with the AppID credential, and this was locking out the account. This was, however, happening too fast for manually re-enabling the account to have any effect.

Looking at the clients themselves, it turned out that some had gone rogue during the upgrade process. They were sending corrupted authentication requests to the SMP. These requests were failing their authentication and resulting in a lockout of the service account. After quick emergency discussions, we decided that the simplest action at this point was to temporarily disable account lockouts.

We then cobbled together a PowerShell script to identify clients which were failing their auth attempts on the App ID, and raised a case with Symantec Support. Here is the PowerShell we came up with to reveal the machine names which were hitting the domain controller with failed authentications,

# Scan the Security log on a domain controller for credential-validation
# failures (event ID 4776) that mention the AppID account, keep only the
# bad-password attempts (error code 0xC000006A), and write out the time,
# logon account and source workstation for each one.
Get-WinEvent -LogName Security |
    Where-Object { $_.ProviderName -eq 'Microsoft-Windows-Security-Auditing' -and
        $_.Level -eq 0 -and
        $_.Id -eq 4776 -and
        $_.Message -match 'CHANGE_ME_TO_YOUR_APP_ID_ACCOUNT' } |
    Select-Object TimeCreated, Message |
    Select-String "0xC000006A" |
    ForEach-Object { $_ -replace "`r`n", "" } |
    Select-String -Pattern 'TimeCreated=(\S+\s\S+);.*Logon Account:\s+(\S+)Source Workstation:\s+(\S+)+Error Code:' |
    ForEach-Object { " $($_.Matches.Groups[1]) $($_.Matches.Groups[2]) $($_.Matches.Groups[3])" } |
    Out-File .\Logging.txt

 

In the above script, you'll need to change the string "CHANGE_ME_TO_YOUR_APP_ID_ACCOUNT" to your App ID.

Temporarily changing the user account lockout policy allowed the agent upgrades to struggle through. They then began to complete the process, but we had consequently lost a day in our plug-in rollout timeline.
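
If you can't get the lockout policy relaxed quickly, a stop-gap (one that a commenter below also resorted to) is a scheduled task that keeps unlocking the account and records each occurrence, so you can still see how hard it's being hit. A minimal sketch, assuming the ActiveDirectory module; the account name and log path are placeholders,

# Minimal stop-gap sketch: unlock the AppID account if it is locked and
# note the time, so you can still see how often lockouts are happening.
# Run this as a scheduled task every minute or so. Names are placeholders.
Import-Module ActiveDirectory

$account = 'svc_altiris_appid'
$logFile = 'C:\Temp\appid-unlocks.log'

$user = Get-ADUser -Identity $account -Properties LockedOut
if ($user.LockedOut) {
    Unlock-ADAccount -Identity $account
    "$(Get-Date -Format s)  $account was locked out - unlocked" |
        Add-Content -Path $logFile
}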

If I had to guess at what was happening here, my money would be on a bug in the old agent upgrade process. The 7.5 SP1 agent is quite old, and it was only during the upgrade that this App ID account lockout issue occurred. Once the agent upgrade had completed, the agents behaved fine. So it's likely not an issue with the new agent, just a bug in the very old one we had, which only manifested when we triggered that code path in the upgrade process.

Update 7th Feb '17: Symantec Support have seen other instances of this lockout issue. They suspect the agents exhibiting this behaviour were extracting the App ID account password from the SMA secure password storage using a legacy (incorrect) API. This resulted in the extracted password being corrupted and hence the App ID account lockout.
 

4.2 Real-Time Manager (Minor Issue)

This was a minor issue, but we found that Real-Time Manager didn't seem to work consistently when the console was accessed over HTTP; changing the URL to HTTPS resolved this.

4.3 PCAnywhere (Minor Issue)

We had a glitch with PCAnywhere in that the policy targets seemed to undergo a reset in the upgrade. As a result, we were targeting the wrong machines with the latest package upgrade. This had the effect of pushing the host-only package to machines which had the full package installed, which removed the PCAnywhere QuickConnect client.

Resolved by fixing the targets and pushing the full package again.

Our testing plan had omitted the step to check the targets for the two PCAnywhere packages, so lesson learned here to be more thorough.

I should point out here that a known, documented repercussion of the upgrade is that you can no longer configure PCAnywhere in the console: the console object used to do this is corrupted by the installation. As PCAnywhere is EOL, Symantec will not be fixing this.

4.4 Agent Health Reporting (Minor Issue)

Whilst agent health reporting is a really good feature in ITMS 8, there is one niggly aspect to it: once it's there, people want to see ticks, not question marks or crosses. So, when we did the upgrade, we applied two little T-SQL patches to make the Power Plan and PCAnywhere plugins report as healthy. The patch for the Power Plan plugin health reporting we found in TECH234452, and, to fix PCAnywhere, we made an equivalent patch,

USE [Symantec_CMDB]
GO

INSERT INTO [dbo].[SmpVersions]
           ([ProductGuid]
           ,[PluginGuid]
           ,[Type]
           ,[Version]
           ,[Major]
           ,[Minor]
           ,[Build]
           ,[Revision])
     VALUES
           ('C432B710-F971-11A2-8643-20105BF409AF' -- Guid from vProduct
           ,'452F2BCF-7261-4AA6-9228-387676F3A183' -- Agent Class Guid from Inv_AeX_AC_Client_Agent
           ,0
           ,'12.6.8556.0'
           ,12
           ,6
           ,8556
           ,-1)
GO

 

One fly in the ointment, however, was the recently EOL'd Software Virtualization agent (Workspace Virtualization and Streaming). We found that when clients reported back with higher versions than the SMP was configured to roll out, they were marked "Unhealthy" as the client and server versions didn't match.

[Screenshot: virtualisation.png]

We can't see a way around this without ramping up the SWS client version on the server. It would be good to have the option in the future of simply excluding certain plugins from "Agent health" reporting.

Not a big issue though, so I just threw this on my "contemplation" pile.

4.5 Clearing the UserSettings

When I originally wrote this on Symantec Connect, I forgot about this one: the console user settings. The UserSettings table contains the SID of each user that logs into the console, their console settings, and their last known console working location. When you perform major upgrades on the SMP, you can find that these saved console settings end up being incompatible with the new version.

This can happen, for example, if the upgrade has removed a console object. In this case, if the defunct object is referenced in a user's UserSettings entry, they will be pushed into a seemingly corrupt location when they open the console in their browser.

As we've had this now a couple of times with our in-place upgrades, our fix is to simply run the following SQL,

DELETE FROM UserSettings

Every user logging into the console will now experience it as if it's their first time. A clean slate, with no chance of being pushed to an object that no longer exists.
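
If you're nervous about deleting the lot on a production box, keep a copy of the table first so rows can be restored if anyone shouts. A small sketch using Invoke-Sqlcmd (this assumes the SqlServer/SQLPS module is available; the instance and backup table names are placeholders),

# Rough sketch: copy UserSettings aside, then clear it.
# Instance and backup table names below are placeholders.
$db = @{ ServerInstance = 'SQLSERVER01'; Database = 'Symantec_CMDB' }

# Keep a copy of the current table first
Invoke-Sqlcmd @db -Query "SELECT * INTO dbo.UserSettings_PreUpgrade FROM dbo.UserSettings;"

# Then clear the live table so every console user starts with a clean slate
Invoke-Sqlcmd @db -Query "DELETE FROM dbo.UserSettings;"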
 

5. Summary

On reflection, I view the upgrade rather positively. We didn't have to roll back, and I felt the preparation was worth it in making the upgrade and testing process move along swiftly. It also minimised the surprises on the day, as we'd already worked through a few of them in advance of the main rollout.

The major issue, the App ID lockout, impacted us for 90 minutes during business hours (although it felt a lot longer from my point of view). This downtime was small simply because we were in the fortunate position of being able to quickly push through the required account lockout changes. Others might not be in such a fortunate situation, and this is the main motivation for me writing this article.

All in all, it took 72 hours for 85% of the agent upgrades to complete across our estate. We anticipated issues with inventory, software delivery and remote control in this time window. We'd factored 3 days of post-upgrade infrastructure hand-holding into our planning, and the minor issues encountered were for the most part taken care of in this window too.



Comments

Jul 13, 2017 08:50 PM

Hi Ian,

Yes...I tried just my AppID as well as domain_name\AppID.  Am I NOT supposed to leave the single quotes around my AppID?  Or perhaps I'm supposed to run this PS script from a certain location on the network?

Clint

Jul 12, 2017 03:32 AM

Clint - did you replace the text 'CHANGE_ME_TO_YOUR_APP_ID_ACCOUNT' with your App_ID?

Jul 11, 2017 05:12 PM

In the "4.1 Agent Upgrade DDoS (Major Issue)" section, anyone have problems running the Powershell script?  Actually, it runs for me under domain admin credentials although the output file is empty regardless of any locked account I put in.  I've tried with and without my domain name first followed by a backslash.  Any ideas what I'm doing wrong?

Jun 20, 2017 04:21 AM

Hi Clint,

We haven't gone through an AppID account reset in the move from 7.5SP1 to 8 (or from 8 to 8.1 either). The way I tend to do credential changes in distributed environments is to create a NEW credential. This allows the new credential to be rolled out without compromising the machines that are using the old one while the change percolates.

This then allows you to schedule the retirement of the old credential on a sensible timescale. For us that will be 3 months, as we dump machines from the database anyway once they've been out of contact for more than 12 weeks.

Kind Regards,
Ian./

Jun 06, 2017 03:18 PM

Hi Ian,

I was curious if you've gone through a password reset yet for your AppID account; post SMP upgrade?  My SMP 7.5 to 8.0 upgrade occurred in February and it's been quiet until last week.  I'm sure there can always be an old client or two that is fired up and locks AppID but heck of a coincidence this happened the day I decided to reset my AppID p/w.

Anyhow, I'm now down to about 6-7 lockouts per hour so not too bad compared to last week Friday when it was 16.  Also wondering if anyone's tried resetting your AppID p/w in the past to the same thing and noticed whether these lockouts occurred thereafter or not in your environment.

Clint

Jun 06, 2017 06:00 AM

Hi Clint,

You can ask your AD team to disable the account lockouts on specific accounts. We did this and now get notified of bad password attempts which is much more useful. This also helps us trace any client which is doing this. To date in our environment, it always seems to be clients which are upgrading from 7.5SP1.

We generally see ~70 bad password attempts in just two seconds when an old agent connects with its screwed up password decryption. 

We've seen our bad password attempts drop significantly as our upgrade drew to a close, with most of our agents upgraded nicely. We have some old images which occasionally get deployed, and we see this again sporadically... but as those get refreshed we hope this will be an issue of the past.

 

Jun 02, 2017 03:42 PM

I haven't moved on to SMP 8.1 yet so still on v8.0 where my AppID account lockouts have returned this week!  I'm currently seeing about 16 lockouts per hour where I have again enabled my scheduled task to unlock the account every minute.  As for any changes, I started getting warnings that the password I'm using for the Application Identity account was about to expire but instead of changing the password, I opted to reset it to the exact same one.  Even though the p/w is textually identical, has anyone seen this behavior before where your AppID account keeps getting locked out in this situation?

I'm not aware of a bunch of really old workstations being fired up and going through the 7.5 to 8.0 agent upgrade which can lockout AppID but even if they were being turned on now, I doubt there'd be that many to cause lockouts to occur within a minute's time (true/false?).  I come into work pretty early compared to most where the account lockouts generally start about 8 a.m. and stop in the late afternoon or early evening so sure seems as though PCs that still have the 7.x agent are being turned on.

For the lockouts themselves, domain account tools show the source is our proxy server which is what happened the last time.  Also like before, instead of displaying a workstation name for the source (e.g. computer_name), it instead has a UNC path to our proxy server (e.g. \\proxy_server_name).  Again, I believe this all started after I reset the AppID password to what it was before so I don't get nagged about it expiring for another 90 days in our environment.  I'd appreciate any feedback on this very frustrating issue.  Thanks in advance!

Feb 21, 2017 10:19 AM

Thanks for that Ian. We'll be going from 7.6 to 8 at some stage "real soon now" so your tips could well be helpful...

Martin

Feb 17, 2017 08:06 PM

I recently upgraded from SMP 7.5 to 8.0 (a nightmare for me with the account lockouts!) and really wished I had come across this article sooner.  Along the way I ended up changing my AppID password for various reasons which naturally had more side effects.  Once I managed to fix all of the credentials and get my IT Analytics reports working again, I noticed that most of them are getting chopped off on the right side in my SMC.  Exporting to, say, Excel gets you all of the data so this appears to be a display issue in the console.  Anyone else see this problem as well?

Feb 15, 2017 09:05 AM

Great article Ian.  We are about to go to 8.0 from 7.6 so I'll be reviewing this article closely.  Thanks! :)

Feb 14, 2017 05:12 AM

Nice report Ian!!! The agent health suggestion is a good one so I will add it to the backlog.
