DX Unified Infrastructure Management


8.4 -> 8.5.1: Any "gotchas"?

David Michel, Jun 13, 2017 08:56 AM

  • 1.  8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 07:31 AM

    We are beginning to plan our upgrade from 8.4 to 8.5.1 (Linux hubs and Oracle DB) and are interested to see if there are gotchas or things we need to look out for that are not in the documentation.

  • 2.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Broadcom Employee
    Posted Mar 30, 2017 08:44 AM

    Hi William!


    I'm not aware of any major common issues with the move up to 8.5.1 - seems like it's been pretty stable.

    Be sure to check out the Release Notes


    The big thing I stress with any upgrade is to take backups. Take FULL backups of, at least, the following:

    > full primary /Nimsoft/ directory

    > full UMP /Nimsoft/ directory

    > full database

    It's probably a good idea to include any other 'important' servers in your environment, like snmpcollector or HA robot/hubs, etc.
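    As a rough illustration of the directory part of that checklist, here is a minimal Python sketch that archives one directory into a timestamped tarball. The function name and the example paths in the comment are assumptions for illustration only; the database backup is not covered and should be done with your DB vendor's own tools.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(src: str, dest_dir: str) -> Path:
    """Archive src into a timestamped .tar.gz under dest_dir and return its path."""
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"not a directory: {src_path}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{src_path.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps the paths inside the archive relative to the directory name
        tar.add(src_path, arcname=src_path.name)
    return archive


# Hypothetical usage; adjust both paths to your installation:
# backup_directory("/opt/nimsoft", "/backup/uim")
```

    Run it once on the primary hub and once on the UMP server, and keep the destination outside the directory being archived.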

  • 3.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 10:11 AM

    I am on Windows central hub and MS SQL as the DB and went 8.31->8.51.


    Issues I've run into:

    - Rules in discovery_server change, hopefully for the better. If you use USM and haven't yet cleaned up from the 8.31 GA discovery_server, expect many previously working web pages to show incomplete information while the device ID relationships are reworked.

    - The upgrade does a lousy job checking the completion of the SQL upgrade scripts. Expect that several of these will fail silently. 

    - The new alarm_server/nas combo takes more RAM. You may need to increase the JVM settings.

    - The password portlet is gone. You need to replace pages that use it with the account admin portlet. Make sure your ACLs are correct.

    - You need to manually upgrade the REST interface if you use that

    - USM is much much much slower than in 8.31 - by two orders of magnitude in my case. Of the loads that succeeded, the worst case took 17 minutes and nothing took less than 30 seconds or so. More often than not your login expires before the load is complete.

    - Wasp startup is slow. The official comment from support is that 15 minutes for startup (from activation to first successful page load) is expected and normal.

    - Make sure that policy_engine stays disabled. The upgrade will inactivate it on the central hub but since it is now replaced, you have to manually disable and remove it wherever it might be installed.

    - EMS is in this new release - appears to only support a single central hub. Bad news if you use HA or nas replication

    - The Wasp upgrade may fail to install the UMP root. You need to delete the webapps/ROOT directory and the ROOT.war (could be wrong about the name) and redeploy to get it to create.

    - The listviewer portlet didn't deploy correctly on the first attempt - apparently the .war file didn't deploy and so the old version was left in place at the end of the upgrade.

    - Expect the UIM and UMP upgrades together to take roughly five hours - longer if you have to call support.... And you have to do both at the same time. There are several database changes that break UMP. If you have customers using the portal pages, they will experience an outage for the whole duration.


    So this is my short list - there are many more annoyances 



  • 4.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 11:41 AM

    In order to compare the upgrade with what I might experience, what's the size of the environment that you upgraded (monitored devices, hubs, db size, etc.)?

  • 5.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 02:17 PM

    7,000ish robots, 3,050 of which are hubs too.


    Nimsoft database is roughly one TB in size at the moment. 5,500 active alerts, 16mil rows in the alarm transaction table


    I use no discovery agents or snmp collector features.



  • 6.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 04:02 PM

    Garin, just wondering: are you doing any QoS roll-up on the data_engine? We used to have a DB that big, over 1TB, yet after we adjusted the roll-up periods, we're now down to < 200GB. Our numbers: 0 - 14 days raw, 15 - 140 days hourly roll-up, 141 - 490 days daily roll-up.
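    To make those retention windows concrete, here is a small Python sketch that maps a sample's age to the tier it would live in under the numbers quoted in this post. The function name is made up for illustration, and the "expired" label for anything older than 490 days is an assumption, since the post does not say what happens beyond that point.

```python
def storage_tier(age_days: int) -> str:
    """Return the tier a QoS sample of the given age falls into.

    Boundaries follow the post: 0-14 days raw, 15-140 days hourly,
    141-490 days daily. Anything older is labeled "expired" (an assumption).
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= 14:
        return "raw"
    if age_days <= 140:
        return "hourly"
    if age_days <= 490:
        return "daily"
    return "expired"


for age in (3, 30, 365, 600):
    print(age, storage_tier(age))
```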

  • 7.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 04:21 PM

    I looked into using the rollup but, at least in older versions, nothing had access to the data in the hourly and daily tables, so it wasn't easily usable. Instead I just threw RAM at the problem and have been reasonably OK with the speed of reporting against it. In the testing I did, the views that combine the various historic tables just weren't fast enough to be useful.


    I also enabled table partitioning. It didn't speed anything up from what I could tell but it did seem to stop the slowdowns that happen with growth. And the nightly data pruning maintenance was able to finish since it moves from deleting by date to just dropping the oldest partitions.


    Today I'm keeping 105 weeks of data for most things where that makes sense (disk usage, database usage, etc. - things where you would forecast hardware demand off of a steady trend). Things like net_connect pings I keep for only a couple of weeks. CPU and memory I keep for 10 weeks. So far that seems to satisfy my customers pretty well.


    There's always that one person who demands the metric values between 10:00AM and 2:00PM on 2/14/2016 and won't leave you alone until it happens.



  • 8.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 01:15 PM

    Wait what!!!

    "- The upgrade does a lousy job checking the completion of the SQL upgrade scripts. Expect that several of these will fail silently. "

    How did you notice this and how do you fix? 

  • 9.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 02:22 PM

    In USM, the group counts were zero. There's a probe callback for the nis server to force the migration to happen.


    SQL insert errors in nas log. Deploy older version then redeploy newer version. Seems that it needed a couple pops to get it right.


    USM had some behavioral issues. Redeploying the USM probes helped that.



  • 10.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 01:46 PM

    "Deploy older version then redeploy newer version. Seems that it needed a couple pops to get it right."

    Just to be clear, if I see the SQL insert errors in the nas log, I should redeploy the older version of the nis_server probe (in 8.4 it's version 3.5.1) and then try deploying the newer 8.5.1 version?

  • 11.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 01:56 PM

    SQL errors in the NAS log: deploy older nas version, then redeploy newer nas version.


    Count of zero systems in the group display of USM: on the nis_server, there is a callback called migrate_groups - run that.



  • 12.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 02:46 PM

    I have now updated two production UIM environments with lots of hubs and tunnels, and both have had issues when hub 7.90 is used: tunnels going up and down all the time. It seems to have issues on both Win and Linux platforms, and support confirmed that a downgrade to the latest 7.80 HF is needed.

    And also the UMP ROOT folder corruption was seen in one portal server.



  • 13.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 02:53 PM

    Vaguely reassuring to know that I wasn't alone in some of my experiences.


    Based on my usage, I'd argue that the hub 7.72 release (ftp://UIMuser:CnIa24uJ@ftp.ca.com/UIM_Probe_Hotfixes/hub772.zip) is probably a better choice than the 7.80HF21 release. It depends though on which defects cause more pain.



  • 14.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 03:15 PM

    So you'd say stay away from 7.90 hub version and continue using the 7.80HF22 now? We've been using HF21 this whole time. 

    Funny how HF22 says it will all be fixed in 7.90.


  • 15.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 03:22 PM

    ttahkapaa, was there a specific defect for the 7.90 hub that made them say to use 7.80HF22? We're planning on upgrading to 8.5.1 next week and I have a few 7.90 hubs but haven't seen any issues yet.

  • 16.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Mar 30, 2017 04:01 PM

    Well, the defect we saw was that the whole infrastructure became pretty much useless. Both envs had tunnel hubs that connected to all "customer hubs". The one updated today lit up like a Christmas tree as soon as we updated those tunnel hubs. It was even a bit difficult to get them downgraded, because the IM connections stayed up for such a short time that we had to be pretty quick to distribute the hub there. We still have the primary hub running hub version 7.90, and at least that seems to function pretty well.


    The other env had some issues with (customer) Win hubs opening so many handles in the OS that the whole OS died.


    Both envs have lots of hubs and tunnels. No issues were found in test envs with only one tunnel or so.


    One additional issue found today was with robot_update v7.90 on two Win servers: it failed to run the pre-install command, hmm, "rename_library_files.bat" (I cannot remember the name correctly). After renaming those three files manually, everything works OK. I have not yet opened a case for this; possibly there is something strange with those servers.

  • 17.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 03, 2017 07:08 AM

    Just got info from support that hub 7.91 should come out, possibly already this week.

  • 18.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 05, 2017 03:58 AM

    Okay, hub 7.91 is out. I'm now waiting for info on how it works in real tunneled environments before I start updating those.

  • 19.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 01:18 AM

    Kind of reassuring to know that I wasn't the only one with this issue. 7.9 caused queues to back up and the primary to become unresponsive: the OS was up but the robot went offline, causing all secondary queues to back up. Restarting the robot gets things moving again. I have since downgraded to 7.80HF21, which has improved things; however, I have noticed that the bulk size of the secondary queues randomly changes from 100 down to 1, causing some queues to back up.

  • 20.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Broadcom Employee
    Posted Apr 04, 2017 05:23 AM


    The HUB bulk size suddenly dropping from 100 to 1 is a known defect.

    The problem happens for a GET queue that has the default bulk size assigned (greyed out, not explicitly declared).

    The workaround is to declare a bulk size of 100 for the GET queue.



    Yu Ishitani

  • 21.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 09:09 AM

    Hello Yu,

    What exactly do we need to declare in the hub.cfg and in what section to fix this?

    I'm running hub v7.80HF22 and we've seen very weird issues with alarm flow just stopping since I upgraded to 8.4SP2. I see that my nas queue on the primary hub has bulk size = 1 atm. In my Queue tab, the nas definition shows Bulk Size = 60, grayed out.

    Do I have to manually edit it and set it to 100 in raw mode?

  • 22.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Broadcom Employee
    Posted Apr 04, 2017 09:15 AM

    Hi Daniel,


    nas will always be a bulk size of 1 as the nas probe is single threaded.

    Currently there is no way to make nas process multiple alarms at the same time.

  • 23.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 09:34 AM

    Thanks Gene. Btw the other queue I see at size 1 is the audit probe. I have it defined with bulk size 50, but is that also single threaded? Thank you.

  • 24.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Broadcom Employee
    Posted Apr 04, 2017 09:49 AM

    Hi Daniel,


    I should have said that nas processes alarms asynchronously rather than single threaded, as Garin so kindly pointed out.

    As to your question about audit:

    The audit probe has not been updated in quite some time, so currently yes, it only has a bulk size of one.

    This probe can have an issue in some environments with the new hub version.

    There is a KB article on this here.


  • 25.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 09:40 AM

    Because I'm a fan of semantics I need to offer a correction to the statement "nas probe is single threaded". Nas is definitely not single threaded - on Windows mine shows 23 threads active. It does though process work that is sequential in nature. Consider what would happen if your nas queue was backed up, you had both an alert open and an alert close in that queue, and nas was able to read records out of that queue in something other than arrival order. Nothing good would happen.


    So, nas is required to process records out of its inbound queue one at a time in order to maintain the chronology of the events.
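    The ordering argument can be shown with a toy model in Python. This is not how nas is implemented; the event tuples and the function name are invented for illustration. It only demonstrates that replaying the same open/close pair in a different order leaves a different final state.

```python
def apply_events(events):
    """Replay (alarm_id, action) pairs in the given order.

    Returns the set of alarm ids still open after the replay.
    """
    open_alarms = set()
    for alarm_id, action in events:
        if action == "open":
            open_alarms.add(alarm_id)
        elif action == "close":
            # Closing an alarm that is not open is silently ignored
            open_alarms.discard(alarm_id)
    return open_alarms


in_order = [("disk-full", "open"), ("disk-full", "close")]
out_of_order = list(reversed(in_order))

print(apply_events(in_order))      # set(): the alarm ends up closed
print(apply_events(out_of_order))  # {'disk-full'}: the alarm is stuck open
```

    In the reversed replay the close arrives before there is anything to close, so the later open leaves an alarm that nothing will ever clear.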


    That does not mean that when nas is processing these events it's not also doing other things. If you have a scheduled job for instance it will run simultaneously with reading events from queue. 


    Also consider what happens when a probe crashes. If it has not in some way cached that block of records so that it can reprocess on startup and also is able to figure out what was partially done or completed or not started then you lose the whole block of data. If you read one at a time then your worst case loss is the content of that single message. And since it caused the crash you probably don't want to keep retrying it over and over and repeating the crash.


    Regarding alarm flow stopping, is that happening in the alarm_enrichment part of nas or the nas probe itself? For myself, every upgrade has required me to increase the amount of memory allocated for alarm_enrichment. My current setting starts with 2GB RAM and the limit is set to 4GB. 



  • 26.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 10:38 AM

    Garin, my issue was that the alarm_enrichment probe would stop processing anything. We would then notice that, hey, there had been no alerts for 20 minutes, and have to do a full stop/start on the nas probe to get it back up and running. I have the memory on that probe set to 2GB max and, following the support case, also added the options for the alarm_enrichment probe to automatically restart itself if it gets to 90% of the allocated 2GB. But I've hit situations where it just stops processing while it's only using 100MB of memory. It never restarted, so it's a full stop/start again to fix.

    Right now running the nas v4.94 version and with these options specified on this page

    Tech Tip: Alarm Enrichment probe not processing alarms 

    but only the following set in the nas:


    lower_memory_usage_threshold_percentage = 0.90

    upper_memory_usage_threshold_percentage = 0.90

    memory_usage_exceeded_threshold = 1


    Hub, <hub>

    post_max_age = 60
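    For a sense of scale, here is a short Python sketch of the arithmetic behind a 0.90 threshold on a 2GB allocation. It assumes the restart is triggered purely by memory use crossing that fraction, which is how this post describes it; the function names are hypothetical and this is not the probe's actual logic.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def restart_threshold_bytes(max_bytes: int, fraction: float) -> int:
    """Memory use (in bytes) at which a fraction-based restart would trigger."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(max_bytes * fraction)


def would_restart(used_bytes: int, max_bytes: int, fraction: float) -> bool:
    """True if used_bytes has reached the restart threshold."""
    return used_bytes >= restart_threshold_bytes(max_bytes, fraction)


limit = restart_threshold_bytes(2 * GIB, 0.90)
print(f"restart at about {limit / GIB:.2f} GiB")
print(would_restart(100 * MIB, 2 * GIB, 0.90))  # False: a stall at 100MB never trips it
```

    Under that assumption, a stall at around 100MB of use is nowhere near the trigger, which would be consistent with the probe never restarting on its own in that situation.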

  • 27.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 12:05 PM

    Does sound a lot like what I was facing. The way it was explained to me was that when the outstanding queue of events for alarm_enrichment gets backed up, it moves from processing blocks of data to processing single pieces. The justification for this logic escaped me when it was originally explained and I'm sure that it hasn't improved since. The trick to keeping things working was to make sure that alarm_enrichment never got to the point where it determined it was backing up - doing that was a mix of making sure the block read size was big enough (but not too big), that there was way more than what appeared to be enough RAM, and adjusting the memory usage thresholds. On top of that, I have a logmon probe that watches the size of the alarm_enrichment probe and boots it if it's too big.


    The other thing I've noticed is that alarm enrichment gets progressively slower over time. Periodic restarts can be therapeutic for it apparently.



  • 28.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Broadcom Employee
    Posted Apr 04, 2017 09:36 AM

    Hello, Daniel.

    I'm sorry for confusing you.

    Your problem with NAS (probably alarm_enrichment) is not related to my post.

    The defect in my post affects only HUB-to-HUB subscribers (a GET queue in one HUB subscribing to an ATTACH queue in the other HUB).

    Due to the defect, I suggest that you specify bulk_size=100 for GET queue definition in hub.cfg
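    If you have many hubs to audit for this, the check can be sketched in Python as below. The dictionary layout is a hypothetical, already-parsed view of queue definitions, not the real hub.cfg format, and the queue names are made up. The sketch only flags GET queues that have no explicit bulk_size.

```python
def queues_missing_bulk_size(queues):
    """Return the sorted names of 'get' queues without an explicit bulk_size.

    `queues` maps queue name -> settings dict (hypothetical parsed form).
    """
    return sorted(
        name
        for name, settings in queues.items()
        if settings.get("type") == "get" and settings.get("bulk_size") is None
    )


# Invented sample data for illustration only
sample = {
    "get_from_remote_a": {"type": "get"},
    "get_from_remote_b": {"type": "get", "bulk_size": 100},
    "attach_alarms": {"type": "attach"},
}

print(queues_missing_bulk_size(sample))
```

    Any queue it lists is a candidate for the explicit bulk_size=100 setting suggested above.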



    Yu Ishitani

  • 29.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 10:11 AM

    Thanks for all the great info in these replies!  To the CA guys on this board, what is CA's plan to fix these issues?

  • 30.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 12:16 PM

    Let me know if anyone has faced any issues with device discovery. I have integrated with Spectrum, and the upgrade must not cause any impact there.

  • 31.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 12:29 PM

    Depending on what version of discovery you are coming from, this version has a bunch of new rules and flexibility in the correlation process. Everyone seems to be learning how to use it and not everything is as easy as it would be hoped. If you are unfamiliar with the new rules format you will probably find the documentation unsatisfying.


    The issue I was facing with the old discovery is that it was unable to recognize information from net_connect as belonging to a robot and so you'd get multiple entries in cm_computer_systems, one for the host and one for the device. This then broke anything that assumed robot name would be unique in cm_computer_systems.


    The new version allows one to accommodate this to some extent, but what I am seeing now is that it seems to oscillate between the several device IDs as to which is the master.


    I've not had much success in figuring out why.



  • 32.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 01:18 PM

    Upgraded from 8.4 SP2 to 8.5.1.


    Issues experienced

    • Linux HUBs running v7.90 were recycling randomly. Resolved by going back to v7.80HF22.
      • Primary HUB has NAS key tuning applied for memory and post_max_age.
    • During the upgrade, AIX robots were upgraded to robot v7.80HF21. At the time, robot v7.90 was not showing as available due to the lingering support.nimsoft.com access issues. v7.80HF21 was pulled from the hotfix site after development reported that it had not been tested on AIX. Resolved by re-installing the AIX robots with v7.90.
    • Fresh CABI deployment ~ Do not disable superuser as stated in the log (awaiting development to fix this log entry in addition to removing hashed password strings). Disabling superuser will result in having to drop the CABI DB tables and re-deploy. Ensure ACLs are separated for LDAP users versus nimbus users.
    • USM failed alarm view produced "dispatcher" error prompt referring to ems. This was caused by an earlier version of Spectrum Gateway probe causing corruption of EMS content. Resolution path was to update to Spectrum Gateway v8.5.1 and deactivate > delete > re-deploy EMS.
    • Robot v7.90 on Windows robots intermittently failing to rename files. Most robots updated while others failed.

  • 33.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Posted Apr 04, 2017 08:06 PM

    I also had a couple of secondary hubs have issues with discovery agent. I get the following error when starting discovery in USM -

    "An error occurred while starting discovery on .
    Please check the agent status and try again." The discovery still goes on to run however.

    If I delete a device using the remove_master_devices_by_cskeys method, it doesn't get rediscovered (even though it is a robot connected to the hub). After running the discovery in USM, it finds the device but doesn't link its QOS.

  • 34.  Re: 8.4 -> 8.5.1: Any "gotchas"?

    Broadcom Employee
    Posted Apr 04, 2017 09:56 PM