I was able to run the Admin Console (barely – at least it displayed data, but it was too slow to actually operate) in 8.0. I applied the 8.1 upgrade and now it doesn't work.
Specifically, it sometimes takes hours before service_host gets Tomcat to the point where it will respond to page load requests.
Jan 20 22:21:16:272 [tomcat, service_host] Jan 20, 2015 10:21:16 PM org.apache.catalina.startup.Catalina start
Jan 20 22:21:16:273 [tomcat, service_host] INFO: Server startup in 12115661 ms
Jan 20 22:21:16:273 [main, service_host] tomcat started.
Once the 12115661 ms (about 3.4 hours) have elapsed, you can finally get the main admin page that lists the Admin Console and IM links along with the install packages.
If you select the Admin Console link, the console paints and the "Please Wait" progress indicator comes up for almost exactly 120 seconds. That's suspicious in and of itself. After those 120 seconds, the right-hand side of the page displays "No hubs to display". All the buttons are functional, and you can use the search box, though it finds nothing because there are no hubs loaded.
Support spent a bunch of time looking at logs and tweaking parameters, to no avail. The logs are mostly clean of anything error-like, and everything that's supposed to be running apparently is.
So, any thoughts out there on what might be going on?
Hi Garin, I would love to help out, but we have not taken the step to 8.1 yet due to things like this. We want to wait a bit so that all the "free features" like this are sorted out. Good luck.
I really wanted to click the "Accept as Solution" button to your comment. That would have been so easy.
Unfortunately, the number of undesirable features in 8.0 rendered 8.1 a necessity. At least this Admin Console behavior is solid proof of why IM shouldn't be deprecated.
I seriously never thought I'd be saying that a year ago. Then I would have traded IM for a sharp stick in the eye. But at least IM is a consistent unpleasantness. Predictable to a fault. And there's a huge benefit to that.
Maybe the 120 seconds is the configured timeout for the gethubs request.
For what it's worth, I offer you moral support Garin.
Hi Garin, I don't know if it's going to help your situation, but: how many hubs do you have under your domain?
When I played around with the Admin Console in the past, I had an issue with it only showing 50 of the 100+ hubs we have in our domain. I found that I had to bump the default value up to 200 just to get the entire tree to load fully; otherwise I would have to wait for it to load them all up.
The article I found that helped explain how to change the default setting of 50 to X can be found here:
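To illustrate why a cap like that fails silently instead of erroring, here's a toy sketch of the behavior (get_hubs and max_results are made-up names for illustration, not the real API):

    # Toy illustration of a default result cap truncating the hub tree.
    # get_hubs() and max_results are hypothetical names, not the real API.
    hubs = [f"hub-{i}" for i in range(120)]   # a domain with 100+ hubs

    def get_hubs(max_results=50):             # default cap of 50
        return hubs[:max_results]             # anything past the cap never renders

    print(len(get_hubs()))      # 50  -> tree looks complete but is missing 70 hubs
    print(len(get_hubs(200)))   # 120 -> raise the cap above the hub count to see them all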
Another thing to possibly check when the service is starting up: if you can, go to SQL Studio and open a SQL Server Profiler trace on the NimsoftSLM DB. You can see which queries are running against it, and if any from your UMP box are SUSPENDED, you can narrow down what is causing the delay, if a SQL query is at the root of the issue.
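If Profiler is awkward to run, you can script the same check against the DMVs. A rough sketch, assuming pyodbc and Windows auth – adjust the server and credentials for your environment:

    # Rough sketch: list suspended queries currently running on NimsoftSLM.
    # Assumes pyodbc and Windows auth; swap in your own server/credentials.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=NimsoftSLM;Trusted_Connection=yes;"
    )
    sql = """
        SELECT r.session_id, r.status, r.wait_type, r.total_elapsed_time, t.text
        FROM sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE r.status = 'suspended'
        ORDER BY r.total_elapsed_time DESC;
    """
    for row in conn.cursor().execute(sql):
        print(row.session_id, row.status, row.wait_type, row.total_elapsed_time)
        print(row.text[:200])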
I've got just under 650 hubs and roughly 3500 robots.
I thought, based on support's comments in the past, that setting was for limiting the expansion under a hub and not for the hubs themselves.
Regardless, that number was bumped up anyway with no positive result.
As an aside, I never understood the approach of designing a UI to be fast by sacrificing the display of the very data the UI is intended to display. With limits in place, you are guaranteeing that the display will be wrong at least some of the time. With no limits, you guarantee that the display will be right, with performance corresponding to the volume of data displayed.
I'll try the SQL thing. I'm not seeing any indication of a query timeout in the probe logs, but maybe this is a Tomcat timeout hitting while populating the page, as opposed to service_host getting the data. And I'm not seeing any indication that SQL is using CPU cycles or disk I/O during the time the page is trying to populate.
Thanks for the suggestions - something to at least try.
No joy going the SQL Server route. There are three queries from service_host to get the list of hubs during the two minutes it takes the admin console to time out, but they all run in less than 11 ms and are issued periodically at roughly 30-second intervals. There are also roughly a hundred individual queries where the alive time and origin are queried for a single Nimsoft hub address. These seem to happen whether the query from the GUI is running or not, so I don't think they're related to the console.
And going further down this train of thought, I recall being told that service_host maintains the list of hubs internally based on information from discovery_server. Not sure of the validity of that recollection, though.
So, setting the service_host probe and Catalina memory options both to 8 GB and odata_max_results to 2500 gets the service_host probe past whatever wall the 4 GB and 750 settings were hitting. It still took 26 minutes to reach the point in startup where it responds to page load requests, but at least there are hubs and robots on my screen now. It also seems to have to keep about 3 GB of that allocated memory paged in – that wasn't in the release notes...
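For anyone else chasing this, here's roughly the shape of the changes. odata_max_results is the real setting discussed above; java_mem_max is my best guess at the heap key name, so treat this as a sketch and check your actual service_host configuration:

    # service_host raw-configure sketch - key names illustrative
    <setup>
       odata_max_results = 2500       # was 750
       java_mem_max = -Xmx8g          # hypothetical key name; heap was 4 GB before
    </setup>

    # Tomcat side: standard JVM heap flag via the Catalina environment
    CATALINA_OPTS="-Xmx8g"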
Still doesn't work, though, as any probe configuration operation fails with the suggestion that raw configure is the tool of choice.
Oh much maligned Infrastructure Manager, where art thou now?
Is anyone's instance of UIM 8.1 Admin Console working?
Mine is. I had to redeploy most of the components of adminconsole and upgrade ppm. I was getting hubs, but I had a weird bug where restarting a robot would cause all traffic through the hub to start slowing down, and the console would time out.
I also had to upgrade ppm to the latest release (newer than the one that ships with 8.1).
PPM has an infinite-recursion bug that causes issues when you try to configure ppm from adminconsole: it redelegates to itself forever and sucks up network connections and memory.
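To make that failure mode concrete, here's a toy model of the loop (pure illustration, not PPM's actual code): a node that can't configure a probe locally forwards the request to its delegate, but the delegate resolves back to the same node, so every hop holds another connection and stack frame until something gives.

    # Toy model of the redelegation loop (illustration only, not PPM code).
    LOCAL_PROBES = {"hub", "cdm"}       # probes this node can configure itself

    def configure(probe, delegate, connections):
        if probe in LOCAL_PROBES:
            return f"configured {probe}"
        connections.append(f"conn->{delegate}")         # another socket held open
        return configure(probe, delegate, connections)  # same node answers: loop forever

    conns = []
    try:
        configure("ppm", "this-very-host", conns)
    except RecursionError:
        print(f"gave up holding {len(conns)} connections open")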
PPM also has to run on all hubs. It's finally noted somewhere in the docs.
I didn't know it at the time, but mpse is a dependency of adminconsole, so you might want to redeploy that piece too. It also seemed to work a bit better once discovery caught up; I think it relies heavily on discovery data. When it can't find that data, it tries to discover things itself, which may not scale and likely breaks severely when ppm is not on all hubs. Just guesses, though.
Anyway, ppm on all hubs should fix the raw-configure errors for the probes admin console will support. There are still some other bugs where it'll log calls to internal web services that don't exist. Support also claims running more than one is unsupported now. Seems a little unreliable, no?
As near as I can guess, mpse is a nexus that either gets data from discovery or delegates requests to the ppm probes and returns the results to adminconsole. Might be worth bumping memory on that piece too. ppm seems not to need more memory unless you trigger the infinite recursion – then hold on.
Can I say "you **bleep**er" on the forum or is that too unprofessional?
I just checked back and mine is now saying "no hubs to display." I must have left it running long enough for the bad mojo to make it a no-go.
I can't even imagine trying to support it in this state. I'll probably be telling the team to wait a release or two and we'll try the new features and probes again.
Who the **bleep** knew that there was a mother**bleep**in' censor for some **bleep**ing choice words? Cool! Now I can stop holding back.
Maybe we're bumping up against the formerly unknown "Conservation of Mojo" law of the Universe.
My admin console, while unusably slow, hasn't exhibited a failure in the past 30 minutes. I was even able to open the config window for CDM.
If you need to use Admin Console, I can shut mine down so that there's an imbalance in the Mojo force created and maybe it'll go back to residing in your instance.
Setting up a new environment and using AC for the first time. With only 4 hubs it's been working all right, but there are really only <10 robots at this point. Having looked at all this baselining, dynamic thresholds, etc., I agree with Ray here, and I'm definitely leaving all that "for the future", if they ever get it into shape.
I think it worked! Mine's back! A little sluggish, but working.
I did correlate some errors right around some gethub stuff in probes/service/service_host/catalinaBase/logs/ids_services:
01232015 12:53:26,158 [80-exec-37] ERROR t.ids_core.NisDomain - Failed to determine origin.
I also restarted the nis_server, but I don't know if that had anything to do with it. I have around 330 hubs.
Are you running a lot of remote hubs over tunnels with recent 7.61 or 7.63 hubs? I ran into a bug where increased tunnel performance was causing spontaneous resets of sessions when relaying data back over a tunnel. It was triggered when throughput over the tunnel was better than the local speed back to the requester. There's a specific log message on the receiving side of the tunnel; I wrote something up in defects about it. If you turn up logging and find that message, you're hitting it. The fix is in an unreleased 7.62 build, but it missed the cutoff for 7.63, so 7.63 lacks the tunnel patch (and that 7.62 build lacks the other fixes in 7.63).
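My mental model of the mechanism, for what it's worth – a toy sketch, not actual hub code: the receiving end buffers data it relays to the local requester, and when the fast tunnel fills that buffer quicker than the slower local leg drains it, the relay drops the session instead of applying backpressure.

    # Toy model: fast tunnel producer vs. slow local consumer (illustration only).
    from collections import deque

    BUFFER_LIMIT = 64     # relay buffer slots
    TUNNEL_RATE = 10      # chunks per tick arriving over the (fast) tunnel
    LOCAL_RATE = 3        # chunks per tick the (slower) local leg can drain

    buffer, tick = deque(), 0
    while len(buffer) <= BUFFER_LIMIT:
        tick += 1
        buffer.extend("chunk" for _ in range(TUNNEL_RATE))
        for _ in range(min(LOCAL_RATE, len(buffer))):
            buffer.popleft()
    print(f"tick {tick}: buffer overflowed -> session reset")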
I noticed the admin console seems to want to scan all hubs for detailed information when it starts up, and maybe after sitting for a while. If you're hitting that bug, it might trigger a severe outage in admin console. The effect on Infrastructure Manager was occasional generic communication errors when loading probes with large configs.