DX Unified Infrastructure Management

  • 1.  Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Oct 23, 2014 06:10 PM
    Over the past three years, we have seen this happen very infrequently, and it would usually resolve itself after about 10 minutes. However, since we upgraded to NMS 7.6 (running NAS version 4.37), it occurs much more frequently and the system does not recover by itself. Alarm assignments, acknowledgements, and new alarm updates stop working in Infrastructure Manager after about 14-18 hours of runtime. To fix it, we have to restart several hubs in our infrastructure, including the primary hub. Users connect to a hub that is linked to the primary hub via a tunnel (the IM clients are 3 hops away from the primary hub). Has anyone else experienced these symptoms?


  • 2.  Re: Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Oct 31, 2014 05:40 PM

    You should see subscriptions to the message bus for the IM instances that are running, and they should have message counts associated with them. Do you see those message counts increasing?
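    One rough way to answer that question is to note the message count for each subscription twice, a few minutes apart, and compare the two snapshots. The sketch below assumes the counts were copied by hand into dictionaries; the queue names and numbers are made up for illustration, and nothing here calls a Nimsoft API.

```python
def stale_subscriptions(before, after):
    """Return the names of queues whose message count did not increase.

    `before` and `after` map a queue name to the message count seen at
    two different times. Queues missing from `before` are skipped because
    there is nothing to compare them against.
    """
    stale = []
    for name, count_after in after.items():
        count_before = before.get(name)
        if count_before is not None and count_after <= count_before:
            stale.append(name)
    return sorted(stale)


# Hypothetical counts noted by hand from two looks at the subscription list.
before = {"t_14": 1200, "t_15": 980}
after = {"t_14": 1200, "t_15": 1042}
print(stale_subscriptions(before, after))
```

    A queue that shows up in the result on a busy system is a candidate for a stuck IM subscription; on a quiet system, a flat count may simply mean no new alarms arrived.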

     

    I just did some testing with hub 7.61 and 7.60, and I had major problems with messages queuing. I had to revert to our previous version until we can spend the time to troubleshoot.



  • 3.  Re: Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Oct 31, 2014 07:46 PM

    Thanks for the info Keith.

     

    We have not looked directly at the t_NN (e.g. t_14) queues on the primary hub that get created by Infrastructure Manager subscriptions.

     

    We've been working closely with development and support from Nimsoft to understand and resolve this issue.

     

    Some of it seems to be tunnel and messaging related.  I posted this mostly to see if anyone else was experiencing similar behaviors with hub version 7.61.



  • 4.  Re: Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Nov 18, 2014 08:15 PM

    ^+100

    I have had this issue ever since joining my current company. Nimsoft IM just occasionally hangs for us. When things go bad, it goes into a busy, waiting state: my mouse cursor changes from the regular pointer to the spinning wait cursor. When I'm using IM, it will just go into that state and stay there for a period of time.

    Whenever this happens, I usually have to jump onto either the Primary Hub or one of our Sub-Hubs and cycle the UIM services.

     

    I have an issue open with Nimsoft regarding this, but it got much, much better after I upgraded to robot version 7.62 and hub version 7.61. We were previously using v7.05, and the difference was night and day.



  • 5.  Re: Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Nov 18, 2014 09:05 PM
    We upgraded from 6.5 to 7.1 and then almost immediately to 7.6; we weren't on 7.1 for more than a day. I've heard of several issues with the 7.0 - 7.1 releases.

    Do you still have to cycle the UIM service on the primary hub since upgrading to hub 7.61?


  • 6.  Re: Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Nov 18, 2014 09:09 PM

    The behavior we are experiencing is with the primary hub GET queues to downstream hubs.  The message read rate slows significantly and after a few hours will just stop, even though the primary and downstream hubs appear healthy. To fix the situation, we have to cycle the UIM service on the primary hub - we don't have to restart anything downstream.
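    To put numbers on "slows significantly and then stops", the read rate can be worked out from a few timestamped readings of a queue's cumulative message count. This is only a sketch: the sample values, the function names, and the 50% "slowing" threshold are assumptions for illustration, and the readings would have to be collected by hand or by whatever tooling you already have.

```python
def read_rates(samples):
    """Return messages per second for each interval between samples.

    `samples` is a list of (seconds, cumulative_messages_read) tuples
    sorted by time.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("sample times must be strictly increasing")
        rates.append((c1 - c0) / (t1 - t0))
    return rates


def classify(rates, slow_fraction=0.5):
    """Label the latest interval relative to the first one.

    'stopped' when the latest rate is zero or negative, 'slowing' when it
    is below `slow_fraction` of the first rate, otherwise 'ok'.
    """
    if not rates:
        return "unknown"
    latest = rates[-1]
    if latest <= 0:
        return "stopped"
    if latest < slow_fraction * rates[0]:
        return "slowing"
    return "ok"


# Hypothetical readings taken one minute apart.
samples = [(0, 0), (60, 600), (120, 900), (180, 900)]
print(classify(read_rates(samples)))
```

    Comparing against the first interval is a deliberately crude baseline; a longer history or a fixed expected rate would give fewer false alarms during quiet periods.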

     

    On a side note, a memory leak has been identified in hub version 7.61. There is a hot fix for it, but if you're not experiencing the memory leak, you may want to wait until a formal release (not a hot fix) is published to the community.

     



  • 7.  Re: Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Nov 19, 2014 12:23 AM

    That sounds a lot like what we saw in our dev environment while testing hub 7.61 and then 7.60. We finally downgraded back to hub 5.82, which was working great before we started that testing. Unfortunately, we are now running into a different issue with that hub version.



  • 8.  Re: Alarm console in Infrastructure Manager goes stale, stops updating

    Posted Nov 19, 2014 01:31 AM

    Things were at their best when we were running hub v7.61 and robot version 7.62 across the board. Then we hit the critical robot "Bad Build" defect, where robots would automatically unregister whenever the robot shut down.

    Since then we have downgraded to robot version v7.05 and are now running v7.61 of the hub. We are waiting for the next robot GA release, v7.63, before fully deploying again across the board. We need to do more in-house testing and verification.

     

    At the moment, we have to cycle a hub here or there maybe once every other week to get things back into shape, but it's much, much better than the v7.05 hub days.