DX Infrastructure Manager


Server freezing with robot onboard

  • 1.  Server freezing with robot onboard

    Posted 03-13-2018 11:18 AM

    Hi All,

    We have an issue where our security software hits 100% CPU and hangs our virtual servers, leaving them in a zombie state. The server is still on the network and pingable, so the IP device or UIM host model does not go critical in Spectrum; at the same time, all robot probes stop reporting for lack of resources. A device can stay in this state for hours without being discovered. When we finally hear about it and open the IM console, the device shows green, but when we click on a probe we get an error message saying that communication with the robot has been lost. What alternatives do we have for identifying when this is happening?

     

    TIA



  • 2.  Re: Server freezing with robot onboard

    Posted 03-13-2018 11:45 AM

    Usually in this case we get a robot inactive alarm. Are you getting any alerts for those probes?

    FYI: In Spectrum, please change the "Value_When_Red" attribute to 7 (for Pingable models); this will help you get alarms. The default is 0 on a Pingable, which causes the model to go to the initial state, so you will not get device down alerts.



  • 3.  Re: Server freezing with robot onboard

    Posted 03-13-2018 04:16 PM

    Yeah, we're not getting a robot inactive alarm in UIM, and the server is still pinging, so Spectrum is all green.



  • 4.  Re: Server freezing with robot onboard

    Posted 03-13-2018 11:51 AM

    Because the robot status icon is green, it seems the robot is still communicating with the hub. This can be verified via Performance Reports Designer by checking whether QoS metrics exist for the time the system was in the problem state.

     

    The point is that if the robot is still communicating, then CPU usage can be monitored and an alarm sent. cdm should be able to handle that.



  • 5.  Re: Server freezing with robot onboard

    Posted 03-13-2018 01:09 PM

    Yeah, you would think so, but that’s not what’s happening. When we try to open a probe, even the controller, we get an “unable to reach controller, communication error” popup.



  • 6.  Re: Server freezing with robot onboard

    Posted 03-13-2018 01:41 PM

    Does this happen only when the CPU is at 100%? Whenever the CPU is normal, can you reach the controller and confirm that QoS is displayed in USM?



  • 7.  Re: Server freezing with robot onboard

    Posted 03-13-2018 04:17 PM

    Yes, no performance data. Clearly the probes do not have the resources needed to function.



  • 8.  Re: Server freezing with robot onboard

    Posted 03-13-2018 03:44 PM

    Maybe this can help with self-monitoring:

     

    Best Practices for Monitoring CA UIM (self-health - CA Knowledge 

     

    Kind Regards,

     

    Alex Yasuda
    Sr. Support Engineer



  • 9.  Re: Server freezing with robot onboard

    Posted 03-13-2018 04:24 PM

    Yes, we thought of using net_connect to monitor this; however, it would become tedious and labor-intensive when dealing with 2000 virtual servers.
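
    For anyone weighing the scale concern above, here is a minimal sketch (not a UIM feature) of running a port check against many hosts in parallel from one machine. It assumes the robot controller listens on TCP port 48000, which is the usual default but should be verified for your install; the function names are hypothetical. Note that a completed TCP handshake only shows that the OS accepted the connection, so a starved controller may still pass this check.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# Assumption: default robot controller port; change it if your robots use another.
CONTROLLER_PORT = 48000


def port_answers(host, port=CONTROLLER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def find_unresponsive(hosts, port=CONTROLLER_PORT, timeout=3.0, workers=50):
    """Check many hosts in parallel; return those whose port did not answer."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda h: port_answers(h, port, timeout), hosts))
    return [h for h, ok in zip(hosts, results) if not ok]
```

    With 50 workers and a 3-second timeout, a list of 2000 hosts should finish in roughly two minutes in the worst case, so the check could run on a schedule from a single box.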



  • 10.  Re: Server freezing with robot onboard

    Posted 03-13-2018 08:58 PM

    Please go through this thread; it has some Lua scripts (check_robot_probe.lua, for example) that you can use to detect non-responding probes:

     

    Detecting Malfunctioning Robots 
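
    The general idea behind this kind of detection can be sketched as a staleness check: flag any robot whose last report is older than some threshold, regardless of whether the host still answers ping. The Python below is only an illustration of that logic and is not the content of check_robot_probe.lua; the function name, the 15-minute threshold, and the way the last-seen timestamps are collected are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_robots(last_seen, now=None, max_age=timedelta(minutes=15)):
    """Return the sorted names of robots whose last report is older than max_age.

    last_seen maps a robot name to the timezone-aware datetime of its most
    recent report (how you gather these is outside this sketch).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


# Example with made-up robot names and times.
now = datetime(2018, 3, 13, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "vm-a": now - timedelta(minutes=5),   # reported recently
    "vm-b": now - timedelta(hours=2),     # silent for two hours
}
print(stale_robots(last_seen, now=now))
```

    Because this keys on missing data rather than on reachability, it would catch the pingable-but-frozen case described at the top of the thread.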



  • 11.  Re: Server freezing with robot onboard

    Posted 10-10-2018 10:50 AM

    Hi Phani,

    I am not a script person; are there any step-by-step instructions that describe how to set this up?



  • 12.  Re: Server freezing with robot onboard

    Posted 03-16-2018 10:07 AM

    Patrick, let us know how you are doing with the last recommendation.  



  • 13.  Re: Server freezing with robot onboard

    Posted 04-06-2018 04:39 AM

    Hi Patrick,

    We are marking this question as assumed answered; if you need further help, let us know!