DX Unified Infrastructure Management

  • 1.  Monitoring loss on some robots/servers

    Posted May 09, 2023 12:10 PM

    Hi everyone, I am running into something very curious: I have repeatedly observed that some robots seem to lose monitoring. In this state no alarm is raised. I only notice it visually, because the robot no longer shows any active probes.


    I recover monitoring by redeploying robot_update and then redeploying cdm, processes, sqlserver, etc. After that, monitoring becomes active again.


    Any idea what is happening here, and how can I detect this behavior?



  • 2.  RE: Monitoring loss on some robots/servers

    Posted May 09, 2023 10:33 PM

    When the robot shuts down, the controller updates the list of probes that should be running by rewriting controller.cfg. If the drive is full or not writable, this file will get truncated or deleted. If the file doesn't exist when the robot starts back up, the robot will create a controller.cfg with a single "paragraph" (section) in it, for the controller. You'll then see only the controller in IM (Infrastructure Manager), and it will be the only process running (along with nimbus).
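    Since the reply names a full or non-writable drive as the trigger, one way to get an early warning is to check the robot directory periodically. The Python sketch below only illustrates that idea and is not a UIM tool. The function name `robot_dir_at_risk` and the 100 MB threshold are hypothetical, and the robot directory path depends on your installation.

```python
import os
import shutil
import tempfile

# Hypothetical threshold: treat less than 100 MB free as "at risk".
# Choose a value that suits your environment.
MIN_FREE_BYTES = 100 * 1024 * 1024


def robot_dir_at_risk(robot_dir: str, min_free: int = MIN_FREE_BYTES) -> list:
    """Return reasons why controller.cfg might not be rewritten cleanly.

    robot_dir is the directory that holds controller.cfg (install-specific).
    An empty list means no problem was detected.
    """
    reasons = []
    usage = shutil.disk_usage(robot_dir)
    if usage.free < min_free:
        reasons.append(f"low free space: {usage.free} bytes")
    if not os.access(robot_dir, os.W_OK):
        reasons.append("directory is not writable")
    else:
        # os.access can be misleading (e.g. with ACLs), so also try a real write.
        try:
            with tempfile.NamedTemporaryFile(dir=robot_dir):
                pass
        except OSError as exc:
            reasons.append(f"test write failed: {exc}")
    return reasons
```

    Run from a scheduler, a non-empty result could be turned into an alert before the next robot restart, rather than discovering the loss visually afterwards.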

    When this happens again, before doing anything else, check what's in controller.cfg. I bet it contains only that one entry. You can look at the file on a healthy robot now to see what should be there.
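    The check described above can be automated by comparing the section names in a robot's controller.cfg against a known-good copy. This is a minimal sketch, assuming each probe appears as a top-level `<name> ... </name>` block (the "paragraphs" mentioned in the reply). The function names are hypothetical and this is not an official parser for UIM .cfg files.

```python
import re

# Assumption: section headers sit on their own line, e.g. "<cdm>" / "</cdm>".
_OPEN = re.compile(r"^\s*<([^/>\s][^>]*)>\s*$")
_CLOSE = re.compile(r"^\s*</([^>]+)>\s*$")


def top_level_sections(text: str) -> list:
    """Return the names of top-level <section> blocks, in file order."""
    names, depth = [], 0
    for line in text.splitlines():
        if _CLOSE.match(line):
            depth = max(depth - 1, 0)
        else:
            m = _OPEN.match(line)
            if m:
                if depth == 0:
                    names.append(m.group(1).strip())
                depth += 1
    return names


def check_controller_cfg(current: str, reference: str) -> list:
    """Return sections present in the known-good reference but missing from current."""
    have = set(top_level_sections(current))
    return [s for s in top_level_sections(reference) if s not in have]
```

    A result such as `["cdm", "processes", ...]`, or a current file whose only section is `controller`, would match the truncated state the reply describes. Reading the files with `errors="replace"` (or as bytes) avoids failures on unexpected encodings.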

    If it gets truncated, you can recreate controller.cfg by copying it from a similarly configured robot. Put the copy in the changes subdirectory, delete all lines that start with magic_key, and then start the robot. The robot should move the file into the robot directory and delete it from the changes directory.
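    The recovery steps above can be scripted roughly as follows. This sketch assumes the robot is stopped while it runs (the reply says to start the robot afterwards) and that "changes" sits directly under the robot directory. `stage_controller_cfg` is a hypothetical helper. It works on raw bytes so the copied file's encoding is left untouched, and it ignores leading whitespace when matching `magic_key`, because .cfg lines are usually indented.

```python
from pathlib import Path


def stage_controller_cfg(template: Path, robot_dir: Path) -> Path:
    """Copy a known-good controller.cfg into <robot_dir>/changes, minus magic_key lines.

    template:  controller.cfg taken from a similarly configured robot.
    robot_dir: the target robot's installation directory (install-specific).
    """
    data = template.read_bytes()
    # Keep every line whose first non-blank text is not "magic_key".
    kept = [
        line
        for line in data.splitlines(keepends=True)
        if not line.lstrip().startswith(b"magic_key")
    ]
    changes = robot_dir / "changes"
    changes.mkdir(exist_ok=True)
    target = changes / "controller.cfg"
    target.write_bytes(b"".join(kept))
    return target
```

    After the file is staged, starting the robot should let it pick up the file from the changes directory as described in the reply.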