DX Unified Infrastructure Management


robot.cfg disappeared from RHEL 7 servers

  • 1.  robot.cfg disappeared from RHEL 7 servers

    Posted May 16, 2018 09:24 AM

    Hi Guys,

    I received a robot inactive alert, and when I checked, I found that robot.cfg no longer exists on the system.

    Has anyone faced this issue in their environment?

    Kindly suggest.

    Regards,

    ImranK



  • 2.  Re: robot.cfg disappeared from RHEL 7 servers

    Posted May 16, 2018 06:17 PM

    Yes. It happens fairly often in my environment.

    Usually the cause can be traced to a failure of the robot to write the updated .cfg file. The code doesn't seem to check whether each step needed to rewrite the cfg file succeeds when the robot is updated or restarted, so if you have a full disk or a permissions issue, the file goes away.
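
    To illustrate the failure mode described above: a writer that truncates robot.cfg in place and then fails (full disk, permissions) leaves an empty or partial file behind. A common defensive pattern, sketched here in Python as an assumption and not as a description of how the controller actually writes its files, is to write to a temporary file in the same directory, fsync it, and only then atomically rename it over the original, so that any failure leaves the old file intact. The function name write_config_atomically is hypothetical.

```python
import os
import tempfile


def write_config_atomically(path: str, content: str) -> None:
    """Replace the file at `path` with `content`, never leaving a partial file.

    On any failure (for example a full disk or a permissions error) an
    exception is raised and the original file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(prefix=".cfg-", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            # fsync surfaces deferred write errors (e.g. ENOSPC) before the swap.
            os.fsync(tmp.fileno())
        if os.path.exists(path):
            # Keep the original permission bits (mkstemp creates files as 0600).
            os.chmod(tmp_path, os.stat(path).st_mode & 0o7777)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Remove the temp file; the original config was never modified.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

    Note that this sketch preserves permission bits but not file ownership, which matters if the robot runs under a non-root service account.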

     

    There's also a defect where sections of a cfg file get dropped. Usually I see it manifest as one or more sections missing from the cfg file rather than the entire contents being lost. This is supposed to be fixed in the latest robot version, but I haven't seen that to be true.
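
    One way to catch the dropped-section defect early is to compare the section layout of the live cfg file against a known-good copy. The sketch below assumes the usual UIM cfg layout of nested <section> ... </section> tags with key = value lines inside, and that you keep a reference copy somewhere; section_paths and missing_sections are hypothetical helper names.

```python
import re
from typing import List, Set

# An opening tag alone on a line, e.g. "<controller>" (not "</controller>").
_OPEN = re.compile(r"^\s*<([^/>][^>]*)>\s*$")
# A closing tag alone on a line, e.g. "</controller>".
_CLOSE = re.compile(r"^\s*</([^>]+)>\s*$")


def section_paths(text: str) -> Set[str]:
    """Return every nested section path, e.g. {"a", "a/b"} for <a><b></b></a>."""
    stack: List[str] = []
    paths: Set[str] = set()
    for line in text.splitlines():
        m = _OPEN.match(line)
        if m:
            stack.append(m.group(1).strip())
            paths.add("/".join(stack))
            continue
        if _CLOSE.match(line) and stack:
            stack.pop()
    return paths


def missing_sections(current: str, reference: str) -> List[str]:
    """Sections present in the known-good reference but absent from `current`."""
    return sorted(section_paths(reference) - section_paths(current))
```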

     

    -Garin



  • 3.  RE: Re: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 11, 2019 09:02 AM
    Hey all,

    We're seeing this regularly in our environment. After a patch cycle of 4000 Linux machines, we typically see around 400 machines come up with an empty robot.cfg or missing parts of controller.cfg.

    Is there any known workaround or help? Having to clean up after our monitoring agents isn't a good place to be!
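
    Until the root cause is found, a post-patch check can at least identify the affected hosts quickly. A minimal Python sketch, assuming the default Linux install directory /opt/nimsoft/robot (adjust for your environment) and assuming a healthy robot.cfg always contains a <controller> section; check_robot_cfg is a hypothetical name.

```python
import os

# Assumption: default Linux install location of the robot; adjust as needed.
DEFAULT_ROBOT_DIR = "/opt/nimsoft/robot"


def check_robot_cfg(robot_dir: str = DEFAULT_ROBOT_DIR) -> str:
    """Classify robot.cfg as 'missing', 'empty', 'no_controller_section', or 'ok'."""
    path = os.path.join(robot_dir, "robot.cfg")
    if not os.path.isfile(path):
        return "missing"
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    if not text.strip():
        return "empty"
    # Assumption: a healthy robot.cfg always has a <controller> section.
    if "<controller>" not in text:
        return "no_controller_section"
    return "ok"
```

    Anything other than "ok" can be reported by whatever post-patch job or configuration-management tool you already run, for example as a non-zero exit status.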


  • 4.  RE: Re: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 11, 2019 09:58 AM
    400 systems all hitting 'if you have a full disk or a permissions issue it goes away' seems like a lot. I would add to that list any application that restricts access to the robot directory. Ideally the complete nimsoft directory tree should be excluded from applications like that, such as a virus scanner.

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 5.  RE: robot.cfg disappeared from RHEL 7 servers

    Broadcom Employee
    Posted Jul 11, 2019 12:05 PM
    Adding more detail to David Michael's comments, basically you need to follow these steps:

    -> Anti-Virus Protection and Nimsoft
    https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=47887

    ------------------------------
    Senior Support Engineer
    Broadcom
    ------------------------------



  • 6.  RE: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 12, 2019 08:24 AM
    Even when we had disabled virus scanning, we still had some systems that would lose their cfg files. So we enabled audit, not because we wanted auditing but because it kept the last x copies of the config, which made recovery easy. Audit also meant that we didn't have to remember to take a copy of the config whenever we changed it.

    ------------------------------
    Knows a little about UIM/DXim, AE, Automic
    ------------------------------



  • 7.  RE: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 12, 2019 08:24 AM
    Hey all,

    I've confirmed with our Linux teams that we do not run antivirus or backup services on our hosts, so unfortunately the root cause of this behavior is still unknown. We do run UIM as a service account (not root). This is mandated by our infosec team, so we have no ability to change it, but I wonder whether that might be related.


  • 8.  RE: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 18, 2019 10:47 AM
    I can similarly confirm that there is no interaction from any third party software when this truncation/corruption/loss happens.

    And the 10% damage rate is roughly what I saw when using hub and controller versions from the pre-hub-7.72 era. I have seen a significant reduction in the number of occurrences using the 7.95 controller and 7.93 hub, but not to the point of eliminating the problem: probably 1 occurrence in 100 instead of 1 in 10.

    It feels like there's a race condition that occurs when package updates are applied, mainly because, in addition to this cfg file corruption, I also see failures to complete the installation of all package files. Typically files get staged in robot/pkg/temp, then the new files get copied to the destination with a .new extension, then the existing files get renamed with a .old extension, then the .new files get renamed to the actual final name, and finally the .old files are removed.

    It is very common to find the .old files still in place, less common to find the .new files, and pretty common to find the files still in the temp directory with the existing file renamed.

    Usually one can fix this by copying the .new files manually to the correct names or removing the .old extension from the existing files. 
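
    The manual fix above can be scripted. The sketch below follows the install sequence exactly as described in this post (.new copy, .old rename, final rename, .old removal); that sequence comes from observation, not from documentation of the installer, so treat the rules as assumptions and run in dry-run mode first. If the final file is missing, a leftover .new is promoted, otherwise a leftover .old is restored; leftovers next to an existing final file are only flagged for review. plan_repairs and apply_repairs are hypothetical names, and files still sitting in robot/pkg/temp are not handled.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    kind: str                  # "rename" or "review"
    src: str
    dst: Optional[str] = None
    note: str = ""


def plan_repairs(directory: str) -> List[Action]:
    """Plan fixes for leftover .new/.old files in one directory (not recursive)."""
    names = set(os.listdir(directory))
    bases = sorted({n[:-4] for n in names
                    if n.endswith((".new", ".old")) and len(n) > 4})
    actions: List[Action] = []
    for base in bases:
        final, new, old = base, base + ".new", base + ".old"
        final_p, new_p, old_p = (os.path.join(directory, n) for n in (final, new, old))
        if final not in names and new in names:
            # Interrupted after the old file was moved aside: promote the new file.
            actions.append(Action("rename", new_p, final_p))
            if old in names:
                actions.append(Action("review", old_p, note="previous version; remove once verified"))
        elif final not in names and old in names:
            # Only the previous version survives: restore it.
            actions.append(Action("rename", old_p, final_p))
        else:
            # The final file exists, so leftovers are only flagged for a human.
            for leftover, leftover_p in ((new, new_p), (old, old_p)):
                if leftover in names:
                    actions.append(Action("review", leftover_p, note=f"{final} exists; check manually"))
    return actions


def apply_repairs(actions: List[Action], dry_run: bool = True) -> List[str]:
    """Carry out (or, in dry-run mode, only describe) the planned renames."""
    log: List[str] = []
    for action in actions:
        if action.kind == "rename":
            log.append(f"rename {action.src} -> {action.dst}")
            if not dry_run:
                os.rename(action.src, action.dst)
        else:
            log.append(f"review {action.src}: {action.note}")
    return log
```

    Calling apply_repairs(plan_repairs(path)) on an affected directory only prints the plan; pass dry_run=False once the plan looks right.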

    This happens pretty frequently with hub and controller updates. I've never seen it happen with probes like CDM, where no robot restart is involved.

    So my suspicion is that the controller is getting restarted too soon, essentially while it is still processing the cfx file or moving the install files around.

    Just a guess at this point; I would love to have confirmation. But if I knew ahead of time which system was going to break, so I could grab the logs before and after it happened, I'd have started playing the lottery a long time ago and be living on a nicely supplied tropical island far from the concerns of modern-day IT.


  • 9.  RE: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 18, 2019 11:06 AM
    Out of sheer curiosity... are you all using MCS for your package deployments, super packages through IM, or some other method of managing your deployments?

    Just wondering if there is any delivery or management method in common with the servers that lose their configs.


  • 10.  RE: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 18, 2019 01:08 PM
    We are deploying using distsrv - mainly either drag-n-drop via IM or the job_add callback direct to distsrv.

    MCS is way too inflexible for our needs, so it isn't used except to periodically test whether a newer version works better.

    The "new" automated deployment engine accessed via Admin Console suffers from the horrendous usability issues that Admin Console imposes, and in our environment it was found to be far slower than distsrv (minutes to deploy a package instead of seconds) and maybe an order of magnitude less reliable.

    We do use super packages extensively but not for the robot-update and hub packages because of the indeterminate length of time after a restart until the hub routing table gets updated with the new port information from the restarted robot. 



  • 11.  RE: robot.cfg disappeared from RHEL 7 servers

    Broadcom Employee
    Posted Jul 19, 2019 08:40 AM
    Hi IMrankhcl,

    What Robot version are you using?  If you are on v7.96 I would highly recommend downloading 7.97HF3 https://techdocs.broadcom.com/us/product-content/recommended-reading/technical-document-index/ca-unified-infrastructure-management-hotfix-index.html?r=2&r=1

    ------------------------------
    Customer Success Architect
    Broadcom
    ------------------------------



  • 12.  RE: robot.cfg disappeared from RHEL 7 servers

    Posted Jul 24, 2019 03:40 AM
    We are currently using 8.5.1 and are about to make the jump into the brave new world of 9.1.0S (hopefully 9.1.1S) so will update if we still see this problem on these problematic servers.

    ------------------------------
    Knows a little about UIM/DXim, AE, Automic
    ------------------------------