DX Unified Infrastructure Management


QOS not coming from robots in UIM database after CU2 update

  • 1.  QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    Hi all,

    I recently upgraded to CU2, and since then all robots reporting to the primary hub have stopped sending CDM QoS.

    I applied the CU3 update, but that hasn't resolved it.

    I reported it to support; they suggested running PU and executing "plugin-metric-correction", which didn't do anything.

    I tried removing the robots from the MCS group, letting it update, and adding them back in: no good.

    I tried manually removing cdm and letting MCS redeploy it: no good.

    Did anyone else experience this when moving to CU2?

    I'm running MCS 23.4.3.3, which is apparently the patched version.

    The robots I'm testing with are running 23.4.3 with CDM 7.21.



  • 2.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 23 days ago

    Hi Sam,

    I'll open a case for you and attach cdm-8.01-T4-20250620.zip (which has a bunch of fixes) for you to try. This should help; if not, we can hold a Zoom meeting.

    Steve



    ------------------------------
    Steve Danseglio
    Senior Principal Support Engineer (Technical Support Engineer 5)
    Broadcom Software-IMS Division
    UIM Certified Expert
    KCSv6 Practices Certified
    Certified Customer Success Manager (CCSM) Level 1
    ------------------------------



  • 3.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 23 days ago

    In the meantime, here is the cdm version with a bunch of fixes, including fixes for missing QoS metrics.



    ------------------------------
    Steve Danseglio
    ------------------------------

    Attachment(s): cdm-8.01-T4-20250620.zip (zip, 21.40 MB)


  • 4.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    Brilliant, thanks Steve! I'll give it a go now :)




  • 5.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 23 days ago

    Troubleshooting MCS and the configuration_reader_service (CRS):

    1. Using Raw configure mode, set the following spooler parameters:

    debug = 5
    logsize = 200000 (the logsize parameter will likely need to be added)

    2. Run the 'metric_plugin_get_profile_policy_checksums' callback against the spooler on the problem robot and capture the output.

    3. Using Raw configure mode for the configuration_reader_service on the hub the robot reports to, set the following parameters under setup:

    dumpDataFromRobot = true
    loglevel = 5
    logsize = 30000

    Note: Allow this to run for about 15 minutes.

    4. After the 15 minutes are up, capture the following data for support:

    (from the problem robot)
    - output of the 'metric_plugin_get_profile_policy_checksums' callback
    - <probe_name>.cfg
    - the complete ..nimsoft/plugins directory and its contents
    - _spooler.log, spooler.log, spooler.cfg (from the robot directory)

    (from the hub it reports to)
    - _configuration_reader_service.log, configuration_reader_service.log, configuration_reader_service.cfg
    - ..configuration_reader_service/robot_logs/<deviceid>_req.txt
    - ..configuration_reader_service/robot_logs/<deviceid>_resp
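For the robot-side files in step 4, a small script can bundle everything into one zip before it goes to support. This is a minimal sketch, not a Broadcom tool: it assumes the robot files live under a Nimsoft home directory (for example C:\Program Files (x86)\Nimsoft, the path seen in the spooler logs later in this thread), that spooler.cfg and its logs sit in the robot subfolder, and that you pass the probe's .cfg path yourself. The callback output and the hub-side CRS files still have to be captured separately.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def collect_robot_files(nimsoft_home: Path, probe_cfg: Path, out_zip: Path) -> list[str]:
    """Zip the robot-side files from step 4 and return the member names written.

    Assumptions (not from Broadcom docs): spooler files live in
    <nimsoft_home>/robot and plugins live in <nimsoft_home>/plugins.
    Missing files are skipped instead of raising.
    """
    robot_dir = nimsoft_home / "robot"
    files = [robot_dir / name for name in ("_spooler.log", "spooler.log", "spooler.cfg")]
    files.append(probe_cfg)  # e.g. the cdm.cfg of the problem probe

    # The complete plugins directory, collected in a stable order.
    plugins_dir = nimsoft_home / "plugins"
    if plugins_dir.is_dir():
        files.extend(p for p in sorted(plugins_dir.rglob("*")) if p.is_file())

    written: list[str] = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            if not path.is_file():
                continue  # e.g. the rolled-over _spooler.log may not exist yet
            try:
                arcname = path.relative_to(nimsoft_home).as_posix()
            except ValueError:
                arcname = path.name  # file outside nimsoft_home: store it flat
            zf.write(path, arcname)
            written.append(arcname)
    return written
```

Example call, with both paths being assumptions about a default Windows install: `collect_robot_files(Path(r"C:\Program Files (x86)\Nimsoft"), Path(r"C:\Program Files (x86)\Nimsoft\probes\system\cdm\cdm.cfg"), Path("robot_support_bundle.zip"))`. Skipping missing files keeps the script usable on robots where the rolled-over log has not been created.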



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 6.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    Hi Steve,

    I'll have to wait for it to be attached to the case; I can't download zip files from community.broadcom.com.

    If it helps, my case is logged with Arrow under ticket ref 22188.




  • 7.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    If you can attach it to this KB article, I think I may be able to download it from there:

    https://knowledge.broadcom.com/external/article?articleNumber=382607




  • 8.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 23 days ago

    Hi Sam,

    I've attached cdm-8.01-T4-20250620.zip to the KB article along with details of what was fixed in each test fix.

    Please let me know how it goes.

    Steve



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 9.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    Perfect, thanks Steve. Got it and testing now.




  • 10.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    I've had it running for 15 minutes or so and am not seeing any QoS from my server yet.

    I'm seeing this in the spooler log:

    Jul  2 22:11:33:265 [17016] 0 spooler: (load_bus_plugin) Loaded plugin (C:\Program Files (x86)\Nimsoft/plugins/plugin_metric/plugin_metric.dll) successfully. 
    Jul  2 22:11:33:280 [17016] 0 spooler: Attempt to create malformed metric object 
    Jul  2 22:11:33:285 [17016] 0 spooler: Metrics post-processing plugin [23.4.3.1309 Dec 31 2024] successfully loaded and configured 
    Jul  2 22:11:33:285 [17016] 0 spooler: policy_profile_pull_thread  - start worker thread 
    Jul  2 22:11:33:305 [17016] 0 spooler: ########## START ########## 
    Jul  2 22:11:33:305 [17016] 0 spooler: Robot Spooler 23.4.3 [Build 23.4.3.1642, Dec 31 2024] 
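When checking several robots, it can help to pull out just the "Attempt to create malformed metric object" lines, which appear right after the plugin loads in this excerpt and again in the excerpt from reply 31. The thread does not establish whether that message is the cause; this is a minimal sketch for spotting it, and the line layout it parses is inferred only from the excerpts posted here.

```python
import re

# Line layout inferred from the spooler.log excerpts in this thread:
# "<Mon> <day> <HH:MM:SS:mmm> [<thread>] <level> spooler: <message>"
SPOOLER_LINE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}:\d{3}) "
    r"\[(?P<thread>\d+)\] (?P<level>\d+) spooler: (?P<msg>.*?)\s*$"
)
MALFORMED = "Attempt to create malformed metric object"


def malformed_metric_events(lines):
    """Return 'Mon day time' stamps of lines reporting a malformed metric object."""
    stamps = []
    for line in lines:
        m = SPOOLER_LINE.match(line)
        if m and MALFORMED in m.group("msg"):
            stamps.append(f"{m.group('mon')} {m.group('day')} {m.group('time')}")
    return stamps
```

Pass it an open file, e.g. `malformed_metric_events(open(path, errors="replace"))`, where `path` points at the robot's spooler.log. Lines that do not match the assumed layout are skipped rather than treated as errors.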




  • 11.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    This is from the MCS log when I attempt a plugin repair specifying a particular server:

    Jul 02 22:26:48:490 [attach_socket, mon_config_service] PluginRepairController.repairPlugin:94:    Received Plugin metric correction request for devices {SERVER-NAME} , csIds {null} , groupIds {null} , processAllDevicesFlag {null}
    Jul 02 22:26:48:494 [PluginRepairController[repairPluginMetric]-4754, mon_config_service] PluginRepairController.repairPluginMetric:172:    START Repairing pluginMetric
    Jul 02 22:26:48:494 [PluginRepairController[repairPluginMetric]-4754, mon_config_service] MigrationSynchronizer.acquireForPluginRepair:108:    Acquiring for Plugin repair
    Jul 02 22:26:48:494 [PluginRepairController[repairPluginMetric]-4754, mon_config_service] MigrationSynchronizer.acquireForPluginRepair:113:    Acquired for plugin repair
    Jul 02 22:26:48:494 [PluginRepairController[repairPluginMetric]-4754, mon_config_service] PluginRepairController.repairPluginMetric:190:     Repairing pluginMetric Started
    Jul 02 22:26:48:513 [PluginRepairController[repairPluginMetric]-4754, mon_config_service] PluginRepairController.repairPluginMetric:209:    Found: 0 devices in need of plugin repair processing




  • 12.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 23 days ago

    With 23.4, running the plugin metric correction callback from the mon_config_service probe is no longer recommended. Instead, I would recommend deactivating and then activating the configuration_reader_service probe on the hub that owns the robots. This will trigger a push/pull and correction, as needed, on the connecting robots.

    It may also be a good idea to clean the niscache and reset the robot ID on the hub and on the robot(s) in question.
    https://knowledge.broadcom.com/external/article/44669/how-to-clear-the-niscache-on-an-individu.html

    Lastly, you may want to review Dr. Nimbus to see whether QoS is being pushed upstream from a given robot.
    https://knowledge.broadcom.com/external/article?articleNumber=57074




  • 13.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    Perfect, thank you Chris.

    I've bounced the config_reader, and I can see in the log that it's doing a whole bunch of stuff. I'll leave it overnight and check back in the morning.

    If there's no change I'll clean the niscache and follow those processes.

    I never did check Dr. Nimbus, as there wasn't anything queuing in the hub queue and there was no data in the DB from the robot I'm testing with. I'll check that tomorrow too.

    Thank you again Chris and Steve.  Appreciate your help.




  • 14.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 23 days ago

    What Chris said...

    MCS Troubleshooting in DX UIM v23.4 CU 1.1 or higher
    Do not run the plugin_metric_correction callback - it is no longer valid.



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 15.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 23 days ago

    Morning both,

    I've cleaned the niscache and reset the robot id, and still no QoS.

    I've also checked Dr. Nimbus and can confirm there is no QoS coming from the affected robots.




  • 16.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 22 days ago

    Hi Sam,

    Have you checked whether the robot exists in the **** table in the DB?

    Please check the KB article "Unable to view CDM metrics for one server".




  • 17.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 22 days ago

    Hi Samer, I checked the DB and there is no CDM data for the affected robots. This is affecting all robots that report directly to the primary hub. All QoS from the CDM probe stopped at the time the upgrade took place.




  • 18.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 22 days ago
    Edited by Christopher Pearson 22 days ago

    If you are not seeing QoS via Dr. Nimbus, that generally indicates a correlation issue between the data the cdm probe is pushing to the spooler on the robot and what the plugin_metric configuration expects. A common issue is that the probe is set to send the QoS to the spooler with the short name, but the plugin_metric file has the long name for that QoS, or vice versa.

    So the best way to review this would be to open the cdm probe in IM on one of the problem robots and check its settings (don't make any changes, as making changes in IM to an MCS-deployed cdm is not supported):


    against what the template has:


    In the case above, they are consistent with regard to Send short.. and Allow QOS...

    Next, look at the Advanced tab: under QoS, pretty much all of the metrics are turned on:


    Open your plugin_metric.cfg file from IM (<CTRL>B).

    Pick a metric that you want to trace:


    One scenario where there may be a mismatch is when the long name is listed instead of the short name for the QoS. This would cause the spooler to discard the data, since it would expect cdm to send the long name, but in my example above I had set the probe to use the short name.

    To demonstrate the issue (do not do this yourself), I stopped the CRS on the reporting hub, then changed my cdm.cfg to turn off the two items above. After doing this, the data is now coming from cdm with the long name. As a result, my plugin_metric.cfg is configured for short names while my cdm is sending long names. The result in my spooler.log (at loglevel=5):


    The above should at least help you understand where the problem may be coming from or at what point it is breaking.




  • 19.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 19 days ago

    Thank you Chris, interesting info.

    I've checked an affected cdm probe and the short name matches the MCS profile. One thing to note is that this is only affecting robots reporting into the primary hub. Robots reporting to other hubs are reporting cdm QoS with no problem.

    I've checked the plugin_metric file on an affected robot, and the cdm config section looks like this (which clearly isn't right):

       <cdm>
          <318001>
             $setup_profile$ = 1
          </318001>
          <318016>
             $setup_profile$ = 1
          </318016>
          <318038>
             $setup_profile$ = 1
          </318038>
          <318006>
             $setup_profile$ = 1
          </318006>
          <318028>
             $setup_profile$ = 1
          </318028>
       </cdm>
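With 85 robots to check, a small parser can flag sections like the ones above automatically. This is a minimal sketch that assumes plugin_metric.cfg follows the layout in this excerpt (nested <section> tags with key = value lines); what a healthy cdm section should contain is not shown in this thread, so it only flags profile sections whose sole entry is the $setup_profile$ marker, which is the symptom reported here. How you fetch the file from each robot is left open.

```python
def parse_cfg(text):
    """Parse nested <section> ... </section> blocks with 'key = value' lines.

    A simplified reading of the .cfg layout shown above; it ignores
    comments, escaping, and any syntax not present in the excerpt.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            if len(stack) > 1:  # ignore stray closing tags
                stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            section = {}
            stack[-1][line[1:-1]] = section
            stack.append(section)
        elif "=" in line:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


def marker_only_profiles(cfg, probe="cdm"):
    """Profile sections under <probe> whose only content is $setup_profile$."""
    sections = cfg.get(probe, {})
    return [
        name
        for name, body in sections.items()
        if isinstance(body, dict) and set(body) == {"$setup_profile$"}
    ]
```

Run against the excerpt above, `marker_only_profiles(parse_cfg(text))` would list all five profile IDs. A section that carries any other key or subsection is not flagged, so the check errs on the side of missing a problem rather than inventing one.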




  • 20.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 19 days ago

    Could be this one: "cdm metrics are not displayed in UIM OC".




  • 21.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 18 days ago

    Thanks Marius,

    I'm not seeing any QoS in Dr. Nimbus. I also don't have remote access to the affected robots to rename the file. Ideally I need a centralised way of updating this.

    If that's not possible, I'll have to involve other teams to remote onto 85 machines to rename a folder.




  • 22.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 17 days ago

    Stay tuned; I'm working on finding a method to do this en masse.



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 23.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 17 days ago

    Thanks Steve,

    In desperation, I tried a Lua script to rename the folder on the remote robot, but local robot permissions didn't allow it. A bit of a long shot, but worth a try.




  • 24.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 17 days ago

    Interestingly, when this first happened, I renamed the CABI server's plugin_metric file and QoS started coming through from that server.

    I've just received a performance report for my UIM servers, and the QoS stopped on the CABI server at 23:4 later that day.

    As a test, I'll recreate the plugin_metric file and monitor to see whether the QoS stops again.




  • 25.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 17 days ago

    Update on this one: originally I only renamed the plugin_metric file. This time I've renamed the folder and deployed robot_update (the same version already on the server).

    I've seen the plugin folder get recreated; I'll update this thread if QoS starts or stops.




  • 26.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 17 days ago

    Hi Sam,

    Sounds hopeful. I will continue to investigate on my end.

    Steve



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 27.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 18 days ago

    Hi Sam,

    In that case, please try these steps.

    • Log in remotely to the end robot with the issue, then back up and delete the existing plugin_metric folder (path: ..\Nimsoft\plugins\plugin_metric).

    • (Re-)deploy the latest supported robot_update version in the archive to the end robot with the issue.
      (Note: Perform this step, i.e. re-deploy robot_update on the end robot, even if the end robot is already running the latest version.)

    • Confirm that the default plugin_metric folder and .cfg file are recreated.

    After doing this, check for QOS_MESSAGE messages in DrNimbus and then data in OC.
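The first and last steps above (back up and delete, then confirm recreation) can be scripted locally on a robot; the robot_update redeploy in the middle still has to be done from the archive as described and is not covered here. This is a minimal sketch, assuming it runs with enough rights on the robot itself and that the backup goes to a folder you choose outside the plugins tree (the backup location is an assumption, not something stated in this thread). On Windows, deleting may fail while the spooler still has plugin_metric.dll loaded, so check for errors.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path


def backup_and_remove(plugin_dir: Path, backup_root: Path) -> Path:
    """Copy plugin_dir to a timestamped folder under backup_root, then delete the original."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = backup_root / f"{plugin_dir.name}-backup-{stamp}"
    shutil.copytree(plugin_dir, backup)  # fails loudly if the target already exists
    shutil.rmtree(plugin_dir)  # only reached once the copy has succeeded
    return backup


def wait_for_recreation(plugin_dir: Path, cfg_name: str = "plugin_metric.cfg",
                        timeout_s: float = 900.0, poll_s: float = 15.0) -> bool:
    """Poll until <plugin_dir>/<cfg_name> exists again; False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if (plugin_dir / cfg_name).is_file():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(min(poll_s, max(0.0, deadline - time.monotonic())))
```

Copying before deleting means a failed copy leaves the original untouched. The 15-minute default timeout is arbitrary; a successful recreation only shows the file is back, so QOS_MESSAGE traffic in Dr. Nimbus is still the real check.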

    Steve



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 28.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 18 days ago

    Morning Steve,

    I did this on one of the robots I have access to and it worked. The problem is that there are 85 other robots and I don't have remote access to them.

    Is there a way to resolve this without having to log onto 85 machines?




  • 29.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 17 days ago

    Hi Sam,

    I'm looking into this and will get back to you ASAP.

    Steve



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 30.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 22 days ago

    Hello,

    What is the status of the MCS profiles on the robots? What happened when you removed the robots from the group? Were the profiles removed?

    Do you have NIC monitoring enabled?

    In cdm.log at loglevel 5, you should be able to see whether there are any errors and whether the information is being collected from the OS.

    Then check spooler.log at loglevel 5; you should be able to see whether the QoS is sent or whether there are errors.

    I would also try deleting a robot from the inventory (without removing the QoS or alarms), removing it from the group with the MCS profiles, and then reinstalling it to see whether the profiles are applied.

    Marius




  • 31.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 22 days ago

    Hi Marius,

    The status of the MCS profile is "DEPLOYED". I removed an affected robot from the group and watched MCS remove CDM, then added it back and watched it deploy CDM, but there is still no QoS data.

    In the spooler log I see the following, which suggests to me that it thinks it has successfully rebuilt the plugin_metric DLL:

    Jul  3 15:47:44:310 [3000] 0 spooler: (load_bus_plugin) Loaded plugin (C:\Program Files (x86)\Nimsoft/plugins/plugin_metric/plugin_metric.dll) successfully. 
    Jul  3 15:47:44:350 [3000] 0 spooler: Attempt to create malformed metric object 
    Jul  3 15:47:44:350 [3000] 0 spooler: Metrics post-processing plugin [23.4.3.1309 Dec 31 2024] successfully loaded and configured 
    Jul  3 15:47:44:350 [3000] 0 spooler: policy_profile_pull_thread  - start worker thread 
    Jul  3 15:47:44:360 [3000] 0 spooler: ########## START ########## 
    Jul  3 15:47:44:360 [3000] 0 spooler: Robot Spooler 23.4.3 [Build 23.4.3.1642, Dec 31 2024] 




  • 32.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 11 days ago

    Hi Sam,

    Still trying to find out whether this can be done. I've taken one last stab at it:

    The customer needs a method to do the following en masse for 85 servers: remotely log in to the end robot with the issue, then back up and delete the existing plugin_metric folder (path: ..\Nimsoft\plugins\plugin_metric).

    If the customer is using the same drive(s) (e.g., C:\, or even a small selection of drives) and a service account or the same login credentials, couldn't we create a probe/robot_update package that does the following:

    1. Back up the current plugin_metric folder.
    2. Delete the pre-existing plugin_metric folder.
    3. Force a redeploy of the latest supported robot_update version in the local archive on the end robot with the issue, even if the end robot is already running the latest version.
    4. Confirm that the default plugin_metric folder and .cfg file have been recreated.

    After doing this, the customer can manually check for QOS_MESSAGE messages in DrNimbus and then data in OC.

    I'll let you know if I get any helpful responses, but the two or three people I'm counting on are on holiday, retired, or currently out of the office for the summer.

    Steve



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 33.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 11 days ago

    Hello Steve,

    Shouldn't Broadcom offer some way to re-apply the MCS profiles when something bad happens? Something like a "Force Reapply" button on the Monitoring page in OC, and "boom": the plugin_metric file is recreated from scratch and the profiles are reapplied.

    This should have been core functionality long ago. plugin_metric_correction kind of did that, but it often failed and left things stuck with alarms for old thresholds.

    Now the robots pull the profiles from the hub and there "shouldn't" be any more problems, but there still are, as you can see. Also, you can't use MCS profiles with passive robots.

    Thank you,

    Marius




  • 34.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 11 days ago

    Hmm, good idea, Marius. I'll certainly discuss this with engineering.

    Steve



    ------------------------------
    Steve Danseglio
    ------------------------------



  • 35.  RE: QOS not coming from robots in UIM database after CU2 update

    Posted 10 days ago

    Hey Steve,

    Thanks for your help on this. I've had to fall back to monitoring with the CDM probe and remove the MCS config. We can't go too long without the QoS.

    We only use MCS for CDM, so I'm going to look into automating CDM deployment via a Lua script instead.

    Going forward, it might be worth putting this on engineering's list of issues to resolve in a future release.

    As Marius suggests, a way to push out MCS profiles when something goes wonky would be great.




  • 36.  RE: QOS not coming from robots in UIM database after CU2 update

    Broadcom Employee
    Posted 9 days ago
    Edited by Jason Allen 9 days ago

    One thing you can do depends on how complex your setup is, in terms of whether you have a lot of MCS profiles coming from different groups. If there isn't much overlap, so that the MCS profiles in question all come from a single group, you can do something like this:

    • In Operator Console, edit the group that has the MCS profiles applied:
      - Add a "nonsense" filter to the group criteria, such as "AND NAME IS ZZZZZ".
      - Save the group. It will empty out, because you presumably don't have a server named ZZZZZ.
    • The plugin_metric.cfg files should now be reset, as long as no other groups are pushing profiles to those servers. (Note: this can take a while, but if you keep an eye on it, you should see plugin_metric.cfg replaced with a file of about 4 KB.)
    • Now you can remove the ZZZZZ filter, and the profiles will be reapplied when the devices rejoin the group.
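To keep an eye on the reset across many robots, a file-size check is enough as a first pass. This is a minimal sketch, assuming you can read each robot's plugin_metric.cfg from wherever you run it (for example via a file share, which is an assumption and may not be available given the remote-access limits mentioned earlier in the thread) and that a reset file is roughly 4 KB as noted above; the 8 KiB ceiling is an arbitrary margin, not a documented value.

```python
from pathlib import Path

# Assumption: a reset file is "about 4kb"; 8 KiB leaves some headroom.
RESET_MAX_BYTES = 8 * 1024


def looks_reset(cfg_path: Path, max_bytes: int = RESET_MAX_BYTES) -> bool:
    """True if plugin_metric.cfg exists and is no larger than max_bytes."""
    return cfg_path.is_file() and cfg_path.stat().st_size <= max_bytes


def reset_report(cfg_paths):
    """Map each path (as a string) to whether its plugin_metric.cfg looks reset yet."""
    return {str(p): looks_reset(Path(p)) for p in cfg_paths}
```

A missing file reports False rather than raising, so robots that have not recreated the file yet simply show up as not reset. Size is only a proxy; once the ZZZZZ filter is removed, the files are expected to grow again as profiles are reapplied.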