DX Unified Infrastructure Management

  • 1.  Lua script - Max restart probe

    Posted Jan 14, 2020 05:16 AM
    We have the Lua script below, which identifies "Max. restarts reached for probe" alarms and builds a list of them. I now need to add a restart command for the controllers in question, at the point in the script marked "-- Now run controller restart ???????????????".

    Can anyone help with the easiest way to do this?

    -- Start of script
    local al = alarm.list() -- Get alarm list

    local re = "%p%a+%d*_*%a*%d*%p" -- Regex to match probe name with alpha, numbers and underscore

    if al ~= nil then
      for i = 1,#al do
        if al[i].prid == "controller" then -- First, filter to get alarms from controller probe only

          if string.match(al[i].message,"Max. restarts reached for probe") then -- Second, filter to get controller alarms with specific text, i.e. "Max. restarts reached for probe"

            probe = string.gsub(string.match(al[i].message,re),"'","") -- Get probe name from alarm message and then remove quotes from probe name to use in probe_verify callback
            --print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start

            addr = "/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller" -- Build Nimsoft address
            printf("/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller".."<->Probe="..al[i].prid) -- Print Nimsoft address(es)

            -- Now run controller restart ???????????????

            local args = pds.create()
            pds.putString(args,"name",probe)
            nimbus.request(addr,"probe_verify",args)
            nimbus.request(addr,"probe_activate",args)
            pds.delete(args)
            sleep (100) -- A little delay between each probe callback
          end
        end
      end
    end
    -- End of script


  • 2.  RE: Lua script - Max restart probe

    Posted Jan 14, 2020 08:53 AM
    Perhaps one of these prior threads will help:
    https://community.broadcom.com/enterprisesoftware/communities/community-home/digestviewer/viewthread?GroupId=1315&MID=741845&CommunityKey=170eb4e5-a593-4af2-ad1d-f7655e31513b&tab=digestviewer

    https://community.broadcom.com/enterprisesoftware/communities/community-home/digestviewer/viewthread?MessageKey=95fd807d-e48c-457a-8407-9338a6d9bba1&CommunityKey=170eb4e5-a593-4af2-ad1d-f7655e31513b&tab=digestviewer#bm95fd807d-e48c-457a-8407-9338a6d9bba1

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 3.  RE: Lua script - Max restart probe

    Posted Jan 14, 2020 03:04 PM
    A couple of comments about your code:

       local al = alarm.list() -- Get alarm list

    This call can take arguments, so you can, for instance, filter the set of alarms it returns:

       local al = alarm.list("prid", "controller") -- Get alarm list for just the controller probe
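
    Even after narrowing the list by probe, each message still has to be tested for the "Max. restarts" text. A small sketch in plain Lua: it uses string.find with the plain flag, so the "." in "Max." is matched literally instead of acting as a pattern wildcard. The helper name is_max_restart and the sample messages are made up for illustration.

```lua
-- Hypothetical helper: true when the alarm message contains the literal text.
local NEEDLE = "Max. restarts reached for probe"

local function is_max_restart(message)
  -- The 4th argument `true` turns off pattern matching, so "." is a literal dot.
  return type(message) == "string"
    and string.find(message, NEEDLE, 1, true) ~= nil
end

print(is_max_restart("Max. restarts reached for probe 'cdm'"))  -- true
print(is_max_restart("Maxx restarts reached for probe 'cdm'"))  -- false
```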

    The structure 

    for i = 1,#al do

    is dangerous because it assumes that the table you are iterating over has no gaps, and in general Lua tables can have gaps.

    It is better to use pairs, like this:

    for index,value in pairs(table) do
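
    A small plain-Lua sketch of the difference, using a made-up table with a missing entry. When a table has a gap, the length operator # is only guaranteed to return some border, so a loop from 1 to #t may stop early; pairs visits every entry that exists, though in no guaranteed order.

```lua
-- Made-up alarm list with a gap at index 2.
local alarms = { [1] = "a1", [3] = "a3", [4] = "a4" }

local seen = 0
for index, value in pairs(alarms) do
  seen = seen + 1   -- reached once for every non-nil entry, gaps or not
end
print(seen)         -- 3
```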


    In the alarm record, the probe name is held by the "prid" element. There is no need to extract it from the message text, which would be error-prone.

    Always check the return values of nimbus.request. It fails fairly often, certainly more often than code that calls it blindly and carries on can tolerate.

    To restart a probe in general, use nimbus.request to send "_stop" to it. The controller will see that it has stopped and restart it.
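
    A minimal sketch of a checked call. It assumes that nimbus.request returns the reply first and a return code second, and that 0 (NIME_OK) means success; check both against the NAS scripting reference for your version. The function names and addresses are hypothetical. The request function is passed in as an argument so that the logic can be tried outside the NAS with a stand-in.

```lua
local OK = 0  -- assumed value of NIME_OK

-- Send `command` to `addr` and report a failure instead of ignoring it.
-- `request` is nimbus.request inside the NAS, or a stand-in when testing.
local function checked_request(request, addr, command, args)
  local reply, rc = request(addr, command, args)
  if rc ~= OK then
    print(string.format("%s to %s failed, rc=%s", command, addr, tostring(rc)))
    return false, rc
  end
  return true, reply
end

-- Stand-in that fails for one address, to exercise both paths.
local function fake_request(addr, command, args)
  if addr == "/dom/hub/down/cdm" then return nil, 2 end
  return {}, OK
end

local ok1 = checked_request(fake_request, "/dom/hub/up/cdm", "_stop")
local ok2, rc2 = checked_request(fake_request, "/dom/hub/down/cdm", "_stop")
```

    Inside the NAS, nimbus.request would be passed as the first argument in place of fake_request.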






  • 4.  RE: Lua script - Max restart probe

    Posted Jan 15, 2020 04:51 AM
    The issue I have is that the "Max. restarts reached for probe" alarms generally come from multiple probes on the same robot when they fail, and I only want to issue the robot restart command once. The script I copied gives a list of all alerts containing "Max. restarts reached for probe".

    How do I write a script that identifies the alerts and restarts the robot on the affected server?

    Help appreciated
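
    One way to issue the restart only once per robot is to collect the controller addresses in a table keyed by address, so that duplicates collapse. A sketch in plain Lua with made-up alarm records; the field names domain, hub and robot are the ones used in the script above, and unique_controllers is a hypothetical helper. The restart itself would then be a single request per address; the controller callback name for that (commonly _restart) is an assumption to verify on your version.

```lua
-- Return a sorted list of distinct controller addresses for the given alarms.
local function unique_controllers(alarms)
  local seen, list = {}, {}
  for _, a in pairs(alarms) do
    local addr = "/" .. a.domain .. "/" .. a.hub .. "/" .. a.robot .. "/controller"
    if not seen[addr] then
      seen[addr] = true
      list[#list + 1] = addr
    end
  end
  table.sort(list)  -- pairs() has no fixed order, so sort for stable output
  return list
end

local sample = {
  { domain = "dom", hub = "hub1", robot = "srv01" },
  { domain = "dom", hub = "hub1", robot = "srv01" },  -- second probe, same robot
  { domain = "dom", hub = "hub1", robot = "srv02" },
}
for _, addr in ipairs(unique_controllers(sample)) do
  print(addr)  -- one line per robot, not one per alarm
end
```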


  • 5.  RE: Lua script - Max restart probe
    Best Answer

    Posted Jan 15, 2020 02:52 PM
    Your original script should do that.

    The section of your code:
    local args = pds.create()
    pds.putString(args,"name",probe)
    nimbus.request(addr,"probe_verify",args)
    nimbus.request(addr,"probe_activate",args)
    pds.delete(args)

    is really intended to correct the "red" indicator on probes after a robot is renamed, but it doesn't hurt in this case either.

    The probe_verify call is the same as right-clicking on the probe, choosing security, then choosing validate.

    The probe_activate call is the same as right-clicking on the probe and choosing activate.

    Without seeing the output of your script, and without any error handling in it, it is hard to tell whether it is working. It is quite possible that it is working and that the probe is immediately going inactive again.
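
    To see whether each step works, the two calls can be wrapped so that every return code is printed. A sketch under stated assumptions: nimbus.request is taken to return the reply and then a return code, with 0 meaning success, and run_steps and the stand-in request function are hypothetical.

```lua
-- Run each callback in order against `addr`; stop at the first failure.
-- Returns the name of the failed step, or nil when all steps succeeded.
local function run_steps(request, addr, steps, args)
  for _, step in ipairs(steps) do
    local _, rc = request(addr, step, args)
    print(step .. " rc=" .. tostring(rc))
    if rc ~= 0 then return step end  -- 0 is assumed to mean success
  end
  return nil
end

-- Stand-in for nimbus.request: verify succeeds, activate fails with rc 4.
local function fake_request(addr, step, args)
  if step == "probe_activate" then return nil, 4 end
  return {}, 0
end

local failed = run_steps(fake_request, "/dom/hub/robot/controller",
                         { "probe_verify", "probe_activate" })
print(failed)  -- probe_activate
```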

    Taking the proverbial step back from the issue, probes shouldn't exhaust their retries. So while it is nice to have a script that fixes it, it is far better not to have to deal with the issue in the first place. It is better to put your resources towards identifying the cause of the issue and eliminating it rather than tolerating it.