DX Unified Infrastructure Management

Tunnel Clients not accessible from PriHub but can access via TunnelHub

  • 1.  Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 03, 2019 04:34 PM
    Edited by Daniel Blanco Dec 03, 2019 05:00 PM

    This started happening recently as we set up new hubs: when logged into the Primary Hub through IM or Admin Console, the new Tunnel Clients are red. They are inaccessible, but if I right-click from within IM and log in to a tunnel server, I can reach these new tunnel client hub entries.
    I have already tried removing these tunnel clients from the Hubs tab of the hub probe on all 8 hubs, for both tunnel client hubs. When they were re-created, the primary hub still could not find a path to them. They are still red in IM and in Admin Console.
    The right-click Check Access and Check Transfer just time out from within the hub probe's Hubs tab. Both entries are red.

    In my support case (20127165) it was suggested to follow this KB:

    Article title: Remote Hub is offline and unreachable

    which says to stop Nimsoft, delete the hubs.sds and robots.sds files, and then restart.

    I am trying this in my lab, but I'm noticing that no robots are reappearing under the primary hub afterwards. It's been almost 30 minutes and all 12 lab robots are still missing from under the primary hub.
    If I forcibly stop/start the Nimsoft service the robots appear, but I cannot do this in PROD. Will these robots eventually check back in with the primary hub in the lab on their own?
    Also, what other options do I have to fix this hub corruption, which is what seems to be happening here? Is deleting hubs.sds the only way to fix this issue?

    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------


  • 2.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Broadcom Employee
    Posted Dec 04, 2019 10:25 AM
    So we need some more information.
    Is the primary hub windows?
    Is the IM you are using running on the primary hub or a desktop?
    Are the new hubs connected via a tunnel to any other hubs?
    Does the primary hub have direct access to the hubs?

    There should be no need to remove robots.sds for this issue, only hubs.sds.
    The robots should reappear on their own, but there is no documented time interval for this.
    If you want it to happen immediately, a robot restart will be required.


    ------------------------------
    Gene Howard
    Principal Support Engineer
    Broadcom
    ------------------------------



  • 3.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 04, 2019 10:47 AM
    Hi Gene,
    > Primary Hub is Windows
    > The IM session is on my desktop, but I tried this on the Pri-hub box as well and got the same issue. About 10 others on my team reported the same issue.
    > All client hubs, including these (now 3) affected client hubs, are tunnel clients. We've added other client hubs since then and they work.
    > The primary hub doesn't have direct access to these boxes or to the 100+ other client hubs. Everything goes through tunnels.

    Just FYI, it was mentioned that the problem started after we began using UMP to deploy robots to machines in their environment after discovery, in case that helps with a possible root cause.
    We are also seeing hubs that we offboarded, retired, and shut down last week still appearing in the IM list no matter what we do. We've tried deleting/removing them many times on all hubs (primary, hubcol, tunnel servers) and they keep coming back. There is no reference to these hubs anywhere, and all the get queues to these retired hubs were deleted as well. I think this looks like a corrupted hubs.sds file. This is the first time this has ever happened to us.

    Primary Hub 

    ^

    Hub Collector (uimcol)

    ^

    Tunnel Server (6 of these) - UIMHUB1|2|3|4|5|6

    ^

    Tunnel Client Hubs (100+)



    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------



  • 4.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub
    Best Answer

    Posted Dec 04, 2019 11:11 AM
    This is pretty much a constant experience with my environment.

    With the way the Nimbus network is maintained, where every hub has a copy of everything, if one of those hubs gets a little bit of wrong information it will keep reinfecting that incorrectness into your network. The routing bit is based on the proximity value - essentially every hub in a hub's list of known hubs has a counter of how many hops away it is and which hub is the next hop. And a given hub is constantly listening for updates to this hub network and whenever it gets an update with a smaller proximity it updates the information it has with the new lower proximity information.  

    This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data. 

    One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. It's impossible to have a proximity less than zero, so this hub will never get updated regardless of how wrong its information is. The problem is that it should have had a proximity of 2 which had put it in the path of the route to a fair number of other hubs.

    This in itself is not a problem but then that hub was decommissioned. But it had a proximity of zero and so was the most favored next hop wherever it had been a possible hop. And there was no way to get it out of the network.
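
    A rough sketch of that update rule in plain Lua, as I understand the behaviour. This is not the hub's actual code: `merge_update` and the table layout are made up for illustration. An update is accepted only when it advertises a lower proximity, which is why a bogus proximity of 0 can never be displaced:

```lua
-- Hypothetical model of one hub's view of the network:
-- hub name -> { proximity = hops away, via = next hop }.

-- Accept an update only when it advertises a strictly lower proximity.
function merge_update(tbl, name, proximity, via)
   local cur = tbl[name]
   if cur == nil or proximity < cur.proximity then
      tbl[name] = { proximity = proximity, via = via }
      return true
   end
   return false
end

local known = { HUB_X = { proximity = 2, via = "TUNNEL1" } }

-- A bad update claims HUB_X is directly attached (proximity 0) via a wrong hop.
local took_bad = merge_update(known, "HUB_X", 0, "BOGUS")
-- The correct information arrives later but loses, because 2 is not less than 0.
local took_good = merge_update(known, "HUB_X", 2, "TUNNEL1")

print(took_bad, took_good, known.HUB_X.proximity, known.HUB_X.via)
```

    In this toy model the only way to get rid of the bad entry is to drop the table and let it be rebuilt, which is roughly what deleting hubs.sds amounts to.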

    Currently the only solution that has been proposed to fix this is to stop UIM in its entirety across my network, delete hubs.sds and robots.sds wherever they exist, and then restart things from the central hub out to the remote systems. The idea being that UIM builds the network of systems on the fly and that if it starts from a clean slate then it will eventually build a correct network. The kicker though is that once all this effort is gone through, there's no guarantee that the same problem won't reoccur and worse, if someone out there had a server that was offline/retired/shut down and then brings it back after all this, that bad information might get reintroduced and you're back where you started.
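
    If you do go down that road, one way to plan the "central hub out" restart order is to sort the hub list by proximity as seen from the central hub. A minimal sketch, assuming a list with `name` and `proximity` fields like the one the gethubs callback returns; `restart_order` and the sample values are invented for illustration:

```lua
-- Return a copy of hublist sorted nearest-first (lowest proximity first).
function restart_order(hublist)
   local ordered = {}
   for _, hub in ipairs(hublist) do
      ordered[#ordered + 1] = hub
   end
   table.sort(ordered, function(a, b)
      if a.proximity ~= b.proximity then
         return a.proximity < b.proximity
      end
      return a.name < b.name   -- tie-break by name so the order is predictable
   end)
   return ordered
end

-- Invented sample data, not taken from a real hub.
local hubs = {
   { name = "CLIENT_B", proximity = 2 },
   { name = "UIMHUB5",  proximity = 1 },
   { name = "uimcol",   proximity = 0 },
   { name = "CLIENT_A", proximity = 2 },
}

for i, hub in ipairs(restart_order(hubs)) do
   print(i, hub.name, hub.proximity)
end
```

    Since proximity is the very value that gets corrupted, treat the result as a starting point and sanity-check it against the topology you know.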

    What I can tell you about IM not reaching things is that it settles down over time - in a couple of weeks to a month it will probably be working the way you hope for the hubs that are problematic today. Logging into the hub closest to the hub/robot you are trying to interact with is usual practice for my team - you're about 10x more likely to reach a given hub when connected to its tunnel server than when connected to the central hub, for instance.

    The other thing is that most of the underlying infrastructure is pretty resilient to this - get queues for instance are automatically taking advantage of this "closest hub" thing so while you might not be able to go from your central hub  out the several hops to the leaf hub, the get queues between each pair of hubs are much more likely to be working.


  • 5.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 04, 2019 11:46 AM
    Ok I'm good now. I just did this on our primary hub. Stopped Nimsoft, deleted the hubs.sds file and then started back  up.

    W/in a few minutes all the correct hubs showed up and all the old broken ones were gone and the new hubs were accessible now. 

    Garin, I hear ya. I wish there were probe utility calls that would allow you to fix or correct this, but there don't seem to be any. A visual hub route map would be great if it existed.



    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------



  • 6.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 04, 2019 04:45 PM
    I wrote this "traceroute" tool - takes a destination and starting hub and displays the path:

    depth = 0
    
    function TimeStamp(stuff)
       print (string.format("%s %s", timestamp.format(timestamp.now(), "%X"), (stuff or "(nil)")))
    end
    
    function tdump(t)
       local function dmp(t, l, k)
          if type(t) == "table" then
             print(string.format("%s%s:", string.rep(" ", l*2), tostring(k)))
             for k, v in pairs(t) do
                dmp(v, l+1, k)
             end
          else
             print(string.format("%s%s:%s", string.rep(" ", l*2), tostring(k), tostring(t)))
          end
       end
       dmp(t, 1, "root")
    end
    
    function error_text(e)
       local x
    
       if ( e == nil ) then
          x = "error is nil"
       elseif ( e == NIME_OK ) then
          x = "OK"
       elseif ( e == NIME_ERROR ) then
          x = "error"
       elseif ( e == NIME_COMERR ) then
          x = "communication error"
       elseif ( e == NIME_INVAL ) then
          x = "invalid argument"
       elseif ( e == NIME_NOENT ) then
          x = "not found"
       elseif ( e == NIME_ISENT ) then
          x = "already defined"
       elseif ( e == NIME_ACCESS ) then
          x = "permission denied"
       elseif ( e == NIME_AGAIN ) then
          x = "temporarily out of resources"
       elseif ( e == NIME_NOMEM ) then
          x = "out of resources"
       elseif ( e == NIME_NOSPC ) then
          x = "no space left"
       elseif ( e == NIME_EPIPE ) then
          x = "broken connection"
       elseif ( e == NIME_NOCMD ) then
          x = "command not found"
       elseif ( e == NIME_LOGIN ) then
          x = "login failed"
       elseif ( e == NIME_SIDEXP ) then
          x = "SID expired"
       elseif ( e == NIME_ILLMAC ) then
          x = "illegal MAC"
       elseif ( e == NIME_ILLSID ) then
          x = "illegal SID"
       elseif ( e == NIME_SIDSESS ) then
          x = "Session id for hub is invalid"
       elseif ( e == NIME_EXPIRED ) then
          x = "Expired"
       elseif ( e == NIME_NOLIC ) then
          x = "No valid license"
       elseif ( e == NIME_INVLIC ) then
          x = "Invalid license"
       elseif ( e == NIME_ILLLIC ) then
          x = "Illegal license"
       elseif ( e == NIME_INVOP ) then
          x = "Invalid operation finv"
       else  --if ( e >= NIME_USER ) then
          x = "user error from this value and up"
       end
    
       return x
    end
    
    function NimbusRequest (address, command, arguments, retries, noisy, delay)
       local counter, response, retcode
    
       if ( retries == nil ) then
          retries = 1
       end
    
       if (noisy == nil) then
          noisy = 1
       end
    
       if (delay == nil) then
          delay = 1 * 1000
       end
    
       counter = retries
    
       repeat
          if ( noisy == 1 ) then
             TimeStamp ( "Sending " .. command .. " to " .. address)
          end
          response, retcode = nimbus.request(address, command, arguments)
          if ( retcode ~= NIME_OK ) then
             -- counter = counter - 1
             if ( noisy == 1 ) then
                TimeStamp ( command .. " Failed for " .. address .. " Error code (" .. retcode .. ") " .. error_text(retcode) )
                --tdump(response)
             end
             sleep(delay)
          end
    
          counter = counter - 1
       until ( retcode == NIME_OK or counter == 0 )
    
       if ( retcode == NIME_OK ) then
          if ( noisy == 1 ) then
             TimeStamp ( command .. " successful for " .. address )
          end
       else
          TimeStamp ( command .. " retries exhausted for " .. address .. " Error code " .. error_text(retcode) )
       end
    
       return response, retcode
    end
    
    
    
    function HubList(addr)
       local gethubs, rc
       local h = {}
       local key, value
    
       if ( addr == nil ) then
          addr = "hub"
       end
    
       gethubs, rc = NimbusRequest(addr,"gethubs", nil, 1, 1, 1000)
       if (rc ~= 0 ) then
          TimeStamp("Error " .. rc .. " : " .. error_text(rc))
          return nil
       end
    
       for key, value in pairs(gethubs.hublist) do
          h[value.name] = { name = value.name, addr = value.addr, source = value.source, proximity = value.proximity, robotname = value.robotname }
          if ( value.proximity == 0 ) then
             h[value.name].source = value.addr
          end
       end
    
       for key, value in pairs(gethubs.hublist) do
          if ( h[value.name].proximity > 0 ) then
             local PathParts = split ( h[value.name].source, "/")
             local HubName = PathParts[2]
    
             if (h[HubName] ~= nil) then
                h[value.name].source = h[value.name].source .. "/" .. h[HubName].robotname .. "/hub"
             else
                TimeStamp("Unable to locate hub " .. HubName)
                h[value.name].source = h[value.name].source .. "/" .. "ERROR"
             end
          end
       end
    
       return h
    end
    
    function FindNextHop(dest, current_hop)
       local hubs
       local PathParts = split ( current_hop, "/")
       local HubName = PathParts[2]
    
       if ( depth == 0 ) then
          HopList = {}
       end
       TimeStamp(" ")
       TimeStamp ("Hop: " .. current_hop .. " at depth " .. depth)
       if ( HopList[current_hop] == nil ) then
          HopList[current_hop] = depth
       else
          TimeStamp("Encountered circular routing to " .. current_hop)
          tdump(HopList)
          return 1
       end
    
       depth = depth + 1
       if (depth > 30) then
          TimeStamp( "Reached max depth of " .. depth)
       elseif (dest == HubName) then
          TimeStamp( "Found")
       else
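          hubs = HubList(current_hop)
          if ( hubs == nil ) then
             -- HubList returns nil when the gethubs request fails (e.g. communication error)
             TimeStamp("Unable to get hub list from " .. current_hop)
          elseif ( hubs[dest] == nil ) then
             TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
          else
             TimeStamp("Proximity to " .. dest .. " is " .. hubs[dest].proximity .. " via " .. hubs[dest].source)
             local rc = FindNextHop(dest, hubs[dest].source)
             if ( rc == 1 ) then
                return 1
             end
          end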
       end
    
       return 0
    end
     
    HopList = {}
    
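    -- Destination hub name (case sensitive)
    test_hub = "DestHubHere"
    depth = 0
    -- Starting point: full /Domain/Hub/Robot/hub address (case sensitive)
    FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")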



  • 7.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 10:54 AM
    Nice. What is this written in? How do I run this? Just copy this into the nas as a new script, or some other way?
    TIA...

    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------



  • 8.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 11:32 AM
    It's a Lua script - just open nas, select the Auto-Operator/scripts tab, right click in the white space and select "New -> script"

    Paste this in.

    The last couple lines define the destination hub and the starting point. Update those to match your needs.
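
    To see what the walk does before pointing it at a live hub, here is a rough offline sketch of the same idea in plain Lua. It uses a mocked route table instead of gethubs, so the `routes` data and the `trace` function are made up for illustration and are not part of the script above:

```lua
-- Follow next hops from start until dest is reached, a hub has no route
-- to dest, or a hub is visited twice. Returns a status string and the path.
function trace(routes, dest, start, max_depth)
   local seen, path = {}, {}
   local hop = start
   for _ = 1, (max_depth or 30) do
      path[#path + 1] = hop
      if hop == dest then return "found", path end
      if seen[hop] then return "loop", path end
      seen[hop] = true
      local nxt = routes[hop] and routes[hop][dest]
      if nxt == nil then return "unknown", path end
      hop = nxt
   end
   return "max depth", path
end

-- Mock data: hub name -> (destination -> next hub name).
local routes = {
   PRIMARY = { CLIENT_Z = "UIMHUB1" },
   UIMHUB1 = { CLIENT_Z = "UIMHUB2" },
   UIMHUB2 = { CLIENT_Z = "UIMHUB1" },  -- points back: a routing loop
}

local status, path = trace(routes, "CLIENT_Z", "PRIMARY")
print(status, table.concat(path, " -> "))
```

    With this mock data the walk stops with "loop" on the second visit to UIMHUB1, the same situation the real script reports as circular routing.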


  • 9.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 01:25 PM
    Ok tried running this on my laptop and it failed. Then directly off the primary hub and also failed.
    Getting:
    ----------- Executing script at 12/5/2019 1:19:05 PM ----------

    1:19:04 PM
    1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
    1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
    1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
    1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
    1:19:05 PM Error 2 : communication error
    Error in line 186: attempt to index local 'hubs' (a nil value)


    Line 186 is:
    >>> if ( hubs[dest] == nil ) then
    TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
    else


    On the last few lines I specified:
    --== MAIN BEGIN:
    HopList = {}
    --Enter the Hub you want to test
    test_hub = "Alphaserve"
    depth = 0
    --Specify the starting hub you want to test from here
    --Use full /DOMAIN/HUB/ROBOT/hub address
    --FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
    FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")


    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------



  • 10.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 01:38 PM
    So, from wherever you are running this, the hub /NMS/Onxxx/xxxxnms1/hub isn't reachable. 

    The name is apparently valid because the error is "Error code (2) communication error". If the name were completely bad, it would have been error code (4) not found.

    Once your network crests a particular size (for me it was when I had to move to using tunnel proxies because of the Windows subscriber limits) this gets more and more frequent.

    Is /NMS/Onxxx/xxxxnms1/hub where your nas is? If not, I'd start by changing the starting point to the hub where your nas is which should at least get you past the first gethubs callback.



  • 11.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 01:47 PM
    Yes, the nms1 box is our primary hub and it's where the nas is running. I tried running it from an IM session running on the nms1 box.
    I guess it cannot reach itself, or maybe it's the case sensitivity. Let me try that...


    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------



  • 12.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 01:54 PM
    Okay, yes, that was it. It's case sensitive. It's now working. Cool script:
    (Hub and robot names changed, of course.)

    ----------- Executing script at 12/5/2019 1:48:36 PM ----------

    1:48:36 PM
    1:48:36 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
    1:48:36 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
    1:48:36 PM gethubs successful for /NMS/On-Prem/NMS1/hub
    1:48:36 PM Proximity to CLIENT_B is 1 via /NMS/UIMHUB5/uimhub5/hub
    1:48:36 PM
    1:48:36 PM Hop: /NMS/UIMHUB5/uimhub5/hub at depth 1
    1:48:36 PM Sending gethubs to /NMS/UIMHUB5/uimhub5/hub
    1:48:36 PM gethubs successful for /NMS/UIMHUB5/uimhub5/hub
    1:48:36 PM Proximity to CLIENT_B is 0 via /NMS/CLIENT_B/clientBhub/hub
    1:48:36 PM
    1:48:36 PM Hop: /NMS/CLIENT_B/clientBhub/hub at depth 2
    1:48:36 PM Found


    ----------- Executing script at 12/5/2019 1:47:51 PM ----------

    1:47:50 PM
    1:47:50 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
    1:47:50 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
    1:47:51 PM gethubs successful for /NMS/On-Prem/NMS1/hub
    1:47:51 PM Proximity to Alphaserve is 0 via /NMS/Alphaserve/hub1/hub
    1:47:51 PM
    1:47:51 PM Hop: /NMS/Alphaserve/hub1/hub at depth 1
    1:47:51 PM Found

    Thanks....

    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------



  • 13.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 02:00 PM
    Hope it helps the next time you have issues. Might give you a better place to start looking at resetting things other than your central hub.


  • 14.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 05, 2019 01:58 PM
    I created the script from what I posted and it worked. 

    I corrupted the case of my domain in the last line of the script and got the error 2 as opposed to 4 which I would have expected.

    I corrupted the name of the hub and it still worked which was completely unexpected. I wonder if the nimbus request is falling back to the local hub if the specified hub isn't reachable.

    I corrupted the name of the robot and got a valid gethubs return (again completely unexpected) and a legitimate error message.

    So, I think you are on track in checking the case.

    -Garin
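
    Since a case mismatch shows up as a plain communication error, it may be worth checking the typed address against known addresses before running the trace. A small sketch in plain Lua; `check_address_case` is a made-up helper, and the list of known addresses is assumed to come from something like the `addr` field of a gethubs reply:

```lua
-- Return the correctly cased address if typed matches one of known
-- ignoring case, plus a flag saying whether the case already matched.
function check_address_case(typed, known)
   for _, addr in ipairs(known) do
      if addr == typed then
         return addr, true
      end
   end
   for _, addr in ipairs(known) do
      if string.lower(addr) == string.lower(typed) then
         return addr, false
      end
   end
   return nil, false
end

-- Example addresses in the style of the output earlier in this thread.
local known = { "/NMS/On-Prem/NMS1/hub", "/NMS/UIMHUB5/uimhub5/hub" }
local addr, exact = check_address_case("/NMS/on-prem/nms1/hub", known)
print(addr, exact)
```

    A nil result means the address does not match anything in the list even when case is ignored, so the problem is something other than capitalization.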



  • 15.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Dec 11, 2019 11:12 AM
    Edited by Daniel Blanco Dec 11, 2019 12:15 PM
    So this problem started happening again and it happened 3x this morning. To clarify the issue:


    Before the problem occurs, when logged into IM or Admin Console we can access these new client hubs. In this example the Client-Z hubs are reachable: we can get to them in IM and we can see the robots under their hub.

    After our engineer tries to deploy robots to discovered machines under Client-Z, things break. When logged into IM and connected to our primary hub, the new client hub Client-Z is red: down and not accessible. But if we log onto another hub, a tunnel server or our hub collector, the Client-Z hub works and we can access it.
    So somehow the Client-Z hub's address/location is getting corrupted on our primary hub. It's not until we delete the hubs.sds file on the primary hub and restart that Client-Z is accessible again in IM or Admin Console.

    So the engineer stopped Nimsoft on the pri-hub, deleted the hubs.sds file, and restarted. He waited and eventually all hubs showed up, and he was able to access Client-Z. He then retried the deployment and it broke again.

    We opened a new case on this: 20140704, "HUB's showing down under Primary HUB(On-Prem)".

    Has anyone ever seen this happen? This never happened in 8.x; it started in 9.x, and we have now hit it 5 times.

    We are running the latest: robot v9.20HF7 and hub v9.20HF6.


  • 16.  RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

    Posted Feb 06, 2020 11:31 AM
    So this happened again today. We deployed a robot to a hub site and the primary hub's SDS got corrupted. When in IM and connected to the primary hub, we now cannot access the hub we deployed the robot to using the UMP deploy robot feature. This is definitely a defect.

    Garin's script shows that the primary hub can reach the broken hub, yet in IM the hub is red and won't open. If we log into any of the tunnel servers, we can access that hub in IM.
    10:35:24 AM
    10:35:24 AM Hop: /NMS/On-Prem/NMS1/hub at depth 0
    10:35:24 AM Sending gethubs to /NMS/On-Prem/NMS1/hub
    10:35:24 AM gethubs successful for /NMS/On-Prem/NMS1/hub
    10:35:24 AM Proximity to ClientA is 0 via /NMS/ClientA/clientA-asrelay/hub
    10:35:24 AM
    10:35:24 AM Hop: /NMS/ClientA/clientA-asrelay/hub at depth 1
    10:35:24 AM Found
    Anyone else hit this issue?


    ------------------------------
    Daniel Blanco
    Enterprise Tools Team Architect
    DBlanco@alphaserveit.com
    ------------------------------