Tunnel Clients not accessible from PriHub but can access via TunnelHub

5. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Daniel Blanco

Posted Dec 04, 2019 11:46 AM

OK, I'm good now. I just did this on our primary hub: stopped Nimsoft, deleted the hubs.sds file, and then started it back up.

Within a few minutes all the correct hubs showed up, all the old broken ones were gone, and the new hubs were accessible.

Garin, I hear ya. I wish there were probe utility calls that would let you correct this, but there don't seem to be any. A visual hub route map would be great if it existed.

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

6. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Garin Walsh

Posted Dec 04, 2019 04:45 PM

I wrote this "traceroute" tool. It takes a destination hub and a starting hub and displays the path between them:

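-- Hub "traceroute" for the nas Auto-Operator (Lua).
-- nimbus.request, timestamp, split and sleep come from the nas script environment.
-- depth counts the hops taken so far; HopList records visited hops to detect loops.
depth = 0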

-- Print a message prefixed with the current time of day.
function TimeStamp(stuff)
   print (string.format("%s %s", timestamp.format(timestamp.now(), "%X"), (stuff or "(nil)")))
end

-- Recursively print the contents of a table (debugging aid).
function tdump(t)
   local function dmp(t, l, k)
      if type(t) == "table" then
         print(string.format("%s%s:", string.rep(" ", l*2), tostring(k)))
         for k, v in pairs(t) do
            dmp(v, l+1, k)
         end
      else
         print(string.format("%s%s:%s", string.rep(" ", l*2), tostring(k), tostring(t)))
      end
   end
   dmp(t, 1, "root")
end

-- Translate a Nimsoft return code into readable text.
function error_text(e)
   local x

   if ( e == nil ) then
      x = "error is nil"
   elseif ( e == NIME_OK ) then
      x = "OK"
   elseif ( e == NIME_ERROR ) then
      x = "error"
   elseif ( e == NIME_COMERR ) then
      x = "communication error"
   elseif ( e == NIME_INVAL ) then
      x = "invalid argument"
   elseif ( e == NIME_NOENT ) then
      x = "not found"
   elseif ( e == NIME_ISENT ) then
      x = "already defined"
   elseif ( e == NIME_ACCESS ) then
      x = "permission denied"
   elseif ( e == NIME_AGAIN ) then
      x = "temporarily out of resources"
   elseif ( e == NIME_NOMEM ) then
      x = "out of resources"
   elseif ( e == NIME_NOSPC ) then
      x = "no space left"
   elseif ( e == NIME_EPIPE ) then
      x = "broken connection"
   elseif ( e == NIME_NOCMD ) then
      x = "command not found"
   elseif ( e == NIME_LOGIN ) then
      x = "login failed"
   elseif ( e == NIME_SIDEXP ) then
      x = "SID expired"
   elseif ( e == NIME_ILLMAC ) then
      x = "illegal MAC"
   elseif ( e == NIME_ILLSID ) then
      x = "illegal SID"
   elseif ( e == NIME_SIDSESS ) then
      x = "Session id for hub is invalid"
   elseif ( e == NIME_EXPIRED ) then
      x = "Expired"
   elseif ( e == NIME_NOLIC ) then
      x = "No valid license"
   elseif ( e == NIME_INVLIC ) then
      x = "Invalid license"
   elseif ( e == NIME_ILLLIC ) then
      x = "Illegal license"
   elseif ( e == NIME_INVOP ) then
      x = "Invalid operation finv"
   else  --if ( e >= NIME_USER ) then
      x = "user error from this value and up"
   end

   return x
end

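-- Send a request to a Nimsoft address, making up to `retries` attempts and
-- calling sleep(delay) after each failure (default 1000); noisy == 1 logs each attempt.
function NimbusRequest (address, command, arguments, retries, noisy, delay)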
   local counter, response, retcode

   if ( retries == nil ) then
      retries = 1
   end

   if (noisy == nil) then
      noisy = 1
   end

   if (delay == nil) then
      delay = 1 * 1000
   end

   counter = retries

   repeat
      if ( noisy == 1 ) then
         TimeStamp ( "Sending " .. command .. " to " .. address)
      end
      response, retcode = nimbus.request(address, command, arguments)
      if ( retcode ~= NIME_OK ) then
         -- counter = counter - 1
         if ( noisy == 1 ) then
            TimeStamp ( command .. " Failed for " .. address .. " Error code (" .. retcode .. ") " .. error_text(retcode) )
            --tdump(response)
         end
         sleep(delay)
      end

      counter = counter - 1
   until ( retcode == NIME_OK or counter == 0 )

   if ( retcode == NIME_OK ) then
      if ( noisy == 1 ) then
         TimeStamp ( command .. " successful for " .. address )
      end
   else
      TimeStamp ( command .. " retries exhausted for " .. address .. " Error code " .. error_text(retcode) )
   end

   return response, retcode
end



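-- Ask the hub at addr for its hub list (gethubs) and return a table keyed by hub
-- name; each entry's source is rewritten to the full address of its next hop.
function HubList(addr)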
   local gethubs, rc
   local h = {}
   local key, value

   if ( addr == nil ) then
      addr = "hub"
   end

   gethubs, rc = NimbusRequest(addr,"gethubs", nil, 1, 1, 1000)
   if ( rc ~= NIME_OK ) then
      TimeStamp("Error " .. rc .. " : " .. error_text(rc))
      return nil
   end

   for key, value in pairs(gethubs.hublist) do
      h[value.name] = { name = value.name, addr = value.addr, source = value.source, proximity = value.proximity, robotname = value.robotname }
      if ( value.proximity == 0 ) then
         h[value.name].source = value.addr
      end
   end

   for key, value in pairs(gethubs.hublist) do
      if ( h[value.name].proximity > 0 ) then
         local PathParts = split ( h[value.name].source, "/")
         local HubName = PathParts[2]

         if (h[HubName] ~= nil) then
            h[value.name].source = h[value.name].source .. "/" .. h[HubName].robotname .. "/hub"
         else
            TimeStamp("Unable to locate hub " .. HubName)
            h[value.name].source = h[value.name].source .. "/" .. "ERROR"
         end
      end
   end

   return h
end

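-- Print the next hop toward dest as seen from current_hop, then recurse until the
-- destination, a dead end, a routing loop, or the depth limit is reached.
-- Returns 1 if a loop was detected, otherwise 0.
function FindNextHop(dest, current_hop)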
   local hubs
   local PathParts = split ( current_hop, "/")
   local HubName = PathParts[2]

   if ( depth == 0 ) then
      HopList = {}
   end
   TimeStamp(" ")
   TimeStamp ("Hop: " .. current_hop .. " at depth " .. depth)
   if ( HopList[current_hop] == nil ) then
      HopList[current_hop] = depth
   else
      TimeStamp("Encountered circular routing to " .. current_hop)
      tdump(HopList)
      return 1
   end

   depth = depth + 1
   if (depth > 30) then
      TimeStamp( "Reached max depth of " .. depth)
   elseif (dest == HubName) then
      TimeStamp( "Found")
   else
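      hubs = HubList(current_hop)
      if ( hubs == nil ) then
         -- HubList already logged the error; stop here instead of indexing nil
         TimeStamp("Unable to retrieve the hub list from " .. current_hop)
      elseif ( hubs[dest] == nil ) then
         TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
      else
         TimeStamp("Proximity to " .. dest .. " is " .. hubs[dest].proximity .. " via " .. hubs[dest].source)
         local rc = FindNextHop(dest, hubs[dest].source)
         if ( rc == 1 ) then
            return 1
         end
      end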
   end

   return 0
end
 
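HopList = {}

-- Edit these two values: the destination hub name and the full address
-- of the hub to start from (/Domain/Hub/Robot/hub).
test_hub = "DestHubHere"
depth = 0
FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")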
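The script needs a live hub to query. To see the hop-walking logic on its own, here is a minimal offline sketch in plain Lua with no nas calls. It assumes each hub's routing view can be reduced to a table mapping a destination hub name to a next-hop hub name; the `routes` table, the hub names, and `trace` are all hypothetical. Like `FindNextHop`, it stops when it reaches the destination, when a hop has no route, when a hop repeats (circular routing), or when it passes a depth limit.

```lua
-- Offline sketch of the FindNextHop walk (hypothetical data, plain Lua).
-- routes[hub][dest] = the next hop that `hub` would use to reach `dest`.
local routes = {
   primary = { leaf1 = "uimcol", leaf2 = "uimcol" },
   uimcol  = { leaf1 = "uimhub1", leaf2 = "uimhub2" },
   uimhub1 = { leaf1 = "leaf1" },
   uimhub2 = { leaf2 = "uimhub3" },  -- hypothetical bad entry...
   uimhub3 = { leaf2 = "uimhub2" },  -- ...pointing back: a routing loop
}

-- Returns the hops visited and a status: "found", "no route", "loop" or "max depth".
local function trace(dest, start, max_depth)
   max_depth = max_depth or 30
   local path, seen = {}, {}
   local hop = start
   while true do
      if seen[hop] then
         return path, "loop"             -- same hop visited twice
      end
      seen[hop] = true
      path[#path + 1] = hop
      if hop == dest then
         return path, "found"
      end
      if #path > max_depth then
         return path, "max depth"        -- give up after more than max_depth hops
      end
      local known = routes[hop]
      local next_hop = known and known[dest]
      if next_hop == nil then
         return path, "no route"         -- this hop doesn't know about dest
      end
      hop = next_hop
   end
end

local path, status = trace("leaf1", "primary")
print(table.concat(path, " -> ") .. " : " .. status)
path, status = trace("leaf2", "primary")
print(table.concat(path, " -> ") .. " : " .. status)
```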

7. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Daniel Blanco

Posted Dec 05, 2019 10:54 AM

Nice. What is this written in? How do I run it? Do I just copy it into the nas as a new script, or is there some other way?
TIA...

Original Message:
Sent: 12-04-2019 11:11 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This is pretty much a constant experience in my environment.

Because of the way the Nimbus network is maintained, with every hub holding a copy of everything, if one hub gets a little bit of wrong information it will keep reinfecting your network with it. Routing is based on the proximity value: every entry in a hub's list of known hubs records how many hops away that hub is and which hub is the next hop. Each hub constantly listens for updates about the hub network, and whenever it receives an update with a smaller proximity, it replaces what it has with the new, lower-proximity information.

This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data.
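A toy model helps show why a removed hub keeps coming back. This is a plain-Lua sketch, not UIM's actual implementation: it assumes each hub keeps a table of known hubs (proximity plus next hop) and, on each update from a neighbor, adds any hub it has never seen and replaces an entry only when the neighbor's proximity plus one is smaller. `merge`, `hubA`, `hubB` and `retired` are hypothetical names.

```lua
-- Toy distance-vector merge, following the rule described above.
-- known[name] = { proximity = hops, via = next hop }
local function merge(known, neighbor_name, neighbor_known)
   for name, entry in pairs(neighbor_known) do
      local offered = entry.proximity + 1   -- one extra hop through the neighbor
      local mine = known[name]
      if mine == nil or offered < mine.proximity then
         known[name] = { proximity = offered, via = neighbor_name }
      end
   end
end

-- Two hypothetical hubs that both still know about a retired hub.
local hubA = { hubA    = { proximity = 0, via = "hubA" },
               retired = { proximity = 1, via = "retired" } }
local hubB = { hubB    = { proximity = 0, via = "hubB" },
               retired = { proximity = 1, via = "retired" } }

hubA.retired = nil            -- "remove" the retired hub on hubA only
merge(hubA, "hubB", hubB)     -- the next update from hubB re-adds it
print(hubA.retired ~= nil, hubA.retired.proximity, hubA.retired.via)
```

Removing the entry on one hub is not enough in this model: it has to disappear from every hub that could advertise it.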

One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. A proximity can't be less than zero, so that entry will never get updated, regardless of how wrong its information is. The problem is that it should have had a proximity of 2, and the bogus zero put it in the path of the route to a fair number of other hubs.

That in itself was not a problem until that hub was decommissioned. Because it had a proximity of zero, it remained the most favored next hop wherever it had been a possible hop, and there was no way to get it out of the network.
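The zero-proximity trap follows from the same "only accept a smaller proximity" rule. In this plain-Lua sketch (hypothetical hub names; `accept` is illustrative, not a UIM function), a bogus 0 can never be corrected, and a route through that hub looks shorter than the real route, so it wins.

```lua
-- Replace an entry only when the offered proximity is strictly smaller.
local function accept(known, name, offered_proximity, via)
   local mine = known[name]
   if mine == nil or offered_proximity < mine.proximity then
      known[name] = { proximity = offered_proximity, via = via }
      return true
   end
   return false
end

local known = {}
-- A faulty hub advertises "badhub" at proximity 0 instead of the correct 2.
accept(known, "badhub", 0, "badhub")
-- The correct value arrives later, but 2 is not smaller than 0.
local replaced = accept(known, "badhub", 2, "uimhub1")

-- "leaf" is really 3 hops away via uimhub1, but through the zero-proximity
-- badhub it looks like only 1 hop, so badhub becomes the next hop.
accept(known, "leaf", 3, "uimhub1")
accept(known, "leaf", known.badhub.proximity + 1, "badhub")

print(replaced, known.badhub.proximity, known.leaf.via)
```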

Currently the only proposed fix is to stop UIM entirely across my network, delete hubs.sds and robots.sds wherever they exist, and then restart everything from the central hub out to the remote systems. The idea is that UIM builds its network of systems on the fly, so starting from a clean slate should eventually produce a correct network. The kicker, though, is that after all this effort there's no guarantee the same problem won't recur. Worse, if someone has a server that was offline, retired, or shut down and brings it back afterward, the bad information may be reintroduced and you're back where you started.
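Restarting "from the central hub out" amounts to ordering hubs by hop distance from the central hub. Here is a minimal breadth-first sketch in plain Lua, assuming you can list which hubs connect through which; the `children` table is a hypothetical topology loosely shaped like primary hub, collector, tunnel servers, and clients. It only plans the order and does not stop or start anything.

```lua
-- Hypothetical topology: each hub lists the hubs that connect through it.
local children = {
   primary = { "uimcol" },
   uimcol  = { "uimhub1", "uimhub2" },
   uimhub1 = { "client1", "client2" },
   uimhub2 = { "client3" },
}

-- Breadth-first search from root; each wave is one hop further out.
local function restart_waves(root)
   local waves, seen = {}, { [root] = true }
   local current = { root }
   while #current > 0 do
      waves[#waves + 1] = current
      local next_wave = {}
      for _, hub in ipairs(current) do
         for _, child in ipairs(children[hub] or {}) do
            if not seen[child] then
               seen[child] = true
               next_wave[#next_wave + 1] = child
            end
         end
      end
      current = next_wave
   end
   return waves
end

for i, wave in ipairs(restart_waves("primary")) do
   print("wave " .. i .. ": " .. table.concat(wave, ", "))
end
```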

What I can tell you about IM not reaching things is that it settles down over time: in a couple of weeks to a month, the hubs that are problematic today will probably be working the way you hope. Also, logging into the hub closest to the hub or robot you are trying to interact with is standard practice for my team. For instance, you're about 10x more likely to reach a given hub by connecting through its tunnel server than through the central hub.

The other thing is that most of the underlying infrastructure is fairly resilient to this. Get queues, for instance, automatically take advantage of this "closest hub" behavior, so while you might not be able to go from your central hub out the several hops to the leaf hub, the get queues between each pair of hubs are much more likely to be working.

Original Message:
Sent: 12-04-2019 10:47 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Hi Gene,
> The primary hub is Windows.
> The IM session is on my desktop, but I also tried this on the primary hub box and had the same issue. About 10 others on my team reported the same issue.
> All client hubs, including these problem hubs (now 3 of them), are tunnel clients. We've added other client hubs since then, and those work.
> The primary hub doesn't have direct access to these boxes or to the 100+ other client hubs. Everything goes through tunnels.

Just FYI, it was mentioned that the problem started after we began using UMP to deploy robots to machines in their environment after discovery, in case that helps pin down the root cause.
We are also seeing hubs that we offboarded, retired, and shut down last week still appearing in the IM list no matter what we do. We've tried removing them many times on all hubs (primary, hubcol, tunnel servers), and they keep coming back. There is no reference to these hubs anywhere, and all the get queues to these retired hubs were deleted as well. I think this looks like a corrupted hubs.sds file. This is the first time this has ever happened to us.

Primary Hub
    ^
Hub Collector (uimcol)
    ^
Tunnel Server (6 of these) - UIMHUB1|2|3|4|5|6
    ^
Tunnel Client Hubs (100+)

Original Message:
Sent: 12-04-2019 10:25 AM
From: Gene HOWARD
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So we need some more information.
Is the primary hub Windows?
Is the IM you are using running on the primary hub or a desktop?
Are the new hubs connected via a tunnel to any other hubs?
Does the primary hub have direct access to the hubs?

There should be no need to remove robots.sds for this issue, only hubs.sds.
The robots should reappear on their own, but there is no documented time interval for this.
If you want it to happen immediately, a robot restart is required.

------------------------------
Gene Howard
Principal Support Engineer
Broadcom

Original Message:
Sent: 12-03-2019 04:34 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This just started happening recently: we are setting up new hubs, and when we are logged into the primary hub through IM or Admin Console, these new tunnel clients show as red. They are inaccessible, but if within IM I right-click and log in to a tunnel server, it can reach these new tunnel client hub entries.
I have already tried removing these two tunnel client hubs from the hub probe's Hubs tab on all 8 hubs. When they were re-created, the primary hub still could not find a path to them. They are still red in IM and in Admin Console.
Right-clicking and running Check Access or Check Transfer from the hub probe's Hubs tab just times out. Both entries are red.

In my support case (20127165) it was suggested to follow this KB:

Article title: Remote Hub is offline and unreachable

which says to stop Nimsoft, delete the hubs.sds and robots.sds files, and then restart.

I am trying this in my lab, but I'm noticing that no robots reappear under the primary hub afterward. It's been almost 30 minutes, and all 12 robots in the lab are still missing from under the primary hub.
If I force a stop/start of the Nimsoft service, the robots appear, but I cannot do this in PROD. Will these robots eventually check back in with the primary hub in the lab?
Also, what other options do I have to fix what seems to be hub corruption? Is deleting hubs.sds the only way to fix this issue?

8. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Garin Walsh

Posted Dec 05, 2019 11:32 AM

It's a Lua script. Just open nas, select the Auto-Operator/scripts tab, right-click in the white space, and select "New -> script".

Paste this in.

The last few lines define the destination hub and the starting point. Update those to match your environment.
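The starting point is a full Nimsoft address of the form /Domain/Hub/Robot/hub, and `FindNextHop` takes the hub name from the second field of `split(current_hop, "/")`. That only works if nas's `split` skips the empty field before the leading slash, which is an inference from how the script uses it. A plain-Lua sketch of the same parsing, useful for sanity-checking an address before running the script; `parse_address` and the sample address are hypothetical:

```lua
-- Hypothetical helper: split a Nimsoft-style address on "/" and return its
-- non-empty fields, matching how the script reads PathParts[2] as the hub name.
local function parse_address(addr)
   local parts = {}
   for field in string.gmatch(addr, "[^/]+") do
      parts[#parts + 1] = field
   end
   return { domain = parts[1], hub = parts[2], robot = parts[3], probe = parts[4] }
end

local a = parse_address("/MyDomain/uimhub1/uimhub1_robot/hub")
print(a.domain, a.hub, a.robot, a.probe)
```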

Original Message

Original Message:
Sent: 12-05-2019 10:54 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Nice. What is this written in? How do I run this? Just copy this into the nas a new script or some other way?
TIA...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 04:44 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

I wrote this "traceroute" tool - takes a destination and starting hub and displays the path:

depth = 0function TimeStamp(stuff)   print (string.format("%s %s", timestamp.format(timestamp.now(), "%X"), (stuff or "(nil)")))endfunction tdump(t)   local function dmp(t, l, k)      if type(t) == "table" then         print(string.format("%s%s:", string.rep(" ", l*2), tostring(k)))         for k, v in pairs(t) do            dmp(v, l+1, k)         end      else         print(string.format("%s%s:%s", string.rep(" ", l*2), tostring(k), tostring(t)))      end   end   dmp(t, 1, "root")endfunction error_text(e)   local x   if ( e == nil ) then      x = "error is nil"   elseif ( e == NIME_OK ) then      x = "OK"   elseif ( e == NIME_ERROR ) then      x = "error"   elseif ( e == NIME_COMERR ) then      x = "communication error"   elseif ( e == NIME_INVAL ) then      x = "invalid argument"   elseif ( e == NIME_NOENT ) then      x = "not found"   elseif ( e == NIME_ISENT ) then      x = "already defined"   elseif ( e == NIME_ACCESS ) then      x = "permission denied"   elseif ( e == NIME_AGAIN ) then      x = "temporarily out of resources"   elseif ( e == NIME_NOMEM ) then      x = "out of resources"   elseif ( e == NIME_NOSPC ) then      x = "no space left"   elseif ( e == NIME_EPIPE ) then      x = "broken connection"   elseif ( e == NIME_NOCMD ) then      x = "command not found"   elseif ( e == NIME_LOGIN ) then      x = "login failed"   elseif ( e == NIME_SIDEXP ) then      x = "SID expired"   elseif ( e == NIME_ILLMAC ) then      x = "illegal MAC"   elseif ( e == NIME_ILLSID ) then      x = "illegal SID"   elseif ( e == NIME_SIDSESS ) then      x = "Session id for hub is invalid"   elseif ( e == NIME_EXPIRED ) then      x = "Expired"   elseif ( e == NIME_NOLIC ) then      x = "No valid license"   elseif ( e == NIME_INVLIC ) then      x = "Invalid license"   elseif ( e == NIME_ILLLIC ) then      x = "Illegal license"   elseif ( e == NIME_INVOP ) then      x = "Invalid operation finv"   else  --if ( e >= NIME_USER ) then      x = "user error from this value and up"   end 
  return xendfunction NimbusRequest (address, command, arguments, retries, noisy, delay)   local counter, response, retcode   if ( retries == nil ) then      retries = 1   end   if (noisy == nil) then      noisy = 1   end   if (delay == nil) then      delay = 1 * 1000   end   counter = retries   repeat      if ( noisy == 1 ) then         TimeStamp ( "Sending " .. command .. " to " .. address)      end      response, retcode = nimbus.request(address, command, arguments)      if ( retcode ~= NIME_OK ) then         -- counter = counter - 1         if ( noisy == 1 ) then            TimeStamp ( command .. " Failed for " .. address .. " Error code (" .. retcode .. ") " .. error_text(retcode) )            --tdump(response)         end         sleep(delay)      end      counter = counter - 1   until ( retcode == NIME_OK or counter == 0 )   if ( retcode == NIME_OK ) then      if ( noisy == 1 ) then         TimeStamp ( command .. " successful for " .. address )      end   else      TimeStamp ( command .. " retries exhausted for " .. address .. " Error code " .. error_text(retcode) )   end   return response, retcodeendfunction HubList(addr)   local gethubs, rc   local h = {}   local key, value   if ( addr == nil ) then      addr = "hub"   end   gethubs, rc = NimbusRequest(addr,"gethubs", nil, 1, 1, 1000)   if (rc ~= 0 ) then      TimeStamp("Error " .. rc .. " : " .. error_text(rc))      return nil   end   for key, value in pairs(gethubs.hublist) do      h[value.name] = { name = value.name, addr = value.addr, source = value.source, proximity = value.proximity, robotname = value.robotname }      if ( value.proximity == 0 ) then         h[value.name].source = value.addr      end   end   for key, value in pairs(gethubs.hublist) do      if ( h[value.name].proximity > 0 ) then         local PathParts = split ( h[value.name].source, "/")         local HubName = PathParts[2]         if (h[HubName] ~= nil) then            h[value.name].source = h[value.name].source .. "/" .. 
h[HubName].robotname .. "/hub"         else            TimeStamp("Unable to locate hub " .. HubName)            h[value.name].source = h[value.name].source .. "/" .. "ERROR"         end      end   end   return hendfunction FindNextHop(dest, current_hop)   local hubs   local PathParts = split ( current_hop, "/")   local HubName = PathParts[2]   if ( depth == 0 ) then      HopList = {}   end   TimeStamp(" ")   TimeStamp ("Hop: " .. current_hop .. " at depth " .. depth)   if ( HopList[current_hop] == nil ) then      HopList[current_hop] = depth   else      TimeStamp("Encountered circular routing to " .. current_hop)      tdump(HopList)      return 1   end   depth = depth + 1   if (depth > 30) then      TimeStamp( "Reached max depth of " .. depth)   elseif (dest == HubName) then      TimeStamp( "Found")   else      hubs = HubList(current_hop)      if ( hubs[dest] == nil ) then         TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)      else         TimeStamp("Proximity to " .. dest .. " is " .. hubs[dest].proximity .. " via " .. hubs[dest].source)         rc = FindNextHop(dest, hubs[dest].source)         if ( rc == 1 ) then            return 1         end      end   end   return 0end HopList = {}test_hub = "DestHubHere"depth = 0FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")

Original Message:
Sent: 12-04-2019 11:45 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Ok I'm good now. I just did this on our primary hub. Stopped Nimsoft, deleted the hubs.sds file and then started back up.

W/in a few minutes all the correct hubs showed up and all the old broken ones were gone and the new hubs were accessible now.

Garin I hear ya. I wish there were probe utility calls that would allow you to fix, correct this but there doesn't seem to be any. A visual hub route map would be great if it existed.

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 11:11 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This is pretty much a constant experience with my environment.

With the way the Nimbus network is maintained, where every hub has a copy of everything, if one of those hubs gets a little bit of wrong information it will keep reinfecting that incorrectness into your network. The routing bit is based on the proximity value - essentially every hub in a hub's list of known hubs has a counter of how many hops away it is and which hub is the next hop. And a given hub is constantly listening for updates to this hub network and whenever it gets an update with a smaller proximity it updates the information it has with the new lower proximity information.

This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data.

One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. It's impossible to have a proximity less than zero and so this hub will never get updated regardless how wrong it's information is. The problem is that it should have had a proximity of 2 which had put it in the path of the route to a fair number of other hubs.

This in itself is not a problem but then that hub was decommissioned. But it had a proximity of zero and so was the most favored next hop wherever it had been a possible hop. And there was no way to get it out of the network.

Currently the only solution that has been proposed to fix this is to stop UIM in its entirety across my network, delete hubs.sds and robots.sds wherever they exist, and then restart things from the central hub out to the remote systems. The idea being that UIM builds the network of systems on the fly and that if it starts from a clean slate then it will eventually build a correct network. The kicker though is that once all this effort is gone through, there's no guarantee that the same problem won't reoccur and worse, if someone out there had a server that was offline/retired/shut down and then brings it back after all this, that bad information might get reintroduced and you're back where you started.

What I can tell you with IM not reaching things is that it settles down over time - in a couple weeks to a month it will probably be working the way you hope for the hubs that are problematic today. And that the idea of logging into the hub closest to the hub/robot you are trying to interact with is usual practice for my team - you're about 10x more likely to connect to a given hub connected to it's tunnel server than the central hub for instance.

The other thing is that most of the underlying infrastructure is pretty resilient to this - get queues for instance are automatically taking advantage of this "closest hub" thing so while you might not be able to go from your central hub out the several hops to the leaf hub, the get queues between each pair of hubs are much more likely to be working.
Original Message:
Sent: 12-04-2019 10:47 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Hi Gene,
> Primary Hub is Windows
> The IM session is on my desktop but I tired this even on the Pri-hub box and same issue. Also about 10 others reported same issue from my team.
> All client hubs and these now 3 client hubs are tunnel clients. We've added different client hubs since then and they work.
> The primary hub doesn't have direct access to these boxes nor the 100+ other client hubs. All thru Tunnel(s)

Just FYI it was mentioned that the problem started happening after we started using the UMP to deploy robots to machines in their environment after discovery. If that helps with possibly root cause of the issue.
We also are seeing hubs that we off boarding, retired shut down last week still appearing in the IM list no matter what we do. We've tried deleting/REMOVING them many times on all hubs (primary, hubcol, tunner servers) and they keep coming back. There is no reference to these hubs anywhere and all their respective get queues to these retired hubs were deleted as well. This looks like a corrupted hubs.sds file I think. This is our first time ever this happening.

Primary Hub

^

Hub Collector (uimcol)

^

Tunnel Server (6 of these) - UIMHUB1|2|3|4|5|6

^

Tunnel Client Hubs (100+)

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 10:25 AM
From: Gene HOWARD
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So we need some more information.
Is the primary hub Windows?
Is the IM you are using running on the primary hub or on a desktop?
Are the new hubs connected via a tunnel to any other hubs?
Does the primary hub have direct access to the hubs?

There should be no need to remove robots.sds for this issue, only hubs.sds.
The robots should reappear on their own, but there is no documented time interval for this.
If you want it to happen immediately, a robot restart is required.

------------------------------
Gene Howard
Principal Support Engineer
Broadcom

Original Message:
Sent: 12-03-2019 04:34 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This started happening recently: we are setting up new hubs, and when logged into the Primary Hub through IM or Admin Console, these new tunnel clients are red. They are inaccessible, but if within IM I right-click and log in to a tunnel server, it can reach these new tunnel client hub entries.
I have already tried removing these two tunnel client hubs from the Hubs tab of the hub probe on all 8 hubs. When they were re-created, the primary hub still could not find a path to them. They are still RED in IM and in Admin Console.
Right-click Check Access and Check Transfer just time out from within the hub probe's Hubs tab. Both entries are red.

In my support case (20127165) it was suggested to follow this KB:

Article title: Remote Hub is offline and unreachable

which says to stop Nimsoft, delete the hubs.sds and robots.sds files, and then restart.

I am trying this in my lab, but I'm noticing that no robots are reappearing under the primary hub afterwards. It's been almost 30 minutes and all 12 robots in the lab are still gone from under the primary hub.
If I force a stop/start of the Nimsoft service, the robots appear, but I cannot do this in PROD. Will these robots eventually check back in with the pri-hub in the lab?
Also, what other options do I have to fix what seems to be hub corruption here? Is deleting hubs.sds the only way to fix this issue?

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

9. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Daniel Blanco

Posted Dec 05, 2019 01:25 PM

OK, I tried running this on my laptop and it failed. Then I ran it directly on the primary hub and it also failed.
I'm getting:
----------- Executing script at 12/5/2019 1:19:05 PM ----------

1:19:04 PM
1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
1:19:05 PM Error 2 : communication error
Error in line 186: attempt to index local 'hubs' (a nil value)

Line 186 is:
>>> if ( hubs[dest] == nil ) then
TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
else

On the last few lines I specified:
--== MAIN BEGIN:
HopList = {}
--Enter the Hub you want to test
test_hub = "Alphaserve"
depth = 0
--Specify the starting hub you want to test from here
--Use full /DOMAIN/HUB/ROBOT/hub address
--FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")
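The "attempt to index local 'hubs' (a nil value)" error follows from the script's structure. HubList returns nil when gethubs fails, and FindNextHop then indexes the result without checking it. In the real script, adding `if hubs == nil then ... return 0 end` before line 186 turns the crash into a readable message. Below is a minimal sketch of the same hop-following logic with that guard. It runs against a hypothetical in-memory table (`fake_gethubs`, with made-up hub and robot names) instead of live nimbus.request calls, so it can execute outside nas.

```lua
-- Hypothetical per-hub routing tables standing in for gethubs responses.
-- Each entry maps a destination hub name to { proximity, source }, where
-- source is the next hop's address. An address missing from this table
-- models an unreachable hub (gethubs failing with a communication error).
local fake_gethubs = {
  ["/NMS/primary/robot1/hub"] = {
    client1 = { proximity = 2, source = "/NMS/uimcol/robot2/hub" },
  },
  ["/NMS/uimcol/robot2/hub"] = {
    client1 = { proximity = 1, source = "/NMS/uimhub1/robot3/hub" },
  },
  ["/NMS/uimhub1/robot3/hub"] = {
    client1 = { proximity = 0, source = "/NMS/client1/robot4/hub" },
  },
}

-- Stand-in for HubList(): returns nil when the hub cannot be reached.
local function hub_list(addr)
  return fake_gethubs[addr]
end

-- Extract the hub name (second path segment) from /Domain/Hub/Robot/hub.
local function hub_name(addr)
  local parts = {}
  for part in string.gmatch(addr, "[^/]+") do
    parts[#parts + 1] = part
  end
  return parts[2]
end

-- Follow next hops toward dest; returns the path and a status string.
local function trace(dest, start, max_depth)
  local path, seen = {}, {}
  local current = start
  for _ = 1, (max_depth or 30) do
    if seen[current] then
      return path, "loop at " .. current
    end
    seen[current] = true
    path[#path + 1] = current
    if hub_name(current) == dest then
      return path, "found"
    end
    local hubs = hub_list(current)
    if hubs == nil then -- the guard missing before line 186
      return path, "unreachable: " .. current
    end
    local entry = hubs[dest]
    if entry == nil then
      return path, current .. " doesn't know about " .. dest
    end
    current = entry.source
  end
  return path, "max depth reached"
end

local path, status = trace("client1", "/NMS/primary/robot1/hub")
print(status, #path)
```

With the guard in place, an unreachable starting hub ends the trace with a status message instead of a Lua error. The loop detection and 30-hop limit mirror what the original script already does.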

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

10. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Garin Walsh

Posted Dec 05, 2019 01:38 PM

So, from wherever you are running this, the hub /NMS/Onxxx/xxxxnms1/hub isn't reachable.

The name is apparently valid, because the error is "Error code (2) communication error"; if the name were completely wrong, it would have been "Error code (4) not found".

Once your network passes a certain size (for me it was when I had to move to tunnel proxies because of the Windows subscriber limits), this gets more and more frequent.

Is /NMS/Onxxx/xxxxnms1/hub where your nas is? If not, I'd start by changing the starting point to the hub where your nas is, which should at least get you past the first gethubs call.
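The distinction between the two error codes can be folded into the script so it reports a likely cause up front. This is a minimal sketch using the numeric codes quoted in this thread: 2 for a communication error and 4 for not found. Inside nas you would compare against the NIME_COMERR and NIME_NOENT constants the script already uses rather than literals. The `explain_failure` name is hypothetical.

```lua
-- Codes as reported in this thread; inside nas prefer NIME_COMERR / NIME_NOENT.
local COMERR = 2 -- communication error: name resolved, hub not reachable
local NOENT = 4  -- not found: the address itself is likely wrong

-- Hypothetical helper: turn a nimbus return code into a troubleshooting hint.
local function explain_failure(rc, addr)
  if rc == COMERR then
    return addr .. " looks valid but is unreachable from here; "
      .. "try starting from the hub where nas runs"
  elseif rc == NOENT then
    return addr .. " was not found; check the /Domain/Hub/Robot/hub spelling"
  end
  return addr .. " failed with code " .. tostring(rc)
end

print(explain_failure(2, "/NMS/Onxxx/xxxxnms1/hub"))
```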

Original Message

Original Message:
Sent: 12-05-2019 01:25 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Ok tried running this on my laptop and it failed. Then directly off the primary hub and also failed.
Getting:
----------- Executing script at 12/5/2019 1:19:05 PM ----------

1:19:04 PM
1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
1:19:05 PM Error 2 : communication error
Error in line 186: attempt to index local 'hubs' (a nil value)

Line 186 is:
>>> if ( hubs[dest] == nil ) then
TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
else

On the last few lines I specified:
--== MAIN BEGIN:
HopList = {}
--Enter the Hub you want to test
test_hub = "Alphaserve"
depth = 0
--Specify the starting hub you want to test from here
--User full /DOMAIN/HUB/ROBOT/hub address
--FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 11:31 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

It's a Lua script - just open nas, select the Auto-Operator/scripts tab, right click in the white space and select "New -> script"

Paste this in.

The last couple lines define the destination hub and the starting point. Update those to match your needs.
Original Message:
Sent: 12-05-2019 10:54 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Nice. What is this written in? How do I run this? Just copy this into the nas a new script or some other way?
TIA...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 04:44 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

I wrote this "traceroute" tool - takes a destination and starting hub and displays the path:

depth = 0function TimeStamp(stuff)   print (string.format("%s %s", timestamp.format(timestamp.now(), "%X"), (stuff or "(nil)")))endfunction tdump(t)   local function dmp(t, l, k)      if type(t) == "table" then         print(string.format("%s%s:", string.rep(" ", l*2), tostring(k)))         for k, v in pairs(t) do            dmp(v, l+1, k)         end      else         print(string.format("%s%s:%s", string.rep(" ", l*2), tostring(k), tostring(t)))      end   end   dmp(t, 1, "root")endfunction error_text(e)   local x   if ( e == nil ) then      x = "error is nil"   elseif ( e == NIME_OK ) then      x = "OK"   elseif ( e == NIME_ERROR ) then      x = "error"   elseif ( e == NIME_COMERR ) then      x = "communication error"   elseif ( e == NIME_INVAL ) then      x = "invalid argument"   elseif ( e == NIME_NOENT ) then      x = "not found"   elseif ( e == NIME_ISENT ) then      x = "already defined"   elseif ( e == NIME_ACCESS ) then      x = "permission denied"   elseif ( e == NIME_AGAIN ) then      x = "temporarily out of resources"   elseif ( e == NIME_NOMEM ) then      x = "out of resources"   elseif ( e == NIME_NOSPC ) then      x = "no space left"   elseif ( e == NIME_EPIPE ) then      x = "broken connection"   elseif ( e == NIME_NOCMD ) then      x = "command not found"   elseif ( e == NIME_LOGIN ) then      x = "login failed"   elseif ( e == NIME_SIDEXP ) then      x = "SID expired"   elseif ( e == NIME_ILLMAC ) then      x = "illegal MAC"   elseif ( e == NIME_ILLSID ) then      x = "illegal SID"   elseif ( e == NIME_SIDSESS ) then      x = "Session id for hub is invalid"   elseif ( e == NIME_EXPIRED ) then      x = "Expired"   elseif ( e == NIME_NOLIC ) then      x = "No valid license"   elseif ( e == NIME_INVLIC ) then      x = "Invalid license"   elseif ( e == NIME_ILLLIC ) then      x = "Illegal license"   elseif ( e == NIME_INVOP ) then      x = "Invalid operation finv"   else  --if ( e >= NIME_USER ) then      x = "user error from this value and up"   end 
  return xendfunction NimbusRequest (address, command, arguments, retries, noisy, delay)   local counter, response, retcode   if ( retries == nil ) then      retries = 1   end   if (noisy == nil) then      noisy = 1   end   if (delay == nil) then      delay = 1 * 1000   end   counter = retries   repeat      if ( noisy == 1 ) then         TimeStamp ( "Sending " .. command .. " to " .. address)      end      response, retcode = nimbus.request(address, command, arguments)      if ( retcode ~= NIME_OK ) then         -- counter = counter - 1         if ( noisy == 1 ) then            TimeStamp ( command .. " Failed for " .. address .. " Error code (" .. retcode .. ") " .. error_text(retcode) )            --tdump(response)         end         sleep(delay)      end      counter = counter - 1   until ( retcode == NIME_OK or counter == 0 )   if ( retcode == NIME_OK ) then      if ( noisy == 1 ) then         TimeStamp ( command .. " successful for " .. address )      end   else      TimeStamp ( command .. " retries exhausted for " .. address .. " Error code " .. error_text(retcode) )   end   return response, retcodeendfunction HubList(addr)   local gethubs, rc   local h = {}   local key, value   if ( addr == nil ) then      addr = "hub"   end   gethubs, rc = NimbusRequest(addr,"gethubs", nil, 1, 1, 1000)   if (rc ~= 0 ) then      TimeStamp("Error " .. rc .. " : " .. error_text(rc))      return nil   end   for key, value in pairs(gethubs.hublist) do      h[value.name] = { name = value.name, addr = value.addr, source = value.source, proximity = value.proximity, robotname = value.robotname }      if ( value.proximity == 0 ) then         h[value.name].source = value.addr      end   end   for key, value in pairs(gethubs.hublist) do      if ( h[value.name].proximity > 0 ) then         local PathParts = split ( h[value.name].source, "/")         local HubName = PathParts[2]         if (h[HubName] ~= nil) then            h[value.name].source = h[value.name].source .. "/" .. 
h[HubName].robotname .. "/hub"         else            TimeStamp("Unable to locate hub " .. HubName)            h[value.name].source = h[value.name].source .. "/" .. "ERROR"         end      end   end   return hendfunction FindNextHop(dest, current_hop)   local hubs   local PathParts = split ( current_hop, "/")   local HubName = PathParts[2]   if ( depth == 0 ) then      HopList = {}   end   TimeStamp(" ")   TimeStamp ("Hop: " .. current_hop .. " at depth " .. depth)   if ( HopList[current_hop] == nil ) then      HopList[current_hop] = depth   else      TimeStamp("Encountered circular routing to " .. current_hop)      tdump(HopList)      return 1   end   depth = depth + 1   if (depth > 30) then      TimeStamp( "Reached max depth of " .. depth)   elseif (dest == HubName) then      TimeStamp( "Found")   else      hubs = HubList(current_hop)      if ( hubs[dest] == nil ) then         TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)      else         TimeStamp("Proximity to " .. dest .. " is " .. hubs[dest].proximity .. " via " .. hubs[dest].source)         rc = FindNextHop(dest, hubs[dest].source)         if ( rc == 1 ) then            return 1         end      end   end   return 0end HopList = {}test_hub = "DestHubHere"depth = 0FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")

Original Message:
Sent: 12-04-2019 11:45 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Ok I'm good now. I just did this on our primary hub. Stopped Nimsoft, deleted the hubs.sds file and then started back up.

W/in a few minutes all the correct hubs showed up and all the old broken ones were gone and the new hubs were accessible now.

Garin I hear ya. I wish there were probe utility calls that would allow you to fix, correct this but there doesn't seem to be any. A visual hub route map would be great if it existed.

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 11:11 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This is pretty much a constant experience with my environment.

With the way the Nimbus network is maintained, where every hub has a copy of everything, if one of those hubs gets a little bit of wrong information it will keep reinfecting that incorrectness into your network. The routing bit is based on the proximity value - essentially every hub in a hub's list of known hubs has a counter of how many hops away it is and which hub is the next hop. And a given hub is constantly listening for updates to this hub network and whenever it gets an update with a smaller proximity it updates the information it has with the new lower proximity information.

This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data.

One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. It's impossible to have a proximity less than zero and so this hub will never get updated regardless how wrong it's information is. The problem is that it should have had a proximity of 2 which had put it in the path of the route to a fair number of other hubs.

This in itself is not a problem but then that hub was decommissioned. But it had a proximity of zero and so was the most favored next hop wherever it had been a possible hop. And there was no way to get it out of the network.

Currently the only solution that has been proposed to fix this is to stop UIM in its entirety across my network, delete hubs.sds and robots.sds wherever they exist, and then restart things from the central hub out to the remote systems. The idea being that UIM builds the network of systems on the fly and that if it starts from a clean slate then it will eventually build a correct network. The kicker though is that once all this effort is gone through, there's no guarantee that the same problem won't reoccur and worse, if someone out there had a server that was offline/retired/shut down and then brings it back after all this, that bad information might get reintroduced and you're back where you started.

What I can tell you with IM not reaching things is that it settles down over time - in a couple weeks to a month it will probably be working the way you hope for the hubs that are problematic today. And that the idea of logging into the hub closest to the hub/robot you are trying to interact with is usual practice for my team - you're about 10x more likely to connect to a given hub connected to it's tunnel server than the central hub for instance.

The other thing is that most of the underlying infrastructure is pretty resilient to this - get queues for instance are automatically taking advantage of this "closest hub" thing so while you might not be able to go from your central hub out the several hops to the leaf hub, the get queues between each pair of hubs are much more likely to be working.
Original Message:
Sent: 12-04-2019 10:47 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Hi Gene,
> Primary Hub is Windows
> The IM session is on my desktop but I tired this even on the Pri-hub box and same issue. Also about 10 others reported same issue from my team.
> All client hubs and these now 3 client hubs are tunnel clients. We've added different client hubs since then and they work.
> The primary hub doesn't have direct access to these boxes nor the 100+ other client hubs. All thru Tunnel(s)

Just FYI it was mentioned that the problem started happening after we started using the UMP to deploy robots to machines in their environment after discovery. If that helps with possibly root cause of the issue.
We also are seeing hubs that we off boarding, retired shut down last week still appearing in the IM list no matter what we do. We've tried deleting/REMOVING them many times on all hubs (primary, hubcol, tunner servers) and they keep coming back. There is no reference to these hubs anywhere and all their respective get queues to these retired hubs were deleted as well. This looks like a corrupted hubs.sds file I think. This is our first time ever this happening.

Primary Hub

^

Hub Collector (uimcol)

^

Tunnel Server (6 of these) - UIMHUB1|2|3|4|5|6

^

Tunnel Client Hubs (100+)

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 10:25 AM
From: Gene HOWARD
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So we need some more information.
Is the primary hub windows?
Is the IM you are using running on the primary hub or a desktop?
Are thew new hubs connect via a tunnel to any other hubs?
Does the primary hub have direct access to the hubs?

there should be no need to remove the robots.sds for this issue, only the hub.sds
the robots should reappear on their own but there is no documented specific time interval for this.
if you want it to happen immediately a robot restart will be required.

------------------------------
Gene Howard
Principal Support Engineer
Broadcom

Original Message:
Sent: 12-03-2019 04:34 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This just started happening recently: when we set up new hubs and log in to the primary hub through IM or Admin Console, the new tunnel clients show as red. They are inaccessible, but if I right-click in IM and log in to a tunnel server, it can reach these new tunnel client hub entries.
I have already tried removing the two tunnel client hub entries from the Hubs tab of the hub probe on all 8 hubs. When they were re-created, the primary hub still could not find a path to them. They are still red in IM and in Admin Console.
Right-click Check Access and Check Transfer just time out from the hub probe's Hubs tab. Both entries are red.

In my support case (20127165) it was suggested to follow this KB:

Article title: Remote Hub is offline and unreachable

which says to stop Nimsoft, delete the hubs.sds and robots.sds files, and then restart.

I am trying this in my lab, but I'm noticing that no robots reappear under the primary hub afterwards. It's been almost 30 minutes and all 12 robots in the lab are still missing from under the primary hub.
If I force a stop/start of the Nimsoft service, the robots appear, but I cannot do this in PROD. Will these robots eventually check back in with the primary hub in the lab at some point?
Also, what other options do I have to fix what seems to be hub corruption? Is deleting hubs.sds the only way to fix this issue?

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

11. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub


Daniel Blanco

Posted Dec 05, 2019 01:47 PM

Yes, the nms1 box is our primary hub and it's where the nas is running. I tried running it from an IM session on the nms1 box.
I guess either it cannot reach itself or it's the case sensitivity. Let me try that...
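If case sensitivity is the culprit, one quick offline check is to compare the start address typed into the script with the one shown in IM. A minimal sketch in plain Lua, assuming the /Domain/Hub/Robot/probe layout used in this thread; `parse_address` and `differs_only_by_case` are hypothetical helpers (not the nas `split` function), and treating addresses as case sensitive is taken from this thread rather than from product documentation.

```lua
-- Hypothetical helper: split "/Domain/Hub/Robot/probe" into named parts.
-- Returns nil if the address does not have exactly four components.
function parse_address(addr)
   local parts = {}
   for p in string.gmatch(addr, "[^/]+") do
      table.insert(parts, p)
   end
   if #parts ~= 4 then
      return nil
   end
   return { domain = parts[1], hub = parts[2], robot = parts[3], probe = parts[4] }
end

-- Hypothetical helper: true when two addresses are not identical but
-- become identical once case is ignored (e.g. "nms1" vs "NMS1").
function differs_only_by_case(a, b)
   return a ~= b and string.lower(a) == string.lower(b)
end

-- Example: a start address typed in lowercase vs. the real one.
local typed = "/NMS/On-Prem/nms1/hub"
local actual = "/NMS/On-Prem/NMS1/hub"
if differs_only_by_case(typed, actual) then
   print("Start address differs only by case: " .. typed .. " vs " .. actual)
end
```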

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

Original Message:
Sent: 12-05-2019 01:37 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So, from wherever you are running this, the hub /NMS/Onxxx/xxxxnms1/hub isn't reachable.

The name is apparently valid because the error is "Error code (2) communication error"; if the name were completely bad, it would have been "Error code (4) not found".

Once your network grows past a certain size (for me it was when I had to move to tunnel proxies because of the Windows subscriber limits), this gets more and more frequent.

Is /NMS/Onxxx/xxxxnms1/hub where your nas is? If not, I'd start by changing the starting point to the hub where your nas is, which should at least get you past the first gethubs callback.
Original Message:
Sent: 12-05-2019 01:25 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

OK, I tried running this on my laptop and it failed. Then I ran it directly on the primary hub, and it also failed.
Getting:
----------- Executing script at 12/5/2019 1:19:05 PM ----------

1:19:04 PM
1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
1:19:05 PM Error 2 : communication error
Error in line 186: attempt to index local 'hubs' (a nil value)

Line 186 is:
>>> if ( hubs[dest] == nil ) then
TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
else

On the last few lines I specified:
--== MAIN BEGIN:
HopList = {}
--Enter the Hub you want to test
test_hub = "Alphaserve"
depth = 0
--Specify the starting hub you want to test from here
--Use full /DOMAIN/HUB/ROBOT/hub address
--FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")
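The "attempt to index local 'hubs' (a nil value)" error is a knock-on effect of the failed gethubs: in the traceroute script, HubList returns nil when the request fails, and FindNextHop indexes the result without checking it. Below is a minimal sketch of a guard in plain Lua so it runs outside nas; `TimeStamp` and `HubList` are replaced by hypothetical stubs, and `NextHopFor` is an illustrative name. In the real script, the equivalent would be an `if hubs == nil then ... return 1 end` check right after `hubs = HubList(current_hop)`, stopping the walk the same way the script already does for circular routes.

```lua
-- Hypothetical stand-ins so this runs outside nas. In the real script,
-- TimeStamp prints with a time prefix and HubList sends gethubs,
-- returning nil when the request fails.
function TimeStamp(msg)
   print(msg)
end

FakeHubLists = {
   ["/NMS/On-Prem/NMS1/hub"] = {
      Alphaserve = { proximity = 0, source = "/NMS/Alphaserve/hub1/hub" },
   },
}

function HubList(addr)
   return FakeHubLists[addr]  -- nil for unknown addresses, like a failed gethubs
end

-- Guarded lookup: check HubList's result before indexing it.
function NextHopFor(dest, current_hop)
   local hubs = HubList(current_hop)
   if hubs == nil then
      TimeStamp("gethubs failed for " .. current_hop .. ", stopping here")
      return nil
   end
   if hubs[dest] == nil then
      TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
      return nil
   end
   return hubs[dest].source
end

print(NextHopFor("Alphaserve", "/NMS/On-Prem/NMS1/hub"))
print(NextHopFor("Alphaserve", "/NMS/Onxxx/xxxxnms1/hub"))
```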

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 11:31 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

It's a Lua script - just open nas, select the Auto-Operator/scripts tab, right click in the white space and select "New -> script"

Paste this in.

The last couple lines define the destination hub and the starting point. Update those to match your needs.
Original Message:
Sent: 12-05-2019 10:54 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Nice. What is this written in? How do I run this? Do I just copy it into the nas as a new script, or some other way?
TIA...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 11:11 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This is pretty much a constant experience with my environment.

Because of the way the Nimbus network is maintained, with every hub holding a copy of everything, if one of those hubs gets slightly wrong information it will keep reinfecting your network with it. Routing is based on the proximity value: essentially, each entry in a hub's list of known hubs has a counter of how many hops away that hub is and which hub is the next hop. Each hub is constantly listening for updates to the hub network, and whenever it gets an update with a smaller proximity, it replaces the information it has with the new, lower-proximity information.

This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data.

One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. A proximity can't go below zero, so that entry will never get updated, regardless of how wrong its information is. The hub should have had a proximity of 2, and with that zero it ended up in the route to a fair number of other hubs.

That in itself wasn't a problem, but then that hub was decommissioned. Because it still had a proximity of zero, it remained the most favored next hop wherever it had been a possible hop, and there was no way to get it out of the network.
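The proximity behavior described in this message can be modeled in a few lines to show why a bogus zero never goes away. This is a hypothetical sketch of the rule as described here (accept an update only if it advertises a smaller proximity), not UIM's actual hub code; `apply_update`, `HUB_X` and `OLDHUB` are made-up names.

```lua
-- Hypothetical model of the proximity rule: for every known hub, keep a
-- proximity (hop count) and the next hop ("source"), and accept an update
-- only if it advertises a strictly smaller proximity.
function apply_update(known, name, proximity, source)
   local cur = known[name]
   if cur == nil or proximity < cur.proximity then
      known[name] = { proximity = proximity, source = source }
      return true    -- update accepted
   end
   return false      -- ignored: not closer than the current entry
end

local known = {}
apply_update(known, "HUB_X", 2, "/NMS/UIMHUB1/uimhub1/hub")  -- correct entry
apply_update(known, "HUB_X", 0, "/NMS/OLDHUB/oldrobot/hub")  -- bogus zero

-- Proximity can't go below zero, so no later update can win, even after
-- OLDHUB is decommissioned and the correct route is re-advertised.
print(apply_update(known, "HUB_X", 2, "/NMS/UIMHUB1/uimhub1/hub"))
print(known["HUB_X"].source)
```

This is also why a clean-slate rebuild is the only remedy in this model: nothing in the update rule can raise a proximity back up.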

Currently the only proposed fix is to stop UIM entirely across my network, delete hubs.sds and robots.sds wherever they exist, and then restart everything from the central hub out to the remote systems. The idea is that UIM builds the network of systems on the fly, so if it starts from a clean slate it will eventually build a correct network. The kicker is that after all this effort, there's no guarantee the same problem won't recur. Worse, if someone had a server that was offline, retired or shut down and brings it back afterwards, that bad information might get reintroduced and you're back where you started.

What I can tell you about IM not reaching things is that it settles down over time: in a couple of weeks to a month, the hubs that are problematic today will probably be working the way you hope. Also, logging into the hub closest to the hub/robot you are trying to interact with is usual practice for my team: you're about 10x more likely to reach a given hub from its tunnel server than from the central hub, for instance.

The other thing is that most of the underlying infrastructure is fairly resilient to this. Get queues, for instance, automatically take advantage of this "closest hub" behavior, so while you might not be able to go from your central hub out several hops to the leaf hub, the get queues between each pair of hubs are much more likely to be working.
Original Message:
Sent: 12-04-2019 10:47 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Hi Gene,
> Primary Hub is Windows
> The IM session is on my desktop, but I tried this even on the pri-hub box with the same result. About 10 others on my team have also reported the same issue.
> All client hubs, including these (now 3) problem hubs, are tunnel clients. We've added other client hubs since then and they work.
> The primary hub doesn't have direct access to these boxes or to the 100+ other client hubs. Everything goes through tunnels.


12. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub


Daniel Blanco

Posted Dec 05, 2019 01:54 PM

Okay, yes, that was it. It's case sensitive. It's working now. Cool script:
(Hub and robot names changed, of course)

----------- Executing script at 12/5/2019 1:48:36 PM ----------

1:48:36 PM
1:48:36 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
1:48:36 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
1:48:36 PM gethubs successful for /NMS/On-Prem/NMS1/hub
1:48:36 PM Proximity to CLIENT_B is 1 via /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM
1:48:36 PM Hop: /NMS/UIMHUB5/uimhub5/hub at depth 1
1:48:36 PM Sending gethubs to /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM gethubs successful for /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM Proximity to CLIENT_B is 0 via /NMS/CLIENT_B/clientBhub/hub
1:48:36 PM
1:48:36 PM Hop: /NMS/CLIENT_B/clientBhub/hub at depth 2
1:48:36 PM Found

----------- Executing script at 12/5/2019 1:47:51 PM ----------

1:47:50 PM
1:47:50 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
1:47:50 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
1:47:51 PM gethubs successful for /NMS/On-Prem/NMS1/hub
1:47:51 PM Proximity to Alphaserve is 0 via /NMS/Alphaserve/hub1/hub
1:47:51 PM
1:47:51 PM Hop: /NMS/Alphaserve/hub1/hub at depth 1
1:47:51 PM Found
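The output reflects how the traceroute script works: at each hop it sends gethubs to that hub, looks up the destination's next hop ("source"), and follows it, stopping on a match, a circular route, or a depth limit of 30. The sketch below replays that walk offline in plain Lua, written as a loop instead of the script's recursion, against a hypothetical in-memory routing table modeled on the CLIENT_B run; `Routes`, `hub_name` and `trace` are illustrative names, not part of nas or the original script.

```lua
-- Hypothetical routing tables: what each hub's gethubs reply would say
-- about the destination (modeled on the CLIENT_B run).
Routes = {
   ["/NMS/On-Prem/NMS1/hub"] = {
      CLIENT_B = { proximity = 1, source = "/NMS/UIMHUB5/uimhub5/hub" },
   },
   ["/NMS/UIMHUB5/uimhub5/hub"] = {
      CLIENT_B = { proximity = 0, source = "/NMS/CLIENT_B/clientBhub/hub" },
   },
}

-- The second component of /Domain/Hub/Robot/probe is the hub name.
function hub_name(addr)
   local parts = {}
   for p in string.gmatch(addr, "[^/]+") do
      table.insert(parts, p)
   end
   return parts[2]
end

-- Follow next hops until found, unknown, unreachable, a loop, or max depth.
function trace(dest, start, max_depth)
   max_depth = max_depth or 30
   local path, seen, hop = {}, {}, start
   while true do
      table.insert(path, hop)
      if seen[hop] then return path, "loop" end
      seen[hop] = true
      if hub_name(hop) == dest then return path, "found" end
      if #path > max_depth then return path, "max depth" end
      local hubs = Routes[hop]           -- stands in for gethubs
      if hubs == nil then return path, "unreachable" end
      if hubs[dest] == nil then return path, "unknown" end
      hop = hubs[dest].source
   end
end

local path, status = trace("CLIENT_B", "/NMS/On-Prem/NMS1/hub")
for i, h in ipairs(path) do
   print("depth " .. (i - 1) .. ": " .. h)
end
print(status)
```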

Thanks....

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

Original Message

Original Message:
Sent: 12-05-2019 01:46 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Yes the nms1 box is our primary hub and its where the nas is running. I tried running it of an IM session running off the nsm1 box.
I guess it cannot reach itself or maybe its the case sensitivity. Let me try that...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 01:37 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So, from wherever you are running this, the hub /NMS/Onxxx/xxxxnms1/hub isn't reachable.

The name is apparently valid because the error is "Error code (2) communication error", if the name was completely bad, it would have been Error Code (4) not found.

Once your network crests a particular size (for me it was when I had to move to using tunnel proxies because of the Windows subscriber limits) this gets more and more frequent.

Is /NMS/Onxxx/xxxxnms1/hub where your nas is? If not, I'd start by changing the starting point to the hub where your nas is which should at least get you past the first gethubs callback.
Original Message:
Sent: 12-05-2019 01:25 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Ok tried running this on my laptop and it failed. Then directly off the primary hub and also failed.
Getting:
----------- Executing script at 12/5/2019 1:19:05 PM ----------

1:19:04 PM
1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
1:19:05 PM Error 2 : communication error
Error in line 186: attempt to index local 'hubs' (a nil value)

Line 186 is:
>>> if ( hubs[dest] == nil ) then
TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
else

On the last few lines I specified:
--== MAIN BEGIN:
HopList = {}
--Enter the Hub you want to test
test_hub = "Alphaserve"
depth = 0
--Specify the starting hub you want to test from here
--User full /DOMAIN/HUB/ROBOT/hub address
--FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 11:31 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

It's a Lua script - just open nas, select the Auto-Operator/scripts tab, right click in the white space and select "New -> script"

Paste this in.

The last couple lines define the destination hub and the starting point. Update those to match your needs.
Original Message:
Sent: 12-05-2019 10:54 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Nice. What is this written in? How do I run this? Just copy this into the nas a new script or some other way?
TIA...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 04:44 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

I wrote this "traceroute" tool - takes a destination and starting hub and displays the path:

depth = 0function TimeStamp(stuff)   print (string.format("%s %s", timestamp.format(timestamp.now(), "%X"), (stuff or "(nil)")))endfunction tdump(t)   local function dmp(t, l, k)      if type(t) == "table" then         print(string.format("%s%s:", string.rep(" ", l*2), tostring(k)))         for k, v in pairs(t) do            dmp(v, l+1, k)         end      else         print(string.format("%s%s:%s", string.rep(" ", l*2), tostring(k), tostring(t)))      end   end   dmp(t, 1, "root")endfunction error_text(e)   local x   if ( e == nil ) then      x = "error is nil"   elseif ( e == NIME_OK ) then      x = "OK"   elseif ( e == NIME_ERROR ) then      x = "error"   elseif ( e == NIME_COMERR ) then      x = "communication error"   elseif ( e == NIME_INVAL ) then      x = "invalid argument"   elseif ( e == NIME_NOENT ) then      x = "not found"   elseif ( e == NIME_ISENT ) then      x = "already defined"   elseif ( e == NIME_ACCESS ) then      x = "permission denied"   elseif ( e == NIME_AGAIN ) then      x = "temporarily out of resources"   elseif ( e == NIME_NOMEM ) then      x = "out of resources"   elseif ( e == NIME_NOSPC ) then      x = "no space left"   elseif ( e == NIME_EPIPE ) then      x = "broken connection"   elseif ( e == NIME_NOCMD ) then      x = "command not found"   elseif ( e == NIME_LOGIN ) then      x = "login failed"   elseif ( e == NIME_SIDEXP ) then      x = "SID expired"   elseif ( e == NIME_ILLMAC ) then      x = "illegal MAC"   elseif ( e == NIME_ILLSID ) then      x = "illegal SID"   elseif ( e == NIME_SIDSESS ) then      x = "Session id for hub is invalid"   elseif ( e == NIME_EXPIRED ) then      x = "Expired"   elseif ( e == NIME_NOLIC ) then      x = "No valid license"   elseif ( e == NIME_INVLIC ) then      x = "Invalid license"   elseif ( e == NIME_ILLLIC ) then      x = "Illegal license"   elseif ( e == NIME_INVOP ) then      x = "Invalid operation finv"   else  --if ( e >= NIME_USER ) then      x = "user error from this value and up"   end 
  return xendfunction NimbusRequest (address, command, arguments, retries, noisy, delay)   local counter, response, retcode   if ( retries == nil ) then      retries = 1   end   if (noisy == nil) then      noisy = 1   end   if (delay == nil) then      delay = 1 * 1000   end   counter = retries   repeat      if ( noisy == 1 ) then         TimeStamp ( "Sending " .. command .. " to " .. address)      end      response, retcode = nimbus.request(address, command, arguments)      if ( retcode ~= NIME_OK ) then         -- counter = counter - 1         if ( noisy == 1 ) then            TimeStamp ( command .. " Failed for " .. address .. " Error code (" .. retcode .. ") " .. error_text(retcode) )            --tdump(response)         end         sleep(delay)      end      counter = counter - 1   until ( retcode == NIME_OK or counter == 0 )   if ( retcode == NIME_OK ) then      if ( noisy == 1 ) then         TimeStamp ( command .. " successful for " .. address )      end   else      TimeStamp ( command .. " retries exhausted for " .. address .. " Error code " .. error_text(retcode) )   end   return response, retcodeendfunction HubList(addr)   local gethubs, rc   local h = {}   local key, value   if ( addr == nil ) then      addr = "hub"   end   gethubs, rc = NimbusRequest(addr,"gethubs", nil, 1, 1, 1000)   if (rc ~= 0 ) then      TimeStamp("Error " .. rc .. " : " .. error_text(rc))      return nil   end   for key, value in pairs(gethubs.hublist) do      h[value.name] = { name = value.name, addr = value.addr, source = value.source, proximity = value.proximity, robotname = value.robotname }      if ( value.proximity == 0 ) then         h[value.name].source = value.addr      end   end   for key, value in pairs(gethubs.hublist) do      if ( h[value.name].proximity > 0 ) then         local PathParts = split ( h[value.name].source, "/")         local HubName = PathParts[2]         if (h[HubName] ~= nil) then            h[value.name].source = h[value.name].source .. "/" .. 
h[HubName].robotname .. "/hub"         else            TimeStamp("Unable to locate hub " .. HubName)            h[value.name].source = h[value.name].source .. "/" .. "ERROR"         end      end   end   return hendfunction FindNextHop(dest, current_hop)   local hubs   local PathParts = split ( current_hop, "/")   local HubName = PathParts[2]   if ( depth == 0 ) then      HopList = {}   end   TimeStamp(" ")   TimeStamp ("Hop: " .. current_hop .. " at depth " .. depth)   if ( HopList[current_hop] == nil ) then      HopList[current_hop] = depth   else      TimeStamp("Encountered circular routing to " .. current_hop)      tdump(HopList)      return 1   end   depth = depth + 1   if (depth > 30) then      TimeStamp( "Reached max depth of " .. depth)   elseif (dest == HubName) then      TimeStamp( "Found")   else      hubs = HubList(current_hop)      if ( hubs[dest] == nil ) then         TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)      else         TimeStamp("Proximity to " .. dest .. " is " .. hubs[dest].proximity .. " via " .. hubs[dest].source)         rc = FindNextHop(dest, hubs[dest].source)         if ( rc == 1 ) then            return 1         end      end   end   return 0end HopList = {}test_hub = "DestHubHere"depth = 0FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")

Original Message:
Sent: 12-04-2019 11:45 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Ok I'm good now. I just did this on our primary hub. Stopped Nimsoft, deleted the hubs.sds file and then started back up.

W/in a few minutes all the correct hubs showed up and all the old broken ones were gone and the new hubs were accessible now.

Garin I hear ya. I wish there were probe utility calls that would allow you to fix, correct this but there doesn't seem to be any. A visual hub route map would be great if it existed.

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 11:11 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This is pretty much a constant experience with my environment.

With the way the Nimbus network is maintained, where every hub has a copy of everything, if one of those hubs gets a little bit of wrong information it will keep reinfecting that incorrectness into your network. The routing bit is based on the proximity value - essentially every hub in a hub's list of known hubs has a counter of how many hops away it is and which hub is the next hop. And a given hub is constantly listening for updates to this hub network and whenever it gets an update with a smaller proximity it updates the information it has with the new lower proximity information.

This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data.

One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. It's impossible to have a proximity less than zero and so this hub will never get updated regardless how wrong it's information is. The problem is that it should have had a proximity of 2 which had put it in the path of the route to a fair number of other hubs.

This in itself is not a problem but then that hub was decommissioned. But it had a proximity of zero and so was the most favored next hop wherever it had been a possible hop. And there was no way to get it out of the network.

Currently the only solution that has been proposed to fix this is to stop UIM in its entirety across my network, delete hubs.sds and robots.sds wherever they exist, and then restart things from the central hub out to the remote systems. The idea being that UIM builds the network of systems on the fly and that if it starts from a clean slate then it will eventually build a correct network. The kicker though is that once all this effort is gone through, there's no guarantee that the same problem won't reoccur and worse, if someone out there had a server that was offline/retired/shut down and then brings it back after all this, that bad information might get reintroduced and you're back where you started.

What I can tell you with IM not reaching things is that it settles down over time - in a couple weeks to a month it will probably be working the way you hope for the hubs that are problematic today. And that the idea of logging into the hub closest to the hub/robot you are trying to interact with is usual practice for my team - you're about 10x more likely to connect to a given hub connected to it's tunnel server than the central hub for instance.

The other thing is that most of the underlying infrastructure is pretty resilient to this - get queues for instance are automatically taking advantage of this "closest hub" thing so while you might not be able to go from your central hub out the several hops to the leaf hub, the get queues between each pair of hubs are much more likely to be working.
Original Message:
Sent: 12-04-2019 10:47 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Hi Gene,
> Primary Hub is Windows
> The IM session is on my desktop, but I tried this on the Pri-hub box too and got the same issue. About 10 others on my team have also reported the same issue.
> All client hubs, including these (now 3) client hubs, are tunnel clients. We've added other client hubs since then and they work.
> The primary hub doesn't have direct access to these boxes or to the 100+ other client hubs. Everything goes through tunnels.

Just FYI, it was mentioned that the problem started after we began using UMP to deploy robots to machines in their environment after discovery, in case that helps identify the root cause.
We are also seeing hubs that we offboarded, retired, and shut down last week still appearing in the IM list no matter what we do. We've tried deleting/removing them many times on all hubs (primary, hubcol, tunnel servers) and they keep coming back. There is no reference to these hubs anywhere, and all of the get queues to these retired hubs were deleted as well. I think this looks like a corrupted hubs.sds file. This is the first time this has ever happened to us.

Primary Hub

^

Hub Collector (uimcol)

^

Tunnel Server (6 of these) - UIMHUB1|2|3|4|5|6

^

Tunnel Client Hubs (100+)

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 10:25 AM
From: Gene HOWARD
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So we need some more information.
Is the primary hub windows?
Is the IM you are using running on the primary hub or a desktop?
Are the new hubs connected via a tunnel to any other hubs?
Does the primary hub have direct access to the hubs?

There should be no need to remove robots.sds for this issue, only hubs.sds.
The robots should reappear on their own, but there is no documented time interval for this.
If you want it to happen immediately, a robot restart will be required.

------------------------------
Gene Howard
Principal Support Engineer
Broadcom

Original Message:
Sent: 12-03-2019 04:34 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So this just started happening recently: when we set up new hubs and log into the Primary Hub through IM or Admin Console, these new tunnel clients show as red. They are inaccessible, but if I right-click in IM and log in to a tunnel server, it can reach these new tunnel client hub entries.
I have already tried removing these tunnel clients from the Hubs tab of the hub probe on all 8 hubs, for the two tunnel client hubs. When they were re-created, the primary hub still could not find a path to them. They are still red in IM and in Admin Console.
A right-click Check Access or Check Transfer from within the hub probe's Hubs tab just times out. Both entries are red.

In my support case (20127165) it was suggested to follow this KB:

Article title: Remote Hub is offline and unreachable

which says to stop Nimsoft, delete the hubs.sds and robots.sds files, and then restart.

I am trying this in my lab but I'm noticing that no robots are re-appearing under the primary hub afterwards. It's been almost 30 min and so far all 12 robots in lab are still gone from under primary hub.
If I forcibly do a stop/start on the nimsoft service the robots appear but I cannot do this in PROD. Will these robots eventually check back in at some point with the pri-hub in lab?
Also, what other options do I have to fix what appears to be hub corruption? Is deleting hubs.sds the only way to fix this issue?

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

13. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Garin Walsh

Posted Dec 05, 2019 02:00 PM

Hope it helps the next time you have issues. It might give you a better place than your central hub to start when resetting things.

Original Message:
Sent: 12-05-2019 01:54 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Okay, yes, that was it. It's case-sensitive, and it's now working. Cool script:
(Of course renaming hubs and robots)

----------- Executing script at 12/5/2019 1:48:36 PM ----------

1:48:36 PM
1:48:36 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
1:48:36 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
1:48:36 PM gethubs successful for /NMS/On-Prem/NMS1/hub
1:48:36 PM Proximity to CLIENT_B is 1 via /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM
1:48:36 PM Hop: /NMS/UIMHUB5/uimhub5/hub at depth 1
1:48:36 PM Sending gethubs to /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM gethubs successful for /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM Proximity to CLIENT_B is 0 via /NMS/CLIENT_B/clientBhub/hub
1:48:36 PM
1:48:36 PM Hop: /NMS/CLIENT_B/clientBhub/hub at depth 2
1:48:36 PM Found

----------- Executing script at 12/5/2019 1:47:51 PM ----------

1:47:50 PM
1:47:50 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
1:47:50 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
1:47:51 PM gethubs successful for /NMS/On-Prem/NMS1/hub
1:47:51 PM Proximity to Alphaserve is 0 via /NMS/Alphaserve/hub1/hub
1:47:51 PM
1:47:51 PM Hop: /NMS/Alphaserve/hub1/hub at depth 1
1:47:51 PM Found

Thanks....

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 01:46 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Yes, the nms1 box is our primary hub and it's where the nas is running. I tried running it from an IM session on the nms1 box.
I guess it cannot reach itself, or maybe it's the case sensitivity. Let me try that...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 01:37 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So, from wherever you are running this, the hub /NMS/Onxxx/xxxxnms1/hub isn't reachable.

The name is apparently valid, because the error is "Error code (2) communication error"; if the name were completely bad, it would have been "Error code (4) not found".

Once your network crests a particular size (for me it was when I had to move to using tunnel proxies because of the Windows subscriber limits) this gets more and more frequent.

Is /NMS/Onxxx/xxxxnms1/hub where your nas is? If not, I'd start by changing the starting point to the hub where your nas is which should at least get you past the first gethubs callback.
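Garin's rule of thumb for reading a failed gethubs can be captured in a small lookup helper. This sketch assumes the numeric values 0, 2 and 4 seen in the thread's output; inside nas, the predefined NIME_* constants take precedence. His own later test showed that a domain typed in the wrong case also returned 2, so treat the text as a hint rather than a diagnosis. `DescribeGethubsError` is a hypothetical name.

```lua
-- Outside nas the NIME_* constants are not defined, so fall back to the
-- numeric codes seen in this thread (assumption: 0 = OK,
-- 2 = communication error, 4 = not found).
local NIME_OK     = NIME_OK or 0
local NIME_COMERR = NIME_COMERR or 2
local NIME_NOENT  = NIME_NOENT or 4

local GethubsHints = {
  [NIME_OK]     = "OK",
  [NIME_COMERR] = "communication error: the hub could not be reached (a wrong-case domain also produced this)",
  [NIME_NOENT]  = "not found: expected when the address names no known hub",
}

-- Table lookup instead of a long if/elseif chain; unknown codes are
-- reported as-is rather than guessed at.
function DescribeGethubsError(rc)
  return GethubsHints[rc] or ("unrecognized code " .. tostring(rc))
end

print(DescribeGethubsError(2))
```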
Original Message:
Sent: 12-05-2019 01:25 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Ok tried running this on my laptop and it failed. Then directly off the primary hub and also failed.
Getting:
----------- Executing script at 12/5/2019 1:19:05 PM ----------

1:19:04 PM
1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
1:19:05 PM Error 2 : communication error
Error in line 186: attempt to index local 'hubs' (a nil value)

Line 186 is:
>>> if ( hubs[dest] == nil ) then
TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
else

On the last few lines I specified:
--== MAIN BEGIN:
HopList = {}
--Enter the Hub you want to test
test_hub = "Alphaserve"
depth = 0
--Specify the starting hub you want to test from here
--Use full /DOMAIN/HUB/ROBOT/hub address
--FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
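The crash at line 186 happens because `HubList()` returns `nil` when gethubs fails, and the next line indexes that `nil`. The sketch below shows a guarded version of that step as a standalone function, so it can be tried outside nas with a stub. `NextHopFor` and the `fetch_hubs` callback (standing in for `HubList()`) are hypothetical names.

```lua
-- fetch_hubs is a stand-in for the script's HubList(): it returns a table
-- keyed by hub name, or nil when the gethubs request fails.
function NextHopFor(fetch_hubs, dest, current_hop)
  local hubs = fetch_hubs(current_hop)
  if hubs == nil then
    -- This is the case that crashed the original script: HubList()
    -- returned nil and hubs[dest] then indexed a nil value.
    return nil, "gethubs failed for " .. current_hop
  end
  local entry = hubs[dest]
  if entry == nil then
    return nil, "Hub hop " .. current_hop .. " doesn't know about " .. dest
  end
  return entry.source, nil
end

-- Simulate the failing first hop from the log above.
local failing = function(addr) return nil end
print(NextHopFor(failing, "Alphaserve", "/NMS/Onxxx/xxxxnms1/hub"))
```

In the posted script, the equivalent fix would be a check such as `if ( hubs == nil ) then ... return 1 end` right after `hubs = HubList(current_hop)`, so the walk stops with a message instead of a Lua error.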

Original Message:
Sent: 12-05-2019 11:31 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

It's a Lua script - just open nas, select the Auto-Operator/scripts tab, right click in the white space and select "New -> script"

Paste this in.

The last couple lines define the destination hub and the starting point. Update those to match your needs.
Original Message:
Sent: 12-05-2019 10:54 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Nice. What is this written in? How do I run it? Do I just copy it into the nas as a new script, or some other way?
TIA...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com


14. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Garin Walsh

Posted Dec 05, 2019 01:58 PM

I created the script from what I posted and it worked.

I corrupted the case of my domain in the last line of the script and got error 2 rather than the error 4 I would have expected.

I corrupted the name of the hub and it still worked, which was completely unexpected. I wonder if the nimbus request falls back to the local hub when the specified hub isn't reachable.

I corrupted the name of the robot and got a valid gethubs return (again completely unexpected) and a legitimate error message.

So, I think you are on track in checking the case.

-Garin
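Garin's test shows that a domain typed in the wrong case fails with a generic communication error rather than a clear not-found error, so a case slip is easy to miss. A quick pre-check can compare the address typed into the script with one known to work (for example, copied from the "via" addresses in a successful run) and report which parts differ only in case. `ParseAddress` and `CaseMismatches` are hypothetical helpers, not part of nas.

```lua
-- Split /DOMAIN/HUB/ROBOT/probe into its four parts (nil if malformed).
function ParseAddress(addr)
  return string.match(addr, "^/([^/]+)/([^/]+)/([^/]+)/([^/]+)$")
end

-- Return the list of parts that differ between two addresses only by
-- letter case, or nil if either address is malformed.
function CaseMismatches(typed, known)
  local labels = { "domain", "hub", "robot", "probe" }
  local a = { ParseAddress(typed) }
  local b = { ParseAddress(known) }
  if #a ~= 4 or #b ~= 4 then
    return nil
  end
  local diffs = {}
  for i = 1, 4 do
    if a[i] ~= b[i] and string.lower(a[i]) == string.lower(b[i]) then
      diffs[#diffs + 1] = labels[i]
    end
  end
  return diffs
end

local diffs = CaseMismatches("/nms/On-Prem/NMS1/hub", "/NMS/On-Prem/NMS1/hub")
print(table.concat(diffs, ", "))  -- domain
```

Differences beyond letter case are deliberately not reported; given how unpredictably wrong hub and robot names behaved in Garin's test, those are better checked by eye.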

Original Message

Original Message:
Sent: 12-05-2019 01:46 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Yes the nms1 box is our primary hub and its where the nas is running. I tried running it of an IM session running off the nsm1 box.
I guess it cannot reach itself or maybe its the case sensitivity. Let me try that...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 01:37 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So, from wherever you are running this, the hub /NMS/Onxxx/xxxxnms1/hub isn't reachable.

The name is apparently valid because the error is "Error code (2) communication error", if the name was completely bad, it would have been Error Code (4) not found.

Once your network crests a particular size (for me it was when I had to move to using tunnel proxies because of the Windows subscriber limits) this gets more and more frequent.

Is /NMS/Onxxx/xxxxnms1/hub where your nas is? If not, I'd start by changing the starting point to the hub where your nas is which should at least get you past the first gethubs callback.
Original Message:
Sent: 12-05-2019 01:25 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Ok tried running this on my laptop and it failed. Then directly off the primary hub and also failed.
Getting:
----------- Executing script at 12/5/2019 1:19:05 PM ----------

1:19:04 PM
1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
1:19:05 PM Error 2 : communication error
Error in line 186: attempt to index local 'hubs' (a nil value)

Line 186 is:
>>> if ( hubs[dest] == nil ) then
TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
else

On the last few lines I specified:
--== MAIN BEGIN:
HopList = {}
--Enter the Hub you want to test
test_hub = "Alphaserve"
depth = 0
--Specify the starting hub you want to test from here
--User full /DOMAIN/HUB/ROBOT/hub address
--FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-05-2019 11:31 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

It's a Lua script - just open nas, select the Auto-Operator/scripts tab, right click in the white space and select "New -> script"

Paste this in.

The last couple lines define the destination hub and the starting point. Update those to match your needs.
Original Message:
Sent: 12-05-2019 10:54 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Nice. What is this written in? How do I run this? Just copy this into the nas a new script or some other way?
TIA...

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 04:44 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

I wrote this "traceroute" tool - takes a destination and starting hub and displays the path:

depth = 0function TimeStamp(stuff)   print (string.format("%s %s", timestamp.format(timestamp.now(), "%X"), (stuff or "(nil)")))endfunction tdump(t)   local function dmp(t, l, k)      if type(t) == "table" then         print(string.format("%s%s:", string.rep(" ", l*2), tostring(k)))         for k, v in pairs(t) do            dmp(v, l+1, k)         end      else         print(string.format("%s%s:%s", string.rep(" ", l*2), tostring(k), tostring(t)))      end   end   dmp(t, 1, "root")endfunction error_text(e)   local x   if ( e == nil ) then      x = "error is nil"   elseif ( e == NIME_OK ) then      x = "OK"   elseif ( e == NIME_ERROR ) then      x = "error"   elseif ( e == NIME_COMERR ) then      x = "communication error"   elseif ( e == NIME_INVAL ) then      x = "invalid argument"   elseif ( e == NIME_NOENT ) then      x = "not found"   elseif ( e == NIME_ISENT ) then      x = "already defined"   elseif ( e == NIME_ACCESS ) then      x = "permission denied"   elseif ( e == NIME_AGAIN ) then      x = "temporarily out of resources"   elseif ( e == NIME_NOMEM ) then      x = "out of resources"   elseif ( e == NIME_NOSPC ) then      x = "no space left"   elseif ( e == NIME_EPIPE ) then      x = "broken connection"   elseif ( e == NIME_NOCMD ) then      x = "command not found"   elseif ( e == NIME_LOGIN ) then      x = "login failed"   elseif ( e == NIME_SIDEXP ) then      x = "SID expired"   elseif ( e == NIME_ILLMAC ) then      x = "illegal MAC"   elseif ( e == NIME_ILLSID ) then      x = "illegal SID"   elseif ( e == NIME_SIDSESS ) then      x = "Session id for hub is invalid"   elseif ( e == NIME_EXPIRED ) then      x = "Expired"   elseif ( e == NIME_NOLIC ) then      x = "No valid license"   elseif ( e == NIME_INVLIC ) then      x = "Invalid license"   elseif ( e == NIME_ILLLIC ) then      x = "Illegal license"   elseif ( e == NIME_INVOP ) then      x = "Invalid operation finv"   else  --if ( e >= NIME_USER ) then      x = "user error from this value and up"   end 
  return xendfunction NimbusRequest (address, command, arguments, retries, noisy, delay)   local counter, response, retcode   if ( retries == nil ) then      retries = 1   end   if (noisy == nil) then      noisy = 1   end   if (delay == nil) then      delay = 1 * 1000   end   counter = retries   repeat      if ( noisy == 1 ) then         TimeStamp ( "Sending " .. command .. " to " .. address)      end      response, retcode = nimbus.request(address, command, arguments)      if ( retcode ~= NIME_OK ) then         -- counter = counter - 1         if ( noisy == 1 ) then            TimeStamp ( command .. " Failed for " .. address .. " Error code (" .. retcode .. ") " .. error_text(retcode) )            --tdump(response)         end         sleep(delay)      end      counter = counter - 1   until ( retcode == NIME_OK or counter == 0 )   if ( retcode == NIME_OK ) then      if ( noisy == 1 ) then         TimeStamp ( command .. " successful for " .. address )      end   else      TimeStamp ( command .. " retries exhausted for " .. address .. " Error code " .. error_text(retcode) )   end   return response, retcodeendfunction HubList(addr)   local gethubs, rc   local h = {}   local key, value   if ( addr == nil ) then      addr = "hub"   end   gethubs, rc = NimbusRequest(addr,"gethubs", nil, 1, 1, 1000)   if (rc ~= 0 ) then      TimeStamp("Error " .. rc .. " : " .. error_text(rc))      return nil   end   for key, value in pairs(gethubs.hublist) do      h[value.name] = { name = value.name, addr = value.addr, source = value.source, proximity = value.proximity, robotname = value.robotname }      if ( value.proximity == 0 ) then         h[value.name].source = value.addr      end   end   for key, value in pairs(gethubs.hublist) do      if ( h[value.name].proximity > 0 ) then         local PathParts = split ( h[value.name].source, "/")         local HubName = PathParts[2]         if (h[HubName] ~= nil) then            h[value.name].source = h[value.name].source .. "/" .. 
h[HubName].robotname .. "/hub"         else            TimeStamp("Unable to locate hub " .. HubName)            h[value.name].source = h[value.name].source .. "/" .. "ERROR"         end      end   end   return hendfunction FindNextHop(dest, current_hop)   local hubs   local PathParts = split ( current_hop, "/")   local HubName = PathParts[2]   if ( depth == 0 ) then      HopList = {}   end   TimeStamp(" ")   TimeStamp ("Hop: " .. current_hop .. " at depth " .. depth)   if ( HopList[current_hop] == nil ) then      HopList[current_hop] = depth   else      TimeStamp("Encountered circular routing to " .. current_hop)      tdump(HopList)      return 1   end   depth = depth + 1   if (depth > 30) then      TimeStamp( "Reached max depth of " .. depth)   elseif (dest == HubName) then      TimeStamp( "Found")   else      hubs = HubList(current_hop)      if ( hubs[dest] == nil ) then         TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)      else         TimeStamp("Proximity to " .. dest .. " is " .. hubs[dest].proximity .. " via " .. hubs[dest].source)         rc = FindNextHop(dest, hubs[dest].source)         if ( rc == 1 ) then            return 1         end      end   end   return 0end HopList = {}test_hub = "DestHubHere"depth = 0FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")

Original Message:
Sent: 12-04-2019 11:11 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This is pretty much a constant experience in my environment.

Because of the way the Nimbus network is maintained, with every hub holding a copy of everything, a hub that picks up even a little wrong information will keep reinfecting your network with it. Routing is based on the proximity value: for every hub in a hub's list of known hubs, there is a counter of how many hops away it is and which hub is the next hop. Each hub constantly listens for updates to the hub network, and whenever it receives an update with a smaller proximity, it replaces what it has with the new, lower-proximity information.
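
As a rough model of the update rule described above (an assumption drawn from this description, not UIM's actual code), each hub keeps one entry per known hub and replaces it only when an update arrives with a strictly smaller proximity. The `apply_update` helper, the table layout, and the addresses are hypothetical.

```lua
-- Hypothetical model of the proximity-update rule (not UIM's implementation).
-- known[name] = { proximity = hops, via = next-hop hub address }
local function apply_update(known, name, proximity, via)
  local current = known[name]
  -- Keep the new route only if the hub is unknown or the route is strictly closer.
  if current == nil or proximity < current.proximity then
    known[name] = { proximity = proximity, via = via }
    return true
  end
  return false
end

local known = {}
apply_update(known, "ClientZ", 3, "/NMS/UIMHUB1/hub1/hub") -- first route: kept
apply_update(known, "ClientZ", 2, "/NMS/uimcol/col/hub")   -- closer: replaces it
apply_update(known, "ClientZ", 4, "/NMS/UIMHUB2/hub2/hub") -- farther: ignored
print(known.ClientZ.proximity, known.ClientZ.via)
```

Nothing in this rule ever raises a stored proximity, so in this model a wrongly low value can only be replaced by an even lower one.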

This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data.

One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. It's impossible to have a proximity less than zero, so this hub will never get updated, regardless of how wrong its information is. The problem is that it should have had a proximity of 2; the bogus zero put it in the path of the route to a fair number of other hubs.

That in itself was not a problem, but then that hub was decommissioned. Because it still had a proximity of zero, it remained the most favored next hop wherever it had been a possible hop, and there was no way to get it out of the network.
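
The next-hop problem can be seen in a few lines. This hypothetical sketch (not UIM code) picks whichever candidate route has the lowest proximity; once the entry for a decommissioned hub is stuck at 0, it wins every time. The `best_route` helper and the addresses are illustrative.

```lua
-- Hypothetical model (not UIM's code): choose the candidate next hop with the
-- lowest advertised proximity.
local function best_route(candidates)
  local best = nil
  for _, candidate in ipairs(candidates) do
    if best == nil or candidate.proximity < best.proximity then
      best = candidate
    end
  end
  return best
end

-- Illustrative candidate routes toward one destination hub.
local candidates = {
  { via = "/NMS/UIMHUB1/hub1/hub",   proximity = 2 },
  -- Stale entry for a decommissioned hub, stuck at the bogus value 0.
  -- No valid route can have a proximity below 0, so nothing can ever beat it.
  { via = "/NMS/RetiredHub/old/hub", proximity = 0 },
  { via = "/NMS/UIMHUB2/hub2/hub",   proximity = 1 },
}

print(best_route(candidates).via) -- /NMS/RetiredHub/old/hub
```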

Currently the only solution that has been proposed to fix this is to stop UIM entirely across my network, delete hubs.sds and robots.sds wherever they exist, and then restart things from the central hub out to the remote systems. The idea is that UIM builds the network of systems on the fly, so if it starts from a clean slate it will eventually build a correct network. The kicker, though, is that once you have gone through all this effort, there's no guarantee that the same problem won't recur. Worse, if someone out there has a server that was offline/retired/shut down and brings it back after all this, that bad information might get reintroduced and you're back where you started.
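
Restarting "from the central hub out" implies an ordering: each hub comes up only after the hub it connects to. A minimal sketch of that ordering as a breadth-first walk, assuming a hypothetical `children` table (which hubs connect in to which) loosely modeled on the primary hub, hub collector (uimcol), tunnel server, and tunnel client hierarchy described in this thread; ClientA and ClientZ are placeholders. It assumes the topology is a tree and only computes an order; it does not stop, clean, or start anything.

```lua
-- Hypothetical hub topology: children[hub] lists the hubs that connect to it.
local children = {
  ["Primary Hub"] = { "uimcol" },
  uimcol = { "UIMHUB1", "UIMHUB2" },
  UIMHUB1 = { "ClientA" },
  UIMHUB2 = { "ClientZ" },
}

-- Breadth-first walk from the central hub, so every hub is listed after
-- the hub it connects to. Assumes a tree (no cycles).
local function restart_order(tree, root)
  local order, queue, head = {}, { root }, 1
  while head <= #queue do
    local hub = queue[head]
    head = head + 1
    order[#order + 1] = hub
    for _, child in ipairs(tree[hub] or {}) do
      queue[#queue + 1] = child
    end
  end
  return order
end

print(table.concat(restart_order(children, "Primary Hub"), " -> "))
-- Primary Hub -> uimcol -> UIMHUB1 -> UIMHUB2 -> ClientA -> ClientZ
```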

What I can tell you about IM not reaching things is that it settles down over time: in a couple of weeks to a month it will probably be working the way you hope for the hubs that are problematic today. Also, logging into the hub closest to the hub/robot you are trying to interact with is usual practice for my team; you're about 10x more likely to reach a given hub when logged into its tunnel server than when logged into the central hub, for instance.

The other thing is that most of the underlying infrastructure is pretty resilient to this. Get queues, for instance, automatically take advantage of this "closest hub" behavior, so while you might not be able to go from your central hub out the several hops to the leaf hub, the get queues between each pair of adjacent hubs are much more likely to be working.

Original Message:
Sent: 12-04-2019 10:47 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Hi Gene,
> Primary Hub is Windows.
> The IM session is on my desktop, but I tried this even on the primary hub box and got the same issue. About 10 others on my team also reported the same issue.
> All client hubs, including these client hubs (now 3 of them), are tunnel clients. We've added other client hubs since then and they work.
> The primary hub doesn't have direct access to these boxes or to the 100+ other client hubs. Everything goes through tunnel(s).

Just FYI, it was mentioned that the problem started after we began using UMP to deploy robots to discovered machines in their environment, in case that helps pin down the root cause.
We are also seeing hubs that we offboarded, retired, and shut down last week still appearing in the IM list no matter what we do. We've tried removing them many times on all hubs (primary, hubcol, tunnel servers) and they keep coming back. There is no reference to these hubs anywhere, and all the get queues to these retired hubs were deleted as well. I think this looks like a corrupted hubs.sds file. This is the first time this has ever happened to us.
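
Since retired hubs keep reappearing, one quick check is to compare the hub list a hub reports against the list of hubs you have retired. A small sketch, assuming a table keyed by hub name like the one HubList() builds in Garin's script; the `lingering` helper and the hub names are hypothetical.

```lua
-- Hypothetical check: return retired hub names that are still present in a
-- hub list keyed by hub name (the shape HubList() returns).
local function lingering(hubs, retired)
  local found = {}
  for _, name in ipairs(retired) do
    if hubs[name] ~= nil then
      found[#found + 1] = name
    end
  end
  return found
end

-- Illustrative data only.
local hubs = {
  ClientA = { proximity = 1 },
  OldHub1 = { proximity = 0 },
}
local stale = lingering(hubs, { "OldHub1", "OldHub2" })
print(table.concat(stale, ", ")) -- OldHub1
```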

Primary Hub
   ^
Hub Collector (uimcol)
   ^
Tunnel Servers (6 of these) - UIMHUB1|2|3|4|5|6
   ^
Tunnel Client Hubs (100+)

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com

Original Message:
Sent: 12-04-2019 10:25 AM
From: Gene HOWARD
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

So we need some more information.
Is the primary hub windows?
Is the IM you are using running on the primary hub or a desktop?
Are the new hubs connected via a tunnel to any other hubs?
Does the primary hub have direct access to the hubs?

There should be no need to remove robots.sds for this issue, only hubs.sds.
The robots should reappear on their own, but there is no documented time interval for this.
If you want it to happen immediately, a robot restart will be required.

------------------------------
Gene Howard
Principal Support Engineer
Broadcom

Original Message:
Sent: 12-03-2019 04:34 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub

This just started happening recently: we are setting up new hubs, and when logged into the primary hub through IM or Admin Console, these new tunnel clients are red. They are inaccessible, but if from within IM I right-click and log in to a tunnel server, the new tunnel client hub entries can be reached.
I have already tried removing these two tunnel client hubs from the Hubs tab of the hub probe on all 8 hubs. When they were re-created, the primary hub still could not find a path to them. They are still RED in IM and in Admin Console.
Right-clicking and running check access or check transfer from the hub probe's Hubs tab just times out. Both entries are red.

In my support case (20127165) it was suggested to follow this KB:

Article title: Remote Hub is offline and unreachable

which says to stop Nimsoft, delete the hubs.sds and robots.sds files, and then restart.

I am trying this in my lab, but I'm noticing that no robots are reappearing under the primary hub afterwards. It's been almost 30 minutes and so far all 12 robots in the lab are still missing from under the primary hub.
If I force a stop/start of the Nimsoft service, the robots appear, but I cannot do this in PROD. Will these robots eventually check back in with the primary hub in the lab?
Also, what other options do I have to fix what seems to be hub corruption? Is deleting hubs.sds the only way to fix this issue?

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

15. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Daniel Blanco

Posted Dec 11, 2019 11:12 AM
Edited by Daniel Blanco Dec 11, 2019 12:15 PM

This problem started happening again; it happened 3x this morning. To clarify the issue:

Before the issue occurs, when logged into IM or Admin Console we can access these new client hubs. In this example the Client-Z hubs are reachable: we can get to them in IM and we can see the robots under their hub.

After our engineer tries to deploy robots to discovered machines under Client-Z, things break. When logged into IM and connected to our primary hub, the new client hub Client-Z is RED: it is down and not accessible. But if we log onto another hub, such as a tunnel server or our hub collector, the Client-Z hub works and we can access it.
So somehow the Client-Z hub's address/location is getting corrupted on our primary hub. It's not until we delete the hubs.sds file on the primary hub and restart that Client-Z becomes accessible again in IM or Admin Console.

So the engineer stopped Nimsoft on the primary hub, deleted the hubs.sds file, and restarted. He waited, and eventually all hubs showed up and he was able to access Client-Z. He then retried deploying and it broke again.

We opened a new case on this: 20140704, "HUB's showing down under Primary HUB(On-Prem)".

Has anyone else seen this happen? This never happened in 8.x, but it started happening in 9.x and we have now hit it 5x.

We are running latest of robot v9.20HF7 and hub v9.20HF6

16. RE: Tunnel Clients not accessible from PriHub but can access via TunnelHub

Daniel Blanco

Posted Feb 06, 2020 11:31 AM

So this happened again today. We deployed a robot to a hub site using the UMP deploy robot feature, and the primary hub's SDS got corrupted. When in IM and connected to the primary hub, we now cannot access the hub we deployed the robot to. This is definitely a defect.

We used Garin's script to check whether the primary hub can reach the broken hub, and it shows the route works. Yet in IM the hub is red and won't open, while if we log into any of the tunnel servers we can access that hub in IM.

10:35:24 AM
10:35:24 AM Hop: /NMS/On-Prem/NMS1/hub at depth 0
10:35:24 AM Sending gethubs to /NMS/On-Prem/NMS1/hub
10:35:24 AM gethubs successful for /NMS/On-Prem/NMS1/hub
10:35:24 AM Proximity to ClientA is 0 via /NMS/ClientA/clientA-asrelay/hub
10:35:24 AM
10:35:24 AM Hop: /NMS/ClientA/clientA-asrelay/hub at depth 1
10:35:24 AM Found

Anyone else hit this issue?

------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------

DX Unified Infrastructure Management