Okay, yes, that was it. It's case sensitive. It's now working. Cool script:
(Hub and robot names below have been renamed, of course.)
----------- Executing script at 12/5/2019 1:48:36 PM ----------
1:48:36 PM
1:48:36 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
1:48:36 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
1:48:36 PM gethubs successful for /NMS/On-Prem/NMS1/hub
1:48:36 PM Proximity to CLIENT_B is 1 via /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM
1:48:36 PM Hop: /NMS/UIMHUB5/uimhub5/hub at depth 1
1:48:36 PM Sending gethubs to /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM gethubs successful for /NMS/UIMHUB5/uimhub5/hub
1:48:36 PM Proximity to CLIENT_B is 0 via /NMS/CLIENT_B/clientBhub/hub
1:48:36 PM
1:48:36 PM Hop: /NMS/CLIENT_B/clientBhub/hub at depth 2
1:48:36 PM Found
----------- Executing script at 12/5/2019 1:47:51 PM ----------
1:47:50 PM
1:47:50 PM Hop: /NMS/On-Prem/NMS1/hub at depth 0
1:47:50 PM Sending gethubs to /NMS/On-Prem/NMS1/hub
1:47:51 PM gethubs successful for /NMS/On-Prem/NMS1/hub
1:47:51 PM Proximity to Alphaserve is 0 via /NMS/Alphaserve/hub1/hub
1:47:51 PM
1:47:51 PM Hop: /NMS/Alphaserve/hub1/hub at depth 1
1:47:51 PM Found
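Since the culprit here was case sensitivity, one way to make the destination name more forgiving is to resolve it against the known hub names without regard to case before doing the exact comparison. This is only a minimal Lua sketch and not part of the posted script: `find_hub_key` and the sample table are hypothetical, and it covers the hub name lookup only, not how the hub itself matches addresses.

```lua
-- Hypothetical helper: returns the key in `hubs` that matches `name`
-- when case is ignored, or nil if no key matches.
local function find_hub_key(hubs, name)
  local wanted = string.lower(name)
  for key in pairs(hubs) do
    if string.lower(key) == wanted then
      return key
    end
  end
  return nil
end

-- Sample table keyed by hub name, loosely shaped like the one the
-- posted HubList function builds (values abbreviated for illustration).
local hubs = {
  CLIENT_B = { proximity = 1 },
  Alphaserve = { proximity = 0 },
}

print(find_hub_key(hubs, "client_b"))
print(find_hub_key(hubs, "alphaserve"))
print(find_hub_key(hubs, "nosuchhub"))
```

If two hub names differ only by case, this returns whichever one the table iteration reaches first, so it is a convenience for typos rather than a guarantee.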
Thanks....
------------------------------
Daniel Blanco
Enterprise Tools Team Architect
DBlanco@alphaserveit.com
------------------------------
Original Message:
Sent: 12-05-2019 01:46 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
Yes, the nms1 box is our primary hub and it's where the nas is running. I tried running it from an IM session running on the nms1 box.
I guess it cannot reach itself, or maybe it's the case sensitivity. Let me try that...
Original Message:
Sent: 12-05-2019 01:37 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
So, from wherever you are running this, the hub /NMS/Onxxx/xxxxnms1/hub isn't reachable.
The name is apparently valid because the error is "Error code (2) communication error". If the name were completely bad, it would have been "Error code (4) not found".
Once your network crests a particular size (for me it was when I had to move to using tunnel proxies because of the Windows subscriber limits) this gets more and more frequent.
Is /NMS/Onxxx/xxxxnms1/hub where your nas is? If not, I'd start by changing the starting point to the hub where your nas is, which should at least get you past the first gethubs callback.
Original Message:
Sent: 12-05-2019 01:25 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
OK, I tried running this on my laptop and it failed. Then I ran it directly on the primary hub and it also failed.
Getting:
----------- Executing script at 12/5/2019 1:19:05 PM ----------
1:19:04 PM
1:19:04 PM Hop: /NMS/Onxxx/xxxxnms1/hub at depth 0
1:19:04 PM Sending gethubs to /NMS/Onxxxx/xxxxnms1/hub
1:19:04 PM gethubs Failed for /NMS/Onxxxx/xxxxnms1/hub Error code (2) communication error
1:19:05 PM gethubs retries exhausted for /NMS/Onxxxx/xxxxnms1/hub Error code communication error
1:19:05 PM Error 2 : communication error
Error in line 186: attempt to index local 'hubs' (a nil value)
Line 186 is:
>>> if ( hubs[dest] == nil ) then
TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
else
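The error comes from the fact that the posted HubList returns nil when gethubs fails, and FindNextHop then indexes that nil value. One possible guard is sketched below in Lua. It is an illustration under assumptions, not the author's fix: `lookup_next_hop` is a hypothetical name, and TimeStamp and HubList are replaced with stand-ins so the sketch runs outside nas.

```lua
-- Stand-ins so the sketch is self-contained; in the real script these
-- would be the TimeStamp and HubList functions from the posted code.
local messages = {}
local function TimeStamp(text)
  messages[#messages + 1] = text
  print(text)
end
local function HubList(addr)
  return nil -- simulates a gethubs call that failed
end

-- Hypothetical wrapper around the lookup done near line 186.
local function lookup_next_hop(dest, current_hop)
  local hubs = HubList(current_hop)
  if hubs == nil then
    -- Stop cleanly instead of indexing a nil value.
    TimeStamp("No hub list from " .. current_hop .. ", stopping here")
    return nil
  end
  if hubs[dest] == nil then
    TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
    return nil
  end
  return hubs[dest].source
end

local next_hop = lookup_next_hop("Alphaserve", "/Domain/hub/startingrobot/hub")
```

With the guard in place the script would log the failed hop and end, rather than aborting with "attempt to index local 'hubs' (a nil value)".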
On the last few lines I specified:
--== MAIN BEGIN:
HopList = {}
--Enter the Hub you want to test
test_hub = "Alphaserve"
depth = 0
--Specify the starting hub you want to test from here
--Use full /DOMAIN/HUB/ROBOT/hub address
--FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
FindNextHop(test_hub, "/NMS/Onxxx/xxxxnms1/hub")
Original Message:
Sent: 12-05-2019 11:31 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
It's a Lua script - just open nas, select the Auto-Operator/scripts tab, right click in the white space and select "New -> script"
Paste this in.
The last couple lines define the destination hub and the starting point. Update those to match your needs.
Original Message:
Sent: 12-05-2019 10:54 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
Nice. What is this written in? How do I run this? Do I just copy this into the nas as a new script, or is there some other way?
TIA...
Original Message:
Sent: 12-04-2019 04:44 PM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
I wrote this "traceroute" tool - takes a destination and starting hub and displays the path:
depth = 0

function TimeStamp(stuff)
    print (string.format("%s %s", timestamp.format(timestamp.now(), "%X"), (stuff or "(nil)")))
end

function tdump(t)
    local function dmp(t, l, k)
        if type(t) == "table" then
            print(string.format("%s%s:", string.rep(" ", l*2), tostring(k)))
            for k, v in pairs(t) do
                dmp(v, l+1, k)
            end
        else
            print(string.format("%s%s:%s", string.rep(" ", l*2), tostring(k), tostring(t)))
        end
    end
    dmp(t, 1, "root")
end

function error_text(e)
    local x
    if ( e == nil ) then x = "error is nil"
    elseif ( e == NIME_OK ) then x = "OK"
    elseif ( e == NIME_ERROR ) then x = "error"
    elseif ( e == NIME_COMERR ) then x = "communication error"
    elseif ( e == NIME_INVAL ) then x = "invalid argument"
    elseif ( e == NIME_NOENT ) then x = "not found"
    elseif ( e == NIME_ISENT ) then x = "already defined"
    elseif ( e == NIME_ACCESS ) then x = "permission denied"
    elseif ( e == NIME_AGAIN ) then x = "temporarily out of resources"
    elseif ( e == NIME_NOMEM ) then x = "out of resources"
    elseif ( e == NIME_NOSPC ) then x = "no space left"
    elseif ( e == NIME_EPIPE ) then x = "broken connection"
    elseif ( e == NIME_NOCMD ) then x = "command not found"
    elseif ( e == NIME_LOGIN ) then x = "login failed"
    elseif ( e == NIME_SIDEXP ) then x = "SID expired"
    elseif ( e == NIME_ILLMAC ) then x = "illegal MAC"
    elseif ( e == NIME_ILLSID ) then x = "illegal SID"
    elseif ( e == NIME_SIDSESS ) then x = "Session id for hub is invalid"
    elseif ( e == NIME_EXPIRED ) then x = "Expired"
    elseif ( e == NIME_NOLIC ) then x = "No valid license"
    elseif ( e == NIME_INVLIC ) then x = "Invalid license"
    elseif ( e == NIME_ILLLIC ) then x = "Illegal license"
    elseif ( e == NIME_INVOP ) then x = "Invalid operation finv"
    else --if ( e >= NIME_USER ) then
        x = "user error from this value and up"
    end
    return x
end

function NimbusRequest (address, command, arguments, retries, noisy, delay)
    local counter, response, retcode
    if ( retries == nil ) then retries = 1 end
    if (noisy == nil) then noisy = 1 end
    if (delay == nil) then delay = 1 * 1000 end
    counter = retries
    repeat
        if ( noisy == 1 ) then
            TimeStamp ( "Sending " .. command .. " to " .. address)
        end
        response, retcode = nimbus.request(address, command, arguments)
        if ( retcode ~= NIME_OK ) then
            -- counter = counter - 1
            if ( noisy == 1 ) then
                TimeStamp ( command .. " Failed for " .. address .. " Error code (" .. retcode .. ") " .. error_text(retcode) )
                --tdump(response)
            end
            sleep(delay)
        end
        counter = counter - 1
    until ( retcode == NIME_OK or counter == 0 )
    if ( retcode == NIME_OK ) then
        if ( noisy == 1 ) then
            TimeStamp ( command .. " successful for " .. address )
        end
    else
        TimeStamp ( command .. " retries exhausted for " .. address .. " Error code " .. error_text(retcode) )
    end
    return response, retcode
end

function HubList(addr)
    local gethubs, rc
    local h = {}
    local key, value
    if ( addr == nil ) then addr = "hub" end
    gethubs, rc = NimbusRequest(addr,"gethubs", nil, 1, 1, 1000)
    if (rc ~= 0 ) then
        TimeStamp("Error " .. rc .. " : " .. error_text(rc))
        return nil
    end
    for key, value in pairs(gethubs.hublist) do
        h[value.name] = { name = value.name, addr = value.addr, source = value.source, proximity = value.proximity, robotname = value.robotname }
        if ( value.proximity == 0 ) then
            h[value.name].source = value.addr
        end
    end
    for key, value in pairs(gethubs.hublist) do
        if ( h[value.name].proximity > 0 ) then
            local PathParts = split ( h[value.name].source, "/")
            local HubName = PathParts[2]
            if (h[HubName] ~= nil) then
                h[value.name].source = h[value.name].source .. "/" .. h[HubName].robotname .. "/hub"
            else
                TimeStamp("Unable to locate hub " .. HubName)
                h[value.name].source = h[value.name].source .. "/" .. "ERROR"
            end
        end
    end
    return h
end

function FindNextHop(dest, current_hop)
    local hubs
    local PathParts = split ( current_hop, "/")
    local HubName = PathParts[2]
    if ( depth == 0 ) then HopList = {} end
    TimeStamp(" ")
    TimeStamp ("Hop: " .. current_hop .. " at depth " .. depth)
    if ( HopList[current_hop] == nil ) then
        HopList[current_hop] = depth
    else
        TimeStamp("Encountered circular routing to " .. current_hop)
        tdump(HopList)
        return 1
    end
    depth = depth + 1
    if (depth > 30) then
        TimeStamp( "Reached max depth of " .. depth)
    elseif (dest == HubName) then
        TimeStamp( "Found")
    else
        hubs = HubList(current_hop)
        if ( hubs[dest] == nil ) then
            TimeStamp("Hub hop " .. current_hop .. " doesn't know about " .. dest)
        else
            TimeStamp("Proximity to " .. dest .. " is " .. hubs[dest].proximity .. " via " .. hubs[dest].source)
            rc = FindNextHop(dest, hubs[dest].source)
            if ( rc == 1 ) then return 1 end
        end
    end
    return 0
end

HopList = {}
test_hub = "DestHubHere"
depth = 0
FindNextHop(test_hub, "/Domain/hub/startingrobot/hub")
Original Message:
Sent: 12-04-2019 11:45 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
OK, I'm good now. I just did this on our primary hub: stopped Nimsoft, deleted the hubs.sds file, and then started back up. Within a few minutes all the correct hubs showed up, all the old broken ones were gone, and the new hubs were accessible.
Garin, I hear ya. I wish there were probe utility calls that would let you correct this, but there don't seem to be any. A visual hub route map would be great if it existed.
Original Message:
Sent: 12-04-2019 11:11 AM
From: Garin Walsh
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
This is pretty much a constant experience with my environment.
With the way the Nimbus network is maintained, where every hub has a copy of everything, if one of those hubs gets a little bit of wrong information it will keep reinfecting that incorrectness into your network. The routing bit is based on the proximity value - essentially every hub in a hub's list of known hubs has a counter of how many hops away it is and which hub is the next hop. And a given hub is constantly listening for updates to this hub network and whenever it gets an update with a smaller proximity it updates the information it has with the new lower proximity information.
This all works great if your network is small, network latency is nonexistent, and hubs never crash or corrupt this data.
One of the problems I have is that at one point a hub set the proximity value of another hub to zero and sent that out into my network. It's impossible to have a proximity less than zero, so this entry will never get updated regardless of how wrong its information is. The problem is that it should have had a proximity of 2, which had put it in the path of the route to a fair number of other hubs.
This in itself is not a problem but then that hub was decommissioned. But it had a proximity of zero and so was the most favored next hop wherever it had been a possible hop. And there was no way to get it out of the network.
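The "only accept a smaller proximity" behavior described above can be illustrated with a toy model in Lua. This is a sketch of the rule as described in this post, not the actual hub code: `apply_update`, the table layout, and the hub names HubX and HubY are all made up for illustration.

```lua
-- Toy model: each known hub has a proximity (hop count) and a source
-- (next hop). An update is accepted only if the hub is unknown or the
-- update carries a strictly smaller proximity.
local function apply_update(known, update)
  local current = known[update.name]
  if current == nil or update.proximity < current.proximity then
    known[update.name] = { proximity = update.proximity, source = update.source }
    return true
  end
  return false
end

local known = {}

-- A bad update claims the hub is zero hops away.
apply_update(known, { name = "HubX", proximity = 0, source = "/NMS/HubX" })

-- The correct information (2 hops, via another hub) arrives later.
local accepted = apply_update(known, { name = "HubX", proximity = 2, source = "/NMS/HubY" })

print(accepted, known.HubX.proximity, known.HubX.source)
```

In this model the later, correct update is rejected because 2 is not smaller than 0, so the zero entry stays in place indefinitely. That is the sense in which a wrong zero can never be corrected by normal updates.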
Currently the only solution that has been proposed to fix this is to stop UIM in its entirety across my network, delete hubs.sds and robots.sds wherever they exist, and then restart things from the central hub out to the remote systems. The idea being that UIM builds the network of systems on the fly and that if it starts from a clean slate then it will eventually build a correct network. The kicker though is that once all this effort is gone through, there's no guarantee that the same problem won't reoccur and worse, if someone out there had a server that was offline/retired/shut down and then brings it back after all this, that bad information might get reintroduced and you're back where you started.
What I can tell you with IM not reaching things is that it settles down over time - in a couple of weeks to a month it will probably be working the way you hope for the hubs that are problematic today. Also, logging into the hub closest to the hub/robot you are trying to interact with is usual practice for my team - you're about 10x more likely to connect to a given hub when connected to its tunnel server than to the central hub, for instance.
The other thing is that most of the underlying infrastructure is pretty resilient to this - get queues for instance are automatically taking advantage of this "closest hub" thing so while you might not be able to go from your central hub out the several hops to the leaf hub, the get queues between each pair of hubs are much more likely to be working.
Original Message:
Sent: 12-04-2019 10:47 AM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
Hi Gene,
> Primary Hub is Windows
> The IM session is on my desktop, but I tried this even on the Pri-hub box and got the same issue. About 10 others on my team also reported the same issue.
> All client hubs and these now 3 client hubs are tunnel clients. We've added different client hubs since then and they work.
> The primary hub doesn't have direct access to these boxes nor the 100+ other client hubs. All thru Tunnel(s)
Just FYI, it was mentioned that the problem started happening after we started using the UMP to deploy robots to machines in their environment after discovery, in case that helps point to a possible root cause.
We are also seeing hubs that we offboarded, retired, and shut down last week still appearing in the IM list no matter what we do. We've tried deleting/removing them many times on all hubs (primary, hubcol, tunnel servers) and they keep coming back. There is no reference to these hubs anywhere, and all the respective get queues to these retired hubs were deleted as well. This looks like a corrupted hubs.sds file, I think. This is the first time this has ever happened to us.
Primary Hub
^
Hub Collector (uimcol)
^
Tunnel Server (6 of these) - UIMHUB1|2|3|4|5|6
^
Tunnel Client Hubs (100+)
Original Message:
Sent: 12-04-2019 10:25 AM
From: Gene HOWARD
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
So we need some more information.
Is the primary hub windows?
Is the IM you are using running on the primary hub or a desktop?
Are the new hubs connected via a tunnel to any other hubs?
Does the primary hub have direct access to the hubs?
There should be no need to remove robots.sds for this issue, only hubs.sds.
The robots should reappear on their own, but there is no documented specific time interval for this.
If you want it to happen immediately, a robot restart will be required.
------------------------------
Gene Howard
Principal Support Engineer
Broadcom
Original Message:
Sent: 12-03-2019 04:34 PM
From: Daniel Blanco
Subject: Tunnel Clients not accessible from PriHub but can access via TunnelHub
This just started happening recently: we are setting up new hubs, and when logged into the Primary Hub through IM or in Admin Console these new Tunnel Clients are red. They are inaccessible, but if I right-click from within IM and log in to a tunnel server, I can reach these new tunnel client hub entries.
I have already tried removing the two tunnel client hubs from within the hub probe's hubs tab on all 8 hubs. When they were re-created, the primary hub still could not find a path to them. They are still red in IM and in Admin Console.
The right-click "check access" and "check transfer" options just time out from within the hub probe's hubs tab. Both entries are red.
In my support case (20127165) it was suggested to follow this KB:
which says to stop Nimsoft, delete the hubs.sds and robots.sds files, and then restart.
I am trying this in my lab but I'm noticing that no robots are re-appearing under the primary hub afterwards. It's been almost 30 min and so far all 12 robots in lab are still gone from under primary hub.
If I forcibly do a stop/start on the nimsoft service the robots appear but I cannot do this in PROD. Will these robots eventually check back in at some point with the pri-hub in lab?
Also, what other options do I have to fix this hub corruption, since that seems to be what is happening here? Is deleting hubs.sds the only way to fix this issue?