Bit of advice: don't rely on the 'robot inactive' alarm generated by the hub for the heartbeat. The default interval is 15 minutes, so you could wait up to 30 minutes for the alarm to come through, and reducing the interval on the controller has caused me false 'robot inactive' alarms in the past.
Don't forget you'd only need one net_connect probe, as that will ping all your servers/devices, and you can mass-configure them by dragging in a list of server names and IPs. It's the most reliable method of detecting a server down (although you can get network interfaces still responding to ping while the O/S has crashed...). You just have to weigh up the investment, I guess.
There really needs to be a better way than setting up so many net_connect probes and profiles.
Where I work we have a multi-tenant environment, with hub servers in each tenant environment.
We would have to set up the net_connect probe on every tenant's hub server, and then configure a net_connect profile for every target robot in every tenant hub. That sucks.
It seems that when you are logged into the primary hub and have Infrastructure Manager open, you can hit "F5" to refresh the domain tree of origins and robots. If a robot is down or not responding, it turns red almost instantly.
Why not have a Primary HUB based probe that alarms to the NAS if any Robot is in a RED state?
That is what I really want. That way it would dynamically include any new robots set up in the future.
And it would just need to poll every 3 minutes or so and return one alarm for each Robot in a RED state.
Anyone know how to do that?
If I understand your idea correctly, that already exists. When a robot goes down, the default behavior for the hub is to send an alarm message. The alarm messages repeat until the robot is up again (every minute I think).
The trick with robot down alarms is to set the hub update interval on the robot/controller so that you find out quickly enough when they go down. If a robot stops cleanly, it should tell the hub it is going down, and you will see it turn red immediately. But if it crashes or network connectivity to the hub is severed, you may not get a robot down alarm for up to 1.5 times the update interval. The default update interval is 15 minutes, meaning the hub will not conclude a robot is down until 22.5 minutes have passed since the last check-in. With a 5-minute update interval, the hub will generate the robot down alarm 7.5 minutes after the last check-in.
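To make the arithmetic concrete, here is a rough sketch of the worst-case delay for a few update intervals. It only encodes the 1.5x rule of thumb described above; the function name and the factor parameter are made up for illustration and are not part of any hub or controller API.

```python
def worst_case_detection_minutes(update_interval_minutes, factor=1.5):
    """Worst-case minutes after the last check-in before the hub raises
    a robot down alarm, assuming the 1.5x rule of thumb holds."""
    return update_interval_minutes * factor


# Compare the default 15-minute interval with a 5-minute interval.
for interval in (15, 5):
    delay = worst_case_detection_minutes(interval)
    print(f"{interval}-minute interval: alarm within {delay} minutes of last check-in")
```

This covers only the crash or lost-connectivity case; a robot that stops cleanly should be reported straight away.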
Have you had a look at Discovery and Unified Service Manager in the UMP? You can set up monitoring profiles to automatically deploy ping tests to net_connect probes. This may cover what you need.