DX Unified Infrastructure Management


 How to alert if tunnel client is down?

Eivind Olsen posted May 31, 2023 10:05 AM

I have a requirement to create an alert when a tunnel between two hubs stops working. The setup is currently like this:

PRIMARY_HUB (tunnel client) ---- (tunnel server) TUNNEL_HUB (tunnel server) ---- (tunnel client) REMOTE_HUB

(in other words, since my text-drawing skills are limited, TUNNEL_HUB is a tunnel server, and PRIMARY_HUB and REMOTE_HUB both connect to it as tunnel clients)

If I check the settings on all these servers (HUB config -> Tunnels -> Advanced), they all have "Tunnel is Hanging Timeout" set to 120, but we've had tunnels down for longer than that (when REMOTE_HUB had problems) without seeing any alerts for it.

The Broadcom article "UIM - Alarm configuration for tunnels when they go down" (broadcom.com) suggests that "the best way to monitor other hubs is to use the net_connect probe to check if the NimBUS_HUB service (port 48002) is responding", but that's not really an option here: due to security policies, TUNNEL_HUB isn't allowed to initiate any connections into the network where REMOTE_HUB resides.

Is anyone successfully doing this, or have you found a workaround of some sort? Oh, this is on UIM 20.4, by the way.

Regards,
Eivind Olsen

Garin Walsh

The usual approach is to monitor for some evidence that the tunnel is working rather than for evidence that it's down, and then infer from that whether the tunnel is up.

So, for instance, use the sql_response or jdbc_response probe to query s_qos_snapshot for a QoS that's captured reliably from every hub (like CDM CPU usage). If the newest value in the snapshot table gets old, you know that no data is coming in from that hub/robot.

That will catch many more failure cases than just the tunnel being down, but those are typically more valuable to know about than the infrastructure failure alone.
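
As a rough illustration of the staleness check described above, here is a minimal Python sketch. It is not a probe configuration: the function name stale_sources, the sample timestamps, and the 10-minute threshold are all made up for the example, and it assumes you have already pulled the newest sample time per hub out of s_qos_snapshot by whatever query fits your schema.

```python
from datetime import datetime, timedelta

def stale_sources(latest_samples, now, max_age):
    """Return the sources whose newest sample is older than max_age, sorted by name."""
    return sorted(
        source
        for source, sampletime in latest_samples.items()
        if now - sampletime > max_age
    )

# Hypothetical data: newest CPU-usage sample time per hub,
# as it might be read from s_qos_snapshot (values are invented).
now = datetime(2023, 5, 31, 10, 5)
latest_samples = {
    "PRIMARY_HUB": datetime(2023, 5, 31, 10, 3),
    "REMOTE_HUB": datetime(2023, 5, 31, 9, 40),
}

# Assumed threshold: treat a hub as silent after 10 minutes without data.
stale = stale_sources(latest_samples, now, timedelta(minutes=10))
print(stale)  # ['REMOTE_HUB']
```

In practice the threshold should be comfortably larger than the QoS collection interval, so that one late sample does not raise an alarm.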