2017-03-16 12:51:46 WARNING 10.2.161.162 ServerReachabilityMonitor.run Server '10.2.161.161' reachability changed: 7901=open (filtered)
2017-03-16 12:51:30 WARNING 10.2.161.162 ServerReachabilityMonitor.run Server '10.2.161.161' reachability changed: 7901=filtered (open)
2017-03-16 12:24:06 WARNING 10.2.161.161 ServerReachabilityMonitor.run Server '10.2.161.162' reachability changed: 5900=open (filtered)
2017-03-16 12:23:51 WARNING 10.2.161.161 ServerReachabilityMonitor.run Server '10.2.161.162' reachability changed: 5900=filtered (open)
2017-03-16 02:56:57 WARNING 10.2.161.162 ServerReachabilityMonitor.run Server '10.2.161.161' reachability changed: 3306=open (filtered)
2017-03-16 02:56:42 WARNING 10.2.161.162 ServerReachabilityMonitor.run Server '10.2.161.161' reachability changed: 3306=filtered (open)
We have two virtual appliances on the same VLAN, with no firewall policies in between. We see these warnings a couple of times a day. CPU, memory and disk usage are minimal on both members.
Over the previous weeks, there were some occasions where one of the members got out of sync. If the member does not re-sync successfully by itself, the cluster has to be stopped and started. This problem always occurs after the warnings become more frequent.
For two days now the cluster has been running OK, but we still occasionally see these lines. Is this information sufficient to conclude that there is a network quality issue? If not, what would be the usual suspects?
Yes, I agree that these messages appear to indicate network problems.
I understand that the VMs are in the same VLAN, so one question I would ask is how far apart they are. For example, are they in the same local DC, in different cities, in different states, etc.?
If you check further in the Session logs and/or Cluster logs, you may find more answers. Look at the times when the cluster became unsynchronized, and especially for messages about a node not being able to ping the gateway.
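If the log export is long, lining up the warning times with the out-of-sync times can be sketched in a few lines of Python. This is only an illustration: the regular expression is written against the six warning lines quoted at the top of the thread, and it assumes that `7901=open (filtered)` means "now open, previously filtered", which is my reading of the format and not something taken from the PAM documentation. The function names are made up for this example.

```python
import re
from datetime import datetime

# Pattern derived from the warning lines quoted in this thread (an assumption,
# not an official log format specification).
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) WARNING (?P<reporter>\S+) "
    r"ServerReachabilityMonitor\.run Server '(?P<target>[^']+)' "
    r"reachability changed: (?P<port>\d+)=(?P<new>\w+) \((?P<old>\w+)\)"
)


def parse_events(text):
    """Return reachability-change events found in text, oldest first."""
    events = []
    for m in LINE_RE.finditer(text):
        events.append({
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
            "reporter": m.group("reporter"),
            "target": m.group("target"),
            "port": int(m.group("port")),
            "new": m.group("new"),  # assumed: state after the change
            "old": m.group("old"),  # assumed: state before the change
        })
    return sorted(events, key=lambda e: e["time"])


def outages(events):
    """Pair each change to 'filtered' with the next change back to 'open'.

    Returns a list of ((reporter, target, port), start_time, seconds).
    """
    pending = {}
    result = []
    for e in events:
        key = (e["reporter"], e["target"], e["port"])
        if e["new"] == "filtered":
            pending[key] = e["time"]
        elif e["new"] == "open" and key in pending:
            start = pending.pop(key)
            result.append((key, start, (e["time"] - start).total_seconds()))
    return result
```

For the six lines quoted above, this gives three episodes of 15 to 16 seconds each (ports 3306, 5900 and 7901). Comparing the start times with the moments the cluster lost synchronization should show whether the two really go together.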
You can try the checklist below.
1. Check the logs (Sessions -> logs) and the cluster logs (Config -> synchronization -> View Cluster logs) to see what error messages appear.
2. Check the cluster network ports (5900, 7900, 7901, 7902) from each node to the other, using the tools under Devices -> Tools. All ports are expected to be open.
3. Check the network configuration on both nodes to make sure the gateway is set up properly.
Based on what you find, you may get an idea of who or where to check further. You can also post the results here so we can look at them together. Hope this helps.
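The port check in step 2 can also be repeated from another machine on the same VLAN, which helps tell an appliance problem from a network problem. Below is a minimal sketch using only the Python standard library. The labels "open", "closed" and "filtered" are my own mapping of connect results (success, refusal, timeout) and may not match how PAM's ServerReachabilityMonitor classifies a port. `port_state` and `check_ports` are hypothetical helper names.

```python
import socket

# Cluster ports listed in step 2 of the checklist.
CLUSTER_PORTS = (5900, 7900, 7901, 7902)


def port_state(host, port, timeout=3.0):
    """Classify a TCP port by attempting a full connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"       # the host answered, nothing is listening
    except socket.timeout:
        return "filtered"     # no answer at all within the timeout
    except OSError:
        return "unreachable"  # e.g. no route to host


def check_ports(host, ports=CLUSTER_PORTS, timeout=3.0):
    """Return a {port: state} mapping for one host."""
    return {port: port_state(host, port, timeout) for port in ports}


# Example (addresses taken from the log lines in this thread):
# for node in ("10.2.161.161", "10.2.161.162"):
#     print(node, check_ports(node))
```

A single run only shows the state at that moment. Since the warnings here clear after about 15 seconds, the check would need to run in a loop to catch one.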
As Anthony already indicated, it is recommended to operate a PAM cluster within a single site only.
You should determine the network quality between the cluster nodes using a third-party network analyser that reports lost packets, delays, etc.
The tools provided within PAM are rather limited for this task.
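If no analyser is at hand, a rough stand-in is to time repeated TCP connects from a third machine and summarise failures and latency. The sketch below is an assumption-laden substitute and not a real analyser: it measures connect time only, and it counts a refused or timed-out connect as "lost", which is not the same as packet loss on the wire. All function names are hypothetical.

```python
import socket
import statistics
import time


def probe_once(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers timeouts, refusals and routing errors alike.
        return None


def summarize(samples):
    """Summarise a list of probe results (milliseconds or None)."""
    ok = [s for s in samples if s is not None]
    lost = len(samples) - len(ok)
    return {
        "sent": len(samples),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "min_ms": min(ok) if ok else None,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def sample(host, port, count=60, interval=1.0):
    """Probe host:port count times, interval seconds apart."""
    results = []
    for _ in range(count):
        results.append(probe_once(host, port))
        time.sleep(interval)
    return summarize(results)


# Example: sample("10.2.161.161", 7901, count=600, interval=1.0)
```

Left running across one of the windows in which the warnings appear, a non-zero loss count or a jump in `max_ms` would support the network theory. A clean result would point back at the appliances or the hypervisor.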
Note, in CA PAM 2.8.2 the Cluster feature will be enhanced and will provide an asynchronous replication mechanism which is tolerant of networking issues.
Both nodes reside on the same VMware installation (there's only one anyway). I'm going to experiment with a third node, moving it to different hosts and storage to see if I get different results.
Right now all arrows point to VMware itself. Sadly we don't have anything to analyse network performance, so all I have is logs and "Tools" menu.
Thank you all for your suggestions.