 VMware Aria Operations Unable to Activate Continuous Availability

dcarey3 posted May 08, 2024 08:19 AM

Any assistance here would be much appreciated. I am running a fresh installation of Aria Ops 8.16.1 and trying to activate CA. The network comprises 3 regions. In region A, I have the primary node and 2 data nodes. In region B, I have 3 data nodes. In region C, I have the witness node. In this state, the cluster is fully configured and collecting data, and should be ready to activate CA. However, when I hit the button and assign nodes to fault domains, all I get is the error message "Failed to activate".

If I knew which log file to inspect, that might help to diagnose the issue.

stingray751

I had a similar problem, although in my case I got a "waiting for analytics" error on all nodes when enabling CA on 8.17.1. I thought it was likely to be firewall related but, as you say, the noisy log files make it quite difficult to diagnose. I have a smaller setup: a primary and a data node in region A, a primary replica and a data node in region B, and a witness in region C.

TL;DR: The primary replica in region B starts life as a data node (see Adding Continuous Availability) and is then "promoted" to primary replica. From a firewall rule standpoint, this means it also needs ports TCP/5433 and TCP+UDP/20002-20010 opened to the primary node in region A.
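If it helps anyone check this before activating CA, here is a minimal Python sketch that probes those TCP ports from the would-be replica towards the primary. It is only an illustration, not a VMware tool: the function names are my own, and the hostname in the usage note is a placeholder. It covers TCP only, since a UDP port cannot be probed reliably without a cooperating listener on the other end.

```python
import socket

# TCP ports named above for data-node traffic to the primary node.
# The UDP half of 20002-20010 is deliberately not probed here.
TCP_PORTS = [5433, *range(20002, 20011)]


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (likely filtered) and refusals (nothing listening).
        return False


def blocked_ports(host, ports=TCP_PORTS, timeout=3.0):
    """Return the subset of ports that could not be reached over TCP."""
    return [p for p in ports if not tcp_port_open(host, p, timeout)]
```

Run from the region B node, something like `blocked_ports("primary.region-a.example")` (hypothetical hostname) returns the ports that did not answer. A port shows up as blocked either because a firewall dropped the connection or because nothing is listening on it yet, so treat the result as a hint rather than proof.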

Long answer: Trying to build all servers at installation time gave the "waiting for analytics" message on every host, leaving the cluster deadlocked. I then built just the hosts in region A, and that was successful. I then tried adding the primary replica in region B, and it failed. So I rebuilt the cluster again and then added the data node in region B. That worked. So the problem was with the primary replica in region B. SSHing into this host and running netstat -tan showed it was trying to communicate with the primary in region A over port 5433. That's odd, as according to the documentation only the data nodes need this port.
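For anyone repeating the netstat check, a rough Python sketch for picking the relevant lines out of `netstat -tan` output is below. It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and assumes that a blocked attempt shows up in the SYN_SENT state; I did not record the exact state at the time, and the function name is my own.

```python
def stalled_connections(netstat_output, remote_port=5433):
    """Return (local, remote) address pairs stuck in SYN_SENT toward remote_port."""
    stalled = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and anything that is not a TCP row
        local, remote, state = fields[3], fields[4], fields[5]
        # The remote port is whatever follows the last colon in the address.
        if state == "SYN_SENT" and remote.rsplit(":", 1)[-1] == str(remote_port):
            stalled.append((local, remote))
    return stalled
```

Feed it the text captured from `netstat -tan` on the replica candidate. Any pair it returns is an outbound connection to port 5433 that never completed its handshake, which is what a dropped firewall rule typically looks like.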

Adding Continuous Availability (vmware.com) says that "The data node becomes the replica node and is assigned to Fault Domain 2". Therefore the proposed replica node needs to have both data node ports and replica node ports opened. This is not at all clear in the VMware documentation!