Vijay,
PAM 3.3 fundamentally changed the way clustering works.
A two-node cluster is not HA. In 3.3 and later, the cluster remains operational after a node failure only if a majority of the nodes in that cluster can still communicate with each other (a quorum). Because a two-node cluster cannot have a majority after a single node failure, there is no quorum, and the database on all nodes locks, preventing any user access.
If you need tolerance for a single node failure, you must have at least three nodes in the site. With three nodes, any single node failure still leaves two nodes communicating; two out of three is a majority, so those nodes know without a doubt that they are safe to proceed.
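To make the arithmetic concrete, here is a minimal sketch of the majority rule described above (illustrative Python only, not anything PAM itself runs):

def has_quorum(reachable_nodes, total_nodes):
    # A node may keep serving only if it can see a strict majority of the cluster.
    return reachable_nodes > total_nodes // 2

print(has_quorum(1, 2))  # Two-node cluster, one survivor: False -> database locks
print(has_quorum(2, 3))  # Three-node cluster, one failure: True -> the remaining pair keeps serving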
Imagine this scenario on your two-node cluster if quorum loss protection weren't in place:
- Node A has the VIP.
- A switch failure prevents Node A and Node B from communicating with each other.
- Both nodes believe that the other node went down.
- Node A keeps the VIP, while Node B, believing Node A has failed, takes the VIP as well. Now you have two nodes, both using the VIP.
- Worse, you have two nodes that both believe they are primary, each rotating credentials and updating its local database.
In this scenario, what happens when connectivity is restored? Which database do you trust?
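For contrast, here is a hedged sketch (again illustrative Python, not PAM internals) of how the quorum check changes each node's decision during that partition:

def node_action(visible_nodes, total_nodes):
    # With quorum loss protection, a partitioned node that cannot see a
    # majority locks its database instead of claiming the VIP.
    if visible_nodes > total_nodes // 2:
        return "keep serving as primary (hold the VIP, rotate credentials)"
    return "lock the database and do not claim the VIP"

# During the switch failure, each node in the two-node cluster sees only itself:
print("Node A:", node_action(1, 2))  # locks -> no split brain
print("Node B:", node_action(1, 2))  # locks -> no split brain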
I hope the above scenario illustrates why 3.3 implemented quorum loss protection, and why your node failure is behaving the way it does. Here are some references:
https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-privileged-access-management/privileged-access-manager/3-3/deploying/set-up-a-cluster/cluster-synchronization-promotion-and-recovery/primary-site-fault-tolerance.html
https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-privileged-access-management/privileged-access-manager/3-3/deploying/set-up-a-cluster/cluster-synchronization-promotion-and-recovery.html#concept.dita_f58a3e782c2b3c5b616dff2be1c3b751b73c276b_PrimarySiteRecovery
Original Message:
Sent: 01-15-2020 03:07 AM
From: vijayakumarc chandrasekaran
Subject: CA PAM High Availability
Hi Team,
We have configured a PAM cluster using two hardware appliances running v3.3. Load balancing between the nodes is working as expected, but if the primary node goes down we lose connectivity to PAM via the VIP. Also, even if I log in to the secondary node, I have to remove that node from the cluster before I can log in to target devices without any issues. Is this the HA behavior of PAM, or do I have to tweak any HA settings?
Thanks,
Vijay