Symantec Privileged Access Management

  • 1.  CA PAM High Availability

    Posted Jan 15, 2020 03:07 AM
    Hi Team,

    We have configured a PAM cluster using two hardware appliances running v3.3. Load balancing between the nodes is working as expected, but if the primary node goes down we lose connectivity to PAM through the VIP. Also, even if I log in to the secondary node, I have to remove that node from the cluster before I can log in to target devices without any issues. Is this the expected HA behavior of PAM, or do I have to tweak any HA settings?

    Thanks,
    Vijay


  • 2.  RE: CA PAM High Availability

    Broadcom Employee
    Posted Jan 15, 2020 08:52 PM
    If the primary node that is managing the load balancing VIP becomes unreachable, the next available member in the list takes over that responsibility automatically.
    Do you see any error messages in the cluster logs or session logs, specifically around VIP reassignment?


  • 3.  RE: CA PAM High Availability

    Posted Jan 15, 2020 10:00 PM
    Hi,

    If I do a proper reboot of the primary appliance, the other node takes over. But if there is a power failure on the primary node, the failover is not automatic. I have raised a support case for this issue as well; I posted this question in the community to check whether anyone else is facing the same issue.

    Thanks,


  • 4.  RE: CA PAM High Availability

    Posted Jan 16, 2020 01:04 AM

    We faced a similar problem in the last two days; initially we suspected a network problem because some clients could connect to the VIP and others could not.

    After a complete network analysis we found that for some connections the VIP resolved to the primary node, while for others it redirected to the secondary.
    Connecting directly to either node works for all clients.

    We also have 2 nodes in a single site, but in an ESX environment.




  • 5.  RE: CA PAM High Availability
    Best Answer

    Broadcom Employee
    Posted Jan 16, 2020 09:40 AM
    Edited by Christopher Hackett Jan 21, 2020 06:00 PM
    Vijay,

    PAM 3.3 completely changed the way clustering worked.

    A two-node cluster is not HA.  In 3.3 and up, the only way the cluster remains operational after a node failure is if a majority of the nodes in that cluster are still able to communicate (a quorum).  Since a two-node cluster cannot have a majority after a single node failure, there is no quorum, and the database on all nodes will lock and prevent any user access.

    If you need single-node failure tolerance, you must have at least 3 nodes in the site.  With three nodes, any single node failure still leaves two nodes communicating; two nodes is a majority, so those nodes know without a doubt that they are safe to proceed.
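
    As a rough illustration only (this is not PAM's actual clustering code), the majority rule boils down to something like this:

        # Illustrative sketch of the quorum (majority) rule, not PAM's implementation.
        def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
            """A node may keep serving only if it can see a strict majority of the cluster."""
            return reachable_nodes > total_nodes // 2

        # Two-node cluster: one node fails, the survivor sees only itself (1 of 2).
        print(has_quorum(reachable_nodes=1, total_nodes=2))  # False -> database locks

        # Three-node cluster: one node fails, the survivors still see each other (2 of 3).
        print(has_quorum(reachable_nodes=2, total_nodes=3))  # True -> cluster stays up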

    Imagine this scenario on your two-node cluster if quorum loss protection wasn't used:

    • Node A has the VIP. 
    • A switch failure prevents Node A and Node B from communicating with each other. 
    • Both nodes believe that the other node went down. 
    • Node A keeps the VIP, while Node B thinks Node A failed, so it takes the VIP.  Now you have two nodes, both using the VIP. 
    • Worse, you have two nodes that both believe they are primary, each rotating credentials and updating its local database.
    In this scenario, what happens when connectivity is restored?  Which database do you trust?
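
    To see how quorum loss protection changes the outcome, here is a second small sketch (again, purely illustrative and not PAM's code) of what each partitioned node would decide:

        # Illustrative sketch of the split-brain scenario above, not PAM's implementation.
        def node_action(visible_nodes: int, total_nodes: int, quorum_protection: bool) -> str:
            if not quorum_protection:
                return "claim VIP and keep writing"   # both sides do this -> split brain
            if visible_nodes > total_nodes // 2:
                return "claim VIP and keep writing"   # only the majority side proceeds
            return "lock database and refuse writes"  # minority side waits for the cluster

        # Two-node cluster, switch failure: each node sees only itself (1 of 2).
        print(node_action(1, 2, quorum_protection=False))  # both nodes claim the VIP -> conflicting databases
        print(node_action(1, 2, quorum_protection=True))   # both nodes lock -> the behavior described in this thread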

    I hope the above scenario illustrates why 3.3 implemented quorum loss protection, and why your node failure is behaving the way it does.  Here are some references: 

    https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-privileged-access-management/privileged-access-manager/3-3/deploying/set-up-a-cluster/cluster-synchronization-promotion-and-recovery/primary-site-fault-tolerance.html

    https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-privileged-access-management/privileged-access-manager/3-3/deploying/set-up-a-cluster/cluster-synchronization-promotion-and-recovery.html#concept.dita_f58a3e782c2b3c5b616dff2be1c3b751b73c276b_PrimarySiteRecovery




  • 6.  RE: CA PAM High Availability

    Broadcom Employee
    Posted Jan 17, 2020 10:30 AM
    I was just about to post that same link to the Primary Site Fault Tolerance page.  It clearly recommends 3 nodes in the primary site to ensure fault tolerance of the primary site.  I saw the same behavior in my test cluster, which consists of 2 nodes.  PAM is working as designed.

    ------------------------------
    Principal Support Engineer
    Broadcom
    ------------------------------