Symantec Privileged Access Management

Cluster Synchronization - Is there a tombstone age?

  • 1.  Cluster Synchronization - Is there a tombstone age?

    Posted Jun 24, 2019 09:51 AM
    I remember being on a call with product management about a year ago, in which the PM mentioned a (sort-of) tombstone age in relation to cross-site synchronization.

    That is to say, a period of time (or number of changes to the DB) during which a secondary site cluster member can be re-synced, but after which a re-sync is no longer possible and a cluster restart is therefore required.

    Use Case:
    CA PAM 3.2.4
    Multi Site Cluster
    One node in a secondary site falls out of sync, all other nodes in all sites are synced.

    Is there a tombstone age that, once passed, would require a cluster restart to get all nodes back in sync? Or can that secondary node stay out of sync for a very long time and still be re-synced with a simple "RE-SYNC SITE MEMBER"?

    Thanks in advance.

    ------------------------------
    Services Architect
    HCL Technologies Ltd
    ------------------------------


  • 2.  RE: Cluster Synchronization - Is there a tombstone age?

    Posted Jun 24, 2019 02:18 PM
    I think I've found it: https://docops.ca.com/ca-privileged-access-manager/3-2-4/en/deploying/set-up-a-cluster/

    Secondary members can "self-heal" after being disconnected. Up to a configurable number of missed transactions, members download data until they catch up. This threshold defaults to 10,000 transactions, above which the member requires resyncing.

    I believe this is true for 3.2.4 and 3.2.5; however, the 3.3 Multi Site Cluster and Secondary Sites documentation doesn't explicitly mention this threshold.

    ------------------------------
    Services Architect
    HCL Technologies Ltd
    ------------------------------



  • 3.  RE: Cluster Synchronization - Is there a tombstone age?

    Broadcom Employee
    Posted Jun 25, 2019 03:44 AM

    Hello Sebastiano,

    Unlike earlier versions of PAM, r3.3 and newer use MySQL asynchronous group replication (in single-primary mode), a mechanism that guarantees delivery.

    Once a member (re)joins the cluster, it receives all the updates it missed while it was unavailable.

    Note: because of the new method in 3.3, it is highly recommended to have at least three nodes configured in the primary cluster site. Otherwise the cluster can run into quorum loss, which basically renders the complete PAM cluster unavailable.

    For further details about MySQL asynchronous group replication, please see its documentation:

    https://dev.mysql.com/doc/refman/8.0/en/group-replication.html

    Best Regards,

    Andreas


  • 4.  RE: Cluster Synchronization - Is there a tombstone age?

    Posted Jun 25, 2019 09:27 AM
    Thank you Andreas.

    Can you confirm that my understanding of this passage is correct for 3.2.4 & 3.2.5?

    Secondary members can "self-heal" after being disconnected. Up to a configurable number of missed transactions, members download data until they catch up. This threshold defaults to 10,000 transactions, above which the member requires resyncing.

    Is this saying that a cluster restart is required to resync a secondary node that is more than 10K transactions behind?

    Or is it really saying that we should only need to perform a "RE-SYNC SITE MEMBER" to synchronize the secondary node (which would then suggest that a secondary node can stay out of sync for an indefinite amount of time and be re-synced whenever)?

    Finally, the blurb says that the threshold is configurable. In which cases would it be advisable to increase or decrease it?

    Much obliged.

    ------------------------------
    Services Architect
    HCL Technologies Ltd
    ------------------------------



  • 5.  RE: Cluster Synchronization - Is there a tombstone age?
    Best Answer

    Broadcom Employee
    Posted Jun 25, 2019 10:02 AM
    The configurable transaction limit of 10,000 is for automatic self-healing. If a node falls behind due to temporary network problems, usage spikes, etc., it can catch up as long as it is behind by less than the limit. Once the limit is exceeded, the CM database of this node becomes inactive.

    A node re-sync was meant to always work, independent of the current state of the local database, because it involves a download of the current database from the master node. However, there is a known problem in recent 3.2.X releases where this does not work. We have a defect open with PAM Engineering to get it fixed. For now, a full cluster restart is required to get the node back in sync.
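
    The rule described above can be sketched as a small decision function. This is an illustration only, not PAM code: the function, enum, and parameter names are hypothetical, the 10,000 default is the documented threshold quoted earlier in this thread, and treating a lag of exactly the limit as still self-healing is an assumption. The `resync_works` flag models the open 3.2.X defect mentioned in this post.

```python
from enum import Enum


class Recovery(Enum):
    IN_SYNC = "no action needed"
    SELF_HEAL = "node catches up automatically"
    RESYNC_MEMBER = "re-sync site member"
    CLUSTER_RESTART = "full cluster restart"


# Default threshold from the 3.2.4 documentation quoted in this thread.
DEFAULT_LIMIT = 10_000


def recovery_action(missed: int, limit: int = DEFAULT_LIMIT,
                    resync_works: bool = True) -> Recovery:
    """Return the recovery path for a node that is `missed` transactions behind."""
    if missed < 0:
        raise ValueError("missed must be non-negative")
    if missed == 0:
        return Recovery.IN_SYNC
    if missed <= limit:  # assumption: exactly at the limit still self-heals
        return Recovery.SELF_HEAL
    # Past the limit the node's database goes inactive and needs a re-sync.
    # resync_works=False models the defect where re-sync fails on 3.2.X.
    return Recovery.RESYNC_MEMBER if resync_works else Recovery.CLUSTER_RESTART


print(recovery_action(2_500).value)
print(recovery_action(25_000).value)
print(recovery_action(25_000, resync_works=False).value)
```

    Note that the limit counts missed transactions rather than elapsed time, so how long a node can stay disconnected before it crosses the limit depends on how busy the cluster is.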


  • 6.  RE: Cluster Synchronization - Is there a tombstone age?

    Posted Jun 25, 2019 10:09 AM
    Cheers Ralf.

    I suspected as much.

    Thanks

    ------------------------------
    Services Architect
    HCL Technologies Ltd
    ------------------------------



  • 7.  RE: Cluster Synchronization - Is there a tombstone age?

    Posted Oct 04, 2019 03:24 AM
    Hi Ralf,

    1. Has this issue ("However, there is a known problem in recent 3.2.X release where this does not work. We have a defect open with PAM Engineering to get it fixed. For now a full cluster restart is required to get the node back in sync") been fixed yet by PAM Engineering? If Yes, in which version of 3.2.x has this been fixed?

    2. There is another issue about a secondary-site node being out of sync, which was reported for v3.2.4: "Clustering keeps getting out-of-sync" (https://community.broadcom.com/enterprisesoftware/communities/community-home/digestviewer/viewthread?GroupId=1501&MID=805277&CommunityKey=3e91a086-c7b2-4bd0-9f8d-3493ed834111&tab=digestviewer#bm4d769b52-2a6e-48c6-bdcf-208e2a92732d)

    From the release notes, I can see that a number of clustering-related issues have been fixed in v3.2.5 and v3.2.6. Is the resolution of this issue included in those fixes?

    I am keen to know because our client is on PAM v3.2.4, which seems to be affected by both of the aforementioned issues. We are considering upgrading to v3.2.6 so that we can implement multi-site DR, i.e. a primary site with two physical appliances and a secondary site with one soft appliance. But we will go ahead with this only if there are no known issues with clustering.

    If we upgrade to v3.3, that would require an additional physical appliance for the Primary site and this is something which we would like to avoid.

    Thanks!


  • 8.  RE: Cluster Synchronization - Is there a tombstone age?

    Broadcom Employee
    Posted Oct 04, 2019 01:27 PM
    Hi Sandeep, the resync problem is not fixed yet. The other thread does not discuss a specific root cause for a cluster sync problem, so it's not possible to say whether whatever was observed there is fixed in 3.2.6 or not.


  • 9.  RE: Cluster Synchronization - Is there a tombstone age?

    Posted Oct 07, 2019 05:48 AM
    Hi Ralf, thanks for your response on the status of the issues, i.e. that the resync problem is not fixed yet.

    Given that there are known issues with clustering (multi-site cluster for DR), some of which might still exist in v3.2.6, would it be appropriate to upgrade to v3.3, since the clustering capabilities have been substantially revamped? I hope these issues do not affect v3.3 and that clustering is stable in this release.

    Regards


  • 12.  RE: Cluster Synchronization - Is there a tombstone age?

    Posted Jun 26, 2019 09:38 AM
    Hi team

    We currently have a cluster with two sites, each containing two nodes. If release 3.3 is applied, what impact would that have on the solution, given that it is not currently viable for us to have three nodes in each site? Could we still apply this release?


    Julian Riaño
    MSL



  • 13.  RE: Cluster Synchronization - Is there a tombstone age?

    Broadcom Employee
    Posted Jun 27, 2019 07:29 AM
    Hello Julian,

    The suggestion to have at least 3 nodes only applies to a Multi-Master / Primary site.

    A secondary site with only 2 nodes is perfectly fine.

    Regards,
    Andreas
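
    The three-node recommendation for the primary site follows from majority-based quorum, which is how MySQL Group Replication decides whether a group can keep accepting changes. A minimal sketch of the arithmetic, with hypothetical function names, and assuming the PAM primary site behaves like a plain majority-quorum group:

```python
def majority(total_nodes: int) -> int:
    """Smallest number of reachable nodes that still forms a majority."""
    if total_nodes < 1:
        raise ValueError("a group needs at least one node")
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can be lost before the group loses quorum."""
    return total_nodes - majority(total_nodes)


def has_quorum(total_nodes: int, reachable_nodes: int) -> bool:
    """True if the reachable nodes still form a majority of the group."""
    return reachable_nodes >= majority(total_nodes)


for n in (2, 3, 5):
    print(f"{n} nodes: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

    With two nodes the majority is two, so losing either node loses quorum; three nodes is the smallest group that survives a single failure.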