Imagine a setup with PAM installed on ServerA.
I reconfigure it to become a cluster and ServerA becomes node1.
I add ServerB, ServerC, ServerD, ServerE and ServerF, known respectively as node2, node3, node4, node5 and node6.
Now if I remove node1, node2 and node3 from that cluster, will everything continue to work properly?
Based on our last discussion and what MWNiebuhr said about PAM clusters in "Work assignment logic in a cluster",
I believe that the first node whose service you start (any of 4, 5, 6) will become the "master" and everything will work as expected. But the services on all nodes should be down prior to that.
MWNiebuhr, any other thoughts about this?
The only possible issues you would encounter would be if you have any external applications that connect directly with Node1 bypassing the load balancer.
Otherwise everything should continue to work as expected with node 1 off or completely gone.
I am not sure what your goals are here, but you may want to look into the new 'active / passive' node configuration we have added in 4.3 for disaster recovery situations.
My goal is:
We added 3 nodes on Windows Server 2016.
I will uninstall the first 3 nodes on Windows Server 2008.
I will restore the database on a new SQL Server 2016.
Then I will have an OS-migrated PAM cluster.
Tested in dev and it worked like a charm. It's just that I did not try to remove node1, since I wanted to get it back.
Excellent plan. That should work without issue.
A longer term consideration would be that there will still be references to the 3 old servers in the database, and the Orchestrator cluster will occasionally reach out to see if they are available. This shouldn't have an impact on actual performance, but I am not sure what would occur if you attempted to install a new 'node1' cluster node.
We should be able to assist with clearing the old 2008 nodes from the database, but I would advise opening a support case when you get to that point so we can get the SE team to comment to ensure we fully clear the database and other references to those nodes.
I did delete a VM that was part of a cluster and PAM was not able to start as it was trying to reach said VM.
I had to remove several entries from the DBs but we managed it. I will open a support case if we can't get around it in prod.
So as per what you said, if the uninstallation of a node doesn't remove anything from the DB, the cluster will not start if the VM hosting the uninstalled node is deleted.
This is what happened in DEV.
So, you uninstalled 'node1' then deleted the entire VM node that was hosting that node; after that the other nodes would not start? That does not seem right and I would suggest you open a support case so we can investigate this before you get to the production environment.
Correction, I did not uninstall it.
The VM was deleted at run-time.
But you said that even after the uninstallation, there would still be references in the DB. So my concern is that we may encounter the exact same behavior. So I will set up a cluster in my sandbox and test this.
That is very weird.
We regularly test with nodes shut down, and in internal lab clusters we see similar situations: sometimes the VM lab environment is restarted and one or another of a cluster's node servers does not start up.
One node's server not being available does not impact the other nodes' ability to start, unless EEM or the backend database is housed on the server that didn't start properly.
There were errors in the log saying that the host XXXX is unreachable.
So we deleted the ActiveMQ tables associated with the node.
We updated the [ClusterNodeProperties] column in [dbo].[Orchestrator] to remove a <ClusterNodeProperties> section.
And I think that was it. I do not remember perfectly, we did it in a rush just to get things up again.
Weird. The 'host unreachable' message is normal and would be expected if the node is either shut down or otherwise unavailable, but that should not prevent other nodes from starting.
This is exactly what happened.
Try to delete a VM that is part of a running cluster and see.
Do you have a window of time to do this?
Why don't you just create a new cluster with only node4, node5 and node6?
Are passwords in datasets a big issue?
Because I want my 600 waiting processes to continue working.
And apparently, PAM "may not" be able to get processes from another cluster to work again.
Stop all of your running orchestrators.
Drop the dbo.properties table from the runtime database
Restart the nodes, one at a time, that you wish to remain in the cluster.
The properties table will get recreated with the data for the nodes that are active in the cluster, and will no longer have the data for the nodes that have been removed.
And what happens with the [ClusterNodeProperties] column in [dbo].[Orchestrator]?
Do we need to remove the reference to the old node in the <ClusterNodeProperties> section?
Well, who calls PAM? Catalog? PAM is just the orchestrator, right? So you just have to change that reference: if it's Catalog, add the new cluster's VIP to Catalog and change the default to the new PAM cluster.
So when the 600 running processes on your old cluster die, the old cluster can die.
This is how I usually do these things, and I have never had a problem.
And what happens if a process is waiting for Catalog to complete a pending action? Will Catalog send the "continue" call to the wrong cluster?
This is a problem in general with Catalog, and even worse with Catalog in a cluster.
For some reason it can get lost when waiting on a pending action.
This is totally off topic, but the way I deal with this is I have a custom operator that continuously checks whether a requestItem is in a desired status.
Something like this:
This is my default approach to catalog > pam interaction during pendingAction.
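The polling idea can be sketched roughly as follows. This is only an illustration in Python, not the custom operator itself: `get_status` is a hypothetical callable standing in for however the requestItem status is actually read from Catalog, and the status strings, interval and timeout are placeholders.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],   # hypothetical: returns the current requestItem status
    desired: str,                    # placeholder status name to wait for
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it equals `desired` or the timeout elapses.

    Returns True if the desired status was seen, False on timeout.
    """
    waited = 0.0
    while True:
        if get_status() == desired:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Because the process polls instead of waiting for Catalog to call back, it does not matter which cluster Catalog would have sent the "continue" call to.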
Okay, so to summarize:
I've just added a third node, named node10, to my DEV cluster.
I shut down everything, uninstalled node10 and dropped the properties table.
I restarted node1 and node2. Everything works fine.
But there is still a reference to my third node in the Orchestrator table.
<ClusterNodeProperties>
  <UUID>50b5fb4b-c09e-44aa-899b-df26a543f4d0</UUID>
  <NodeID>node10</NodeID>
  <HostName>*********.domain.local</HostName>
  <IPAddress>***.***.***.***</IPAddress>
  <TransportPort>0</TransportPort>
  <CommsV2Port>1111</CommsV2Port>
  <ClusterNodeID>6b516e2f-6331-46ec-830d-e01cba584d0b</ClusterNodeID>
  <securePort>1111</securePort>
  <unSecurePort>1111</unSecurePort>
  <IsNodeUp>true</IsNodeUp>
</ClusterNodeProperties>
Should I remove it?
MWNiebuhr, Jennifer_Jessup, any idea?
Should I remove the info referencing my old node from the Orchestrator table?
Yes, but as I said you need to stop all of your orchestrators and drop that properties table.
Once you restart your first orchestrator, the table will be recreated and populated with that orchestrator information.
Then as you bring up each of your orchestrators, the info for each one will be populated in that table.
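For the leftover <ClusterNodeProperties> section in the Orchestrator table, the edit itself could be prepared offline along these lines. This is a minimal Python sketch under assumptions, not a supported procedure: it assumes the column value is a plain XML fragment containing one <ClusterNodeProperties> element per node (as in the excerpt posted above), and `remove_node_section` is a hypothetical helper name. Reading the value from, and writing it back to, [dbo].[Orchestrator] is left out, and should only be done with all orchestrators stopped and a database backup in hand.

```python
import xml.etree.ElementTree as ET


def remove_node_section(column_xml: str, node_id: str) -> str:
    """Return column_xml without the <ClusterNodeProperties> section of node_id.

    Assumes column_xml is an XML fragment with no XML declaration.
    """
    # Wrap in a throwaway root so a fragment with several sibling sections parses.
    root = ET.fromstring(f"<wrapper>{column_xml}</wrapper>")
    for parent in list(root.iter()):
        for child in list(parent):
            if (child.tag == "ClusterNodeProperties"
                    and child.findtext("NodeID") == node_id):
                parent.remove(child)
    # Serialize everything that was inside the throwaway root.
    return "".join(ET.tostring(child, encoding="unicode") for child in root)


if __name__ == "__main__":
    sample = (
        "<ClusterNodeProperties><NodeID>node1</NodeID></ClusterNodeProperties>"
        "<ClusterNodeProperties><NodeID>node10</NodeID></ClusterNodeProperties>"
    )
    print(remove_node_section(sample, "node10"))
```

Matching on <NodeID> rather than on the host name keeps the sketch independent of the masked values in the excerpt; if the real column has a different layout, the helper would need adjusting.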