Hi Team,
Our customer has the following questions.
I would appreciate your advice on how to answer them.
[Product]
OS: RHEL 7.2
Spectrum 22.2.6
[Issue]
The customer is currently running 22.2.6 in their environment. Because the CORBA certificate has expired,
there is a concern that they will not be able to restart until they can upgrade,
so we asked the customer to apply the steps in the following knowledge article.
https://knowledge.broadcom.com/external/article/271114/cannot-start-corba-applications-with-err.html
Customer configuration
SS server
Landscape (0x1000000)
Primary: ss01e (MLS)
Secondary: ss01w
Landscape (0x2000000)
Primary: ss02w
Secondary: ss02e
Landscape (0x3000000)
Primary: ss03e
Secondary: ss03w
OC Server
Primary: oc01e
Secondary: oc01w
The .vnmrc on the secondary SS servers has no secondary_polling entry,
so we understand this to be a warm standby configuration.
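For reference, the check we did on .vnmrc can be sketched roughly as below. This is only an illustration: it assumes .vnmrc holds simple `key=value` lines and that lines starting with `#` are comments. The helper name `has_vnmrc_entry` and the sample file contents (including the value `yes`) are hypothetical and not taken from the customer's files.

```python
def has_vnmrc_entry(vnmrc_text: str, key: str = "secondary_polling") -> bool:
    """Return True if an uncommented `key=value` line for `key` is present."""
    for raw_line in vnmrc_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and lines assumed to be comments.
        if not line or line.startswith("#"):
            continue
        name, sep, _value = line.partition("=")
        if sep and name.strip() == key:
            return True
    return False


# Hypothetical file contents, for illustration only.
sample_without = "some_other_setting=1\n"
sample_with = "some_other_setting=1\nsecondary_polling=yes\n"

print(has_vnmrc_entry(sample_without))
print(has_vnmrc_entry(sample_with))
```

The first sample corresponds to what we see on the customer's secondary servers (no entry), which is why we treat the configuration as warm.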
The customer reported that when they performed the following procedure on ss03e,
failover to the secondary server did not happen seamlessly,
and both systems went down.
--- Customer work ---
1. Stop the ss03e process and OS. (At this point, ss03w was promoted to the active system without any problems.)
2. Start the ss03e OS and process.
3. Apply the CORBA settings on ss03e (change true to false in the configuration).
4. Start the ss03e process.
5. After the settings are applied, stop the ss03e process in order to restart ss03e once more, just to be sure.
* ss03w was not promoted to the active system, and monitoring went down on both systems.
6. Start the ss03e process to resolve the outage on both systems.
7. Confirm that ss03w has returned to monitoring. (It recovered on its own while ss03e was running.)
* Service recovered at this point (hazard state).
8. ss03e process started (hazard resolved).
----------------------
[Questions]
Q1
The VNM.OUT of ss03w contains entries like the following, indicating that there was no response from CORBA.
Could this be the cause of the promotion failure?
*/* **:**:** ERROR TRACE at CsVNMCorbaMgr.cc(151): SpectroSERVER unresponsive: Pausing processsing of CORBA requests.
*/* **:**:** ERROR TRACE at CsVNMCorbaMgr.cc(135): SpectroSERVER recovered: Resuming processsing of CORBA requests
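To line these entries up against the customer's work steps, the messages can be pulled out of VNM.OUT with a small script such as the one below. This is a minimal sketch: it matches only the two message texts quoted above, does not parse the timestamp (its format is masked in our excerpt), and the function name `find_corba_pause_events` is our own.

```python
import re

# Matches the two message types quoted above. The timestamp prefix is not
# parsed because its exact format is masked in the excerpt.
EVENT_RE = re.compile(r"SpectroSERVER (unresponsive|recovered):")


def find_corba_pause_events(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, 'unresponsive' or 'recovered') for each matching line."""
    events = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        match = EVENT_RE.search(line)
        if match:
            events.append((line_number, match.group(1)))
    return events


sample_log = """\
*/* **:**:** ERROR TRACE at CsVNMCorbaMgr.cc(151): SpectroSERVER unresponsive: Pausing processsing of CORBA requests.
*/* **:**:** ERROR TRACE at CsVNMCorbaMgr.cc(135): SpectroSERVER recovered: Resuming processsing of CORBA requests
"""

for line_number, state in find_corba_pause_events(sample_log):
    print(line_number, state)
```

Pairing each "unresponsive" line with the next "recovered" line shows how long CORBA request processing was paused on ss03w around step 5.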
Q2
What are the conditions for promoting a standby SS server to the active system?
For example, does the standby system periodically send heartbeats and get promoted
when the primary stops responding for a certain period of time? How does this work?
We would like to confirm the actions the secondary takes when it detects that
the primary is down and is promoted, such as the port it uses and whether it sends a hello packet.
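To make Q2 concrete, the behavior we are imagining looks roughly like the sketch below. This is purely our hypothesis and not Spectrum's actual logic: the class `StandbyMonitor`, the 30-second timeout, the method names, and the assumption that a reply from the primary returns the standby to standby state are all hypothetical. Please correct whichever parts do not match the real implementation.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical model of the behavior asked about in Q2 (not Spectrum's logic)."""
    timeout_seconds: float        # assumed grace period before promotion
    last_reply_at: float = 0.0    # time of the last reply from the primary
    active: bool = False          # True once the standby has been promoted

    def on_primary_reply(self, now: float) -> None:
        # Assumption: a reply from the primary resets the timer and
        # returns the standby to standby state.
        self.last_reply_at = now
        self.active = False

    def tick(self, now: float) -> bool:
        # Promote when the primary has been silent longer than the timeout.
        if now - self.last_reply_at > self.timeout_seconds:
            self.active = True
        return self.active


monitor = StandbyMonitor(timeout_seconds=30.0)
monitor.on_primary_reply(now=0.0)
print(monitor.tick(now=10.0))  # False: the primary replied recently
print(monitor.tick(now=45.0))  # True: no reply for more than 30 seconds
```

If promotion does work along these lines, we would like to know the actual check interval, the timeout, and the port used for the check.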
Q3
Is there a flag or log entry that indicates the secondary SS server is ready for promotion?
* How can we know in advance that it is ready for promotion?
-> Is this what "How to Monitor the Secondary SpectroSERVER Status" describes?
https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/spectrum/22-2/administrating/distributed-spectroserver-administration/how-to-monitor-the-secondary-spectroserver-status.html
Thanks,