DX NetOps

Conditions for a secondary SS server to be promoted to a primary

MARUBUN SUPPORT posted Apr 23, 2025 04:20 AM

Hi Team.

Our customer has the following questions.
I would appreciate some advice on how to solve this question.

[Product]
OS: RHEL 7.2
Spectrum 22.2.6

[Issue]

The customer is currently running 22.2.6 in their environment. Because the CORBA certificate has expired,
there is a concern that the servers cannot be restarted until the upgrade is done,
so we asked the customer to take action by referring to the following knowledge article.

https://knowledge.broadcom.com/external/article/271114/cannot-start-corba-applications-with-err.html

Customer configuration
SS server
Landscape (0x1000000)
Primary: ss01e (MLS)
Secondary: ss01w

Landscape (0x2000000)
Primary: ss02w
Secondary: ss02e

Landscape (0x3000000)
Primary: ss03e
Secondary: ss03w

OC Server
Primary: oc01e
Secondary: oc01w

Since there is no secondary_polling entry in the .vnmrc of the secondary SS server,
we understand this to be a warm standby configuration.

The customer reported that when they performed the following procedure on ss03e,
the system did not switch over to the secondary server seamlessly,
and monitoring on both servers went down.

--- Customer work ---
1. Stop the ss03e process and OS. (At this point, ss03w is promoted to the active system without any problems)
2. Start the ss03e OS and process.
3. Apply the CORBA setting change on ss03e (change true to false in the configuration).
4. Start the ss03e process.
5. After the setting is applied, stop the ss03e process in order to restart ss03e once more, just to be sure.
* ss03w is not promoted to the active system and monitoring of both systems goes down.
6. Start the ss03e process to resolve the disconnection of both systems.
7. Confirm that ss03w has returned to monitoring. (Recovered naturally while ss03e was running)
* Service recovered at this point (hazard state)
8. ss03e process started (hazard resolved)
----------------------

[Questions]
Q1
The VNM.OUT of ss03w contains entries like the following, which appear to indicate that there was no response from CORBA.
Could this be the cause of the promotion failure?

*/* **:**:** ERROR TRACE at CsVNMCorbaMgr.cc(151): SpectroSERVER unresponsive: Pausing processsing of CORBA requests.
*/* **:**:** ERROR TRACE at CsVNMCorbaMgr.cc(135): SpectroSERVER recovered: Resuming processsing of CORBA requests

Q2
What are the conditions for a standby SS to be promoted?
For example, does the standby periodically send heartbeats and get promoted to active
when the primary stops responding for a certain period of time? How does this work?
I would like to confirm what the secondary does when it detects that
the primary is down and is promoted, such as which port it uses and whether it sends a hello packet.

Q3
Is there a flag or log that indicates the secondary SS server is ready for promotion?
*How can I know in advance when it is ready for promotion?

-> Is it "How to Monitor the Secondary SpectroSERVER Status"?
https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/spectrum/22-2/administrating/distributed-spectroserver-administration/how-to-monitor-the-secondary-spectroserver-status.html

Thanks,

MARUBUN SUPPORT posted Apr 28, 2025 04:00 AM

Hi everyone,

Could someone please give me some advice on my question?
I would be very grateful for any advice.

Thanks,

Catalin Farcasanu posted Apr 28, 2025 04:51 AM

Q1: Ask support. Open a case describing the situation.

Q2: FT is described in the documentation. You can have several levels of precedence in a DSS. The running server with the lowest precedence value is the active one at any given time. You need to have all relevant ports open in the firewalls (if there are any between the two SS). They are all listed in the documentation.
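
As a rough illustration of the precedence rule in Q2, the following Python sketch picks the active server as the running one with the lowest precedence value. The host names come from landscape 0x3000000 earlier in the thread. The `Server` class, the `active_server` function, and the precedence numbers 10 and 20 are assumptions made for illustration only; this is not Spectrum's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    precedence: int  # lower value = higher priority (illustrative values)
    running: bool


def active_server(servers: List[Server]) -> Optional[Server]:
    """Return the running server with the lowest precedence, or None."""
    candidates = [s for s in servers if s.running]
    return min(candidates, key=lambda s: s.precedence, default=None)


landscape = [Server("ss03e", 10, True), Server("ss03w", 20, True)]
print(active_server(landscape).name)  # ss03e

landscape[0].running = False  # the primary stops
print(active_server(landscape).name)  # ss03w
```

In this toy model the secondary only becomes active once the primary is no longer counted as running, which is why the way "running" is detected matters so much in practice.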

Q3: Once the FT is established, whenever the FT server is not available, an alarm is created in the OneClick.

MARUBUN SUPPORT posted Apr 30, 2025 03:30 AM

Hi Catalin,

Thank you for your answer.

> Q3: Once the FT is established, whenever the FT server is not available, an alarm is created in the OneClick.

Is it correct to understand that an FT Server is a server that belongs to Fault Tolerance?
Also, are there any flags or logs that indicate that Fault Tolerance has been established?

Thanks,

Catalin Farcasanu posted Apr 30, 2025 05:09 AM

The MapUpdate utility provides this information.

These aspects are described in the course DX NetOps 22.2.x: Spectrum Foundations 200. I highly recommend the entire course path for both Spectrum and PM.

MARUBUN SUPPORT posted May 30, 2025 05:30 AM

In SpectroSERVER fault tolerance, if the port used by CORBA becomes unresponsive, will the system fail over to the secondary server?

Thanks,

Catalin Farcasanu posted May 30, 2025 06:03 AM

What would be the reason behind this question? It's not like you can have half of FT on the system. If all ports are open, you get FT. If some ports are missing, FT will not work correctly.

MARUBUN SUPPORT posted Jun 02, 2025 05:35 AM

End users are interested in knowing the conditions for switching to a secondary.
Is the secondary server promoted when the CORBA port used by the SS becomes unresponsive?

Thanks,

Broadcom Employee Todd Kornely posted Jun 03, 2025 08:46 AM

There are two parts to SpectroSERVER fault tolerance, and they are independent of each other:

1. The secondary SS polls the primary SS every minute. If the polling determines the primary is down, the secondary starts SNMP polling, processing SNMP traps, etc., and essentially takes over the role of the primary SS. If a network issue is preventing the polling, both SpectroSERVERs can be 'active'.
2. The OneClick server polls both the primary and the backup SpectroSERVER via CORBA every 10s, with a 'deeper' poll every 60s. If the polling determines the primary is down, OneClick will fail over and connect to the secondary. It is possible, if a network issue is blocking the communication, that the OneClick server will fail over to a secondary that is not 'active', which will display as a grey/suppressed VNM icon.
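
The two independent checks described above can be modeled with a small Python sketch. It is a toy model under stated assumptions, not Spectrum code: the function names and the boolean link flags are hypothetical, and each poll is reduced to "did a valid answer come back".

```python
def secondary_is_active(primary_up: bool, link_ss_to_ss: bool) -> bool:
    """The secondary takes over when its poll of the primary fails,
    whether the primary is really down or the link is blocked."""
    poll_ok = primary_up and link_ss_to_ss
    return not poll_ok


def oneclick_target(primary_up: bool, link_oc_to_primary: bool) -> str:
    """OneClick fails over when its own poll of the primary fails."""
    poll_ok = primary_up and link_oc_to_primary
    return "primary" if poll_ok else "secondary"


def describe(primary_up: bool, link_ss_to_ss: bool, link_oc_to_primary: bool) -> str:
    """Combine both independent decisions into one readable summary."""
    sec_active = secondary_is_active(primary_up, link_ss_to_ss)
    target = oneclick_target(primary_up, link_oc_to_primary)
    if primary_up and sec_active:
        state = "both SpectroSERVERs active"
    elif sec_active:
        state = "secondary active"
    else:
        state = "primary active"
    if target == "secondary" and not sec_active:
        state += ", OneClick on inactive secondary (grey VNM icon)"
    return f"{state}; OneClick -> {target}"


for case in [(True, True, True), (False, True, True),
             (True, False, True), (True, True, False)]:
    print(case, "=>", describe(*case))
```

Because the two decisions are made separately, the third and fourth cases reproduce the two network-issue scenarios mentioned above: both servers active, and OneClick connected to a secondary that has not taken over.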

-Todd

MARUBUN SUPPORT posted Jun 04, 2025 02:15 AM

Thanks,

Let me confirm just to be sure.
Am I correct in understanding that the primary SS is judged to be down based not only on whether the port responds, but also on the content of the response?

Broadcom Employee Todd Kornely posted Jun 04, 2025 08:53 AM

Yes, when I say polling I am referring to API calls. The SpectroSERVER needs to return a valid response or it will be considered down.
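
To make the difference between "the port answers" and "the server returns a valid response" concrete, here is a minimal Python sketch. It does not use any Spectrum API: `tcp_port_open`, `considered_up`, and the `"OK"` reply are hypothetical stand-ins for a transport-level check and an application-level call.

```python
import socket
from typing import Callable, Optional


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def considered_up(port_open: bool, reply: Optional[str],
                  is_valid: Callable[[str], bool]) -> bool:
    """A server counts as up only if it returns a valid reply to a call."""
    return bool(port_open and reply is not None and is_valid(reply))


# A hung process can still hold its port open: the port check passes,
# but no valid reply arrives, so the server is treated as down.
print(considered_up(True, None, lambda r: r == "OK"))  # False
print(considered_up(True, "OK", lambda r: r == "OK"))  # True
```

This is why a plain port probe is not a reliable predictor of failover behavior: the decision depends on the content of the reply, not only on reachability.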

MARUBUN SUPPORT posted Jun 12, 2025 03:38 AM

Hi everyone,

About Q3:

The customer asked: "Is there a way to check, using a status or logs, whether the secondary server is running normally and is ready to take over if the primary goes down?"
*Although you can confirm that the switch has occurred in the Java console or in "Landscape" on the OneClick web management screen, you cannot know the status of the secondary server in advance.

The command suggested earlier returns the same result whether the SpectroSERVER process is running or stopped, so it cannot be used to confirm the status; it appears to be a command for checking the configuration.
We would appreciate it if you could tell us if there are any other ways to check.
If there is no log or status to check, the customer would like to know what conditions indicate that there is no problem, for example, that a certain process is running.

Thanks,

Raj.A posted Jun 12, 2025 03:57 AM

You can check the secondary status through OneClick web UI.

Administration --> Landscapes.

Regards,

Raj

MARUBUN SUPPORT posted Jun 12, 2025 04:57 AM

> Administration --> Landscapes.

If the status is "Ready", does that mean the secondary server is ready to be switched over?

Catalin Farcasanu posted Jun 12, 2025 05:06 AM

I think it's obvious that the first condition to check is the presence or absence of the "CONTACT LOST TO SECONDARY SPECTROSERVER" alarm in OC. You can have an email sent whenever the alarm occurs. This is what the system checks for itself.

A running process does not necessarily mean that the process is doing what it is supposed to be doing.

The SS process is called SpectroSERVER/SpectroSERVER.exe. The Process Daemon is responsible for keeping all required Spectrum processes running. These are described in the documentation and also in the 200 Administration course that is available.