Hello,
I updated the queues as you explained, thanks.
However, I now have 2 problems:
1) Very often the primary is up but the secondary thinks it is down.
2) If I shut down the primary, failover takes place as expected, but then HA cannot activate the queues, start the probes, etc.
Here is a log from a test where I configured just 2 probes:
Jul 23 11:06:07:551 0 HA: WARN: FAILOVER: Failed to contact primary hub '/APCEDNAAPP037_domain/APCEDNAAPP037_hub/APCEDNAAPP037/hub': communication error. Issuing state change.
Jul 23 11:06:10:583 0 HA: ERROR: Failed to activate queue 'action_manager'. rc: 1, error: error
Jul 23 11:06:13:606 0 HA: ERROR: Failed to activate queue 'alarm_manager'. rc: 1, error: error
Jul 23 11:06:16:629 0 HA: ERROR: Failed to activate queue 'baseline_engine.BASELINE_CONFIG'. rc: 1, error: error
Jul 23 11:06:19:662 0 HA: ERROR: Failed to activate queue 'data_engine'. rc: 1, error: error
Jul 23 11:06:22:706 0 HA: ERROR: Failed to activate queue 'ems'. rc: 1, error: error
Jul 23 11:06:25:751 0 HA: ERROR: Failed to activate queue 'event_manager'. rc: 1, error: error
Jul 23 11:06:28:793 0 HA: ERROR: Failed to activate queue 'legacy_alarm_manager'. rc: 1, error: error
Jul 23 11:06:31:841 0 HA: ERROR: Failed to activate queue 'prediction_engine.PREDICTION_CONFIG'. rc: 1, error: error
Jul 23 11:06:34:874 0 HA: ERROR: Failed to activate queue 'probeDiscovery'. rc: 1, error: error
Jul 23 11:06:37:916 0 HA: ERROR: Failed to activate queue 'tot_rule_config'. rc: 1, error: error
Jul 23 11:06:40:954 0 HA: ERROR: Failed to activate queue 'udm_inventory'. rc: 1, error: error
Jul 23 11:06:45:962 0 HA: ERROR: queue 'action_manager' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'alarm_manager' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'baseline_engine.BASELINE_CONFIG' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'data_engine' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'ems' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'event_manager' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'legacy_alarm_manager' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'prediction_engine.PREDICTION_CONFIG' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'probeDiscovery' expected to be in activate state but is not.
Jul 23 11:06:45:962 0 HA: ERROR: queue 'tot_rule_config' expected to be in activate state but is not.
Jul 23 11:06:52:007 0 HA: ERROR: Failed to send request to controller to update nas auto_operator/setup 'active' key to 'yes' value. rc: 6, error: permission denied
Jul 23 11:06:52:007 0 HA: ERROR: Failed to change state of nas-ao
Jul 23 11:06:57:015 0 HA: ERROR: probe_activate request for data_engine returned 6 (permission denied)
Jul 23 11:06:57:015 0 HA: ERROR: probe_activate request for ace returned 6 (permission denied)
Any idea?
Thanks, Pino
PS: The user and password used during installation are the same on both hubs, and certificate.pem was copied from the primary.
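A note for reading a log like the one above: the errors fall into two groups, the queue activations that fail with rc 1 and the controller/probe requests that fail with rc 6 (permission denied). Below is a minimal Python sketch that groups HA error lines by return code, using a few lines copied from this log as sample input. The function name group_by_rc and the two regular expressions are assumptions about the line format seen here, not part of UIM.

```python
import re
from collections import defaultdict

# Sample lines copied from the log in this post.
LOG = """\
Jul 23 11:06:10:583 0 HA: ERROR: Failed to activate queue 'action_manager'. rc: 1, error: error
Jul 23 11:06:45:962 0 HA: ERROR: queue 'action_manager' expected to be in activate state but is not.
Jul 23 11:06:52:007 0 HA: ERROR: Failed to send request to controller to update nas auto_operator/setup 'active' key to 'yes' value. rc: 6, error: permission denied
Jul 23 11:06:57:015 0 HA: ERROR: probe_activate request for data_engine returned 6 (permission denied)
Jul 23 11:06:57:015 0 HA: ERROR: probe_activate request for ace returned 6 (permission denied)
"""

# Assumed shapes of the return-code suffix, based only on the lines above:
#   "... rc: N, error: TEXT"   and   "... returned N (TEXT)"
RC_PATTERNS = [
    re.compile(r"rc: (\d+), error: (.+)$"),
    re.compile(r"returned (\d+) \((.+)\)$"),
]


def group_by_rc(text):
    """Map (rc, reason) -> list of HA error messages that carry that code."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if "HA: ERROR:" not in line:
            continue
        message = line.split("HA: ERROR:", 1)[1].strip()
        for pattern in RC_PATTERNS:
            match = pattern.search(message)
            if match:
                key = (int(match.group(1)), match.group(2))
                groups[key].append(message)
                break  # lines without a return code are skipped
    return dict(groups)


if __name__ == "__main__":
    for (rc, reason), messages in sorted(group_by_rc(LOG).items()):
        print(f"rc {rc} ({reason}): {len(messages)} message(s)")
```

Splitting the errors this way keeps the "permission denied" failures separate from the generic queue activation failures, so each group can be investigated on its own.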
Original Message:
Sent: 07-23-2020 02:51 AM
From: Giuseppe Venturella
Subject: HA Configuration - Primary Hub GET queue
Gregg, Luc.
Thanks. I will try and let you know.
BR, Pino
Original Message:
Sent: 07-22-2020 09:06 AM
From: GREGG STILLWELL
Subject: HA Configuration - Primary Hub GET queue
Hi Pino,
The "GET QoS" queue on the primary should have the 3 QoS subjects (message, def, baseline) as its subject, not "data_engine". During normal operation this queue attaches to the HA hub's attach queue for QoS. The Secondary_QOS attach queue connects to it, as in my screenshot of the Secondary queues. I usually keep the HA probe/queue activation to a bare minimum, as it should be a temporary alarm bridge. I also have AC and the package page working on the Secondary hub with wasp up. Anyway, here is a queue config from a successful HA setup for 8.51. This doc is the latest I found on the portal:
https://knowledge.broadcom.com/external/article?articleId=35432
Review Luc's doc as well and hopefully it will come together.
Original Message:
Sent: 07-22-2020 04:23 AM
From: Luc Christiaens
Subject: HA Configuration - Primary Hub GET queue
Here is the latest version of that doc you referred to.
On the secondary hub, you need to keep the data_engine queue active but the data_engine probe inactive.
Original Message:
Sent: 07-22-2020 04:18 AM
From: Luc Christiaens
Subject: HA Configuration - Primary Hub GET queue
When you read "Define a QoS GET queue on the Primary to get the data_engine queue of the Secondary core hub. (so that you have the QoS entries of the active probes on your Secondary core hub)":
This means that you need to define a GET queue on your Primary hub to get the "qos" queue from your Secondary hub (the queue that would normally be read by the data_engine probe, which is inactive on the Secondary hub).
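The layout described here can be pictured as a small data model: an attach queue named data_engine on the secondary, and a GET queue on the primary that points at it. The Python sketch below is only an illustration of that relationship. The class names, field names, and the GET queue name get_qos_from_secondary are hypothetical and do not correspond to hub.cfg keys or any UIM API.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    kind: str               # "attach" or "get"
    subject: str = ""       # used by attach queues
    remote_hub: str = ""    # used by get queues
    remote_queue: str = ""  # used by get queues


@dataclass
class Hub:
    name: str
    queues: list = field(default_factory=list)

    def attach_names(self):
        """Names of the attach queues defined on this hub."""
        return {q.name for q in self.queues if q.kind == "attach"}


def dangling_gets(hubs):
    """List GET queues whose remote attach queue does not exist."""
    by_name = {h.name: h for h in hubs}
    problems = []
    for hub in hubs:
        for q in hub.queues:
            if q.kind != "get":
                continue
            remote = by_name.get(q.remote_hub)
            if remote is None or q.remote_queue not in remote.attach_names():
                problems.append(
                    f"{hub.name}:{q.name} -> {q.remote_hub}:{q.remote_queue}"
                )
    return problems


# Secondary keeps the data_engine attach queue (the probe itself stays inactive).
secondary = Hub("secondary", [
    Queue("data_engine", "attach", subject="QoS subjects (message, def, baseline)"),
])
# Primary pulls that queue with a GET queue (hypothetical name).
primary = Hub("primary", [
    Queue("get_qos_from_secondary", "get",
          remote_hub="secondary", remote_queue="data_engine"),
])

if __name__ == "__main__":
    print(dangling_gets([primary, secondary]))
```

If the data_engine attach queue were removed from the secondary in this model, dangling_gets would report the primary's GET queue, which mirrors why the queue has to stay in place on the secondary even though the probe is inactive.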
Original Message:
Sent: 07-22-2020 03:50 AM
From: Giuseppe Venturella
Subject: HA Configuration - Primary Hub GET queue
Thanks Gregg,
I'm using UIM 20.1.
I tried yesterday, and this is my understanding: the data_engine queue must NOT be deactivated on the secondary.
In that case I can define a queue on the primary to get the data from data_engine on the secondary, as shown in this picture:
Original Message:
Sent: 07-21-2020 12:53 PM
From: GREGG STILLWELL
Subject: HA Configuration - Primary Hub GET queue
Hi Giuseppe,
What version?
I thought it was an attach queue. Anyway, I can't remember these details when I do new implementations, so I make a document for them. This is a screenshot of one for said probes. Here is a screenshot of the queues I need to make, or get made by the probes, on the secondary/HA hub server.
After you deploy DE on the HA hub, you should start it and test login. Is there a queue after your test?
Original Message:
Sent: 07-21-2020 03:45 AM
From: Giuseppe Venturella
Subject: HA Configuration - Primary Hub GET queue
Hello,
I am trying to set up HA as described in "AIOps - DX Infrastructure Manager - High Availability (HA) setup guide.pdf", but I'm stuck on page 15 at the following step:
Primary Hub GET Queues
o Define QOS 'GET' queues on the Primary hub to 'get' the data_engine and probe_discovery ATTACH queue messages from the Secondary hub (so that you get the QoS messages from any/all monitoring probes that are deployed on your Secondary hub as well as any QOS generated by Robots that belong to the Secondary hub).
If I try to create a new queue on the primary I can connect to the secondary hub but data_engine was disabled on the secondary. If I enable data_engine on the secondary I can define the queue on the primary, but then should I disable data_engine again on the primary?
What about the "probe_discovery" queue?
Could someone kindly explain this step for a beginner like me?
Thanks, Pino