DX Unified Infrastructure Management


HA Configuration - Primary Hub GET queue

  • 1.  HA Configuration - Primary Hub GET queue

    Posted Jul 21, 2020 10:59 AM
    Hello,

    I am trying to set up HA as described in "AIOps - DX Infrastructure Manager - High Availability (HA) setup guide.pdf", but I'm stuck at the following step on page 15:

    Primary Hub GET Queues

    • Define QoS 'GET' queues on the Primary hub to 'get' the data_engine and probe_discovery ATTACH queue messages from the Secondary hub (so that you get the QoS messages from any/all monitoring probes that are deployed on your Secondary hub as well as any QoS generated by Robots that belong to the Secondary hub).

    If I try to create a new queue on the primary, I can connect to the secondary hub, but data_engine is disabled on the secondary. If I enable data_engine on the secondary, I can define the queue on the primary, but should I then disable data_engine on the secondary again?

    What about the "probe_discovery" queue?

    Could someone kindly explain this step for a beginner like me?

    Thanks, Pino


  • 2.  RE: HA Configuration - Primary Hub GET queue

    Broadcom Employee
    Posted Jul 21, 2020 12:54 PM
    Hi Giuseppe,
    What version are you on?

    I thought it was an attach queue. Anyway, since I can't remember the details between new implementations, I keep a document for them. Here is a screenshot from it showing the queues I need to create, or that get created by the probes, on the secondary/HA hub server.
    After you deploy data_engine (DE) on the HA hub, you should start it and test the login. Is there a queue after your test?





  • 3.  RE: HA Configuration - Primary Hub GET queue

    Broadcom Employee
    Posted Jul 21, 2020 01:01 PM
    Those queues should mirror what is on the primary but be inactive, and be listed for activation in the HA probe configuration. Hope that helps. You may need to do the same for the ems and qos_processor queues.


  • 4.  RE: HA Configuration - Primary Hub GET queue

    Posted Jul 22, 2020 03:50 AM
    Thanks Gregg,
    I'm using UIM 20.1.

    Yesterday I tried it, and this is my conclusion: the data_engine queue must NOT be deactivated on the secondary.
    In that case I can define a queue on the primary that gets the data from the data_engine queue on the secondary, as shown in this picture:


    In fact, in a previous release of the document (CA UIM_8.5.1_HA setup_0.11_review.docx) I see:
    • Define a QoS GET queue on the Primary to get the data_engine queue of the Secondary core hub. (so that you have the QoS entries of the active probes on your Secondary core hub)
    However, the screenshots in "AIOps - DX Infrastructure Manager - High Availability (HA) setup guide.pdf" are slightly different: there, the data_engine queue on the secondary is deselected.

    Did I understand this correctly?

    Thanks and best regards, Pino

    PS: Now I have a different problem. If I shut down the primary, I see "INFO: state == 'HA_ACTIVATE'", but I get authorization errors when the probes are started on the secondary. Investigating...




  • 5.  RE: HA Configuration - Primary Hub GET queue

    Posted Jul 22, 2020 04:19 AM
    When you read "Define a QoS GET queue on the Primary to get the data_engine queue of the Secondary core hub. (so that you have the QoS entries of the active probes on your Secondary core hub)",
    it means that you need to define a GET queue on your Primary hub that gets the "qos" queue from your secondary hub (the queue that the data_engine on the secondary hub would normally read, but that probe is inactive there).



  • 6.  RE: HA Configuration - Primary Hub GET queue

    Posted Jul 22, 2020 04:23 AM
    Here is the latest version of that doc you referred to.
    On the secondary hub, you need to keep the data_engine queue active but the data_engine probe inactive.




  • 7.  RE: HA Configuration - Primary Hub GET queue

    Broadcom Employee
    Posted Jul 22, 2020 09:06 AM
    Hi Pino,
    The "GET QoS" queue on the primary should use the three QoS subjects (message, definition, baseline), not "data_engine". During normal operation this queue attaches to the HA hub's QoS attach queue; in my screenshot of the secondary's queues, that is the Secondary_QOS attach queue. I usually keep the HA probe/queue activation to a bare minimum, since the secondary should only act as a temporary alarm bridge. Also make sure AC and the package page work on the secondary hub with wasp running. Anyway, here is a queue configuration from a successful HA setup on 8.51. This doc is the latest I found on the portal:
    https://knowledge.broadcom.com/external/article?articleId=35432

    Review Luc's doc as well and hopefully it will come together.
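    To make the queue rules from Luc's and Gregg's replies concrete, here is a minimal Python sketch that models the queues as plain data and checks them. It is an illustration under assumptions, not the UIM configuration format or API: the `Queue` class, its fields, the `check_ha_queues` helper and the subject names `QOS_MESSAGE`, `QOS_DEFINITION` and `QOS_BASELINE` (standing in for the "message, def, baseline" subjects) are all hypothetical. The checks follow the thread: the secondary keeps its data_engine attach queue active while the data_engine probe stays inactive, and the primary has an active GET queue pulling that attach queue.

```python
from dataclasses import dataclass, field

# Assumed names for the three QoS subjects (message, definition, baseline).
QOS_SUBJECTS = ["QOS_MESSAGE", "QOS_DEFINITION", "QOS_BASELINE"]


@dataclass
class Queue:
    """Hypothetical model of a hub queue (not the UIM config format)."""
    name: str
    kind: str                                     # "attach" or "get"
    active: bool
    subjects: list = field(default_factory=list)  # attach queues only
    remote_hub: str = ""                          # get queues: hub to pull from
    remote_queue: str = ""                        # get queues: attach queue on that hub


def check_ha_queues(primary, secondary, secondary_probes, qos_queue="data_engine"):
    """Return the HA queue rules from this thread that the layout violates."""
    problems = []
    attach = {q.name: q for q in secondary}.get(qos_queue)
    # The secondary's QoS attach queue must exist and stay active.
    if attach is None or attach.kind != "attach" or not attach.active:
        problems.append(f"secondary: attach queue '{qos_queue}' must exist and be active")
    else:
        missing = [s for s in QOS_SUBJECTS if s not in attach.subjects]
        if missing:
            problems.append(f"secondary: '{qos_queue}' is missing subjects {missing}")
    # The data_engine probe itself must stay inactive on the secondary.
    if secondary_probes.get("data_engine", False):
        problems.append("secondary: data_engine probe must be inactive")
    # The primary needs an active GET queue pulling that attach queue.
    if not any(q.kind == "get" and q.active and q.remote_queue == qos_queue for q in primary):
        problems.append(f"primary: no active GET queue pulling '{qos_queue}' from the secondary")
    return problems
```

    For example, a secondary with an active `data_engine` attach queue carrying all three subjects and an inactive data_engine probe, plus a primary GET queue whose `remote_queue` is `data_engine`, produces an empty problem list.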



  • 8.  RE: HA Configuration - Primary Hub GET queue

    Posted Jul 23, 2020 02:51 AM
    Gregg, Luc.
    Thanks. I will try and let you know.

    BR, Pino


  • 9.  RE: HA Configuration - Primary Hub GET queue

    Posted Jul 23, 2020 05:14 AM
    Hello,
    I updated the queues as you explained. Thanks.

    However I have 2 problems now:

    1) very often the primary is up but the secondary thinks it is down

    2) If I shut down the primary, I see the HA failover happen as expected, but then HA cannot activate the queues, start the probes, etc.

    Here is the log from a test where I configured just 2 probes:

    Jul 23 11:06:07:551 0 HA: WARN: FAILOVER: Failed to contact primary hub '/APCEDNAAPP037_domain/APCEDNAAPP037_hub/APCEDNAAPP037/hub': communication error. Issuing state change.
    Jul 23 11:06:10:583 0 HA: ERROR: Failed to activate queue 'action_manager'. rc: 1, error: error
    Jul 23 11:06:13:606 0 HA: ERROR: Failed to activate queue 'alarm_manager'. rc: 1, error: error
    Jul 23 11:06:16:629 0 HA: ERROR: Failed to activate queue 'baseline_engine.BASELINE_CONFIG'. rc: 1, error: error
    Jul 23 11:06:19:662 0 HA: ERROR: Failed to activate queue 'data_engine'. rc: 1, error: error
    Jul 23 11:06:22:706 0 HA: ERROR: Failed to activate queue 'ems'. rc: 1, error: error
    Jul 23 11:06:25:751 0 HA: ERROR: Failed to activate queue 'event_manager'. rc: 1, error: error
    Jul 23 11:06:28:793 0 HA: ERROR: Failed to activate queue 'legacy_alarm_manager'. rc: 1, error: error
    Jul 23 11:06:31:841 0 HA: ERROR: Failed to activate queue 'prediction_engine.PREDICTION_CONFIG'. rc: 1, error: error
    Jul 23 11:06:34:874 0 HA: ERROR: Failed to activate queue 'probeDiscovery'. rc: 1, error: error
    Jul 23 11:06:37:916 0 HA: ERROR: Failed to activate queue 'tot_rule_config'. rc: 1, error: error
    Jul 23 11:06:40:954 0 HA: ERROR: Failed to activate queue 'udm_inventory'. rc: 1, error: error
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'action_manager' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'alarm_manager' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'baseline_engine.BASELINE_CONFIG' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'data_engine' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'ems' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'event_manager' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'legacy_alarm_manager' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'prediction_engine.PREDICTION_CONFIG' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'probeDiscovery' expected to be in activate state but is not.
    Jul 23 11:06:45:962 0 HA: ERROR: queue 'tot_rule_config' expected to be in activate state but is not.
    Jul 23 11:06:52:007 0 HA: ERROR: Failed to send request to controller to update nas auto_operator/setup 'active' key to 'yes' value. rc: 6, error: permission denied
    Jul 23 11:06:52:007 0 HA: ERROR: Failed to change state of nas-ao
    Jul 23 11:06:57:015 0 HA: ERROR: probe_activate request for data_engine returned 6 (permission denied)
    Jul 23 11:06:57:015 0 HA: ERROR: probe_activate request for ace returned 6 (permission denied)

    Any idea?

    Thanks, Pino

    PS: The user and password used during installation are the same on both hubs, and certificate.pem was copied from the primary.
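    With many near-identical lines, it helps to group the failover log by error type. The Python sketch below does that with regular expressions written against the exact lines quoted in this post; the function name and category labels are illustrative. Applied to this log, it shows every probe_activate failure returning rc 6 (permission denied), as does the nas auto_operator update, while the queue activations fail with a generic rc 1.

```python
import re
from collections import defaultdict

# Patterns matching the HA error lines quoted in the failover log.
ACTIVATE_FAILED = re.compile(r"Failed to activate queue '([^']+)'\. rc: (\d+)")
NOT_ACTIVE = re.compile(r"queue '([^']+)' expected to be in activate state but is not")
PROBE_FAILED = re.compile(r"probe_activate request for (\S+) returned (\d+) \(([^)]*)\)")


def summarize_ha_log(lines):
    """Group HA errors by kind so the common cause stands out."""
    summary = defaultdict(list)
    for line in lines:
        if m := ACTIVATE_FAILED.search(line):
            summary["queue_activate_failed"].append((m.group(1), int(m.group(2))))
        elif m := NOT_ACTIVE.search(line):
            summary["queue_not_active"].append(m.group(1))
        elif m := PROBE_FAILED.search(line):
            summary["probe_activate_failed"].append((m.group(1), int(m.group(2)), m.group(3)))
        elif "permission denied" in line:
            # e.g. the nas auto_operator/setup update request
            summary["other_permission_denied"].append(line.strip())
    return dict(summary)
```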


  • 10.  RE: HA Configuration - Primary Hub GET queue

    Posted Jul 23, 2020 05:28 AM
    Did you follow the section on page 10 in the attached doc?
    • Probes to enable (and they must be in the correct order)
    You can test this once you are in failover mode: with your probes in a failed/inactive state, try to activate them manually (that is how we found the correct order to start them).
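    Luc's manual-activation test can be scripted as a loop that starts probes in a candidate order and stops at the first failure, so the failing step is obvious. This is a hypothetical sketch: `activate` is a placeholder for however you actually start a probe (for example from IM), not a UIM call, and the dependency in the demo (ace needing data_engine first) is invented purely to show the effect of ordering.

```python
from typing import Callable, Iterable, List, Tuple


def activate_in_order(probes: Iterable[str],
                      activate: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Activate probes one by one, stopping at the first non-zero return code.

    `activate` is a hypothetical stand-in that returns 0 on success.
    Returns the (probe, rc) pairs attempted, so the failing step is last.
    """
    attempted = []
    for probe in probes:
        rc = activate(probe)
        attempted.append((probe, rc))
        if rc != 0:
            break
    return attempted


if __name__ == "__main__":
    started = set()

    def fake_activate(probe: str) -> int:
        # Invented dependency: pretend 'ace' needs 'data_engine' running first.
        if probe == "ace" and "data_engine" not in started:
            return 1
        started.add(probe)
        return 0

    print(activate_in_order(["data_engine", "ace"], fake_activate))
```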


  • 11.  RE: HA Configuration - Primary Hub GET queue

    Posted Jul 23, 2020 05:37 AM
    Hello Luc,
    To test this I added just 2 probes (data_engine and ace). After the HA failover I can start them manually from IM.

    Please note that, according to the logs:
    1) queue activation should occur before probe activation, but it is failing
    2) probe activation fails with "permission denied"

    I'm sure I missed something but what?

    Thanks, Pino


  • 12.  RE: HA Configuration - Primary Hub GET queue

    Broadcom Employee
    Posted Jul 23, 2020 09:19 AM
    That's a tricky one. I found this to try:
    https://knowledge.broadcom.com/external/article?articleId=143684

    If it keeps failing over, it could be a network issue. Increase the connection settings in the HA probe (I forget the exact field names). Verify DNS resolution (each hub should have the other in its Name Resolution). Verify that there are no antivirus (AV) or port restrictions between the two hubs.
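    Gregg's DNS and port checks can be run from each hub against the other with a short Python sketch that uses only the standard library (`socket.getaddrinfo` and `socket.create_connection`). The port is an assumption: 48002 is commonly the default hub port, but confirm the ports your hubs actually listen on. A successful TCP connect only shows that the port is reachable, not that the hub accepts the session.

```python
import socket

# Assumption: 48002 as the hub port; verify against your own hub configuration.
DEFAULT_HUB_PORT = 48002


def check_peer(host: str, port: int = DEFAULT_HUB_PORT, timeout: float = 3.0) -> dict:
    """Resolve `host`, then try a plain TCP connection to `port`."""
    result = {"host": host, "port": port, "addresses": [], "tcp_ok": False, "error": None}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["addresses"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result
    try:
        # Only proves the port is reachable (no firewall/AV block in between).
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result
```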


  • 13.  RE: HA Configuration - Primary Hub GET queue
    Best Answer

    Posted Jul 23, 2020 10:29 AM
    Hi Gregg,

    I reinstalled a few of the probes on the secondary and now everything seems to work as expected.

    I suspect there is a problem with the multiple network interfaces on the servers.

    I set robotip to a fixed address; maybe I should bind the other components to the same IP address as well.

    Thanks, Pino
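    To check whether multiple interfaces are the issue, you can ask the operating system which local address it would use to reach the other hub and compare it with the configured robotip. A minimal Python sketch, assuming IPv4; it relies on the fact that connecting a UDP socket sends no packets but does make the OS choose a route and source address. The helper name and default port are illustrative.

```python
import socket


def outbound_ip(peer_host: str, peer_port: int = 48002) -> str:
    """Return the local IPv4 address the OS would use to reach peer_host.

    connect() on a UDP socket sends nothing; it only selects the route and
    source address, which getsockname() then reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_host, peer_port))
        return s.getsockname()[0]
```

    Run `outbound_ip("secondary-hub.example")` (a hypothetical host name) on each hub, pointing at the other one; if the result differs from that hub's robotip, traffic to the peer leaves through a different interface than the one configured as robotip.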


  • 14.  RE: HA Configuration - Primary Hub GET queue

    Broadcom Employee
    Posted Jul 23, 2020 01:02 PM
    Could be. Lock down the IP they need to communicate on. You could use net_connect to test the connection continuously, or just keep an eye on it.
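    For the "primary is up but the secondary thinks it is down" symptom, one way to reason about Gregg's advice to increase the HA connection settings is to declare the peer down only after several consecutive failed checks, so a single slow or dropped response does not trigger a failover. The Python sketch below illustrates that technique with a pluggable `check` function (for example a TCP connect to the peer); it is not how the HA probe or net_connect is implemented, and all parameter names are hypothetical.

```python
import time
from typing import Callable, Optional


def watch_peer(check: Callable[[], bool], max_failures: int = 3,
               interval: float = 5.0, max_checks: Optional[int] = None,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll `check` until it fails `max_failures` times in a row.

    Returns the number of checks performed when the peer was declared down,
    or -1 if `max_checks` ran out first.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            failures = 0          # any success resets the failure streak
        else:
            failures += 1
            if failures >= max_failures:
                return checks     # peer declared down on this check
        sleep(interval)
    return -1                     # never declared down within max_checks
```

    Raising `max_failures` (or `interval`) trades slower failover for fewer false alarms on a flaky link.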