DX NetOps


Missing alerts

  • 1.  Missing alerts

    Posted Aug 04, 2021 10:49 AM

    Our CA Performance Management version 21.2.1 has two Data Collectors, one at each of our primary and backup datacenters. Due to some power issues we had to shut down one of the data centers. Devices are allocated to the collectors round robin, so during the outage we missed alerts on about half of the network devices. Any suggestions on how we can address this issue?
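A minimal Python sketch of why a round-robin plan that gives each device exactly one collector loses about half the devices when one of two collectors goes offline. This is not CAPM code; the device and collector names are hypothetical, and it only models the allocation described in the question.

```python
from collections import defaultdict


def round_robin(devices: list[str], collectors: list[str]) -> dict[str, list[str]]:
    """Assign each device to exactly one collector, cycling through the collectors."""
    assignment: dict[str, list[str]] = defaultdict(list)
    for i, device in enumerate(devices):
        assignment[collectors[i % len(collectors)]].append(device)
    return dict(assignment)


def unmonitored(assignment: dict[str, list[str]], down: set[str]) -> list[str]:
    """Devices whose only collector is down get no polls, and therefore no alerts."""
    return [dev for dc, devs in assignment.items() if dc in down for dev in devs]


# Hypothetical names: ten devices split across one collector per datacenter.
devices = [f"router-{n:02d}" for n in range(10)]
plan = round_robin(devices, ["dc-primary", "dc-backup"])
lost = unmonitored(plan, down={"dc-backup"})
print(len(lost), "of", len(devices), "devices unmonitored")  # 5 of 10 devices unmonitored
```

With N collectors and one of them down, roughly 1/N of the devices go dark, which matches the "about half" seen here with two collectors.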

     



  • 2.  RE: Missing alerts

    Posted Aug 04, 2021 11:06 AM
    Edited by David DuPre Aug 04, 2021 11:06 AM
    Add two or more Data Collectors at each datacenter, select all of them, and balance the load across the collectors in each datacenter, so that both the primary and the backup are polling devices. This would mean each device would be configured to expect polls from all of the Data Collectors. Then the primary will have polled all devices, and the backup will have polled all devices.
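One way to model the redundant design suggested above: every device gets a collector at *each* site, so losing a whole site still leaves every device polled. This is a hypothetical Python sketch, not how CAPM assigns devices internally; the site and collector names are assumptions, and the later replies in this thread suggest that in practice this may require a separate CAPM installation at each site.

```python
def redundant_plan(devices: list[str], collectors_by_site: dict[str, list[str]]) -> dict[str, list[str]]:
    """Assign every device one collector at each site, round robin within the site."""
    plan: dict[str, list[str]] = {device: [] for device in devices}
    for collectors in collectors_by_site.values():
        for i, device in enumerate(devices):
            plan[device].append(collectors[i % len(collectors)])
    return plan


def still_polled(plan: dict[str, list[str]], down: set[str]) -> list[str]:
    """A device keeps being polled as long as at least one of its collectors is up."""
    return [device for device, collectors in plan.items()
            if any(c not in down for c in collectors)]


# Hypothetical sites, each with two collectors of its own.
sites = {"primary": ["pri-dc1", "pri-dc2"], "backup": ["bak-dc1", "bak-dc2"]}
devices = [f"router-{n:02d}" for n in range(10)]
plan = redundant_plan(devices, sites)
# Lose the whole primary site: every device is still polled from the backup site.
print(len(still_polled(plan, down={"pri-dc1", "pri-dc2"})))  # 10
```

The trade-off is that every device is polled twice, which doubles the SNMP load on the devices and the data volume stored.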


  • 3.  RE: Missing alerts

    Posted Aug 04, 2021 12:06 PM
    Does this mean I have to add an F5 at each datacenter for load balancing?


  • 4.  RE: Missing alerts

    Posted Aug 04, 2021 12:37 PM
    Normally the Data Aggregator tells the Data Collectors which devices to poll. There is a menu option to balance the load across multiple Data Collectors: select all the Data Collectors attached to a Data Aggregator and rebalance (for an IP Domain). You can have more than one Data Collector assigned to an IP Domain. The Data Collector reaches out to the devices it is assigned to contact, so I am not sure what your F5 load balancer is for in this case.
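To illustrate what "rebalance across the selected Data Collectors" means, here is a hypothetical Python sketch that spreads devices evenly over a list of available collectors, moving only the devices that exceed a collector's target share. The real CAPM rebalance algorithm is not described in this thread; the function name and data layout are assumptions.

```python
def rebalance(assignment: dict[str, list[str]], available: list[str]) -> dict[str, list[str]]:
    """Spread all devices evenly over the available collectors (hypothetical model)."""
    if not available:
        raise ValueError("at least one collector must be available")
    n = sum(len(devs) for devs in assignment.values())
    k = len(available)
    # Target share: every collector gets n // k; the first n % k get one extra.
    targets = {c: n // k + (1 if i < n % k else 0) for i, c in enumerate(available)}
    new: dict[str, list[str]] = {c: [] for c in available}
    orphans: list[str] = []
    for collector, devs in assignment.items():
        if collector in new:
            # Keep devices up to the target; surplus devices must move.
            new[collector].extend(devs[: targets[collector]])
            orphans.extend(devs[targets[collector]:])
        else:
            # Collector was removed from the selection: all its devices move.
            orphans.extend(devs)
    # Fill collectors that are below their target with the moved devices.
    for collector in available:
        while len(new[collector]) < targets[collector]:
            new[collector].append(orphans.pop(0))
    return new


before = {"dc-1": ["r1", "r2", "r3", "r4", "r5"], "dc-2": ["r6"]}
after = rebalance(before, ["dc-1", "dc-2", "dc-3"])
print(after)
```

Adding a third collector and rebalancing leaves each collector with two devices; only the surplus devices from `dc-1` move.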


  • 5.  RE: Missing alerts

    Posted Aug 04, 2021 01:20 PM
    Hi David,

    Where is the option to load-balance across Data Collectors?


  • 6.  RE: Missing alerts

    Posted Aug 04, 2021 01:44 PM
    You need to navigate to the Data Aggregator. I don't have version 21.x installed, so the screen will be a little different. Here is what my screen looks like in my lab; I have only two Data Collectors.
    To get there I mouse over the "Administration" menu, then "Data Sources", then "Data Aggregator".



  • 7.  RE: Missing alerts

    Posted Aug 04, 2021 05:56 PM
    I have two Data Collectors (marked and highlighted); the other two Data Collectors are the proxy servers for our secure RCSC zone.


    Here is a sample device. As the highlighted text shows, it is configured for the "Default" IP Domain, and the "DC Host" field only lets me pick one of the Data Collectors, not both. Furthermore, how can I confirm whether the device is configured to accept polls from both Data Collectors?


    Thanks,


  • 8.  RE: Missing alerts

    Posted Aug 04, 2021 10:38 PM
    It appears you have a design problem.

    If you want to have a Primary and a Backup Data Center (DC):

    1. Don't you need two CAPM installations, one in each data center?
    2. Wouldn't this mean that each data center was polling all devices in both centers?
    3. So you could access the CAPM in the Primary DC and see all devices in the Primary and Backup DCs.
    4. When the Primary DC was offline, you could use the CAPM in the Backup DC to view all the devices still active in the Backup DC.
    5. This is why I said your Data Collectors need access to poll all your devices from both the Primary and Backup Data Centers, where each has its own set of NetOps Portal, Data Aggregator, and Data Collectors.

    You need NetOps duplicated in both the Primary and Backup Data Centers.

    Network devices normally require a permission (ACL) to be set before another device can perform SNMP discovery and SNMP polling against them.
    If yours are not locked down, then any device in your network can query the SNMP settings, given it has the community strings.
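Relating this to the earlier question about confirming that a device accepts polls from both Data Collectors: each collector's source address has to be permitted by the device's SNMP ACL. The Python sketch below uses the standard `ipaddress` module to model a simplified permit-only ACL; real device ACLs have vendor-specific syntax, deny entries, and ordering that this ignores. All addresses and names are hypothetical (documentation ranges).

```python
import ipaddress


def permitted(acl: list[str], source_ip: str) -> bool:
    """True if source_ip falls inside any permitted network in a permit-only ACL."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in acl)


def collectors_blocked(acl: list[str], collectors: dict[str, str]) -> list[str]:
    """Names of collectors whose source IP the ACL would not allow to poll."""
    return [name for name, ip in collectors.items() if not permitted(acl, ip)]


# Hypothetical SNMP ACL on one device, and one collector at each datacenter.
snmp_acl = ["192.0.2.0/28", "198.51.100.10/32"]
collectors = {"primary": "192.0.2.5", "backup": "203.0.113.7"}
print(collectors_blocked(snmp_acl, collectors))  # ['backup']
```

An empty result means every collector listed would be allowed to poll that device; anything returned would be unable to take over polling during a failover.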

    David


  • 9.  RE: Missing alerts

    Posted Aug 05, 2021 09:28 AM
    What is the interval of the rebalance checks between DCs? One of my DCs went down, but its devices were not moved to the other DC; if I do a manual rebalance, the devices get migrated.
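For comparison with a full manual rebalance, here is a hypothetical Python sketch of the failover step being asked about: when a collector is marked down, move only its orphaned devices to the least-loaded surviving collectors and leave everything else alone. According to the replies in this thread, CAPM does not do this automatically; the function and the "down" input (which would have to come from some health check) are assumptions.

```python
import heapq


def fail_over(assignment: dict[str, list[str]], down: set[str]) -> dict[str, list[str]]:
    """Reassign devices of down collectors to the least-loaded survivors."""
    survivors = {c: list(devs) for c, devs in assignment.items() if c not in down}
    if not survivors:
        raise RuntimeError("no collector left to poll the orphaned devices")
    orphans = [dev for c, devs in assignment.items() if c in down for dev in devs]
    # Min-heap of (device count, collector name): always fill the lightest collector.
    heap = [(len(devs), c) for c, devs in survivors.items()]
    heapq.heapify(heap)
    for device in orphans:
        count, collector = heapq.heappop(heap)
        survivors[collector].append(device)
        heapq.heappush(heap, (count + 1, collector))
    return survivors


plan = {"dc1": ["a", "b"], "dc2": ["c", "d"], "dc3": ["e"]}
print(fail_over(plan, down={"dc2"}))
```

Note that this only helps if the surviving collectors can actually reach the orphaned devices and are permitted by their ACLs; in a full datacenter outage that is not guaranteed.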


  • 10.  RE: Missing alerts

    Posted Aug 05, 2021 09:59 AM
    Best practice is to move all devices off the Data Collector you are going to shut down prior to the shutdown.


  • 11.  RE: Missing alerts

    Posted Aug 05, 2021 11:04 AM
    So for an unexpected outage, can we automate moving the devices?


  • 12.  RE: Missing alerts

    Posted Aug 05, 2021 12:18 PM
    No. Not really. If monitoring and reporting are considered critical to the business, you should design it so that data is collected redundantly, so that if you fail over to the backup site nothing is lost.

    If the disaster recovery plan does not need Performance Management data to continue doing business, then losing some reports and collected metrics is not a problem.

    If doing business requires certain SLAs that depend on data from Performance Management, then you need to redesign it so that PM is accessible in a disaster.

    David


  • 13.  RE: Missing alerts

    Posted Aug 05, 2021 01:27 PM
    But per the product design, these Data Collectors don't have redundancy themselves. Assume that one of my DCs goes down due to some VM issue; it is located in the APP layer and collects data for the APP, WEB, and DB layers. Now there is an impact across the monitored devices. Currently, nobody polls the devices continuously from two different setups. Even Spectrum and UIM have built-in features that support HA without dual polling.


  • 14.  RE: Missing alerts

    Posted Aug 05, 2021 02:50 PM
    Fault tolerance is documented here:
    https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/performance-management/20-2/administrating/fault-tolerance/configure-a-fault-tolerant-environment.html