DX NetOps


How to safely move all workload from a TADCO Data Collector to a new DC instance

  • 1.  How to safely move all workload from a TADCO Data Collector to a new DC instance

    Posted Jan 10, 2022 12:32 PM
    Hi: I'm happy moving polling workload from one DC to another (getting off an old RHEL version), but I need to check whether this is as straightforward for a Tenant Agnostic DC handling polling load for a number of Tenants.


  • 2.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Broadcom Employee
    Posted Jan 10, 2022 01:23 PM
    If you are just looking to run the TADCO DC on a different box running a newer OS, you need only get the DCM ID "hostname:UUID" (from the DC UI or /rest/dcms in the DA), then install the DC on a box with the newer RHEL release by running:

    as root:  DCM_ID="<dcm id>" ./install.bin
    or using sudo:  sudo DCM_ID="<dcm id>" ./install.bin

    See https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-netops/21-2/Performance-Monitoring-with-DX-Performance-Management/administrating/data-aggregator-administration/update-the-data-collector.html#concept.dita_c4a5cc8a3c8a1b20159db12cb8756d3e00c82077_ReinstalltheDataCollectoronaCleanHost

    The key is using the same DCM ID, so when the new DC machine registers, the DA sees it as the older DC by its DCM ID.  Be sure to stop the old DC before bringing the new one up.
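A minimal sketch of a pre-flight check before running the installer, assuming only what the post above states: the DCM ID has the form hostname:UUID. The helper name check_dcm_id and the sample ID are hypothetical, and the check does not verify the exact layout of the UUID part.

```shell
#!/bin/sh
# Hypothetical helper: accept a DCM ID only if it looks like "hostname:UUID"
# (exactly one colon, both parts non-empty, no backslash).
check_dcm_id() {
    case "$1" in
        *\\*)   return 1 ;;   # contains a backslash, e.g. hostname\:UUID
        *:*:*)  return 1 ;;   # more than one colon
        ?*:?*)  return 0 ;;   # non-empty hostname, colon, non-empty UUID
        *)      return 1 ;;
    esac
}

# Example with a made-up ID; the installer line is left commented out.
DCM_ID="olddc.example.com:123e4567-e89b-12d3-a456-426614174000"
if check_dcm_id "$DCM_ID"; then
    echo "DCM_ID looks usable: $DCM_ID"
    # sudo DCM_ID="$DCM_ID" ./install.bin
else
    echo "DCM_ID is not in hostname:UUID form" >&2
fi
```

The check only guards against an empty or mangled value. It cannot tell whether the ID actually matches the DC being replaced.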


  • 3.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Posted Jan 12, 2022 03:09 PM
    Thanks Jeff.  This refreshed my memory.  Sadly, however, it all crashed around my ears. I had three unallocated DCs prebuilt, so went to remove them from PM using the Delete REST call (which I've successfully used twice today, and 3 times y'day) and got HTTP 500.  No obvious reason; checked and rechecked my work, still HTTP 500. Shut down the first TADCO (stop dcmd), installed DC with DCM_ID env var set to clone it, but it never registered.  Eventually, after much checking, ripped out the new DC code, restarted the old TADCO DC, and it came back.  Tried stopping dadaemon to see if that cleared the DELETE REST call: it took 15min to reappear in PC, well after the GET call for all DCs was coming back HTTP 200.
    Result: failure, reverted to status quo ante.
    Retire hurt to lick wounds.


  • 4.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Broadcom Employee
    Posted Jan 12, 2022 04:28 PM
    Remember you can only delete a DC if it's assigned to an active IP Domain.  If unassigned, can't use DELETE /rest/ipdomainmember/<DC itemid>.

    If unassigned, you could assign it to a valid IP domain and then delete via REST.  Or you'd have to remove the DC itemid from the item table in the Vertica DB and restart the DB for DA to see the deletion.
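A rough sketch of what the REST delete could look like with curl, as a dry run only. The /rest/ipdomainmember/ path comes from the post above; the DA host name, the port 8581, and the item ID are assumptions for illustration and need to be replaced with values from your own installation.

```shell
#!/bin/sh
# Build the REST URL for removing a DC via its IP domain membership.
# Only the path is taken from the thread; host, port and item ID are assumed.
ipdomainmember_url() {
    # $1 = DA host, $2 = DA REST port, $3 = DC item ID
    printf 'http://%s:%s/rest/ipdomainmember/%s\n' "$1" "$2" "$3"
}

DA_HOST="da.example.com"   # hypothetical host name
DA_PORT="8581"             # assumed REST port; check your installation
DC_ITEM_ID="1234"          # hypothetical item ID

url=$(ipdomainmember_url "$DA_HOST" "$DA_PORT" "$DC_ITEM_ID")
# Dry run: print the command rather than executing it.
echo "curl -i -X DELETE '$url'"
```

Printing the command first makes it easy to confirm the item ID before anything is deleted.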


  • 5.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Posted Jan 13, 2022 04:28 PM
    Hmm, perhaps this is more difficult than I had hoped.  I'm changing the hostname and IP address for my replacement DC, assuming that the cloned DCM_ID will make it acceptable to DA as a replacement.  The DA logs don't seem to show any acceptance, the newly installed DC code sits there not connecting and PC GUI DC list shows no connection.
    IIRC there's a file in DC somewhere that shows what DCM_ID the installed code has acquired or picked up, but I can't recall where it is?
    The docs imply that changing hostname and IP won't be a problem but it's not working in my hands, so I'm missing something.


  • 6.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Broadcom Employee
    Posted Jan 13, 2022 05:17 PM
    The DCM_ID will always be the "original DC hostname:UUID" even when moved to new hostname/IP.
    You can't change the hostname in the DCM_ID, as it will create a new DC item in the DA.

    It's a unique key.  Thinking back, we probably should've come up with something better than hostname.

    Did you confirm that 4 ports for AMQ are connected between the new DC and DA?   Maybe just start AMQ and confirm:
    netstat -an | grep 616 | grep ESTABL

    Should see 61616/61618/61620/61622.  If not, then need to open ports from new DC to DA.
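The port check can be sketched as a small filter over netstat output. This is an illustration, assuming the usual `netstat -an` layout where the state column reads ESTABLISHED and addresses end in `:port` or `.port`; the function name missing_amq_ports is hypothetical.

```shell
#!/bin/sh
# Report which of the four expected AMQ ports have no ESTABLISHED connection.
# Reads `netstat -an` style output on stdin; prints missing ports, one per line.
missing_amq_ports() {
    input=$(cat)
    for port in 61616 61618 61620 61622; do
        # Match the port at the end of an address field (e.g. 10.0.0.5:61616).
        if ! printf '%s\n' "$input" | grep ESTABLISHED | grep -Eq "[:.]${port}[[:space:]]"; then
            echo "$port"
        fi
    done
}

# Usage on the new DC (with AMQ started):
#   netstat -an | missing_amq_ports
```

An empty result means all four connections are up. Any port that is printed is a candidate for a firewall rule from the new DC to the DA.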

    Also, the replaced DC should be down, if it isn't already, when bringing up the new dcmd process.

    Yes, IMDataCollector/apache-karaf-*/etc/com.ca.im.dm.core.collector.cfg
    collector-manager-id= should reflect the DCM_ID you used when running installer.
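A small sketch for pulling that value out of the file, assuming the key sits at the start of a line with no spaces around the "=". The function name read_collector_manager_id is hypothetical.

```shell
#!/bin/sh
# Print the raw collector-manager-id value from a collector cfg file.
# $1 = path to com.ca.im.dm.core.collector.cfg
read_collector_manager_id() {
    # Print the first matching line with the "collector-manager-id=" prefix removed.
    sed -n 's/^collector-manager-id=//p' "$1" | head -n 1
}

# Usage (relative to wherever the DC was installed):
#   read_collector_manager_id IMDataCollector/apache-karaf-*/etc/com.ca.im.dm.core.collector.cfg
```

The value is printed exactly as stored, so it can be compared by eye with the DCM_ID that was passed to the installer.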


  • 7.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Posted Jan 16, 2022 03:15 PM
    Thanks Jeff.  I'm wondering if you're hinting that this trick to replace a DC with the same DCM_ID only works if hostname is unchanged?  Different IP address might be ok.  I used this extensively back in the day (when it was only just past Polaris) but I no longer have the notes and don't recall this level of detail.
    I've had the f/w between DC => DA changed to permit incoming (DC=>DA) on TCP 8582 and checked it and it's permitted.  DA local f/w also permits incoming TCP 8582.  Other DC replacements (five, so far) have all connected and worked 1st time.  So it's only the TADCOs that are not working.  If I understand the docs correctly, there are NO ports on the DA-facing side that need to permit incoming traffic to the DC, only requirements (apart from SNMP for DC=>managed devices) are DC=>DA.  Docs appear to suggest that DC initiates all comms to DA?
    In this instance, I need to change hostname and IP.  I wonder if it would be more certain to use REST calls to create another TADCO DC and then move the workload - but this operation I shall need help with.  Perhaps I should pester BC Support?


  • 8.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Broadcom Employee
    Posted Jan 18, 2022 02:25 PM
    DA requires ports 61616-61623 to be opened for any DC trying to contact the DA.  DC initiates connections, not DA.

    I think you're missing my point about hostname/IP rename.   The DC can change hostname and IP all it wants.

    BUT the DCM_ID (DC unique key) MUST remain untouched.  It's the identifier for the DC, so we treat the new machine or renamed/re-ip machine as the same DC item in the DA.  So we can associate the same tenant/domain, and list of devices to poll to the DC.
    So the hostname that appears in the DCM_ID is fixed to the initial hostname the DC was created on.  You can't change the hostname in the DCM_ID or DA will create a new DA item for the DC and it will have 0 items to poll.



  • 9.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Posted Jan 19, 2022 09:31 AM
    OK, you've confirmed my thinking.  The docs referenced at the start of this thread don't explicitly state that DCM_ID is actually hostname:UID, although many would argue that it's pretty clear in the UI.  This has changed since Polaris, and that's my error.
    One last thing though: looking at one of the replacement DCs (not cloned, just installed and assigned to Tenant/IPdomain): in UI it's clearly hostname:UID, but in the com.ca.im.dm.core.collector.cfg file it's written as
    hostname\:UID
    Which format should be used for the Environment variable if I need to clone any more?


  • 10.  RE: How to safely move all workload from a TADCO Data Collector to a new DC instance

    Broadcom Employee
    Posted Jan 19, 2022 09:53 AM
    The \ is there because : can be considered a key/value separator in Properties files.  InstallAnywhere adds it when it reads the file, modifies it, and saves it during upgrades.
    The \ is optional in the file really.

    The format for env var should be as seen in UI:  hostname:UUID
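A minimal sketch for turning the value as stored in the cfg file into the form used for the environment variable. It simply drops the backslash in front of the colon; the function name to_env_dcm_id and the sample ID are hypothetical.

```shell
#!/bin/sh
# Convert the file form (hostname\:UUID) to the UI/env-var form (hostname:UUID).
# Values already in UI form pass through unchanged.
to_env_dcm_id() {
    printf '%s\n' "$1" | sed 's/\\:/:/g'
}

to_env_dcm_id 'olddc.example.com\:123e4567-e89b-12d3-a456-426614174000'
# prints: olddc.example.com:123e4567-e89b-12d3-a456-426614174000
```

The result can then be passed to the installer as DCM_ID.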