DX NetOps

Heartbeat issues between OneClick and SpectroSERVERs and Main Location Servers

    Posted Oct 17, 2019 09:31 AM
    Issue:
    The time since the last heartbeat between the SpectroSERVER (SS) and the OneClick (OC) server shows as greater than 45 seconds, even though the SS sends a heartbeat to the OC server every 45 seconds.

    Impact:
    Alarms and events from the remote SS are not synced with the OC server for as long as the heartbeat is not received; in this case 365,161 seconds, or about 4.2 days.
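
    The duration quoted above can be checked with a little arithmetic. This is a minimal Python sketch that uses only the 45-second interval and the 365,161-second gap from this post; the function name `describe_gap` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400
HEARTBEAT_INTERVAL_S = 45  # heartbeat interval stated in this post


def describe_gap(gap_seconds, interval=HEARTBEAT_INTERVAL_S):
    """Return (gap in days, number of whole heartbeat intervals in the gap)."""
    return gap_seconds / SECONDS_PER_DAY, gap_seconds // interval


days, intervals = describe_gap(365_161)
print(f"{days:.1f} days, about {intervals} heartbeat intervals")
# prints "4.2 days, about 8114 heartbeat intervals"
```

    The interval count is only a rough estimate of how many heartbeats went unacknowledged, derived from the 45-second interval.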

    Symptoms:
    Heartbeat is not re-established from SS to OC server.
    Telnet connections from the SS to the OC server on TCP port 14001 are successful.
    Packet captures show the heartbeat being sent from the SS to the OC server on TCP port 14001 every 45 seconds.
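
    For anyone who wants to script the Telnet-style check instead of running it by hand, a rough Python equivalent might look like the sketch below. The port 14001 comes from this post; the function name and the host name in the comment are placeholders. Note that, as the symptoms above show, a successful TCP connection does not mean the heartbeat is healthy, so this only rules out basic connectivity problems.

```python
import socket


def can_connect(host, port=14001, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False


# Example (placeholder host name, run from the SS):
#   can_connect("oc-server.example.com")
```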

    Cause:
    Unknown

    Means to Monitor:
    There is no automated means to monitor for this issue.
    REST queries against the heartbeat endpoint only ever return "Success" and never fail.
        http://<OCServer>:<port>/spectrum/restful/heartbeat
    The manual way to monitor is to log in to the OC server, go to Administration, then select "Landscapes" on the left.
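
    For reference, this is roughly how the REST heartbeat URL above could be queried from Python. It is a sketch under assumptions: the URL path is the one shown in this post, but the use of HTTP Basic authentication, the function names, and the host, port, and credentials in the comment are all placeholders. Since the endpoint reportedly returns "Success" even while the problem is occurring, this cannot be used to detect the issue; it only shows what the query looks like.

```python
import base64
import urllib.request


def heartbeat_url(oc_server, port):
    """Build the REST heartbeat URL given in this post."""
    return f"http://{oc_server}:{port}/spectrum/restful/heartbeat"


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value (auth scheme is an assumption)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def query_heartbeat(url, username, password, timeout=10.0):
    """Send a GET request and return (HTTP status code, response body text)."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


# Example (placeholder host, port, and credentials):
#   status, body = query_heartbeat(
#       heartbeat_url("oc-server.example.com", 8080), "user", "secret")
#   print(status, body)
```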

    Temporary Workaround:
    Restart Tomcat on the OC server. 
    Users will experience an interruption to the OC Console. 
    Advise users to log into another Distributed OC server in the meantime.

    From Broadcom/CA Technical support:
    "We have one of our architects looking into this issue, as the same issue was also reported recently by another customer. The architect was able to reproduce the issue internally. They are currently developing a fix that will essentially re-register a connection between the two systems after a drop for a certain time interval.
    As far as automating monitoring of these heartbeat events, this would be considered an enhancement to the current functionality. This new code would only be included in a future release."

    fyi