Yep, the controller is restarting during the deployment of the data_engine. Controller log extract:
Nov 5 12:42:39:167 [139936862943040] 0 Controller: Selecting robotip from configuration. config_robotip = Primary_NMS_IP, cglob robotip = Primary_NMS_IP, local_ip_validation = 1, validate_ip_suggestion = 0, strict_ip_binding = 0
Nov 5 12:42:39:168 [139936862943040] 0 Controller: --------------------------------------------------------------------------------------------------------
Nov 5 12:42:39:168 [139936862943040] 0 Controller: ----- Robot controller 9.31 [Build 9.31.1501, Sep 17 2020] started -----
Nov 5 12:42:39:168 [139936862943040] 0 Controller: Name = Primary_NMS, IP = Primary_NMS_IP, Port = 48000
Nov 5 12:42:39:168 [139936862943040] 0 Controller: OS = UNIX / Linux / Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64
Nov 5 12:42:39:168 [139936862943040] 0 Controller: Domain = MDS
Nov 5 12:42:39:168 [139936862943040] 0 Controller: Primary HUB = /MDS/Primary_NMS/Primary_NMS Primary_NMS_IP
Nov 5 12:42:39:168 [139936862943040] 0 Controller: Loglevel = 0, Logfile = controller.log
Nov 5 12:42:39:198 [139936862943040] 0 Controller: Running as user root (0)
Nov 5 12:42:39:198 [139936862943040] 0 Controller: -----
Nov 5 12:42:39:199 [139936862943040] 0 Controller: Controller on Primary_NMS port 48000 started
Nov 5 12:42:40:139 [139936862943040] 0 Controller: _ProcStart - Probe 'hub' - starting
Nov 5 12:42:42:101 [139936862943040] 0 Controller: Hub localhost(Primary_NMS_IP) contact established
Nov 5 12:42:42:126 [139936862943040] 0 Controller: _ProcStart - Probe 'distsrv' - starting
Nov 5 12:42:43:462 [139936862943040] 0 Controller: _ProcStart - Probe 'hdb' - starting
Nov 5 12:42:44:586 [139936862943040] 0 Controller: _ProcStart - Probe 'mpse' - starting
Nov 5 12:42:45:629 [139936862943040] 0 Controller: _ProcStart - Probe 'alarm_enrichment' - starting
Nov 5 12:42:46:058 [139936862943040] 0 Controller: _ProcStart - Probe 'baseline_engine' - starting
Nov 5 12:42:47:014 [139936862943040] 0 Controller: _ProcStart - Probe 'prediction_engine' - starting
Nov 5 12:42:48:008 [139936862943040] 0 Controller: _ProcStart - Probe 'discovery_agent' - starting
Nov 5 12:42:49:331 [139936862943040] 0 Controller: _ProcStart - Probe 'cm_data_import' - starting
Nov 5 12:42:50:210 [139936862943040] 0 Controller: _ProcStart - Probe 'ppm' - starting
Nov 5 12:42:51:112 [139936862943040] 0 Controller: _ProcStart - Probe 'ems' - starting
Nov 5 12:42:51:533 [139936862943040] 0 Controller: login - unauthorized probe (Primary_NMS_IP/35024)
Nov 5 12:42:52:030 [139936862943040] 0 Controller: _ProcStart - Probe 'automated_deployment_engine' - starting
Nov 5 12:42:53:310 [139936862943040] 0 Controller: _ProcStart - Probe 'nas' - starting
Nov 5 12:42:54:114 [139936862943040] 0 Controller: _ProcStart - Probe 'data_engine' - starting
Nov 5 12:42:55:451 [139936862943040] 0 Controller: _ProcStart - Probe 'ems' - starting
Nov 5 12:43:04:185 [139936862943040] 0 Controller: _ProcStart - Probe 'udm_manager' - starting
Nov 5 12:43:05:008 [139936862943040] 0 Controller: _ProcStart - Probe 'maintenance_mode' - starting
Nov 5 12:43:06:172 [139936862943040] 0 Controller: _ProcStart - Probe 'sla_engine' - starting
Nov 5 12:43:07:002 [139936862943040] 0 Controller: _ProcStart - Probe 'qos_processor' - starting
Nov 5 12:43:08:051 [139936862943040] 0 Controller: _ProcStart - Probe 'nis_server' - starting
Nov 5 12:43:09:309 [139936862943040] 0 Controller: _ProcStart - Probe 'discovery_server' - starting
Nov 5 12:43:10:534 [139936862943040] 0 Controller: _ProcStart - Probe 'mon_config_service' - starting
Nov 5 12:43:11:156 [139936862943040] 0 Controller: _ProcStart - Probe 'ace' - starting
Nov 5 12:43:15:039 [139936862943040] 0 Controller: _ProcStart - Probe 'trellis' - starting
Nov 5 12:44:03:253 [140582783477568] 0 Controller: Selecting robotip from configuration. config_robotip = Primary_NMS_IP, cglob robotip = Primary_NMS_IP, local_ip_validation = 1, validate_ip_suggestion = 0, strict_ip_binding = 0
Nov 5 12:44:03:254 [140582783477568] 0 Controller: --------------------------------------------------------------------------------------------------------
Nov 5 12:44:03:254 [140582783477568] 0 Controller: ----- Robot controller 9.31 [Build 9.31.1501, Sep 17 2020] started -----
Nov 5 12:44:03:254 [140582783477568] 0 Controller: Name = Primary_NMS, IP = Primary_NMS_IP, Port = 48000
Nov 5 12:44:03:254 [140582783477568] 0 Controller: OS = UNIX / Linux / Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64
Nov 5 12:44:03:254 [140582783477568] 0 Controller: Domain = MDS
Nov 5 12:44:03:254 [140582783477568] 0 Controller: Primary HUB = /MDS/Primary_NMS/Primary_NMS Primary_NMS_IP
Nov 5 12:44:03:254 [140582783477568] 0 Controller: Loglevel = 0, Logfile = controller.log
Nov 5 12:44:03:285 [140582783477568] 0 Controller: Running as user root (0)
Nov 5 12:44:03:285 [140582783477568] 0 Controller: -----
Nov 5 12:44:03:285 [140582783477568] 0 Controller: Stopping processes from previous run
Nov 5 12:44:03:285 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to hub (27545)...
Nov 5 12:44:09:286 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:09:286 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to distsrv (27638)...
Nov 5 12:44:10:286 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:10:286 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to hdb (27643)...
Nov 5 12:44:12:287 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:12:287 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to mpse (27644)...
Nov 5 12:44:13:287 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:13:287 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to alarm_enrichment (27657)...
Nov 5 12:44:14:287 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:14:287 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to baseline_engine (27671)...
Nov 5 12:44:15:287 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:15:287 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to prediction_engine (27702)...
Nov 5 12:44:16:287 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:16:287 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to discovery_agent (27723)...
Nov 5 12:44:17:288 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:17:288 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to cm_data_import (27748)...
Nov 5 12:44:18:288 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:18:288 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to ppm (27781)...
Nov 5 12:44:19:288 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:19:288 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to automated_deployment_engine (27861)...
Nov 5 12:44:20:288 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:20:288 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to nas (27887)...
Nov 5 12:44:27:289 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:27:289 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to ems (27928)...
Nov 5 12:44:28:289 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:28:289 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to udm_manager (27971)...
Nov 5 12:44:29:290 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:29:290 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to maintenance_mode (27995)...
Nov 5 12:44:30:290 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:30:290 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to sla_engine (28021)...
Nov 5 12:44:31:290 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:31:290 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to qos_processor (28043)...
Nov 5 12:44:41:291 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:41:291 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to nis_server (28082)...
Nov 5 12:44:42:291 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:42:292 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to discovery_server (28104)...
Nov 5 12:44:43:292 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:43:292 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to mon_config_service (28127)...
Nov 5 12:44:44:292 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:44:292 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to ace (28143)...
Nov 5 12:44:45:292 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:45:292 [140582783477568] 0 Controller: ProcessControl: Sending SIGTERM signal to trellis (28198)...
Nov 5 12:44:46:292 [140582783477568] 0 Controller: ProcessControl: Child exited
Nov 5 12:44:46:293 [140582783477568] 0 Controller: Controller on Primary_NMS port 48000 started
Nov 5 12:44:46:991 [140582783477568] 0 Controller: _ProcStart - Probe 'hub' - starting
Nov 5 12:44:49:078 [140582783477568] 0 Controller: Hub localhost(Primary_NMS_IP) contact established
Nov 5 12:44:49:138 [140582783477568] 0 Controller: _ProcStart - Probe 'distsrv' - starting
Nov 5 12:44:50:417 [140582783477568] 0 Controller: _ProcStart - Probe 'hdb' - starting
Nov 5 12:44:51:054 [140582783477568] 0 Controller: _ProcStart - Probe 'mpse' - starting
Nov 5 12:44:52:001 [140582783477568] 0 Controller: _ProcStart - Probe 'alarm_enrichment' - starting
Nov 5 12:44:59:257 [140243720660800] 0 Controller: Selecting robotip from configuration. config_robotip = Primary_NMS_IP, cglob robotip = Primary_NMS_IP, local_ip_validation = 1, validate_ip_suggestion = 0, strict_ip_binding = 0
Nov 5 12:44:59:258 [140243720660800] 0 Controller: --------------------------------------------------------------------------------------------------------
Nov 5 12:44:59:258 [140243720660800] 0 Controller: ----- Robot controller 9.31 [Build 9.31.1501, Sep 17 2020] started -----
Nov 5 12:44:59:258 [140243720660800] 0 Controller: Name = Primary_NMS, IP = Primary_NMS_IP, Port = 48000
Nov 5 12:44:59:258 [140243720660800] 0 Controller: OS = UNIX / Linux / Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64
Nov 5 12:44:59:258 [140243720660800] 0 Controller: Domain = MDS
Nov 5 12:44:59:258 [140243720660800] 0 Controller: Primary HUB = /MDS/Primary_NMS/Primary_NMS Primary_NMS_IP
Nov 5 12:44:59:258 [140243720660800] 0 Controller: Loglevel = 0, Logfile = controller.log
Nov 5 12:44:59:289 [140243720660800] 0 Controller: Running as user root (0)
Nov 5 12:44:59:289 [140243720660800] 0 Controller: -----
Nov 5 12:44:59:289 [140243720660800] 0 Controller: Stopping processes from previous run
Nov 5 12:44:59:289 [140243720660800] 0 Controller: ProcessControl: Sending SIGTERM signal to hub (28590)...
Nov 5 12:45:05:290 [140243720660800] 0 Controller: ProcessControl: Child exited
Nov 5 12:45:05:290 [140243720660800] 0 Controller: ProcessControl: Sending SIGTERM signal to distsrv (28683)...
Nov 5 12:45:06:290 [140243720660800] 0 Controller: ProcessControl: Child exited
Nov 5 12:45:06:290 [140243720660800] 0 Controller: ProcessControl: Sending SIGTERM signal to hdb (28691)...
Nov 5 12:45:08:290 [140243720660800] 0 Controller: ProcessControl: Child exited
Nov 5 12:45:08:291 [140243720660800] 0 Controller: ProcessControl: Sending SIGTERM signal to mpse (28693)...
Nov 5 12:45:08:291 [140243720660800] 0 Controller: ProcessControl: Unable to send stop signal to process mpse (28693)
Nov 5 12:45:09:291 [140243720660800] 0 Controller: ProcessControl: Child exited
Nov 5 12:45:09:291 [140243720660800] 0 Controller: ProcessControl: Sending SIGTERM signal to alarm_enrichment (28706)...
Nov 5 12:45:09:291 [140243720660800] 0 Controller: ProcessControl: Unable to send stop signal to process alarm_enrichment (28706)
Nov 5 12:45:10:291 [140243720660800] 0 Controller: ProcessControl: Child exited
Nov 5 12:45:10:291 [140243720660800] 0 Controller: Controller on Primary_NMS port 48000 started
Nov 5 12:45:11:000 [140243720660800] 0 Controller: _ProcStart - Probe 'hub' - starting
Nov 5 12:45:13:121 [140243720660800] 0 Controller: Hub localhost(Primary_NMS_IP) contact established
Nov 5 12:45:13:172 [140243720660800] 0 Controller: _ProcStart - Probe 'distsrv' - starting
Nov 5 12:45:14:454 [140243720660800] 0 Controller: _ProcStart - Probe 'hdb' - starting
Nov 5 12:45:15:192 [140243720660800] 0 Controller: _ProcStart - Probe 'mpse' - starting
Nov 5 12:45:16:643 [140243720660800] 0 Controller: _ProcStart - Probe 'alarm_enrichment' - starting
Nov 5 12:45:17:006 [140243720660800] 0 Controller: _ProcStart - Probe 'baseline_engine' - starting
Nov 5 12:45:18:147 [140243720660800] 0 Controller: _ProcStart - Probe 'prediction_engine' - starting
Nov 5 12:45:19:197 [140243720660800] 0 Controller: _ProcStart - Probe 'discovery_agent' - starting
Nov 5 12:45:20:541 [140243720660800] 0 Controller: _ProcStart - Probe 'cm_data_import' - starting
Nov 5 12:45:21:006 [140243720660800] 0 Controller: _ProcStart - Probe 'ppm' - starting
Nov 5 12:45:22:000 [140243720660800] 0 Controller: _ProcStart - Probe 'ems' - starting
Nov 5 12:45:22:063 [140243720660800] 0 Controller: login - unauthorized probe (Primary_NMS_IP/37938)
Nov 5 12:45:23:063 [140243720660800] 0 Controller: _ProcStart - Probe 'automated_deployment_engine' - starting
Nov 5 12:45:24:178 [140243720660800] 0 Controller: _ProcStart - Probe 'nas' - starting
Nov 5 12:45:25:560 [140243720660800] 0 Controller: _ProcStart - Probe 'ems' - starting
Nov 5 12:45:25:621 [140243720660800] 0 Controller: _ProcStart - Probe 'data_engine' - starting
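For anyone wanting to pull the same picture out of their own controller.log, here is a rough Python sketch, not an official tool. It assumes the log format shown in the extract above: each controller start is marked by the "Robot controller ... started" banner, and each probe launch by a "_ProcStart - Probe '...' - starting" line. The function name and the sample lines are made up for illustration.

```python
import re

# Patterns taken from the log extract above; other controller versions may log differently.
BANNER = re.compile(r"Robot controller .* started -----")
PROBE_START = re.compile(r"_ProcStart - Probe '([^']+)' - starting")

def summarize_restarts(lines):
    """Return one list per controller start, holding the probes launched in that run."""
    runs = []
    for line in lines:
        if BANNER.search(line):
            runs.append([])          # a new controller start begins a new run
            continue
        match = PROBE_START.search(line)
        if match and runs:
            runs[-1].append(match.group(1))
    return runs

# Hypothetical sample, shortened from the extract above.
sample = [
    "Nov 5 12:42:39:168 [139936862943040] 0 Controller: ----- Robot controller 9.31 [Build 9.31.1501, Sep 17 2020] started -----",
    "Nov 5 12:42:40:139 [139936862943040] 0 Controller: _ProcStart - Probe 'hub' - starting",
    "Nov 5 12:42:54:114 [139936862943040] 0 Controller: _ProcStart - Probe 'data_engine' - starting",
    "Nov 5 12:44:03:254 [140582783477568] 0 Controller: ----- Robot controller 9.31 [Build 9.31.1501, Sep 17 2020] started -----",
    "Nov 5 12:44:46:991 [140582783477568] 0 Controller: _ProcStart - Probe 'hub' - starting",
]

for number, probes in enumerate(summarize_restarts(sample), 1):
    last = probes[-1] if probes else "(none)"
    print(f"run {number}: {len(probes)} probes started, last = {last}")
```

To run it against a real log, pass the open file instead of the sample list, for example `summarize_restarts(open("controller.log", errors="replace"))`. More than one run in a short window means the controller restarted.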
------------------------------
CA - UIM administrator
------------------------------
Original Message:
Sent: 11-04-2020 03:36 PM
From: David MICHEL
Subject: Upgrade to 20.3 fails on configuring data_engine
It looks like the database update completed correctly and then the deployment of the data_engine failed:
strResultString=Failed, strInstState=Not Deployed
Session error, Unable to open a client session for :48000: Connection refused (Connection refused)
This is a strange case if it is what it seems: the connection from ade (automated_deployment_engine) running on the primary hub was refused when it tried to deploy to that same primary hub.
I'm not sure of the cause, so here are some general ideas:
If you have any custom packages in the Nimsoft archive folder, move them out.
Ensure the virus scanner is disabled.
I don't know enough to say whether the Linux firewall could be involved, but it is worth checking, along with SELinux.
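To tell a refused connection apart from a firewall drop, a quick TCP check against the controller port can help. This is a minimal sketch using only the Python standard library. The function name `check_port` is made up, and port 48000 is taken from the error message and the controller log. A "refused" result usually means nothing was listening at that moment (for example while the controller was restarting), whereas a timeout points more towards filtering.

```python
import socket

def check_port(host, port=48000, timeout=3.0):
    """Attempt a TCP connect; return 'open', 'refused', or 'unreachable: <reason>'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on the port.
        return "refused"
    except OSError as exc:
        # Timeouts, name resolution failures, no route to host, and so on.
        return f"unreachable: {exc}"
```

Run it on the primary hub itself with the robot's own address, for example `print(check_port("127.0.0.1"))`, and repeat it a few times during the upgrade to see whether the port drops out while data_engine is being deployed.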
------------------------------
Support Engineer
Broadcom
Original Message:
Sent: 11-03-2020 04:48 AM
From: Sam Green
Subject: Upgrade to 20.3 fails on configuring data_engine
Hi all,
Can anyone shed any light on why this install is failing when it attempts to configure the data_engine probe?
The log is attached. It suggests a connectivity issue, but I've checked connectivity between the servers and all seems fine.
------------------------------
CA - UIM administrator
------------------------------