DX NetOps

  • 1.  SpectroSERVER process crashed in 4 SpectroSERVERs at the same time

    Posted May 23, 2020 07:05 PM
    Hello everyone, it is a pleasure to be here. I am new to this community.

    The scenario is as follows:

    Environment 1 Linux RedHat 7.5
    Spectrum 10.4.0
    1 VM OneClick Server 54 GB Memory 63 SWAP 300 GB Disk partition and CPU 20 Cores Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    1 VM SpectroSERVER MLS 23 GB Memory 9 SWAP 39 GB Disk partition and CPU 16 Cores Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    1 VM SpectroSERVER secondary 62 GB Memory 127 SWAP 600 GB Disk partition and CPU 16 Cores Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    1 VM SpectroSERVER secondary 62 GB Memory 127 SWAP 600 GB Disk partition and CPU 16 Cores Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz

    Environment 2 Linux RedHat 7.2
    Spectrum 10.3.0
    1 VM OneClick Server 31 GB Memory 9 SWAP 440 GB Disk partition and CPU 16 Cores Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    1 VM SpectroSERVER MLS 31 GB Memory 9 SWAP 49 GB Disk partition and CPU 4 Cores Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    1 VM SpectroSERVER secondary 78 GB Memory 9 SWAP 540 GB Disk partition and CPU 16 Cores Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz

    On two occasions, the following errors appeared at the same time in the Linux messages log files, and the SpectroSERVER process either became corrupted or stopped with the SS error message "Terminated".

    SS HOSTNAME-SS-MLS Environment 1 Crash
    ./audit/audit.log:type=ANOM_ABEND msg=audit(1589468891.480:208339): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=29884 comm="SpectroSERVER" reason="memory violation" sig=11
    ./audit/audit.log:type=ANOM_ABEND msg=audit(1589473208.345:208499): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=24787 comm="SpectroSERVER" reason="memory violation" sig=11
    ./audit/audit.log:type=ANOM_ABEND msg=audit(1589561673.021:2249): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=2749 comm="SpectroSERVER" reason="memory violation" sig=11
    ./messages-20200517:May 14 10:08:11 HOSTNAME-SS-MLS kernel: SpectroSERVER[29884]: segfault at 7fa227b0d000 ip 00007fa26f441f00 sp 00007fa22cfb96a0 error 4 in libGlobl.so.1[7fa26f3f3000+d4000]
    ./messages-20200517:May 14 10:08:11 HOSTNAME-SS-MLS abrt-hook-ccpp: Process 29884 (SpectroSERVER) of user 1000 killed by SIGSEGV - dumping core
    ./messages-20200517:May 14 10:08:16 HOSTNAME-SS-MLS abrt-server: Executable '/home/SPECTRUM/SS/SpectroSERVER' doesn't belong to any package and ProcessUnpackaged is set to 'no'
    ./messages-20200517:May 14 11:20:08 HOSTNAME-SS-MLS kernel: SpectroSERVER[24787]: segfault at 7feaee361000 ip 00007feb2aa1ef00 sp 00007feae82c96a0 error 4 in libGlobl.so.1[7feb2a9d0000+d4000]
    ./messages-20200517:May 14 11:20:08 HOSTNAME-SS-MLS abrt-hook-ccpp: Process 24787 (SpectroSERVER) of user 1000 killed by SIGSEGV - dumping core
    ./messages-20200517:May 14 11:20:11 HOSTNAME-SS-MLS abrt-server: Executable '/home/SPECTRUM/SS/SpectroSERVER' doesn't belong to any package and ProcessUnpackaged is set to 'no'
    ./messages-20200517:May 15 11:54:33 HOSTNAME-SS-MLS kernel: SpectroSERVER[2749]: segfault at 7f17376a4000 ip 00007f177bed3f00 sp 00007f1737f856a0 error 4 in libGlobl.so.1[7f177be85000+d4000]
    ./messages-20200517:May 15 11:54:33 HOSTNAME-SS-MLS abrt-hook-ccpp: Process 2749 (SpectroSERVER) of user 1000 killed by SIGSEGV - dumping core
    ./messages-20200517:May 15 11:54:36 HOSTNAME-SS-MLS abrt-server: Executable '/home/SPECTRUM/SS/SpectroSERVER' doesn't belong to any package and ProcessUnpackaged is set to 'no'

    SS HOSTNAME-SS-SECONDARY-1 Environment 1 Crash
    ./audit/audit.log:type=ANOM_ABEND msg=audit(1589468877.979:114432): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=19059 comm="SpectroSERVER" reason="memory violation" sig=11
    ./messages-20200517:May 14 10:07:57 HOSTNAME-SS-SECONDARY-1 kernel: SpectroSERVER[19059]: segfault at 7ff50f7cf000 ip 00007ff8b8c6af00 sp 00007ff3c73856a0 error 4 in libGlobl.so.1[7ff8b8c1c000+d4000]
    ./messages-20200517:May 14 10:07:58 HOSTNAME-SS-SECONDARY-1 abrt-hook-ccpp: Process 19059 (SpectroSERVER) of user 1000 killed by SIGSEGV - dumping core
    ./messages-20200517:May 14 10:10:32 HOSTNAME-SS-SECONDARY-1 abrt-server: Executable '/home/SPECTRUM/SS/SpectroSERVER' doesn't belong to any package and ProcessUnpackaged is set to 'no'

    SS HOSTNAME-SS-SECONDARY-2 Environment 1 Crash
    ./audit/audit.log.3:type=ANOM_ABEND msg=audit(1581442972.218:127312): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=114545 comm="SpectroSERVER" reason="memory violation" sig=11
    ./audit/audit.log:type=ANOM_ABEND msg=audit(1589473208.254:191): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=3356 comm="SpectroSERVER" reason="memory violation" sig=11
    ./audit/audit.log:type=ANOM_ABEND msg=audit(1589561636.412:1928): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=6643 comm="SpectroSERVER" reason="memory violation" sig=11
    ./messages-20200517:May 14 11:20:08 HOSTNAME-SS-SECONDARY-2 kernel: SpectroSERVER[3356]: segfault at 7f471805e000 ip 00007f4784cdcf00 sp 00007f4723d696a0 error 4 in libGlobl.so.1[7f4784c8e000+d4000]
    ./messages-20200517:May 14 11:20:08 HOSTNAME-SS-SECONDARY-2 abrt-hook-ccpp: Process 3356 (SpectroSERVER) of user 1000 killed by SIGSEGV - dumping core
    ./messages-20200517:May 14 11:20:17 HOSTNAME-SS-SECONDARY-2 abrt-server: Executable '/home/SPECTRUM/SS/SpectroSERVER' doesn't belong to any package and ProcessUnpackaged is set to 'no'
    ./messages-20200517:May 15 11:53:56 HOSTNAME-SS-SECONDARY-2 kernel: SpectroSERVER[6643]: segfault at 7f936bf55000 ip 00007f93e5c1ef00 sp 00007f9384dd56a0 error 4 in libGlobl.so.1[7f93e5bd0000+d4000]
    ./messages-20200517:May 15 11:53:56 HOSTNAME-SS-SECONDARY-2 abrt-hook-ccpp: Process 6643 (SpectroSERVER) of user 1000 killed by SIGSEGV - dumping core
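The audit records above carry epoch timestamps rather than wall-clock times, which makes them hard to line up with the messages entries by eye. Below is a minimal Python sketch, assuming the grep output format shown in this post, that converts each `msg=audit(...)` stamp to UTC. The function and variable names are hypothetical, and the millisecond part of the stamp is ignored.

```python
import re
from datetime import datetime, timezone

# Audit lines copied from the HOSTNAME-SS-SECONDARY-2 excerpt above.
RECORDS = [
    './audit/audit.log.3:type=ANOM_ABEND msg=audit(1581442972.218:127312): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=114545 comm="SpectroSERVER" reason="memory violation" sig=11',
    './audit/audit.log:type=ANOM_ABEND msg=audit(1589473208.254:191): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=3356 comm="SpectroSERVER" reason="memory violation" sig=11',
    './audit/audit.log:type=ANOM_ABEND msg=audit(1589561636.412:1928): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=6643 comm="SpectroSERVER" reason="memory violation" sig=11',
]

# msg=audit(<epoch seconds>.<milliseconds>:<serial>): ... pid=<pid>
AUDIT = re.compile(r"msg=audit\((?P<epoch>\d+)\.\d+:\d+\):.*?\bpid=(?P<pid>\d+)")


def audit_time_utc(line):
    """Return (pid, UTC datetime) for one audit line, or None if it does not match."""
    m = AUDIT.search(line)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc)
    return int(m["pid"]), when


if __name__ == "__main__":
    for record in RECORDS:
        pid, when = audit_time_utc(record)
        print(pid, when.isoformat())
```

With these records, the entry from audit.log.3 converts to 11 February 2020, so it appears to be an older crash and not part of the May events. The other two convert to 16:20:08 and 16:53:56 UTC, which is consistent with the 11:20:08 and 11:53:56 messages entries if the servers log in a UTC-5 local time zone (an inference from the timestamps, not something stated in the post).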

    SS HOSTNAME-SS-MLS2 Environment 2 Crash
    ./messages-20200517:May 15 11:54:09 HOSTNAME-SS-MLS2 kernel: SpectroSERVER[10939]: segfault at 7f6e3cf74000 ip 00007f6e7fcc32b0 sp 00007f6e3ebe16a0 error 4 in libGlobl.so.1[7f6e7fc71000+d4000]
    ./messages-20200517:May 15 11:54:09 HOSTNAME-SS-MLS2 kernel: type=1701 audit(1589561649.470:55151873): auid=4294967295 uid=1001 gid=1001 ses=4294967295 pid=10939 comm="SpectroSERVER" reason="memory violation" sig=11
    ./messages-20200517:May 15 11:54:12 HOSTNAME-SS-MLS2 abrt-server: Executable '/home/SPECTRUM/SS/SpectroSERVER' doesn't belong to any package and ProcessUnpackaged is set to 'no'
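One way to compare the kernel segfault lines above is to subtract the library load address (the value in brackets after libGlobl.so.1) from the instruction pointer, which gives the offset of the faulting instruction inside the library. The following Python sketch does that for one line per host. It assumes the kernel message format shown in this post; the function names are hypothetical and the grep file prefix has been dropped from the sample lines.

```python
import re
from collections import defaultdict

# One kernel segfault line per host, copied from the messages excerpts above.
LINES = [
    "May 14 10:08:11 HOSTNAME-SS-MLS kernel: SpectroSERVER[29884]: segfault at 7fa227b0d000 ip 00007fa26f441f00 sp 00007fa22cfb96a0 error 4 in libGlobl.so.1[7fa26f3f3000+d4000]",
    "May 14 10:07:57 HOSTNAME-SS-SECONDARY-1 kernel: SpectroSERVER[19059]: segfault at 7ff50f7cf000 ip 00007ff8b8c6af00 sp 00007ff3c73856a0 error 4 in libGlobl.so.1[7ff8b8c1c000+d4000]",
    "May 14 11:20:08 HOSTNAME-SS-SECONDARY-2 kernel: SpectroSERVER[3356]: segfault at 7f471805e000 ip 00007f4784cdcf00 sp 00007f4723d696a0 error 4 in libGlobl.so.1[7f4784c8e000+d4000]",
    "May 15 11:54:09 HOSTNAME-SS-MLS2 kernel: SpectroSERVER[10939]: segfault at 7f6e3cf74000 ip 00007f6e7fcc32b0 sp 00007f6e3ebe16a0 error 4 in libGlobl.so.1[7f6e7fc71000+d4000]",
]

SEGFAULT = re.compile(
    r"(?P<host>\S+) kernel: \w+\[(?P<pid>\d+)\]: segfault at [0-9a-f]+ "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


def library_offset(line):
    """Return (host, library, offset of ip inside the library) or None."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    offset = int(m["ip"], 16) - int(m["base"], 16)
    return m["host"], m["lib"], offset


def group_by_offset(lines):
    """Group host names by (library, hex offset) of the faulting instruction."""
    groups = defaultdict(list)
    for line in lines:
        parsed = library_offset(line)
        if parsed is not None:
            host, lib, offset = parsed
            groups[(lib, hex(offset))].append(host)
    return dict(groups)


if __name__ == "__main__":
    for (lib, offset), hosts in group_by_offset(LINES).items():
        print(lib, offset, hosts)
```

For these lines, the three Environment 1 hosts all fault at offset 0x4ef00 in libGlobl.so.1, while the Environment 2 host faults at 0x522b0. That may simply reflect the different Spectrum versions (10.4.0 and 10.3.0) and would be compatible with the same code path failing everywhere, but only the gdb stack trace can confirm it. On x86, page-fault error code 4 denotes a user-mode read of an unmapped page.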

        The SpectroSERVERs carry different loads; the MLS servers in particular have no load.
        The virtual machines reside on different ESX hosts.
        Performance is not critically affected.
        The VNM.OUT log does not show much information about the affected process.
        Only one secondary SS shows the following lines in its VNM.OUT file.

    may 15 12:54:34 ERROR TRACE at CsIHCrMdlEv.cc(354): Model Name is not set after re-evaluation for mh:0x2157f88
    may 15 12:59:54 WARNING at CsIHOverCapacity.cc(330): SpectroSERVER is over capacity threshold of 95%, generating 5 performance dumps to determine source of overload:
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200515_1259.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200515_1300.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200515_1301.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200515_1303.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200515_1304.dmp'
    may 15 14:07:55 ERROR TRACE at CsIHPrtIPLS.cc(1090): Waited 60000ms for IPLS evaluate lock for mh: 0x209294e, continuing without lock
    may 15 14:08:56 ERROR TRACE at CsIHPrtIPLS.cc(1090): Waited 60000ms for IPLS evaluate lock for mh: 0x209294e, continuing without lock
    may 15 14:09:56 ERROR TRACE at CsIHPrtIPLS.cc(1090): Waited 60000ms for IPLS evaluate lock for mh: 0x209294e, continuing without lock
    may 15 14:10:56 ERROR TRACE at CsIHPrtIPLS.cc(1090): Waited 60000ms for IPLS evaluate lock for mh: 0x209294e, continuing without lock
    may 15 14:10:58 ERROR TRACE at CsIHPrtIPLS.cc(1090): Waited 60000ms for IPLS evaluate lock for mh: 0x209294e, continuing without lock
    may 18 17:30:04 WARNING at CsIHOverCapacity.cc(330): SpectroSERVER is over capacity threshold of 95%, generating 5 performance dumps to determine source of overload:
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200518_1730.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200518_1731.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200518_1732.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200518_1733.dmp'
    Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200518_1734.dmp'
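To summarise a longer VNM.OUT, the entries can be tallied by severity and source file. This is a small Python sketch that assumes the line layout seen in the excerpt above (month, day, time, then "ERROR TRACE" or "WARNING", then "at file(line):"); other severities may exist and are not handled here. The names are hypothetical.

```python
import re
from collections import Counter

# A few lines copied from the VNM.OUT excerpt above.
SAMPLE = [
    "may 15 12:54:34 ERROR TRACE at CsIHCrMdlEv.cc(354): Model Name is not set after re-evaluation for mh:0x2157f88",
    "may 15 12:59:54 WARNING at CsIHOverCapacity.cc(330): SpectroSERVER is over capacity threshold of 95%, generating 5 performance dumps to determine source of overload:",
    "Saved compact diagnostic file to '/home/SPECTRUM/SS/support/SpectroSERVER_20200515_1259.dmp'",
    "may 15 14:07:55 ERROR TRACE at CsIHPrtIPLS.cc(1090): Waited 60000ms for IPLS evaluate lock for mh: 0x209294e, continuing without lock",
    "may 15 14:08:56 ERROR TRACE at CsIHPrtIPLS.cc(1090): Waited 60000ms for IPLS evaluate lock for mh: 0x209294e, continuing without lock",
]

VNM_LINE = re.compile(
    r"^\s*\w{3} \d{1,2} \d{2}:\d{2}:\d{2} "
    r"(?P<level>ERROR TRACE|WARNING) at (?P<src>[\w.]+)\(\d+\):"
)


def tally(lines):
    """Count log entries per (severity, source file); unmatched lines are skipped."""
    counts = Counter()
    for line in lines:
        m = VNM_LINE.match(line)
        if m is not None:
            counts[(m["level"], m["src"])] += 1
    return counts


if __name__ == "__main__":
    for (level, src), n in tally(SAMPLE).most_common():
        print(n, level, src)
```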


    I have one question:

    1. Is it possible for one SpectroSERVER to kill the processes of other SpectroSERVERs at the same time in a distributed environment?

    Do you have any idea what could cause this behavior?
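To make "at the same time" concrete, the segfault times from the messages excerpts can be grouped into clusters whenever consecutive crashes fall within a short window. The Python sketch below is illustrative only: the 60-second window is an arbitrary choice, the year 2020 is taken from the log file name, and it assumes the host clocks are synchronised. It shows correlation in time and says nothing about the cause.

```python
from datetime import datetime, timedelta

# (host, local time) pairs taken from the kernel segfault lines in this post.
CRASHES = [
    ("HOSTNAME-SS-MLS", "2020-05-14 10:08:11"),
    ("HOSTNAME-SS-MLS", "2020-05-14 11:20:08"),
    ("HOSTNAME-SS-MLS", "2020-05-15 11:54:33"),
    ("HOSTNAME-SS-SECONDARY-1", "2020-05-14 10:07:57"),
    ("HOSTNAME-SS-SECONDARY-2", "2020-05-14 11:20:08"),
    ("HOSTNAME-SS-SECONDARY-2", "2020-05-15 11:53:56"),
    ("HOSTNAME-SS-MLS2", "2020-05-15 11:54:09"),
]


def cluster(crashes, window=timedelta(seconds=60)):
    """Group crashes whose gap to the previous crash is at most `window`."""
    events = sorted(
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), host)
        for host, stamp in crashes
    )
    clusters = []
    for when, host in events:
        if clusters and when - clusters[-1][-1][0] <= window:
            clusters[-1].append((when, host))  # close enough to the last event
        else:
            clusters.append([(when, host)])  # start a new cluster
    return clusters


if __name__ == "__main__":
    for group in cluster(CRASHES):
        start = group[0][0].isoformat(sep=" ")
        print(start, [host for _, host in group])
```

With the times above, this yields three clusters, each involving two or three hosts, and the third one spans both environments.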

    Regards, I would appreciate your help. 



  • 2.  RE: SpectroSERVER process crashed in 4 SpectroSERVERs at the same time
    Best Answer

    Broadcom Employee
    Posted May 24, 2020 03:07 PM
    Edited by Christopher Hackett Jun 09, 2020 06:15 PM
    Hi Manuel,

    Please follow KB article 191615 to generate the gdb stack trace, then open a ticket with Broadcom Support so that this SpectroSERVER crash can be investigated.
    https://knowledge.broadcom.com/external/article?articleId=191615

    Also provide the output of the "getspectruminfo.sh lite" script:
    https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/spectrum/10-4/administrating/oneclick-administration/troubleshooting-oneclick/using-the-getspectruminfo-script.html

    ------------------------------
    Technical Support Engineer IV
    Broadcom Inc
    ------------------------------



  • 3.  RE: SpectroSERVER process crashed in 4 SpectroSERVERs at the same time

    Posted May 25, 2020 04:28 PM
    Hi Silvio, it is a pleasure to greet you.

    I am working on raising the case with Broadcom.

    Thank you very much for the information.

    Regards


    ------------------------------
    Consultant
    DST
    ------------------------------



  • 4.  RE: SpectroSERVER process crashed in 4 SpectroSERVERs at the same time

    Posted May 25, 2020 03:44 PM
    We started seeing simultaneous SpectroSERVER crashes shortly after we upgraded PM to version 3.7.11. This environment is running on version 10.3.2 though.

    ------------------------------
    Solution Architect
    DICOS GmbH Kommunikationssysteme
    ------------------------------



  • 5.  RE: SpectroSERVER process crashed in 4 SpectroSERVERs at the same time

    Posted May 25, 2020 04:37 PM
    Hi Maik, thanks

    The current environment is integrated with CA PM 3.6.0.349, but the synchronization runs daily at 1 a.m.

    Is there a log file to see integration errors?

    Regards


    ------------------------------
    Consultant
    DST
    ------------------------------