DX Application Performance Management

Enterprise Manager restarted by administrator error in CEM , EM crashing

  • 1.  Enterprise Manager restarted by administrator error in CEM , EM crashing

    Posted Jan 22, 2022 02:10 AM
    Edited by Shiv Choudhary Jan 22, 2022 10:30 PM
    Our MoM server crashes intermittently: the service restarts on its own and comes back up.
    We don't know exactly why this is happening. I am attaching logs and a screenshot from the MoM server. We are using APM version 10.7.0.45. Case 32979151 has also been raised with Broadcom.




    ------------------------------
    Shiv Choudhary
    India
    ------------------------------

    Attachment: EMService.log (7.34 MB)


  • 2.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Jan 24, 2022 03:38 AM

    Hi Shiv,
    this could be linked to the other issue you see with the TIM and the defect storm.
    One thing that is easy to overlook is that the aggregator is spawned as its own instance, so it usually requires the same amount of memory as the EM/MoM itself.

    This in turn means that if your MoM is using 1 GB, the hourly aggregator will use one additional GB of RAM when it runs.
    If the daily aggregator is also running because it is catching up on statistics and defects (due to the TIM not providing data for a while), another 1 GB is added. In the worst case, at the end of a month, the weekly and monthly aggregators run at the same time as well, so you have 4 additional aggregators running, each using another 1 GB of RAM.
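
    To make the arithmetic above concrete, here is a rough sketch in Python. It assumes each aggregator needs about as much memory as the EM/MoM itself, as described above; the function name is made up for illustration and real usage will vary.

```python
# Rough estimate only: assumes every running aggregator needs about
# the same amount of memory as the EM/MoM process itself.
def peak_memory_gb(em_memory_gb: float, concurrent_aggregators: int) -> float:
    """Return EM/MoM memory plus one equal share per running aggregator."""
    return em_memory_gb * (1 + concurrent_aggregators)


scenarios = [
    ("hourly only", 1),
    ("hourly + daily", 2),
    ("hourly + daily + weekly + monthly", 4),
]
for label, count in scenarios:
    print(f"{label}: about {peak_memory_gb(1, count):.0f} GB in total")
```

    With a 1 GB MoM, the worst case of four aggregators comes to about 5 GB in total, so the machine has to be sized for that peak rather than for the MoM alone.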

    So the only thing you can do here, if this is a virtual machine, is add enough resources (RAM, and CPU to speed up the process). If it is not, then as stated in the other thread (https://community.broadcom.com/enterprisesoftware/communities/community-home/digestviewer/viewthread?MessageKey=32ffb92e-41ea-4af0-ae83-42b19dff0c8a&CommunityKey=be08e336-5d32-4176-96fe-a778ffe72115&tab=digestviewer&bm=32ffb92e-41ea-4af0-ae83-42b19dff0c8a#bm32ffb92e-41ea-4af0-ae83-42b19dff0c8a), make sure you stop the defect storm condition so the MoM/EMs can run normally.

    Note: I also see that you are on one of the first releases. Applying at least SP3 would help increase stability.




  • 3.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Posted Jan 28, 2022 04:05 AM
    Edited by Shiv Choudhary Jan 28, 2022 04:12 AM

    Hi Jorg,

    No other error is showing in CEM as of now. The only error we get is "Enterprise Manager restarted", and with every instance of that error in CEM the log shows java.lang.OutOfMemoryError: Java heap space.


    1) I am unsure whether the initial and maximum Java heap memory settings need to be changed.

    I see that we can change the values of wrapper.java.initmemory and wrapper.java.maxmemory in the EMService.conf file. The current values are set as below:


    # Initial Java Heap Size (in MB)
    wrapper.java.initmemory=4096

    # Maximum Java Heap Size (in MB)
    wrapper.java.maxmemory=10240

    2) I have also seen a file named SmartStorTools.bat, which says to edit JAVA_OPTS to increase the heap size if needed. The current value is set as below:


    set JAVA_HOME=%INSTALLDIR%\jre
    set JAVA_OPTS=-Xmx512m

    The value is currently set to 512m. Should we increase or decrease it to resolve the Java heap space issue?

    3) There is also another file, Introscope_Enterprise_Manager.lax, where I have seen the option to set additional Java options such as -Xms and -Xmx.

     lax.nl.java.option.additional=-Xms8192m -Xmx8192m -Djava.awt.headless=false -Dmail.mime.charset=UTF-8 -Dorg.owasp.esapi.resources=./config/esapi -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs/ -Xss512k

    Please let me know the exact settings needed in all of these files to resolve the Java heap space issue, given that we have 32 GB of RAM available on the server.




    ------------------------------
    Shiv Choudhary
    India
    ------------------------------



  • 4.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Jan 28, 2022 04:24 AM

    Hi Shiv,

    From the logs in case 32979151 I see you have not applied Hotfix SP3 yet. That would help, because the condition you are running into seems to be due to an issue that causes the OOM.

    If the problem still occurs after you have applied SP3, please see https://knowledge.broadcom.com/external/article?articleId=140703, in case the memory changes are not being applied at the right location.




  • 5.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Posted Feb 12, 2022 10:30 PM
    Edited by Shiv Choudhary Feb 12, 2022 10:38 PM
    Hi Jorg,

    I got the hotfix HF 60 from support. My question is whether we have to apply it on:
    1) the MoM server only,
    2) or also the Agent collector and TIM collector servers,
    3) or the TIM servers as well?

    We have one MoM server, one Agent collector server, one TIM collector server, and three TIM servers that report to the TIM collector server. All servers in our environment are physical servers.

    I also found configuration steps for the cluster in the readme.txt. Do these steps need to be done?

    If yes, should we do them before or after installing the hotfix? Please see the screenshot below. Also, please let me know whether they are to be done after applying the hotfix on all 3 servers, i.e., the MoM, the Agent collector, and the TIM collector.





    ------------------------------
    Shiv Choudhary
    India
    ------------------------------



  • 6.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Feb 14, 2022 10:23 AM
    EM hotfixes must be applied to all EMs in a cluster.


  • 7.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Feb 16, 2022 03:48 AM

    Hi Shiv,
    the Hotfix is to be applied on the EMs (Collectors) and the MoM only.

    From your explanation, you have one MoM, one EM (for Agents), and one EM for the TIM collection service. You have to apply the Hotfix to all 3 of these, and also on the machine running the PostgreSQL server (if separate).

    You also have 3 TIMs running. These do not need to have the Hotfix applied. If there are any hotfixes for the TIMs, they will come in a separate hotfix package.




  • 8.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Posted Feb 25, 2022 01:00 AM

    Hi All, 

    We have applied hotfix HF 60 to our MoM server in the non-prod environment, but not yet to the collectors.
    We are getting the error below in WebView:
    1. Error retrieving permissions. Status code: 500

    Enterprise Manager and WebView start up with no errors. But when logging in to Enterprise Manager Team Center, we see the following error in IntroscopeWebview.log, and no data shows up at all under the metric view, agent view, or map.
    Also, the collectors are not showing up in the new APM status console. I suspect this is because the HF60 patch was not applied on the collectors, or due to some other issue. In addition, our Agent collector is on 10.5 and our TIM collector is on 10.7, so how do we proceed?




    ------------------------------
    Shiv Choudhary
    India
    ------------------------------



  • 9.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Feb 25, 2022 03:41 AM

    Hi Shiv,
    it is becoming difficult to follow the thread, as we are dealing with several different situations in succession here.

    So, first of all: please update the MoM and all Collectors with HF60 prior to starting them.

    The Workstation needs to be of the same version as the MoM. This means you will have to get the correct HF 60 version of the Workstation to have a seamless integration with the updated MoM. Please ask support to provide you with the Workstation for your current installation.

    Regards

    Jörg




  • 10.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Feb 25, 2022 10:11 AM
    You cannot have a 10.5 collector connected to a 10.7 cluster. THIS IS UNSUPPORTED.

    ALL EMS MUST BE AT THE SAME LEVEL.


  • 11.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Jan 31, 2022 12:23 AM
    One issue you have right away is your memory configuration in the wrapper.

    The recommendation for the Enterprise Manager is to have the min and max heap size the same to improve performance. We want the EM to have all of the memory it's allocated at startup because expanding and compacting the heap is resource intensive.

    As Joerg mentions, after applying SP3, the GC policy will change to G1GC, which also requires some additional tuning to get the best performance.

    https://knowledge.broadcom.com/external/article/93176/introscope-enterprise-manager-troublesho.html
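
    As a quick way to check the min/max recommendation above, here is a minimal sketch in Python that reads the two wrapper properties quoted earlier in the thread and reports whether they match. The helper names are made up for illustration, and the sample text is simply the snippet posted above, not a full EMService.conf.

```python
import re

# Matches lines such as "wrapper.java.initmemory=4096" (values are in MB).
HEAP_PROPERTY = re.compile(
    r"^[ \t]*wrapper\.java\.(initmemory|maxmemory)[ \t]*=[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)


def read_wrapper_heap(conf_text: str) -> dict:
    """Return the initmemory/maxmemory values (in MB) found in the text."""
    return {name: int(value) for name, value in HEAP_PROPERTY.findall(conf_text)}


def heap_is_fixed(settings: dict) -> bool:
    """True only if both values are present and equal."""
    return (
        "initmemory" in settings
        and "maxmemory" in settings
        and settings["initmemory"] == settings["maxmemory"]
    )


sample = """\
# Initial Java Heap Size (in MB)
wrapper.java.initmemory=4096

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=10240
"""

settings = read_wrapper_heap(sample)
print(settings, "min == max:", heap_is_fixed(settings))
```

    For the values posted earlier (4096 and 10240) the check reports a mismatch, which is the configuration issue described above.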


  • 12.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Feb 01, 2022 11:43 AM
    Also,
    When installed on a Windows device and using the Tanuki JSW, the LAX file is ignored and the CONF file is used instead for the JVM.


  • 13.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Posted Apr 04, 2022 04:04 AM
    Edited by Shiv Choudhary Apr 04, 2022 04:06 AM
    Dear All, 

    We have applied HF 60 as suggested by support, but the issue is still not resolved. If anyone has any ideas, please comment, as this is a long-pending issue in our environment.
    Also, our data folder is filling up very rapidly and we have no idea why. How can we reduce its size or control the data generation so that it does not create problems later? Please see the attached screenshot for reference.



    ------------------------------
    Shiv Choudhary
    India
    ------------------------------



  • 14.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Broadcom Employee
    Posted Apr 04, 2022 04:43 AM
    Do you have an idea which element is filling up the disk?
    Is it the database or the traces?


  • 15.  RE: Enterprise Manager restarted by administrator error in CEM , EM crashing

    Posted May 03, 2022 12:30 AM
    Edited by Shiv Choudhary May 03, 2022 06:37 AM

    We troubleshot this with the Broadcom support team and found that the ts_defect_meta_values tables were taking up a huge amount of space in the PostgreSQL database. The support team finally suggested the steps below, and they worked.

    1) Drop tables
    DROP TABLE ts_defect_meta_values_20130405; -- Drop these first (sample date)
    Then drop the tables for the same exact date:
    DROP TABLE ts_defects_20130405;
    DROP TABLE ts_tran_comp_details_20130405;

    2) Vacuum Appmap tables
    vacuum full analyze appmap_edges;
    vacuum full analyze appmap_vertices;
    vacuum full analyze appmap_attribs;

    We performed the above steps and were able to free up 285 GB of space on the drive. We also ran a full vacuum and were able to clear another 40 GB.
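
    For anyone repeating the drop step for several dates, here is a minimal sketch in Python that only prints the statements in the order given above, so they can be reviewed before being run. The function name is made up for illustration, and the YYYYMMDD suffix is an assumption based on the sample date 20130405; check the actual table names in your database first.

```python
from datetime import date

# Table prefixes from the steps above; ts_defect_meta_values must be dropped first.
PARTITION_PREFIXES = ["ts_defect_meta_values", "ts_defects", "ts_tran_comp_details"]


def drop_statements(day: date) -> list:
    """Build the DROP TABLE statements for one date, in the required order."""
    suffix = day.strftime("%Y%m%d")  # assumed suffix format, e.g. 20130405
    return [f"DROP TABLE {prefix}_{suffix};" for prefix in PARTITION_PREFIXES]


for statement in drop_statements(date(2013, 4, 5)):
    print(statement)
```

    Nothing here connects to PostgreSQL; the output is plain text to paste into a SQL session once you have confirmed the dates with support.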

    The EM crash issue was also resolved after removing the file com.wily.apm.webservices_10.7.0.jar from the <EM Home>/product/enterprisemanager/plugins folder, as suggested by Broadcom support.

    The cause: 15 GB of heap was occupied by one Jetty thread serving a Web Service call.

     This is a Web Service SOAP call to MetricsDataService.getMetricData(String agentRegex, String metricRegex, ...) with huge regex parameters.

    - agentRegex is a 7k long String:

    dpdcpladash2\\|WebSphere\\|pliprodCell/dbSrv02|dpdcpladash2\\|WebSphere\\|pliprodCell/dbSrv04|dpdcpladash2\\|WebSphere\\|pliprodCell/dbSrv06|dpdcpladash2\\|WebSphere\\|pliprodCell/dbSrv08|BQAPBIZWMV01\\|\\.NET Process\\|PerfMonCollectorAgent\\.exe|DPDCPIACSTPRT1\\|WebSphere\\|SASDmgr03Cell/SASServer1|dpdcpiadbrd2\\|WebSphere\\|fasDmgrCell/fasSrv02|...

    - metricRegex is a 40k long String:

    JSP\\|_newAgentDetails:Average Response Time\\(ms\\)|Backends\\|vpaspr dpdcplvpasdb1-1641\\(Oracle DB\\)\\|Commits:Responses Per Interval|Backends\\|nbfpr dpdcplnbfdb1-1661\\(Oracle DB\\)\\|SQL\\|Prepared\\|Query\\|SELECT ROLEMENUXR0_\\.MENU_ID AS COL_0_0_, MENUDO1_\\.MENU_NAME AS COL_1_0_, ROLEMENUXR0_\\.EFFECTIVE_DATE AS COL_2_0_, MENUDO1_\\.SEQUENCE_NUMBER AS COL_3_0_ FROM QACONFIGADMIN\\.ROLE_MENU_XREF ROLEMENUXR0_, QACONFIGADMIN\\.MENU MENUDO1_ WHERE ROLEMENUXR0_\\.USER_ROLE_ID = \\? AND ROLEMENUXR0_\\.MENU_ID = MENUDO1_\\.MENU_ID AND ROLEMENUXR0_\\.VIEW_IND = \\?:Concurrent Invocations|Servlets\\|_InboxDataEntryLayout2:Concurrent Invocations|JSP\\|_ConversionDataEntry:Average Response Time\\(ms\\)|Servlets\\|_TicketPopup:Errors Per Interval|JSP\\|_revivalProcessData_2D_QualityChecker:Average Response Time\\(ms\\)|Variance\\|Servlets\\|_DeathClaimProcessDataApproval:Average Response Time\\(ms\\) Deviation|JSP\\|_selectRO:Responses Per Interval|Servlets\\|_SurvivalClaimProcessData:Errors Per Interval|Variance\\|Servlets\\|_DuplicatePolicyBondApprover:Average Response Time\\(ms\\) Prediction|JSP\\|...
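
    To illustrate how such a regex gets this large, here is a small sketch in Python: one escaped alternative per agent or metric, joined with "|". The batching helper is only an idea for a calling client and was not something support recommended; all function names are made up for illustration.

```python
import re


def build_alternation(names):
    """Join escaped names into one regex of the form a|b|c."""
    return "|".join(re.escape(name) for name in names)


def chunked_alternations(names, max_chars):
    """Split names into several smaller alternation regexes.

    A single name longer than max_chars still gets a regex of its own.
    """
    batches, batch, size = [], [], 0
    for name in names:
        escaped = re.escape(name)
        extra = len(escaped) + (1 if batch else 0)  # +1 for the "|" separator
        if batch and size + extra > max_chars:
            batches.append("|".join(batch))
            batch, size, extra = [], 0, len(escaped)
        batch.append(escaped)
        size += extra
    if batch:
        batches.append("|".join(batch))
    return batches


agents = [
    "dpdcpladash2|WebSphere|pliprodCell/dbSrv02",
    "dpdcpladash2|WebSphere|pliprodCell/dbSrv04",
]
print(len(build_alternation(agents)), "characters for", len(agents), "agents")
print(len(chunked_alternations(agents, 50)), "smaller regexes")
```

    Every extra agent or metric adds another alternative, so a client that lists hundreds of names in one call ends up with strings of the sizes shown above.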






    ------------------------------
    Shiv Choudhary
    India
    ------------------------------