Clarity Service Management

pdm_trace: tracing interruption

  • 1.  pdm_trace: tracing interruption

    Posted 02-08-2018 07:27 AM

    Hello community!

     

    I have a question about pdm_trace. In my environment pdm_mail_nxd crashes frequently, and I wanted to find the root cause, so I tried the pdm_trace utility, but without luck.

    The mail daemon crashes in two ways:

    - it dies without logging any error;

    - it logs an allocator SEVERE_ERROR and then dies.

    I tried to trace the error via the allocator.c file, but every daemon crash interrupts the tracing.

    Is there any way to keep watching a process even after it fails and restarts? I am thinking about a PS script that runs pdm_trace pdm_mail_nxd WATCH depending on the result of pdm_trace pdm_mail_nxd STATUS, but I still hope there is a more proper way.

     

    PS: this environment is still running 12.7CP2 and can't move forward.

     

    Regards,

    Timur Alimov



  • 2.  Re: pdm_trace: tracing interruption

    Posted 02-08-2018 08:13 AM

    Hi Timur,

    The type of crash that occurs with the allocator error is fixed starting with a patch in 12.9 and is also patched in 14.1. There is no patch for this problem on 12.7, so unfortunately, without an upgrade, I don't think there is much that can be done for that part. Here is the article regarding the known issue: PDM_MAIL_NXD AND SPELSRVR PROCESS HANG WITH ALLOCATOR ERROR

    Your idea of having a script monitor the log for that allocator message and then recycle the pdm_mail process is a good one. I am not sure how to do that myself, but there may be someone here in the community who has something like this and is willing to share it with you.
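
    For readers looking for a starting point, here is a minimal Python sketch of the log-monitoring half of that idea. It is not an official tool. The pattern is an assumption based on the stdlog line quoted later in this thread, and the function name watch_for_allocator_error and the on_match callback are hypothetical. How to recycle the daemon is deliberately left to the callback, since the right command depends on the environment.

```python
import re
from typing import Callable, Iterable

# Assumed shape of the stdlog entry, taken from the line quoted in this thread:
#   pdm_mail_nxd  3904 SEVERE_ERROR allocator.c  527 Allocator String32 ...
ALLOCATOR_ERROR = re.compile(
    r"\bpdm_mail_nxd\s+\d+\s+SEVERE_ERROR\s+allocator\.c\s+\d+\b"
)


def watch_for_allocator_error(
    lines: Iterable[str], on_match: Callable[[str], None]
) -> int:
    """Call on_match for each line that looks like the allocator error.

    Returns the number of matching lines seen.
    """
    hits = 0
    for line in lines:
        if ALLOCATOR_ERROR.search(line):
            hits += 1
            # The callback decides what to do (alert, recycle the daemon, ...).
            on_match(line.rstrip("\n"))
    return hits
```

    The function accepts any iterable of lines, so it can be fed an open stdlog file or a list of lines collected by a scheduled job.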

    Can we also ask you what is preventing you from upgrading this 12.7 system to a supported version of the product?

    Let us know,

    Thanks,

    Jon I.



  • 3.  Re: pdm_trace: tracing interruption

    Posted 02-08-2018 08:59 AM

    Thank you for your input, Jon!
    I'll post an idea about enhancing the pdm_trace tool so that it keeps tracking restarted daemons.


    I tried to initiate the product upgrade procedure many times, but the requests were rejected without any explanation.


    Regards.



  • 4.  Re: pdm_trace: tracing interruption

    Posted 02-09-2018 03:29 AM

    Hi Timur,

     

    You can use bop_logging instead of pdm_trace in this scenario. 

     

    The bop_logging command collects traces up to the point where the process crashes.

    When the process crashes, pdm_trace may not collect any traces.

    Example command: 

    bop_logging pdm_mail_nxd -f c:\mail_nxd.log -n 10 -m 20000000 ON

     

    Thanks & Regards,

    Hema.



  • 5.  Re: pdm_trace: tracing interruption

    Posted 02-09-2018 10:40 AM

    Thank you!

    I'll try and let you know.

     

    Regards,

    Timur Alimov



  • 6.  Re: pdm_trace: tracing interruption

    Posted 02-14-2018 04:20 AM

    Hi Timur,

    Although the subject of this thread is "pdm_trace", you also mentioned: "I tried to initiate product update procedure many times but they were rejected without any clarifications." What sort of errors are you getting during the upgrade? Are you already following up on that, maybe via a support ticket?

    Regards,

    Karen



  • 7.  Re: pdm_trace: tracing interruption

    Posted 02-14-2018 05:15 AM

    Hi Karen,

    The problems are at the organizational level, not in the software...

    Regards,

    Timur Alimov



  • 8.  Re: pdm_trace: tracing interruption

    Posted 02-12-2018 02:04 PM

    Timur........

     

    Did the information provided by Hema address your concern?

     

    If so, please mark her answer as correct so that this thread can be closed.



  • 9.  Re: pdm_trace: tracing interruption

    Posted 02-13-2018 06:06 AM

    Hi Paul,

     

    I tried simhe02's suggestion, but it didn't work for me either.

    Maybe I missed something. I ran it as:

    pdm_logstat -n pdm_mail_nxd 200
    bop_logging pdm_mail_nxd -f c:\mail_nxd.log -n 10 -m 20000000 ON

    mail_nxd.log was created and I confirmed it contained some traces, but it stopped updating at the next service failure, without any additional information.

    I think I should run pdm_logstat at the Verbose (250) level, but I'm afraid it could overwhelm the production environment and overwrite all the stdlogs.

     

    Regards,

    Timur Alimov



  • 10.  Re: pdm_trace: tracing interruption

    Posted 02-13-2018 09:31 AM

    Hey Timur,

     

    pdm_logfile -b 100000000 

     

    That would set the stdlog file size to 100 MB (the command takes a value in bytes). While it might be too much, it might achieve what you are looking for.

     

    I'm afraid troubleshooting the 12.7 release might turn out to be rough. 

     

    What is the exact error in stdlog that you are trying to trace? I can provide a pdm_trace watch command for it, which might be more helpful than this verbose approach.

     

    _R



  • 11.  Re: pdm_trace: tracing interruption

    Posted 02-13-2018 10:48 AM

    Hi, yeah, we already have 128 MB stdlogs.

     

    check this out:

    1. Appears without any other messages (not even MILESTONES):

    pdm_d_mgr           10008 ERROR        daemon_obj.c          1924 Daemon pdm_mail_nxd died: restarting

    2. As Jon said, it's a known issue that was fixed in 12.9+:

    pdm_mail_nxd         3904 SEVERE_ERROR allocator.c            527 Allocator String32 free chain pointer 00C87470 references a corrupted allocator
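
    As an illustration of how the two entries above break down into fields, here is a minimal Python sketch. The field names (process, pid, severity, source_file, source_line, message) are my own reading of the columns, not documented names, and the pattern assumes the entry looks like the fragments quoted here. The source_file and source_line values are the two items a pdm_trace watch would be pointed at.

```python
import re
from typing import NamedTuple, Optional


class StdlogEntry(NamedTuple):
    # Field names are assumptions inferred from the quoted log lines.
    process: str
    pid: int
    severity: str
    source_file: str
    source_line: int
    message: str


_ENTRY = re.compile(
    r"(?P<process>\S+)\s+(?P<pid>\d+)\s+(?P<severity>[A-Z_]+)\s+"
    r"(?P<source_file>\S+)\s+(?P<source_line>\d+)\s+(?P<message>.*)"
)


def parse_entry(line: str) -> Optional[StdlogEntry]:
    """Split one stdlog fragment into fields, or return None if it does not fit."""
    m = _ENTRY.search(line)
    if m is None:
        return None
    return StdlogEntry(
        process=m["process"],
        pid=int(m["pid"]),
        severity=m["severity"],
        source_file=m["source_file"],
        source_line=int(m["source_line"]),
        message=m["message"].strip(),
    )
```

    Applied to the second entry, this yields allocator.c as the source file and 527 as the source line.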

     

    Regards,

    Timur Alimov



  • 12.  Re: pdm_trace: tracing interruption

    Posted 02-13-2018 12:44 PM

    Normally such 

     

    REM Trace_ON

    pdm_trace pdm_mail_nxd ON

    pdm_trace pdm_mail_nxd attach pdm_text_cmd
    pdm_trace pdm_mail_nxd attach domsrvr

    pdm_trace pdm_mail_nxd attach bpvirtdb_srvr

    pdm_trace pdm_mail_nxd watch allocator.c 527 5

     

    The above should watch for the error you are seeing at line 527 of allocator.c, up to 5 times. So if the error happens more often than that, the trace should no longer impact you.

     

    REM Trace_OFF

    pdm_trace pdm_mail_nxd OFF

    pdm_trace pdm_mail_nxd detach pdm_text_cmd
    pdm_trace pdm_mail_nxd detach domsrvr

    pdm_trace pdm_mail_nxd detach bpvirtdb_srvr

    pdm_trace pdm_mail_nxd unwatch allocator.c 527 5

    ## I don't think the last unwatch makes a difference.

     

     

    Try it out in Dev/Test and then in prod.

     

    It should create NX_ROOT/log/pdm_trace** files.

     

    Did you already raise a support case for this by any chance? It feels like we're hitting a low-level memory issue that we're not able to handle, and that's why the error appears. What is the timestamp of the pdm_mail_nxd.exe program that you have? (The SDM history file should show that too.)

     


    _R

     

     



  • 13.  Re: pdm_trace: tracing interruption

    Posted 02-14-2018 05:17 AM

    Thank you for your input! I'll try it in our env next weekend.

    Could you provide a little more information about the ATTACH option and how it works?

     

    Regards,

    Timur Alimov



  • 14.  Re: pdm_trace: tracing interruption

    Posted 02-14-2018 09:53 AM

    From a high level, you're starting the trace on the daemon you want to trace. The attach lets us get trace information from the attached daemons in context too, which gives us a better idea of what those daemons were doing around the same time.

     

    The watch is like a trigger: it starts the traces only when it hits the condition(s) being watched for. So in this case, we're waiting for line 527 of allocator.c to trigger in the pdm_mail_nxd program. Hope this helps.

     

    pdm_mail_nxd         3904 SEVERE_ERROR allocator.c            527 

     

     

    _R