DX Unified Infrastructure Management

  • 1.  Data Engine restart issue

    Posted Aug 05, 2011 06:55 PM

    I have an issue in which the only way I can ensure that my data_engine successfully connects and performs QoS inserts into the database is to restart the Nimbus services. Restarting the probe itself does not always work. When I perform maintenance tasks that cause the data_engine to do what I will call a soft restart (changing the log level, deleting some QoS definitions, etc.), the probe attempts to connect to the database -> times out after 30 seconds -> attempts to connect again -> repeats.

     

    I believe I am experiencing a timeout when the data_engine attempts to load and/or query the s_qos_data table. Due to the sheer number of QoS definitions I have in this table (~800000), the query takes about 20 seconds to complete from SQL Management Studio. I have to believe there is some key I could place in the raw_config that would increase this timeout, something similar to command_timeout (which I have already tried in this case). Restarting the service is the only way I can get the data_engine to reconnect.

     

    I am running data_engine 7.56. I believe there has to be a setting that would allow me to increase this timeout. Does anyone have any suggestions? I have checked the raw_config and do not have any values set to 30 seconds, and command_timeout has no effect on this behavior. I have already trimmed rows in the s_qos_data table.

     

    Thanks in advance



  • 2.  Re: Data Engine restart issue

    Posted Aug 05, 2011 10:49 PM

    Have you tried deactivating the Group Server and Report Engine, then restarting the Data Engine and giving it about 10 minutes or so?



  • 3.  Re: Data Engine restart issue

    Posted Aug 05, 2011 10:56 PM

    I tried deactivating all probes that have a database/data_engine dependency. I did not wait 10 mins however. Are you basing that duration on some setting/timeout value I could go look at?



  • 4.  Re: Data Engine restart issue

    Posted Aug 06, 2011 12:48 AM

    Are you willing to share a copy of data_engine.log when the probe is restarting? That would help make it clear exactly where in the process the data_engine fails.

     

    I assume the data_engine comes up fine sometimes, right? Does this problem typically happen at certain times of the day?

     

    -Keith



  • 5.  Re: Data Engine restart issue

    Posted Aug 06, 2011 02:56 AM

    Keith, 

     

    Can you access support cases? The case number is 00055870. If not, I can send you the log directly.

     

    Here is a snippet. I did not want to share the full detail on the forums.

     

    Aug 2 12:16:46:837 [1452] de: NimBUS Data Engine 7.56 [Jul 6 2010]

    Aug  2 12:16:14:064 [7520] de: [QoSData] has connected to database
    Aug  2 12:16:14:066 [7520] de: SLM version 4.61
    Aug  2 12:16:15:298 [6652] de: [Timezone] has disconnected from database
    Aug  2 12:16:37:318 [6652] de: get_stats - from 10.129.177.147/54147
    Aug  2 12:16:44:573 [7520] de: [QoSData] ExecuteNoRecords - 1 errors
    Aug  2 12:16:44:573 [7520] de: (1) ExecuteNoRecords [Microsoft OLE DB Provider for SQL Server] Query timeout expired
    Aug  2 12:16:44:573 [7520] de: COM Error [0x80040e31] IDispatch error #3121 - [Microsoft OLE DB Provider for SQL Server] Query timeout expired
    Aug  2 12:16:44:573 [7520] de: qos_check - InitializeOnce failed ...
    Aug  2 12:16:45:067 [2924] de: [LSV] has disconnected from database
    Aug  2 12:16:45:067 [2924] de: LastSamplevalue - exit worker thread
    Aug  2 12:16:45:073 [7520] de: [QoSData] has disconnected from database
    Aug  2 12:16:45:073 [7520] de: qos_data_thread - exit thread
    Aug  2 12:16:45:319 [6652] de: qos_data_thread - stop received ...
    Aug  2 12:16:45:319 [6652] de: Before WaitForSingleObject
    Aug  2 12:16:45:319 [6652] de: Before destroy slm obj
    Aug  2 12:16:45:319 [6652] de: Before destroying qos report object
    Aug  2 12:16:45:319 [6652] de: [main] has disconnected from database
    Aug  2 12:16:45:320 [6652] de: Before db lib close
    Aug  2 12:16:45:320 [6652] de: Before nimUnRegisterProbe
    Aug  2 12:16:45:321 [6652] de: Before nimSessionFreeList
    Aug  2 12:16:45:321 [6652] de: Before CleanUp
    Aug  2 12:16:45:321 [6652] de: ######################## STOP #########################
    Aug  2 12:16:46:830 [1452] de: Reading Configuration: data_engine.cfg
    Aug  2 12:16:46:834 [1452] de: Setting locale=English (English_United States.1252)
    Aug  2 12:16:46:837 [1452] de: ReadIntermediateServerCertificate - using C:\Program Files (x86)\Nimsoft/robot/robot.pem
    Aug  2 12:16:46:837 [1452] de: sockServer:00C552D8: 
    Aug  2 12:16:46:837 [1452] de: ######################## START ######################## 
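
    To check the 30-second figure against the excerpt above, the timestamps can be compared directly. The following is a minimal sketch, not a Nimsoft tool: the helper names `parse_line` and `seconds_until_timeout` are made up for illustration, the line layout is inferred only from the lines pasted here, and the year is an assumption because the log lines do not carry one.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt: "Aug  2 12:16:14:064 [7520] de: message"
LINE_RE = re.compile(
    r"^\s*(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}):(?P<ms>\d{3})\s+"
    r"\[(?P<tid>\d+)\]\s+de:\s?(?P<msg>.*)$"
)


def parse_line(line, year=2011):
    """Return (timestamp, thread_id, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The log has no year, so the caller supplies one (assumed 2011 here).
    stamp = datetime.strptime(
        f"{year} {m['mon']} {m['day']} {m['h']}:{m['m']}:{m['s']}",
        "%Y %b %d %H:%M:%S",
    ).replace(microsecond=int(m["ms"]) * 1000)
    return stamp, int(m["tid"]), m["msg"].rstrip()


def seconds_until_timeout(lines, year=2011):
    """Seconds between a thread's 'has connected to database' line and the
    first 'Query timeout expired' line logged by that same thread."""
    connected = {}  # thread id -> time of its most recent connect message
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        stamp, tid, msg = parsed
        if msg.endswith("has connected to database"):
            connected[tid] = stamp
        elif "Query timeout expired" in msg and tid in connected:
            return (stamp - connected[tid]).total_seconds()
    return None  # no connect/timeout pair found
```

    Run against the lines above, this pairs the `[QoSData] has connected to database` entry from thread 7520 with that thread's first `Query timeout expired` entry, which is where the roughly 30-second gap shows up.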


  • 6.  Re: Data Engine restart issue

    Posted Aug 06, 2011 02:58 AM

    And this only happens if I manually restart the data_engine or do something like change the log file setting or delete QoS definition detail. It does not happen randomly. Again, I just think I need to increase a timeout value.



  • 7.  Re: Data Engine restart issue
    Best Answer

    Posted Aug 08, 2011 11:22 PM

    So I expanded on the previous suggestion and disabled all the probes that have the data_engine as a dependency. I was not able to re-create my issue. So I began to enable one probe at a time and test. It appears that in my environment, if I disable the QoS_Engine probe, the issue goes away -> the data_engine connects. Once it is connected, I enable the QoS_Engine again.

     

    Thanks for all the advice
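
    For anyone who wants to script the order of operations described above, here is a rough sketch. It only captures the sequencing. The four callables (`deactivate`, `activate`, `restart`, `is_connected`) are hypothetical placeholders that you would have to wire to whatever mechanism you use to control probes and check the data_engine; none of them is a real Nimsoft API. The lowercase probe name `qos_engine` and the 10-minute default wait (borrowed from the suggestion earlier in the thread) are also assumptions.

```python
import time


def restart_data_engine_safely(deactivate, activate, restart, is_connected,
                               blocker="qos_engine", timeout_s=600.0,
                               poll_s=5.0, sleep=time.sleep,
                               clock=time.monotonic):
    """Deactivate the blocking probe, restart data_engine, wait for it to
    connect, and only then reactivate the blocking probe.

    All four callables are supplied by the caller (hypothetical hooks).
    Returns True if data_engine connected within timeout_s, else False.
    """
    deactivate(blocker)        # step 1: take qos_engine out of the picture
    restart("data_engine")     # step 2: restart the data_engine probe

    deadline = clock() + timeout_s
    while not is_connected():  # step 3: poll until the database connection is up
        if clock() >= deadline:
            # Leave the blocker deactivated so the failure can be investigated.
            return False
        sleep(poll_s)

    activate(blocker)          # step 4: re-enable qos_engine once connected
    return True
```

    On a timeout the sketch deliberately leaves the blocking probe deactivated, since re-enabling it before the data_engine has connected is the situation that triggered the problem in the first place.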