CA Service Management


SDM problems when restarting a database cluster node

  • 1.  SDM problems when restarting a database cluster node

    Posted Jul 10, 2018 09:01 AM

    Hi all.

    I'm using SDM 14.1 with CUM3 running on Windows Server 2012 R2 and accessing Oracle 12c RAC.

    Oracle 12c RAC runs on a cluster of servers.

    We have noticed that each time an Oracle cluster node is reset, SDM stops working and we need to restart the Windows service.


    In the SDM log file we found entries like this:

    SIGNIFICANT
    orcl_agent.c
    SHUTDOWN of orcl_agent:mdbadmin:bpvirtdb_srvr
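    To spot these entries automatically, a minimal sketch could scan the log for the SHUTDOWN message quoted above and pull out the agent identifier. It assumes each log event sits on a single line containing that text (the excerpt above was flattened when posted), and the log path in the comment is a placeholder, not SDM's real file name.

```python
import re
from typing import Iterable, List

# Matches the message text quoted in the post, e.g.
# "SHUTDOWN of orcl_agent:mdbadmin:bpvirtdb_srvr".
# Assumption: each log event occupies a single line.
SHUTDOWN_RE = re.compile(r"SHUTDOWN of (orcl_agent:\S+)")


def find_agent_shutdowns(lines: Iterable[str]) -> List[str]:
    """Return the agent identifier of every orcl_agent SHUTDOWN entry, in order."""
    agents = []
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match:
            agents.append(match.group(1))
    return agents


if __name__ == "__main__":
    sample = [
        "SIGNIFICANT orcl_agent.c SHUTDOWN of orcl_agent:mdbadmin:bpvirtdb_srvr",
        "some unrelated log line",
    ]
    print(find_agent_shutdowns(sample))  # ['orcl_agent:mdbadmin:bpvirtdb_srvr']
    # Against a real log (placeholder path):
    # with open("path/to/sdm.log", encoding="utf-8", errors="replace") as f:
    #     print(find_agent_shutdowns(f))
```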

     

    We would expect SDM to keep working, because the other Oracle cluster nodes are still up.

    How can I avoid having to restart SDM? Do you know of a workaround or setting? Or is this expected behaviour and there is nothing to be done?

     

    All comments are welcome. Thanks in advance.

     

    Regards,

    Fabio.



  • 2.  Re: SDM problems when restarting a database cluster node

    Broadcom Employee
    Posted Jul 10, 2018 10:29 AM

    Fabio, what happens to OTHER client connections to the cluster when it is reset? For example, from the SDM server, manually open a SQL*Plus connection using the same parameters (the same Oracle cluster name and user credentials). If this other client loses its connection when the Oracle cluster is reset, then you should work with your DBA to find out why; that would not be an SDM issue but rather an Oracle cluster configuration issue. By the way, as a "workaround", you can try running "pdm_d_refresh" instead of recycling SDM. Thanks _Chi



  • 3.  Re: SDM problems when restarting a database cluster node

    Posted Jul 10, 2018 11:52 AM

    Thank you, Chi. Our DBA says other applications keep working after a cluster node reset. I think SDM stores or uses some kind of database connection caching, and this causes the problem.

    The next time a reset occurs, I will use "pdm_status" and "pdm_d_refresh" as you suggest.



  • 4.  Re: SDM problems when restarting a database cluster node

    Posted Jul 10, 2018 09:40 PM

    We have a similar setup with SQL Server Always On. We have noticed that when you fail over between database nodes, SDM may lose its connection if the move takes longer than 45 seconds. After that point SDM stops trying to reconnect, and pdm_d_refresh is necessary.
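    The behaviour described here can be modelled as a reconnect loop with a fixed retry window. This is a sketch of the observed behaviour, not SDM's code: the 45-second window comes from this post rather than from a documented setting, and the 5-second retry interval and function names are assumptions. Time is simulated instead of slept, so the outcome is deterministic.

```python
from typing import Callable


def reconnect_with_deadline(
    try_connect: Callable[[float], bool],
    window_s: float = 45.0,
    interval_s: float = 5.0,
) -> bool:
    """Call try_connect(t) every interval_s seconds until window_s has elapsed.

    t is the simulated number of seconds since the connection was lost.
    Returns True if a connection succeeded inside the window, False if the
    loop gave up (the point at which, per this post, pdm_d_refresh is needed).
    """
    t = 0.0
    while t <= window_s:
        if try_connect(t):
            return True
        t += interval_s
    return False


def failover_done_after(seconds: float) -> Callable[[float], bool]:
    """Fake database: connection attempts succeed once the node move has finished."""
    return lambda t: t >= seconds


print(reconnect_with_deadline(failover_done_after(30)))  # True: node moved within the window
print(reconnect_with_deadline(failover_done_after(60)))  # False: the loop gave up first
```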



  • 5.  Re: SDM problems when restarting a database cluster node

    Posted Jul 10, 2018 11:02 PM

    Thank you, Grant. Your environment looks similar to mine. In addition, we have other CA products like PAM, Service Catalog and USS running in the same Oracle database (PDB) with different schemas. This behaviour has only been seen with SDM. The other CA products keep running without problems after the node restart, and the same is true for other, non-CA applications.



  • 6.  Re: SDM problems when restarting a database cluster node

    Posted Jul 11, 2018 12:12 AM

    Yep, we have seen the same. Service Catalog and PAM always recover; SDM recovers most of the time.



  • 7.  Re: SDM problems when restarting a database cluster node

    Broadcom Employee
    Posted Jul 11, 2018 04:30 AM

    Hello Fabio,

    please remember that CA Service Desk Manager is cluster tolerant but not cluster aware.

    This implies that when a failover happens on the DB server, you must restart the application (to keep it simple) to stop the SDM SQL connections and processes.

     

    _r Ferdinand



  • 8.  Re: SDM problems when restarting a database cluster node

    Posted Jul 11, 2018 01:24 PM

    OK. In an AA environment, do you think running the pdm_d_refresh command on the background server is sufficient to restore operation?



  • 9.  Re: SDM problems when restarting a database cluster node

    Posted Jul 11, 2018 01:36 PM

    You would need to run that command on each server. In AA, each server has its own connections to the database.
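    Since each AA server holds its own database connections, a recovery script has to loop over all of them. A minimal sketch, assuming some way to run a command on a remote server already exists: it is represented by the hypothetical run_on callable, and the host names are made up.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical: run_on(host, command) executes command on host and returns
# its exit code. The transport (remote shell, job scheduler, etc.) is not
# specified in this thread and is left to your environment.
RemoteRunner = Callable[[str, Sequence[str]], int]


def refresh_all_servers(hosts: Iterable[str], run_on: RemoteRunner) -> List[str]:
    """Run pdm_d_refresh on every SDM server and return the hosts where it failed.

    Each server in an AA setup holds its own database connections, so
    refreshing only the background server is not enough.
    """
    failed = []
    for host in hosts:
        if run_on(host, ["pdm_d_refresh"]) != 0:
            failed.append(host)  # keep going so the other servers still get refreshed
    return failed


# Dry run with made-up host names and a fake runner that fails on one host.
def fake_run_on(host: str, cmd: Sequence[str]) -> int:
    return 1 if host == "sdm-app2" else 0


print(refresh_all_servers(["sdm-bg", "sdm-app1", "sdm-app2"], fake_run_on))  # ['sdm-app2']
```

    The loop deliberately continues after a failure, so one server that cannot be refreshed does not leave the remaining servers with stale connections.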



  • 10.  Re: SDM problems when restarting a database cluster node

    Broadcom Employee
    Posted Jul 13, 2018 10:20 AM

    Fabio,

     

    The SDM architecture basically involves a whole bunch of native-client-based SQL connections. Not all of them are active (though all of them are connected to the database).

     

    Take an example: you're looking at a list of tickets, which might cause SDM to use DBAgent#1 to run a query against the database.

     

    If DB connectivity is lost at this moment, our use of the DB client APIs makes DBAgent#1 recognize the lost connection and attempt to reconnect immediately, so your query is re-issued.

     

    However, DBAgent#2, #3, #4... may be idle/snoozing because there's not much activity on your system. Later, another SDM user runs a query which, let's say, is sent to DBAgent#2. Only at that point does DBAgent#2 detect the lost connection to the DB and retry.

     

    So, yes, we do have enough support to detect a lost connection and retry the queries. But the recovery may be spread over a period of time, depending on how the agents are being used.

     

    Hope this gives you an idea.


    The same applies to both SQL Server and Oracle.

     

    _R
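    To make the sequence above concrete, here is a toy model of lazy detection: every agent connects at startup, a failover invalidates all connections at once, but each agent only notices (and reconnects) when it is next given a query. It illustrates the behaviour described in this post and is not SDM's implementation; the class names and the "generation" counter used to mark dead connections are assumptions.

```python
class Database:
    """Toy database whose failover invalidates every open connection."""

    def __init__(self) -> None:
        self.generation = 0  # bumped on each failover

    def failover(self) -> None:
        self.generation += 1


class DBAgent:
    """Toy agent: connected from startup, but only checks its link when used."""

    def __init__(self, name: str, db: Database) -> None:
        self.name = name
        self.db = db
        self.conn_generation = db.generation  # connected at startup, even if idle
        self.reconnects = 0

    def query(self, sql: str) -> str:
        if self.conn_generation != self.db.generation:
            # Lost connection detected only now: reconnect, then re-issue the query.
            self.conn_generation = self.db.generation
            self.reconnects += 1
        return f"{self.name} ran {sql!r}"


db = Database()
agents = [DBAgent(f"DBAgent#{i}", db) for i in range(1, 5)]
db.failover()
print(agents[0].query("list tickets"))  # DBAgent#1 ran 'list tickets'
print([a.reconnects for a in agents])   # [1, 0, 0, 0]: idle agents have not noticed yet
```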



  • 11.  Re: SDM problems when restarting a database cluster node

    Posted Jul 14, 2018 01:37 PM

    I understand that DBAgents will try to reconnect in both cases, whether the agent is active or idle, but when the retry happens depends on agent usage. So, can increasing the number of agents help reduce the probability of database connection problems?



  • 12.  Re: SDM problems when restarting a database cluster node

    Broadcom Employee
    Posted Jul 16, 2018 09:48 AM

    In a way, yes. If you have 50 agents, then all 50 agents have to go through the connection retry.

     

    _R
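    A small counting sketch of that last point, under an assumed dispatch order: after a failover, queries are handed to agents round robin (an assumption; the thread does not describe SDM's real dispatcher), and each agent reconnects once, on the first query it receives. It shows that every agent eventually pays one retry, and that how many are still holding a dead connection depends on how much traffic has arrived since the failover.

```python
from itertools import cycle, islice
from typing import Tuple


def reconnect_progress(num_agents: int, queries: int) -> Tuple[int, int]:
    """Return (reconnects so far, agents still holding a dead connection).

    Toy model: after a failover, queries go to agents round robin and each
    agent reconnects once, on the first query it receives.
    """
    stale = set(range(num_agents))
    reconnects = 0
    for agent in islice(cycle(range(num_agents)), queries):
        if agent in stale:
            stale.remove(agent)
            reconnects += 1
    return reconnects, len(stale)


for n in (4, 50):
    print(n, reconnect_progress(n, queries=20))
# 4 (4, 0)
# 50 (20, 30)
```

    Under this assumed order, 50 agents need at least 50 queries before every one of them has retried its connection.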