I'm using SDM 14.1 with CUM3 running on Windows Server 2012 R2 and accessing Oracle 12c RAC.
Oracle 12c RAC runs on a cluster of servers.
We have noticed that each time an Oracle cluster node is reset, SDM stops working and we need to restart the Windows service.
In the SDM log file we found entries like this:
SIGNIFICANT orcl_agent.c SHUTDOWN of orcl_agent:mdbadmin:bpvirtdb_srvr
We think SDM should be able to continue working because the other Oracle cluster nodes are still running.
How can I avoid having to restart SDM? Do you know of a workaround or setting? Or is this expected behaviour with nothing to be done?
All comments are welcome. Thanks in advance.
Fabio, what happens to OTHER client connections to the cluster when it is reset? For example, from the SDM server, manually open a SQL*Plus connection using the same parameters (the same Oracle cluster name and the same user credentials). If this other client connection is also lost when the Oracle cluster is reset, then you should work with your DBA to find out why; that would be an Oracle cluster configuration issue rather than an SDM issue. By the way, as a "workaround", you can try running "pdm_d_refresh" instead of recycling SDM. Thanks _Chi
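The comparison Chi describes can be automated by probing the cluster from the SDM server with a separate client while a node is reset, and recording when the probe fails and recovers. Below is a minimal sketch in Python. It is not part of SDM: `sqlplus_probe`, `record_outages` and the connect string are hypothetical names, and the sketch assumes `sqlplus` is on the PATH and that a failed logon shows up as a non-zero exit code or an "ORA-" message.

```python
import subprocess
import time
from typing import Callable, List, Optional, Tuple


def sqlplus_probe(connect_string: str) -> bool:
    """Try one SQL*Plus logon and a trivial query; True if it appears to succeed.

    Assumptions: sqlplus is on PATH; -S (silent) and -L (single logon attempt)
    are used; success is inferred from the exit code and the absence of "ORA-".
    Note that a password in the connect string is visible on the command line.
    """
    try:
        result = subprocess.run(
            ["sqlplus", "-S", "-L", connect_string],
            input="select 1 from dual;\nexit\n",
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "ORA-" not in result.stdout


def record_outages(
    probe: Callable[[], bool],
    checks: int,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, Optional[float]]]:
    """Run the probe `checks` times and return (start, end) outage windows.

    `end` is None if the probe was still failing at the last check.
    """
    outages: List[Tuple[float, Optional[float]]] = []
    down_since: Optional[float] = None
    for _ in range(checks):
        now = clock()
        ok = probe()
        if not ok and down_since is None:
            down_since = now            # first failed check of an outage
        elif ok and down_since is not None:
            outages.append((down_since, now))
            down_since = None           # probe recovered
        sleep(interval_s)
    if down_since is not None:
        outages.append((down_since, None))
    return outages
```

For example, `record_outages(lambda: sqlplus_probe("user/password@service"), checks=120)` would probe for about ten minutes at the default interval. If the list stays empty or shows only short windows during a node reset while SDM still needs a restart, that points at SDM's handling of its connections rather than at the cluster.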
Thank you Chi. The DBA says other applications keep working after a cluster node reset. I think SDM stores or uses some kind of cache of database connections, and this causes the problem.
The next time a reset occurs, I will use "pdm_status" and "pdm_d_refresh" as you suggest.
We have a similar setup with SQL Server Always On. We have noticed that when you switch between database nodes, SDM may lose its connection if the switch takes longer than 45 seconds. After that point SDM stops trying to reconnect and pdm_d_refresh is necessary.
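The behaviour described here, retrying for a limited window and then giving up until someone intervenes, can be pictured with a small sketch. This is only an illustration in Python, not SDM's actual implementation: `reconnect_within` is a hypothetical name, the 45-second default is just the figure observed above rather than a documented setting, and the 5-second retry interval is an assumption.

```python
import time
from typing import Callable


def reconnect_within(
    connect: Callable[[], bool],
    window_s: float = 45.0,
    retry_every_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry connect() until it succeeds or the retry window is used up.

    Returns True if the connection came back in time, False if the caller
    gave up (the state in which a manual refresh would be needed).
    """
    deadline = clock() + window_s
    while True:
        if connect():
            return True
        if clock() + retry_every_s > deadline:
            return False                # next attempt would fall outside the window
        sleep(retry_every_s)
```

With these defaults, a failover that completes in 30 seconds is picked up by one of the retries, while one that takes 60 seconds exhausts the window and returns False.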
Thank you Grant. Your environment looks similar to mine. In addition, we have other CA products such as PAM, Service Catalog and USS running in the same Oracle database (PDB) with different schemas. This behaviour has only been seen with SDM. The other CA products keep running without problems after the node restart, and so do the non-CA applications.
Yep, we have seen the same. Service Catalog and PAM can always recover. SDM recovers most of the time.
Please keep in mind that CA Service Desk Manager is cluster tolerant but not cluster aware.
This implies that when a failover happens on the DB server, you must restart the application (to keep it simple) so that the SDM SQL connections and processes are stopped.
OK. In an AA environment, do you think running the pdm_d_refresh command on the background server is sufficient to restore operation?
You would need to run that command on each server. In AA, each server has its own connections to the database.
The SDM architecture basically involves a whole bunch of native-client-based SQL connections. Not all of them are active (though all of them are connected to the database).
So take an example here. You're trying to look at a list of tickets, which might cause SDM to use DBAgent#1 to run a query against the database.
If DB connectivity is lost at this time, our use of the DB client APIs makes DBAgent#1 recognize the loss of connection and attempt to reconnect immediately. Your query is then re-issued.
However, DBAgent#2, 3, 4... may be idle/snoozing because there's not much activity on your system. Another SDM user runs another query, which, let's say, is sent to DBAgent#2. Only at that point does DBAgent#2 detect the loss of its connection to the DB and retry.
So, yes, we do have enough support to detect a loss of connection and retry the queries. But this may happen over a period of time, depending on how the agents are being used.
Hope this gives you an idea.
The same applies to both SQL Server and Oracle.
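The per-agent behaviour described above can be modelled in a few lines of Python. This is a toy model only, not SDM internals: `Database`, `Agent` and `stale_agents` are hypothetical names, and a simple counter stands in for "the connection was made before the last failover".

```python
from dataclasses import dataclass
from typing import List


class Database:
    """Stand-in for the database; `epoch` changes on every failover."""

    def __init__(self) -> None:
        self.epoch = 0

    def failover(self) -> None:
        self.epoch += 1                 # every existing connection is now stale


@dataclass
class Agent:
    name: str
    connected_epoch: int = 0
    reconnects: int = 0

    def run_query(self, db: Database, sql: str) -> str:
        # A stale connection is only noticed when the agent is actually used.
        if self.connected_epoch != db.epoch:
            self.connected_epoch = db.epoch   # reconnect
            self.reconnects += 1
        # The query is (re-)issued on the fresh connection.
        return f"{self.name} ran: {sql}"


def stale_agents(agents: List[Agent], db: Database) -> List[str]:
    """Names of agents that have not yet noticed the failover."""
    return [a.name for a in agents if a.connected_epoch != db.epoch]
```

After one failover every agent in the pool is stale, and each one reconnects only when a query is routed to it. In this model an idle agent can keep a dead connection for a long time after the database is back.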
I understand that DBAgents will try to reconnect in both cases, whether the agent is active or idle, but when the retry happens depends on how the agents are used. So, can increasing the number of agents help to reduce the probability of database connection problems?
In a way, yes. But if you have 50 agents, all 50 agents now have to go through the connection retry.