DX Unified Infrastructure Management

 View Only
  • 1.  SQL 2012 always on best way to monitor

    Posted Jan 22, 2015 11:40 AM

    Hi,

    i have a sql 2012 always on cluster formed by 4 nodes. What's the best way to monitor using sql probe?

    Should I monitor also the listener? Should I use cluster probe?

    thank you for help.



  • 2.  Re: SQL 2012 always on best way to monitor

    Posted Jan 22, 2015 01:07 PM

    Hi,

     

    Cluster doesn't make sense for Always On group. You could use it to monitor the WSFC cluster, but not the SQL services etc. I don't use it in my case as it usually seems to just make a mess of things.

     

    What I've done is I created some SQL queries and ask them through the AlwaysOn listener IP using sql_response.

     

    edit: my AG is in synchronous commit mode

     

    In addition I use sqlserver, but that will depend on how your AlwaysOn group is configured. I have a two node AG where the other node is "passive", meaning you can't even do reads to it. Therefore I am using sqlserver through the listener IP as well. Since you have 4 servers I assume you allow at least read access to some of them: those I'd recommend you monitor through that instance's IP. This especially if they have DB's that do not participate in the AlwaysOn group.

     

    Here are my specifications for sql_response. Some of these might overlap somewhat.

     

    • select serverproperty('HadrManagerStatus') (not in use currently)
      • major if != 1
    • select quorum_state from sys.dm_hadr_cluster
      • major if quorum_state != 1
    • select member_state from sys.dm_hadr_cluster_members
      • major if member_state !~ /UP/
    • select primary_replica, synchronization_health_desc from sys.dm_hadr_availability_group_states
      • major if synchronization_health_desc  !~ /HEALTHY/
    • select role_desc, connected_state_desc from sys.dm_hadr_availability_replica_states
      • major if connected_state_desc !~ /CONNECTED/
    • select role_desc, recovery_health_desc from sys.dm_hadr_availability_replica_states where is_local = 1
      • major if recovery_health_desc !~ /ONLINE/
    • select role_desc, synchronization_health_desc from sys.dm_hadr_availability_replica_states
      • major if synchronization_health_desc !~ /HEALTHY/
    • select * from sys.dm_hadr_auto_page_repair where page_status > dateadd(MINUTE, -7, getdate())
      • minor if count(*) > 0
    • select replica_id, synchronization_state_desc from sys.dm_hadr_database_replica_states
      • major if synchronization_state_desc =~ /NOT\sSYNCHRONIZING/
      • information if synchronization_state_desc =~ /SYNCHRONIZING|REVERTING|INITIALIZING/
    • select CAST(listener_id AS varchar(2)) + ':' + CAST(port as varchar(5)) as 'supp', state_desc from sys.dm_tcp_listener_states
      • major if state_desc !~ /ONLINE/

     

    -jon



  • 3.  Re: SQL 2012 always on best way to monitor

    Posted Feb 03, 2016 06:11 AM

    You know you can create "custom checkpoints" in sqlserver probe, using these queries, right? You have to use the "classical" IM GUI though.

    The handling is maybe little more complicated then sql_response, but you have it all in one place...



  • 4.  Re: SQL 2012 always on best way to monitor

    Posted Feb 03, 2016 11:06 AM

    I do but honestly I don't have much love for the sqlserver probe at the moment. So many problems with it especially in bigger environments, for me.



  • 5.  Re: SQL 2012 always on best way to monitor

    Posted Feb 03, 2016 11:19 AM

    Hmm, that hurts me to hear, being the original developer of the probe. Can you elaborate what kind of problems? Maybe I can offer some tips. But generally it should perform better then sql_response, as it is using native SQL client libraries and is working multi-threaded instead of sequentially (one thread per profile).



  • 6.  Re: SQL 2012 always on best way to monitor

    Posted Feb 03, 2016 12:46 PM

    I believe the biggest thing for me has been that it didn't alarm on some critical jobs that had failed. I tested with the query it uses to get them directly, which displayed them alright. Still, when the jobs failed I didn't see any alarms.

     

    I have a vague recollection that I had some serious issues with one of the more recent releases with regards to stability or the probe starting at all, so I had to do a rollback at least in some environments.

     

    Other than that, If I remember correctly, I've been seeing a lot of profile delayed alarms and don't remember seeing much options on configuring the threading.

     

    Also I've found it very hard to figure out how to reasonably configure backup monitoring with backup age, since manually creating maintenance schedules to the probe for each instance can be a bit difficult. It would be nice if that was made easier. You can always monitor jobs / external agents, but that's often pretty tricky as well, especially in an environment with many different operators.

     

    -jon



  • 7.  Re: SQL 2012 always on best way to monitor

    Posted Feb 04, 2016 05:00 AM

    Hi Jon, I cannot help with the stability or bugs, but I will try to point this to someone who can.

    Regarding threading, the only thing you can do is split profiles, if they run too long into multiple ones - one profile = one thread. As the "source" is taken from the "connection", you can have several profiles monitoring the same instance without a problem and achieve parallelism this way.

    I am not sure if I understand your issue with "maintenance schedule" correctly, but I will try. There are two call-backs in the probe "suspend_profile" and "resume_profile", you can use them to stop and restart monitoring by profile name, without influencing the other profiles. You can e.g. use the pu.exe utility in a backup script to achieve that:

     

    pu -u <uid) -p <password> /domain/hub/robot/sqlserver suspend_profile <profile name>

    your backup script...

    pu -u <uid) -p <password> /domain/hub/robot/sqlserver resume_profile <profile name>

     

    Or you can stop the whole probe:

     

    pu -u <uid) -p <password> /domain/hub/robot/controller probe_deactivate sqlserver 0 1

    pu -u <uid) -p <password> /domain/hub/robot/controller probe_activate sqlserver

     

    Always use "probe_deactivate", if you use "probe_stop" the controller will immediately restart the probe, which is not what you want here.

     

    I hope I could help at least a little.

     

    -richard



  • 8.  Re: SQL 2012 always on best way to monitor

    Posted Feb 04, 2016 10:49 AM

    What I meant with the maintenance schedule is the maintenance jobs in the sql server. Say, for example, that for one instance you run backup every midnight mon-thu. Then you might set backup age monitoring to 32h, which would then alarm during the weekend. Then in the same instance you might have a database for which backup runs every 2h through the weekend. I know I can solve this with multiple profiles, but this is a very time consuming and kind of complex effort prone to errors. Also those schedules are subject to change. I know achieving automatic "maint plan sniffing" and adjust backup monitoring for that is not a simple thing, but if I were to have one feature (or even specific probe) request for SQL, that would probably be it

     

    -jon



  • 9.  Re: SQL 2012 always on best way to monitor

    Posted Feb 23, 2016 10:47 AM

    Hello,

     

        We have the same problem with trying to monitor AlwaysOn environments for clients. We need the sqlserver probe to have default check points for monitoring AlwaysOn. At a minimum there needs to be a check point for detecting when a failover happens. We have created a custom check point to detect this but after upgrading from probe version 4.91 to 4.94 it no longer works.

     

    Thanks,

     

    Joe H.