We are running UIM 9.0.2 and are monitoring several external systems. The sql_response probe is causing us some problems because it will not clear alarms. An example is a profile that runs a simple query that counts the number of rows in a table with a given status. Under "No Record Returned" the suppression key is set to value, and the severity to clear. The alarm is set in the Value tab. Comparison = numeric and operator is >=. The value in high threshold is set to 1, severity = minor. The value in low threshold is empty and severity = inactive. Clear is of course set with severity = clear.
The alarm is triggered correctly, but when the query returns no records, the alarm is not cleared. Restarting the probe clears the alarm. I've been racking my brain trying to figure out what the problem is, since this happens on all our monitored systems.
Does anyone have any idea what could be causing this?
What is the version of the probe?
Test it out with robot_update 7.97.
Do you have the option to set the Suppression key to count instead of value?
From the sql_response Help doc:
Note: In the Query Response time and Row count sections, you must specify a low threshold Value when the following features are enabled:
- Alarm and QoS in the Response time and Row count tabs
- The No Record Returned section in the <Query Name> node
***Specify 0 as the low threshold Value in these sections if the Suppression key is set as count in the No Record Returned section in the <Query Name> node. If you do NOT specify any value, the probe CLEARS any alarm generated for no records.
You can run a bogus query against a CA UIM table to simulate the no records returned result such as:
select * from CM_DEVICE where dev_name = 'me'
Test it and you should get no rows/nothing returned. If that is not the case, alter the query.
For the row count I left the Low Threshold field empty.
Then every time the query ran, the alarm was generated but then cleared as expected as you can tell from the nas transaction history view.
Can you include the query and configuration?
You state "An example is a profile that runs a simple query that counts the number of rows in a table with a given status." but it sounds like you are talking about the "rows returned" portion of the configuration.
A query that counts rows should always be returning a value. And from the description of your question you should be using only the value tab settings.
The sql_response probe clears a value alarm only when it receives a valid value within the specified range. When the query returns no rows, that condition is not met either, so the value alarm does not get cleared.
A value alarm will only be cleared when the sql_response probe receives a value in the specified range - this behavior is expected and as per the probe's current design. So, in effect, they are treated as 2 separate alarms. One for the no rows returned and one for the value compared against the threshold.
Right - but "select count(*) from table where status = x", a simple query that counts the number of rows in a table with a given status, will always return a result. So why mess with the rows returned section of the configuration?
If all you care about is the number of matching rows, do the count in the database engine and return that count instead of returning all matching rows and counting them in the probe.
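To illustrate the difference between the two query styles, here is a minimal sketch using Python's built-in sqlite3 module. The table name `jobs`, its columns, and the status values are made up for the example and are not taken from anyone's actual configuration; the target database in a real profile would of course be different.

```python
import sqlite3

# Hypothetical table and status values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO jobs (status) VALUES (?)", [("DONE",), ("DONE",)])

# Returning the matching rows: the result is empty when nothing matches,
# which is the "no record returned" case.
rows = conn.execute(
    "SELECT id, status FROM jobs WHERE status = 'FAILED'"
).fetchall()

# Counting in the database: the result is always exactly one row,
# holding 0 when nothing matches.
count_rows = conn.execute(
    "SELECT COUNT(*) FROM jobs WHERE status = 'FAILED'"
).fetchall()

print(rows)        # []
print(count_rows)  # [(0,)]
```

With the count form there is always a value to compare against the threshold, so the value alarm has something to clear on.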
Thanks for the input guys! Had a bit of a noob moment when I realized that it was indeed the construction of the queries that was the problem. Doing a simple count worked fine, but in some cases we need to extract more info so that we can use columns as variables in the alarm message. And some of those queries would return an empty result and thus the alarms were not cleared. So we just need to construct better queries.
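One possible way to build such a query, sketched here with Python's sqlite3 module: an aggregate query without GROUP BY returns exactly one row even when nothing matches, so extra columns can be carried along through aggregates such as MAX(). The `jobs` table, its columns, and the `check` helper are hypothetical, and how the probe renders a NULL column in an alarm message is not something this sketch verifies.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs (name, status) VALUES (?, ?)",
    [("import", "DONE"), ("export", "FAILED")],
)

# Aggregates without GROUP BY always yield one row, so the count is
# available for the threshold and sample_name for the message text.
QUERY = """
SELECT COUNT(*)  AS failed_count,
       MAX(name) AS sample_name
FROM jobs
WHERE status = ?
"""

def check(status):
    """Run the query for one status and return all result rows."""
    return conn.execute(QUERY, (status,)).fetchall()

with_match = check("FAILED")
no_match = check("STUCK")

print(with_match)  # [(1, 'export')]
print(no_match)    # [(0, None)]
```

In the no-match case the count is 0 and the extra column is NULL, but a row is still returned, so the value comparison can run on every interval.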