I am using the SSMRETRY rule in Stateman and have had several instances where tasks have reached their retry limit. I looked at the rule and found that the counter resets when the action type changes. But what if a task goes down, Stateman tries to restart it, and it keeps crashing? Eventually it hits the retry limit. At that point, aside from editing the task via 4.11.2 or updating the column in the table directly, how do you reset the retry counter?
Here are the basics of my scenario:
TASKABC is running.
User issues command: C TASKABC (not correct for them to do so but that is another topic)
TASKABC comes down. State of TASKABC becomes TERM_UP.
Stateman takes action: creates SSMRETRY, issues the S TASKABC command, and waits.
Task starts but fails in 4 seconds.
Stateman tries again.
Task again starts but fails in 3 seconds.
Stateman tries again.
Task again starts but fails in 3 seconds.
TASKABC has now been started 3 times. The next thing Stateman does is issue a message that the task has reached its retry limit and will not be restarted. This is the "Something is wrong with this task. Fix it and try again." message.
OK, so I fix whatever is wrong and go to start the task again. Same retry message.
As far as I can tell, nothing resets the counter, even after the time limit on SSMRETRY has expired. When I look at the STCTBL table, the column with the counter is always >1 for every task that uses the SSMRETRY logic (I think all of mine do). How would you handle this scenario?
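To make the behaviour concrete, here is a toy Python model of what I think I am seeing. It is not the SSMRETRY rule code, just a sketch of the logic as I understand it. The names TaskRow, last_action, retry_count, and attempt are made up for illustration, and the limit of 3 is taken from the scenario above.

```python
from dataclasses import dataclass

RETRY_LIMIT = 3  # assumption: the scenario above shows three starts before the limit message


@dataclass
class TaskRow:
    """Hypothetical stand-in for one task's row in the table; field names are invented."""
    name: str
    last_action: str = ""
    retry_count: int = 0


def attempt(row: TaskRow, action_type: str) -> bool:
    """Return True if the action may run, False if the retry limit blocks it."""
    if action_type != row.last_action:
        # The only reset I could find: the action type changed.
        row.retry_count = 0
        row.last_action = action_type
    if row.retry_count >= RETRY_LIMIT:
        # Nothing here looks at elapsed time, so the counter never clears on its own.
        return False
    row.retry_count += 1
    return True


row = TaskRow("TASKABC")
# Three failed restarts, then two more start attempts after the problem is fixed.
results = [attempt(row, "START") for _ in range(5)]
for allowed in results:
    print("S TASKABC" if allowed else "TASKABC has reached its retry limit")
```

In this model the fourth and fifth attempts are refused even though the task is fixed, because only a different action type clears the count. What I am looking for is the supported equivalent of setting retry_count back to 0.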