Workload Automation

Agent disk space monitoring?

  • 1.  Agent disk space monitoring?

    Posted 09-07-2018 10:39 AM

    We have our agents configured to monitor disk space per this info:  Configure the Agent to Monitor Available Disk Space - CA Workload Automation System Agent - 11.3 - CA Technologies Docum… 

     

    We have had an agent stop due to critical disk space (less than 20M).  However, our internal monitoring did not raise any System Admin alerts indicating a space issue.  Space for that volume still showed 30GB available.  The volume where the agent is installed is not shared with /tmp, /home, etc.

     

    I suspected that a file was placed there temporarily and then removed, but I have not been able to find any job running just prior that moved a file larger than 300MB.

     

    How is the agent calculating space, and is it looking only at space on the volume where it is installed?

    What about mounted volumes also using the /opt/... directory?  Could they affect the calculation?

    Has anyone else seen this behavior?

     

    We are running 11.3.6 SP5 (I know - we need to do some updates) on RHEL7.  Agents are: 11.3, Service Pack 6, Maintenance Level 0, Build 946

     

    Thanks.



  • 2.  Re: Agent disk space monitoring?

    Posted 09-07-2018 01:58 PM

    You need to turn on the MIBs and alert sending, then tell your SNMP guys about the alerts.

    However, instead of having the agent do the checking, most firms have separate monitors. You should just have that team monitor your machines.

    I believe in using the entire infrastructure, not just AutoSys-centric tools.



  • 3.  Re: Agent disk space monitoring?

    Posted 09-07-2018 03:00 PM

    Steve,

    We don't use SNMP alerting any longer.  We use a combination of Splunk, Nagios, and other tools for monitoring/alerting.

     

    My question is: what process does the agent use to decide on a shutdown?  How does AutoSys or the agent determine the current free space?  Is that free space measured only on the root volume, or on the volume where the CA agent is installed?  Is it querying the OS using du or some other background query?

     

    Nothing other than the AutoSys logs reports this critical low disk space issue on this server.  We have the default agent settings, and the shutdown is triggered when free space falls below the 20 MB to 10 MB thresholds.

     

    # Health Disk Resource Monitoring parameters
    #
    agent.resourcemon.threshold.disk.warning.notice=30M
    agent.resourcemon.threshold.disk.warning.severe=20M
    agent.resourcemon.threshold.disk.critical=10M



  • 4.  Re: Agent disk space monitoring?

    Posted 09-07-2018 03:39 PM


  • 5.  Re: Agent disk space monitoring?

    Posted 09-11-2018 09:23 AM

    It's still unclear how the directory would ever reach only 10 or 20MB of free space when the volume in use (/opt) shows 73GB available and the low has been ~30GB.

     

    What is queried to determine the directory or disk space available?

     

    Is this still a useful option to have enabled if it results in failures when we do not see high disk space utilization?



  • 6.  Re: Agent disk space monitoring?

    Posted 09-11-2018 09:27 AM

    If you ask me, I wouldn't use it at all. I would talk to the system guys, ask them what they use to monitor disk space, and use that. I feel the remote agent should do its job and let something else worry about monitoring whether it's actually working.

    Just my 3 cents.

     

    Steve C.



  • 7.  Re: Agent disk space monitoring?

    Posted 09-11-2018 09:35 AM

    I tend to agree with you, Steve.  We ran this for over a year before migrating to our new data center.  It did indeed alert us when something consumed the space on the old servers.  However, now it seems to be more of a false alarm, and it causes production stoppages in job processing on a specific server.



  • 8.  Re: Agent disk space monitoring?

    Posted 09-17-2018 09:30 AM

    Hi,

    I believe you are looking for some very specific functions in the agent.  The CA WA Agent uses simple disk I/O functions to get the free space.  These functions are part of the OS and of the language in which the application is written.  You can check this external link for reference, where a simple I/O function provides the free disk space.

     

    The agent depends on the OS to provide the exact free space.  If for some reason you are not getting the desired results, then there may be some other underlying issue.  Normally, disk monitoring should only be done on local mounts or drives.
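
The thread does not show which OS call the agent actually makes, so the following is only a minimal sketch of how a program can ask a POSIX system for free space on the filesystem that holds a given path. The function name `free_space_report` is hypothetical, and nothing here is taken from the agent's code. It illustrates two points relevant to the question: the figures belong to whichever filesystem is mounted at or above the path (so a separate mount under /opt reports its own numbers), and "free" can differ depending on reserved blocks and inode availability.

```python
import os


def free_space_report(path):
    """Return the free-space figures the OS reports for the filesystem holding `path`.

    Hypothetical helper for illustration only; POSIX systems (os.statvfs) only.
    """
    st = os.statvfs(path)  # one call, so all figures come from the same snapshot
    return {
        # Blocks free for privileged users (includes reserved blocks).
        "free_for_root_bytes": st.f_bfree * st.f_frsize,
        # Blocks free for unprivileged users (what most tools call "available").
        "free_for_user_bytes": st.f_bavail * st.f_frsize,
        # Free inodes: a filesystem can refuse writes with bytes still free.
        "free_inodes": st.f_favail,
    }


if __name__ == "__main__":
    # Compare the root filesystem with the current directory's filesystem.
    for path in ("/", os.getcwd()):
        print(path, free_space_report(path))
```

Running this against the agent's install directory and against each mount point beneath it would show which filesystem's numbers a path-based query actually returns.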

     

    Thank you,

    Nitin Pande

    CA Technologies 



  • 9.  Re: Agent disk space monitoring?

     
    Posted 09-13-2018 09:11 AM

    Hi Gene-LM

     

    You can adjust the monitoring thresholds with the parameters you listed above, as described in the documentation:

    To modify the monitoring thresholds, change the agentparm.txt file as described in the following steps:

    Ensure that monitoring is enabled (true is the default):
    agent.resourcemon.enable=true

    Set the following parameters to adjust the monitoring thresholds:

    agent.resourcemon.threshold.disk.warning.notice=n
    Where n is the limit at which the agent sends a message, but the agent keeps running. By default, the agent sends a message if there is less than 21 MB remaining.
    Default: 21 MB

    agent.resourcemon.threshold.disk.warning.severe=n
    Where n is the limit at which the agent issues a severe warning message. The agent continues to run but it stops receiving messages. When the free disk space is higher than this limit, the agent resumes normal processing.
    Default: 20 MB

    agent.resourcemon.threshold.disk.critical=n
    When this limit is reached, the agent shuts down.
    Default: 10 MB

    Shut down and restart the agent.
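
To make the three threshold levels concrete, here is a small sketch that classifies a free-space reading against them. It is not the agent's logic: the function names `parse_size` and `classify` are hypothetical, the defaults are the 30M/20M/10M values from the agentparm.txt excerpt earlier in the thread, and treating `M` as 1024 * 1024 bytes with a strict "less than" comparison is an assumption.

```python
# Assumed unit multipliers; the thread does not say whether M means 10^6 or 2^20.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a value such as '30M' to bytes (hypothetical helper)."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # no suffix: treat as plain bytes


def classify(free_bytes, notice="30M", severe="20M", critical="10M"):
    """Map a free-space reading to the level described in the steps above."""
    if free_bytes < parse_size(critical):
        return "critical"  # agent shuts down
    if free_bytes < parse_size(severe):
        return "severe"    # agent keeps running but stops receiving messages
    if free_bytes < parse_size(notice):
        return "notice"    # agent sends a message and keeps running
    return "ok"


if __name__ == "__main__":
    for free in (30 * 1024 ** 3, 25 * 1024 ** 2, 15 * 1024 ** 2, 0):
        print(free, classify(free))
```

Under these assumptions, a reading of 0 bytes goes straight to the critical level without passing through the warning levels.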

     

     

    Best regards,

     

    Faouzia



  • 10.  Re: Agent disk space monitoring?

    Posted 09-13-2018 09:18 AM

    Thanks.  However no one has been able to tell me how the agent monitors space.  Can CA provide that?

     

    Our agent shut down based on the critical low disk space threshold (defaults).  However, the volume in use never triggered any alerting from our other monitoring.  Free space available was at least 30GB.

     

    Seems this feature is not worth using if it causes false alerts and production agent outages.



  • 11.  Re: Agent disk space monitoring?



  • 12.  Re: Agent disk space monitoring?

    Posted 09-13-2018 10:12 AM

    Thanks again.  I have read the first 2 links previously while researching the issue.  The 3rd focuses more on a job configuration.

    Still unsure how the agent determines the space for the agentparm configuration. 

     

    I will disable this setting and rely on other outside monitoring tools.   

     

    Thanks to those that have replied.



  • 13.  Re: Agent disk space monitoring?

    Posted 09-13-2018 10:33 AM

    I'm not sure how it monitors the space, but we have had the issue several times where the agent suddenly reports 0 bytes free on the disk (never even hitting the warning levels first) and shuts down.  This has happened on both Windows and Linux.  By the time we log in a few minutes later, there are multiple GB free.

     

    However, our server engineering teams' research has shown that other applications also complained about being unable to allocate space at the exact same time.  So we are convinced it happened, and it seems that none of our monitoring tools detected anything.  Because AutoSys is actively running an agent directly on the server, whereas the monitoring tools only poll the server periodically, it is able to "see" the sudden use of all of the resources on the disk.

     

    While we have yet to identify the cause, due to a lack of any evidence once we get on the servers, our server engineering teams are saying that none of them have been false alarms.  But only AutoSys detected it and alerted us.  Several happened while AutoSys was actively running jobs.  We suspect an application crashed and attempted to write out a dump/core file, the disk filled, everything stopped, and the OS deleted the incomplete dump/core file.
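
One way to gather evidence for a short-lived disk fill like the one described above is to sample free space at a much shorter interval than a periodic monitor and keep the lowest reading. The following is a minimal sketch of that idea, not something any poster in the thread used: the function name `watch_min_free` and its parameters are hypothetical, and `shutil.disk_usage` is only the default way of reading free space.

```python
import shutil
import time


def watch_min_free(path, samples, interval_s=1.0, get_free=None, sleep=time.sleep):
    """Sample free space `samples` times and return (timestamp, bytes) of the lowest reading.

    `get_free` and `sleep` are injectable so the loop can be exercised without a real disk.
    """
    if get_free is None:
        get_free = lambda p: shutil.disk_usage(p).free  # free bytes on path's filesystem
    lowest = None
    for i in range(samples):
        free = get_free(path)
        if lowest is None or free < lowest[1]:
            lowest = (time.time(), free)  # remember when the low point was seen
        if i < samples - 1:
            sleep(interval_s)
    return lowest


if __name__ == "__main__":
    # Ten one-second samples of the current directory's filesystem.
    when, low = watch_min_free(".", samples=10, interval_s=1.0)
    print(time.ctime(when), low)
```

A sampler like this still misses dips shorter than its interval, so it narrows the window rather than guaranteeing detection; the recorded timestamp can then be matched against job start times and application logs.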