AutoSys Workload Automation

  • 1.  autosys KPI

    Posted Jul 18, 2022 01:40 PM
    I'm looking for KPI ideas for AutoSys.
    1. EP uptime
    2. EP latency
    3. number of jobs
    4. number of runs
    5. percentage of failed vs total runs
    6. number of machines
    7. number of job owners
    What else can we measure?
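
    For item 5, the arithmetic could look roughly like the sketch below. The run records, the status names, and the choice of which statuses count as "failed" are all assumptions for illustration; how you export run history from AutoSys is up to you.

```python
# Hypothetical run records: (job_name, final_status). These are made up
# for illustration; real data would come from your own run-history export.
runs = [
    ("job_a", "SUCCESS"),
    ("job_a", "FAILURE"),
    ("job_b", "SUCCESS"),
    ("job_c", "TERMINATED"),
    ("job_b", "SUCCESS"),
]

# Assumption: which final statuses count as a failed run.
FAILED_STATUSES = {"FAILURE", "TERMINATED"}


def failure_percentage(records):
    """Return failed runs as a percentage of all runs (0.0 if there are no runs)."""
    if not records:
        return 0.0
    failed = sum(1 for _, status in records if status in FAILED_STATUSES)
    return 100.0 * failed / len(records)


def distinct_jobs(records):
    """Return how many different jobs appear in the run records."""
    return len({job for job, _ in records})


if __name__ == "__main__":
    print(f"runs: {len(runs)}")
    print(f"jobs: {distinct_jobs(runs)}")
    print(f"failed: {failure_percentage(runs):.1f}%")
```

    Deciding up front whether a terminated run counts as a failure matters more than the code, since it changes the number you report.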

  • 2.  RE: autosys KPI

    Posted Jul 19, 2022 03:44 AM

    The number of jobs defined could also be measured.


  • 3.  RE: autosys KPI

    Broadcom Employee
    Posted Jul 19, 2022 03:53 AM
    Perhaps another measure worth considering would be number of manual interventions, e.g. sendevents.

  • 4.  RE: autosys KPI

    Posted Jul 19, 2022 09:18 AM
    * Number of job runs per hour of the day
    * Number of jobs that won't run (no-exec, on-ice, on-hold)
    * Number of stale jobs (haven't run since x)
    * Number of cred owners with no jobs
    * Number of machines with no jobs
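
    A minimal sketch of the first and third bullets, assuming you already have run start times and a last-run timestamp per job as Python datetimes. The job names, dates, and the 90-day threshold are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def runs_per_hour(start_times):
    """Count runs by hour of day (0-23); hours with no runs report 0."""
    counts = Counter(ts.hour for ts in start_times)
    return [counts.get(hour, 0) for hour in range(24)]


def stale_jobs(last_run_by_job, now, max_age):
    """Return sorted names of jobs that never ran (None) or last ran before now - max_age."""
    cutoff = now - max_age
    return sorted(
        name
        for name, last_run in last_run_by_job.items()
        if last_run is None or last_run < cutoff
    )


if __name__ == "__main__":
    # Hypothetical sample data.
    now = datetime(2022, 7, 19, 9, 0)
    start_times = [
        datetime(2022, 7, 19, 1, 5),
        datetime(2022, 7, 19, 1, 40),
        datetime(2022, 7, 18, 23, 0),
    ]
    last_run = {
        "nightly_load": datetime(2022, 7, 19, 1, 5),
        "old_report": datetime(2022, 1, 3, 4, 0),
        "never_ran": None,
    }
    print(runs_per_hour(start_times))
    print(stale_jobs(last_run, now, timedelta(days=90)))
```

    Treating "never ran" as stale is a design choice here; you may prefer to report those jobs separately.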

  • 5.  RE: autosys KPI

    Posted Jul 19, 2022 12:30 PM
    Hey Pavel.  I ran through an exercise like this not too long ago.  I agree with everything you've mentioned and the things others have mentioned too.  In our case my prober (Prometheus, written in Python) also checks:

    * How far behind the DB is: how old the oldest unprocessed event in ujo_event is, and how many unprocessed events there are.
    * Skew time for each event: since the last scan, how long each event in ujo_proc_event took to go from init_status_stamp to que_status_stamp.  Build a histogram and track the skew SLI over time.  Good for an SLO.
    * Discrepancy between what the config file thinks about DB status and what alamode in each DB thinks about its own status.
    * Blackbox user journey: how long it takes to go end-to-end from force starting a /bin/true job to reaching an end state.
    We're also using a dashboard that reports the migration ratio (jobs in old instance / jobs in new instance) for instances under migration.
    I'm sure there's stuff you can monitor for your GUI as well, like authentication failures and the usual lot.
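
    To make the backlog and skew checks concrete, here is a rough sketch of the calculations only. It assumes the init_status_stamp / que_status_stamp pairs and the unprocessed event timestamps have already been fetched and converted to Python datetimes; the bucket bounds, the 5-second threshold, and all function names are hypothetical, not taken from the actual prober.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Illustrative bucket upper bounds in seconds; tune to your own SLO.
BUCKETS = [1, 5, 15, 60, 300]


def skew_seconds(rows):
    """rows: (init_status_stamp, que_status_stamp) datetime pairs -> skew in seconds."""
    return [(que - init).total_seconds() for init, que in rows]


def histogram(values, bounds):
    """Count values per bucket (value <= bound); the last slot is overflow."""
    counts = [0] * (len(bounds) + 1)
    for value in values:
        counts[bisect_left(bounds, value)] += 1
    return counts


def fraction_within(values, threshold):
    """SLI: share of events whose skew is at most threshold seconds (1.0 if no events)."""
    if not values:
        return 1.0
    return sum(1 for value in values if value <= threshold) / len(values)


def oldest_unprocessed_age(event_stamps, now):
    """Age in seconds of the oldest unprocessed event (0.0 if the backlog is empty)."""
    if not event_stamps:
        return 0.0
    return (now - min(event_stamps)).total_seconds()


if __name__ == "__main__":
    # Hypothetical sample data.
    t0 = datetime(2022, 7, 19, 12, 0, 0)
    rows = [(t0, t0 + timedelta(seconds=s)) for s in (0.5, 3, 5, 400)]
    skews = skew_seconds(rows)
    print(histogram(skews, BUCKETS))
    print(fraction_within(skews, 5))
    backlog = [t0, t0 + timedelta(seconds=30)]
    print(oldest_unprocessed_age(backlog, t0 + timedelta(seconds=60)))
```

    The fraction of events under the threshold is the SLI; the SLO is then a target on that fraction over some window, for example "99% of events queued within 5 seconds".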

    Good luck!

  • 6.  RE: autosys KPI

    Broadcom Employee
    Posted Aug 09, 2022 09:15 PM
    Hi All,

    These are all good items to track.  Should they be divided between 'service' data and what I would call usage data?
    For the service, I would consider uptime for the components, DB, and agents (going offline/missing), plus the performance of each.
    The usage KPIs are throughput, failure rate, manual sendevents, and job alarms (different from job failures, since a failure does not always raise an alarm).