VMware Aria

  • 1.  Aria Automation Monitoring

    Posted Jan 07, 2025 09:11 AM

    How are people monitoring their on-prem deployments of Aria Automation and Orchestrator? I have connected them to Operations and Operations for Logs. Logs seems to give the best ability to monitor the service for problems, but the out-of-the-box dashboards and alerts provide too much information. I can see lots of events, warnings and critical alerts, yet everything is still running without issues.

    For example some of the Alerts I am getting out of Logs:

    CRITICAL: JDBC Connection Error

    WARNING - Garbage Collection Failed

    CRITICAL - Failed To Establish Connection

    WARNING - Configuration File Error

    The dashboard shows lots of events, but so what?

    Operations seems more focussed on providing a view of things deployed via Automation and grouped in a project/deployment rather than monitoring the Automation Service itself - am I wrong?



  • 2.  RE: Aria Automation Monitoring

    Posted Jan 08, 2025 05:37 AM

    vRA creating tons of log entries is nothing new, sadly:

    We just create alerts on events that actually did or could soon cause a service interruption, for example: 

    • when heap memory usage is too high
    • certain vRO/ABX workflows fail
    • storage on vRA nodes gets low because of too many Java heap dumps following a workflow failure
    • problems with user validation/authentication

    There is also a limited number of pre-defined alerts under Alert Definitions, some of which we have activated.




  • 3.  RE: Aria Automation Monitoring

    Posted Jan 08, 2025 08:46 AM

    Thanks for your reply. I'm being cheeky but could you post the alerts you've actually got enabled?

    And what are the actions you take when these alerts occur:

    Heap memory usage too high

    Storage on vRA nodes gets low - how do you monitor this, as we can't install the Telegraf agent on the appliances?




  • 4.  RE: Aria Automation Monitoring

    Posted Jan 09, 2025 04:17 AM

    Prelude logs contain an event for low storage space:

    For out-of-memory events I just use a generic lookup for "java.lang.OutOfMemoryError" on the "vco-app" app. This triggers when vco pods crash. Only recently, in one of the Aria releases, did it finally become possible to set the heap size for vco, so I hope I will not see this event often:
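
    For reference, the logic of the generic lookup described above can be sketched in Python. This is only an illustration of the matching rule, not Operations for Logs query syntax: the event field names `appname` and `text`, the function name, and the sample events are all assumptions made up for the example.

```python
from typing import Iterable, Iterator

# Values taken from the post: the text to look for and the app to look on.
OOM_MARKER = "java.lang.OutOfMemoryError"
VCO_APP = "vco-app"


def find_vco_oom_events(events: Iterable[dict]) -> Iterator[dict]:
    """Yield events from the vco-app app whose text mentions an OutOfMemoryError.

    The keys "appname" and "text" are hypothetical field names.
    """
    for event in events:
        if event.get("appname") == VCO_APP and OOM_MARKER in event.get("text", ""):
            yield event


# Made-up sample events, for illustration only.
sample_events = [
    {"appname": "vco-app", "text": "java.lang.OutOfMemoryError: Java heap space"},
    {"appname": "vco-app", "text": "Workflow finished"},
    {"appname": "other-app", "text": "java.lang.OutOfMemoryError: Java heap space"},
]

matches = list(find_vco_oom_events(sample_events))
print(len(matches))
```

    Matching on both the app and the exception text keeps the alert narrow: an OutOfMemoryError logged by some other component would not trigger it.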