Automic Workload Automation


 Best way to integrate LLM call for error notifications

Alex Wiles posted Dec 09, 2024 04:59 AM

Hi everyone,

I want to enhance our standard error notifications with LLM-generated analysis and formatting. I'm looking for implementation recommendations that are both easy to drop in and reliable. My ideal requirements are:

  • Minimal manual configuration - I need a solution that doesn't require modifying individual jobs or workflows, as we have hundreds of them with different people and teams responsible for them
  • Must fall back to standard notifications if the LLM service is unavailable

Has anyone attempted something similar or can suggest other approaches? Particularly interested in solutions that could be implemented at a higher level to avoid manual job modifications.

Joel Wiesmann

Hi Alex

Hands up - I have some experience. First, you should give some thought to where you want to deploy the LLM. If you can access the internet, OpenAI would be interesting; they document their API (and pricing) on their homepage. AWS Bedrock / Azure / GCP might be options too. Last but not least, you could use a locally deployed Hugging Face model, but if you run it on common server hardware, the speed might not be satisfactory.

I'm mostly experienced with the OpenAI / Bedrock models; both are really easy to use. OpenAI exposes REST web services, and Bedrock can be accessed easily via the Boto3 Python client. It depends on what you plan to use to access the models - REST JOBS objects or OS JOBS with some scripting (my favorite is #2: you have more possibilities and you're just faster in implementation).

Usually, when you create automations, you implement some kind of escalation workflow that is triggered in case of issues. You can use this workflow to extend the escalation with LLM capabilities. What I did was gather the object definition and the failed job's report via the Automic REST API and inject them into the prompt for error analysis. There are many more possibilities to extend this, like RAG with a knowledge database or agents (many LLMs feature tools support / agents).
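
Just to give an idea of the shape of such a script (not a drop-in implementation - the host names, REST endpoint paths, model name and credentials below are placeholders, and it assumes an OpenAI-compatible chat endpoint; check the AE REST API docs for your version):

```python
# Rough sketch: pull a failed task's details from the Automic AE REST API
# and ask an LLM for an error analysis. All names/paths are illustrative.
import requests

AE_BASE = "https://automic.example.com:8088/ae/api/v1/0100"   # AE REST API, client 0100 (placeholder)
LLM_URL = "https://llm.example.com/v1/chat/completions"       # any OpenAI-compatible endpoint (placeholder)
AE_AUTH = ("AUTOMIC_USER/DEPT", "secret")                     # use a technical user in practice

def fetch_failed_task(run_id: int) -> str:
    """Collect execution details and the job report for a given RunID."""
    exec_info = requests.get(f"{AE_BASE}/executions/{run_id}", auth=AE_AUTH, timeout=30).json()
    report = requests.get(f"{AE_BASE}/executions/{run_id}/reports", auth=AE_AUTH, timeout=30).json()
    return f"Execution:\n{exec_info}\n\nReport:\n{report}"

def analyze(run_id: int) -> str:
    """Send the gathered context to the LLM and return its analysis."""
    prompt = ("You are an Automic operations assistant. Analyze the failed task below, "
              "explain the likely root cause and suggest next steps.\n\n" + fetch_failed_task(run_id))
    resp = requests.post(
        LLM_URL,
        json={"model": "my-model",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(analyze(12345678))  # RunID handed over by the escalation workflow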

If you're into the topic, check out my YouTube video where I demonstrate using the ChatGPT Plugin to interact with Automic (https://www.youtube.com/watch?v=Oj3hI7iQiBc&t). It's in German, but YouTube has automated English subtitles. The video is from June 2023; in the meantime you would use agents instead of plugins.

Regards
Joel

Alex Wiles

@Joel Wiesmann Thanks for the quick reply Joel. We have our own local server running the latest LLMs. I watched your video (at this point I'm sure 50% of the Automic user base is German). I think the Automic REST API does exactly what I'm looking for, but I haven't tested it yet; I'm happy to know that it works, though. How often did you check the execution lists? Once a day? Every few seconds? I want to avoid unnecessary API calls and make it more event-based. My idea, which I came up with yesterday, is to intercept the error notification email (e.g. by sending it to some technical/microservice mailbox) and then either send it to the LLM or forward it to the recipients if the server is unavailable. Still thinking about it...
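
Roughly along these lines (just a sketch of the fallback logic - the host names, mailbox handling, addresses and model name are placeholders, and it assumes our local server speaks an OpenAI-compatible API):

```python
# Sketch: try to enrich the notification via the local LLM; if the LLM
# server is unreachable, forward the original mail unchanged.
import smtplib
from email.message import EmailMessage

import requests

LLM_URL = "http://llm.internal:8000/v1/chat/completions"  # local OpenAI-compatible server (placeholder)
SMTP_HOST = "mail.internal"                                # internal relay (placeholder)
RECIPIENTS = ["oncall-team@example.com"]

def enrich_or_passthrough(notification_text: str) -> str:
    """Return an LLM-enhanced notification, or the original text on any failure."""
    try:
        resp = requests.post(
            LLM_URL,
            json={"model": "local-model",
                  "messages": [{"role": "user",
                                "content": "Summarize and analyze this Automic error notification:\n\n"
                                           + notification_text}]},
            timeout=20,
        )
        resp.raise_for_status()
        analysis = resp.json()["choices"][0]["message"]["content"]
        return analysis + "\n\n--- original notification ---\n" + notification_text
    except Exception:
        return notification_text  # fallback: pass the standard notification through as-is

def forward(body: str) -> None:
    """Forward the (enriched or original) notification to the real recipients."""
    msg = EmailMessage()
    msg["Subject"] = "Automic error notification"
    msg["From"] = "automic-notify@example.com"
    msg["To"] = ", ".join(RECIPIENTS)
    msg.set_content(body)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)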

Joel Wiesmann

Hello Alex

I can't reply to your reply, because I think this thread was opened as a ‘question’ and these types of threads aren't meant for back-and-forth discussion. So I'll just add another reply and hope it reaches you.

You mention an ‘error notification email’ - this is probably a CALL operator object triggered somewhere in your escalation logic. That escalation logic is the area that should be extended with LLM functions, and it would be event-based, which is how you envision the solution.

Sending mails to a microservice which then forwards them sounds improvised to me. Better to trigger the LLM logic as a separate job in Automic; if something doesn't work out properly, you can make use of all the nice features Automic offers (like restarting tasks or reading reports). That's the central, orchestrated approach. If you involve external systems, you need to build and monitor them, which makes the lifecycle and operations unnecessarily complicated.

I am curious which LLM you are using and would be happy if you shared it.

Yeah, the German-speaking user base (I'm Swiss) is pretty large. I'm currently interested in extending my Automic network in the English-speaking world, so feel free to connect with me on LinkedIn: https://www.linkedin.com/in/joel-wiesmann/

Alex Wiles

@Joel Wiesmann

> You mention an ‘error notification email’ - this is possibly a call operator object triggered somewhere in some escalation logic. This escalation logic is probably the area that should be extended with LLM functions and would be event-based, which is how you envision the solution.

I really do mean an email. To be exact, I have a CALL object which sends an email when called and attaches the logs from the job that called it.

> Better trigger the LLM logic as a separate job in Automic and if something would not work out properly, you can make use of all the nice features Automic offers you (like restarting tasks or read reports).

See, this is the problem. Creating workflows is possible, but then I would have to go and manually change the postconditions on every workflow from "old call object" to "new workflow". So I'm aiming for something quick and dirty, low complexity, an easy-to-convince kind of thing. My idea was that the volunteers could just add the email address I'll provide to the VARA object and that's it - no overhead. I've sent you an invite :)

Joel Wiesmann

If you want to *pitch* the idea, you can use ACTIVATE_UC_OBJECT (AE script) in the CALL operator object to trigger the "LLM workflow". If you want a sustainable solution, I'd recommend replacing the CALL object with a JOBP anyway - it gives you far more flexibility, and there are ways to do this efficiently.

Broadcom Employee Michael Dolinek

Hi @Alex Wiles

You might consider using the new AE script function ASK_AI, which was introduced in Automic Automation v24.4 (see https://docs.automic.com/documentation/webhelp/english/AA/24.4/DOCU/24.4/Automic%20Automation%20Guides/Content/_Common/ReleaseHighlights/WhatsNew_24_4_0.htm ).

Michael