DX Operational Observability

Get Started with GenAI & LLM Observability with DX Operational Observability

By Ashish Aggarwal posted Apr 11, 2026 11:13 AM

  

Generative AI (GenAI) is quickly transitioning from experimental chatbots to enterprise-critical applications. As organizations integrate Large Language Models (LLMs) into their products, services, and core workflows, often by leveraging the Spring AI framework, new challenges are emerging around AI performance, cost, and quality of output. By taking advantage of data available to IT operations, enterprises can make well-informed decisions and avoid the pitfalls common to helpful but disruptive technologies.

Welcome to the world of LLM Observability

Traditional metrics like CPU and memory are no longer enough to explain why an AI-driven service is slow, expensive, or failing to deliver quality output.

With DX Operational Observability (DX O2) 26.3.1 (SaaS), Broadcom introduces dedicated APM support for GenAI. This enhancement is a critical first step toward comprehensive LLM observability. Described below with tips to get started, it gives SREs and DevOps teams "single pane of glass" visibility into the AI-specific data that helps drive AI success.


The Technical Core: Instrumented Spring GenAI Extension

At the heart of this release is the new Spring GenAI extension for the DX O2 Java Agent. Designed specifically for applications built on Spring Boot 3.0.x and 3.5.x, this extension provides deep, code-level instrumentation of the Spring AI ecosystem.

By hooking into core Spring AI interfaces, DX O2 automatically captures telemetry from critical components of your AI pipeline. These include:

  • ChatModel and ChatClient: Tracks every interaction with the LLM, from the moment a prompt is submitted to the final completion.

  • VectorStore: Provides visibility into the latency and success rates of retrieving context from databases like Milvus, Pinecone and PostgreSQL.

Value-Driven Visibility: Beyond Traditional APM

Fundamental to monitoring an LLM are:

  • Unit economics

  • Performance

  • Reliability

DX O2 26.3.1 addresses these requirements through the new APM GenAI Dashboards, which align to three pillars of value.

 

1. Token Economics and Cost Control

LLM pricing is based on input and output tokens. Few organizations formally budgeted for AI at the outset, yet usage is skyrocketing. By monitoring token consumption, IT operations teams are well positioned to help the broader organization understand costs in near real time. This is a significant benefit, since a rogue prompt or an inefficient RAG pipeline can lead to unexpected cloud expenses. Within DX O2, the GenAI extension reports:

  • Prompt Tokens vs. Completion Tokens: Identify where actual consumption is highest.

  • Usage by Model and Provider: Compare the cost-efficiency of OpenAI, Azure OpenAI, Google Vertex AI, and Anthropic side-by-side.
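To make the token metrics concrete, here is a minimal sketch of how prompt and completion token counts might be rolled up into an estimated cost per model. The class, the record fields, the model names, and the per-1,000-token prices are all hypothetical placeholders; they are not DX O2 APIs or real provider pricing, so substitute your provider's published rates.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of DX O2 or Spring AI.
final class TokenCostEstimator {

    // One LLM call as it might be exported from token-usage telemetry.
    record Usage(String model, long promptTokens, long completionTokens) {}

    // Placeholder prices in USD per 1,000 tokens (assumed values).
    record Price(double promptPer1k, double completionPer1k) {}

    // Sums the estimated cost per model; calls for models without a price are skipped.
    static Map<String, Double> costByModel(List<Usage> calls, Map<String, Price> prices) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Usage u : calls) {
            Price p = prices.get(u.model());
            if (p == null) {
                continue; // unknown model: no price to apply
            }
            double cost = u.promptTokens() / 1000.0 * p.promptPer1k()
                        + u.completionTokens() / 1000.0 * p.completionPer1k();
            totals.merge(u.model(), cost, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Price> prices = Map.of(
            "model-a", new Price(0.50, 1.50),
            "model-b", new Price(0.25, 0.75));
        List<Usage> calls = List.of(
            new Usage("model-a", 2000, 1000),
            new Usage("model-b", 4000, 2000),
            new Usage("model-a", 1000, 0));
        costByModel(calls, prices).forEach(
            (model, usd) -> System.out.printf("%s: $%.4f%n", model, usd));
    }
}
```

Keeping prompt and completion prices separate matters because providers commonly charge different rates for each, which is why the split between the two token types is worth tracking.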

2. Performance and Latency Triage

LLM calls typically have much higher latency than traditional REST API calls. DX O2 allows you to isolate the precise source of the delay. The components to examine include:

  • Model Latency: The time taken by the provider to generate a response.

  • Orchestration Overhead: Time spent in the Spring AI framework or RAG retrieval steps.

  • Time-to-First-Token (TTFT): Crucial for user experience in streaming chat applications.
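As a rough illustration of how these three measurements relate, the sketch below derives them from a handful of timestamps for a single streaming request. The record and its field names are hypothetical and are not DX O2 attribute names. It also assumes TTFT is measured from the start of the provider call; some teams measure it from the start of the user request instead.

```java
// Hypothetical helper for illustration; field names are not DX O2 attributes.
final class LatencyBreakdown {

    // Timestamps in milliseconds for one streaming LLM request.
    record Timing(long requestStart, long providerCallStart,
                  long firstTokenAt, long providerCallEnd, long requestEnd) {}

    // Time the provider spent generating the full response.
    static long modelLatencyMs(Timing t) {
        return t.providerCallEnd() - t.providerCallStart();
    }

    // Time spent outside the provider call (framework work, RAG retrieval, and so on).
    static long orchestrationOverheadMs(Timing t) {
        return (t.requestEnd() - t.requestStart()) - modelLatencyMs(t);
    }

    // Time from the start of the provider call until the first streamed token arrives.
    static long timeToFirstTokenMs(Timing t) {
        return t.firstTokenAt() - t.providerCallStart();
    }

    public static void main(String[] args) {
        Timing t = new Timing(0, 120, 520, 2120, 2150);
        System.out.println("model latency ms: " + modelLatencyMs(t));
        System.out.println("orchestration overhead ms: " + orchestrationOverheadMs(t));
        System.out.println("TTFT ms: " + timeToFirstTokenMs(t));
    }
}
```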

3. Reliability in a Non-Deterministic World

Unlike traditional code, LLMs can fail in "creative" ways. Rate limits may cause unexpected behaviors; safety filters may degrade the user experience; and hallucinations or inappropriate use of source data may produce unusable output. Each of these issues can slow AI adoption, hurt business results, and add risk to the enterprise.

DX O2 captures information that can help IT identify and diagnose these issues:

  • Model-Specific Errors: Distinguish between network timeouts and provider-side errors.

 

| KPI Name | Source | Purpose |
| --- | --- | --- |
| gen_ai.client.operation.duration | Spring AI | Measures the full round-trip of the LLM call. High duration without a finish_reason suggests a network hang. |
| gen_ai.client.token.usage | Spring AI | If input_tokens > 0 but output_tokens == 0, the failure happened after the provider received the request but before generation finished. |
| http.client.requests | Spring Boot | Standard Micrometer metric. Use this to catch 408 (Request Timeout) or 504 (Gateway Timeout) before the GenAI layer even processes a response. |

 

Technical Tips to Isolate the Source of Issues

In DX O2, begin by creating a "Triage Map" using the following attributes to isolate the root cause:

1. Identifying Network Timeouts

If the error is network-related, the GenAI-specific attributes will often be missing or null because the request never receives a complete response from the provider.

  • KPI Filter: http.client.requests where status is 408 or 504.

  • Observation Attribute: Look for error.type = java.net.SocketTimeoutException or io.netty.handler.timeout.ReadTimeoutException.

DX O2 Signal: In the Service Topology view, the link between your Spring Boot service and the LLM endpoint (e.g., api.openai.com) will turn red, but the "Response ID" in the trace will be empty.

2. Identifying Provider-Side Errors

If the provider (OpenAI, Anthropic, etc.) is reachable but rejects the request, you will see specific HTTP status codes and semantic reasons.

  • KPI Filter: gen_ai.client.operation.duration where http.response.status_code is 429 (Too Many Requests, i.e., rate limited) or 503 (Service Unavailable, e.g., provider overloaded).

  • The "Golden Attribute": gen_ai.response.finish_reasons.

    • content_filter: Model-side error (Safety/Refusal).

    • length: Model-side error (output truncated because the max-token or context-window limit was reached).

DX O2 Signal: In the Alarm Inspector, you will see a success at the HTTP level (200 OK) but a "Business Error" flag because the LLM returned a refusal instead of content.
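The triage rules in this section can be summarized as a small decision function. The sketch below is illustrative only: the class, enum, and method names are hypothetical, and the inputs are assumed to have been read from the HTTP status code, error.type, and gen_ai.response.finish_reasons values described above. It is not how DX O2 classifies errors internally.

```java
import java.util.Set;

// Hypothetical triage helper for illustration; not a DX O2 API.
final class LlmFailureTriage {

    enum Category { NETWORK_TIMEOUT, PROVIDER_ERROR, MODEL_SIDE, OK, UNKNOWN }

    private static final Set<String> TIMEOUT_EXCEPTIONS = Set.of(
        "java.net.SocketTimeoutException",
        "io.netty.handler.timeout.ReadTimeoutException");

    // httpStatus is null when no response was received; errorType and finishReason may be null.
    static Category classify(Integer httpStatus, String errorType, String finishReason) {
        boolean timeoutException = errorType != null && TIMEOUT_EXCEPTIONS.contains(errorType);
        boolean timeoutStatus = httpStatus != null && (httpStatus == 408 || httpStatus == 504);
        if (timeoutException || timeoutStatus) {
            return Category.NETWORK_TIMEOUT;
        }
        if (httpStatus != null && (httpStatus == 429 || httpStatus == 503)) {
            return Category.PROVIDER_ERROR; // provider reachable but rejected the request
        }
        if (httpStatus != null && httpStatus == 200) {
            // HTTP success, but the finish reason may still signal a model-side problem.
            if ("content_filter".equals(finishReason) || "length".equals(finishReason)) {
                return Category.MODEL_SIDE;
            }
            return Category.OK;
        }
        return Category.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(null, "java.net.SocketTimeoutException", null));
        System.out.println(classify(429, null, null));
        System.out.println(classify(200, null, "content_filter"));
        System.out.println(classify(200, null, "stop"));
    }
}
```

Checking for timeouts first mirrors the order of the triage map: when the network fails, the GenAI attributes are usually absent, so there is nothing further to inspect.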

Fallback Monitoring

Follow these steps to validate whether your application successfully fails over to secondary models when the primary fails.

1. The "Model Divergence" KPI

The most reliable way to validate a successful failover is to track the mismatch between the Requested Model and the Responding Model.

  • KPI Name: gen_ai.failover.divergence_rate

  • Logic: Count occurrences where gen_ai.request.model ≠ gen_ai.response.model.

  • DX O2 Implementation: Use a Custom Metric Expression in DX O2 dashboards comparing these two high-cardinality attributes from the gen_ai.client.operation trace.

Significance: If your code is designed to switch from gpt-4 to gpt-3.5-turbo on failure, a spike in this metric confirms your fallback logic is executing.
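The divergence logic can be sketched in a few lines. One caveat to plan for: some providers answer with a dated snapshot name (for example, gpt-4-0613 for a request to gpt-4), so a strict string comparison can over-count. The prefix check below is a simple heuristic for that case and an assumption of this sketch, not DX O2 behavior. The class and record names are hypothetical.

```java
import java.util.List;

// Hypothetical helper for illustration; not a DX O2 API.
final class ModelDivergence {

    // Values as they might be read from gen_ai.request.model and gen_ai.response.model.
    record Call(String requestedModel, String respondingModel) {}

    // Heuristic: a response model that equals the requested name, or extends it with
    // a "-suffix" (such as a dated snapshot), is treated as the same model.
    static boolean diverged(Call c) {
        String req = c.requestedModel();
        String res = c.respondingModel();
        if (req == null || res == null) {
            return false; // no response model recorded, so nothing to compare
        }
        return !(res.equals(req) || res.startsWith(req + "-"));
    }

    // Fraction of calls answered by a different model than the one requested.
    static double divergenceRate(List<Call> calls) {
        if (calls.isEmpty()) {
            return 0.0;
        }
        long count = calls.stream().filter(ModelDivergence::diverged).count();
        return (double) count / calls.size();
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
            new Call("gpt-4", "gpt-4-0613"),
            new Call("gpt-4", "gpt-3.5-turbo"),
            new Call("gpt-4", "gpt-4"),
            new Call("gpt-4", "gpt-3.5-turbo"));
        System.out.println("divergence rate: " + divergenceRate(calls));
    }
}
```

The prefix heuristic has limits: it would treat any model whose name extends the requested one with a hyphen as the same model, so an explicit alias table may be a better fit if your model names overlap.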

2. Cross-Model Performance KPI (The "Experience Gap")

When a fallback occurs, the user experience is likely affected (e.g., lower quality or higher latency). To measure the impact on user experience, track the following.

  • KPI Name: gen_ai.fallback.latency_overhead

  • Calculation: Latency_Total - Latency_Primary_Attempt

  • DX O2 Visualization: Create a Multi-Series Chart.

    • Series A: Average latency where resilience4j.fallback was NOT triggered.

    • Series B: Average latency where resilience4j.fallback WAS triggered.

Actionable Insight: If fallback latency is more than twice the normal latency (Fallback Latency > 2 × Normal Latency), your secondary model may be too slow or your timeout settings may be too high.
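The two-series comparison and the "twice normal" rule of thumb can be expressed as follows. This is a minimal sketch over exported latency samples; the class, the record, and the boolean flag are hypothetical stand-ins for whatever marks a request as having triggered the fallback.

```java
import java.util.List;

// Hypothetical helper for illustration; not a DX O2 API.
final class FallbackLatencyCheck {

    // One request: end-to-end latency and whether the fallback path was taken.
    record Sample(long latencyMs, boolean fallbackTriggered) {}

    // Average latency for one series (Series A: false, Series B: true); NaN if the series is empty.
    static double averageLatency(List<Sample> samples, boolean fallbackTriggered) {
        return samples.stream()
            .filter(s -> s.fallbackTriggered() == fallbackTriggered)
            .mapToLong(Sample::latencyMs)
            .average()
            .orElse(Double.NaN);
    }

    // True when average fallback latency exceeds twice the average normal latency.
    // Comparisons with NaN are false, so an empty series never raises the flag.
    static boolean fallbackTooSlow(List<Sample> samples) {
        double normal = averageLatency(samples, false);
        double fallback = averageLatency(samples, true);
        return fallback > 2 * normal;
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
            new Sample(800, false), new Sample(1200, false),
            new Sample(2500, true), new Sample(3100, true));
        System.out.println("normal avg ms: " + averageLatency(samples, false));
        System.out.println("fallback avg ms: " + averageLatency(samples, true));
        System.out.println("fallback too slow: " + fallbackTooSlow(samples));
    }
}
```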

3. Trace-Level Validation in DX O2

In the DX O2 Trace Explorer, a successful failover will look like a "waterfall" of spans. You can validate this by looking for the following sequence in a single Trace ID:

  1. Span 1 (Primary): gen_ai.client.operation, error=true, gen_ai.request.model="gpt-4".

  2. Span 2 (Resilience): resilience4j.circuitbreaker, attribute: fallback_executed.

  3. Span 3 (Secondary): gen_ai.client.operation, error=false, gen_ai.request.model="gpt-3.5-turbo".
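If you export spans for offline checks, the three-step sequence above can be verified programmatically. The sketch below assumes the spans of one trace are already ordered by start time and have been reduced to a simplified, hypothetical record. It is not a DX O2 or tracing-library API.

```java
import java.util.List;

// Hypothetical helper for illustration; not a DX O2 or tracing-library API.
final class FailoverTraceCheck {

    // Simplified span: name, error flag, requested model (may be null),
    // and whether the fallback_executed attribute is present.
    record Span(String name, boolean error, String requestModel, boolean fallbackExecuted) {}

    // Looks for: failed primary LLM call, then an executed fallback,
    // then a successful LLM call to a different model.
    static boolean isSuccessfulFailover(List<Span> spansInOrder) {
        int stage = 0;
        String primaryModel = null;
        for (Span s : spansInOrder) {
            boolean llmCall = "gen_ai.client.operation".equals(s.name());
            if (stage == 0 && llmCall && s.error()) {
                primaryModel = s.requestModel();
                stage = 1;
            } else if (stage == 1 && "resilience4j.circuitbreaker".equals(s.name())
                    && s.fallbackExecuted()) {
                stage = 2;
            } else if (stage == 2 && llmCall && !s.error()
                    && s.requestModel() != null && !s.requestModel().equals(primaryModel)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Span> trace = List.of(
            new Span("gen_ai.client.operation", true, "gpt-4", false),
            new Span("resilience4j.circuitbreaker", false, null, true),
            new Span("gen_ai.client.operation", false, "gpt-3.5-turbo", false));
        System.out.println("successful failover: " + isSuccessfulFailover(trace));
    }
}
```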

Proactive Alerting: In DX O2, set an alarm on the ratio of Total Requests to Fallback Successes. If fallbacks are triggering but returning 4xx/5xx errors, your "Safety Net" is broken.

Why LLM Observability Matters for SRE and DevOps Teams

Getting started with LLM observability isn't just about more charts; it’s about reducing mean time to resolution (MTTR) in a complex stack.

When a user reports that a GenAI feature is "broken," an SRE using DX O2 can now trace the transaction from the frontend, through the Spring Boot service, into the specific Spring AI ChatClient call, and finally to the LLM provider's response. This detailed trace provides IT and business stakeholders with a wealth of information regarding AI usage, prompts submitted, calls made, overall AI performance and even quality of output. 

With LLM observability, IT can correlate insights end-to-end to ensure AI is not a "black box" in your production environment.

 

Getting Started

LLM observability starts with a simple upgrade. By deploying the 26.3.1 (or later) Java Agent and enabling the Spring GenAI extension via the Agent Command Center (ACC), you can transform your AI-native applications from experimental to enterprise-ready.

As GenAI continues to evolve, DX Operational Observability from Broadcom continues to expand its observability footprint—helping you manage the costs, performance, and reliability of the intelligent future.

Ready to start your journey? Refer to the Broadcom TechDocs for detailed configuration steps.
