My Summer on Broadcom's Watchtower: Monitoring a Monitor
I had the amazing opportunity this summer to join the Broadcom Mainframe AIOps value
stream to work on Watchtower™—an observability platform that helps users monitor the health
and behavior of mainframe environments through real-time insights.
My summer project focused on Watchtower’s self-observability and building a health-checking
tool to monitor and assess Watchtower’s internal state. The goal: to prototype and begin
productizing a full-stack application that surfaces meaningful metrics, performance bottlenecks,
and Kubernetes-level events. The result empowers internal developers with faster debugging
capabilities and lays the groundwork for external users to troubleshoot more effectively in
mainframe-integrated environments.
Getting Started: From Onboarding to Building
After initial onboarding and training on Broadcom’s architecture and tools, I started building
the Watchtower Health Platform from scratch. The early stages were challenging: Kubernetes,
application-level permissions (such as keystores and certificates), and the
“institutional knowledge” of Broadcom’s DevOps pipeline all took time to master.
What kept me going was knowing I was contributing something meaningful, and being
surrounded by fantastic mentors. Eventually, I got up to speed and began full-stack
development—handling everything from Kubernetes pod status ingestion to front-end
visualization in React, powered by ClickHouse analytics and a secure Node.js backend.
As the platform matured, it became clear that visibility into service registration and recent
metric sightings from ClickHouse was just as crucial as raw pod health. This insight led to
building out the APIML Service Tree, along with metrics tables sourced from Alert Insights and
ML Insights, two key components within Watchtower. The Service Tree now offers a clean,
filtered view of registered Watchtower services and their associated credentials—essential for
developers working in highly regulated mainframe environments.
Core technologies used:
● Kubernetes: For collecting pod statuses, events, logs, and container metadata
● ClickHouse: High-performance OLAP database for aggregating time-series metrics
● React + Precision Design System: For responsive, accessible UI
● Docker: For containerizing and deploying services
● Node.js: Backend APIs secured with Watchtower’s auth model
● Jenkins: CI/CD automation
● SonarQube + Black Duck: For static analysis and secure dependency management
Solving Real Engineering Problems
1. Navigating Kubernetes RBAC
One of the most technically demanding aspects of the project was configuring Kubernetes Role-
Based Access Control (RBAC). I needed to ensure the Health app had read-only observability
into Watchtower pods, logs, and events—without any elevated or write permissions. This
required a deep understanding of Roles, ClusterRoles, and service accounts, as well as the
Kubernetes API.
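To make the read-only requirement concrete, here is a minimal sketch (not the project's actual configuration) of a namespaced Role expressed as a TypeScript object, with a small check that every rule grants only read verbs. The Role name, the namespace, and the helper `isReadOnly` are hypothetical; the manifest fields and the `get`/`list`/`watch` verbs follow the standard Kubernetes RBAC schema.

```typescript
// Minimal shape of a Kubernetes RBAC Role manifest (rbac.authorization.k8s.io/v1).
interface PolicyRule {
  apiGroups: string[];
  resources: string[];
  verbs: string[];
}

interface Role {
  apiVersion: "rbac.authorization.k8s.io/v1";
  kind: "Role";
  metadata: { name: string; namespace: string };
  rules: PolicyRule[];
}

// Hypothetical names: the real Role and namespace are not given in this post.
const healthReaderRole: Role = {
  apiVersion: "rbac.authorization.k8s.io/v1",
  kind: "Role",
  metadata: { name: "watchtower-health-reader", namespace: "watchtower" },
  rules: [
    {
      apiGroups: [""], // "" is the core API group (pods, pods/log, events)
      resources: ["pods", "pods/log", "events"],
      verbs: ["get", "list", "watch"],
    },
  ],
};

const READ_ONLY_VERBS = new Set(["get", "list", "watch"]);

// True only if no rule grants a verb outside get/list/watch.
function isReadOnly(role: Role): boolean {
  return role.rules.every((rule) =>
    rule.verbs.every((verb) => READ_ONLY_VERBS.has(verb)),
  );
}
```

A Role like this only takes effect once a RoleBinding attaches it to the app's service account; a check such as `isReadOnly(healthReaderRole)` could run in CI to catch accidental write verbs.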
2. Scaling ClickHouse Queries
Another big challenge involved optimizing ClickHouse performance. The first version of my
metrics table struggled with load times due to the massive volume of time-specific entries—these
were feeding Watchtower’s ML Insights engine, which analyzes customer-specific activity and
raises alerts on anomalies.
To address this, I implemented a backend caching system using hashed keys and reworked
several queries so they are built dynamically. The result: API call volume dropped significantly,
and frontend responsiveness improved by over 5x.

Figure 1: The individual metrics backend fetch shows a 5x response time improvement on repeated requests with
identical headers.
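The hashed-key caching idea can be sketched as follows. This is an illustration under assumptions, not the project's code: the names `hashKey`, `MetricsCache`, and `cachedQuery`, the SHA-256 choice, and the time-to-live expiry policy are all hypothetical. Only `createHash` from Node's built-in `crypto` module is a real API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical parameter shape for a metrics query.
type QueryParams = Record<string, string | number>;

// Derive a fixed-length cache key. Parameter names are sorted first so the
// same logical request hashes identically regardless of key order.
function hashKey(endpoint: string, params: QueryParams): string {
  const sorted = Object.keys(params).sort().map((name) => [name, params[name]]);
  return createHash("sha256").update(JSON.stringify([endpoint, sorted])).digest("hex");
}

class MetricsCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Return a fresh cached result if present; otherwise run the query and store it.
async function cachedQuery<T>(
  cache: MetricsCache<T>,
  endpoint: string,
  params: QueryParams,
  runQuery: () => Promise<T>,
): Promise<T> {
  const key = hashKey(endpoint, params);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await runQuery();
  cache.set(key, value);
  return value;
}
```

With this shape, a repeated request with identical parameters skips the database call until the entry expires, which is the kind of saving shown in Figure 1. The time-to-live trades freshness for speed, so a short value suits fast-moving metrics.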
3. Building the APIML Service Tree
This presented a different kind of complexity. I had to parse deeply nested JSON payloads from
secure endpoints, implement user-specific filtering, and organize the data into a tree structure
that developers could use. Success meant mastering Zowe’s API Mediation Layer and rendering
the hierarchy using Precision UI components—resulting in a personalized, secure, and intuitive
display.

Figure 2: The APIML Service Tree organizes Watchtower services and highlights relevant service account credentials.
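The filter-then-nest step behind a tree like this can be sketched in a few lines. The payload shape below (`RawService` with `owners`, `serviceAccount`, and `instances`) is a hypothetical stand-in, not the actual APIML response format, and `buildServiceTree` is an illustrative helper rather than the project's implementation.

```typescript
// Hypothetical, simplified service record; real APIML payloads are richer.
interface RawService {
  serviceId: string;
  metadata: { owners: string[]; serviceAccount?: string };
  instances: { instanceId: string; status: string }[];
}

// Generic node that a tree-view UI component could render.
interface ServiceTreeNode {
  label: string;
  detail: string | null;
  children: ServiceTreeNode[];
}

// Keep only the services the user owns, then nest instances under each service.
function buildServiceTree(services: RawService[], user: string): ServiceTreeNode {
  const owned = services.filter((service) => service.metadata.owners.includes(user));
  return {
    label: "Watchtower services",
    detail: null,
    children: owned.map((service) => ({
      label: service.serviceId,
      detail: service.metadata.serviceAccount ?? null, // account shown beside the service
      children: service.instances.map((instance) => ({
        label: instance.instanceId,
        detail: instance.status,
        children: [],
      })),
    })),
  };
}
```

Filtering before building the tree keeps other teams' services out of the rendered structure entirely, rather than hiding them in the UI.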
Making Scalable and Shareable Technology
From the beginning, I didn’t want this to just be a prototype. I set out to build a tool that could
scale with Watchtower and be adopted by other teams.
The codebase, hosted on GitHub, includes:
● Modular React components built for reuse
● Replicable API controllers for easily adding new data sources
● Structured documentation to onboard future contributors
● Authentication aligned with Broadcom’s mainframe security protocols
● A Jenkins pipeline integrated with SonarQube and Black Duck for secure builds and
dependency scans
Looking Ahead
The health-checking tool allows Watchtower to monitor itself. When pods crash, metrics
disappear, or components misbehave, the system raises a flag. Internal developers can respond
quickly, and external users will gain that same observability through a tailored UI.
The APIML Service Tree became one of the most impactful features. Before, the APIML web
interface showed all registered services in one long, unfiltered list—making debugging difficult.
Now, developers see only the services they own, understand what service accounts are in use,
and trace interactions confidently across Watchtower.
Together, the metrics dashboard, alerts viewer, pod health table, and service tree form a robust
feedback loop. They provide insight into what Watchtower is doing—and what it needs to keep
doing to remain healthy.

Figure 3: The Pod Health Table uses Kubernetes data to track real-time status within the Watchtower namespace.
This project is just the starting point. Broadcom is modernizing the mainframe space by
integrating cloud-native tools like Kubernetes into mission-critical systems used in industries
such as banking and aviation.
Through this experience, I gained hands-on exposure to:
● Kubernetes service accounts, secrets, and container orchestration
● Building secure, scalable APIs with real-time frontend rendering
● CI/CD pipelines, static analysis, and dependency scanning
● Agile engineering practices: modularity, code hygiene, and version control
Conclusion
I’m deeply grateful to my mentors and teammates—Santiago Ortega, Diane Volkmar, Mihir
Shah, and the entire Watchtower crew—for their guidance and support throughout this journey.
This summer taught me how modern observability, legacy mainframes, and AI-driven insights
can come together to power something truly future-proof. I’m proud to have contributed to that
vision.