How Network I/O Delays Suffocate Your Containers
In the world of microservices and Kubernetes, containers promise agility, scalability, and efficiency.
We package our applications, deploy them, and expect lightning-fast performance. But often, there's a silent killer lurking beneath the surface, especially when our applications start to scale: network-induced I/O delays.
It's easy to assume I/O is all about disk reads and writes. However, in a distributed container environment, a significant portion of "I/O" is actually network-bound. Every time your application needs to talk to another microservice, a database, or a persistent storage volume like NFS, it is performing network I/O.
The Real-World Performance Gap: Local vs. Network I/O
To understand the tangible impact of network-induced I/O delays, we compared local disk performance against NFS-attached storage. The data below illustrates how shifting from local bus communication to a network protocol can significantly degrade your IOPS (Input/Output Operations Per Second) and throughput. The test was designed to make the differences clearly visible.
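A quick way to get a feel for this kind of gap yourself is a small random-read micro-benchmark. The sketch below is illustrative only (the figures in the table were produced with a dedicated benchmark tool, not this script), and the function name is an assumption; also note that the OS page cache will inflate results unless you use direct I/O, as purpose-built tools like fio do:

```python
import os
import random
import time

def random_read_iops(path: str, block_size: int = 4096,
                     duration: float = 1.0) -> float:
    """Issue random reads against an existing file and return ops/second.

    Results include page-cache hits; treat them as a rough comparison
    between two mount points, not an absolute benchmark.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    ops = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # Pick a random offset and read one block from it.
            offset = random.randrange(0, max(size - block_size, 1))
            os.pread(fd, block_size, offset)
            ops += 1
    finally:
        os.close(fd)
    return ops / duration
```

Running it once against a file on local disk and once against a file on an NFS mount makes the latency gap directly visible.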
| Access Type | Metric | Broadcom Recommended | Local Disk (RAID 0) | NFS Test Performance |
| --- | --- | --- | --- | --- |
| Random Read | IOPS | 122k | 163k | 14k |
| Random Read | Speed | 476 MiB/s | 636 MiB/s | 56 MiB/s |
| Random Write | IOPS | 29k | 75k | 13k |
| Random Write | Speed | 115 MiB/s | 295 MiB/s | 50 MiB/s |
| Seq. Read (64 KiB) | IOPS | 82k | 58k | 9k |
| Seq. Read (64 KiB) | Speed | 5172 MiB/s | 3638 MiB/s | 562 MiB/s |
| Seq. Write (64 KiB) | IOPS | 41k | 2k | 8k |
| Seq. Write (64 KiB) | Speed | 2622 MiB/s | 174 MiB/s | 505 MiB/s |
Now consider the impact this has down the chain of applications: each delay propagates and compounds, much like congestion in road traffic.
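To make that propagation concrete, here is a toy latency budget for a single request crossing a few service hops. Every number and service name is hypothetical, chosen only to illustrate how one slow storage-backed hop can dominate the end-to-end time:

```python
# Hypothetical per-hop latencies (ms) for one user request.
hops_ms = {
    "ingress -> api-gateway": 2,
    "api-gateway -> order-service": 3,
    "order-service -> database (local SSD)": 1,
}
# The same database call if its volume sits on NFS (hypothetical value).
nfs_penalty_ms = 40

local_total = sum(hops_ms.values())
# Swap the 1 ms local-disk hop for the NFS-backed one.
nfs_total = local_total - 1 + nfs_penalty_ms

print(f"local storage: {local_total} ms end-to-end")
print(f"NFS storage:   {nfs_total} ms end-to-end")
```

Even with only three hops, one network-bound storage call turns a 6 ms request into a 45 ms request; with fan-out and retries the effect multiplies.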
Key Observations from the Data
- Devastating Random I/O Latency: In random read scenarios, typical for database workloads, NFS performance dropped to just 14k IOPS, less than 9% of the 163k IOPS achieved by local disk RAID 0.
- Failure to Meet Recommendations: While the local disk setup comfortably exceeds Broadcom's recommended 122k IOPS for random reads, the NFS environment fails to reach even 12% of that target.
- Sequential Read Impact: Sequential read speeds also suffer heavily, with NFS reaching only 562 MiB/s, compared to 3638 MiB/s on local disk RAID 0 and Broadcom's recommended 5172 MiB/s.
Why Network I/O Becomes a Bottleneck in Containers
- Increased Hops & Indirection: Every microservice call or sidecar proxy in a service mesh adds processing time and latency.
- Shared Resources: Containers on a single node share the same physical network interface; a "noisy neighbor" can starve others of bandwidth.
- Remote Storage Latency: As shown in the data, moving storage to the network (NFS) turns high-speed local disk operations into much slower network calls.
What Can You Do?
- Local Caching: Use in-memory caching to reduce the frequency of network calls.
- Network-Aware Scheduling: Use Kubernetes features such as pod affinity to co-locate interdependent services on the same node.
- Fetch Once, Process Locally: When developing applications, request data over a remote connection only once and work with a local copy for further processing.
- Observability is Key: Deploy a robust full-stack observability solution such as Broadcom DX O2. By leveraging eBPF probes, you can track network latency and I/O wait times at the kernel level with zero-touch instrumentation, identifying these hidden bottlenecks before they sabotage your applications.
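The caching advice above can be sketched as a minimal read-through cache with a time-to-live; `fetch` stands in for whatever remote call your application makes (the class and its interface are illustrative assumptions, not a specific library API):

```python
import time
from typing import Any, Callable

class ReadThroughCache:
    """Fetch each key over the network at most once per TTL window."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = 30.0):
        self._fetch = fetch
        self._ttl = ttl
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # served locally, no network round-trip
        value = self._fetch(key)     # the only place the network is touched
        self._store[key] = (now, value)
        return value
```

Wrapping a remote lookup this way means repeated reads of the same key within the TTL cost a dictionary access instead of a full network round-trip; production systems typically reach for an established cache (in-process or a shared tier like Redis) rather than hand-rolling one.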