Welcome back! So far we have addressed all the alternative storage networking protocols based on Ethernet/IP that have attempted to displace Fibre Channel as the gold standard for storage networking in the data center. But today let’s address the claims that Fibre Channel is somehow an incredibly complex technology that requires terribly advanced skills to deploy and operate, and that Ethernet is simple, affordable, and everyone knows how to deploy and operate it simply by virtue of it being Ethernet. These are, as I mentioned at the beginning of this article, half-truths. It is true that there is a much larger installed base of Ethernet switch ports out in the marketplace than there is of Fibre Channel switch ports. Therefore, it is logical to assume that there are a lot more people trained in and used to managing Ethernet networks than there are people capable of managing a Fibre Channel SAN. But the fact that more people are trained in managing and configuring a certain technology doesn’t really mean that said technology is somehow inherently simpler.
Before Brocade was acquired by Broadcom, we spent a number of years making a name for ourselves in the Ethernet/IP switching and routing industry, following our acquisition of Foundry Networks in 2009. The way we differentiated ourselves in the marketplace, particularly in the data center switching space, was by dramatically simplifying the deployment and operations of Ethernet networks because — let’s face it — they have always been incredibly hard to manage and incredibly manual and laborious to configure. In an Ethernet network, every single switch port needs to be told exactly what it needs to do, whether it’s an access port — to connect end devices — or a trunk port — to connect other switches and form a network — and in the latter case you must manually specify which VLANs are allowed to be carried over said trunk port. These aren’t the only port properties that need to be specified manually: if a port is to be part of a link aggregation group or LAG — because when has the bandwidth of a single link been enough to carry all the traffic you need to carry between two devices? — it must also be specified manually, lest you accidentally create a loop and your entire network melts down because, you see, Ethernet is a layer 2 flood-and-learn protocol and if there is a loop you have big, very big, problems. 
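To make the point concrete, here is roughly what that per-port ceremony looks like in a generic Cisco IOS-style CLI (syntax, VLAN numbers and interface names are purely illustrative and vary by vendor):

```
! Access port for an end device -- mode and VLAN set by hand
interface GigabitEthernet1/0/1
 switchport mode access
 switchport access vlan 10

! Trunk port to another switch -- allowed VLANs must be listed manually
interface GigabitEthernet1/0/48
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30

! Link aggregation group -- must match on both ends, or hello, loop
interface range GigabitEthernet1/0/47-48
 channel-group 1 mode active
```

Multiply that by every port on every switch, and keep it consistent on both ends of every link, and you get a sense of the operational surface area.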
And to avoid loops in the first place you must — manually — configure STP and give away half of your network’s bandwidth, and be able to cope with seconds of network downtime if a link goes down, or learn and — manually — configure its different variations like PVST or RSTP to make things only slightly better, or try to avoid using STP altogether by using multi-chassis link aggregation (MLAG), which you also have to configure manually and it works completely differently for each and every vendor because there is no single standard for MLAG, and it is also incredibly manual and laborious to set up, and you better pray that it works well all the time, or you better have set up STP properly underneath it just in case.
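And for illustration, a minimal STP setup in the same IOS-style syntax might look like this (priorities, VLAN lists and interface names are placeholders, and per-VLAN tuning multiplies quickly):

```
! Rapid PVST+ -- the intended root bridge must be set manually per VLAN
spanning-tree mode rapid-pvst
spanning-tree vlan 10,20,30 priority 4096

! Edge ports need PortFast, plus a guard to catch accidental miscabling
interface GigabitEthernet1/0/1
 spanning-tree portfast
 spanning-tree bpduguard enable
```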
An apt visual representation of the Ethernet/IP protocol stack
Once you’re done setting up your layer 2 domain, you have to start thinking about layer 3. Will you use IPv4 or IPv6? Do your switches have a big enough ARP cache? What routing protocol are you going to use? OSPF or BGP? Do you know how to properly configure ECMP for your routing protocol? How are you going to provide redundancy and high availability for your routing services? VRRP or HSRP? Do you need multicast services? Can you even spell IGMP? Do your switches support IGMP snooping? Do you know how to configure it? Do you want to completely get rid of L2 and STP and deploy a layer 3 fabric? Are you ready to manually assign an IP address to every single switch port? How are you going to make your virtualization layer believe it’s still running on an L2 domain so that things like VM migrations work? Will you be using some kind of network virtualization technology based on VXLAN? Can your switches terminate VXLAN so you can extend your virtual L2 domains into the physical realm? Do you know how to configure that? Do you even know what VXLAN is?
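As a sketch of what even the basics of that layer 3 checklist translate into, again in illustrative IOS-style syntax (addresses, process IDs and group numbers are made up):

```
! Per-port L3 addressing in a routed fabric -- one subnet per link
interface GigabitEthernet1/0/49
 no switchport
 ip address 10.0.0.1 255.255.255.252

! OSPF with ECMP across equal-cost uplinks
router ospf 1
 network 10.0.0.0 0.0.255.255 area 0
 maximum-paths 4

! First-hop redundancy for the server gateway (VRRP)
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 vrrp 10 ip 10.1.10.1
 vrrp 10 priority 110
```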
And I haven’t even addressed running storage traffic over the network yet! The proponents of SoE (Storage over Ethernet — yes, I just made this one up) will just tell you “well, it’s just Ethernet! It’s simple! It’s interoperable! It’s scalable!”, but once again these are all half-truths. Paraphrasing the quote from the NVMe-oF specification that I cited in the last article and adapting it to storage in general, we can safely say that “obviously, transporting storage flows across a network requires special considerations over and above those that are determined for local storage”, and therefore ‘just Ethernet™’ just doesn’t cut it. In fact, if you run a flavor of SoE — like FCoE or NVMe/RoCE — that doesn’t rely on an upper-level protocol such as TCP to ensure all packets get delivered to the destination, then you’ll need to run a lossless Ethernet network, and for that you’ll need Data Center Bridging (DCB). Do all the switches in your network support DCB? Do the NICs in your servers support DCB? DCB relies on Priority-based Flow Control (PFC) to apply flow control at a granular level, so that it affects only the traffic that needs it — like storage — and not the traffic that would be hampered by it. But PFC is still based on pause frames, and is therefore a reactive mechanism: the receiver must detect that its buffer capacity has dropped below a certain low threshold and send a notification (PAUSE) to the transmitter for it to stop sending data and avoid dropping frames — unlike the proactive mechanism delivered by technologies based on buffer-to-buffer (B2B) flow control; yes, the very ones the ‘ideal’ underlying network for NVMe-oF should support. In addition, your storage traffic is now sharing the entire network and its available bandwidth with all the rest of your network traffic, so you will need to configure Enhanced Transmission Selection (ETS), another part of DCB, to ensure that your storage traffic always has a minimum amount of guaranteed bandwidth available to it. But all your storage flows — potentially thousands of them — will be sharing the same ‘lane’, and you won’t be able to differentiate them, protect them, isolate them, or prioritize them. And while RoCEv2 is routable — it runs on top of UDP, which means it can potentially scale better than FCoE because it can span across VLAN boundaries — it so happens that the underlying flow control protocol that is supposed to guarantee frame delivery (PFC) is an L2 protocol itself and therefore cannot span across VLAN boundaries. How do we ensure lossless delivery between end nodes in this case? Well, just add another piece to your Jenga tower and configure — once again, manually — Explicit Congestion Notification (ECN), and make sure all your end nodes support it, and your switches too, and that you actually know how to configure it and how to troubleshoot it, and that it works well and reliably across devices from different vendors. Suddenly this doesn’t sound like ‘just Ethernet’ to me.
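As a rough sketch of what the PFC/ETS/ECN layer cake involves, here is a loosely NX-OS-flavored fragment (paraphrased from memory rather than copy-paste-ready; class names, CoS values and thresholds are illustrative):

```
! PFC: enable lossless behavior on the storage priority only
interface Ethernet1/1
  priority-flow-control mode on

! ETS: guarantee the storage traffic class a minimum bandwidth share
policy-map type queuing STORAGE-ETS
  class type queuing STORAGE
    bandwidth percent 50

! ECN: mark instead of drop once the storage queue passes a threshold
  class type queuing STORAGE
    random-detect minimum-threshold 150 kbytes maximum-threshold 1500 kbytes ecn
```

And every one of those knobs must agree end to end, across every switch and every NIC in the path.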
If you run a flavor of SoE that relies on TCP for flow control and guaranteed delivery — mainly iSCSI, but also NVMe/iWARP or NVMe/TCP — then, well, you’ll have to deal with TCP’s well-known and widely accepted performance problems when packets are dropped and need to be resent (slow start), among other issues. In fact, TCP is widely acknowledged not to be a good flow control protocol for low-latency, high-performance applications — and that is exactly what storage is. Will TCP work well enough for many use cases? Of course. But that doesn’t mean it’s the right protocol for storage environments demanding reliable, deterministic low latency and high performance. This has been so thoroughly acknowledged by the storage industry that there have been attempts to replace the TCP layer in storage with RDMA-based alternatives, such as the iSCSI Extensions for RDMA (iSER) or the SCSI RDMA Protocol (SRP), none of which has ever gained any significant traction — perhaps because of the added complexity of the RDMA layer, the need for specialized adapters called RDMA NICs (RNICs) and switches that support DCB and ECN, or perhaps because they fail to show significant performance benefits over traditional iSCSI over TCP — if they don’t perform even worse — as evidenced by the RoCE Deployment Guide:
Performance comparison between iSCSI and iSER
Whether you run your storage directly over Ethernet or over TCP/IP (or UDP/IP in the case of RoCEv2), there’s still the issue of storage device discovery to be dealt with. An old comparison by EMC’s Erik Smith, published on his own personal blog, between FCoE, iSCSI and Fibre Channel concluded that it takes a lot more configuration steps to provision storage on Ethernet-based fabrics, because there is no centralized name server or similar repository that end nodes can use for discovery, and therefore the storage resources need to be manually configured on every server somehow:
Ease of storage resource provisioning comparison between FC, iSCSI and FCoE
I’ll admit right from the start that I don’t know whether anything in particular has been done for NVMe/RoCE to aid in this — iSER is just iSCSI running on top of RoCE (or iWARP), but it’s essentially still iSCSI — so perhaps things look a little better there, but my suspicion is that they don’t, and since RoCEv2 runs on IP — albeit with UDP instead of TCP — I’d wager you still have to manually enter the IP address of the target device in every initiator, which can be a huge operational burden in large environments with thousands of initiators. And while it is true that for iSCSI there exists a service called iSNS (Internet Storage Name Service) that can automate target device discovery for iSCSI initiators, the reality is that it is hardly ever implemented: to the best of my knowledge there are no Ethernet switches that ship with an embedded iSNS server — Brocade had released an embedded iSNS server in our VDX switches just before the Broadcom acquisition — so users would have to deploy it on an external server, and there simply aren’t any enterprise-class software iSNS implementations out there.
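For reference, this is what that manual, per-initiator configuration typically looks like with the common open-iscsi tooling on Linux (the portal IP and target IQN are, of course, made up):

```
# Point this initiator at the target portal -- repeated on every single
# host, for every portal IP (addresses and IQN are illustrative)
iscsiadm -m discovery -t sendtargets -p 10.1.20.50:3260

# Log in to the discovered target
iscsiadm -m node -T iqn.2001-05.com.example:array1 -p 10.1.20.50:3260 --login
```

Now picture keeping that in sync across thousands of initiators every time a target portal changes.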
Suddenly, you have an overwhelming number of different, interrelated protocols that create an incredibly complex protocol stack that you need to be able to provision, configure, manage, monitor, and troubleshoot when something goes wrong. This isn’t necessarily a bad thing, mind you. This is, in fact, part of the beauty of Ethernet/IP: it can serve a tremendous number of purposes and support a tremendous variety of applications with varying degrees of service levels. But to pretend even for a moment that Ethernet/IP somehow automatically equates to simplicity and ease of use or management is simply “ever the blackest of lies”.
On the other hand, what do you need to do to provision a Fibre Channel network? In Fibre Channel, every switch port automatically detects what you connect to it and configures itself accordingly, whether it’s another switch or an end device. If it’s another switch, it will automatically detect whether it’s the first or a subsequent link (Inter-Switch Link or ISL) between the two switches, and in the latter case it will automatically figure out the best way to load-balance across those ports: either at the physical layer (L1) with frame-based load balancing — if you have Brocade switches — or at L2 with FSPF and exchange-based load balancing. We could describe Fibre Channel as a routed L2 network: there is no such thing as a loop, there are just multiple ways to get from A to B, and load balancing happens automatically when multiple routes have the same cost. If it is an end device, internal fabric services that run in a distributed fashion across all the switches in the fabric will help it determine which other devices it can communicate with — or, in storage parlance, which storage devices are available to each server — based on permissions configured centrally within the fabric by way of a technology called zoning. Not only that: the fabric will even enforce those permissions at the hardware level, automatically blocking and discarding frames from unauthorized flows. The only prerequisite for connecting two Fibre Channel switches together is that each has a unique identifier — called a Domain ID or DID — assigned to it by an administrator. But this operation is performed only once in a Fibre Channel switch’s lifetime, and it can even be automated. In general, the only thing that needs to be configured on an ongoing basis on a Fibre Channel SAN is zoning — which, by the way, is a process that can also be automated with technologies like peer zoning or by using RESTful APIs. Why exactly this is considered ‘complex’ and requiring ‘specialized skills’ is beyond me. Of course, there are more advanced features that can be deployed on a Fibre Channel network, but few of them are really required for basic operations. There is a myriad of advanced monitoring, analytics and performance management features, to name a few, that advanced users can take advantage of, specifically designed and developed for storage.
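For comparison, a complete zoning workflow on a Brocade FOS switch is, from memory, roughly this (the WWNs and zone/config names are illustrative):

```
# Create a zone containing the server HBA and array port WWNs
zonecreate "server1_array1", "10:00:00:05:1e:01:02:03; 50:00:09:72:0a:0b:0c:0d"

# Add it to a zoning configuration and activate it fabric-wide
cfgcreate "prod_cfg", "server1_array1"
cfgenable "prod_cfg"
```

Three commands, entered once, on any one switch in the fabric — and the fabric distributes and enforces the result everywhere.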
In fact, at Brocade we developed an Ethernet fabric technology we came to call VCS (Virtual Cluster Switching), which borrowed many of the features that make Fibre Channel so easy to deploy and operate, such as auto-discovery of switches and fabric topology, and completely automated multi-pathing both at the physical layer — with our frame-based trunking, which wowed Ethernet networking experts — and at L2, by leveraging the same FSPF routing protocol running over a TRILL network. Our entire message was articulated around simplifying network deployment and operations, and it resonated well with customers.
But that is another story, and I am digressing again, so I’ll leave it here for today, and I hope you join me again in the next entry in this series, where I will talk about why Fibre Channel is the best technology to transport storage flows because, well, it was purpose-built for storage… or was it?
If you want to learn more about how you can make Fibre Channel even easier to deploy and operate through advanced automation technologies, check out the following links:
If you missed the previous entries in this series, make sure to check them out here:
Click here to read the next entry: Fibre Channel Was Built For Storage – Or Was It?