In the previous article, I talked about the historical contexts in which iSCSI and FCoE—in 2002 and 2008 respectively—tried to replace Fibre Channel as the storage networking protocol of choice in the datacenter. In this post, I will address the latest attack Fibre Channel is fending off, from a pure ‘protocol wars’ point of view. This latest one comes on the heels of one of the most interesting and disruptive transitions we have seen in the storage industry in the past few decades: the emergence of flash storage, the transition from SCSI to NVMe, and the imminent arrival of a new generation of non-volatile memory technology for storage that has come to be known as Storage-Class Memory (SCM).
In 2019, we all know what flash storage is. We use it daily in our smartphones, tablets, smartwatches and all other sorts of wearable computing devices. We have it in our laptops — who remembers the times when your laptop’s hard disk drive would just keep spinning and making noise while your computer’s performance ground to a halt? — and it comes in most desktop computers one can buy today — does anyone still buy those? However, it wasn’t that long ago when even these portable devices — or at least the ones that existed at the time — were still using traditional, magnetic, spinning HDDs — original iPod anyone?
Flash-based solid-state drive (SSD)
In the data center, flash emerged at least a decade ago, but as new technologies that provide several orders of magnitude better performance than their predecessors often do, it came at a hefty price premium. That premium relegated it to niche applications that could justify the extra cost, or to a sort of ‘cache’ layer to accelerate access to ‘hot’ data, while the bulk of the storage capacity continued to reside in spinning drives — these were called ‘hybrid’ storage arrays. But as is usually the case with these sorts of technology transitions, as time went by and adoption ramped up — slowly but surely — the price came down, fueling a feedback loop of increased adoption and commoditized pricing which paved the way for what we have today. We transitioned from installing solid-state drives (SSDs) that looked and behaved exactly like HDDs in traditional storage arrays to developing entirely new storage array architectures designed from the ground up to take full advantage of the performance characteristics of flash storage — the so-called ‘All-Flash Arrays’ (AFAs), which left our dear old spinning drives behind for good, as remnants of a bygone era.
Redesigning the architectures of storage arrays alone, however, has not been sufficient to fully leverage all the performance that flash storage can provide. As we moved along this timeline, we realized that the traditional storage protocol we had been using to address these storage devices — our good and trustworthy SCSI — was ill-equipped to fully unleash the performance benefits of the transition to an all-flash storage world. In addition, a new and exciting development in flash memory technology was on the horizon, one that would make it even more obvious that SCSI would simply not cut it going forward. The ‘Small Computer System Interface’ was developed in the late ’70s and standardized in 1986: a parallel interface designed to connect a small number of diverse peripheral devices to a computer, internally or externally, over a ‘ribbon’ cable. SCSI was not designed to connect just HDDs, but all sorts of computer peripherals like floppy disks, scanners, printers or CD drives. It was most certainly not designed with flash storage in mind, a technology which would not come into the world until many years later, so it was no surprise to anyone when we realized that it just couldn’t keep up with the performance of the new storage devices it was being used to address.
That is why the industry came together under the Non-Volatile Memory Express (NVMe) organization to develop a new interface and protocol to — as stated on the organization’s website — “fully expose the benefits of non-volatile memory in all types of computing environments from mobile to data center”. NVMe is not exactly new. Work on a specialized interface for accessing non-volatile memory (flash) began in late 2007 at the Intel Developer Forum, the first specification was released in 2011, and commercial devices began shipping in 2013. Since then, NVMe has established itself as the new standard interface inside devices that use flash storage, including laptops — the MacBook Pro in which I’m typing this has an NVMe SSD — and more recently desktop PCs and servers, typically running over a PCIe interface.
Information about my MacBook’s NVMe SSD
The SAN déjà vu
But just as in the late ’90s with SCSI and internal server storage, the industry started to realize that there could be serious benefits to moving NVMe storage outside of the servers, centralizing it, consolidating it and accessing it over a networked interface. Could you imagine all the applications and use cases that could be enabled by deploying a shared storage network, along with the improved storage utilization efficiencies? Once again, the sense of déjà vu is hard to shake off. This time around, the NVM Express organization itself had already anticipated this and published a specification in 2016 titled NVMe over Fabrics (NVMe-oF), which detailed how NVMe could be transported over “any suitable storage fabric technology”. The NVMe-oF specification is purposefully agnostic about the underlying fabric, but does lay out some key characteristics the “ideal underlying network or fabric technology” should have, because, as per the specification itself, “obviously, transporting NVMe commands across a network requires special considerations over and above those that are determined for local, in-storage memory”. Some of those key characteristics that the ‘ideal’ underlying fabric should have include “a reliable, credit-based flow control and delivery mechanism” that can “guarantee delivery at the hardware level without the need to drop frames or packets due to congestion”, the fact that the fabric should “impose no more than 10µs of latency end-to-end, including the switches”, or that “the fabric should be able to scale to tens of thousands of devices or more”. Could you think, off the top of your head, of a storage fabric technology that easily meets all of these ‘ideal’ requirements, and that has proven so for over two decades? Yeah, me too.
While remaining agnostic, the NVMe-oF specification discusses two distinct types of fabric technologies that could transport NVMe over a network: those based on Remote Direct Memory Access (RDMA) on one side and those not based on RDMA on the other. Among the former group are Infiniband — the ‘native’ networked RDMA fabric technology — and its Ethernet-based alternatives, RDMA over Converged Ethernet in its second version (RoCE v2) and Internet Wide Area RDMA Protocol (iWARP). For those not versed in the High-Performance Computing (HPC) vernacular, RDMA is a protocol that allows a host to directly access memory on another host, typically as part of a supercomputing cluster. To briefly understand what RoCE and iWARP are, we could say that RoCE is to Infiniband what FCoE is to Fibre Channel — and when I say Fibre Channel in this context I mean SCSI over Fibre Channel — and iWARP is to Infiniband what iSCSI is to Fibre Channel. That is, they are two Ethernet/IP-based alternatives to the dominant, native fabric in their space — which in the case of HPC is Infiniband — that have tried, with varying degrees of success, to position themselves against a technology that is often described as complex, expensive, requiring specialized skills and a dedicated infrastructure… wait, is that déjà vu again?
Between a RoCE and a hard place
Since the time leading up to the release of the NVMe-oF specification in 2016, the proponents of Ethernet-based options to transport NVMe, particularly Mellanox — the main RDMA vendor proposing RoCE — went full-force on a marketing campaign to make it seem like RoCE was the only viable (or even existing) technology that could be used to connect NVMe devices over a fabric, and to claim that (once again) Fibre Channel was, yeah, you got it… dead. From blog posts in reputable publications making the bold claim (but hey, “this time it’s real”), to blog posts on their own website making the same claim again, to blog posts announcing the release of the NVMe-oF specification that completely ignore Fibre Channel other than as a side mention as part of the author’s experience, to much more recent blog posts trying to pretend that Fibre Channel doesn’t even exist, with statements such as “Simply, NVMe-oF stands for NVME over Fabrics […] NVMe over Fabrics is essentially NVMe over RDMA”. In other forums, they will bring up arcane topics such as zero-copy that non-expert audiences know nothing about, implying that it is essential to the high performance of NVMe-oF — it is — and that only RDMA can provide it — it cannot — while ignoring the fact that Fibre Channel has supported it since the day the technology was invented. It’s no wonder, then, that RDMA — and RoCE in particular — took the spotlight and most of the media and analyst pundit attention when it came to NVMe-oF, and it seemed like, once again, Fibre Channel was an old, legacy technology that would not be there to support the new, exciting innovations in storage that were coming to market. The message was the same it had always been: Fibre Channel is expensive, Fibre Channel is complex, it requires dedicated infrastructure and specialized skills; whereas IP/Ethernet is affordable, ubiquitous and everyone knows how to manage an Ethernet network, plus everyone already has an Ethernet network! Yeah, I know…
Where was Brocade during this initial time, then? Perhaps slightly distracted with, you know, being acquired by Broadcom… but that’s another story. In any case, Brocade wasn’t dormant when it came to actual product research and development, and the emergence of NVMe-oF in the technology landscape didn’t go unnoticed. For one, little development was actually required to support running NVMe over Fibre Channel (NVMe/FC) on Brocade Gen 5 (16 Gbps) and Gen 6 (32 and 128 Gbps) switches and directors. In fact, technically, no development was required whatsoever just to be able to switch frames containing NVMe data, since Fibre Channel was developed as a transport protocol and NVMe is just another upper-level protocol (ULP) that is mapped onto it just like SCSI or FICON — which is essentially ESCON running over Fibre Channel (more on that in one of the next few articles in this series). An NVMe/FC frame is no different than a SCSI/FC or an ‘ESCON/FC’ frame from a Fibre Channel point of view, and technically any Brocade switch going back to the first generation would be able to switch it. The only development required was for the name server to support devices registering NVMe as a supported ULP, which would enable NVMe initiator devices to easily discover available NVMe targets in the fabric, extending the benefits of the distributed fabric services to NVMe devices by virtue of running over Fibre Channel and connecting to the same fabric as other devices running other ULPs. This is, in fact, one of the great advantages of running NVMe over a Fibre Channel fabric: it can easily coexist and be deployed alongside existing devices, whether open systems (SCSI) or mainframe (FICON), without requiring the deployment of new infrastructure and without having to learn new ways of provisioning storage over an unknown fabric, therefore requiring the least amount of investment in either hardware or skills if you already run a Fibre Channel SAN.
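To make the registration-and-discovery idea concrete, here is a toy model of a fabric name server in Python. It is purely illustrative: the class names, WWPNs and structure are hypothetical and bear no relation to Brocade’s actual implementation; the point is simply that once devices can register NVMe as a supported ULP, an initiator can query the same name server it has always used to find NVMe targets alongside SCSI ones.

```python
from dataclasses import dataclass, field

@dataclass
class FabricDevice:
    """A device logged into the fabric, identified by its WWPN."""
    wwpn: str
    ulps: set = field(default_factory=set)  # e.g. {"SCSI"}, {"NVMe"}, {"FICON"}

class NameServer:
    """Toy model of a fabric name server keyed by WWPN (illustrative only)."""
    def __init__(self):
        self._registry = {}

    def register(self, device: FabricDevice):
        # A device registers itself and the ULPs it supports at fabric login.
        self._registry[device.wwpn] = device

    def query_by_ulp(self, ulp: str):
        """Return the WWPNs of all devices that registered the given ULP."""
        return sorted(d.wwpn for d in self._registry.values() if ulp in d.ulps)

# Hypothetical devices sharing one fabric: SCSI-only, NVMe-only, and dual-protocol.
ns = NameServer()
ns.register(FabricDevice("10:00:00:00:c9:aa:bb:01", {"SCSI"}))
ns.register(FabricDevice("10:00:00:00:c9:aa:bb:02", {"NVMe"}))
ns.register(FabricDevice("10:00:00:00:c9:aa:bb:03", {"SCSI", "NVMe"}))

# An NVMe initiator discovers only NVMe-capable targets, while SCSI and
# NVMe devices coexist in the same registry on the same fabric.
print(ns.query_by_ulp("NVMe"))
print(ns.query_by_ulp("SCSI"))
```

The design choice the sketch illustrates is that nothing about the fabric itself changes: the same registry that answers SCSI queries answers NVMe queries, which is why coexistence comes essentially for free.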
In addition, running new NVMe devices alongside your existing install base of SCSI-based storage devices enables really appealing use cases such as easy data migration from SCSI-based to newly-deployed NVMe-based arrays, snapshotting of existing databases running on SCSI-based arrays onto NVMe namespaces for fast big data analytics based on Machine Learning (ML) and Artificial Intelligence (AI), or extending SAN-based backup services to new NVMe-based storage devices.
In any case, all of the Ethernet-based transport options for NVMe and their attempts to displace Fibre Channel are based on the same set of half-truths that I have outlined from the beginning. In the next article I will address one of the primary arguments used against Fibre Channel by the proponents of said alternatives: that it is somehow incredibly complex while Ethernet is oh so simple and easy to deploy simply by virtue of being Ethernet.
If you would like more information on why Fibre Channel just makes sense for flash and NVMe, check out these papers: