VMware vSphere


NetApp FAS & NFS Multi-path

  • 1.  NetApp FAS & NFS Multi-path

    Posted Jul 23, 2014 02:15 PM

    Hello,

    I have read the NetApp storage best practices guide, http://www.netapp.com/us/media/tr-3749.pdf, and other supporting documentation. I am wondering whether port channels are worth the hassle for achieving NFS multipathing, or whether there is a better way to do it. Does it come down to a choice between exports on multiple subnets and a port channel with multiple exports?



  • 2.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 23, 2014 04:17 PM

    Hey vfk,

    From the reading I have done, the only way to get multiple paths for NFS is either to use different subnets or to create an EtherChannel to the ESXi host NICs that will be used for NFS. However, to get true NFS multipathing this way you would also need an EtherChannel to the NetApp, and I don't remember whether the bond on the NetApp will do EtherChannel or not. Either way, an EtherChannel to an ESXi host reduces the single point of failure at that point. However, if you are not running ESXi 5.5 and don't set up the EtherChannel group with the new web GUI option that lets you select active/passive mode, the hash algorithm, etc., it won't be true LACP; prior to the new LACP support in 5.5, it was just a very basic IP hash, which will probably still pick only one path out of the EtherChannel for each send, in a round-robin kind of fashion.

    Do you want LACP for throughput, or strictly for failover?


    Hope this has helped.



  • 3.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 23, 2014 04:33 PM

    We have 10 GbE links, so throughput isn't the issue; it is more about redundancy and load sharing. I am on ESXi 5.5 right now, but I plan to upgrade in the next couple of months. Chris Wahl, what do you think? Do you have any input on this? I have read your book (very nice), but in the NFS design you mention a LAG on the ESXi uplinks. My understanding is the same as JPM300's: a LAG on the ESXi uplinks and a LAG on the storage ports.



  • 4.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 23, 2014 04:37 PM

    Yes, a LAG is the way to go. However, check out this free hands-on lab on the new LACP support in 5.5:

    http://labs.hol.vmware.com/HOL/catalogs/catalog/130

    Click on Focus: Networking, select "vSphere Distributed Switch from A to Z", and skip ahead to the LACP section. It has some pretty neat new options.

    However, this only applies if you are using a vDS.



  • 5.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 23, 2014 04:57 PM

    We have an Enterprise Plus license. I know the guys here tried implementing it but ran into issues and went back to the VSS. I want to stabilise the environment and bring everything up to date before I start changing the architecture.



  • 6.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 23, 2014 05:30 PM

    Yeah, then a LAG with the standard IP hash on the VSS would be the way to do it, assuming you go with the EtherChannel option instead of the multiple-subnet approach.



  • 7.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 12:56 AM

    To be 100% clear, there is no such thing as multipathing with NFS v3. Each NFS session will always take one path through the network. If you want multipathing (session trunking), you'll need NFS v4.1 - which is not supported by ESXi 5.5.

    The real question, then, is how do I use all of my network interfaces for NFS traffic? On the storage array, we pretty much assume a LAG has been created, because it is an aggregation point for multiple NFS sessions. Specifically in the case of your NetApp storage array, it's pretty common to LAG the VIFs (LIFs) to help distribute IO across multiple interfaces to clients wanting to mount the export.

    This has no appreciable effect on the ESXi hosts, nor on the number of paths they will use for NFS sessions. Unless you're bumping into a saturation point on your 10 GbE vmnics, there's very little reason to tinker with your NFS vmkernel traffic. If you are in that boat, then yes - you'll have to either set up multiple subnets and exports on the array, or set up a LAG on the hosts (either static EtherChannel or dynamic LACP, both will work in this use case with the same effect - although I prefer LACP when possible) along with additional VIFs on the storage array.
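    If it helps to picture the array-side aggregation, here is a small, purely illustrative Python sketch (made-up IPs and a simplified src/dst hash, not NetApp's or VMware's exact algorithm) of how a LAG on the controller can spread several hosts' NFS sessions across its member ports, while each individual session still rides exactly one link:

        ARRAY_VIF = "10.0.1.10"      # hypothetical storage VIF behind a 2-port LAG
        LAG_PORTS = ["e0a", "e0b"]

        def lsb(ip):
            """Last octet of a dotted-quad IPv4 address."""
            return int(ip.split(".")[-1])

        def lag_member(src_ip, dst_ip):
            """Toy hash: XOR of the last octets, modulo the number of LAG ports."""
            return LAG_PORTS[(lsb(src_ip) ^ lsb(dst_ip)) % len(LAG_PORTS)]

        # One NFS session per host; different source IPs can hash to different
        # array ports, but any single session always uses the same one.
        for host_vmk in ["10.0.1.21", "10.0.1.22", "10.0.1.23"]:
            print(host_vmk, "->", lag_member(host_vmk, ARRAY_VIF))

    The real distribution function is whatever the ifgrp/VIF on the controller is configured to use (IP, MAC, and so on), so treat this strictly as a picture of the idea.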

    I hope this helps.



  • 8.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 09:21 AM

    OK, that reaffirms my understanding of the NFS implementation in vSphere. In the past I have only worked with iSCSI or FC, and we use NFS only for ISO, template, and VM archive datastores. Anyway, here is what I have inherited in the environment: all 10 GbE, four ports in total, two for storage and two for the VM network.

    • NetApp FAS (7-Mode), typical config: each controller owning half of the disks, two aggregates, volumes on top, and so on.
    • A trunked interface (src-dst-ip hash) VIF (LIF), with multiple exports on different IPs in a single subnet.
    • On the ESXi side: no trunk, a single vmkernel interface, two uplinks, default vSwitch settings (Route based on originating virtual port ID). Only one uplink is ever used.
    • I am nowhere near maxing out the 10 GbE most of the time, but we do have peaks around the end of the month and usually reach 30-40% utilisation.


    Given the above, I would like to make use of both links, if possible, without complicating the environment unnecessarily. I want to avoid using multiple subnets, as I know the team will frown on that. So is a LAG the only option to get some sort of load sharing, or am I missing something? By the sound of it, a LAG may not be worth the hassle.

    I have four 10 GbE ports. I plan to implement a vDS, possibly a hybrid design, using the onboard server NICs for the vCenter management network and putting everything else on the vDS. I am planning to keep things close to the current implementation: two NICs for data, vMotion, etc., and two NICs for NFS storage. What do you guys think? Chris Wahl JPM300



  • 9.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 02:02 PM

    Hey vfk,

    Much like you, I've mostly used NFS for ISO datastores and dev workloads and have mainly stuck with iSCSI and FCP in the past. One thing I know is that NFS doesn't bind to any one vmkernel port; it usually just picks the lowest-numbered one in the relevant subnet. So let's say you have management on 192.168.1.x and four vmkernel ports on 10.0.1.x (vmk2 - 10.0.1.52, vmk3 - 10.0.1.53, vmk4 - 10.0.1.54, vmk5 - 10.0.1.55), which is your storage IP range. When traffic goes to NFS, it will probably use vmk2 for most things until that path is unavailable, then vmk3, and so on. However, if your NFS storage were on 192.168.1.x, the traffic would probably go through your management vmkernel port. I see this happen a lot when people don't put their NFS storage IPs on a different network from their management IPs, so traffic goes out the management vmk because it is the lowest-numbered one. However, if you were to isolate your vmkernel ports like this (see the sketch below the list):

    vmk2 - 10.0.1.52

    vmk3 - 10.0.2.52

    vmk4 - 10.0.3.52

    vmk5 - 10.0.4.52

    You would have more control over where the traffic goes.
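    As a rough model of that selection behaviour (this is not ESXi's actual code, and the addresses are made up), something like the following Python sketch captures the idea:

        import ipaddress

        # Hypothetical vmkernel ports: (name, interface address with prefix length)
        VMKS = [
            ("vmk0", "192.168.1.50/24"),   # management
            ("vmk2", "10.0.1.52/24"),
            ("vmk3", "10.0.2.52/24"),
        ]

        def pick_vmk(nfs_server_ip):
            """Return the lowest-numbered vmkernel port whose subnet contains the
            NFS server, falling back to the management vmk (the default route)."""
            server = ipaddress.ip_address(nfs_server_ip)
            for name, cidr in VMKS:
                if server in ipaddress.ip_interface(cidr).network:
                    return name
            return VMKS[0][0]

        print(pick_vmk("10.0.2.10"))      # -> vmk3, the storage vmk on that subnet
        print(pick_vmk("192.168.1.200"))  # -> vmk0, i.e. it rides the management vmk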

    Chris also did a good lab test on leveraging the vDS's Load Based Teaming (LBT) policy with vmkernel ports and NFS in this article: NFS on vSphere Part 4 – Technical Deep Dive on Load Based Teaming | Wahl Network

    His finding was that LBT actually load balanced the NFS vmkernel ports when loads were high, since LBT doesn't use port groups as a delimiting factor. That means you may be able to get the load balancing you're looking for with NFS without having to use a LAG/LACP trunk.

    I'm a huge fan of LBT on the vDS. I think it does a great job and is really easy to set up, as it requires no configuration at the switch level. It is essentially originating port ID, except that vCenter tracks where it places each port and, if an uplink gets too saturated, moves the assignment. You can even adjust the utilisation threshold at which LBT moves things around. Also, if you're moving to a vDS, another thing to look into is Network I/O Control. In 5.x you can create your own user-defined resource pools and assign them to port groups for finer control over how much bandwidth you want to give them. With 10 GbE becoming more popular and people throwing everything into one big 10 GbE bucket, I can see this becoming much more useful.

    That said, I would like to see what Chris thinks, as he has more experience with NFS and can probably direct you better.

    However, your idea of moving forward with the vDS is good. I too keep my management on a VSS and move everything else to the vDS. It's not that you can't have management on the vDS, I've done it a few times, but every now and then you run into a quirky problem where having management on a vDS is a bit of a pain: something goes wrong and you need to reset your management network, you make a mistake during a vDS migration and lose access to your host, or you're moving from one vDS to another or one vCenter to another, which becomes a bit trickier. For those reasons I personally still keep my management on a standard VSS, but different strokes for different folks; it really comes down to preference.

    Hope this has helped



  • 10.  RE: NetApp FAS & NFS Multi-path
    Best Answer

    Posted Jul 24, 2014 03:40 PM

    Since you want to use multiple uplinks on the ESXi hosts for NFS storage traffic, all within a single subnet, your only option is to use a LAG on the hosts and multiple exports on the storage array. Each export should correspond to an IP (VIF) with a unique least significant bit (LSB).

    Follow pages 280-283 from the Networking for VMware Administrators book, specifically with the configuration below, and pay special attention to the LSB.
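    Separately from the book's configuration, here is a purely illustrative Python sketch of why the LSBs matter, using the commonly cited IP-hash calculation (an XOR of the last octets modulo the number of uplinks; check VMware's documentation for the exact formula, and note that all addresses below are made up):

        UPLINKS = ["vmnic0", "vmnic1"]   # two-port LAG on the host
        VMK_IP = "10.0.1.52"             # single NFS vmkernel port

        def lsb(ip):
            """Last octet of a dotted-quad IPv4 address."""
            return int(ip.split(".")[-1])

        def uplink_for(export_vif_ip):
            """Simplified IP-hash: XOR of the last octets, modulo the uplink count."""
            return UPLINKS[(lsb(VMK_IP) ^ lsb(export_vif_ip)) % len(UPLINKS)]

        # Export VIFs whose last octets differ in the least significant bit
        # land on alternating uplinks, so the datastores spread across the LAG.
        for vif in ["10.0.1.10", "10.0.1.11", "10.0.1.12", "10.0.1.13"]:
            print(vif, "->", uplink_for(vif))

    Mount each export as its own datastore against its own VIF IP and the sessions spread across the LAG members; any single datastore still uses only one uplink.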



  • 11.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 03:42 PM

    LBT or LAG? Where do you stand?



  • 12.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 03:47 PM

    Using LBT would not meet your requirements. It requires multiple subnets.



  • 13.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 04:41 PM

    This has been a learning curve. Thanks, guys, I am now more confident about NFS. 10 GbE is definitely the way to go. I think I will use a LAG and a single vmkernel port.



  • 14.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 01:04 AM

    JPM300 wrote:

    However, if you are not running ESXi 5.5 and don't set up the EtherChannel group with the new web GUI option that lets you select active/passive mode, the hash algorithm, etc., it won't be true LACP; prior to the new LACP support in 5.5, it was just a very basic IP hash, which will probably still pick only one path out of the EtherChannel for each send, in a round-robin kind of fashion.

    This is technically incorrect. LACP is LACP. :smileyhappy:

    The use of Enhanced LACP, introduced in 5.5, simply provides additional load distribution methods beyond src-dst-ip.

    Also, regarding the round-robin part of your response, to clear something up: there is no concept of round-robin traffic flows for NFS provided by any sort of LAG. The load distribution methods are all deterministic, based on source and destination identities (IP, MAC, port, VLAN). Where storage is concerned, nothing changes - it's always the same host talking to the same storage array. Because of this, the variables remain identical each time the LAG chooses an uplink, and thus the same uplink will be picked time and time again. This holds true for any type of LAG (either a static EtherChannel or dynamic LACP) because it is the load distribution method that chooses the uplink. There are a few cases where a new uplink might be chosen (think port hashing when an NFS session is disconnected and reconnected to a new port), but that is not a typical event.
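    To underline the point with a toy example (again a simplified, made-up hash rather than the exact one ESXi uses), the chosen uplink never changes for a given source/destination pair:

        def uplink_for(src_ip, dst_ip, n_uplinks=2):
            """Deterministic src/dst hash: the same inputs always pick the same uplink."""
            last_octet = lambda ip: int(ip.split(".")[-1])
            return (last_octet(src_ip) ^ last_octet(dst_ip)) % n_uplinks

        # One host vmkernel port talking to one storage VIF (made-up addresses):
        choices = {uplink_for("10.0.1.52", "10.0.1.10") for _ in range(1000)}
        print(choices)   # prints a single value; the LAG never round-robins the session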



  • 15.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 02:08 AM

    Thanks for clearing that up, Chris :smileyhappy: Also, thanks for the DCA checklist; it came in handy when I wrote my DCA :smileyhappy: I don't suppose you have one for the DCD?

    However, even though LACP isn't going to give him true load balancing, it will still give him the automated failover he was looking for, no? Or could the same be achieved with another vmkernel port, with VMware just picking the next vmkernel port in line in the event of a failure? I typically stick with iSCSI and FCP, but with the cost of 10 GbE dropping and becoming more mainstream, I'm seeing NFS pick up a lot in popularity for the bulk of people's storage due to its simplicity.



  • 16.  RE: NetApp FAS & NFS Multi-path

    Posted Jul 24, 2014 03:27 PM

    A DCD study sheet is on my list, along with updating the DCA for 5.5.

    However, even though LACP isn't going to give him true load balancing, it will still give him the automated failover he was looking for, no?

    Failover is provided by the teaming policy on the port group. If a vmnic becomes disconnected, the VMK will be migrated to the next active or standby vmnic. A LAG is typically a bit faster to converge, but failover does not require a LAG.

    Read these two posts for more details on the pros and cons of both setups: