VMware NSX

  • 1.  Replication Modes and ARP Suppression

    Posted Apr 13, 2018 10:09 PM

    Looking at the replication modes, I see that multicast does not use the NSX controller; instead, BUM traffic is simply sent out via multicast.

    The other modes, by contrast, use ARP suppression to query the NSX controller first, which eliminates the broadcast to the VMs.

    Questions:

    • Why would you enable hybrid or unicast if the ARP requests will be suppressed anyway?
    • If the controller knows the ARP entry, will it not copy it to each of the hosts, rather than having the host query the controller?
    • Why does multicast not use ARP suppression?


  • 2.  RE: Replication Modes and ARP Suppression

    Posted Apr 14, 2018 12:42 PM

    Without hybrid or unicast mode, the only option is multicast, and that mode has no ARP suppression. An ARP cache could still be helpful there: other VMs asking for the same MAC address could be answered from the cache instead of multicasting the same ARP broadcast again.

    If the controller pushed and synchronized all of its ARP table entries to every host, the CPU and memory cost would be heavy, and it would not scale to a large number of VMs and hosts. Since a query completes in milliseconds, asking the controller does not add much delay. (After the response, the host caches the ARP entry until an idle timeout expires.)

    The links in this thread could help with details on the tables on hosts and controllers:
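    The query-then-cache behaviour described above can be sketched roughly as follows. This is only an illustration, not NSX code: the class name, the `query_controller` callback and the 180-second idle timeout are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class HostArpCache:
    """Hypothetical host-side ARP cache that asks the controller only on a miss."""

    def __init__(
        self,
        query_controller: Callable[[str], Optional[str]],
        idle_timeout: float = 180.0,  # assumed value, for illustration only
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query = query_controller
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # ip -> (mac, last_used)
        self.controller_queries = 0

    def resolve(self, ip: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and now - entry[1] < self._idle_timeout:
            # Cache hit: refresh the idle timer and answer locally.
            self._entries[ip] = (entry[0], now)
            return entry[0]
        # Entry is absent or has been idle too long: ask the controller.
        self._entries.pop(ip, None)
        self.controller_queries += 1
        mac = self._query(ip)
        if mac is not None:
            self._entries[ip] = (mac, now)
        # None means the controller had no entry; a real host would then
        # fall back to replicating the ARP broadcast.
        return mac
```

    Only the entries a host actually asks for end up in its cache, which is the scaling argument made above: the controller does not have to push its whole table to every host.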

    NSX Replication Modes and the NSX Controller

    This link is also very detailed on ARP tables:

    Solving the mystery of VTEP,Mac and Arp tbles

    https://m.youtube.com/watch?v=sPu04PSpBts



  • 3.  RE: Replication Modes and ARP Suppression

    Posted Apr 14, 2018 07:48 PM

    Thanks. But I'm still unclear why multicast has no ARP suppression whilst unicast and hybrid do?



  • 4.  RE: Replication Modes and ARP Suppression

    Posted Apr 15, 2018 05:38 AM

    The VXLAN control plane is how the MAC and ARP tables are formed, and the mechanism differs by technology. The ARP suppression mechanism may likewise differ for each of these control plane technologies.

    The VXLAN control planes may be:

    • Multicast

    The reason multicast has no ARP suppression could be that multicast has no inherent mechanism to replicate or synchronize the ARP tables across different VTEP hosts. It is a technology for delivering packets to more than one destination, similar to radio stations or TV channels. The ARP protocol is designed to learn the MAC address of a known IP address in the same subnet, such as the default gateway. VXLAN still needs ARP learning, but because it is an overlay tunneling protocol over L3 subnets, without some enhancement there is no way to get a VXLAN packet to a remote VTEP on another IP subnet.

    Multicast was developed for BUM packets because sending a single multicast BUM frame that is received only by "interested" hosts is more scalable and efficient than blindly broadcasting or flooding the packet to every other VTEP host.
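    As a rough sketch of the "interested hosts only" idea: the source VTEP sends one frame to a group, and only VTEPs that joined that group receive it. The class, the group address and the VTEP names below are made up for illustration; real delivery is done by IGMP/PIM in the underlay, not by code like this.

```python
from typing import Dict, Set


class MulticastUnderlay:
    """Toy model of group membership in the physical underlay (hypothetical)."""

    def __init__(self) -> None:
        self._members: Dict[str, Set[str]] = {}  # group address -> joined VTEPs

    def join(self, group: str, vtep: str) -> None:
        self._members.setdefault(group, set()).add(vtep)

    def leave(self, group: str, vtep: str) -> None:
        self._members.get(group, set()).discard(vtep)

    def receivers(self, group: str, source_vtep: str) -> Set[str]:
        """VTEPs that receive a single frame sent by source_vtep to the group."""
        return self._members.get(group, set()) - {source_vtep}
```

    A VTEP with no VMs on the logical network never joins the group, so it never sees that network's BUM traffic, and the source still sends only one copy.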

    https://blogs.vmware.com/vsphere/2013/04/vxlan-series-different-components-part-1.html

    https://blogs.vmware.com/vsphere/2013/05/vxlan-series-multicast-basics-part-2.html

    https://blogs.vmware.com/vsphere/2013/05/vxlan-series-multicast-usage-in-vxlan-part-3.html

    https://blogs.vmware.com/vsphere/2013/05/vxlan-series-multiple-logical-networks-mapped-to-one-multicast-group-address-part-4.html

    https://blogs.vmware.com/vsphere/2013/05/vxlan-series-how-vtep-learns-and-creates-forwarding-table-part-5.html

    • MP-BGP EVPN  VXLAN Control Plane

    MP-BGP acts as the "controller cluster", similar to the NSX controller cluster. In this mode every VTEP participates in BGP, and, much like IP routing table distribution, the MAC, VTEP and ARP tables are synchronized or replicated through MP-BGP. So in this control plane mode every VTEP (in that case, physical ToR switches) has the same view of these tables. Since the VTEP is itself part of the "controller cluster", there is no need to deploy separate controller nodes, as the whole table is already on the VTEP. In this mode VTEP-1 already knows the answer to an ARP request that came from VTEP-2 and was resolved through VTEP-3, because VTEP-3 redistributed this ARP entry when it first saw it, either when the host itself broadcast an ARP or, for silent hosts, after the host was first asked for. If there are 10 VTEPs, all of them have the same ARP table.

    In this mode, ARP suppression is an enhancement over multicast, similar to what controllers provide.

    • IS-IS Protocol
    • Controllers

    Since physical switches need special ASICs for VXLAN processing and MP-BGP, managing an MP-BGP EVPN control plane is complex and expensive on the hardware side. NSX decouples this complexity by providing controllers to handle these mechanisms, which allows the underlying physical switch fabric to be simpler and more scalable. The underlying physical switches don't have to support protocols such as multicast PIM, VXLAN or MP-BGP EVPN. This makes it possible to deliver new features and innovation through a software upgrade without changing the physical switches. (It is far faster and easier to deploy NSX on an existing vSphere infrastructure than to change the underlying physical switches and cabling.) It also allows different physical hardware to be used at different sites rather than relying on a single hardware solution.

    These links provide more detail about these mechanisms:

    http://www.routetocloud.com/2014/12/nsx-v-ip-discovery/

    The UWA will send out query to NSX controller and ask if he know MAC2, since controller already know this Controller will  send unicast message back to VM1 with MAC2, the ARP broadcast message will not send out to all VM’s  in VXLAN 5001.

    Note: There is 3 min timer in NSX controller for ARP query, if host send same query in this time frame the controller ignore this request and broadcast message will be send out to all VM in the logical switch

    https://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/white-paper-c11-735015.html#_Toc423120404

    ARP suppression is an enhancement provided by the MP-BGP EVPN control plane to reduce network flooding caused by broadcast traffic from ARP requests.

    When ARP suppression is enabled for a VNI, its VTEPs each maintain an ARP suppression cache table for known IP hosts and their associated MAC addresses in the VNI segment. As illustrated in Figure 10, when an end host in the VNI sends an ARP request for another end-host IP address, its local VTEP intercepts the ARP request and checks for the ARP-resolved IP address in its ARP suppression cache table. If it finds a match, the local VTEP sends an ARP response on behalf of the remote end host. The local host then learns the MAC address of the remote host in the ARP response. If the local VTEP doesn’t have the ARP-resolved IP address in its ARP suppression table, it floods the ARP request to the other VTEPs in the VNI. This ARP flooding can occur for the initial ARP request to a silent host in the network. The VTEPs in the network don’t see any traffic from the silent host until another host sends an ARP request for its IP address, and an ARP response is sent back. After the local VTEP learns about the MAC and IP addresses of the silent host, the information is distributed through the MP-BGP EVPN control plane to all other VTEPs. Any subsequent ARP requests do not need to be flooded.

    Because most end hosts send GARP or RARP requests to announce themselves to the network immediately after they come online, the local VTEP immediately has the opportunity to learn their MAC and IP addresses and distribute this information to other VTEPs through the MP-BGP EVPN control plane. Therefore, most active IP hosts in VXLAN EVPN should be learned by the VTEPs either through local learning or control-plane-based remote learning. As a result, ARP suppression reduces the network flooding caused by host ARP learning behavior
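    The behaviour in the Cisco excerpt above (answer from the suppression cache, flood on a miss, then distribute what was learned) can be sketched like this. It is a simplified model, not vendor code: the `Fabric` class stands in for the MP-BGP EVPN control plane, and all names are assumptions for the example.

```python
from typing import Dict, List, Optional, Tuple


class Fabric:
    """Stand-in for the EVPN control plane plus the flooding path (hypothetical)."""

    def __init__(self) -> None:
        self.vteps: List["Vtep"] = []

    def advertise(self, ip: str, mac: str) -> None:
        # Distribute a learned IP/MAC binding to every VTEP's cache.
        for vtep in self.vteps:
            vtep.suppression_cache[ip] = mac

    def flood_arp(self, target_ip: str) -> Optional[str]:
        # The flooded request reaches every host in the segment; the owner replies.
        for vtep in self.vteps:
            mac = vtep.local_hosts.get(target_ip)
            if mac is not None:
                self.advertise(target_ip, mac)  # silent host is now known everywhere
                return mac
        return None


class Vtep:
    def __init__(self, name: str, fabric: Fabric) -> None:
        self.name = name
        self.fabric = fabric
        self.local_hosts: Dict[str, str] = {}        # ip -> mac of attached hosts
        self.suppression_cache: Dict[str, str] = {}  # ip -> mac learned fabric-wide
        fabric.vteps.append(self)

    def attach_host(self, ip: str, mac: str, announce: bool = True) -> None:
        self.local_hosts[ip] = mac
        if announce:  # host sent a GARP when it came online
            self.fabric.advertise(ip, mac)

    def handle_arp_request(self, target_ip: str) -> Tuple[Optional[str], bool]:
        """Return (mac, flooded) for an ARP request from a local host."""
        mac = self.suppression_cache.get(target_ip)
        if mac is not None:
            return mac, False  # VTEP replies on behalf of the remote host
        return self.fabric.flood_arp(target_ip), True
```

    Only the first request for a silent host is flooded; after its reply is seen, every VTEP can answer locally.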

    https://adamraffe.com/2013/06/24/enhanced-vxlan-who-needs-multicast/

    https://docs.cumulusnetworks.com/display/DOCS/Ethernet+Virtual+Private+Network+-+EVPN

    Ethernet Virtual Private Network (EVPN) is a standards-based control plane for VXLAN defined in RFC 7432 and draft-ietf-bess-evpn-overlay that allows for building and deploying VXLANs at scale. It relies on multi-protocol BGP (MP-BGP) for exchanging information and is based on BGP-MPLS IP VPNs (RFC 4364). Hence, it has provisions to enable not only bridging between end systems in the same layer 2 segment but also routing between different segments (subnets). There is also inherent support for multi-tenancy. EVPN is often referred to as the means of implementing controller-less VXLAN.

    Cumulus Linux fully supports EVPN as the control plane for VXLAN, including for both intra-subnet bridging and inter-subnet routing. Key features include:

    • VNI membership exchange between VTEPs using EVPN type-3 (Inclusive multicast Ethernet tag) routes.
    • Exchange of host MAC and IP addresses using EVPN type-2 (MAC/IP advertisement) routes.
    • Support for host/VM mobility (MAC and IP moves) through exchange of the MAC Mobility Extended community.
    • Support for dual-attached hosts via VXLAN active-active mode. MAC synchronization between the peer switches is done using MLAG.
    • Support for ARP/ND suppression, which provides VTEPs with the ability to suppress ARP flooding over VXLAN tunnels.
    • Support for exchange of static (sticky) MAC addresses through EVPN.
    • Support for distributed symmetric routing between different subnets.
    • Support for distributed asymmetric routing between different subnets.
    • Support for centralized routing.
    • Support for prefix-based routing using EVPN type-5 routes (EVPN IP prefix route)
    • Support for layer 3 multi-tenancy.

    https://www.arista.com/assets/data/pdf/Whitepapers/VXLAN_Scaling_Data_Center_Designs.pdf

    VXLAN Implementation

    The network infrastructure must support the following to support VXLANS:

    • Multicast support: IGMP and PIM
    • Layer 3 routing protocol: OSPF, BGP, IS-IS

    For the most part, networking devices process VXLAN traffic transparently. That is, IP encapsulated traffic is switched or routed as any IP traffic would be. VXLAN gateways, also called Virtual Tunnel End Points (VTEP), provide the encapsulating/de-encapsulating services central to VXLAN. VTEPS can be virtual bridges in the hypervisor, VXLAN aware VM applications or VXLAN capable switching hardware. VTEPs are key to virtualizing networks across the existing data center infrastructure.

    https://eos.arista.com/vxlan-without-controller-for-network-virtualization-with-arista-physical-vteps/

    With VXLAN, BUM traffic still exists and still needs to be sent to the unknown destination(s) in the Layer2 domain. As previously discussed in the fundamentals section about unicast replication (HER), there are two ways to populate the unicast HER flood list: manually (in CLI), or automatically with CloudVision (CVX with the VXLAN service). In the below illustration, Host A sends a frame destined to Host D, but MAC D is unknown by VTEP1. VTEP1 will therefore follow the flooding behaviour expected for BUM traffic, and replicate to the VTEP IP addresses listed in VTEP1’s flood-list: VTEP2 and VTEP3.
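    The Host A to Host D example in the excerpt above can be modelled in a few lines. This is only a sketch of head-end replication: the function name, the MAC strings and the VTEP labels are placeholders, and the MAC table and flood list are plain Python structures rather than anything a switch exposes.

```python
from typing import Dict, List, Tuple


def forward(
    dst_mac: str,
    frame: bytes,
    mac_table: Dict[str, str],  # destination MAC -> remote VTEP
    flood_list: List[str],      # remote VTEPs configured for this VNI
) -> List[Tuple[str, bytes]]:
    """Return the (remote VTEP, frame) unicast copies the local VTEP sends."""
    remote = mac_table.get(dst_mac)
    if remote is not None:
        # Known unicast: a single encapsulated copy to the owning VTEP.
        return [(remote, frame)]
    # Unknown destination: one unicast copy per VTEP in the flood list.
    return [(vtep, frame) for vtep in flood_list]
```

    With MAC D unknown, VTEP1 sends two copies (to VTEP2 and VTEP3). Once MAC D has been learned behind one of them, the same call returns a single copy.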



  • 5.  RE: Replication Modes and ARP Suppression

    Posted Apr 17, 2018 05:58 PM

    I believe one of the key reasons ARP suppression is not used in that situation is because the NSX control cluster is not needed/required for VXLAN networking in a purely multicast setup. ARP suppression requires controller lookups and caching. Although it's not very common to see, you could deploy logical switches using multicast replication and not have any controllers at all. Most still deploy them though - they are still needed for DLR purposes.

    Regards,

    Mike