VMware NSX


Wrong BGP next hop programming

  • 1.  Wrong BGP next hop programming

    Posted Dec 02, 2017 04:35 PM

    Hi,

    Please consider below topology:

    DLR1(AS65001)             --->           ESG1(AS65001)        ------------------------------> Physical Router(AS65002)

    192.168.1.1/30                    192.168.1.2/30                      192.168.100.3/24            192.168.100.1/24

    - Static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1

    - ESG1 advertises static route 172.16.0.0/29 to the physical router

    - Static route 172.16.0.0/19 is advertised from ESG1 to the physical router with 192.168.1.1 as the next hop, which is wrong, and the physical router does not install the route

    I believe ESG1 programs the NEXT_HOP attribute incorrectly when it advertises the route to the physical router: the next hop must be 192.168.100.3 instead of the physical router IP address (i.e., 192.168.1.1)

    Please find attached the output from the ESG; it includes:

    - Routes advertised to Physical router 192.168.100.1

    - Routes received from DLR1 : 192.168.1.1

    - directly connected routes on ESG1

    Best Regards

    Abdelfatah ELARFAOUI



  • 2.  RE: Wrong BGP next hop programming

    Posted Dec 02, 2017 04:40 PM

    To reproduce the issue, please follow this order of operations:

    - Create the static route, then redistribute it

    or

    - Activate redistribution and check static/connected, then create the static route, then clear the BGP peering with the physical router



  • 3.  RE: Wrong BGP next hop programming

    Posted Dec 02, 2017 04:57 PM

    Sorry for the typo.

    Correct description:

    - Static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1

    - ESG1 advertises static route 172.16.0.0/29 to the physical router

    - Static route 172.16.0.0/19 is advertised from ESG1 to the physical router with 192.168.100.1 as the next hop, which is wrong, and the physical router does not install the route

    I believe ESG1 programs the NEXT_HOP attribute incorrectly when it advertises the route to the physical router: the next hop must be 192.168.100.3 instead of the physical router IP address (i.e., 192.168.100.1)
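    A minimal Python sketch of why the physical router refuses this route, assuming it applies the rule in RFC 4271 (section 5.1.3) that a BGP speaker shall not install a route with itself as the next hop. The function `accept_route` is a hypothetical name; it models only this one check, not a router's full UPDATE validation.

```python
import ipaddress


def accept_route(next_hop: str, own_addresses: set) -> bool:
    """Return False when NEXT_HOP is one of the receiving router's own addresses.

    Hypothetical helper: models only the "no route with itself as next hop" rule.
    """
    nh = ipaddress.ip_address(next_hop)
    return nh not in {ipaddress.ip_address(a) for a in own_addresses}


# Addresses owned by the physical router in this topology
physical_router = {"192.168.100.1"}

print(accept_route("192.168.100.1", physical_router))  # False: next hop is the router itself
print(accept_route("192.168.100.3", physical_router))  # True: next hop is ESG1's uplink
```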

    Please find attached the output from the ESG; it includes:

    - Routes advertised to Physical router 192.168.100.1

    - Routes received from DLR1 : 192.168.1.1

    - directly connected routes on ESG1



  • 4.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 11:07 AM

    I couldn't find the attachment in this reply.



  • 5.  RE: Wrong BGP next hop programming

    Posted Dec 02, 2017 05:06 PM

    If I remove the static route and then recreate it, the next hop is programmed correctly: the route is advertised to the physical router with ESG1's IP address 192.168.100.3 as the next hop, so the physical router installs it. Please check the attached output from ESG1.

    Once I clear the BGP session with the physical router, ESG1 again advertises the static route with 192.168.100.1 as the next hop, which is wrong.



  • 6.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 09:02 AM

    There is probably a routing loop where the physical router advertises the static route back to the ESG, or a static route with that next hop.

    Could you share your static routes and the route filtering configuration?



  • 7.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 10:55 AM

    There is no loop, my friend; the topology really is flat!!

    I have included the procedure to reproduce this strange behavior, which is not aligned with the BGP RFC.

    Best Regards

    Abdelfatah ELARFAOUI

    IP/MPLS Expert | CCIE R&S

    https://www.linkedin.com/in/elarfaouiabdelfatah/



  • 8.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 08:53 AM

    Hi, BGP next hop is not changed on iBGP sessions.

    The VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 also explains this topic on page 68:

    In eBGP/iBGP route exchange, when a route is advertised into iBGP, the next hop is carried unchanged into the iBGP domain.

    This may create dependencies on external routing domain stability or connectivity.

    To avoid external route reachability issues, the BGP next-hop-self feature or redistribution of a connected interface from which the next hop is learned is required.

    The BGP next-hop-self is not supported in current implementation, thus it is necessary to redistribute the ESG uplink interface (e.g., two VLANs that connect to physical routers) into the iBGP session towards the DLR. Proper filtering should be enabled on the ESG to make sure the uplinks’ addresses are not advertised back to physical routers as this can cause loops/failures.
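    A minimal Python sketch of the outbound filtering the guide asks for, assuming the uplink subnet 192.168.100.0/24 from the topology in the first post. `UPLINK_SUBNETS` and `permit_outbound` are hypothetical names; on a real ESG this would be configured as an outbound filter toward the physical router rather than written in Python.

```python
import ipaddress

# Hypothetical: ESG1 uplink subnet(s) facing the physical router
UPLINK_SUBNETS = [ipaddress.ip_network("192.168.100.0/24")]


def permit_outbound(prefix: str) -> bool:
    """Deny any prefix that falls inside an uplink subnet; permit everything else."""
    net = ipaddress.ip_network(prefix)
    # subnet_of() requires both networks to be the same IP version (IPv4 here)
    return not any(net.subnet_of(uplink) for uplink in UPLINK_SUBNETS)


candidates = ["172.16.0.0/19", "192.168.100.0/24", "192.168.1.0/30"]
print([p for p in candidates if permit_outbound(p)])
# ['172.16.0.0/19', '192.168.1.0/30']
```

    The uplink prefix can still be redistributed into iBGP toward the DLR; the filter only keeps it from being sent back out of the uplink.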

    The solution is to redistribute the ESG1 uplink 192.168.100.3/24 into BGP towards the DLR so that the DLR can reach the physical router 192.168.100.1.
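    A simplified Python sketch of the reachability check behind this advice: because iBGP leaves the next hop 192.168.100.1 unchanged, the DLR can use the route only if some prefix in its routing table covers that address. The RIB contents and the `next_hop_reachable` name are assumptions for illustration; real routers also resolve next hops recursively, which is omitted here.

```python
import ipaddress


def next_hop_reachable(next_hop: str, rib: list) -> bool:
    """True if any prefix in the RIB covers next_hop (no recursive resolution)."""
    nh = ipaddress.ip_address(next_hop)
    return any(nh in ipaddress.ip_network(p) for p in rib)


# Hypothetical DLR routing table: only its transit link to ESG1
dlr_rib = ["192.168.1.0/30"]
print(next_hop_reachable("192.168.100.1", dlr_rib))  # False: external next hop unknown

# After ESG1 redistributes its connected uplink into iBGP toward the DLR
dlr_rib.append("192.168.100.0/24")
print(next_hop_reachable("192.168.100.1", dlr_rib))  # True
```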

    If you need more information on BGP next-hop handling, see these links:

    BGP: Frequently Asked Questions - Cisco

    http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html

    http://www.getnetworking.net/bgp/bgp-next-hop-self

    Question: what is the requirement behind the static route on DLR1?
    If you are going to configure a static route to summarise the logical switch networks behind the DLR, this is normally done on the ESG, as per the design guide.



  • 9.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 10:50 AM

    Hi,

    Please review my topology: ESG1 to the physical router is an eBGP session!! So, per the RFC and normal eBGP behavior, the next hop will be changed!
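    A simplified Python sketch of the eBGP next-hop choice described in RFC 4271 (section 5.1.3) for a peer one IP hop away: the speaker may pass along a "third-party" next hop only when that address is on the subnet it shares with the peer, and otherwise advertises its own interface address. The function name and parameters are hypothetical; the sketch also assumes the peer is never handed its own address, and real implementations differ on whether they use the third-party option at all.

```python
import ipaddress


def ebgp_next_hop(learned_nh: str, local_addr: str,
                  peer_addr: str, shared_subnet: str) -> str:
    """Simplified NEXT_HOP choice when advertising to a directly connected eBGP peer."""
    subnet = ipaddress.ip_network(shared_subnet)
    nh = ipaddress.ip_address(learned_nh)
    # Third-party next hop: only if it is on the subnet shared with the peer
    # and is not the peer's own address.
    if nh in subnet and nh != ipaddress.ip_address(peer_addr):
        return learned_nh
    # Otherwise advertise our own interface address toward the peer.
    return local_addr


# ESG1 -> physical router, for a route ESG1 learned from DLR1 via 192.168.1.1
print(ebgp_next_hop("192.168.1.1", "192.168.100.3", "192.168.100.1", "192.168.100.0/24"))
# 192.168.100.3
```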

    Best Regards

    Abdelfatah ELARFAOUI

    IP/MPLS Expert, CCIE R&S



  • 10.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 11:05 AM

    Sorry, I missed the ASN detail in your first post. Which NSX version are you using? I'll try to simulate it in my lab too.



  • 11.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 11:28 AM

    Hi,

    Thanks for your reply. The NSX version is 6.3.1.

    To reproduce this issue, please follow the procedure below:

    - Topology:

    DLR2 (with default GW and directly connected subnets 172.16.1.0/24, 172.16.2.0/24,172.16.3.0/24) [192.168.1.1/30]----------> [192.168.1.2/30] DLR1 [192.168.13.1/24]-------------> [192.168.13.2/24]  ESG1 [192.168.100.3/24]---------------------------------------> [192.168.100.1/24] Physical Router

    - DLR2 uses a default GW 0.0.0.0/0 pointing to DLR1, which is in fact an ESG

    - DLR1 and ESG1 are on the same AS 65001

    - Physical Router on AS 65002

    - DLR1 is configured with summary route 172.16.0.0/21 pointing to DLR2

    - DLR1 redistributes static and directly connected routes to ESG1

    - ESG1 redistributes directly connected routes
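    A small Python sketch that classifies each BGP session in this lab from the ASNs listed above (same ASN on both ends means iBGP, different ASNs mean eBGP); the dictionary layout is just an illustrative assumption. DLR2 is left out because it only uses a static default route.

```python
# ASNs taken from the topology description; the data layout is illustrative only
ASN = {"DLR1": 65001, "ESG1": 65001, "Physical Router": 65002}
SESSIONS = [("DLR1", "ESG1"), ("ESG1", "Physical Router")]


def session_type(a: str, b: str) -> str:
    """Same ASN on both ends means iBGP; different ASNs mean eBGP."""
    return "iBGP" if ASN[a] == ASN[b] else "eBGP"


for a, b in SESSIONS:
    print(f"{a} <-> {b}: {session_type(a, b)}")
# DLR1 <-> ESG1: iBGP
# ESG1 <-> Physical Router: eBGP
```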

    To reproduce:

    - Create the static route, then redistribute it

    As a workaround, I delete the static route and recreate it; ESG1 then programs the next hop correctly, but once I clear the BGP sessions, the issue reappears.

    Best Regards

    Abdelfatah ELARFAOUI

    IP/MPLS Expert , CCIE R&S

    https://www.linkedin.com/in/elarfaouiabdelfatah/



  • 12.  RE: Wrong BGP next hop programming

    Posted Dec 03, 2017 08:37 PM

    Before I continue to test the scenario, are you saying that you have DLR2 behind DLR1?

    Please note that building a multi-tier topology using only DLR instances is not supported, and connecting multiple DLRs to a single ESG on a shared VXLAN logical switch is also not supported.

    See the VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0, page 73, on Unsupported Topologies.



  • 13.  RE: Wrong BGP next hop programming

    Posted Dec 04, 2017 04:28 PM

    Please consider DLR1 to be an ESG.

    Best Regards

    Abdelfatah ELARFAOUI



  • 14.  RE: Wrong BGP next hop programming

    Posted Dec 08, 2017 03:24 AM

    Hi,

    Did you get the chance to simulate it in your lab?

    Thanks



  • 15.  RE: Wrong BGP next hop programming

    Posted Dec 11, 2017 07:19 AM

    Hi,

    Could you please share your findings? I can help with debugging if you couldn't reproduce the issue.

    Thanks



  • 16.  RE: Wrong BGP next hop programming

    Posted Dec 13, 2017 08:28 PM

    Not yet, probably this weekend. I will let you know how it goes.



  • 17.  RE: Wrong BGP next hop programming

    Posted Dec 25, 2017 05:15 PM

    Hi,

    Did you get the chance to reproduce the issue?

    Thanks

    Abdelfatah ELARFAOUI

    IP/MPLS Expert, CCIE R&S, JNCIE-SP

    https://www.linkedin.com/in/elarfaouiabdelfatah/



  • 18.  RE: Wrong BGP next hop programming

    Posted Jan 05, 2018 12:05 PM

    Last try... If there is no answer from your side, I will consider it a bug!!

    Thanks



  • 19.  RE: Wrong BGP next hop programming

    Posted Jan 07, 2018 05:37 AM

    eBGP and iBGP next-hop processing differ. As mentioned in the previous messages, next-hop-self removes the need to make the next-hop address reachable; without it, you need static routes (or another routing protocol such as OSPF) on the physical router toward the otherwise unreachable DLR next-hop address.

    For optimal routing, the next hop announced to iBGP peers is not changed. So if the next hop (the DLR IP address on the DLR-PLR transit link) is not reachable from the physical router, the routes (logical switch subnets) announced from the DLR to the Edge are flagged as unreachable and are not installed in the physical router's routing table.

    eBGP announces the next hop differently from iBGP: the next hop announced by the Edge to the physical router is the Edge itself instead of the DLR IP address. In that case next-hop-self may not be necessary, since the physical router already knows the directly connected Edge-physical link address.

    One exception may be when routes are redistributed into eBGP (static or OSPF) instead of being originated locally on the Edge or learned from another BGP neighbor. In that case eBGP may behave like iBGP (i.e., announce the DLR IP address as the next hop instead of itself). Again, without next-hop-self this may cause problems for route optimality.

    This link may be helpful about the next-hops on BGP different scenarios:

    http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html

    BGP next hop of locally originated routes

    When a router originates a BGP route configured with a network router configuration command or through route redistribution (redistribute router configuration command), it sets the BGP next hop to the IGP next hop (the same value you’d find in the IP routing table). BGP next hop is set to 0.0.0.0 for routes with unknown next hops – connected interfaces, static routes to null 0 or summary routes configured with aggregate-address router configuration command.

    When a BGP route with missing next hop is sent to BGP neighbors, the BGP next hop is set to the source IP address of the BGP session.
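    A minimal Python sketch of the two steps quoted above, following the ipspace.net description of Cisco IOS behavior (not necessarily NSX): a redistributed route stores its IGP next hop, or 0.0.0.0 when there is none, and 0.0.0.0 is replaced by the BGP session source address when the route is sent. The function names are hypothetical, the addresses come from the lab topology in this thread, and any later eBGP next-hop rewriting is not modeled.

```python
from typing import Optional

UNSPECIFIED = "0.0.0.0"


def origin_next_hop(igp_next_hop: Optional[str]) -> str:
    """Next hop stored when a route is redistributed or originated into BGP.

    None stands for routes without an IGP next hop (connected, Null0, aggregates).
    """
    return igp_next_hop if igp_next_hop is not None else UNSPECIFIED


def next_hop_on_send(stored_nh: str, session_source: str) -> str:
    """A missing (0.0.0.0) next hop is replaced by the session source address."""
    return session_source if stored_nh == UNSPECIFIED else stored_nh


# Static route via 192.168.1.1, redistributed and sent from source 192.168.13.1
print(next_hop_on_send(origin_next_hop("192.168.1.1"), "192.168.13.1"))  # 192.168.1.1
# Summary or Null0 route (no IGP next hop), sent over the same session
print(next_hop_on_send(origin_next_hop(None), "192.168.13.1"))  # 192.168.13.1
```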

    If the RFC mandates this, every BGP implementation should behave like this; if it is left optional, behavior may differ between products.

    So if the same test is done with BGP between the DLR and the Edge instead of static route redistribution on the Edge, the next hop may change to the Edge IP address. The DLR-Edge session may be iBGP or eBGP, since in both cases the Edge learns these routes from BGP instead of originating them locally from static routes.

    In the SDDC reference designs, OSPF is recommended for some designs, and eBGP or iBGP for others:

    https://pubs.vmware.com/vmware-validated-design-41/topic/com.vmware.ICbase/PDF/vmware-validated-design-41-sddc-ospf-bgp-routing.pdf

    The VMware Validated Design documentation supports the following configuration:

    • Use eBGP between the physical environment (ToR) and ECMP-enabled NSX Edge (ESG) devices.
    • Use iBGP between NSX ESGs and UDLRs and DLRs.
    • On the NSX ESGs, configure route redistribution between the physical and software-defined infrastructure


  • 20.  RE: Wrong BGP next hop programming

    Posted Jan 07, 2018 11:00 AM

    Hi,

    Please refer to previous replies if you are interested in reproducing this issue!!!

    Best Regards

    Abdelfatah ELARFAOUI

    IP/MPLS Expert CCIE/JNCIE/VCIX6-NV certified



  • 21.  RE: Wrong BGP next hop programming

    Posted Jan 08, 2018 04:24 PM

    After rereading, I noticed the current topology is different from the first one. To clarify the first one: NSX-edge-3-0 is advertising the 172.16.0.0/19 network to the physical router 192.168.100.1 with next hop 192.168.100.1, which is the physical router itself. That looks like a routing loop or something similar.

    About the second topology, DLR<-->ESG(DLR1)<-->ESG1<-->Physical Router: since DLR1 advertises the 172.16.0.0/21 route to ESG1 over iBGP, it should advertise it with next hop 192.168.1.1. How does ESG1 know about next hop 192.168.1.1 if there is no other static route or IGP such as OSPF on ESG1? If 192.168.1.1 is not in ESG1's routing table, ESG1 may mark the route as inaccessible and not advertise it to the physical router.
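    A Python sketch of recursive next-hop resolution on ESG1, assuming (as the topology in this thread states) that DLR1 redistributes its connected 192.168.1.0/30 transit link into BGP, here with an assumed next hop of 192.168.13.1. The RIB contents and function names are hypothetical; if no prefix covers the next hop, the route is treated as unreachable.

```python
import ipaddress
from typing import Optional

# Hypothetical ESG1 routing table: prefix -> next hop ("connected" = directly attached)
ESG1_RIB = {
    "192.168.13.0/24": "connected",
    "192.168.100.0/24": "connected",
    "192.168.1.0/30": "192.168.13.1",  # assumed: learned from DLR1 via BGP
}


def lookup(addr: str) -> Optional[str]:
    """Longest-prefix match against ESG1_RIB; returns the matching prefix or None."""
    ip = ipaddress.ip_address(addr)
    matches = [p for p in ESG1_RIB if ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)


def resolve(next_hop: str, depth: int = 0) -> Optional[str]:
    """Recursively resolve a BGP next hop to a directly connected address, or None."""
    if depth > 8:  # guard against resolution loops
        return None
    prefix = lookup(next_hop)
    if prefix is None:
        return None  # unreachable: the BGP route would not be usable
    via = ESG1_RIB[prefix]
    if via == "connected":
        return next_hop
    return resolve(via, depth + 1)


print(resolve("192.168.1.1"))  # 192.168.13.1
print(resolve("10.0.0.1"))     # None
```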

    Is it possible to send the routing table, the BGP table, and the BGP updates advertised to and received from neighbors for ESG(DLR1), ESG1, and the physical router, for each situation: the wrong advertisement, after recreating the static route (next hop reprogrammed correctly), and after clearing the BGP sessions again (wrong announcement reproduced)?

    Also, during the wrong-announcement phase, are the routing and BGP tables stable, or do they change transiently, e.g., every 30 seconds?




  • 22.  RE: Wrong BGP next hop programming

    Posted Jan 08, 2018 05:15 PM

    Dear,

    Please try to understand the issue before posting any reply. Kindly refer to previous replies for your understanding!! Also, below are answers to your concerns:

    - There is only one topology; I have just clarified the naming!!

    - Could you please give me one scenario that can lead to a loop, taking into consideration that the topology is flat? Wondering how you suspect a loop on a flat topology!!

    - ESG1 is directly connected to DLR1 (for info, DLR1 is in fact an ESG; I just named it DLR1), so you can conclude how the BGP next hop is resolved.

    For your reference, please find below details of this issue:

    To reproduce this issue, please follow the procedure below:

    - Topology:

    DLR2 (with default GW and directly connected subnets 172.16.1.0/24, 172.16.2.0/24,172.16.3.0/24) [192.168.1.1/30]----------> [192.168.1.2/30] DLR1 [192.168.13.1/24]-------------> [192.168.13.2/24]  ESG1 [192.168.100.3/24]---------------------------------------> [192.168.100.1/24] Physical Router

    - DLR2 uses default GW 0.0.0.0/0 pointing to DLR1 which is in fact an ESG

    - DLR1 and ESG1 are on the same AS 65001

    - Physical Router on AS 65002

    - DLR1 is configured with summary route 172.16.0.0/21 pointing to DLR2

    - DLR1 redistributes static and directly connected routes to ESG1

    - ESG1 redistributes directly connected routes

    To reproduce:

    - Create the static route, then redistribute it

    As a workaround, I delete the static route and recreate it; ESG1 then programs the next hop correctly, but once I clear the BGP sessions, the issue reappears.

    Respectfully Yours!!