VMware NSX

Wrong BGP next hop programming

1. Wrong BGP next hop programming

Taupin
Posted Dec 02, 2017 04:35 PM
Hi,
Please consider the topology below:
DLR1 (AS 65001) [192.168.1.1/30] ---> [192.168.1.2/30] ESG1 (AS 65001) [192.168.100.3/24] ---> [192.168.100.1/24] Physical Router (AS 65002)
- Static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1
- ESG1 advertises static route 172.16.0.0/19 to the Physical Router
- Static route 172.16.0.0/19 is advertised from ESG1 to the Physical Router with 192.168.1.1 as next hop, which is wrong, and the Physical Router did not install the route
I believe the next hop attribute is programmed incorrectly when ESG1 advertises the route to the Physical Router: the next hop must be 192.168.100.3 instead of the Physical Router IP address (i.e. 192.168.1.1)
Please find attached the output from the ESG, which includes:
- Routes advertised to Physical router 192.168.100.1
- Routes received from DLR1 : 192.168.1.1
- directly connected routes on ESG1
Best Regards
Abdelfatah ELARFAOUI
2. RE: Wrong BGP next hop programming

Taupin
Posted Dec 02, 2017 04:40 PM
To reproduce the issue, please follow one of the configuration orders below:
- Create the static route, then redistribute it
or
- Enable redistribution with static and connected routes checked, then create the static route, then clear the BGP session with the Physical Router
3. RE: Wrong BGP next hop programming

Taupin
Posted Dec 02, 2017 04:57 PM
Sorry for the typo.
Correct description:
- Static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1
- ESG1 advertises static route 172.16.0.0/19 to the Physical Router
- Static route 172.16.0.0/19 is advertised from ESG1 to the Physical Router with 192.168.100.1 as next hop, which is wrong, and the Physical Router did not install the route
I believe the next hop attribute is programmed incorrectly when ESG1 advertises the route to the Physical Router: the next hop must be 192.168.100.3 instead of the Physical Router IP address (i.e. 192.168.100.1)
Please find attached the output from the ESG, which includes:
- Routes advertised to Physical router 192.168.100.1
- Routes received from DLR1 : 192.168.1.1
- directly connected routes on ESG1
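To illustrate why the Physical Router would not install such a route: as far as RFC 4271 (section 5.1.3) reads, a BGP speaker should not install a route whose next hop is one of its own addresses. A minimal Python sketch of that receiver-side check follows. The function name `would_install` and the list of local addresses are assumptions made for illustration; they are not taken from any NSX or router output.

```python
import ipaddress
from typing import Iterable


def would_install(next_hop: str, local_addresses: Iterable[str]) -> bool:
    """Return False when the advertised next hop is one of the receiver's own addresses."""
    hop = ipaddress.ip_address(next_hop)
    return all(hop != ipaddress.ip_address(addr) for addr in local_addresses)


# Hypothetical view from the Physical Router, which owns 192.168.100.1.
physical_router_addresses = ["192.168.100.1"]

# Next hop is the router's own address: the route is rejected.
print(would_install("192.168.100.1", physical_router_addresses))
# Next hop is ESG1's uplink address: the route is acceptable.
print(would_install("192.168.100.3", physical_router_addresses))
```

Real implementations apply further checks (next hop reachability, policy); this only models the single rule relevant to the symptom described here.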
4. RE: Wrong BGP next hop programming

bayupw
Posted Dec 03, 2017 11:07 AM
I couldn't find the attachment on this reply
5. RE: Wrong BGP next hop programming

Taupin
Posted Dec 02, 2017 05:06 PM
If I remove the static route and then recreate it, the next hop is programmed correctly and the route is advertised to the Physical Router with the ESG1 IP address 192.168.100.3, so the route is installed on the Physical Router. Please check the attached output from ESG1.
Once I clear the BGP session with the Physical Router, ESG1 advertises the static route with 192.168.100.1 as next hop, which is wrong.
6. RE: Wrong BGP next hop programming

bayupw
Posted Dec 03, 2017 09:02 AM
There is probably a routing loop in which the physical router advertises the static route back to the ESG, or there is a static route with that next hop.
Could you share your static routes and the route filtering configuration?
7. RE: Wrong BGP next hop programming

Taupin
Posted Dec 03, 2017 10:55 AM
There is no loop my friend, the topology is really a Flat topology!!
I have included the procedure to reproduce this strange behavior, which is not aligned with the BGP RFC
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert | CCIE R&S
https://www.linkedin.com/in/elarfaouiabdelfatah/
8. RE: Wrong BGP next hop programming

bayupw
Posted Dec 03, 2017 08:53 AM
Hi, the BGP next hop is not changed on iBGP sessions.
The VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 also explains this topic on page 68:
In eBGP/iBGP route exchange, when a route is advertised into iBGP, the next hop is carried unchanged into the iBGP domain.
This may create dependencies on external routing domain stability or connectivity.
To avoid external route reachability issues, the BGP next-hop-self feature or redistribution of a connected interface from which the next hop is learned is required.
The BGP next-hop-self is not supported in current implementation, thus it is necessary to redistribute the ESG uplink interface (e.g., two VLANs that connect to physical routers) into the iBGP session towards the DLR. Proper filtering should be enabled on the ESG to make sure the uplinks’ addresses are not advertised back to physical routers as this can cause loops/failures.
The solution is to redistribute the ESG1 uplink 192.168.100.3/24 into BGP towards the DLR so that the DLR can reach the physical router 192.168.100.1
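The reachability dependency described in the design guide can be illustrated with a small Python sketch. It assumes a simplified model in which a BGP route is usable only if its next hop falls inside a prefix already present in the routing table. The function name `next_hop_resolvable` and the DLR routing-table contents are hypothetical and not taken from any NSX output.

```python
import ipaddress
from typing import Iterable


def next_hop_resolvable(next_hop: str, routing_table: Iterable[str]) -> bool:
    """Return True if some prefix in the routing table contains the next hop."""
    hop = ipaddress.ip_address(next_hop)
    return any(hop in ipaddress.ip_network(prefix) for prefix in routing_table)


# Hypothetical DLR table that only knows the DLR-ESG transit link.
dlr_table = ["192.168.1.0/30"]
print(next_hop_resolvable("192.168.100.1", dlr_table))  # next hop cannot be resolved

# After the ESG redistributes its connected uplink into iBGP towards the DLR.
dlr_table.append("192.168.100.0/24")
print(next_hop_resolvable("192.168.100.1", dlr_table))  # next hop can now be resolved
```

This covers only routes learned over iBGP whose next hop is left unchanged; it says nothing about what an ESG advertises over an eBGP session.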
If you need more info on BGP around this specific topic, see these links
BGP: Frequently Asked Questions - Cisco
http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html
http://www.getnetworking.net/bgp/bgp-next-hop-self
One question: what is the requirement behind the static route on DLR1?
If you are going to configure a static route to summarise the logical switch networks behind the DLR, this is normally done on the ESG as per the design guide
9. RE: Wrong BGP next hop programming

Taupin
Posted Dec 03, 2017 10:50 AM
Hi,
Please review my topology: ESG1 to Physical Router is an eBGP session!! So, as per the RFC and normal eBGP behavior, the next hop will be changed!
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert, CCIE R&S
10. RE: Wrong BGP next hop programming

bayupw
Posted Dec 03, 2017 11:05 AM
Sorry, I missed the AS number detail in your first post. Which NSX version are you using? I'll try to simulate it in my lab too
11. RE: Wrong BGP next hop programming

Taupin
Posted Dec 03, 2017 11:28 AM
Hi,
Thanks for your reply. The NSX version is 6.3.1.
To reproduce this issue, please follow the procedure below:
- Topology:
DLR2 (with default GW and directly connected subnets 172.16.1.0/24, 172.16.2.0/24, 172.16.3.0/24) [192.168.1.1/30] ----> [192.168.1.2/30] DLR1 [192.168.13.1/24] ----> [192.168.13.2/24] ESG1 [192.168.100.3/24] ----> [192.168.100.1/24] Physical Router
- DLR2 uses a default GW 0.0.0.0/0 pointing to DLR1, which is in fact an ESG
- DLR1 and ESG1 are in the same AS 65001
- The Physical Router is in AS 65002
- DLR1 is configured with summary route 172.16.0.0/21 pointing to DLR2
- DLR1 redistributes the static route and directly connected routes to ESG1
- ESG1 redistributes directly connected routes
To reproduce:
- Create the static route, then redistribute it
Workaround: if I delete the static route and recreate it, the next hop is reprogrammed correctly by ESG1, but once I clear the BGP sessions the issue is reproduced
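As a quick sanity check on the addressing above, the following Python sketch verifies that the summary 172.16.0.0/21 configured on DLR1 covers the three subnets connected to DLR2. It uses only the standard `ipaddress` module; the variable names are illustrative.

```python
import ipaddress

# Summary route configured on DLR1, pointing to DLR2.
summary = ipaddress.ip_network("172.16.0.0/21")

# Subnets directly connected to DLR2 in the topology.
connected = [
    ipaddress.ip_network(prefix)
    for prefix in ("172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24")
]

# Map each subnet to whether it falls inside the summary.
covered = {str(net): net.subnet_of(summary) for net in connected}
print(covered)
```

The /21 spans 172.16.0.0 through 172.16.7.255, so it also covers address space beyond the three listed subnets.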
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert , CCIE R&S
https://www.linkedin.com/in/elarfaouiabdelfatah/
12. RE: Wrong BGP next hop programming

bayupw
Posted Dec 03, 2017 08:37 PM
Before I continue to test the scenario, are you saying that you have DLR2 behind DLR1?
Please note that building a multi-tier topology using only DLR instances is not supported, and connecting multiple DLRs to a single ESG on a shared VXLAN logical switch is also not supported
See VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 page 73 on Unsupported Topologies
13. RE: Wrong BGP next hop programming

Taupin
Posted Dec 04, 2017 04:28 PM
Please consider DLR1 as ESG,
Best Regards
Abdelfatah ELARFAOUI
14. RE: Wrong BGP next hop programming

Taupin
Posted Dec 08, 2017 03:24 AM
Hi,
Did you get the chance to simulate this in your lab?
Thanks
15. RE: Wrong BGP next hop programming

Taupin
Posted Dec 11, 2017 07:19 AM
Hi,
Could you please share your findings? I can help with debugging if you could not reproduce the issue
Thanks
16. RE: Wrong BGP next hop programming

bayupw
Posted Dec 13, 2017 08:28 PM
Not yet, probably this weekend. Will let you know how it goes
17. RE: Wrong BGP next hop programming

Taupin
Posted Dec 25, 2017 05:15 PM
Hi,
Did you get the chance to reproduce the issue?
Thanks
Abdelfatah ELARFAOUI
IP/MPLS Expert, CCIE R&S, JNCIE-SP
https://www.linkedin.com/in/elarfaouiabdelfatah/
18. RE: Wrong BGP next hop programming

Taupin
Posted Jan 05, 2018 12:05 PM
Last try ... If no answer from your side I will consider it as a Bug!!
Thanks
19. RE: Wrong BGP next hop programming

cnrz
Posted Jan 07, 2018 05:37 AM
eBGP and iBGP process the next hop differently. As noted in the previous messages, next-hop-self removes the need for the next hop address to be reachable; without it, a static route (or another routing protocol such as OSPF) is needed on the physical router for otherwise unreachable next hops such as the DLR IP address.
For optimal routing, the next hop announced to iBGP peers is not changed. So if the next hop (the DLR IP address on the DLR-PLR transit link) is not reachable from the physical router, the routes (logical switch subnets) announced from the DLR to the Edge are flagged as unreachable and are not put into the routing table of the physical router.
eBGP announces the next hop differently from iBGP: the next hop announced by the Edge to the physical router is the Edge itself instead of the DLR IP address. In that case next-hop-self may not be necessary, since the physical router already knows the directly connected Edge-to-physical link address.
One exception may be if the routes are redistributed into eBGP (from static or OSPF) instead of originating locally on the Edge or being learned from another BGP neighbor. In that case eBGP may behave similarly to iBGP (i.e. announce the DLR IP address as next hop instead of itself). For route optimality, this may again create problems without next-hop-self.
This link may be helpful about the next-hops on BGP different scenarios:
http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html
BGP next hop of a locally originated routes
When a router originates a BGP route configured with a network router configuration command or through route redistribution (redistribute router configuration command), it sets the BGP next hop to the IGP next hop (the same value you’d find in the IP routing table). BGP next hop is set to 0.0.0.0 for routes with unknown next hops – connected interfaces, static routes to null 0 or summary routes configured with aggregate-address router configuration command.
When a BGP route with missing next hop is sent to BGP neighbors, the BGP next hop is set to the source IP address of the BGP session.
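The rule quoted above, combined with the usual eBGP default, can be sketched in Python. This is a simplified model and not the NSX implementation: the function name `advertised_next_hop` and all of its parameters are assumptions made for illustration, and policy knobs such as next-hop-self are ignored. The eBGP branch follows one reading of RFC 4271 section 5.1.3: an existing next hop is kept only when it sits on the subnet shared with the peer (a third-party next hop) and is not the peer's own address.

```python
import ipaddress

UNSPECIFIED = ipaddress.ip_address("0.0.0.0")


def advertised_next_hop(route_next_hop, session_type, local_ip, peer_ip, session_subnet):
    """Pick the next hop to send to a peer under the simplified rules described above."""
    hop = ipaddress.ip_address(route_next_hop)
    local = ipaddress.ip_address(local_ip)
    peer = ipaddress.ip_address(peer_ip)

    # A missing next hop (0.0.0.0) becomes the session source address.
    if hop == UNSPECIFIED:
        return local
    # Over iBGP the next hop is carried unchanged.
    if session_type == "ibgp":
        return hop
    # Over eBGP, never hand the peer its own address as next hop.
    if hop == peer:
        return local
    # Over eBGP, keep a next hop that lies on the shared subnet (third-party next hop).
    if hop in ipaddress.ip_network(session_subnet):
        return hop
    # Otherwise fall back to the session source address.
    return local


# ESG1 (192.168.100.3) advertising to the Physical Router (192.168.100.1) over eBGP
# a route whose next hop is on the DLR1 side (192.168.13.1).
print(advertised_next_hop("192.168.13.1", "ebgp",
                          "192.168.100.3", "192.168.100.1", "192.168.100.0/24"))
```

Under this model the advertisement from ESG1 to the Physical Router carries 192.168.100.3, which matches the behavior expected in the opening post.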
If this behavior is mandated by the RFC, every BGP implementation should behave like this, but if it is left optional it may differ between products.
So if the same test is done by configuring BGP between the DLR and the Edge instead of redistributing static routes on the Edge, the next hop may change to the Edge IP address. The DLR-Edge session may be iBGP or eBGP, since in both cases the Edge learns these routes from BGP instead of originating them locally from static routes.
In the SDDC reference designs, OSPF is recommended for some designs and eBGP or iBGP for others:
https://pubs.vmware.com/vmware-validated-design-41/topic/com.vmware.ICbase/PDF/vmware-validated-design-41-sddc-ospf-bgp-routing.pdf
The VMware Validated Design documentation supports the following configuration:
- Use eBGP between the physical environment (ToR) and ECMP-enabled NSX Edge (ESG) devices.
- Use iBGP between NSX ESGs and UDLRs and DLRs.
- On the NSX ESGs, configure route redistribution between the physical and software-defined infrastructure
20. RE: Wrong BGP next hop programming

Taupin
Posted Jan 07, 2018 11:00 AM
Hi,
Please refer to previous replies if you are interested in reproducing this issue!!!
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert CCIE/JNCIE/VCIX6-NV certified
21. RE: Wrong BGP next hop programming

cnrz
Posted Jan 08, 2018 04:24 PM
After rereading, I noticed the current topology is different from the first one. For clarification on the first one: NSX-edge-3-0 is advertising the 172.16.0.0/19 network to physical router 192.168.100.1 with next hop 192.168.100.1, which is the physical router itself, so it looks like there is a routing loop or something similar.
Also, about the second topology, DLR <--> ESG (DLR1) <--> ESG1 <--> Physical Router: DLR1 advertises the 172.16.0.0/21 route to ESG1, and since it is iBGP it should advertise it with next hop 192.168.1.1. How does ESG1 know about next hop 192.168.1.1 if there is no other static route or IGP such as OSPF on ESG1? If ESG1 does not have 192.168.1.1 in its routing table, it may mark the route as inaccessible and may not advertise it to the physical router.
Is it possible to send the routing table, the BGP table, and the BGP updates advertised to and received from neighbors for ESG (DLR1), ESG1 and the physical router, for each of these situations: the wrong advertisement, after recreating the static route so that the next hop is reprogrammed correctly, and after clearing the BGP sessions so that the wrong announcement is reproduced?
Also, during the wrong announcement phase, are the routing and BGP tables stable, or do they change transiently, for example every 30 seconds?
create static route Then redistribute the static route
as Workaround: I am deleting the static route and recreate it then NEXT HOP is reprogrammed correctly by ESG1 but once I clear BGP sessions, the issue is reproduced
22. RE: Wrong BGP next hop programming

Taupin
Posted Jan 08, 2018 05:15 PM
Dear,
Please try to understand the issue before posting any reply. Kindly refer to the previous replies for your understanding!! Below are answers to your concerns:
- There is only one topology; I have just clarified the naming!!
- Could you please give me one scenario that can lead to a loop :D, taking into consideration that the topology is a flat topology :) I wonder how you can suspect a loop on a flat topology!!
- ESG1 is directly connected to DLR1 (for info, DLR1 is in fact an ESG that I named DLR1), so you can conclude how the BGP next hop is resolved :)
For your reference, please find the details of this issue below.
To reproduce this issue, please follow the procedure below:
- Topology:
DLR2 (with default GW and directly connected subnets 172.16.1.0/24, 172.16.2.0/24, 172.16.3.0/24) [192.168.1.1/30] ----> [192.168.1.2/30] DLR1 [192.168.13.1/24] ----> [192.168.13.2/24] ESG1 [192.168.100.3/24] ----> [192.168.100.1/24] Physical Router
- DLR2 uses a default GW 0.0.0.0/0 pointing to DLR1, which is in fact an ESG
- DLR1 and ESG1 are in the same AS 65001
- The Physical Router is in AS 65002
- DLR1 is configured with summary route 172.16.0.0/21 pointing to DLR2
- DLR1 redistributes the static route and directly connected routes to ESG1
- ESG1 redistributes directly connected routes
To reproduce:
- Create the static route, then redistribute it
Workaround: if I delete the static route and recreate it, the next hop is reprogrammed correctly by ESG1, but once I clear the BGP sessions the issue is reproduced
Respectfully Yours!!
