VMware NSX

  • 1.  IPv6 Connections fail after changing NSX Routing (Global Interface) MTU; Benefits of setting Routing MTU to >1500?

    Posted Feb 17, 2025 07:47 AM
    Edited by left_right Feb 17, 2025 11:41 AM

    Hello,

    Recently we have begun to roll out dual-stack networks for our workload VMs. Currently, IPv6 addresses are provisioned via RA by the NSX gateways; the default ND profile is used.

    The current MTU settings are:

    physical fabric: 9216

    vSphere DVS: 9000

    NSX Tunnel Endpoint: 9000, RTEP not used
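
    To confirm that the underlay actually carries jumbo frames end to end, the TEP path can be checked from an ESXi host with vmkping against a remote TEP. This is just a quick sketch; the vmkernel interface name (vmk10) and the target are placeholders that depend on the environment:

    [root@esxi:~] vmkping ++netstack=vxlan -I vmk10 -d -s 8972 <remote_tep_ip>
    # -d sets "don't fragment"; -s 8972 leaves room for the 28 bytes of ICMP/IPv4 overhead within a 9000-byte MTU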

    Following a few recommendations I have set the Global Interface MTU to 1700 (+200 from the default). The MTU value on individual gateway interfaces is not set; I want to use the global configuration.

    Those settings work fine for IPv4; however, after running some tests with IPv6, issues regarding fragmentation occur.

    For the tests I have deployed two Ubuntu Server 24 systems (src_system and dst_system), each placed in a different subnet.

    After deployment, the IP configuration looks like this:

    root@src_system:~# ip a | grep mtu
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    
    root@src_system:~# ip -6 route
    <src_system_ip6_net>::/64 dev ens192 proto ra metric 1024 expires 2591994sec mtu 1700 hoplimit 64 pref medium
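
    To double-check what the gateway is actually announcing, the RA contents (including the MTU option) can be dumped on the guest, for example with rdisc6 from the ndisc6 package. This is not part of the test itself, just an optional verification step (assuming the package is installed):

    root@src_system:~# apt install ndisc6   # only if not already present
    root@src_system:~# rdisc6 ens192        # solicits an RA and prints its options, including the advertised MTU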
    

    For a simple connectivity test I have set up nginx on one of the systems (dst_system), listening on ports 80 and 443 (SSL).

    The connection works for IPv4 HTTP, IPv4 HTTPS, and IPv6 HTTP, but fails for IPv6 HTTPS:

    root@src_system:~# curl -k -v https://[<remote_ip6>]:443
    *   Trying [<remote_system_ip6>]:443...
    * Connected to <remote_system_ip6> (<remote_ip6>) port 443
    * ALPN: curl offers h2,http/1.1
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    (timeout here)

    tcpdump on the remote system shows the received packets:

    root@<dst_system>:~# tcpdump -v -i ens192 src <src_system_ip6>
    tcpdump: listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    12:28:32.763934 IP6 (flowlabel 0x676c2, hlim 63, next-header TCP (6) payload length: 40) <src_system_ip6>.57112 > <dst_system>.https: Flags [S], cksum 0x9479 (correct), seq 3800441539, win 63960, options [mss 1640,sackOK,TS val 1120252730 ecr 0,nop,wscale 7], length 0
    12:28:32.764197 IP6 (flowlabel 0x676c2, hlim 63, next-header TCP (6) payload length: 32) <src_system_ip6>.57112 > <dst_system>.https: Flags [.], cksum 0x6dc9 (correct), ack 4042874102, win 500, options [nop,nop,TS val 1120252731 ecr 69467120], length 0
    12:28:32.766115 IP6 (flowlabel 0x676c2, hlim 63, next-header TCP (6) payload length: 549) <src_system_ip6>.57112 > <dst_system>.https: Flags [P.], cksum 0x0cc0 (correct), seq 0:517, ack 1, win 500, options [nop,nop,TS val 1120252733 ecr 69467120], length 517
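
    Since IPv6 routers never fragment packets in transit, an oversized reply should normally be answered with an ICMPv6 "Packet Too Big" message towards its sender. As an additional check (not part of the capture above), such messages can be watched for on either system; their absence would point to a local drop rather than a PMTUD problem along the path:

    root@src_system:~# tcpdump -i ens192 -n icmp6
    # look for "ICMP6, packet too big, mtu ..." lines in the output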

    The next step was to change the MTU on the announced route:

    root@<src_system>:~# ip -6 route replace <src_system_ip6_net>::/64 dev ens192 mtu 1500 hoplimit 64 pref medium
    root@<src_system>:~# ip -6 r
    <src_system_ip6_net>::/64 dev ens192 metric 1024 mtu 1500 hoplimit 64 pref medium
    fe80::/64 dev ens192 proto kernel metric 256 pref medium
    fe80::/64 dev ens224 proto kernel metric 256 pref medium
    default proto static metric 1024 pref medium
            nexthop via <src_system_ip6_net>::1 dev ens192 weight 1
            nexthop via fe80:<link_local> dev ens192 weight 1

    This change resulted in curl being able to establish an SSL connection to the destination server:

    root@<src_system>:~# curl -k -v https://[<dst_system_ip6>]:443
    *   Trying [<dst_system_ip6>]:443...
    * Connected to <dst_system_ip6> (<dst_system_ip6>) port 443
    * ALPN: curl offers h2,http/1.1
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
    * TLSv1.3 (IN), TLS handshake, Certificate (11):
    * TLSv1.3 (IN), TLS handshake, CERT verify (15):
    * TLSv1.3 (IN), TLS handshake, Finished (20):
    ...

    The setting is not permanent; after rebooting, the announced MTU changed back. This time I changed the MTU setting of the network adapter on the source system. By default the MTU is 1500; I changed it using netplan and rebooted the system:

    network:
      version: 2
      renderer: networkd
      ethernets:
        ens192:
          mtu: 1700
          dhcp4: no
          dhcp6: no
          addresses:
            - <src_system_ipv4>/24
            - <src_system_ipv6>/64
          routes:
            - to: default
              via: <src_system_ipv4_gw>
            - to: ::/0
              via: <src_system_ipv6_gw>

    Right after rebooting the curl command worked.
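
    As a side note, the change can also be activated and verified without a reboot; these are just the standard checks, using the same interface name as above:

    root@src_system:~# netplan apply
    root@src_system:~# ip link show ens192 | grep mtu    # the link MTU should now report 1700
    root@src_system:~# ip -6 route                       # the route MTU from the RA should now match the link MTU
    root@src_system:~# networkctl status ens192          # networkd's view of the link, including its MTU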

    The question is what the optimal setting would be, hence the title of this thread:

    1. Let the Gateway MTU sit at the default value of 1500, since most systems use a default adapter MTU of 1500?
    2. Change the adapter MTU of deployed systems to a higher value? This would be disruptive, though.
    3. If possible, do not provide a gateway MTU at all and let the hosts handle the MTU size? I do not think this is possible, though.

    I have read a great write-up by @Francois Tallet on all the MTU settings in NSX: https://community.broadcom.com/applications-networking-security/blogs/francois-tallet/2024/11/21/mtus-in-nsx/?CommunityKey=b76535ef-c5a2-474d-8270-3e83685f020e

    In the past, as recommended in that document, we set the Gateway MTU to 8800 in one of our environments; however, that was IPv4 only.

    In a paper on path MTU discovery, Cisco recommends using 1500 on IPv6 links: ip6-mtu-path-disc.pdf

    Could anyone expand on this?

    -----------------------------------------------------------------------------------

    Edit:

    I forgot to add the relevant MTU system settings. The system should accept the RA MTU:

    net.ipv6.conf.all.accept_ra_mtu = 1
    net.ipv6.conf.all.mtu = 1280
    net.ipv6.conf.default.accept_ra_mtu = 1
    net.ipv6.conf.default.mtu = 1280
    net.ipv6.conf.ens192.accept_ra_mtu = 1
    net.ipv6.conf.ens192.mtu = 1500
    net.ipv6.conf.ens224.accept_ra_mtu = 1
    net.ipv6.conf.ens224.mtu = 1500
    net.ipv6.conf.lo.accept_ra_mtu = 1
    net.ipv6.conf.lo.mtu = 65536
    net.ipv6.route.mtu_expires = 600
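
    Note that with the networkd renderer these kernel sysctls may not tell the whole story: when systemd-networkd manages the interface and processes router advertisements itself, the kernel's own RA handling is typically disabled (accept_ra = 0), and the accept_ra_mtu values above never come into play. A quick way to check which component is consuming the RAs (just a sketch):

    root@src_system:~# sysctl net.ipv6.conf.ens192.accept_ra   # 0 usually means networkd handles RAs in userspace
    root@src_system:~# networkctl status ens192                # shows whether networkd manages the link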

    I did some more troubleshooting. In tests with ping6, it seems that the problem lies with the destination server not being able to reply with fragmented packets, so that the reply does not even get sent out.

    Configuration:

    • src_server & dst_server: adapter MTU 1500, RA (SLAAC) MTU 1700. Here it seems that the MTU sent by the router is ignored; the adapters remain at 1500.
    • gateway: MTU 1700 set on L3 interface
    root@<source_server>:~# ping6 -c 3 -s 1500 <dst_server_ip6>
    PING <dst_server_ip6> (<dst_server_ip6>) 1500 data bytes
    
    --- <dst_server_ip6> ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2084ms

    You can see on the destination server that the packets are fragmented and transported to the destination, but the reply is not fragmented, is bigger (1508 bytes) than the adapter MTU of 1500, and as a result is not sent out:

    root@<dst_server>:~# tcpdump -i ens192 -n host <src_server_ip6>
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    16:44:22.003989 IP6 <src_server_ip6> > <dst_server_ip6>: frag (0|1448) ICMP6, echo request, id 1459, seq 1, length 1448
    16:44:22.004026 IP6 <src_server_ip6> > <dst_server_ip6>: frag (1448|60)
    16:44:22.004065 IP6 <dst_server_ip6> > <src_server_ip6>: ICMP6, echo reply, id 1459, seq 1, length 1508 << does not reach source server
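
    To gather evidence that the reply is dropped locally on the destination server rather than somewhere in the fabric, the interface and IPv6 protocol counters can be watched while the test runs. This is a rough check only; which counter, if any, increments depends on where exactly the stack discards the oversized packet:

    root@<dst_server>:~# ip -s link show ens192                      # TX errors/drops on the adapter
    root@<dst_server>:~# grep -E 'Ip6Out|Ip6Frag' /proc/net/snmp6    # IPv6 output and fragmentation counters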

    I suspect that on the source server the MTU advertised by the gateway is ignored, so the server sends out fragmented packets. But on the destination server the advertised MTU is applied somehow, so the destination tries to respond with a 1508-byte reply and gets stuck on the 1500 adapter MTU.

    Similar to the workaround above, I set the adapter MTU to be equal to the router interface MTU, this time applying it directly with

     ip link set mtu 1700 dev ens192

    so that the adapter MTU matches the router MTU.

    In this configuration, the echo requests as well as the echo replies get properly fragmented, sent, and returned:

    root@<src_server>:~# ping6 -c 3 -s 1900 <dst_server_ip6>
    PING <dst_server_ip6> (<dst_server_ip6>) 1900 data bytes
    1908 bytes from <dst_server_ip6>: icmp_seq=1 ttl=63 time=1.12 ms
    1908 bytes from <dst_server_ip6>: icmp_seq=2 ttl=63 time=0.592 ms
    1908 bytes from <dst_server_ip6>: icmp_seq=3 ttl=63 time=0.646 ms
    
    --- <dst_server_ip6> ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2004ms
    rtt min/avg/max/mdev = 0.592/0.787/1.123/0.238 ms
    
    
    TCP dump on destination server:
    16:54:24.275703 IP6 <src_server_ip6> > <dst_server_ip6>: frag (0|1648) ICMP6, echo request, id 1475, seq 1, length 1648
    16:54:24.275723 IP6 <src_server_ip6> > <dst_server_ip6>: frag (1648|260)
    16:54:24.275756 IP6 <dst_server_ip6> > <src_server_ip6>: frag (0|1648) ICMP6, echo reply, id 1475, seq 1, length 1648
    16:54:24.275794 IP6 <dst_server_ip6> > <src_server_ip6>: frag (1648|260)
    

    The workaround is feasible for newly deployed systems.

    However, if IPv6 addresses are provisioned via SLAAC in NSX with an MTU that does not match the "default" MTU on workload servers, users will not be able to use IPv6 addressing without changing their systems' configuration.

    -----------------------------------------

    Edit 2:

    In another workaround I set the adapters on both systems to static IPv6 addressing. To deactivate RA I used the "accept-ra: no" flag in netplan (see the sketch below). After applying the netplan config and rebooting, the systems came up with the static addresses and could reach each other; fragmentation worked fine on both sides.
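
    A minimal sketch of what the netplan fragment could look like for this, essentially the earlier configuration with accept-ra: no added (placeholders as above):

    network:
      version: 2
      renderer: networkd
      ethernets:
        ens192:
          accept-ra: no
          dhcp4: no
          dhcp6: no
          addresses:
            - <src_system_ipv4>/24
            - <src_system_ipv6>/64
          routes:
            - to: default
              via: <src_system_ipv4_gw>
            - to: ::/0
              via: <src_system_ipv6_gw>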

    I did an additional test to see if the router link MTU gets discovered correctly. For this, I set the MTU on the source server to 9000 and left the MTU on the router at 1700. First I sent echo requests with a 1900-byte payload and fragmentation allowed, then the same size with fragmentation disallowed (-M probe):

    root@Master-Project-41579:~# ping6 -c 3 -s 1900 <dst_server_ip6>
    PING <dst_server_ip6> (<dst_server_ip6>) 1900 data bytes
    1908 bytes from <dst_server_ip6>: icmp_seq=1 ttl=63 time=0.955 ms
    1908 bytes from <dst_server_ip6>: icmp_seq=2 ttl=63 time=0.508 ms
    1908 bytes from <dst_server_ip6>: icmp_seq=3 ttl=63 time=0.642 ms
    
    --- <dst_server_ip6> ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2053ms
    rtt min/avg/max/mdev = 0.508/0.701/0.955/0.187 ms
    root@Master-Project-41579:~# ping6 -c 3 -s 1900 -M probe <dst_server_ip6>
    PING <dst_server_ip6> (<dst_server_ip6>) 1900 data bytes
    From fe80::50:56ff:fe56:4452%ens192 icmp_seq=1 Packet too big: mtu=1700
    From fe80::50:56ff:fe56:4452%ens192 icmp_seq=2 Packet too big: mtu=1700
    From fe80::50:56ff:fe56:4452%ens192 icmp_seq=3 Packet too big: mtu=1700
    
    --- <dst_server_ip6> ping statistics ---
    3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2030ms
    

    The question remains: why, when autoconfiguration is on, is the MTU setting not applied to the adapter?

    Switching to static IP addressing would mean more IPAM management overhead, but if it works, I guess that is better than the headaches caused by SLAAC.



  • 2.  RE: IPv6 Connections fail after changing NSX Routing (Global Interface) MTU; Benefits of setting Routing MTU to >1500?

    Posted Feb 18, 2025 08:23 AM

    Hi,

    It looks like you're hitting this networkd issue:

    https://github.com/systemd/systemd/issues/33160

    Hope that helps.




  • 3.  RE: IPv6 Connections fail after changing NSX Routing (Global Interface) MTU; Benefits of setting Routing MTU to >1500?

    Posted Feb 25, 2025 09:57 AM

    Thank you for the suggestion, but would the linked issue affect how the MTU is applied on the interface based on the router advertisement? The issue says "networkd does not include the MTU of the interface in emitted RAs" - in my case the affected system does not send RAs; the router does.

    The systemd version in use was systemd 255 (255.4-1ubuntu8.4).
    I did not upgrade systemd directly; instead I applied a release upgrade to 24.10, which comes with systemd version 256.5-2ubuntu3.1.
    The router advertisement comes with a correct AdvLinkMTU of 9000.

    The issue remains, as in the original post: I can send an ICMPv6 echo (ping6 with -s 9000) from this system to another on a different subnet. The remote system responds with a packet size based on the announced MTU of 9000:
    15:51:25.328998 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:0|1448) ICMP6, echo request, id 1781, seq 1
    15:51:25.329018 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:1448|1448)
    15:51:25.329020 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:2896|1448)
    15:51:25.329023 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:4344|1448)
    15:51:25.329026 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:5792|1448)
    15:51:25.329029 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:7240|1448)
    15:51:25.329033 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 328) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:8688|320)
    15:51:25.329089 IP6 (flowlabel 0xde8e2, hlim 3, next-header Fragment (44) payload length: 8960) <remote_ubuntu_system> > <ubuntu_system>: frag (0x397c6877:0|8952) ICMP6, echo reply, id 1781, seq 1
    15:51:25.329120 IP6 (flowlabel 0xde8e2, hlim 3, next-header Fragment (44) payload length: 64) <remote_ubuntu_system> > <ubuntu_system>: frag (0x397c6877:8952|56)

    The source system properly handles fragmentation; the remote system uses a packet size equal to the MTU received with the RA.

    I also wanted to recreate the issue on Windows Server 2019; however, it does not seem to be affected.



  • 4.  RE: IPv6 Connections fail after changing NSX Routing (Global Interface) MTU; Benefits of setting Routing MTU to >1500?

    Posted Feb 26, 2025 07:24 AM

    If you scroll down the systemd-networkd issue mentioned, #33160, there is a pull request, https://github.com/systemd/systemd/pull/33360, that would close the issue,

    but between July 2024 and August 2024 it went from being tagged good-to-merge to needs-rework,

    so it is not merged yet, not readily available as a component upgrade, and, needless to say, not yet available in a new distribution.
    To be honest, I haven't looked much into the issue, but if it really is your case, you will want to keep track of this PR's status.