but it was tagged good-to-merge in July 2024 and then re-tagged needs-rework in August 2024, so it is not merged yet, not readily available as a component upgrade, and needless to say not available in a new distribution yet.
To be honest, I haven't looked much into the issue, but if this really is your case, you will want to keep track of that PR's status.
Original Message:
Sent: Feb 25, 2025 09:57 AM
From: left_right
Subject: IPv6 Connections fail after changing NSX Routing (Global Interface) MTU; Benefits of setting Routing MTU to >1500?
Thank you for the suggestion, but would the linked issue affect how the MTU is applied on the interface, based on the router advertisement? "networkd does not include the MTU of the interface in emitted RAs." - in my case the affected system does not send RAs, the router does.
The currently used systemd version is: systemd 255 (255.4-1ubuntu8.4)
I did not upgrade systemd directly; instead I applied a release upgrade to 24.10, which comes with systemd version 256.5-2ubuntu3.1.
The router advertisement comes with a correct AdvLinkMTU of 9000.
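As a side note, one way to double-check what the router is actually advertising, from the host itself, is to solicit an RA with rdisc6 from the ndisc6 package (just a suggestion; the package may need to be installed first, and ens192 is the interface name used here):
apt install ndisc6          # provides rdisc6
rdisc6 ens192               # prints the received RA options, including the MTU option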
The issue remains, as in the original post: I can send an ICMP ping6 with -s 9000 from this system to another on a different subnet. The remote system responds with a packet size that is based on the announced MTU of 9000:
15:51:25.328998 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:0|1448) ICMP6, echo request, id 1781, seq 1
15:51:25.329018 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:1448|1448)
15:51:25.329020 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:2896|1448)
15:51:25.329023 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:4344|1448)
15:51:25.329026 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:5792|1448)
15:51:25.329029 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 1456) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:7240|1448)
15:51:25.329033 IP6 (flowlabel 0x87f98, hlim 2, next-header Fragment (44) payload length: 328) <ubuntu_system> > <remote_ubuntu_system>: frag (0x4c667267:8688|320)
15:51:25.329089 IP6 (flowlabel 0xde8e2, hlim 3, next-header Fragment (44) payload length: 8960) <remote_ubuntu_system> > <ubuntu_system>: frag (0x397c6877:0|8952) ICMP6, echo reply, id 1781, seq 1
15:51:25.329120 IP6 (flowlabel 0xde8e2, hlim 3, next-header Fragment (44) payload length: 64) <remote_ubuntu_system> > <ubuntu_system>: frag (0x397c6877:8952|56)
The source system properly handles fragmentation; the remote system applies a packet size equal to the MTU received with the RA.
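To verify on which side the RA MTU is (or is not) taken over, it might help to compare the per-interface IPv6 MTU with the MTU attribute on the RA-learned route; a small sketch, assuming the interface is ens192:
sysctl net.ipv6.conf.ens192.mtu        # IPv6 MTU the kernel currently uses on the interface
ip -6 route show dev ens192 proto ra   # routes learned from RAs; an "mtu 9000" attribute here means the RA MTU option landed on the route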
I also wanted to recreate the issue on Windows Server 2019; however, it does not seem to be affected.
Original Message:
Sent: Feb 18, 2025 03:04 AM
From: AbbedSedkaoui
Subject: IPv6 Connections fail after changing NSX Routing (Global Interface) MTU; Benefits of setting Routing MTU to >1500?
Hi,
It looks like you're hitting this networkd issue:
https://github.com/systemd/systemd/issues/33160
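To see which systemd/networkd build is actually in use (and whether the interface is even managed by networkd), something like this can be run on the affected host (interface name assumed):
systemctl --version | head -n 1   # systemd version in use
networkctl status ens192          # shows whether ens192 is managed by systemd-networkd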
Hope that helps.
Original Message:
Sent: Feb 17, 2025 07:47 AM
From: left_right
Subject: IPv6 Connections fail after changing NSX Routing (Global Interface) MTU; Benefits of setting Routing MTU to >1500?
Hello,
Recently we have begun to roll out dual-stack networks for our workload VMs. Currently IPv6 addresses are provisioned using RAs by the NSX gateways; the default ND profile is used.
The current MTU settings are:
physical fabric: 9216
vsphere dvs: 9000
NSX Tunnel Endpoint: 9000, RTEP not used
Following a few recommendations, I have set the Global Interface MTU to 1700 (+200 from the default). The MTU value on individual gateway interfaces is not set; I want to use the global configuration.
Those settings work fine for IPv4; however, after running some tests with IPv6, issues regarding fragmentation occur.
For testing, I have deployed two Ubuntu Server 24 systems (src_system and dst_system), each placed in a different subnet.
After deployment, the ip configuration looks like this:
root@src_system:~# ip a | grep mtu
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
root@src_system:~# ip -6 route
<src_system_ip6_net>::/64 dev ens192 proto ra metric 1024 expires 2591994sec mtu 1700 hoplimit 64 pref medium
For a simple connectivity test I have set up nginx on one of the systems (dst_system), listening on ports 80 and 443 (SSL).
The connection works for IPv4 HTTP, IPv4 HTTPS and IPv6 HTTP, but fails for IPv6 HTTPS:
root@src_system:~# curl -k -v https://[<remote_ip6>]:443
* Trying [<remote_system_ip6>]:443...
* Connected to <remote_system_ip6> (<remote_ip6>) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
(timeout here)
tcpdump on the remote system shows the received packets:
root@<dst_system>:~# tcpdump -v -i ens192 src <src_system_ip6>
tcpdump: listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:28:32.763934 IP6 (flowlabel 0x676c2, hlim 63, next-header TCP (6) payload length: 40) <src_system_ip6>.57112 > <dst_system>.https: Flags [S], cksum 0x9479 (correct), seq 3800441539, win 63960, options [mss 1640,sackOK,TS val 1120252730 ecr 0,nop,wscale 7], length 0
12:28:32.764197 IP6 (flowlabel 0x676c2, hlim 63, next-header TCP (6) payload length: 32) <src_system_ip6>.57112 > <dst_system>.https: Flags [.], cksum 0x6dc9 (correct), ack 4042874102, win 500, options [nop,nop,TS val 1120252731 ecr 69467120], length 0
12:28:32.766115 IP6 (flowlabel 0x676c2, hlim 63, next-header TCP (6) payload length: 549) <src_system_ip6>.57112 > <dst_system>.https: Flags [P.], cksum 0x0cc0 (correct), seq 0:517, ack 1, win 500, options [nop,nop,TS val 1120252733 ecr 69467120], length 517
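Since the large TLS records from the server are what disappears, it could also be worth checking whether any ICMPv6 "Packet Too Big" messages come back to the nginx host at all; a capture sketch, assuming no extension headers in front of the ICMPv6 header (type 2 = Packet Too Big):
tcpdump -n -vv -i ens192 'icmp6 and ip6[40] = 2'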
The next step was to change the MTU on the announced route:
root@<src_system>:~# ip -6 route replace <src_system_ip6_net>::/64 dev ens192 mtu 1500 hoplimit 64 pref medium
root@<src_system>:~# ip -6 r
<src_system_ip6_net>::/64 dev ens192 metric 1024 mtu 1500 hoplimit 64 pref medium
fe80::/64 dev ens192 proto kernel metric 256 pref medium
fe80::/64 dev ens224 proto kernel metric 256 pref medium
default proto static metric 1024 pref medium
        nexthop via <src_system_ip6_net>::1 dev ens192 weight 1
        nexthop via fe80:<link_local> dev ens192 weight 1
This change resulted in curl being able to establish an SSL connection to the destination server:
root@<src_system>:~# curl -k -v https://[<dst_system_ip6>]:443
* Trying [<dst_system_ip6>]:443...
* Connected to <dst_system_ip6> (<dst_system_ip6>) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
...
The setting is not permanent; after rebooting, the announced MTU changed back. This time I changed the MTU setting of the network adapter on the source system. The default MTU is 1500; I changed it using netplan and rebooted the system:
network:
  version: 2
  renderer: networkd
  ethernets:
    ens192:
      mtu: 1700
      dhcp4: no
      dhcp6: no
      addresses:
        - <src_system_ipv4>/24
        - <src_system_ipv6>/64
      routes:
        - to: default
          via: <src_system_ipv4_gw>
        - to: ::/0
          via: <src_system_ipv6_gw>
Right after rebooting the curl command worked.
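As an aside, the same netplan change can also be activated without a full reboot:
netplan try     # applies the config and rolls back automatically unless confirmed
netplan apply   # applies the config directly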
The question is what the optimal setting would be, hence the title of this thread:
- Let the gateway MTU sit at the default value of 1500, since most systems use a default adapter MTU of 1500?
- Change the adapter MTU of deployed systems to a higher value? This would be disruptive though.
- If possible, do not provide a gateway MTU at all and let the hosts handle the MTU size? I do not think this is possible though.
I have read a great write up by @Francois Tallet on all the MTU setting in NSX: https://community.broadcom.com/applications-networking-security/blogs/francois-tallet/2024/11/21/mtus-in-nsx/?CommunityKey=b76535ef-c5a2-474d-8270-3e83685f020e
In the past, as recommended in that document, we have set the gateway MTU to 8800 in one of our environments; however, that was IPv4 only.
In a paper on path MTU discovery, Cisco recommends using 1500 on IPv6 links: ip6-mtu-path-disc.pdf
Could anyone expand on this?
-----------------------------------------------------------------------------------
Edit:
Forgot to add the relevant MTU system settings. The system should accept the RA MTU:
net.ipv6.conf.all.accept_ra_mtu = 1
net.ipv6.conf.all.mtu = 1280
net.ipv6.conf.default.accept_ra_mtu = 1
net.ipv6.conf.default.mtu = 1280
net.ipv6.conf.ens192.accept_ra_mtu = 1
net.ipv6.conf.ens192.mtu = 1500
net.ipv6.conf.ens224.accept_ra_mtu = 1
net.ipv6.conf.ens224.mtu = 1500
net.ipv6.conf.lo.accept_ra_mtu = 1
net.ipv6.conf.lo.mtu = 65536
net.ipv6.route.mtu_expires = 600
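To confirm what the gateway is really putting into the MTU option on the wire, incoming router advertisements can be captured directly; a sketch, again assuming no extension headers in front of the ICMPv6 header (type 134 = router advertisement):
tcpdump -n -vv -i ens192 'icmp6 and ip6[40] = 134'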
I did some more troubleshooting. For the tests with ping6, it seems that the problem lies with the destination server not being able to reply with fragmented packets, so the reply does not even get sent out.
Configuration:
- src_server & dst_server: adapter MTU 1500, RA (SLAAC) MTU 1700. Here it seems that the MTU sent by the router is ignored; the adapters are still on 1500.
- gateway: MTU 1700 set on L3 interface
root@<source_server>:~# ping6 -c 3 -s 1500 <dst_server_ip6>
PING <dst_server_ip6> (<dst_server_ip6>) 1500 data bytes
--- <dst_server_ip6> ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2084ms
You can see on the destination server that the packets are fragmented and transported to the destination, but the reply is not fragmented, is bigger than the adapter MTU (1508 bytes vs. 1500) and as a result is not sent out:
root@<dst_server>:~# tcpdump -i ens192 -n host <src_server_ip6>
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:44:22.003989 IP6 <src_server_ip6> > <dst_server_ip6>: frag (0|1448) ICMP6, echo request, id 1459, seq 1, length 1448
16:44:22.004026 IP6 <src_server_ip6> > <dst_server_ip6>: frag (1448|60)
16:44:22.004065 IP6 <dst_server_ip6> > <src_server_ip6>: ICMP6, echo reply, id 1459, seq 1, length 1508 << does not reach source server
I suspect that on the source server the MTU advertised by the gateway is ignored, so the server sends out fragmented packets. But on the destination server the advertised MTU is applied somehow, hence the destination tries to respond with a 1508-byte reply and gets stuck on the 1500 adapter MTU.
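One way to test this from the source server would be to let the path MTU be probed hop by hop, e.g. with tracepath from the iputils-tracepath package (the destination address is a placeholder):
tracepath -6 <dst_server_ip6>   # reports the discovered PMTU per hop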
Similar to the above workaround, I set the adapter MTU equal to the router interface MTU:
ip link set mtu 1700 dev ens192
In this configuration, the echo requests as well as the echo replies get properly fragmented, sent and returned:
root@<src_server>:~# ping6 -c 3 -s 1900 <dst_server_ip6>
PING <dst_server_ip6> (<dst_server_ip6>) 1900 data bytes
1908 bytes from <dst_server_ip6>: icmp_seq=1 ttl=63 time=1.12 ms
1908 bytes from <dst_server_ip6>: icmp_seq=2 ttl=63 time=0.592 ms
1908 bytes from <dst_server_ip6>: icmp_seq=3 ttl=63 time=0.646 ms
--- <dst_server_ip6> ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 0.592/0.787/1.123/0.238 ms
TCP dump on destination server:
16:54:24.275703 IP6 <src_server_ip6> > <dst_server_ip6>: frag (0|1648) ICMP6, echo request, id 1475, seq 1, length 1648
16:54:24.275723 IP6 <src_server_ip6> > <dst_server_ip6>: frag (1648|260)
16:54:24.275756 IP6 <dst_server_ip6> > <src_server_ip6>: frag (0|1648) ICMP6, echo reply, id 1475, seq 1, length 1648
16:54:24.275794 IP6 <dst_server_ip6> > <src_server_ip6>: frag (1648|260)
The workaround is somewhat valid for newly deployed systems.
However, should IPv6 addresses get provisioned using SLAAC in NSX with an MTU not matching the "default" MTU on workload servers, users would not be able to utilize IPv6 addressing without making changes to their systems' configuration.
-----------------------------------------
Edit 2:
In another workaround I just set the adapters on both systems to static IPv6 addressing. To deactivate RA I used the "accept-ra: no" flag in netplan. After applying the netplan config and rebooting, the systems came up with the static addresses and could reach each other; fragmentation worked fine on both sides.
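For reference, a minimal netplan sketch of this second workaround (static addressing, RAs ignored); interface name, address and gateway are placeholders:
network:
  version: 2
  renderer: networkd
  ethernets:
    ens192:
      accept-ra: no        # ignore router advertisements
      dhcp4: no
      dhcp6: no
      addresses:
        - <static_ipv6>/64
      routes:
        - to: ::/0
          via: <ipv6_gateway>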
I did an additional test to see if the router link MTU gets discovered correctly. For this, I set the MTU on the source server to 9000 and left the MTU on the router at 1700. I first sent echo requests of size 1900 with fragmentation allowed, then of size 1900 with fragmentation disabled (-M probe):
root@Master-Project-41579:~# ping6 -c 3 -s 1900 <dst_server_ip6>
PING <dst_server_ip6> (<dst_server_ip6>) 1900 data bytes
1908 bytes from <dst_server_ip6>: icmp_seq=1 ttl=63 time=0.955 ms
1908 bytes from <dst_server_ip6>: icmp_seq=2 ttl=63 time=0.508 ms
1908 bytes from <dst_server_ip6>: icmp_seq=3 ttl=63 time=0.642 ms
--- <dst_server_ip6> ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.508/0.701/0.955/0.187 ms
root@Master-Project-41579:~# ping6 -c 3 -s 1900 -M probe <dst_server_ip6>
PING <dst_server_ip6> (<dst_server_ip6>) 1900 data bytes
From fe80::50:56ff:fe56:4452%ens192 icmp_seq=1 Packet too big: mtu=1700
From fe80::50:56ff:fe56:4452%ens192 icmp_seq=2 Packet too big: mtu=1700
From fe80::50:56ff:fe56:4452%ens192 icmp_seq=3 Packet too big: mtu=1700
--- <dst_server_ip6> ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2030ms
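Once a Packet Too Big has been received, the kernel caches the learned path MTU per destination; it can be inspected like this (destination is a placeholder, and the cached entry expires after net.ipv6.route.mtu_expires seconds):
ip -6 route get <dst_server_ip6>   # expected to show "mtu 1700" while the PMTU cache entry is valid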
The question remains: why, when autoconfiguration is on, is the MTU setting not applied to the adapter?
Switching to static IP addressing would mean more overhead in managing IPAM, but if it works, I guess it is better than the headaches caused by SLAAC.