Our environment is quite dense, which I believe was the main trigger. We have around 20 Edge Clusters (approx. 40 Edge Nodes total), and in our design, we can have a maximum of 30 Edge Nodes per ESXi host during maintenance or failover scenarios.
If you have high-density Edge placements or use the 2048 Ring Size, I would strongly recommend checking your XMAP usage before triggering the upgrade.
-------------------------------------------
Original Message:
Sent: Apr 14, 2026 05:20 AM
From: Serhii
Subject: NSX Upgrade from 4.2.1.3 - NSX Managers become unresponsive
Hi Matic,
Thank you for sharing your experience and the response from Broadcom support.
Could you please share more details about your environment, such as the number of VMs per host, the presence of "monster" VMs, the number of VMs with more than 2–3 vNICs, the number and size of your Edge nodes, and whether you are using Enhanced Data Path (EDP) (and in which mode)?
This would help us better understand the issue and assess whether it could potentially occur in our infrastructure. Based on the KB, it is unclear in which specific environments this issue can be reproduced.
Thank you.
Original Message:
Sent: Apr 14, 2026 01:44 AM
From: Matic Lulik
Subject: NSX Upgrade from 4.2.1.3 - NSX Managers become unresponsive
Hi all,
After this happened, support checked the logs of the NSX Managers, Edge Nodes, and ESXi hosts and found two issues that are already documented in the Broadcom KB. Below is the response from Broadcom support. Because of the memory exhaustion issue, our NSX Managers did not get any memory region once they were migrated to another host. As a result, the NSX Managers could not bring the MGMT interface into the UP state within the OS.
The total memory regions used by all vNICs is 8516108800 bytes, which is around 8 GB. The actual usage is higher in this case (3 vNICs per VM): multiple vNICs of an Edge VM requesting the same memory region from the Edge is what leads to XMAP exhaustion.
The XMAP space, which is 32GB by default, is nearing exhaustion because Edge memory regions take more memory than expected when the memory regions from multiple vNICs overlap. This is a known issue (fixed in 9.1).
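To put the numbers from the support response in proportion, here is a rough sketch of the arithmetic in Python. It uses only the figures quoted above (8516108800 bytes, 32GB default, 64GB workaround). Treating "GB" as binary GiB is my assumption, and the function name is made up for illustration. Keep in mind that support says the real usage is higher than the reported figure because of the overlap issue, so this is a lower bound and not a measurement of actual XMAP consumption.

```python
# Figures quoted in the Broadcom support response.
VNIC_REGIONS_BYTES = 8_516_108_800  # total memory regions used by all vNICs
XMAP_DEFAULT_GIB = 32               # default XMAP space
XMAP_WORKAROUND_GIB = 64            # XMAP space after the workaround

GIB = 1024 ** 3  # assumption: "GB" in the response means binary GiB


def share_of_xmap(used_bytes: int, xmap_gib: int) -> float:
    """Return used_bytes as a percentage of an XMAP space of xmap_gib GiB."""
    return 100.0 * used_bytes / (xmap_gib * GIB)


used_gib = VNIC_REGIONS_BYTES / GIB
print(f"Reported vNIC memory regions: {used_gib:.2f} GiB")
print(f"Share of default XMAP:    {share_of_xmap(VNIC_REGIONS_BYTES, XMAP_DEFAULT_GIB):.1f}%")
print(f"Share of workaround XMAP: {share_of_xmap(VNIC_REGIONS_BYTES, XMAP_WORKAROUND_GIB):.1f}%")
```

The reported figure alone does not fill the default XMAP space, which fits the explanation that the overlapping regions from multiple vNICs are what push it toward exhaustion.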
Considering the above facts, the workaround below needs to be implemented on all hosts that will host Edge VMs, irrespective of NSX version:
- Increase the default XMAP space to 64GB.
  This requires a host reboot after applying the configuration. You can implement step 1 of the KB below:
  https://knowledge.broadcom.com/external/article/398914/power-on-vm-fails-with-the-errror-failed.html
- Increase P2M Buffer from 5 to 32 at host level.
  This is to avoid any other P2M-related issues during vMotion.
  When multiple VMs are migrated together to a host where this is not set to the max buffer limit, issues such as the one in the KB below can occur. They are caused by a shortage of P2M buffer slots and can result in a map failure: https://knowledge.broadcom.com/external/article?legacyId=76387
Kind regards,
Matic
Original Message:
Sent: Mar 28, 2026 06:16 AM
From: Matic Lulik
Subject: NSX Upgrade from 4.2.1.3 - NSX Managers become unresponsive
Hi all,
We were upgrading NSX from 4.2.1.3 to 4.2.3.3.
1. 18 clusters of Edge Nodes were upgraded successfully.
2. When upgrading the NSX component on the Transport Nodes, the TN upgrade was successful until the NSX Managers were moved to already upgraded hosts (by DRS, because another host entered maintenance mode). The NSX Managers lost connection to the network. The NSX upgrade was automatically paused when this happened, because all managers were unresponsive. Even after a vMotion to hosts that were not upgraded yet, the same issue persisted. Broadcom support checked netstat on the hosts, and the NSX Manager was showing MAC address 00:00:00:00:00. (See attached screenshot).
The NSX Managers are connected to a non-NSX vDS, on a VLAN segment.
We migrated the NSX Managers to a vSwitch, but the issue still persisted. When an NSX Manager was rebooted, the connection worked again. The NSX Managers were then migrated back to the vDS and the issue was gone.
However, we are in the middle of the upgrade now (2 Transport Nodes and the 3-node NSX Manager cluster still to be upgraded). We are waiting for support to analyze the logs.
While support was collecting logs on our ESXi hosts, it triggered a PSOD. After rebooting ESXi, everything was back in an operational state.
Did any of you run into a similar issue, and could it be connected to the JDK bug affecting the current version of the NSX Managers (4.2.1.3)?
Best regards
-------------------------------------------