Random NSX-V Edge Gateway hangs that occur on vCenter-A but not on vCenter-B can be hard to diagnose. The following steps should help you narrow down and potentially resolve the problem:
1. **Verify Compatibility:**
- Check that the ESXi hosts, NSX-V, vCenter, and physical hardware (including NICs and their driver and firmware versions) are listed in the VMware Compatibility Guide for your specific versions. Incompatible hardware, driver, or software versions can cause intermittent failures that are hard to trace.
2. **Collect Logs and Diagnostics:**
- When a hang occurs, note the exact time, then collect the tech support logs for the affected Edge Gateway VM and a support bundle from the ESXi host it was running on. Analyzing the log entries around the time of the hang can show what the Edge and the host were doing when it became unresponsive.
3. **Review Edge Gateway Configuration:**
- Verify the configuration of the affected Edge Gateway VMs. Pay attention to firewall rules, routing, NAT, and any custom configurations, and check that they align with your network design and requirements. Where possible, compare them with an equivalent, unaffected Edge on vCenter-B.
4. **Monitor Resource Utilization:**
- Monitor the resource utilization (CPU, memory, and network) on the affected ESXi hosts and Edge Gateway VMs. Sustained CPU or memory saturation can make an Edge appliance unresponsive; an Edge that consistently runs near its limits may be undersized for its load.
5. **Check for Network Issues:**
- Examine the physical network infrastructure for packet loss, congestion, or switch problems. Compare the network configuration behind vCenter-A with vCenter-B (for example MTU, VLANs, and uplink teaming) and investigate any differences.
6. **Review VMware KB Articles:**
- Search VMware's Knowledge Base for any known issues or solutions related to NSX-V Edge Gateway hangs for your specific version. VMware often publishes articles with troubleshooting steps and fixes for common issues.
7. **Check for ESXi Host Isolation:**
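- Ensure that the ESXi hosts running the Edge Gateway VMs are not experiencing isolation events or management network problems. A vSphere HA isolation event can trigger the configured isolation response, which may power off or restart the Edge VM and can look like a hang from the outside.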
8. **Check for VMware Tools and ESXi Updates:**
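- Ensure that the ESXi hosts are patched to a build supported by your NSX-V version and that VMware Tools is current in your workload VMs. NSX Edge appliances ship with their own bundled VMware Tools, which is normally updated as part of the Edge upgrade rather than separately, so verify the Edge appliance version instead of upgrading Tools inside it.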
9. **Consider NSX-V Version Compatibility:**
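- You mentioned upgrading NSX on vCenter-A, so confirm in the VMware Product Interoperability Matrix that NSX-V 6.4.14 is supported with your vCenter and ESXi versions. Also verify that the upgrade was completed for all components (NSX Manager, controllers, host clusters, and the Edge appliances themselves), not just NSX Manager.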
10. **Engage VMware Support:**
- If the issue persists and you can't identify the root cause, consider opening a support case with VMware. They have the expertise and tools to diagnose and resolve complex issues.
11. **Performance Monitoring:**
- Implement performance monitoring and alerting for your NSX-V environment. Tools like vRealize Operations Manager can help you proactively detect performance issues.
12. **HA and Fault Tolerance:**
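- Consider enabling NSX Edge High Availability (HA) for critical Edge Gateways so that a standby appliance can take over if the active one hangs. vSphere HA can additionally restart VMs after a host failure. Check VMware's documentation before relying on vSphere Fault Tolerance (FT) for Edge appliances, since FT may not be supported for them.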
13. **Regular Maintenance:**
- Schedule regular maintenance windows to perform updates and patches on your VMware infrastructure components. This can help ensure that you have the latest bug fixes and security updates.
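
Steps 2 and 4 above work best together: once you have the time of each hang and a series of utilization samples, you can check whether the Edge was saturated just before it stopped responding. The following is a minimal Python sketch of that correlation, not a VMware tool. The function name, the 10-minute window, the 90% threshold, and all sample data are hypothetical; in practice you would export the samples from whatever monitoring source you use.

```python
from datetime import datetime, timedelta


def samples_before_hang(hang_time, samples, window_minutes=10, cpu_threshold=90.0):
    """Return the (timestamp, cpu_percent) samples that fall inside the
    window leading up to hang_time and are at or above cpu_threshold."""
    window_start = hang_time - timedelta(minutes=window_minutes)
    return [
        (ts, cpu)
        for ts, cpu in samples
        if window_start <= ts <= hang_time and cpu >= cpu_threshold
    ]


# Hypothetical CPU samples for one Edge VM (timestamp, CPU %).
samples = [
    (datetime(2024, 1, 10, 9, 50), 35.0),
    (datetime(2024, 1, 10, 9, 55), 92.5),
    (datetime(2024, 1, 10, 9, 58), 97.0),
    (datetime(2024, 1, 10, 10, 30), 40.0),
]

# Hypothetical time at which the Edge was observed to hang.
hang = datetime(2024, 1, 10, 10, 0)

hot = samples_before_hang(hang, samples)
for ts, cpu in hot:
    print(ts.isoformat(), cpu)
```

If most hangs are preceded by saturated samples, resource exhaustion is a likely lead. If none are, the cause is more likely elsewhere (network, host, or software defect).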
Given the complexity of your environment and the intermittent nature of the issue, it's essential to thoroughly investigate each potential cause and monitor the situation closely. VMware support may be your best resource for diagnosing and resolving this issue, especially if it's specific to vCenter-A and not reproducible on vCenter-B.
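
Because the problem appears only on vCenter-A, a side-by-side inventory of the two environments is often the quickest way to find candidates for the root cause. The sketch below assumes you have collected the relevant versions and settings by hand into two Python dictionaries. The keys and values shown are placeholders, not real data from your environment, and `diff_settings` is a hypothetical helper rather than part of any VMware SDK.

```python
MISSING = "<missing>"


def diff_settings(site_a, site_b):
    """Return {key: (value_a, value_b)} for every key whose value differs
    between the two sites, or that is present on only one of them."""
    diffs = {}
    for key in sorted(set(site_a) | set(site_b)):
        a = site_a.get(key, MISSING)
        b = site_b.get(key, MISSING)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Placeholder inventories; replace with values gathered from each site.
vcenter_a = {
    "nsx_version": "6.4.14",
    "esxi_build": "placeholder-build-a",
    "edge_size": "large",
    "uplink_mtu": 1600,
}
vcenter_b = {
    "nsx_version": "placeholder-version-b",
    "esxi_build": "placeholder-build-b",
    "edge_size": "large",
    "uplink_mtu": 1600,
    "edge_ha": True,
}

for key, (a, b) in diff_settings(vcenter_a, vcenter_b).items():
    print(f"{key}: vCenter-A={a!r} vCenter-B={b!r}")
```

Every difference the script reports is only a candidate for investigation, but a short list of differences is a useful thing to attach to a VMware support case.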