Are the ESG VMs on ESX hosts in Cluster1 or Cluster2? If all of them are in Cluster1, is it possible to vMotion one ESG VM to an ESX host in Cluster2? If there are ESG VMs on Cluster2 and the Controller Cluster VTEP table is still empty for this VNI Logical Switch, then it is possible that the controllers' view and the ESX hosts' view are not the same.
For this VNI Logical Switch, is it possible to run the following command on the NSX Manager CLI for an ESX host on Cluster1 and for another host on Cluster2, and compare the output?
http://cloudmaniac.net/nsx-central-cli-operations-troubleshooting/
sx01-cap-z51.sddc.lab> show logical-switch host host-15 vni 10000 verbose
VXLAN Global States:
Control plane Out-Of-Sync: No --> Control plane Out-Of-Sync should be No
UDP port: 8472
VXLAN network: 10000
Multicast IP: N/A (headend replication)
Control plane: Enabled (multicast proxy,ARP proxy)
Controller: 10.51.10.72 (up) --> The Controller should be in the up state
MAC entry count: 0
ARP entry count: 0
Port count: 1
VXLAN port: vdrPort
Switch port ID: 50331655
vmknic ID: 0
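To make the Cluster1 vs Cluster2 comparison less error-prone, the two fields flagged above can be checked with a small script. This is only a rough sketch: it assumes the command output has been saved as plain text in the "key: value" layout shown above, and the names parse_fields and check_host_view are made up for illustration (they are not part of any NSX tooling).

```python
import re

# Sample text copied from the output above (annotations removed).
SAMPLE = """\
VXLAN Global States:
Control plane Out-Of-Sync: No
UDP port: 8472
VXLAN network: 10000
Multicast IP: N/A (headend replication)
Control plane: Enabled (multicast proxy,ARP proxy)
Controller: 10.51.10.72 (up)
MAC entry count: 0
ARP entry count: 0
Port count: 1
"""

def parse_fields(text):
    """Split each 'key: value' line into a dict; lines without a value are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

def check_host_view(text):
    """Return a list of problems found in one host's output (empty list = looks fine)."""
    fields = parse_fields(text)
    problems = []
    if fields.get("Control plane Out-Of-Sync", "").lower() != "no":
        problems.append("control plane is out of sync")
    # Assumed layout of the Controller field: '<ip> (<state>)'
    match = re.match(r"(\S+)\s+\((\w+)\)", fields.get("Controller", ""))
    if not match or match.group(2).lower() != "up":
        problems.append("controller connection is not up")
    return problems

print(check_host_view(SAMPLE))
```

Running the same check against the output from a Cluster2 host should show quickly whether that host reports a different controller state.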
For every Logical Switch, one of the 3 controllers has the master role for the VNI, so the other 2 controllers may not show the table. Was the table checked on the master controller?
What does the Communication Channel Health show between the hosts and the Controllers?
Go to Installation -> Host Preparation, select Cluster2, then choose Actions -> Communication Channel Health. It normally shows the status as Up with a green arrow in the Control Plane Agent to Controller column.
http://www.virtualizationblog.com/vmware-nsx-6-2-communication-channel-health/
Host and NSX Controller: heartbeats are sent every 30 seconds; if 3 iterations are lost, a sync will occur.
Are there any messages in the NSX Manager logs or system events?
https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/com.vmware.nsx.troubleshooting.doc/GUID-6F1C026C-79FD-490E-BFAD-196228B39AA6.html
If the status of any of the three connections for a host changes, a message is written to the NSX Manager log. In the log message, the status of a connection can be UP, DOWN, or NOT_AVAILABLE (displayed as Unknown in vSphere Web Client). If the status changes from UP to DOWN or NOT_AVAILABLE, a warning message is generated. For example:
2016-05-23 23:36:34.736 GMT+00:00 WARN TaskFrameworkExecutor-25 VdnInventoryFacadeImpl$HostStatusChangedEventHandler:200 - Host Connection Status Changed: Event Code: 1941, Host: esx-04a.corp.local (ID: host-46), NSX Manager - Firewall Agent: UP, NSX Manager - Control Plane Agent: UP, Control Plane Agent - Controllers: DOWN.
If the status changes from DOWN or NOT_AVAILABLE to UP, an INFO message that is similar to the warning message is generated. For example:
2016-05-23 23:55:12.736 GMT+00:00 INFO TaskFrameworkExecutor-25 VdnInventoryFacadeImpl$HostStatusChangedEventHandler:200 - Host Connection Status Changed: Event Code: 1938, Host: esx-04a.corp.local (ID: host-46), NSX Manager - Firewall Agent: UP, NSX Manager - Control Plane Agent: UP, Control Plane Agent - Controllers: UP.
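If the NSX Manager log is large, these status-change lines can be pulled out with a regular expression. The sketch below is an assumption-laden example: the pattern is built only from the two sample messages above, and the group names (firewall, netcpa, controllers) are my own labels, so it may need adjusting for other NSX versions.

```python
import re

# First sample message from above, split across lines for readability.
LINE = (
    "2016-05-23 23:36:34.736 GMT+00:00 WARN TaskFrameworkExecutor-25 "
    "VdnInventoryFacadeImpl$HostStatusChangedEventHandler:200 - "
    "Host Connection Status Changed: Event Code: 1941, "
    "Host: esx-04a.corp.local (ID: host-46), "
    "NSX Manager - Firewall Agent: UP, "
    "NSX Manager - Control Plane Agent: UP, "
    "Control Plane Agent - Controllers: DOWN."
)

# Pattern derived from the sample messages only; \w+ also matches NOT_AVAILABLE.
PATTERN = re.compile(
    r"Host Connection Status Changed: Event Code: (?P<code>\d+), "
    r"Host: (?P<host>\S+) \(ID: (?P<host_id>[^)]+)\), "
    r"NSX Manager - Firewall Agent: (?P<firewall>\w+), "
    r"NSX Manager - Control Plane Agent: (?P<netcpa>\w+), "
    r"Control Plane Agent - Controllers: (?P<controllers>\w+)"
)

def parse_status_change(line):
    """Return the three connection states as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None

status = parse_status_change(LINE)
if status and status["controllers"] != "UP":
    print(f"{status['host']} ({status['host_id']}): controller channel is {status['controllers']}")
```

Filtering on the Cluster2 host IDs would show whether the Control Plane Agent to Controllers channel has been flapping there.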
If the control plane channel experiences a communication fault, a system event with one of the following granular failure reasons is generated:
- 1255601: Incomplete Host Certificate
- 1255602: Incomplete Controller Certificate
- 1255603: SSL Handshake Failure
- 1255604: Connection Refused
- 1255605: Keep-alive Timeout
- 1255606: SSL Exception
- 1255607: Bad Message
- 1255620: Unknown Error
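When scanning exported system events, a simple lookup table saves checking this list by hand. The codes and descriptions below are copied from the list above; the function name describe_failure is just an illustrative helper, not an NSX API.

```python
# Event codes and reasons taken verbatim from the list above.
FAILURE_REASONS = {
    1255601: "Incomplete Host Certificate",
    1255602: "Incomplete Controller Certificate",
    1255603: "SSL Handshake Failure",
    1255604: "Connection Refused",
    1255605: "Keep-alive Timeout",
    1255606: "SSL Exception",
    1255607: "Bad Message",
    1255620: "Unknown Error",
}

def describe_failure(code):
    """Translate a control plane channel event code into its failure reason."""
    return FAILURE_REASONS.get(code, f"Not a listed control plane failure code: {code}")

print(describe_failure(1255604))
```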
Also, heartbeat messages are sent from NSX Manager to the hosts. A full configuration sync is triggered if the heartbeat between NSX Manager and netcpa is lost.