VMware NSX

  • 1.  vm lost network when pass through nsx-t distributed firewall

    Posted Apr 30, 2021 02:09 AM

    I have an environment with 4 hosts named host1, host2, host3, and host4.

    vCenter 7.0.2

    ESXi 7.0.1, 17551050

    NSX-T 3.1.2

    I just enabled the distributed firewall on VLAN-backed segments and created some segments. All VMs work well except one VM named test-213, which is located on host2. test-213 loses network connectivity and cannot reach the gateway. I captured the packets with nsxcli:

    nsxcli -c start capture dvfilter nic-8898398-eth0-vmware-sfw.2 stage pre expression ipproto 0x01

    The packets look fine before the distributed firewall, but I can't capture any packets after the distributed firewall with nsxcli:

    nsxcli -c start capture dvfilter nic-8898398-eth0-vmware-sfw.2 stage post expression ipproto 0x01
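
    The two captures differ only in the stage argument (pre or post), so the pair can be generated from the dvfilter name. Below is a minimal sketch of that idea. The helper name capture_commands and its parameters are hypothetical; the command syntax is copied from the two commands in this post and nothing else about nsxcli is assumed.

```python
def capture_commands(dvfilter, expression="ipproto 0x01"):
    """Return the pre- and post-DFW capture commands for one dvfilter slot.

    The command syntax is copied from the commands in this thread; the
    function name and parameters are illustrative only.
    """
    template = (
        "nsxcli -c start capture dvfilter {name} "
        "stage {stage} expression {expr}"
    )
    # "pre" captures before the firewall filter, "post" after it.
    return [
        template.format(name=dvfilter, stage=stage, expr=expression)
        for stage in ("pre", "post")
    ]


if __name__ == "__main__":
    for cmd in capture_commands("nic-8898398-eth0-vmware-sfw.2"):
        print(cmd)
```

    Packets that show up in the pre capture but not in the post capture were not forwarded by the filter, which is the symptom described here.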

    If I migrate test-213 to another host or reboot host2, the VM works well. But after a while, test-213 loses network connectivity again.

    The difference between test-213 and the other VMs is that it is transferring large files. The distributed firewall policy for test-213 is permit any.

    Is there any way to find out why the VM loses network connectivity when its traffic passes through the distributed firewall? And could it be that the NSX-T distributed firewall and distributed IDS/IPS do not handle VMs with high throughput well?


  • 2.  RE: vm lost network when pass through nsx-t distributed firewall

    Posted Apr 30, 2021 04:21 AM

    Try using the Traceflow tool in the UI. It injects packets into the data plane, and if they are caught by any DFW rule it will tell you.



  • 3.  RE: vm lost network when pass through nsx-t distributed firewall

    Posted Apr 30, 2021 09:39 AM

    I cannot use the Traceflow tool in the UI. There is an error:

    Traceflow request failed. The request might be cancelled because it took more time than normal. Please retry.
    Error Message: Traceflow intent /infra/traceflows/772dfaf0-a997-11eb-b1fe-effdc2a85c0f realized on enforcement point /infra/sites/default/enforcement-points/default with error Traceflow does not support vlan switch for port: LogicalPort/f4923903-da25-4af0-8db5-5b2bd82f9f8e /infra/segments/OfficeVMs/ports/default:f4923903-da25-4af0-8db5-5b2bd82f9f8e.
     
    The segment uses a vSphere 7 VDS, not an N-VDS.


  • 4.  RE: vm lost network when pass through nsx-t distributed firewall

    Broadcom Employee
    Posted May 01, 2021 04:45 AM

    It looks like you are using a VLAN-backed network, not overlay networks. Keeping that aside, can you confirm the points below?

    1. Where is the VM's gateway configured?

    2. Have you tried excluding the VM from the DFW?

    3. When the VM cannot reach the L3 address, is L2 learning working fine? Is the issue specific to one VM?



  • 5.  RE: vm lost network when pass through nsx-t distributed firewall

    Posted May 06, 2021 02:35 AM

    Thanks Sreec.

    Yes, I'm using a VLAN-backed network, not overlay networks. The gateway is configured on a physical switch.

    The VM can learn the MAC addresses of the gateway and of other VMs in the same subnet, but it cannot reach the other VMs or the gateway.

    I tried excluding the VM from the DFW, and then the VM works.

    With the DFW enabled for the VM and no restrictions (the DFW policy is any to any, permit any), I captured its packets before and after the DFW. The DFW logs look fine; they show match pass.

    The problem is clear: the DFW drops or does not forward the VM's packets.

    Most of the VMs work fine, except for a few. The problem VMs are in different networks. So far, 3 VMs have the problem. If I move a VM to a normal DVS port group, it works fine. If I move it back to the segment port group, the problem appears again.



  • 6.  RE: vm lost network when pass through nsx-t distributed firewall

    Broadcom Employee
    Posted May 06, 2021 05:02 AM

    Thanks for sharing it. Can you also perform the below check? 

    1. Connect the VM to the logical switch and run the commands below on the host where the VM resides.

    summarize-dvfilter | grep -A4 <VM name>

    This returns the slot name (e.g. name: nic-4790914-eth0-vmware-sfw.2).

    vsipioctl getfwconfig -f <slot name>

    Does it show any drop rule?

    2. Connect the VM to a DV port group and repeat the same steps.

    3. Remove the VM from the vCenter inventory, add it back, connect it to the logical switch, and test the connectivity once again.
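
    If the summarize-dvfilter output has been saved to a file or pasted somewhere, the slot name lookup in step 1 can be sketched as below. This is only an illustration: the function names are hypothetical, the sample text is a made-up fragment modeled on the "name: nic-...-vmware-sfw.2" line quoted above, and the real output contains more fields than shown.

```python
import re

# Matches lines such as "name: nic-4790914-eth0-vmware-sfw.2".
# The pattern is an assumption based on the single example in this thread.
SLOT_PATTERN = re.compile(r"name:\s*(nic-\S+-vmware-sfw\.\d+)")


def extract_slot_names(dvfilter_output):
    """Return every DFW slot name found in summarize-dvfilter text."""
    return SLOT_PATTERN.findall(dvfilter_output)


def getfwconfig_command(slot_name):
    """Build the rule-dump command from step 1 for one slot name."""
    return "vsipioctl getfwconfig -f {}".format(slot_name)


if __name__ == "__main__":
    # Hypothetical, abbreviated sample; not real host output.
    sample = (
        "   name: nic-4790914-eth0-vmware-sfw.2\n"
        "   agentName: vmware-sfw\n"
    )
    for slot in extract_slot_names(sample):
        print(getfwconfig_command(slot))
```

    Running the printed command once with the VM on the segment and once on the DV port group gives the two rule sets to compare.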



  • 7.  RE: vm lost network when pass through nsx-t distributed firewall

    Posted May 06, 2021 06:59 AM

    When the VM is connected to the NSX-T logical switch, all the rules' actions are accept. Even the IDS action is detect only.

    191.212-rule.jpg

    When the VM is connected to the DV port group, it shows no rules and works fine.

    I removed the VM from vCenter and added it back. The problem is the same when the VM is connected to the NSX-T segment port group.



  • 8.  RE: vm lost network when pass through nsx-t distributed firewall

    Posted May 06, 2021 08:11 AM

    I noticed an alarm in the NSX-T UI:

    The disk usage for the Manager node disk partition /image has reached.

    The /image partition usage on all 3 manager nodes has reached 100%.

    p1.jpg

    P2.jpg

    There are many files named java_pidxxx that take up all the space in /image.

    p3.jpg

    Can I delete these java_pid files? And could the 100% usage of /image be causing these weird problems?
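
    Before deciding anything about these files, it helps to know how much space they actually take. The following is a rough, read-only sketch that lists files whose names start with java_pid and totals their sizes. The function name is hypothetical, the /image path and the java_pid prefix come from the alarm and file listing above, and whether Python is available on the manager node is an assumption. It deletes nothing, and it does not answer whether deletion is safe.

```python
import os


def summarize_java_pid_files(directory, prefix="java_pid"):
    """Return (files, total_bytes) for regular files starting with prefix.

    files is a list of (name, size_in_bytes) sorted largest first.
    Nothing is modified or deleted.
    """
    files = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and entry.name.startswith(prefix):
                size = entry.stat(follow_symlinks=False).st_size
                files.append((entry.name, size))
    files.sort(key=lambda item: item[1], reverse=True)
    total = sum(size for _, size in files)
    return files, total


if __name__ == "__main__":
    target = "/image"  # partition named in the alarm; adjust as needed
    if os.path.isdir(target):
        files, total = summarize_java_pid_files(target)
        for name, size in files:
            print(f"{size:>15,d}  {name}")
        print(f"total: {total:,d} bytes in {len(files)} files")
```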



  • 9.  RE: vm lost network when pass through nsx-t distributed firewall
    Best Answer

    Posted May 08, 2021 12:16 AM

    Over the past few days I've tried a lot of things. The DFW dropped or did not forward the specific VMs' packets.

    In the end, I updated vCenter and ESXi to the newest release, 7.0U2a.

    Now all the VMs work fine. So I guess these specific VMs may have triggered a bug.