I just ran into a very similar issue. In the end, it was a combination of several things. First, using vmxnet3 NICs on the VMs did allow UEFI VMs to boot, but it took in excess of five minutes to complete the process. I never managed, or never had enough patience, to let e1000-type NICs finish booting; they really looked like they hung midway through the process.
Based on quite a bit of research, I started playing around with the different TFTP options, both in the WDS properties GUI and by creating registry keys to set default values (which I'm not even sure still apply in Server 2016). A lot of what I read indicated that TFTP clients are sensitive to fragmented packets, and that TFTP by design requires an ACK for every block sent, which makes the transfer latency-bound. I'm operating on a 10 Gb network, and the DHCP server, WDS server, and client VM are all virtual and hosted on the same vSphere host, so I wanted to take full advantage of the available bandwidth.
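For reference, here is the registry location I was poking at, sketched in PowerShell. The key path and value names are what I recall from community write-ups on the WDS TFTP provider, so treat them as assumptions and verify them on your own server before relying on them:

    # Assumed location of the WDS TFTP provider settings -- verify on your server first.
    $tftp = 'HKLM:\SYSTEM\CurrentControlSet\Services\WDSServer\Providers\WDSTFTP'

    # Inspect whatever is currently configured.
    Get-ItemProperty -Path $tftp

    # Example: create or overwrite a default value, then restart WDS so it takes effect.
    New-ItemProperty -Path $tftp -Name MaximumBlockSize -PropertyType DWord -Value 1456 -Force
    Restart-Service WDSServer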
Thus, I enabled jumbo frames on the vSwitch that all of these VMs are connected to (the physical switches in my environment were already configured for them), then set WDS's TFTP Maximum Block Size to 8192, which keeps each packet under the 9000-byte MTU and thereby prevents fragmentation. Additionally, I disabled the Variable Window Extension setting and hard-coded the window size to 8 via a registry value.
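If you script the vSphere side, the vSwitch MTU change can be done with PowerCLI. This is just a sketch with placeholder vCenter, host, and switch names; a distributed switch would use Set-VDSwitch instead:

    # Placeholder names -- adjust to your environment. Connect-VIServer prompts for credentials.
    Connect-VIServer -Server vcenter.example.com

    # Enable jumbo frames on the standard vSwitch the VMs share.
    Get-VirtualSwitch -VMHost esxi01.example.com -Name vSwitch0 |
        Set-VirtualSwitch -Mtu 9000 -Confirm:$false

The block size and variable-window settings themselves live on the TFTP tab of the WDS server properties, or in the registry location sketched above.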
My UEFI VMs now boot as fast as, if not faster than, BIOS VMs. I tried almost every possible combination of settings before finally settling on this one.
My setup definitely did not like the absolute maximum block size of 16384; the problem persisted until I set the Maximum Block Size below the MTU threshold. For most Ethernet networks, the default MTU (which is also VMware's default) is 1500, so if you want to avoid jumbo frames, try setting the Maximum Block Size to 1024 to keep the packets from fragmenting. In our setup, though, jumbo frames also noticeably increased the speed of the TFTP transfer.
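To make the "under the MTU threshold" math concrete: each TFTP data packet adds roughly 20 bytes of IPv4 header, 8 bytes of UDP header, and 4 bytes of TFTP header on top of the data block, so the block size plus about 32 bytes has to fit within the MTU. A quick sanity check, assuming those standard header sizes and that the block size counts only the data payload:

    # Largest TFTP block that fits in one packet without IP fragmentation,
    # assuming 20-byte IPv4 + 8-byte UDP + 4-byte TFTP headers (32 bytes total).
    function Get-MaxTftpBlock([int]$Mtu) { $Mtu - 32 }
    Get-MaxTftpBlock 1500   # 1468 -> 1024 is a comfortable round value
    Get-MaxTftpBlock 9000   # 8968 -> 8192 fits with room to spare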