You're correct that the VXLAN protocol itself operates further up the stack, since it is carried over UDP. When people refer to it as an L2 technology, they really mean its ability to encapsulate L2 frames so that they can be forwarded across L3 boundaries, which decouples the L2 domain from the underlying physical infrastructure. Provided the infrastructure can handle the slightly larger MTU needed for the encapsulation overhead (VMware NSX requires at least a 1600-byte MTU), VXLAN is transparent to the physical network, because the encapsulated traffic is just ordinary UDP/IP traffic.
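To make the overhead concrete, here is a rough sketch of the arithmetic and of the 8-byte VXLAN header as laid out in RFC 7348. The function names are my own, not from any library, and the numbers assume standard header sizes with no IP options: a 14-byte inner Ethernet header, an optional 4-byte inner 802.1Q tag, 8 bytes each for VXLAN and UDP, and a 20-byte IPv4 or 40-byte IPv6 outer header.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN (RFC 7348)


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (hypothetical helper, for illustration).

    Layout: 8 flag bits (0x08 = VNI valid), 24 reserved bits,
    24-bit VNI, 8 reserved bits.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


def required_underlay_mtu(inner_payload_mtu: int = 1500,
                          inner_vlan_tag: bool = False,
                          outer_ipv6: bool = False) -> int:
    """Estimate the IP MTU the underlay needs to carry one full-size inner frame.

    Assumes no IP options or extension headers on the outer packet.
    """
    # The whole inner Ethernet frame (header + payload) becomes the VXLAN payload.
    inner_frame = inner_payload_mtu + 14 + (4 if inner_vlan_tag else 0)
    outer_ip = 40 if outer_ipv6 else 20
    udp = 8
    vxlan = 8
    return outer_ip + udp + vxlan + inner_frame


if __name__ == "__main__":
    print(vxlan_header(5000).hex())
    print(required_underlay_mtu())
    print(required_underlay_mtu(inner_vlan_tag=True, outer_ipv6=True))
```

Under these assumptions, a default 1500-byte inner MTU over an IPv4 underlay needs 1550 bytes, so a 1600-byte MTU leaves some headroom for an inner VLAN tag or an IPv6 outer header.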