I understood you to mean that you use one uplink VLAN per VRF in Design 2, so anti-spoofing would not be an issue. If I misinterpreted your diagrams, then I would clearly go for option 4. Sorry for the confusion.
Design 5 will probably give you the best performance, but it depends on your firewall and on whether you are willing to disable anti-spoofing, which may be a security issue. If you keep anti-spoofing enabled, you would have to work with AS-path prepending and local preference to avoid asymmetric routing. In that case effectively only one VLAN carries traffic and the second only provides fast failover: you would have 4 routes, but only 2 would be preferred.
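To illustrate the "4 routes, but only 2 preferred" point, here is a minimal Python sketch of a heavily simplified route selection as seen from one router. It compares only local preference and AS-path length; real BGP best-path selection has more steps. The route labels, the private AS number 65001 and the preference values are assumptions made up for the example, not taken from your setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str    # hypothetical label: which edge/VLAN the route was learned over
    local_pref: int  # higher value wins
    as_path: tuple   # shorter path wins; prepending makes it longer


def preferred(routes):
    """Return all routes that tie for best on (local_pref, AS-path length)."""
    def key(r):
        return (r.local_pref, -len(r.as_path))
    best = max(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


# Four routes to the same prefix: two over each uplink VLAN.
# VLAN B is de-preferred with a lower local preference and a prepended AS path.
routes = [
    Route("edge1-vlan-a", 200, (65001,)),
    Route("edge2-vlan-a", 200, (65001,)),
    Route("edge1-vlan-b", 100, (65001, 65001, 65001)),
    Route("edge2-vlan-b", 100, (65001, 65001, 65001)),
]

active = preferred(routes)
print([r.next_hop for r in active])

# If VLAN A fails, the VLAN B routes take over.
failover = preferred([r for r in routes if "vlan-b" in r.next_hop])
print([r.next_hop for r in failover])
```

Only the two VLAN A routes are selected while VLAN A is up, so the second VLAN adds redundancy but no extra throughput.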
BGP load balancing is always decided by the sending side: if NSX uses ECMP, then the firewall must also use ECMP, otherwise only your outgoing traffic will be load balanced reasonably. Not every firewall actually uses ECMP.
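A small Python sketch of why each direction is balanced independently. It assumes a per-flow hash over the 5-tuple, which is a common way to implement ECMP, but the CRC32 hash, the VLAN names and the addresses are purely illustrative and not what NSX or any particular firewall actually uses.

```python
import zlib
from collections import Counter


def ecmp_pick(paths, src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Pick one path per flow by hashing the 5-tuple (illustrative hash only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return paths[zlib.crc32(key) % len(paths)]


def single_path_pick(paths):
    """A device without ECMP installs one best route and always uses it."""
    return paths[0]


paths = ["vlan-a", "vlan-b"]  # hypothetical uplink VLAN names
# 1000 flows from one client to one server, differing only in source port.
flows = [("10.0.0.5", "192.0.2.10", 1024 + i, 443) for i in range(1000)]

# Sender with ECMP: flows are spread over the available paths.
outbound = Counter(ecmp_pick(paths, *f) for f in flows)
# Return direction through a device without ECMP: everything uses one path.
inbound = Counter(single_path_pick(paths) for _ in flows)

print("outbound (ECMP):   ", dict(outbound))
print("return (no ECMP):  ", dict(inbound))
```

The same flow always hashes to the same path, so packets within a connection stay in order, but the return traffic is only spread out if the other side hashes as well.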
Most of the time I don't peer directly with the firewall but with the ToR switches, and I use VRFs there as well if I need to. My firewall is then usually connected to the ToR switches via LACP and uses only one VLAN per VRF. But it all depends on your overall environment, your firewall and other decisions.
I have also built a direct peering between NSX and Check Point, using 2 uplink VLANs. Anti-spoofing was deactivated on the downlink interfaces of the Check Point. In addition, the downlink interfaces were in an LACP bond, so I had 4x25 Gb/s at the Check Point distributed over 2 VLANs. You have to explicitly switch on ECMP on the Check Point.