You are correct that NVMe cache devices will be "waiting" for SAS flash devices. Sure. But that will always be the case, so use the largest cache devices you can afford. Very large NVMe cache devices can suck up a lot of incoming writes and serve them back from NVMe while those blocks are still "hot" (the cache-hit probability is higher). And because hot blocks are served from cache, de-staging can afford to take longer.
The smaller the NVMe cache device, the more pressure it is under to de-stage, simply because the write cache fills up that much faster.
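To put some (completely made-up) numbers on that, here is a quick back-of-the-envelope sketch in Python. The ingest rate, de-stage rate and buffer sizes are assumptions for illustration only, not vSAN internals; the point is just that a bigger buffer buys you more headroom before de-stage pressure kicks in.

```python
# Back-of-the-envelope sketch (illustrative numbers only, not vSAN internals):
# how long a write buffer lasts when incoming writes outpace de-staging.

def seconds_until_full(cache_gb, ingest_mb_s, destage_mb_s):
    """Time until the write cache fills when ingest exceeds the de-stage rate."""
    net_fill_mb_s = ingest_mb_s - destage_mb_s
    if net_fill_mb_s <= 0:
        return float("inf")          # de-staging keeps up, the cache never fills
    return cache_gb * 1024 / net_fill_mb_s

# Assumed workload: 2 GB/s of sustained writes, 1.2 GB/s de-stage to SAS flash.
for cache_gb in (400, 800, 1600):
    t = seconds_until_full(cache_gb, ingest_mb_s=2000, destage_mb_s=1200)
    print(f"{cache_gb:>5} GB cache buffer: ~{t / 60:.0f} min of headroom")
```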
Another approach is to build disk groups with more SAS capacity devices per cache device. SAS3 and SAS4 are not slow by any means, and when the entire system is really put to work and data is spread over more capacity devices, de-staging can be done in parallel (per cache device). Divide and conquer ;-)
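Here is the same arithmetic from the other direction, reusing the made-up numbers from the sketch above. The per-device SAS rate is an assumption as well; the point is only the scaling of aggregate de-stage bandwidth with more capacity devices per disk group.

```python
# Illustrative only: more capacity devices per disk group raises the aggregate
# de-stage rate, which relieves pressure on the write cache.

PER_SAS_DEVICE_MB_S = 400          # assumed sustained write rate per SAS flash device
INGEST_MB_S = 2000                 # same assumed 2 GB/s incoming write stream as above

for capacity_devices in (3, 5, 7):
    destage = capacity_devices * PER_SAS_DEVICE_MB_S
    verdict = ("keeps up" if destage >= INGEST_MB_S
               else f"falls behind by {INGEST_MB_S - destage} MB/s")
    print(f"{capacity_devices} capacity devices -> ~{destage} MB/s de-stage, {verdict}")
```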
To be honest, most of my customers worry about disk performance and buy NVMe+SAS flash (some even go all-NVMe), only to find out that those devices are picking their noses all day and the load they put on those systems is nowhere near the limits of the flash devices. Even SAS3 gobbles it up without making a dent. One must have brutally heavy applications to impress modern-day flash, NVMe especially.
Also, buying networking equipment with relatively high inter-port latency can ruin your NVMe happy day. When data is written, say mirrored, it has to land on at least two nodes and thus two sets of devices, and that data goes over the network. You will notice it in the statistics if Switch A takes "X milliseconds" to forward packets between ports while Switch B needs half that time or less. Rule of thumb: the more brains a switch has, the more it "thinks", and the higher its inter-port latency. In other words, use "dumb as sh*t" switches with large per-port packet buffers; they tend to be better suited to IP-based storage (which is what vSAN is).
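A simplified latency-budget sketch of that write path, with assumed numbers (the device, NIC/stack and switch figures are illustrative, not measured): the switch's port-to-port time sits in every mirrored write twice, once for the replica write going out and once for the acknowledgement coming back.

```python
# Rough, simplified model: a mirrored write is only acknowledged after the
# remote replica lands, so each write pays the network round trip, and the
# switch's port-to-port forwarding latency appears in it twice.

def mirrored_write_us(device_write_us, switch_port_to_port_us, nic_and_stack_us):
    one_way = nic_and_stack_us + switch_port_to_port_us
    return device_write_us + 2 * one_way        # request out + ack back

NVME_WRITE_US = 30        # assumed NVMe write-cache latency
NIC_STACK_US  = 20        # assumed NIC + software stack, one way

for name, switch_us in (("cut-through, 'dumb' switch", 1), ("feature-heavy switch", 50)):
    total = mirrored_write_us(NVME_WRITE_US, switch_us, NIC_STACK_US)
    print(f"{name:28s}: ~{total:.0f} µs per mirrored write")
```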
I've seen customers buy very expensive 25GbE networking equipment with a million features, only to stretch the solution between two datacenters 150 miles apart with 4 ms of latency between them, and then wonder why their super-duper all-flash NVMe stretched cluster "writes so slowly" (reads are served locally, so they are always fast).
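And the stretched-cluster flavour of the same arithmetic, again with assumed local numbers; only the 4 ms figure comes from the scenario above. No flash upgrade makes that inter-site round trip go away.

```python
# Assumed numbers except the 4 ms inter-site RTT: the remote copy has to be
# acknowledged across the inter-site link, so that RTT dwarfs whatever the
# NVMe devices can do locally.

NVME_WRITE_US     = 30        # assumed local NVMe write latency
LOCAL_RTT_US      = 150       # assumed round trip within one site
INTER_SITE_RTT_US = 4000      # the 4 ms between the two datacenters

local_write   = NVME_WRITE_US + LOCAL_RTT_US
stretch_write = NVME_WRITE_US + INTER_SITE_RTT_US

print(f"local mirrored write : ~{local_write} µs")
print(f"stretched write      : ~{stretch_write} µs "
      f"(~{stretch_write / local_write:.0f}x slower, regardless of the flash underneath)")
```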