Reposting this for the 3rd time, as the post keeps vanishing.
"Must say... it's annoying. Please forgive my complaining, but this board is flaky. Has so many quirks. I typed a nice, detailed response to Duncan and reposted it 3 times."
Sure, but bear in mind that it has existed on this new platform for all of a week now (it was migrated from Jive to Khoros with a LOT of changes). Not to knock this platform or any other, but I have worked with enough 'put your words in the box, we keep them safe...' platforms/forms/websites/youNameIts that if I am writing anything longer than two sentences, I write it locally in Notepad++. This isn't always the platform's fault; maybe your login token for X was just about to expire when you clicked the button.
"But I don't see why, if the SFTT is 0. Zero to me, and to every other engineer who I know works with vSAN and sells it as an SE, means that, once node 2 in my example fails, vSAN should failover to the mirrored VMDK and that's that."
Because it is the SFTT, i.e. Secondary to the Primary (the default cluster setting, which still exists and has been the default all along). If you have PFTT=0, SFTT=0, this behaves exactly as it would have before SFTT even existed, i.e. it WOULD fail over to running off the remaining data replica, and it WOULD try to rebuild the lost replica if there was still available space and an available Fault Domain (without violating the Storage Policy).
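To make the rebuild condition concrete, here is a simplified, hypothetical sketch (not vSAN's actual placement logic): after a host holding a replica fails, a rebuild target in the same site has to be a healthy host that holds no other component of the object (a free fault domain) and has enough free capacity for the new copy. The function name, host names and capacities are made up for illustration.

```python
from __future__ import annotations


def rebuild_targets(free_gb_by_host: dict[str, float],
                    component_hosts: set[str],
                    replica_gb: float) -> list[str]:
    """Return the healthy hosts in one site that could receive a rebuilt replica.

    Hypothetical model: free_gb_by_host lists only the hosts still healthy in
    the site; component_hosts are hosts already holding a component of this object.
    """
    return sorted(
        host
        for host, free_gb in free_gb_by_host.items()
        # A host already holding a component is not a separate fault domain.
        if host not in component_hosts
        # The new replica has to fit in the host's free capacity.
        and free_gb >= replica_gb
    )


# 2 hosts per data site: esx-a2 held site A's replica and has failed,
# so only esx-a1 is left in site A and it holds nothing for this object.
print(rebuild_targets({"esx-a1": 500.0}, set(), 100.0))  # ['esx-a1']

# 1 host per data site: once that host fails, site A has no healthy hosts left.
print(rebuild_targets({}, set(), 100.0))  # []
```

The second call is the case where the object keeps running off the other site's replica but has nowhere local to rebuild until hardware comes back.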
"If even with an SFTT of 0 vSAN is still going to try to rebuild at the site where the failure occurred, then why add extra nodes and waste money?"
So that multiple concurrent or staggered failures can be withstood, even if the other site fails. Sure, this costs more, but so does anything else that provides durability-at-depth.
"Just add some extra disk on the 2 nodes at each site and let the rebuild occur."
This won't help much if a motherboard, boot device, or shared controller is the point of failure. If you mean adding disks just after the failure occurred, then maybe you are working with folks who can replace/add disks a lot faster than is realistic for most (especially in the current climate). It also means you have to choose hardware that has ample free slots, and those slots have to be readily accessible.
"In fact, some people think that the site should have failed altogether once node 2 failed with an SFTT of 0 for that site -- and all VMs should fail to the secondary site."
Are you talking about a 2+2+1 or a 3+3+1 configuration? I would think that most people (vSAN customers or otherwise) would prefer to have some of their data redundant shortly after a failure rather than NONE of their data redundant.
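The difference between those two layouts comes down to simple host-count arithmetic. The sketch below assumes RAID-1 mirroring inside each site (RAID-5/6 has different minimums and is not modelled) and uses the rule that FTT=n with RAID-1 needs 2n+1 hosts; the optional spare host is an illustrative assumption, and the function name is hypothetical.

```python
def min_hosts_per_site(sftt: int, rebuild_spare: bool = False) -> int:
    """Hosts needed in each data site for a RAID-1 secondary FTT of `sftt`.

    RAID-1 with FTT=n needs 2n+1 hosts (fault domains), so SFTT=0 fits on a
    single host. Optionally add one spare host to act as a rebuild target.
    """
    if sftt < 0:
        raise ValueError("sftt must be >= 0")
    hosts = 2 * sftt + 1
    return hosts + 1 if rebuild_spare else hosts


for sftt in (0, 1):
    base = min_hosts_per_site(sftt)
    spare = min_hosts_per_site(sftt, rebuild_spare=True)
    # The trailing +1 is the witness host.
    print(f"SFTT={sftt}: {base}+{base}+1 minimum, {spare}+{spare}+1 with a rebuild spare")
```

This prints `SFTT=0: 1+1+1 minimum, 2+2+1 with a rebuild spare` and `SFTT=1: 3+3+1 minimum, 4+4+1 with a rebuild spare`, i.e. 2+2+1 gives SFTT=0 objects a local rebuild target, while 3+3+1 is the smallest layout that satisfies SFTT=1 with RAID-1.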
"If 1 node fails, your SPBM will be in violation (FTT of 0) AND if another failure occurs, the whole cluster will be down because you will have lost quorum."
As I said above, PFTT=0, SFTT=0 works exactly the same as it did before SFTT was even a thing; nothing has changed in this regard.
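The quorum concern in the quote can be illustrated with a simplified model of vSAN's per-object accessibility rule: an object stays accessible while at least one full replica survives and more than 50% of its votes are present. The layout below (one replica per data site plus a witness component, one vote each) is an assumption for illustration; real vote counts can differ, and the function name is hypothetical.

```python
def object_accessible(replicas_alive: int, votes_alive: int, votes_total: int) -> bool:
    """Simplified rule: at least one full replica and more than 50% of votes."""
    return replicas_alive >= 1 and 2 * votes_alive > votes_total


# Hypothetical layout: site A replica, site B replica, witness; one vote each.
TOTAL_VOTES = 3

# Node 2 fails and takes the site A replica with it.
print(object_accessible(replicas_alive=1, votes_alive=2, votes_total=TOTAL_VOTES))  # True
# A second failure hits the site B replica before site A has been rebuilt.
print(object_accessible(replicas_alive=0, votes_alive=1, votes_total=TOTAL_VOTES))  # False
# The same second failure after the site A replica was rebuilt on another host.
print(object_accessible(replicas_alive=1, votes_alive=2, votes_total=TOTAL_VOTES))  # True
```

In this model, whether a second failure takes the object offline depends on whether the rebuild from the first failure has completed.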