I'm trying to determine at what level I should be warning and alarming on host memory utilization. We're currently warning at 82%, but I regularly see hosts in a cluster above 82% with a warning while another host in the same cluster sits at 65%, and DRS isn't balancing the workload. This leads me to think that DRS doesn't consider this to be a problem, so perhaps my thresholds are too low.
From the memory management guide, "ESX maintains four host free memory states: high, soft, hard, and low, which are reflected by four thresholds: 6%, 4%, 2%, and 1% of host memory respectively." This would suggest that we may not even want to warn until free memory drops below the high threshold of 6% (that is, above 94% utilization), but this seems awfully high to me. My personal best practice has been to keep any resource below 70%, but perhaps I'm not understanding host memory management well enough yet.
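To make the arithmetic concrete, here is a rough Python sketch that converts the four free-memory thresholds quoted above into utilization percentages and checks which of them a given utilization has reached. The function names are my own, and the sketch assumes the fixed percentages from the guide apply as-is; it only does the subtraction and does not model how ESX actually moves between states.

```python
# Free-memory thresholds as quoted from the memory management guide,
# expressed as percent of host memory that is still free.
FREE_THRESHOLDS_PCT = {"high": 6, "soft": 4, "hard": 2, "low": 1}


def utilization_at_threshold(free_pct):
    """Return the utilization (percent) at which free memory equals free_pct."""
    return 100 - free_pct


def thresholds_reached(used_pct):
    """Hypothetical helper: list the thresholds that free memory is at or below.

    This is plain arithmetic on the quoted percentages, not a model of
    ESX state transitions.
    """
    free_pct = 100 - used_pct
    return [name for name, limit in FREE_THRESHOLDS_PCT.items() if free_pct <= limit]


# Example: the 82% warning level and the 65% host are both well clear of
# every threshold, while 95% utilization has reached only "high".
print(utilization_at_threshold(FREE_THRESHOLDS_PCT["high"]))  # 94
print(thresholds_reached(82))  # []
print(thresholds_reached(95))  # ['high']
```

By this arithmetic, a warning at 82% fires 12 percentage points before the first threshold is reached.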
At what level do other people start warning and alarming? How are you determining that you're "out of memory"? Or are you not monitoring memory utilization at all and focusing on ballooning and swapping instead (which could be an indication that you're already in trouble)?
Opinions greatly appreciated (and I'm sure that there are a lot of differing opinions).
Thanks!