 Enabling Deduplication & Compression on Existing Cluster, any issues?

Brian Gabriel posted Jul 22, 2024 08:07 AM

I've read the article below, but I was wondering whether there is anything to be concerned about given our existing capacity usage. I want to enable this feature as we are at 84% and 72% capacity on our two vSAN (all-flash) datastores and there is no budget right now for additional hosts. I understand there will be a performance impact, but right now I'm more concerned about running out of space.

* VMware Docs: VMware vSphere - Enable Deduplication and Compression on Existing vSAN Cluster
  https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan.doc/GUID-BD30E413-F870-4C25-9507-123F59D0A4B8.html

Mohammed Viquar Ahmed (Broadcom Employee)

Hey Brian,

When you enable deduplication and compression, vSAN will evacuate each disk group and then recreate it with dedup and compression enabled. Considering you are almost full now, if you evacuate while honouring your host-failure policy you will run out of space. It would be better if you can evacuate data first and then enable it.

TheBobkin

Hi Brian,

A couple of points (in no particular order):
Is the performance of these clusters far in excess of what the workloads require? I ask because there is a trade-off in enabling deduplication & compression, and if these clusters are already not performant enough, or borderline, then this should be carefully considered. How much performance will decrease (or whether there will be any perceivable difference at all) is not knowable without in-depth scoping and an understanding of the workload, the data, and the cluster configuration.

Are you aware of how unique or non-unique the data set is here? This (along with Disk-Group size and layout) will have a direct impact on how much space dedupe will actually save. If your data is highly unique, your Disk-Groups are small (and numerous), or other factors apply (e.g. encryption and/or thick-provisioned data), then you may not save much space by enabling this, and could even see a net loss (i.e. the overhead of this feature uses more space than it saves!). I state this because enabling deduplication & compression isn't a guaranteed 'push button, receive massive space savings' operation. If your data is highly homogeneous (VDI being a good example) and your Disk-Groups are on the larger side, then you will likely get a reasonable/good dedupe ratio.
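To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The dedupe ratios and overhead figures below are made-up illustrations, not measurements from any real cluster:

def effective_used_tb(logical_tb, dedupe_ratio, overhead_tb):
    # Physical space consumed after dedupe/compression, including the
    # feature's own metadata overhead (hypothetical figure).
    return logical_tb / dedupe_ratio + overhead_tb

# Highly redundant data (e.g. cloned VDI desktops): a big win.
print(effective_used_tb(50.0, dedupe_ratio=3.0, overhead_tb=0.5))   # ~17.2 TB

# Highly unique or encrypted/pre-compressed data: barely dedupes, and with
# enough overhead ends up worse than the 50TB logical footprint (net loss).
print(effective_used_tb(50.0, dedupe_ratio=1.02, overhead_tb=1.5))  # ~50.5 TB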

Whether a vsanDatastore is too full, or even capable of safely doing a rolling evacuate-and-reformat, cannot be answered from % full alone:
Example 1:
20-node cluster with 3x10TB capacity Disk-Groups per node and the vsanDatastore 90% full (540TB used of 600TB size).
Assuming even distribution of data, this cluster would only need 9TB of free space to completely evacuate one Disk-Group at a time (i.e. using Full Data Migration (FDM) rather than the Ensure Accessibility (EA)/Reduced Redundancy option) and would be fine.
Example 2:
4-node cluster with 1x10TB capacity Disk-Group per node and the vsanDatastore 90% full (36TB used of 40TB size).
Assuming even distribution of data (and no RAID5 storage policies, which would make the FDM option impossible on 4 nodes anyway), this cluster would need 9TB of free space (in locations that don't violate the storage policy) to completely evacuate one Disk-Group at a time. FDM therefore wouldn't be an option here, and even a single large vmdk spread over more than 2 nodes could make this impossible even with the EA option.
The point being that saying a cluster is at 84% or 72% full doesn't really tell us much - knowing the number of nodes, the policies in use, and the capacity and number of Disk-Groups per node is required to answer this in an informed manner; anything less is guessing.
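If it helps, the arithmetic in the two examples above can be expressed as a quick Python sketch. Note it assumes perfectly even data distribution and ignores storage-policy placement constraints (FTT, RAID level, fault domains), so treat it as a best-case estimate rather than a green light:

def rolling_evac_feasible(nodes, dgs_per_node, dg_capacity_tb, used_pct):
    total_dgs = nodes * dgs_per_node
    total_tb = total_dgs * dg_capacity_tb
    used_tb = total_tb * used_pct / 100.0
    data_per_dg = used_tb / total_dgs  # even distribution assumed
    # Free space usable as a destination, i.e. excluding the free space
    # on the Disk-Group being evacuated itself:
    free_elsewhere = (total_tb - used_tb) - (dg_capacity_tb - data_per_dg)
    return free_elsewhere >= data_per_dg

print(rolling_evac_feasible(20, 3, 10.0, 90.0))  # Example 1 -> True (59TB free vs 9TB needed)
print(rolling_evac_feasible(4, 1, 10.0, 90.0))   # Example 2 -> False (3TB free vs 9TB needed)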

You say there is no budget for additional hosts - is there budget for additional disks and do you have free slots on the servers? This is generally more palatable for those allocating funds as it would cost significantly less for numerous reasons (licensing for one).

Have you made a concerted effort to confirm you are not wasting a significant amount of space on these datastores? You would be shocked at how much waste I sometimes find on datastores that the administrator was unaware of. Some low-hanging fruit: large and numerous snapshots, objects with proportionalCapacity=100 but little physically written space, VMs/objects that have been unregistered from inventory but never deleted, and objects that were never properly deleted and remain on disk (this can happen for many reasons). Don't just go by the datastore browser - use esxcli vsan debug object list output or RVC to be more granular here.
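As a starting point for the snapshot hunting, a short pyVmomi sketch like the one below will list every registered VM that still carries snapshots (hostname and credentials are placeholders; this won't catch unregistered or orphaned objects, which is where debug object list/RVC come in):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def walk(snap_tree, found):
    # Recursively flatten the snapshot tree into (name, created) tuples.
    for snap in snap_tree:
        found.append((snap.name, snap.createTime))
        walk(snap.childSnapshotList, found)

ctx = ssl._create_unverified_context()  # lab use only; validate certs in prod
si = SmartConnect(host="vcenter.example.com",            # placeholder
                  user="administrator@vsphere.local",    # placeholder
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.snapshot:  # only VMs that currently have snapshots
            snaps = []
            walk(vm.snapshot.rootSnapshotList, snaps)
            for name, created in snaps:
                print(f"{vm.name}: snapshot '{name}' created {created}")
    view.Destroy()
finally:
    Disconnect(si)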