Hi Bob! Thank you for the quick reply. I will try to answer all the questions:
"I see that many components takes very long time to succesfully resync"
If you could be more specific it may help narrow this down:
- Are the same Objects resyncing as were a few days ago or different Objects?
Same Objects
- Is the resync 'looping' for some Objects (e.g. it starts at 200GB to resync, gets down to 50GB then goes back to 200GB)?
Right now it shows 243.49 GB left to resync. Every time I refresh, the figure changes, so it looks like it is looping (see the note after the vsan.resync_dashboard output below).
- Are the Objects in question Inaccessible? - If a resync (e.g. repair) for an Object started and the only other full replica of the data was lost, the resync will not progress as there is no data to read from.
Some VMs are in reduced availability, but they are working OK.
- What is the given 'intent' of the resync - if you go to Cluster > Monitor > Resyncing components, it should state the intent (e.g. compliance, rebalance, repair etc.).
For whatever reason, backups of the VMs (via vSphere Data Protection, which is deployed in this solution) did not work for the VMs that are resyncing.
- What build version of vCenter and ESXi are in use here? (more helpful to state 'build:15820472' as opposed to '6.7')
ESXi version: 6.5.0 Update 1 (Build 5969303)
vCSA version: 6.5.0 (Build 7312210)
vSAN version (Health Service): 6.6.1
- If you go to Cluster > Monitor > vSAN > Health - do you have any triggered red alerts? If you do then please attach/PM screenshot of this with the drop-down details shown.
"I see many post in this forum and I check that there would be commands like vsan.resync... It affects to normal functionality of VMs in the moment you put these commands?"
Commands such as vsan.resync_dashboard, vsan.disks_stats and vsan.obj_status_report are basically just 'get' commands and do not cause any impact when run - if you can, please attach/PM the output of these 3 commands run against the cluster in question.
1. vsan.resync_dashboard (I omitted VM names for security reasons)
/vCenter IP Address/Datacenter/computers/CLUSTER> vsan.resync_dashboard .
2020-05-08 14:01:53 +0200: Querying all VMs on vSAN ...
2020-05-08 14:01:53 +0200: Querying all objects in the system from esxi01-... ...
2020-05-08 14:01:53 +0200: Got all the info, computing table ...
+----------------------------------------------------------------------------------------+-----------------+---------------+
| VM/Object | Syncing objects | Bytes to sync |
+----------------------------------------------------------------------------------------+-----------------+---------------+
| one_vm | 1 | |
| [vsanDatastore] 9ca7e05a-310b-c9db-05c0-98f2b325f0e0/one_vm.vmdk | | 63.72 GB |
| two_vm | 1 | |
| [vsanDatastore] a17d065b-bee5-42e5-aaaa-98f2b325f0e0/two_vm.vmdk | | 21.10 GB |
| three_vm | 1 | |
| [vsanDatastore] 467a0d5b-5a9c-2a0e-4d32-98f2b325f0e0/three_vm.vmdk | | 95.98 GB |
| four_vm | 1 | |
| [vsanDatastore] 019cd95a-e9a9-3f07-0589-1c98ec1de210/four_vm.vmdk | | 56.03 GB |
| vcenter.... | 1 | |
| [vsanDatastore] 578b585a-41e8-78b3-4e5f-98f2b325f0e0/vcenter....vmdk | | 6.59 GB |
+----------------------------------------------------------------------------------------+-----------------+---------------+
| Total | 5 | 243.41 GB |
+----------------------------------------------------------------------------------------+-----------------+---------------+
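If it helps, I can also leave this dashboard refreshing so we can see whether the total actually trends down or just jumps around; a minimal sketch, assuming the -r/--refresh-rate option of vsan.resync_dashboard is available in this RVC version:
/vCenter IP Address/Datacenter/computers/CLUSTER> vsan.resync_dashboard . -r 60
With a 60-second refresh it re-queries and reprints the table periodically, so a resync that is genuinely progressing should show the "Bytes to sync" total decreasing rather than looping.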
2. vsan.disks_stats (I omitted ESXi names for security reasons)
+---------------------+--------------------------+-------+------+-----------+---------+----------+------------+----------+----------+------------+---------+----------+---------+
| | | | Num | Capacity | | | Physical | Physical | Physical | Logical | Logical | Logical | Status |
| DisplayName | Host | isSSD | Comp | Total | Used | Reserved | Capacity | Used | Reserved | Capacity | Used | Reserved | Health |
+---------------------+--------------------------+-------+------+-----------+---------+----------+------------+----------+----------+------------+---------+----------+---------+
| mpx.vmhba1:C2:T0:L0 | esxi01-... | SSD | 0 | 894.25 GB | 0.00 % | 0.00 % | N/A | N/A | N/A | N/A | N/A | N/A | OK (v5) |
| mpx.vmhba1:C2:T1:L0 | esxi01-... | MD | 16 | 846.94 GB | 19.16 % | 2.75 % | 2540.81 GB | 19.18 % | 1.79 % | 8942.50 GB | 4.42 % | 0.26 % | OK (v5) |
| mpx.vmhba1:C2:T2:L0 | esxi01-... | MD | 16 | 846.94 GB | 19.16 % | 1.32 % | 2540.81 GB | 19.18 % | 1.79 % | 8942.50 GB | 3.14 % | 0.12 % | OK (v5) |
| mpx.vmhba1:C2:T3:L0 | esxi01-... | MD | 15 | 846.94 GB | 19.16 % | 1.32 % | 2540.81 GB | 19.18 % | 1.79 % | 8942.50 GB | 7.79 % | 0.12 % | OK (v5) |
+---------------------+--------------------------+-------+------+-----------+---------+----------+------------+----------+----------+------------+---------+----------+---------+
| mpx.vmhba1:C2:T0:L0 | esxi02-... | SSD | 0 | 894.25 GB | 0.00 % | 0.00 % | N/A | N/A | N/A | N/A | N/A | N/A | OK (v5) |
| mpx.vmhba1:C2:T1:L0 | esxi02-... | MD | 11 | 846.94 GB | 10.52 % | 1.31 % | 2540.81 GB | 10.53 % | 3.53 % | 8942.50 GB | 1.88 % | 0.12 % | OK (v5) |
| mpx.vmhba1:C2:T3:L0 | esxi02-... | MD | 19 | 846.94 GB | 10.52 % | 5.34 % | 2540.81 GB | 10.53 % | 3.53 % | 8942.50 GB | 3.65 % | 0.51 % | OK (v5) |
| mpx.vmhba1:C2:T2:L0 | esxi02-... | MD | 12 | 846.94 GB | 10.52 % | 3.94 % | 2540.81 GB | 10.53 % | 3.53 % | 8942.50 GB | 3.25 % | 0.37 % | OK (v5) |
+---------------------+--------------------------+-------+------+-----------+---------+----------+------------+----------+----------+------------+---------+----------+---------+
| mpx.vmhba1:C2:T0:L0 | esxi03-... | SSD | 0 | 894.25 GB | 0.00 % | 0.00 % | N/A | N/A | N/A | N/A | N/A | N/A | OK (v5) |
| mpx.vmhba1:C2:T2:L0 | esxi03-... | MD | 16 | 846.94 GB | 8.89 % | 3.44 % | 2540.81 GB | 8.89 % | 3.45 % | 8942.50 GB | 1.78 % | 0.33 % | OK (v5) |
| mpx.vmhba1:C2:T3:L0 | esxi03-... | MD | 14 | 846.94 GB | 8.89 % | 2.03 % | 2540.81 GB | 8.89 % | 3.45 % | 8942.50 GB | 2.39 % | 0.19 % | OK (v5) |
| mpx.vmhba1:C2:T1:L0 | esxi03-... | MD | 14 | 846.94 GB | 8.89 % | 4.88 % | 2540.81 GB | 8.89 % | 3.45 % | 8942.50 GB | 2.45 % | 0.46 % | OK (v5) |
+---------------------+--------------------------+-------+------+-----------+---------+----------+------------+----------+----------+------------+---------+----------+---------+
3. vsan.obj_status_report (run from /vCenter IP Address/Datacenter Name/computers/Cluster Name)
2020-05-08 14:18:15 +0200: Querying all VMs on vSAN ...
2020-05-08 14:18:15 +0200: Querying all objects in the system from esxi01-... ...
2020-05-08 14:18:16 +0200: Querying all disks in the system from esxi01-.... ...
2020-05-08 14:18:16 +0200: Querying all components in the system from esxi01-....es ...
2020-05-08 14:18:17 +0200: Querying all object versions in the system ...
2020-05-08 14:18:20 +0200: Got all the info, computing table ...
Histogram of component health for non-orphaned objects
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3 (OK) | 38 |
| 4/4 (OK) | 3 |
| 6/6 (OK) | 1 |
+-------------------------------------+------------------------------+
Total non-orphans: 42
Histogram of component health for possibly orphaned objects
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 0/3 (Unavailable) | 1 |
+-------------------------------------+------------------------------+
Total orphans: 1
Total v1 objects: 0
Total v2 objects: 0
Total v2.5 objects: 0
Total v3 objects: 0
Total v5 objects: 43
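I also see the one possibly orphaned object at 0/3 (Unavailable). If it is useful, I can pull its UUID and details with something like the following (a sketch only - it assumes the -t/--print-table option and vsan.object_info are available in this RVC version, and <object-uuid> is a placeholder for whatever UUID the table reports):
/vCenter IP Address/Datacenter Name/computers/Cluster Name> vsan.obj_status_report . -t
/vCenter IP Address/Datacenter Name/computers/Cluster Name> vsan.object_info . <object-uuid>
The table variant lists each object with its UUID and component health, and vsan.object_info should then show which hosts/disks held the components of that 0/3 object.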
"but for example when I try to clone a VM with resync components already working, there is no way, it fails to do the process (I understand because in this moment is doing resync components)."
There shouldn't be an issue with cloning a VM/vmdk while it is resyncing; this points to a more severe issue than the resync simply being slow - potentially the Objects are Inaccessible or in an otherwise impaired data state.
Please attach/PM the output of this grep, run against the log file on all 3 hosts:
# grep -iE 'checks|apd|perm|heartbeat|lost|uplink' /var/log/vobd.log
I attached the three log files (one for each ESXi host).
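For reference, something like the loop below could collect the same grep from all three hosts in one pass (a sketch only - it assumes SSH is enabled on the hosts, and esxi01/esxi02/esxi03 are placeholders, not the real hostnames):
for h in esxi01 esxi02 esxi03; do
  # run the same grep as above on each host and save one file per host
  ssh root@"$h" "grep -iE 'checks|apd|perm|heartbeat|lost|uplink' /var/log/vobd.log" > "vobd_${h}.log"
done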