We have a vSAN stretched cluster running on ESXi 6.5u1e and vSAN 6.6.1.
Currently, as we move VMs across to it, the script is taking 20+ minutes to apply the storage policy. Looking into it further, none of the VMs in the cluster have a valid state for SPBM; they all have "Unknown" listed. I checked VASA and found that one node had an IOFILTER provider marked as "Offline". I removed it because I could not get it to scan, then restarted the node, and after the reboot it shows the provider correctly with an "Online" state.
This process seems to have cleared the errors in the sps.log on the VCSA for this node, but I still cannot refresh compliance information for any VM in this cluster. I get the following stream of errors in the sps.log:
[code]
2018-03-12T13:32:30.553Z [pool-32-thread-3] DEBUG opId=CheckVmRollupComplianceResolver-applyOnMultiEntity-1715716-ngc:70015178 com.vmware.vim.sms.provider.ProviderFactory - [getActiveProvider] Retry for the 24th time...
2018-03-12T13:32:30.578Z [pool-32-thread-3] ERROR opId=CheckVmRollupComplianceResolver-applyOnMultiEntity-1715716-ngc:70015178 com.vmware.vim.sms.provider.ProviderCache - [getProvider] No provider exists with uid: b8b50430-5aac-47ce-a1eb-20ffff99f497
2018-03-12T13:32:30.579Z [pool-32-thread-3] ERROR opId=CheckVmRollupComplianceResolver-applyOnMultiEntity-1715716-ngc:70015178 com.vmware.vim.sms.provider.ProviderFactory - [getActiveProvider] Failed to retrieve the provider!
(vim.fault.NotFound) {
faultCause = null,
faultMessage = null
}
at com.vmware.vim.sms.provider.ProviderFactory.getProvider(ProviderFactory.java:348)
at com.vmware.vim.sms.provider.ProviderFactory.getActiveProvider(ProviderFactory.java:390)
at com.vmware.vim.sms.provider.ProviderFactory.getVasaClientRetryProxy(ProviderFactory.java:365)
at com.vmware.vim.sms.policy.PolicyManagerImpl.queryComplianceResult(PolicyManagerImpl.java:174)
at com.vmware.sps.pbm.impl.LocalSMSServiceImpl.queryComplianceResult(LocalSMSServiceImpl.java:68)
at com.vmware.sps.pbm.compliance.ObjectStorageComplianceTask.run(ObjectStorageComplianceTask.java:160)
at com.vmware.vim.storage.common.task.opctx.RunnableOpCtxDecorator.run(RunnableOpCtxDecorator.java:38)
at com.vmware.vim.storage.common.task.opctx.RunnableOpCtxDecorator.run(RunnableOpCtxDecorator.java:38)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-03-12T13:32:30.992Z [pool-32-thread-1] DEBUG opId=CheckVmRollupComplianceResolver-applyOnMultiEntity-1715505-ngc:70015176 com.vmware.vim.sms.provider.ProviderFactory - [getActiveProvider] Retry for the 26th time...
2018-03-12T13:32:31.027Z [pool-32-thread-1] ERROR opId=CheckVmRollupComplianceResolver-applyOnMultiEntity-1715505-ngc:70015176 com.vmware.vim.sms.provider.ProviderCache - [getProvider] No provider exists with uid: b8b50430-5aac-47ce-a1eb-20ffff99f497
2018-03-12T13:32:31.028Z [pool-32-thread-1] ERROR opId=CheckVmRollupComplianceResolver-applyOnMultiEntity-1715505-ngc:70015176 com.vmware.vim.sms.provider.ProviderFactory - [getActiveProvider] Failed to retrieve the provider!
(vim.fault.NotFound) {
faultCause = null,
faultMessage = null
}
at com.vmware.vim.sms.provider.ProviderFactory.getProvider(ProviderFactory.java:348)
at com.vmware.vim.sms.provider.ProviderFactory.getActiveProvider(ProviderFactory.java:390)
at com.vmware.vim.sms.provider.ProviderFactory.getVasaClientRetryProxy(ProviderFactory.java:365)
at com.vmware.vim.sms.policy.PolicyManagerImpl.queryComplianceResult(PolicyManagerImpl.java:174)
at com.vmware.sps.pbm.impl.LocalSMSServiceImpl.queryComplianceResult(LocalSMSServiceImpl.java:68)
at com.vmware.sps.pbm.compliance.ObjectStorageComplianceTask.run(ObjectStorageComplianceTask.java:160)
at com.vmware.vim.storage.common.task.opctx.RunnableOpCtxDecorator.run(RunnableOpCtxDecorator.java:38)
at com.vmware.vim.storage.common.task.opctx.RunnableOpCtxDecorator.run(RunnableOpCtxDecorator.java:38)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[/code]
It eventually reaches 60 attempts and then the compliance check fails in the web client. Has anyone seen this behaviour before, or does anyone know a different angle I can take in troubleshooting it?
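In case it helps anyone looking at the same symptom, here is a rough sketch of how the log could be summarised to see which provider UIDs are reported missing and how far each operation gets with its retries. The regular expressions are based only on the lines quoted above, so other SPS versions may need different patterns, and the function name `summarize_sps_log` is just something made up for illustration.

```python
import re
from collections import defaultdict

# Patterns derived only from the log excerpt in this post (an assumption:
# other builds may word these messages differently).
OPID_RE = re.compile(r"opId=(\S+)")
RETRY_RE = re.compile(r"Retry for the (\d+)(?:st|nd|rd|th) time")
UID_RE = re.compile(r"No provider exists with uid: ([0-9a-fA-F-]{36})")


def summarize_sps_log(lines):
    """Return (highest retry count seen per opId, set of missing provider UIDs)."""
    max_retry = defaultdict(int)
    missing_uids = set()
    for line in lines:
        uid = UID_RE.search(line)
        if uid:
            missing_uids.add(uid.group(1))
        retry = RETRY_RE.search(line)
        opid = OPID_RE.search(line)
        if retry and opid:
            key = opid.group(1)
            max_retry[key] = max(max_retry[key], int(retry.group(1)))
    return dict(max_retry), missing_uids


# Hypothetical usage against a copy of the log taken from the VCSA:
#   with open("sps.log", encoding="utf-8", errors="replace") as fh:
#       retries, uids = summarize_sps_log(fh)
#   print(uids)     # provider UIDs that SPS cannot find
#   print(retries)  # highest retry number reached per opId
```

If only a single UID ever shows up in the output, that would suggest SPS is still looking for one stale provider entry rather than failing across several providers, but that is an interpretation and not something the log confirms on its own.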