vSAN1

  • 1.  virtual san device is under permanent failure

    Posted Apr 25, 2017 09:00 AM

    Hello Everyone,

    I need assistance with a vSAN alert.

    On one of our clusters we are getting the error "Virtual SAN device is under permanent failure".

    - Failed : Physical disk

    - Failed : Component metadata health

    - Failed : Overall disks health

    I have gone through a couple of KBs and community threads:

    VSAN health check - component metadata health

    Component metadata health check fails with invalid state error (2145347) | VMware KB

    ESXi host :

    VMware ESXi 6.0.0 build-3620759

    VMware ESXi 6.0.0 Update 2

    vSAN Version:

    Name        : VMware-vsan-health           Relocations: (not relocatable)

    Version     : 6.2.0                             Vendor: VMware, Inc.

    Release     : 3547697                       Build Date: Sat Feb 13 03:04:16 2016

    Install Date: Thu Oct 13 18:12:01 2016         Build Host: sc-bld-lin1268.eng.vmware.com

    Group       : Applications/Management       Source RPM: VMware-vsan-health-6.2.0-3547697.src.rpm

    Size        : 52872114                         License: commercial

    Signature   : (none)

    Summary     : VMware Virtual SAN Health Service

    Description :

    VMware Virtual SAN Health Service

    Distribution: (none)

    vmkernel.log

    2017-04-24T10:17:07.853Z cpu16:42460)PLOG: PLOG_QuiesceDevice:8531: : Got quiesce reason 1 on disk naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce

    2017-04-24T10:17:07.853Z cpu7:33656)PLOG: PLOG_CleanupElevator:1473: Waiting for Elevator from UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce

    2017-04-24T10:17:07.863Z cpu32:2341680)WARNING: LSOM: LSOMEventNotify:6450: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce has gone offline.

    2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOGGarbageCollectDevice:1542: Throttled: Device naa.600605b00991a3f0202de2c45f900beb:1 5296f94a-d540-efa9-e0e4-d7a2788d97ce is prepared to delete

    2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf26f030 naa.600605b00991a3f0202de2c45f900beb:1 0x419 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8

    2017-04-24T10:17:09.867Z cpu9:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf270070 naa.600605b00991a3f0202de2c45f900beb:2 0x41d 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8

    2017-04-24T10:17:11.369Z cpu36:41665)PLOG: PLOGNotifyDisks:4010: MD 3 with UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce with state 0 formatVersion 4 backing SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8 notified

    2017-04-24T10:17:11.418Z cpu0:7034782)PLOG: PLOGGetRecoveredState:6637: Last LSN recoverd 5296f94a-d540-efa9-e0e4-d7a2788d97ce 46544828

    2017-04-24T10:17:12.421Z cpu0:7034782)PLOG: PLOG_OpenDevHandles:1228: Registered APD callback for naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce

    2017-04-24T10:17:12.424Z cpu0:7034782)PLOG: PLOG_OpenDevHandles:1228: Registered APD callback for naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce

    2017-04-24T10:17:12.425Z cpu0:7034782)PLOG: PLOGInitAndAnnounceMD:6987: Successfully announced VSAN MD (naa.600605b00991a3f0202de2c45f900beb:2) with UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce

    2017-04-24T10:17:12.530Z cpu26:43820)WARNING: LSOM: LSOMEventNotify:6440: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce is under permanent error.

    2017-04-24T10:17:07.853Z cpu8:7034742)PLOG: PLOGValidateDiskGroupOpFn:1415: Issuing PLOG Op DISKGROUP UNMOUNT for MD :naa.600605b00991a3f0202de2c45f900beb

    2017-04-24T10:17:07.853Z cpu16:42460)PLOG: PLOG_QuiesceDevice:8531: : Got quiesce reason 1 on disk naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce

    2017-04-24T10:17:07.853Z cpu32:41665)LSOM: LSOMEventNotify:6413: Throttled: Waiting for component cleanup

    2017-04-24T10:17:07.853Z cpu7:33656)PLOG: PLOG_CleanupElevator:1473: Waiting for Elevator from UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce

    2017-04-24T10:17:07.863Z cpu32:2341680)WARNING: LSOM: LSOMEventNotify:6450: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce has gone offline.

    2017-04-24T10:17:07.863Z cpu32:2341680)LSOM: LSOMEventNotify:6519: Throttled: Waiting for open component countto drop to zero

    2017-04-24T10:17:07.872Z cpu29:36378)PLOG: PLOGIsPlogUnloading:100: Elevator exit for device is set

    2017-04-24T10:17:07.872Z cpu29:36378)PLOG: PLOGElevBaseHandler:617: Elevator exiting due to unload operation

    2017-04-24T10:17:07.974Z cpu8:33711)Global: Virsto_DetachInstance:301: INFO: Detaching Virsto Instance 0x430b680a9060 from PLOG device

    2017-04-24T10:17:08.855Z cpu21:33659)PLOG: PLOG_CleanupDefence:6346: Waiting for defence task for naa.600605b00991a3f0202de2c45f900beb:1

    2017-04-24T10:17:09.856Z cpu21:33659)Destroyed VSAN Slab PLOGIORetry_slab_0000000000 (maxCount=0 failCount=0)

    2017-04-24T10:17:09.857Z cpu21:33659)Destroyed VSAN Slab PLOGIORetry_slab_0000000001 (maxCount=1 failCount=0)

    2017-04-24T10:17:09.857Z cpu21:33659)ScsiEvents: 353: EventSubsystem: Device Events, Event Mask: 20, Parameter: 0x430cdde547e0, UnRegistered!

    2017-04-24T10:17:09.857Z cpu3:7034742)PLOG: PLOGValidateDiskGroupOpFn:1415: Issuing PLOG Op DISKGROUP UNMOUNT for MD :naa.600605b00991a3f0202de2c45f900beb

    2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOGGarbageCollectDevice:1542: Throttled: Device naa.600605b00991a3f0202de2c45f900beb:1 5296f94a-d540-efa9-e0e4-d7a2788d97ce is prepared to delete

    2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf26f030 naa.600605b00991a3f0202de2c45f900beb:1 0x419 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8

    2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOG_FreeDevice:496: Throttled: Waiting for ops to complete on device: 0x430cdf26f030 naa.600605b00991a3f0202de2c45f900beb:1

    2017-04-24T10:17:09.867Z cpu9:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf270070 naa.600605b00991a3f0202de2c45f900beb:2 0x41d 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8

    2017-04-24T10:17:09.867Z cpu9:33662)PLOG: PLOG_FreeDevice:454: Unregistering diskAttrHandle:0x430cdf2708b0 on disk naa.600605b00991a3f0202de2c45f900beb

    2017-04-24T10:17:09.867Z cpu9:33662)LSOMCommon: LSOM_UnregisterDiskAttrHandle:136: DiskAttrHandle:0x430cdf2708b0 is removed from moduleID 86 for disk:naa.600605b00991a3f0202de2c45f900beb

    2017-04-24T10:17:09.868Z cpu9:33662)Destroyed VSAN Slab PLOGIORetry_slab_0000000000 (maxCount=26 failCount=0)

    2017-04-24T10:17:09.868Z cpu9:33662)Destroyed VSAN Slab PLOGIORetry_slab_0000000001 (maxCount=9 failCount=0)

    2017-04-24T10:17:09.868Z cpu9:33662)ScsiEvents: 353: EventSubsystem: Device Events, Event Mask: 20, Parameter: 0x430cdf2720d0, UnRegistered!

    2017-04-24T10:17:09.906Z cpu28:33528)WARNING: DVFilter: 1181: Couldn't enable keepalive: Not supported

    2017-04-24T10:17:09.982Z cpu46:7034760)VSAN Device Monitor: Successfully unmounted failed VSAN disk naa.600605b00991a3f0202de2c45f900beb

    Regards,

    Ali



  • 2.  RE: virtual san device is under permanent failure

    Posted Apr 25, 2017 10:22 AM

    Greetings!

    This is a drive failure case and you need to replace the faulted drive.
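
    The two LSOM warnings in the posted vmkernel.log ("has gone offline" followed by "is under permanent error") are what point to a failed device. As a rough illustration, here is a minimal Python sketch that pulls those warnings out of log lines. The regular expression is written only against the excerpt in this thread and the function names are hypothetical, so other ESXi builds may word these messages differently.

```python
import re

# Assumption: message format taken from the vmkernel.log excerpt in this thread.
LSOM_EVENT = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+\)WARNING: LSOM: LSOMEventNotify:\d+: "
    r"Virtual SAN device (?P<uuid>[0-9a-f-]+) "
    r"(?P<event>has gone offline|is under permanent error)\."
)


def lsom_events(lines):
    """Return (timestamp, uuid, event) tuples for LSOM device warnings."""
    events = []
    for line in lines:
        m = LSOM_EVENT.match(line.strip())
        if m:
            events.append((m.group("ts"), m.group("uuid"), m.group("event")))
    return events


def permanently_failed(lines):
    """Return the set of vSAN device UUIDs reported under permanent error."""
    return {
        uuid
        for _, uuid, event in lsom_events(lines)
        if event == "is under permanent error"
    }


# Lines copied from the log in the first post (the DVFilter line is unrelated noise).
SAMPLE = [
    "2017-04-24T10:17:07.863Z cpu32:2341680)WARNING: LSOM: LSOMEventNotify:6450: "
    "Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce has gone offline.",
    "2017-04-24T10:17:09.906Z cpu28:33528)WARNING: DVFilter: 1181: "
    "Couldn't enable keepalive: Not supported",
    "2017-04-24T10:17:12.530Z cpu26:43820)WARNING: LSOM: LSOMEventNotify:6440: "
    "Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce is under permanent error.",
]

if __name__ == "__main__":
    for uuid in sorted(permanently_failed(SAMPLE)):
        print("permanent error:", uuid)
```

    To run it against a real host log, pass the lines of the log file to permanently_failed() instead of SAMPLE.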

    Cheers!

    Shivam



  • 3.  RE: virtual san device is under permanent failure

    Posted Apr 25, 2017 11:23 AM

    Check whether the device in question is reported as failed or in predictive failure in the hardware logs. Replace the disk if you see errors at the hardware level.

    Ensure the device firmware is supported for vSAN per the VMware HCL, and update it if required.



  • 4.  RE: virtual san device is under permanent failure

    Posted Apr 26, 2017 06:05 AM

    As mentioned above, you need to replace the drive, but make sure you follow the steps from this post:

    VMware Virtual SAN Operations: Replacing Disk Devices - Virtual Blocks - VMware Blogs

    Log in to the vSphere Web Client

    Navigate to the Hosts and Clusters view and select the Virtual SAN enabled cluster

    Go to the Manage tab and select Disk Management under the Virtual SAN section

    Select the disk group with the failed magnetic device

    Select the failed magnetic device and click the delete button

    Take the failed drive out of the host and replace it. Make sure ESXi detects the new drive, then re-add the newly replaced drive to the disk group.

    From your screenshot, you are using a pass-through configuration, so you don't need the extra step for a RAID 0 device. The steps above will be enough.
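
    Before pulling a drive, it can help to confirm which physical device the failing UUID belongs to. The following is a minimal Python sketch, assuming the two message formats seen in the vmkernel.log excerpt from the first post; the function names are hypothetical and the patterns may not match other ESXi builds.

```python
import re

# Assumption: both patterns are derived from the log lines posted in this thread.
ANNOUNCED = re.compile(
    r"Successfully announced VSAN MD \((?P<naa>naa\.[0-9a-f]+)(?::\d+)?\) "
    r"with UUID (?P<uuid>[0-9a-f-]+)"
)
UNMOUNTED = re.compile(
    r"Successfully unmounted failed VSAN disk (?P<naa>naa\.[0-9a-f]+)"
)


def uuid_to_device(lines):
    """Map each vSAN disk UUID to the NAA device ID it was announced on."""
    mapping = {}
    for line in lines:
        m = ANNOUNCED.search(line)
        if m:
            mapping[m.group("uuid")] = m.group("naa")
    return mapping


def unmounted_failed_disks(lines):
    """List NAA device IDs that the VSAN Device Monitor unmounted as failed."""
    disks = []
    for line in lines:
        m = UNMOUNTED.search(line)
        if m:
            disks.append(m.group("naa"))
    return disks


# Lines copied from the log in the first post.
SAMPLE = [
    "2017-04-24T10:17:12.425Z cpu0:7034782)PLOG: PLOGInitAndAnnounceMD:6987: "
    "Successfully announced VSAN MD (naa.600605b00991a3f0202de2c45f900beb:2) "
    "with UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce",
    "2017-04-24T10:17:09.982Z cpu46:7034760)VSAN Device Monitor: "
    "Successfully unmounted failed VSAN disk naa.600605b00991a3f0202de2c45f900beb",
]

if __name__ == "__main__":
    print(uuid_to_device(SAMPLE))
    print(unmounted_failed_disks(SAMPLE))
```

    The NAA ID it reports is the one to look for when selecting the failed device in the disk group.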