ESXi


Lost access to volume … due to connectivity issues

  • 1.  Lost access to volume … due to connectivity issues

    Posted Nov 26, 2016 07:56 PM

    Hello,

    I have a brand-new SYS-E200-8D Micro Server equipped with 64 GB RAM and a 1 TB Samsung EVO 850 PRO SATA 6 SSD, and I installed ESXi 6.5 on it.

    I have currently deployed one Windows Server 2016 RTM virtual machine on the SSD, which acts as the datastore.

    But as soon as I run storage performance tests within the VM (I use CrystalDiskMark), I get the following warnings in the Monitor tab of the datastore:

    Successfully restored access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) following connectivity issues. (Warning, Saturday, November 26, 2016, 20:47:22 +0100)
    Lost access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. (Warning, Saturday, November 26, 2016, 20:47:22 +0100)
    Successfully restored access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) following connectivity issues. (Warning, Saturday, November 26, 2016, 20:46:59 +0100)
    Lost access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. (Warning, Saturday, November 26, 2016, 20:46:58 +0100)
    Successfully restored access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) following connectivity issues. (Warning, Saturday, November 26, 2016, 20:45:02 +0100)
    Lost access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. (Warning, Saturday, November 26, 2016, 20:45:01 +0100)

    CrystalDiskMark also reports very poor throughput for the C: drive, whose VMDK file physically resides on the datastore (which makes sense given the warnings above).

    Any ideas what could be happening here and why I am getting these warnings?

    I also have a Mac mini setup (16 GB RAM, 256 GB Samsung SATA 6 SSD, ESXi 6.5) where these warnings don't occur when I run storage performance tests.

    Thanks for your help & input,

    -Klaus



  • 2.  RE: Lost access to volume … due to connectivity issues

    Posted Nov 27, 2016 07:13 AM

    Actually, "Lost access to volume" has many causes, but as a first step you should check your hardware.

    In a SAN environment it's recommended to check the FC ports on the SAN switch, the HBAs on the server, and the fiber cables.

    Since this is local storage, you can try a different SATA port on the mainboard as a test.

    Also read this KB for more information.
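    As a starting point for that hardware check on a host with local storage, a minimal sketch using standard esxcli commands from an ESXi shell might look like this (the grep patterns are only examples):

    ```shell
    # List the storage adapters (HBAs) known to the host, with their drivers
    esxcli storage core adapter list

    # List attached devices and check each one's display name and status
    esxcli storage core device list | grep -iE "Display Name|Status"
    ```
    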



  • 3.  RE: Lost access to volume … due to connectivity issues

    Posted Nov 27, 2016 11:22 AM

    Hi Codeman1980,

    I had the same problem on ESXi 6.5.

    I have 2 hard drives on the same server, 1 SSD and 1 HDD. With the SSD I had this error when I was writing on it but not with the HDD.

    After disabling the vmw_ahci driver, no more errors.

    "esxcli system module set --enabled=false --module=vmw_ahci"

    More info:

    http://www.nxhut.com/2016/11/fix-slow-disk-performance-vmwahci.html
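    For reference, the full sequence (the set command from this post plus the standard esxcli module subcommands; a reboot is required before the change takes effect):

    ```shell
    # Check the current state of the native AHCI driver module
    esxcli system module list | grep ahci

    # Disable vmw_ahci so ESXi falls back to the legacy sata-ahci driver
    esxcli system module set --enabled=false --module=vmw_ahci

    # A host reboot is required before the change takes effect
    ```
    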

    Regards,

    Dan



  • 4.  RE: Lost access to volume … due to connectivity issues

    Posted Nov 27, 2016 02:45 PM

    Hello Dan,

    Thanks for your answer.

    I've tried to disable the module (and restarted the ESXi host), but the problem is still there.

    Interestingly enough, the storage throughput is now much lower than when the driver was enabled, so in my scenario the opposite of what the mentioned blog post describes happened.

    I have also tried installing Windows Server 2012 bare metal on the SYS-E200-8D to see whether the problem is related to ESXi.

    Unfortunately, Windows Server 2012 also reports some storage problems in the Event Viewer, like the following:

    *) Reset to device, \Device\RaidPort0, was issued

    *) The IO operation at logical block address for disk was retried

    So it seems that the problem is not directly related to ESXi, but more related to the SYS-E200-8D Server itself.

    I have no idea what could be wrong in my scenario, because I am running everything with the default BIOS settings, and I have already tried a different SSD, but I still have the same problems.

    I have also tried attaching the SATA SSD to different SATA ports (SATA 0, SATA 1), but that didn't help either :-(

    Can it really be the SYS-E200-8D server itself that is causing this problem?

    Thanks!

    -Klaus



  • 5.  RE: Lost access to volume … due to connectivity issues

    Posted Nov 27, 2016 03:34 PM

    Hi Klaus,

    Looks like it is a compatibility (driver) problem.

    Things you can test:

    - Use an HDD (a different driver may be used depending on whether it is an HDD or an SSD).


    - Try with a previous ESXi version.

    Good luck.

    Regards,

    Dan



  • 6.  RE: Lost access to volume … due to connectivity issues

    Posted May 10, 2017 04:01 PM

    Hi D2B2,

    Thank you for your solution. It works fine for me!



  • 7.  RE: Lost access to volume … due to connectivity issues

    Posted Feb 03, 2017 07:56 PM

    This still isn't fixed in the new ESXi 6.5.0a patch released this week :(



  • 8.  RE: Lost access to volume … due to connectivity issues

    Posted Aug 28, 2017 06:34 PM

    Hi Bleeder,

    Please note it's not an ESXi host bug that needs to be fixed. If you are receiving this error, it means the ESXi host is not receiving datastore heartbeats in a timely manner, which is then reported as the warning you are seeing.



  • 9.  RE: Lost access to volume … due to connectivity issues

    Posted Mar 03, 2017 09:04 AM

    I am also facing a similar issue. Could anyone help?

    Lost access to volume 4c8ba981-473af21b-b02e-001a64b45292 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly



  • 10.  RE: Lost access to volume … due to connectivity issues

    Posted Aug 29, 2017 01:34 PM

    Did you check your fnic/HBA firmware compatibility with 6.5? It seems like a storage issue.

    Refer to these KBs for more information:

    Host Connectivity Degraded in ESX/ESXi (1009557) | VMware KB

    Understanding lost access to volume messages in ESXi 5.5/6.x (2136081) | VMware KB



  • 11.  RE: Lost access to volume … due to connectivity issues

    Posted Aug 29, 2017 02:43 PM

    You might want to start by checking vmkernel.log, vobd.log, and maybe vmksummary.log as well, to get more insight into what's happening behind the curtain and find the cause of the connectivity issue.
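    A minimal sketch of that log review from an ESXi shell (the log paths are the ESXi defaults; the grep patterns are only examples of what to look for):

    ```shell
    # Storage connectivity and latency events usually surface in vmkernel.log
    grep -iE "lost access|ahci|latency" /var/log/vmkernel.log | tail -n 20

    # vobd.log records the observed events behind the vSphere warnings
    grep -i "vmfs" /var/log/vobd.log | tail -n 20

    # vmksummary.log gives a coarse hourly view of overall host health
    tail -n 20 /var/log/vmksummary.log
    ```
    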

    If you found this or any other answer helpful, please consider marking it as Correct or Helpful to award points.

    Best Regards,

    Deepak Koshal

    CNE|CLA|CWMA|VCP4|VCP5|CCAH



  • 12.  RE: Lost access to volume … due to connectivity issues

    Posted Sep 06, 2020 07:15 PM

    This seems to be one of those timeless problems that keeps coming back across versions.

    In my homelab errors popped up out of nowhere: Lost access to volume 5f48d732-cdc8d5db-14d9-0cc47ac9b978 (local-ssd) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

    I did some checks with smartctl and it looked like one of my SSDs was going bad (some pending sectors), so I replaced the SSD with a new EVO 860, but to my surprise the errors kept coming. I also tried disabling the VMFS reclaim option, which didn't help.
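    Since smartctl is not part of ESXi itself, the equivalent on-host checks would be something like the sketch below (the device identifier is a placeholder; "local-ssd" is the datastore label from the error above):

    ```shell
    # Find the local SSD's device identifier
    esxcli storage core device list | grep -i "Display Name"

    # Read SMART attributes for that device (the t10.* identifier is a placeholder)
    esxcli storage core device smart get -d t10.ATA_____Samsung_SSD_860_EVO

    # Check, and if desired disable, automatic VMFS space reclamation (UNMAP)
    esxcli storage vmfs reclaim config get -l local-ssd
    esxcli storage vmfs reclaim config set -l local-ssd -p none
    ```
    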

    I was using ESXi 6.7 build-16316930; esxcli software vib list | grep ahci showed:

    sata-ahci        3.0-26vmw.670.0.0.8169922             VMW            VMwareCertified     2019-06-08

    vmw-ahci       1.2.8-1vmw.670.3.73.14320388          VMW            VMwareCertified     2019-12-22

    Grepping the vmkernel log and then reading through the 2020-08 6.7 patch release notes gave me hope, because I was seeing these errors:

    vmkernel.log

    [root@labhost:/vmfs/volumes/5a4894b1-dd26f0cc-1182-0cc47ac9b978/patches] cat /var/log/vmkernel.log | grep vmw_ahci

    2020-09-02T06:20:13.255Z cpu6:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430458b8b940

    2020-09-02T06:20:13.255Z cpu6:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

    2020-09-02T07:08:27.784Z cpu17:2107862)vmw_ahci[00000011]: HBAIntrHandler:new interrupts coming, PxIS = 0x8, no repeat

    2020-09-02T12:50:26.982Z cpu17:2158172)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: ABORT sn=0x7475a5 initiator=0x430458b8b940

    2020-09-02T12:50:26.982Z cpu17:2158172)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 27 BusyL: 27

    2020-09-02T12:50:27.980Z cpu6:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858ee9340

    2020-09-02T12:50:27.980Z cpu6:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 27 BusyL: 27

    2020-09-02T12:50:31.981Z cpu8:2158172)vmw_ahci[0000001f]: LogExceptionSignal:Port 0, Signal:  --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|--|--|--|--|--|--|--|PR|-- (0x0400)

    2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ExecInternalCommandPolled:FAIL!!: Internal command b0, 00

    2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: LogExceptionProcess:Port 0, Process: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008)

    2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:AHCI_SIGNAL_ABORT_REQUEST signal.

    2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:Aborting command tag 4 from the Busy list

    2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430458b8b940 SN:0x7475a5 tag:4

    2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:Abort scan took 6 (us) to complete, 0 commands aborted.

    2020-09-02T12:50:33.116Z cpu9:2097972)vmw_ahci[0000001f]: LogExceptionSignal:Port 0, Signal:  --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)

    2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: LogExceptionProcess:Port 0, Process: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008)

    2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:AHCI_SIGNAL_ABORT_REQUEST signal.

    2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430858ee9340 SN:0xc8718ac8 tag:5

    2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430858ee9340 SN:0xc871bed0 tag:6

    2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430858ee9340 SN:0xc8717d90 tag:25

    2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:Abort scan took 10 (us) to complete, 3 commands aborted.

    2020-09-02T12:50:33.117Z cpu0:2097689)vmw_ahci[0000001f]: _IssueComReset:Issuing comreset...

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 20 already active during issue, reissue_flag:1

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 21 already active during issue, reissue_flag:1

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 22 already active during issue, reissue_flag:1

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 23 already active during issue, reissue_flag:1

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 24 already active during issue, reissue_flag:1

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 26 already active during issue, reissue_flag:1

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 27 already active during issue, reissue_flag:1

    2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: ProcessActiveCommands:Commands completed: 0, re-issued: 7

    2020-09-02T12:50:33.126Z cpu11:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858ebc800

    2020-09-02T12:50:33.126Z cpu11:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

    2020-09-02T12:50:33.128Z cpu11:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858eba200

    2020-09-02T12:50:33.128Z cpu11:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

    2020-09-02T12:50:33.130Z cpu11:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430458b8b940

    2020-09-02T12:50:33.130Z cpu11:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

    2020-09-02T12:50:33.132Z cpu9:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858ebc4c0

    2020-09-02T12:50:33.132Z cpu9:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

    2020-09-02T12:50:33.134Z cpu9:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858edbdc0

    2020-09-02T12:50:33.134Z cpu9:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

    2020-09-02T12:50:33.136Z cpu9:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430458b8b940

    2020-09-02T12:50:33.136Z cpu9:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

    2020-09-02T14:07:30.319Z cpu15:2108092)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

    2020-09-02T14:07:51.959Z cpu13:2097204)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

    2020-09-02T14:08:16.168Z cpu0:2107827)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

    2020-09-02T14:16:53.668Z cpu16:2097207)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

    2020-09-02T14:17:00.628Z cpu2:2097193)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

    2020-09-02T14:17:07.235Z cpu15:2097206)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

    2020-09-02T15:50:44.368Z cpu16:2097689)vmw_ahci[0000001f]: ExecInternalCommandPolled:FAIL!!: Internal command ec, 00

    2020-09-02T23:21:35.836Z cpu0:2172046)vmw_ahci[00000011]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x2, no repeat

    2020-09-03T12:51:09.420Z cpu8:2097689)vmw_ahci[0000001f]: ExecInternalCommandPolled:FAIL!!: Internal command ec, 00

    2020-09-03T14:32:24.865Z cpu7:2158172)vmw_ahci[00000011]: scsiTaskMgmtCommand:VMK Task: ABORT sn=0x652b initiator=0x430458b81bc0

    2020-09-03T14:32:24.865Z cpu7:2158172)vmw_ahci[00000011]: ahciAbortIO:(curr) HWQD: 4 BusyL: 0

    Since then I have updated the host, which updated the vmw_ahci driver. On build-16713306, esxcli software vib list | grep ahci now shows:

    sata-ahci     3.0-26vmw.670.0.0.8169922                VMW            VMwareCertified     2019-06-08

    vmw-ahci     2.0.5-1vmw.670.3.116.16713306         VMW            VMwareCertified     2020-09-03
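    For completeness, a patch update like that is typically applied roughly as follows (the depot file name and image profile below are placeholders; take the real ones from the 6.7 patch release notes):

    ```shell
    # Enter maintenance mode before patching
    esxcli system maintenanceMode set --enable true

    # Apply the patch bundle (path and profile name are placeholders)
    esxcli software profile update -d /vmfs/volumes/datastore1/ESXi670-202008001.zip \
        -p ESXi-6.7.0-20200804001-standard

    # Reboot the host, then exit maintenance mode
    ```
    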

    The errors have not come back since then (*knocks on wood*).

    There is not a lot of information about this error on the net, and it could potentially point to a VMware driver error, so I am sharing my experience here.