ESXi Lost access to volume

mnaitvpro


  • 1.  ESXi Lost access to volume

    Posted Aug 19, 2013 07:22 AM


    Hello Gurus,

    Of late I noticed an event log entry in the vSphere Client, twice within less than a day, related to local storage, as follows:

    Lost access to volume 4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

    info, 8/18/2013 2:01:22 AM, vms-1

    and immediately

    Successfully restored access to volume 4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) following connectivity issues.

    info, 8/18/2013 2:01:22 AM, nesxi1.x.x

    The "Ask VMware" link recommended in the event details leads to VMware KB: Host Connectivity Degraded and to VMware KB: Host Connectivity Restored.

    The KB refers to SAN LUNs, but in our case it is local storage. Kindly shed some light on why local storage would lose its connectivity.

    Note: all the local disks are in RAID 10.

    thanks
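For anyone who wants to track how often a datastore flaps, the two event messages quoted above can be picked apart with a small script. This is only a minimal sketch: the regular expression is modeled on the messages in this post, the function name `parse_volume_event` is made up for illustration, and other ESXi builds may word the events differently.

```python
import re

# Pattern modeled on the two event messages quoted in the post above.
# Assumption: other ESXi versions may phrase these events differently.
EVENT_RE = re.compile(
    r"(?P<kind>Lost access to volume|Successfully restored access to volume)\s+"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{12})\s+"
    r"\((?P<name>[^)]+)\)"
)


def parse_volume_event(message):
    """Return state, volume UUID and datastore name, or None if no match."""
    flat = " ".join(message.split())  # undo any line wrapping
    match = EVENT_RE.search(flat)
    if match is None:
        return None
    state = "lost" if match.group("kind").startswith("Lost") else "restored"
    return {
        "state": state,
        "uuid": match.group("uuid"),
        "name": match.group("name"),
    }


lost = parse_volume_event(
    "Lost access to volume 4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) "
    "due to connectivity issues. Recovery attempt is in progress and "
    "outcome will be reported shortly."
)
restored = parse_volume_event(
    "Successfully restored access to volume "
    "4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) following connectivity issues."
)
print(lost)
print(restored)
```

Counting "lost" events per UUID over a day or a week gives a rough picture of whether the problem is getting worse.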



  • 2.  RE: ESXi Lost access to volume

    Posted Aug 19, 2013 08:23 AM

    Hi,

    Which ESXi version are you running?

    I had a similar issue with ESXi 5.1 (no update), and after patching it to the latest release, ESXi 5.1 Update 1, the issue was resolved.

    Hope this helps,

    Steven.



  • 3.  RE: ESXi Lost access to volume

    Posted Aug 19, 2013 12:28 PM

    ESXi Ver 5.0, Build 469512



  • 4.  RE: ESXi Lost access to volume

    Posted Sep 09, 2013 06:52 AM

    Hello,

    I have been experiencing the same issue for a week. It coincides with the upgrade of our ESX hosts to 5.1.0 1157734, but I'm not sure the two are related.

    Side effects are:

    - Very high disk latency peaks (up to 10s!)

    - Instability

    - Loss of storage paths on some ESX hosts

    - Inconsistencies in some virtual hard disks

    Restarting the ESX host solves the problem, but it comes back as soon as we have more disk access (e.g. during backup).

    How did you solve the problem?

    Thanks a lot for your feedback and best regards



  • 5.  RE: ESXi Lost access to volume

    Posted Mar 21, 2014 01:46 PM

    I'm running into the same issue with a SAN datastore (VNX5500 array). I'm running ESXi 5.0 (1311175).

    Did you guys ever resolve the issue?

    Thanks



  • 6.  RE: ESXi Lost access to volume

    Posted Apr 03, 2014 02:11 PM

    Exact same issue here. It’s killing me. 5.1.0 (1612806). All SAN (EMC CX4), QLogic Fibre Channel HBAs and new Dell R720s.

    It’s getting ugly.

    Has anyone resolved this issue?



  • 7.  RE: ESXi Lost access to volume

    Posted Apr 30, 2014 07:10 PM

    hi!

    Same problem here; the scenario is also similar.

    Did you fix this? Do you have any ideas?



  • 8.  RE: ESXi Lost access to volume

    Posted Apr 30, 2014 07:26 PM

    My issue was due to a bug between HP blade chassis Virtual Connect and the Nexus 5000, but after a month of troubleshooting, I suggest that anyone who suffers from this problem look at everything.

    1. Check the HBA firmware/driver; some versions of the Emulex LOM have bugs that exhibit this behavior.

    2. If you use Brocade FC switches with HP blades, check the FillWord value in your switch config.

    3. If you use HP Virtual Connect FlexFabric with a Nexus 5000 as your FC access switch, there is a bug with 8Gb FC; upgrade your Virtual Connect firmware or your Nexus OS.

    4. Upgrade your VNX FLARE code to the December 2013 level; there is a dramatic improvement in ATS locking offload in that version of FLARE.

    5. Check whether your array front-end ports are getting QFULL messages. If so, think about throttling the queue depth on the HBA; there is an ESXi setting for this.

    6. Check for bad fibre cables and SFPs on and between the HBAs, FC switches, and array.

    Good luck.



  • 9.  RE: ESXi Lost access to volume

    Posted Apr 06, 2014 06:09 PM

    Hi -

    Same issue here with 5.0 and VNX 7500.

    Has anyone resolved this issue?





  • 10.  RE: ESXi Lost access to volume

    Posted Apr 14, 2014 06:01 PM

    So... any news on this? We have had the same issue for a while and are going to 5.1 U2 this week. Did anyone else have luck resolving this?



  • 11.  RE: ESXi Lost access to volume

    Posted Apr 14, 2014 06:19 PM

    I don't want to rule out anything. However, I had to troubleshoot an issue like this a few months ago, and it turned out that a bad fiber cable was causing the issue. You may want to check the FC switch ports to see whether the port(s) show e.g. CRC errors.

    André



  • 12.  RE: ESXi Lost access to volume

    Posted Oct 05, 2014 05:07 AM

    Has anyone had any luck fixing this problem?  I have a WD iSCSI drive with the same problem.  I have to constantly reboot the ESX host and it is causing all of my servers to go down.



  • 13.  RE: ESXi Lost access to volume

    Posted Oct 05, 2014 02:50 PM

    I would check the MTU size for network-attached storage.

    For the others, I would suggest you check the battery on the RAID controllers, and after that check all the cables.

    I have also sometimes seen this issue when servers were installed with the standard image instead of the hardware vendor's customized image.
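On the MTU suggestion above: a common way to verify the path MTU to iSCSI or NFS storage is a ping with the don't-fragment bit set and the largest payload that still fits in one frame. The sketch below only does the arithmetic and builds the `vmkping` command line (`-d` sets don't-fragment, `-s` sets the payload size). The target address 192.0.2.10 is a placeholder from the documentation range, and the helper names are made up for illustration.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(mtu, target_ip):
    """Build a vmkping command line for a don't-fragment test at this MTU."""
    return f"vmkping -d -s {max_ping_payload(mtu)} {target_ip}"


# 192.0.2.10 is a placeholder address, not taken from this thread.
for mtu in (1500, 9000):
    print(mtu, "->", vmkping_command(mtu, "192.0.2.10"))
```

If the jumbo-frame ping fails while the 1500-byte one succeeds, some device along the path (vSwitch, physical switch, or the storage itself) is likely not configured for the larger MTU.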



  • 14.  RE: ESXi Lost access to volume

    Posted Aug 23, 2015 07:44 PM

    What version of ESXi are you on? If you are on ESXi 5.1, updating to ESXi 5.5 will solve this issue.



  • 15.  RE: ESXi Lost access to volume

    Posted Sep 14, 2015 05:35 PM

    I'm using ESXi 5.5 Build 1746974 and I still see the error.

    Not yet resolved. Any ideas?



  • 16.  RE: ESXi Lost access to volume

    Posted Mar 12, 2016 08:04 AM

    Hello. In my case I was losing connectivity to the datastore and it was restored after a few seconds (VMware 6, IBM x3550 M2, 4 SSDs in RAID 5). I had been searching and reading around for more than a month. After a power loss, when the UPS ran out of battery, the system couldn't boot properly (it took more than 1 hour), so I started to really examine the system and found that my little button battery was bad. Actually the battery itself was OK at 3 volts, but the system showed an error (an LED on the motherboard). After I changed it according to the IBM instructions (power off, etc.), the system volume has worked fine. It's been 1 week without any errors.



  • 17.  RE: ESXi Lost access to volume

    Posted Sep 30, 2015 01:04 PM

    I'm having this same issue on a few Cisco R210 and C240 UCS servers. All have local datastores on MegaRAID controllers and run different versions of ESXi:

    Cisco C240 - esxi 5.0 - no issues

    Cisco R210 - esxi 5.0 - disk access issue

    Cisco C240 - esxi 5.1 - disk access issue

    Cisco R210 - esxi 5.0 - disk access issue

    Cisco R210 - esxi 5.0 - disk access issue

    Cisco C220 - esxi 5.5 - no issues

    First

    Device naa.600605b005df73201951a1d33bc62893 performance has deteriorated. I/O latency increased from average value of 708 microseconds to 24612 microseconds.

    warning, 9/30/2015 4:46:09 AM, 10.2.42.23

    Lost access to volume 54383f2f-62e7730b-ec74-4c4e3544bf5e (snap-0a1ec5ee-datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

    info, 9/30/2015 7:39:33 AM, snap-0a1ec5ee-datastore1

    Successfully restored access to volume 54383f2f-62e7730b-ec74-4c4e3544bf5e (snap-0a1ec5ee-datastore1) following connectivity issues.

    info, 9/30/2015 7:39:46 AM, 10.2.42.23
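The latency warning above carries both the old and the new average, so the size of the jump can be computed directly. A minimal sketch, assuming the message wording is exactly as quoted in this post (the function name `parse_latency_warning` is made up for illustration):

```python
import re

# Pattern modeled on the warning quoted in the post above (assumption:
# the wording is the same on other hosts and ESXi versions).
LATENCY_RE = re.compile(
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<before>\d+) "
    r"microseconds to (?P<after>\d+) microseconds"
)


def parse_latency_warning(message):
    """Return device id, before/after latency in microseconds and the ratio."""
    match = LATENCY_RE.search(" ".join(message.split()))
    if match is None:
        return None
    before = int(match.group("before"))
    after = int(match.group("after"))
    return {
        "device": match.group("device"),
        "before_us": before,
        "after_us": after,
        "ratio": after / before,
    }


warning = parse_latency_warning(
    "Device naa.600605b005df73201951a1d33bc62893 performance has "
    "deteriorated. I/O latency increased from average value of 708 "
    "microseconds to 24612 microseconds."
)
print(warning["device"], f"{warning['ratio']:.1f}x slower")
```

For the event quoted here the jump is roughly 35-fold, from under 1 ms to about 25 ms.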



  • 18.  RE: ESXi Lost access to volume

    Posted Oct 27, 2015 09:05 PM

    Are you using FI for your rack servers, or the traditional method?



  • 19.  RE: ESXi Lost access to volume

    Posted Mar 15, 2016 01:24 AM

    HI.

    We encountered the same problem and are still troubleshooting. We've been having conference calls with the Dell Master Engineer, 2 guys from VMware, 2 from EMC, and one from Brocade.

    We have:

    multiple Dell m1000e chassis

         - Dell m630 with qlogic qme2662 mezz cards

         - Brocade 6505 Chassis Switches

    Multiple Dell r730xd

         - qlogic qle2662

    Brocade 5100 Core FC Switch

    The LUNs are on an EMC VNX 7600.

    VMware ESX 5.5 U2 and 6.0 U1

    We used both the Dell custom ISO and the VMware vanilla ISO.

    On the Dell custom ISO the qlnative driver is really new (v2.x).

    We tried a lot of changes. Been fighting with this issue for 3 weeks now.

    What we managed to find that seems to be working is the following.

    Since we have a lot of older servers that work well, we added 4 new paths on the VNX for the new chassis and servers.

    We downgraded the qlnative driver to the following:

    qlnativefc-1.1.20.0

    And we changed all of them at the same time. When we add one host with the newer driver, the lost datastores seem to start again...

    I will keep writing in this post when I have something new, good or bad.



  • 20.  RE: ESXi Lost access to volume

    Posted Apr 28, 2016 05:17 AM

    Hi, Any updates on this issue ?



  • 21.  RE: ESXi Lost access to volume

    Posted May 02, 2016 09:46 PM

    HI,

    So we installed the following driver, and it has seemed stable for a couple of weeks.

    VMW-ESX-5.5.0-qlnativefc-1.1.20.0-1604804

    We use it on 5.5 and 6.0u1 hosts.



  • 22.  RE: ESXi Lost access to volume

    Posted May 22, 2016 09:59 AM

    Dear Malabelle,

    Is the issue solved for you? I ask because I have the same hardware and OS version.

    with best regards,



  • 23.  RE: ESXi Lost access to volume

    Posted Jul 04, 2016 07:45 PM

    Yes, after the driver installation it is good.



  • 24.  RE: ESXi Lost access to volume

    Posted Mar 29, 2016 11:39 PM

    I just started seeing this today as well with ESXi 6.0 on an HP Proliant DL380 G6. The volume is on internal hard drives.

    It is never able to recover. The recovery process takes up lots of CPU, rendering the VMs unresponsive. The only way to recover is to power cycle the server.

    Is this indicative of a failing hard drive or controller?

    Thanks.



  • 25.  RE: ESXi Lost access to volume

    Posted May 09, 2016 03:07 PM

    Hello,

    I suffered this same problem on HP G7 servers, and after some investigation, the problem turned out to be related to the hpsa driver.

    The only hpsa driver which does NOT cause this behavior is this version: scsi-hpsa   5.5.0.60-1OEM.550.0.0.1331820

    If you have upgraded your ESX from version 5.1 to version 5.5 or version 6.0, you will run into the same problem if you have G7 servers.

    What I did was downgrade the hpsa driver on the G7 ESX servers running ESXi 5.5 and 6.0, and they are now working properly without disconnections.

    If you have local storage, this can be a big problem. For example, on virtual filers I could see that this problem caused 2 or 3 lost pings every time.

    Version without problems: scsi-hpsa   5.5.0.60-1OEM.550.0.0.1331820

    Versions with problems: 5.5.0.74, 5.5.0.84 and 6.0.0.114.

    If you upgrade the ESX later on, you will have to do the same downgrade operation again.
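The version lists in this post lend themselves to a quick check when auditing many hosts. Below is a minimal sketch that classifies an installed hpsa version string against the versions reported above. The good/bad sets come only from this one post, not from any vendor advisory, and the function name `classify_hpsa` is made up for illustration.

```python
# Versions as reported in the post above; this is one user's experience
# on G7 servers, not an official compatibility list.
REPORTED_GOOD = {"5.5.0.60"}
REPORTED_BAD = {"5.5.0.74", "5.5.0.84", "6.0.0.114"}


def classify_hpsa(vib_version):
    """Classify a VIB version like '5.5.0.60-1OEM.550.0.0.1331820'."""
    driver_version = vib_version.strip().split("-", 1)[0]
    if driver_version in REPORTED_GOOD:
        return "reported good"
    if driver_version in REPORTED_BAD:
        return "reported problematic"
    return "unknown"


print(classify_hpsa("5.5.0.60-1OEM.550.0.0.1331820"))
print(classify_hpsa("5.5.0.84"))
print(classify_hpsa("6.0.0.999"))
```

The version string itself can be copied from the host's installed VIB list; anything that comes back as "unknown" simply was not mentioned in this thread.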



  • 26.  RE: ESXi Lost access to volume

    Posted Jul 21, 2016 05:40 PM

    I have the same issue with just one UCS blade.  Have a total of 6 on same Chassis.  Running 5.5 on all of them.

    • Updated FNIC and ENIC drivers on the host itself.
    • Moved blade to a different slot on the chassis to rule that out.
    • Checked the logs on the Host itself and see

    No correlator for vob.vmfs.heartbeat.timedout
    [vmfsCorrelator] 264959970134us: [esx.problem.vmfs.heartbeat.timedout] 54daccec-748bb3a6-2ac0-0025b500000f Datastore
    No correlator for vob.vmfs.heartbeat.timedout
    [vmfsCorrelator] 264959970322us: [esx.problem.vmfs.heartbeat.timedout] 54dacc8a-028436f2-3cec-0025b500000f Datastore
    No correlator for vob.vmfs.heartbeat.timedout
    [vmfsCorrelator] 264960581844us: [esx.problem.vmfs.heartbeat.timedout] 54dacccf-7fd2fbca-b8ac-0025b500000f Datastore
    No correlator for vob.vmfs.heartbeat.timedout
    [vmfsCorrelator] 264962970092us: [esx.problem.vmfs.heartbeat.timedout] 54dacc75-b097216f-18a6-0025b500000f Datastore
    No correlator for vob.vmfs.heartbeat.recovered
    [vmfsCorrelator] 264964877110us: [esx.problem.vmfs.heartbeat.recovered] 54daccec-748bb3a6-2ac0-0025b500000 Datastore

    • Working with Cisco support; they confirmed there are no errors in the NIC logs of the affected host.

    I believe it's an issue with the blade itself. I have uploaded tons of logs to Cisco support and am waiting to hear back from them.
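The vmfsCorrelator lines quoted in this post include a microsecond timestamp and the volume UUID, which is enough to see how tightly the timeouts cluster. A minimal sketch, assuming the line format is exactly as pasted above (the sample reuses the four timeout lines from the post; the function name is made up for illustration):

```python
import re

# Pattern modeled on the log lines quoted in the post above.
LINE_RE = re.compile(
    r"\[vmfsCorrelator\]\s+(?P<ts>\d+)us:\s+"
    r"\[esx\.problem\.vmfs\.heartbeat\.(?P<kind>timedout|recovered)\]\s+"
    r"(?P<uuid>[0-9a-f-]+)"
)


def parse_heartbeat_events(text):
    """Return (timestamp_us, kind, volume_uuid) for each heartbeat line."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            events.append(
                (int(match.group("ts")), match.group("kind"), match.group("uuid"))
            )
    return events


SAMPLE = """\
No correlator for vob.vmfs.heartbeat.timedout
[vmfsCorrelator] 264959970134us: [esx.problem.vmfs.heartbeat.timedout] 54daccec-748bb3a6-2ac0-0025b500000f Datastore
[vmfsCorrelator] 264959970322us: [esx.problem.vmfs.heartbeat.timedout] 54dacc8a-028436f2-3cec-0025b500000f Datastore
[vmfsCorrelator] 264960581844us: [esx.problem.vmfs.heartbeat.timedout] 54dacccf-7fd2fbca-b8ac-0025b500000f Datastore
[vmfsCorrelator] 264962970092us: [esx.problem.vmfs.heartbeat.timedout] 54dacc75-b097216f-18a6-0025b500000f Datastore
"""

events = parse_heartbeat_events(SAMPLE)
timeouts = [e for e in events if e[1] == "timedout"]
volumes = {uuid for _, _, uuid in timeouts}
span_us = max(ts for ts, _, _ in timeouts) - min(ts for ts, _, _ in timeouts)
print(f"{len(volumes)} volumes timed out within {span_us / 1e6:.2f} s")
```

In the excerpt, four different datastores time out within about three seconds of each other. That pattern may point at something shared by all of them (the host, its adapters, or the path) rather than at a single datastore, though the excerpt alone cannot prove it.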