vSphere Storage Appliance


lost connectivity to storage device affected datastores unknown

  • 1.  lost connectivity to storage device affected datastores unknown

    Posted Aug 04, 2010 02:52 PM

    Hello,

    I am receiving the error "lost connectivity to storage device affected datastores unknown" once a day, at a different time every day. Sometimes it identifies which datastore is affected; so far it has only happened to two datastores. I have been running vSphere for about 10 months without issues and haven't changed or patched anything recently. I have 2 hosts and 2 EqualLogic PS5000 SANs connected through iSCSI. Screenshots of my virtual switches are attached, if that helps.

    Thank you for your assistance.



  • 2.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 04, 2010 05:41 PM

    > I am receiving the error "lost connectivity to storage device affected datastores unknown" once a day, at a different time every day. Sometimes it identifies which datastore is affected; so far it has only happened to two datastores.

    Exact log information would be quite useful. Do you see any lasting effects from this, or is it just a message with no other visible problems?

    > I have 2 hosts and 2 EqualLogic PS5000 SANs connected through iSCSI.

    My first guess would be the normal load-balancing operations performed by the PS array. The array disconnects from the host and then expects an immediate reconnection. It does this so that connection load can be spread effectively. But that's just a guess...

    Andy



  • 3.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 09, 2010 01:42 PM

    There are no lasting effects at all; the VMs on the datastore seem to stay up and there have been no user complaints. Which logs should I post? I just exported the logs and there are many to choose from. When I run a load-balancing operation on the SAN, should I expect my VMs to go down?



  • 4.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 09, 2010 05:14 PM

    > When I run a load-balancing operation on the SAN, should I expect my VMs to go down?

    Definitely not. These are just momentary changes to make sure connections to the storage stay evenly distributed. You shouldn't even notice them, except possibly for the message you're seeing, which reports that the connection went away and is being restarted.

    Which version and flavor of vSphere are you using? For vSphere 4 ESX, the contents of /var/log/vmkiscsid.log would be interesting. In ESXi, the iscsid info is logged in /var/log/messages. Checking the PS event logs for any corresponding entry could help clear up whether this is expected or not.

    Andy



  • 5.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 10, 2010 01:16 PM

    Andy,

    The log you requested is attached. I am researching right now how to trigger a load-balancing operation on my EqualLogic SAN.

    Andre,

    What is misconfigured in my iSCSI setup? I had Dell do the initial install and haven't made any iSCSI changes. I did upgrade from 3.5 to vSphere myself, though; maybe the iSCSI configuration needs to be different? I have been running vSphere for almost a year and these errors have only appeared in the last few weeks.

    Thanks for all of your help,

    Dave



  • 6.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 10, 2010 04:11 PM

    > The log you requested is attached. I am researching right now how to trigger a load-balancing operation on my EqualLogic SAN.

    Dave,

    These log entries show the handling of the load balancing events from the PS. Do these correspond to your original "lost connectivity" messages?

    2010-07-17-13:59:02: iscsid: Target requests logout within 3 seconds for connection

    2010-07-17-13:59:06: iscsid: connection6:0 is operational after recovery (2 attempts)

    2010-07-18-14:23:07: iscsid: Target requests logout within 3 seconds for connection

    2010-07-18-14:23:11: iscsid: connection6:0 is operational after recovery (2 attempts)

    2010-07-19-20:56:14: iscsid: Target requests logout within 3 seconds for connection

    2010-07-19-20:56:18: iscsid: connection6:0 is operational after recovery (2 attempts)

    2010-07-20-03:08:16: iscsid: Target requests logout within 3 seconds for connection

    2010-07-20-03:08:20: iscsid: connection6:0 is operational after recovery (2 attempts)

    These are normal but relatively rare operations.
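A minimal Python sketch of how one might pair each target-requested logout with the recovery message that follows it, to measure how long the session was actually down. It assumes the `YYYY-MM-DD-HH:MM:SS: iscsid: ...` line layout shown in the excerpt above; the regex and the function name `recovery_times` are illustrative, not part of any VMware or EqualLogic tool.

```python
import re
from datetime import datetime

# Lines copied from the vmkiscsid.log excerpt above.
LOG = """\
2010-07-17-13:59:02: iscsid: Target requests logout within 3 seconds for connection
2010-07-17-13:59:06: iscsid: connection6:0 is operational after recovery (2 attempts)
2010-07-18-14:23:07: iscsid: Target requests logout within 3 seconds for connection
2010-07-18-14:23:11: iscsid: connection6:0 is operational after recovery (2 attempts)
2010-07-19-20:56:14: iscsid: Target requests logout within 3 seconds for connection
2010-07-19-20:56:18: iscsid: connection6:0 is operational after recovery (2 attempts)
2010-07-20-03:08:16: iscsid: Target requests logout within 3 seconds for connection
2010-07-20-03:08:20: iscsid: connection6:0 is operational after recovery (2 attempts)
"""

# Assumed layout: "<YYYY-MM-DD-HH:MM:SS>: iscsid: <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}): iscsid: (.*)$")


def recovery_times(text):
    """Return (logout_time, seconds_until_operational) for each logout/recovery pair."""
    pending = None  # time of the most recent logout request not yet matched
    pairs = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d-%H:%M:%S")
        msg = m.group(2)
        if msg.startswith("Target requests logout"):
            pending = ts
        elif "is operational after recovery" in msg and pending is not None:
            pairs.append((pending, (ts - pending).total_seconds()))
            pending = None
    return pairs


for logout, secs in recovery_times(LOG):
    print(f"{logout}: session operational again after {secs:.0f} s")
```

Each pair in this excerpt recovers within a few seconds, which lines up with the 4 to 5 second recovery Andy describes later in the thread.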

    Andy



  • 7.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 10, 2010 03:09 PM

    I haven't been able to figure out how to trigger load balancing on my EqualLogic PS5000. Do you know if it is done through the GUI or the CLI? I also noticed that the path selection policy is set to Fixed for my datastores. Should I change it to "Round Robin" or "Most Recently Used"?

    Thanks,



  • 8.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 10, 2010 03:59 PM

    > I haven't been able to figure out how to trigger load balancing on my EqualLogic PS5000. Do you know if it is done through the GUI or the CLI?

    If you log in to the PS as "root" and issue "iscsi_test alogout", it will generate an asynchronous logout on ALL sessions. This is the same event that happens during load-balancing operations. This isn't intended for casual use.

    > I also noticed that the path selection policy is set to Fixed for my datastores. Should I change it to "Round Robin" or "Most Recently Used"?

    Because EqualLogic is active/active storage, Fixed will be the default policy. With EqualLogic, Round Robin also works. Or, if you're running ESX 4.1, install their MEM (Multipathing Extension Module) and go to town.
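To illustrate how the policies discussed here differ, below is a toy Python model. It is a simplification, not VMware's NMP code: the real plugins are VMW_PSP_FIXED, VMW_PSP_MRU and VMW_PSP_RR, and Round Robin by default switches paths after 1000 I/Os. The class names and the runtime path names are hypothetical.

```python
from itertools import cycle


class Fixed:
    """Use the preferred path whenever it is alive; fail back to it when it returns."""

    def __init__(self, paths, preferred):
        self.paths = paths
        self.preferred = preferred

    def select(self, alive):
        if self.preferred in alive:
            return self.preferred
        # Preferred path is down: use the first live alternative.
        return next(p for p in self.paths if p in alive)


class MostRecentlyUsed:
    """Keep using the current path until it fails; never fail back automatically."""

    def __init__(self, paths):
        self.paths = paths
        self.current = paths[0]

    def select(self, alive):
        if self.current not in alive:
            self.current = next(p for p in self.paths if p in alive)
        return self.current


class RoundRobin:
    """Rotate through live paths, moving on after `iops` I/Os on the current one."""

    def __init__(self, paths, iops=1000):
        self.paths = paths
        self.iops = iops
        self._order = cycle(paths)
        self.current = next(self._order)
        self.count = 0

    def select(self, alive):
        if self.count >= self.iops or self.current not in alive:
            # Advance through the rotation until a live path is found.
            for _ in range(len(self.paths)):
                self.current = next(self._order)
                if self.current in alive:
                    break
            self.count = 0
        self.count += 1
        return self.current


# Hypothetical runtime names for two paths to the same LUN.
paths = ["vmhba1:C0:T0:L0", "vmhba2:C0:T0:L0"]
rr = RoundRobin(paths, iops=1)
print([rr.select(set(paths)) for _ in range(4)])
```

In this model, Fixed fails back to its preferred path once it returns, MRU stays on whatever path it moved to, and Round Robin spreads I/O across all live paths.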

    Andy



  • 9.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 10, 2010 05:28 PM

    When you say "This isn't intended for casual use," what do you mean? Should I expect to see an outage when I run that command?

    The lost connectivity message does correspond with a "Target requests logout within 3 seconds for connection" message. If these events are normal but rare, do you have any idea why they would be happening so often in my environment?

    Thanks,

    Dave



  • 10.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 10, 2010 06:13 PM

    > When you say "This isn't intended for casual use," what do you mean? Should I expect to see an outage when I run that command?

    Dave,

    All sessions to the PS momentarily drop. If the only hosts connected to the storage are ESX and ESXi, they should all recover in 4 to 5 seconds.

    VMs on ESX hosts should have no trouble, other than I/O issued at that moment taking a few seconds longer. Other hosts should be able to handle these events as well. What I mean is that it's intended for test purposes only, and you probably shouldn't use it for sport in a production environment.

    > The lost connectivity message does correspond with a "Target requests logout within 3 seconds for connection" message. If these events are normal but rare, do you have any idea why they would be happening so often in my environment?

    It happened 17 times in two months on your host, based on the log. I'm guessing the PS does this when it sees a disparity of load between several ports on the storage (you can ask EqualLogic for more definite information). In that case, the frequency would be environment-dependent. By rare, I mean it shouldn't happen every ten minutes. Different loads could well make it happen more often than you're seeing. As I mentioned, it's normal, and it's intended to make better use of the network and ports that are available. In spite of creating some log clutter, it's supposed to be a good thing.
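The frequency argument is simple arithmetic: 17 events in roughly 60 days is about one event every 3.5 days, far from "every ten minutes". A small Python sketch that computes the mean gap from a list of logout timestamps (here the four from the log excerpt earlier in the thread) and applies that ten-minute yardstick; the threshold comes from Andy's remark, not from any EqualLogic specification, and `mean_gap` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def mean_gap(timestamps):
    """Average time between consecutive events; input order does not matter."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two events to compute a gap")
    # The sum of consecutive gaps telescopes to (last - first).
    return (ts[-1] - ts[0]) / (len(ts) - 1)


# "Target requests logout" times from the vmkiscsid.log excerpt.
events = [
    datetime(2010, 7, 17, 13, 59, 2),
    datetime(2010, 7, 18, 14, 23, 7),
    datetime(2010, 7, 19, 20, 56, 14),
    datetime(2010, 7, 20, 3, 8, 16),
]

gap = mean_gap(events)
too_frequent = gap < timedelta(minutes=10)  # the "every ten minutes" yardstick
print(f"mean gap {gap}, too frequent: {too_frequent}")

# Same idea for the whole log: 17 events in about 60 days.
print(f"~{60 / 17:.1f} days between events on average")
```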

    Andy



  • 11.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 31, 2010 01:03 PM

    Andy,

    I have been busy on other projects for the past few weeks. I ran the load-balancing command and I am still getting the "lost connectivity" messages in my event logs. I don't know what else to try. Do you have any more suggestions?

    There are still no complaints from users, and I don't see a pattern in when these events happen. The person in charge of backups mentioned that he was getting some iSCSI errors while backups of my VMs were running, but the events I am seeing do not occur during backups.

    Thanks,

    Dave



  • 12.  RE: lost connectivity to storage device affected datastores unknown

    Posted Oct 12, 2010 06:05 PM

    If anybody has any information that could help me resolve this issue, it would be greatly appreciated. The errors are still appearing, but the VMs aren't having any problems.

    Thanks,

    Dave



  • 13.  RE: lost connectivity to storage device affected datastores unknown

    Posted Oct 12, 2010 08:18 PM

    Dave,

    I've been away for a little while, as well.

    The message is a harmless artifact of the EqualLogic system performing load balancing, in this case. It's supposed to do this. You can safely disregard this message if it comes up once in a while.

    Enjoy,

    Andy



  • 14.  RE: lost connectivity to storage device affected datastores unknown

    Posted Oct 12, 2010 08:26 PM

    > I ran the load-balancing command and I am still getting the "lost connectivity" messages in my event logs.

    Triggering a load-balancing event should demonstrate to you that this is the cause of the log message. Logging in 4.1 is a little tidier, and you shouldn't see this message anymore.

    Andy



  • 15.  RE: lost connectivity to storage device affected datastores unknown

    Posted Aug 10, 2010 09:05 AM

    Your iSCSI configuration does not follow the suggested guide:

    http://www.equallogic.com/resourcecenter/assetview.aspx?id=8453

    Reconfigure your hosts, apply all the latest 4.0 patches to vSphere, and install the latest 4.x firmware on the EqualLogic arrays.

    Andre



  • 16.  RE: lost connectivity to storage device affected datastores unknown

    Posted Apr 07, 2025 12:46 PM

    Hi Dave,

    Were you able to get this issue resolved? I have the exact same issue.




  • 17.  RE: lost connectivity to storage device affected datastores unknown

    Broadcom Employee
    Posted Apr 08, 2025 05:37 PM

    Hi Jennifer,

    We should be able to tell from taking a look at the logs.
    Can you upload the hostd, vmkernel and vobd logs?




  • 18.  RE: lost connectivity to storage device affected datastores unknown

    Posted Apr 09, 2025 03:20 PM

    Hi Kumar,

    When I looked in the /var/log directory, I didn't find logs for any of the 3 that you mentioned.

    We have a primary and a failover site; the issue is on the failover site:

    2 datacenters

    2 Pure Storage arrays

    2 Cisco B200M5s

    ESXi hosts are all on 8.0 Update 3d

    The storage adapters use a shared connection from the chassis: the FIs (fabric interconnects) provide 2 physical connections that are shared between the blades. These are not standalone servers. If it were a physical connection problem, it would affect all blades.

    The alerts we are getting from vCenter are for paths that do not exist, which is why the affected datastore shows as 'Unknown':

    Alarm alarm.StorageConnectivityAlarm on Host host1_name
    because Path redundancy to storage device naa.624a93709cac436e0b074d33000f4019 degraded. Path vmhba5:C0:T692:L237 is down. Affected datastores: Unknown..

    I'm not sure why the target number (T692) is so high.

    It seems like there's a cached path somewhere that is triggering the alarms.
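The path in the alarm is an ESXi runtime name of the form vmhba<adapter>:C<channel>:T<target>:L<LUN>. A small Python sketch that pulls the device ID and path components out of the alarm text, which makes it easier to compare against what esxcfg-mpath reports for the same device; the regexes and `parse_alarm` are illustrative assumptions, not a VMware API.

```python
import re

ALARM = (
    "Path redundancy to storage device naa.624a93709cac436e0b074d33000f4019 degraded. "
    "Path vmhba5:C0:T692:L237 is down. Affected datastores: Unknown.."
)

# Runtime name: vmhba<adapter>:C<channel>:T<target>:L<lun>
PATH_RE = re.compile(r"\b(vmhba\d+):C(\d+):T(\d+):L(\d+)\b")
DEVICE_RE = re.compile(r"\b(naa\.[0-9a-f]+)\b")


def parse_alarm(text):
    """Extract the device ID and path components from a path-redundancy alarm."""
    dev = DEVICE_RE.search(text)
    path = PATH_RE.search(text)
    if dev is None or path is None:
        return None
    adapter, channel, target, lun = path.groups()
    return {
        "device": dev.group(1),
        "adapter": adapter,
        "channel": int(channel),
        "target": int(target),
        "lun": int(lun),
    }


print(parse_alarm(ALARM))
```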

    When we do get the alerts, nothing is actually down: all hosts, VMs, vNICs, storage adapters, datastores, etc. are green and healthy. Why do we get the alerts only for the failover site and not for the primary site? They have the same setup and configuration.

    Things tried:

    Rescanned datastores

    Rescanned storage adapters

    Disabled and then re-enabled several alarms: vSAN online health alarm 'Disks usage on storage controller' (vCenter level), 'Cannot connect to storage' (vCenter level), and 'Datastore usage on disk' (vCenter level)

    Had Pure Support perform an SFP health check

    Any idea on this is greatly appreciated.




  • 19.  RE: lost connectivity to storage device affected datastores unknown

    Broadcom Employee
    Posted Apr 09, 2025 04:16 PM
    Edited by Bharath Kumar G Apr 09, 2025 04:17 PM

    That's strange. Could you SSH to the ESXi host that the alarm refers to, navigate to /var/log, run ls, and send me a screenshot of what you see?

    If you do not find the 3 log files, please do the following:

    This gives you all the paths ESXi was able to discover through the PSA from the storage array:
    esxcfg-mpath -bd naa.624a93709cac436e0b074d33000f4019
    Check whether any of the listed paths shows a state other than active (e.g., dead).
    Because your LUNs have multiple paths configured for redundancy, you will likely not see any effect of this message on your workloads.
    https://knowledge.broadcom.com/external/article/318935/path-redundancy-to-storage-device-degrad.html
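If the command returns many paths, the check can be scripted. A minimal Python sketch, assuming each path line in the `esxcfg-mpath -bd` output begins with the runtime name and contains a `state:<value>` token; the sample below is hypothetical and abbreviated, so verify the layout against your own host before relying on it.

```python
import re

# Hypothetical, abbreviated output; real lines carry more fields (adapter/target IDs).
SAMPLE = """\
naa.624a93709cac436e0b074d33000f4019 : Hypothetical Disk (naa.624a93709cac436e0b074d33000f4019)
   vmhba4:C0:T0:L237 LUN:237 state:active ...
   vmhba5:C0:T1:L237 LUN:237 state:dead ...
"""

# Assumed layout: leading spaces, runtime name, other fields, then "state:<value>".
PATH_STATE_RE = re.compile(
    r"^[ \t]*(vmhba\d+:C\d+:T\d+:L\d+)\s.*?\bstate:(\w+)", re.MULTILINE
)


def non_active_paths(output):
    """Return (runtime_name, state) for every path whose state is not 'active'."""
    return [
        (path, state)
        for path, state in PATH_STATE_RE.findall(output)
        if state != "active"
    ]


print(non_active_paths(SAMPLE) or "all paths active")
```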

    This tells you whether there is a VMFS partition on the LUN:
    partedUtil getptbl /vmfs/devices/disks/naa.624a93709cac436e0b074d33000f4019
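A sketch of reading that output in Python. It assumes the usual `partedUtil getptbl` layout (label type on the first line, disk geometry on the second, then one line per partition with number, start sector, end sector, type GUID, type name such as `vmfs`, and an attribute field); the sample and the helper `vmfs_partitions` are hypothetical.

```python
# Hypothetical getptbl output for a LUN carrying one VMFS partition.
SAMPLE = """\
gpt
1305 255 63 20971520
1 2048 20971486 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
"""


def vmfs_partitions(getptbl_output):
    """Return the partition numbers whose type name is 'vmfs'."""
    lines = getptbl_output.strip().splitlines()
    found = []
    # Skip the label line and the geometry line; the rest describe partitions.
    for line in lines[2:]:
        fields = line.split()
        if len(fields) >= 6 and fields[4].lower() == "vmfs":
            found.append(int(fields[0]))
    return found


print(vmfs_partitions(SAMPLE))
```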

    If you see a partition labelled vmfs, you can use the following command to find out which datastore the alert is about:
    esxcli storage vmfs extent list | grep naa.624a93709cac436e0b074d33000f4019
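Without grep, the same lookup can be done in Python over the full `esxcli storage vmfs extent list` output. This sketch assumes the columns Volume Name, VMFS UUID, Extent Number, Device Name and Partition, and splits from the right because datastore names may contain spaces; the sample rows and `datastore_for_device` are hypothetical.

```python
# Hypothetical output; column widths on a real host will differ.
SAMPLE = """\
Volume Name  VMFS UUID                            Extent Number  Device Name                           Partition
-----------  -----------------------------------  -------------  ------------------------------------  ---------
Prod DS 01   5f1e2d3c-0a1b2c3d-4e5f-0050569a1b2c              0  naa.624a93709cac436e0b074d33000f4019          1
"""


def datastore_for_device(extent_list_output, device):
    """Return the datastore (volume) name backed by `device`, or None if none uses it."""
    for line in extent_list_output.splitlines():
        # Split off the last four columns; whatever remains is the volume name.
        fields = line.rsplit(None, 4)
        if len(fields) == 5 and fields[3] == device:
            return fields[0].strip()
    return None


print(datastore_for_device(SAMPLE, "naa.624a93709cac436e0b074d33000f4019"))
```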

    This message is usually caused by an issue at Layer 1 (the physical layer).
    Capture the timestamp and hostname as reported by vCenter, and engage your fabric and storage vendors to see whether they observe any anomalies at the time of the message.
    Once you have their report, you can file a ticket with the Broadcom VMware Storage Support team if you'd like, or we can continue to investigate in this thread.
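When comparing the vCenter timestamp with the vendors' logs, a tolerance window helps because device clocks rarely agree to the second. A minimal Python sketch; the two-minute window, the function name `correlate` and the sample events are all hypothetical.

```python
from datetime import datetime, timedelta


def correlate(alarm_times, vendor_events, window=timedelta(minutes=2)):
    """Map each alarm time to the vendor events within +/- window of it.

    vendor_events is a list of (timestamp, description) tuples. All timestamps
    must be in the same time zone (for example, everything converted to UTC).
    """
    return {
        alarm: [(t, desc) for t, desc in vendor_events if abs(t - alarm) <= window]
        for alarm in alarm_times
    }


# Hypothetical data for illustration only.
alarms = [datetime(2025, 4, 8, 22, 14, 5)]
vendor = [
    (datetime(2025, 4, 8, 22, 13, 40), "hypothetical: port link flap"),
    (datetime(2025, 4, 8, 23, 30, 0), "hypothetical: unrelated event"),
]

for alarm, hits in correlate(alarms, vendor).items():
    print(alarm, "->", hits or "no vendor events in window")
```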




  • 20.  RE: lost connectivity to storage device affected datastores unknown

    Posted Apr 09, 2025 05:01 PM

    Thank you for the quick response. I ran esxcfg-mpath -bd naa.624a93709cac436e0b074d33000f4019 on host-04 and all paths show as active. I also ran it against 11 other storage devices (1 local), and they all show as active.

    This is happening on all 13 of our hosts, on different vmhbas, with the same storage device and different LUNs.

    For example, last night we received 2 alarms for host 4, on vmhba4 and vmhba5, same storage device, same LUN. But on Sunday, we received alarms for multiple hosts, with different vmhbas, storage devices, and LUNs.

    I actually have a host log from one of the last alarms. Please see the attachment.

    We have been working with Pure on the issue. They did not see any issues on their end.





  • 21.  RE: lost connectivity to storage device affected datastores unknown

    Broadcom Employee
    Posted Apr 23, 2025 03:13 PM

    Hi Jennifer,

    Sorry, I wasn't around for a few days.
    I don't see the attachment for some reason.
    Path-related issues are usually straightforward. ESXi storage drivers record the response from the storage subsystem and mark the affected path by failing the command with host status H:0x1 (NO_CONNECT). This can be viewed in the vmkernel log.
    The hostd log will record "Path redundancy to storage device naa.################################ degraded. Path vmhba3:C0:T1:L7 is down. Affected datastores: Datastore1."
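Following the description above, the vmkernel log can be scanned for commands failed with H:0x1 and the failures tallied per device and path. This Python sketch is illustrative: the sample lines are hypothetical, and it assumes the failing command is logged as `to dev "<naa>" on path "<runtime name>"` with the `H:/D:/P:` (host/device/plugin) status on the same line or the following one; confirm against your own vmkernel.log.

```python
import re
from collections import Counter

# Hypothetical vmkernel.log lines (one two-line entry, two single-line entries).
LOG = """\
2025-04-08T22:14:05.101Z cpu10:2097152)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x2a (0x45b8c1234567, 0) to dev "naa.624a93709cac436e0b074d33000f4019" on path "vmhba5:C0:T1:L237" Failed:
2025-04-08T22:14:05.101Z cpu10:2097152)NMP: nmp_ThrottleLogForDevice:3875: H:0x1 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:FAILOVER
2025-04-08T22:14:06.200Z cpu3:2097153)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x28 (0x45b8c1234568, 0) to dev "naa.624a93709cac436e0b074d33000f4019" on path "vmhba5:C0:T1:L237" Failed: H:0x1 D:0x0 P:0x0 Act:FAILOVER
2025-04-08T22:15:00.000Z cpu3:2097153)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x28 (0x45b8c1234569, 0) to dev "naa.624a93709cac436e0b074d33000f4019" on path "vmhba4:C0:T0:L237" Failed: H:0x0 D:0x2 P:0x0
"""

DEV_PATH_RE = re.compile(r'to dev "(naa\.[0-9a-f]+)" on path "(vmhba\d+:C\d+:T\d+:L\d+)"')
STATUS_RE = re.compile(r"\bH:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+)")


def no_connect_failures(log_text):
    """Count H:0x1 command failures per (device, path)."""
    counts = Counter()
    last = None  # (device, path) from the most recent "to dev ... on path" line
    for line in log_text.splitlines():
        m = DEV_PATH_RE.search(line)
        if m:
            last = m.groups()
        s = STATUS_RE.search(line)
        if s and last is not None:
            if s.group(1) == "0x1":
                counts[last] += 1
            last = None  # status consumed; wait for the next failing command
    return counts


for (device, path), n in no_connect_failures(LOG).items():
    print(f"{device} via {path}: {n} H:0x1 failure(s)")
```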
    Are you able to file a Broadcom VMware ticket by any chance?
    If not, please see if you can upload the hostd and vmkernel logs of the ESXi host on which the alert appeared.

    Regards,
    Bharath G




  • 22.  RE: lost connectivity to storage device affected datastores unknown

    Posted Apr 25, 2025 09:51 AM

    Hi Bharath,

    No worries. Pure suggested that we upgrade our FlashArray (FA) to the latest version. We did that last weekend. We're going to monitor it for a few weeks and will keep you all updated.




  • 23.  RE: lost connectivity to storage device affected datastores unknown

    Broadcom Employee
    Posted Apr 25, 2025 01:26 PM

    Sure. Let me know how it goes.

    Regards,
    Bharath G