lost connectivity to storage device affected datastores unknown

5. RE: lost connectivity to storage device affected datastores unknown

DAMahoney

Posted Aug 10, 2010 01:16 PM

Andy,

The log you requested is attached. I am researching right now how to do a load-balancing operation on my EqualLogic SAN.

Andre,

What in my iSCSI configuration is misconfigured? I had Dell do the initial install and haven't made any iSCSI changes. I did upgrade from 3.5 to vSphere myself, though; maybe the iSCSI configuration needs to be different? I have been running vSphere for almost a year, and these errors have only been around for a few weeks.

Thanks for all of your help,

Dave

6. RE: lost connectivity to storage device affected datastores unknown

Andy_Banta

Posted Aug 10, 2010 04:11 PM

The log you requested is attached. I am researching right now how to do a load-balancing operation on my EqualLogic SAN.

Dave,

These log entries show the handling of the load balancing events from the PS. Do these correspond to your original "lost connectivity" messages?

2010-07-17-13:59:02: iscsid: Target requests logout within 3 seconds for connection

2010-07-17-13:59:06: iscsid: connection6:0 is operational after recovery (2 attempts)

2010-07-18-14:23:07: iscsid: Target requests logout within 3 seconds for connection

2010-07-18-14:23:11: iscsid: connection6:0 is operational after recovery (2 attempts)

2010-07-19-20:56:14: iscsid: Target requests logout within 3 seconds for connection

2010-07-19-20:56:18: iscsid: connection6:0 is operational after recovery (2 attempts)

2010-07-20-03:08:16: iscsid: Target requests logout within 3 seconds for connection

2010-07-20-03:08:20: iscsid: connection6:0 is operational after recovery (2 attempts)

These are normal but relatively rare operations.
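
For anyone who wants to check their own log the same way, here is a minimal Python sketch that pairs each "Target requests logout" line with the next "is operational after recovery" line and reports the gap in seconds. It assumes the line format shown in the excerpt above; the names `parse_events` and `recovery_gaps` are made up for illustration and are not part of any VMware tool.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2010-07-17-13:59:02: iscsid: Target requests logout within 3 seconds for connection
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}): iscsid: (.*)$")

SAMPLE = """\
2010-07-17-13:59:02: iscsid: Target requests logout within 3 seconds for connection
2010-07-17-13:59:06: iscsid: connection6:0 is operational after recovery (2 attempts)
2010-07-18-14:23:07: iscsid: Target requests logout within 3 seconds for connection
2010-07-18-14:23:11: iscsid: connection6:0 is operational after recovery (2 attempts)
2010-07-19-20:56:14: iscsid: Target requests logout within 3 seconds for connection
2010-07-19-20:56:18: iscsid: connection6:0 is operational after recovery (2 attempts)
2010-07-20-03:08:16: iscsid: Target requests logout within 3 seconds for connection
2010-07-20-03:08:20: iscsid: connection6:0 is operational after recovery (2 attempts)
"""


def parse_events(lines):
    """Return (timestamp, message) tuples for every iscsid line that matches."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H:%M:%S")
            events.append((stamp, match.group(2)))
    return events


def recovery_gaps(events):
    """Pair each logout request with the next recovery; return (start, seconds)."""
    gaps = []
    pending = None
    for stamp, message in events:
        if message.startswith("Target requests logout"):
            pending = stamp
        elif "is operational after recovery" in message and pending is not None:
            gaps.append((pending, (stamp - pending).total_seconds()))
            pending = None
    return gaps


if __name__ == "__main__":
    for start, seconds in recovery_gaps(parse_events(SAMPLE.splitlines())):
        print(f"{start:%Y-%m-%d %H:%M:%S}  recovered after {seconds:.0f} s")
```

On the eight lines quoted above, every gap works out to 4 seconds. Comparing the start times against the timestamps of the vCenter "lost connectivity" events is one way to answer the question of whether they correspond.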

Andy

7. RE: lost connectivity to storage device affected datastores unknown

DAMahoney

Posted Aug 10, 2010 03:09 PM

I haven't been able to figure out how to do load balancing on my EqualLogic PS5000. Do you know if it is done through the GUI or the CLI? I also noticed that my path selection is set to fixed for my datastores. Should I change it to "round robin" or "most recently used"?

Thanks,

8. RE: lost connectivity to storage device affected datastores unknown

Andy_Banta

Posted Aug 10, 2010 03:59 PM

I haven't been able to figure out how to do load balancing on my EqualLogic PS5000. Do you know if it is done through the GUI or the CLI?

If you log in to the PS as "root" and issue "iscsi_test alogout", it will generate an asynchronous logout on ALL sessions. This is the same event that happens in load balancing operations. This isn't intended for casual use.

I also noticed that my path selection is set to fixed for my datastores. Should I change it to "round robin" or "most recently used"?

As Active/Active storage, fixed will be the default policy. With EqualLogic, round-robin also works. Or, if you are running ESX 4.1, install their MEM (Multipathing Extension Module) and go to town.

Andy

9. RE: lost connectivity to storage device affected datastores unknown

DAMahoney

Posted Aug 10, 2010 05:28 PM

When you say "This isn't intended for casual use," what do you mean? Should I expect to see an outage when I run that command?

The lost connectivity message does correspond with a "target request logout within 3 seconds for connection" message. If they are normal but rare, do you have any idea why it would be happening so often in my environment?

Thanks,

Dave

10. RE: lost connectivity to storage device affected datastores unknown

Andy_Banta

Posted Aug 10, 2010 06:13 PM

When you say "This isn't intended for casual use," what do you mean? Should I expect to see an outage when I run that command?

Dave,

All sessions to the PS momentarily drop. If the only hosts connected to the storage are ESX and ESXi, they should all recover in 4 to 5 seconds.

VMs on ESX hosts should have no trouble, other than IO at that time taking a few seconds. Other hosts should be able to handle these events, as well. What I mean is that it's intended for test purposes, only, and you probably shouldn't use it for sport in a production environment.

The lost connectivity message does correspond with a "target request logout within 3 seconds for connection" message. If they are normal but rare, do you have any idea why it would be happening so often in my environment?

It happened 17 times in two months on your host, based on the log. I'm guessing the PS will do this when it sees a disparity of load between several ports on the storage (you can ask EqualLogic for more definite information). In that case, the frequency would be environment-dependent. By rare, I mean it shouldn't happen every ten minutes. Various loads probably would make it happen more often than you're seeing it happen. As I mentioned, it's normal, and it's intended to make better use of the network and ports that are available. In spite of creating some log clutter, it's supposed to be a good thing.
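
To put "17 times in two months" next to "every ten minutes", here is a rough back-of-the-envelope sketch in Python. It assumes "two months" means about 61 days, which is my approximation and not a figure from the log; the function names are illustrative only.

```python
def mean_interval_days(event_count, span_days):
    """Average number of days between events over the given span."""
    return span_days / event_count


def events_at_fixed_interval(span_days, interval_minutes):
    """How many events would occur if one fired every interval_minutes."""
    return int(span_days * 24 * 60 // interval_minutes)


if __name__ == "__main__":
    SPAN_DAYS = 61  # assumption: "two months" taken as roughly 61 days
    print(f"observed: one event every {mean_interval_days(17, SPAN_DAYS):.1f} days")
    print(f"every ten minutes would be {events_at_fixed_interval(SPAN_DAYS, 10)} events")
```

Under that assumption the observed rate is roughly one event every 3.6 days, against several thousand events for the "every ten minutes" case.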

Andy

11. RE: lost connectivity to storage device affected datastores unknown

DAMahoney

Posted Aug 31, 2010 01:03 PM

Andy,

I have been busy on other projects for the past few weeks. I ran the load balancing command and I am still getting the "lost connectivity" messages in my event logs. I don't know what else to try. Do you have any more suggestions?

There are still no complaints from users, and I don't see a pattern to when these events happen. The person in charge of backups mentioned that he was getting some iSCSI errors when backups of my VMs were running, but the events I am seeing do not occur while backups are running.

Thanks,

Dave

12. RE: lost connectivity to storage device affected datastores unknown

DAMahoney

Posted Oct 12, 2010 06:05 PM

If anybody has any information that could help me resolve this issue, it would be greatly appreciated. The errors are still appearing, but the VMs aren't having any problems.

Thanks,

Dave

13. RE: lost connectivity to storage device affected datastores unknown

Andy_Banta

Posted Oct 12, 2010 08:18 PM

Dave,

I've been away for a little while, as well.

The message is a harmless artifact of the EqualLogic system performing load balancing, in this case. It's supposed to do this. You can safely disregard this message if it comes up once in a while.

Enjoy,

Andy

14. RE: lost connectivity to storage device affected datastores unknown

Andy_Banta

Posted Oct 12, 2010 08:26 PM

I ran the load balancing command and I am still getting the "lost connectivity" messages in my event logs.

Triggering a load balancing event should demonstrate to you that this is the cause of the log message. Logging in 4.1 is a little tidier and you shouldn't see this message any more.

Andy

19. RE: lost connectivity to storage device affected datastores unknown

Broadcom Employee

Bharath Kumar G

Posted Apr 09, 2025 04:16 PM
Edited by Bharath Kumar G Apr 09, 2025 04:17 PM

That's strange. Could you SSH to the ESXi host that the alarm is referring to, navigate to /var/log, run ls and provide me a screenshot of what you see?

If you do not find the 3 log files, please do the following:

This will list all the paths ESXi was able to pick up over the PSA from the storage array:
esxcfg-mpath -bd naa.624a93709cac436e0b074d33000f4019
See if any of the paths listed shows a state other than active, e.g., dead.
As your LUNs have multiple paths configured for redundancy, you are unlikely to see any effect of this message on your workloads.
https://knowledge.broadcom.com/external/article/318935/path-redundancy-to-storage-device-degrad.html

This will tell you whether or not there is a VMFS partition on the LUN:
partedUtil getptbl /vmfs/devices/disks/naa.624a93709cac436e0b074d33000f4019

If you see a partition labelled vmfs, you can use the following command to find out which datastore the alert is about:
esxcli storage vmfs extent list | grep naa.624a93709cac436e0b074d33000f4019

This message is usually caused by an issue at Layer 1.
Capture the timestamp and hostname from vCenter and engage your fabric and storage vendors to see if they observed any anomalies at the time of the message.
Once you have the report, you can file a ticket with the Broadcom VMware Storage Support team if you'd like, or we can continue to investigate in this thread.
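
If there are many devices to check, the "anything but active" test can be scripted. The Python sketch below is only that, a sketch: it assumes each path line in the esxcfg-mpath output starts with a runtime name such as vmhba5:C0:T1:L237 and carries a `state:<value>` field, and the sample text is hand-written for illustration, not captured from a real host. Check the pattern against your own output before relying on it. `non_active_paths` is a made-up helper name.

```python
import re

# Assumed layout of a path line: leading whitespace, runtime name, then a
# "state:<value>" field somewhere later on the same line.
PATH_RE = re.compile(
    r"^\s+(?P<path>vmhba\d+:C\d+:T\d+:L\d+)\s.*?\bstate:(?P<state>\w+)"
)

# Hand-written illustrative text, NOT real command output.
SAMPLE_OUTPUT = """\
naa.624a93709cac436e0b074d33000f4019 : illustrative device description
   vmhba4:C0:T0:L237 LUN:237 state:active fc
   vmhba5:C0:T1:L237 LUN:237 state:dead fc
"""


def non_active_paths(output):
    """Return (runtime_name, state) for every path whose state is not 'active'."""
    flagged = []
    for line in output.splitlines():
        match = PATH_RE.match(line)
        if match and match.group("state") != "active":
            flagged.append((match.group("path"), match.group("state")))
    return flagged


if __name__ == "__main__":
    for path, state in non_active_paths(SAMPLE_OUTPUT):
        print(f"{path} is {state}")
```

Saving the command output to a file on the host and feeding it to a script like this elsewhere avoids having to eyeball every path on every host.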

Original Message

Original Message:
Sent: Apr 09, 2025 03:19 PM
From: Jennifer Nguyen
Subject: lost connectivity to storage device affected datastores unknown

Hi Kumar,

There are no logs for the 3 that you mentioned when I looked in the /var/log directory.

We have a primary and a failover site; the issue is on the failover site:

2 datacenters

2 Pure Storage

2 Cisco B200M5s

ESXi hosts are all 8.0 3d

The storage adapters use a shared connection from the chassis: the FIs have 2 physical connections that are shared between the blades. These are not standalone servers; the connection is shared. If it were a physical connection problem, it would affect all blades.

The alerts that we are getting from vCenter are for paths that do not exist, which is why the datastore shows as 'unknown':

Alarm alarm.StorageConnectivityAlarm on Host host1_name
because Path redundancy to storage device naa.624a93709cac436e0b074d33000f4019 degraded. Path vmhba5:C0:T692:L237 is down. Affected datastores: Unknown..

Not sure why the T: (Target) is so high in the numbering...
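
For reference, the runtime name in the alarm reads adapter:channel:target:LUN, so vmhba5:C0:T692:L237 means adapter vmhba5, channel 0, target 692, LUN 237. A minimal Python sketch for pulling those fields out of the alarm text follows; it assumes the wording of the alarm quoted above, and `parse_path_alarm` is a made-up name for illustration.

```python
import re

ALARM_RE = re.compile(
    r"Path redundancy to storage device (?P<device>naa\.[0-9a-f]+) degraded\. "
    r"Path (?P<adapter>vmhba\d+):C(?P<channel>\d+):T(?P<target>\d+):L(?P<lun>\d+) is down\. "
    r"Affected datastores: (?P<datastores>.+?)\.*$"
)

SAMPLE_ALARM = (
    "because Path redundancy to storage device "
    "naa.624a93709cac436e0b074d33000f4019 degraded. "
    "Path vmhba5:C0:T692:L237 is down. Affected datastores: Unknown.."
)


def parse_path_alarm(text):
    """Split a 'Path redundancy ... degraded' alarm into its fields, or return None."""
    match = ALARM_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Channel, target and LUN are numeric, so convert them for sorting and counting.
    for key in ("channel", "target", "lun"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    print(parse_path_alarm(SAMPLE_ALARM))
```

Collecting the parsed fields from every alarm makes it easier to see whether the same adapter, target or LUN keeps recurring across hosts.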

Seems like there's a cached path somewhere that is triggering the alarms.

When we do get the alerts, nothing is down: all hosts, VMs, vNICs, storage adapters, datastores, etc. are green and good. Why is it that we don't get the alerts for the primary site but just for the failover site? They have the same setup and configuration.

Things tried:

Rescanned datastores

Rescanned storage adapters

Disabled then re-enabled different alarms: vSAN online health alarm 'Disks usage on storage controller' (vCenter level), 'Cannot connect to storage' (vCenter level), 'Datastore usage on disk' (vCenter level)

SFP health check was done by Pure Support

Any ideas on this are greatly appreciated.

Original Message:
Sent: Apr 07, 2025 06:53 PM
From: Bharath Kumar G
Subject: lost connectivity to storage device affected datastores unknown

Hi Jennifer,

We should be able to tell from taking a look at the logs.
Can you upload the hostd, vmkernel and vobd logs?

Original Message:
Sent: Apr 07, 2025 11:33 AM
From: Jennifer Nguyen
Subject: lost connectivity to storage device affected datastores unknown

Hi Dave,

Were you able to get this issue resolved? I have the same exact issue.

Original Message:
Sent: Aug 04, 2010 02:51 PM
From: DAMahoney
Subject: lost connectivity to storage device affected datastores unknown

Hello,

I am receiving the error "lost connectivity to storage device affected datastores unknown" once a day, at different times every day. Sometimes it identifies which datastore is affected; so far it is only happening to two datastores. I have been running vSphere for about 10 months without issues and haven't changed or patched anything recently. I have 2 hosts and 2 EqualLogic PS5000 SANs connected through iSCSI. Screenshots of my virtual switches are attached if that helps.

Thank you for your assistance

20. RE: lost connectivity to storage device affected datastores unknown

Jennifer Nguyen

Posted Apr 09, 2025 05:01 PM

Thank you for the quick response. I ran esxcfg-mpath -bd naa.624a93709cac436e0b074d33000f4019 on host-04 and all paths show as active. I also ran it against 11 other storage devices (1 local) and they all show as active.

This is happening on all 13 of our hosts, on different vmhbas, same storage device and different LUNS.

For example: last night, we received 2 alarms for host 4, on vmhba 4 and 5, same storage device, same LUN. But on Sunday, we received alarms for multiple hosts, different vmhba, storage device, different LUN.

I actually have a host log from one of the last alarms. Please see the attachment.

We have been working with Pure on the issue. They did not see any issues on their end.

Attachment(s)

Host-04-events-03-31-2025-11-08-58-AM.csv 17 KB 1 version

21. RE: lost connectivity to storage device affected datastores unknown

Broadcom Employee

Bharath Kumar G

Posted Apr 23, 2025 03:13 PM

Hi Jennifer,

Sorry, I wasn't around for a few days.
I don't see the attachment for some reason.
Path-related issues are usually straightforward. ESXi has storage drivers that record the response from the storage subsystem and mark the relevant path by failing the command with the response H:0x1. This can be viewed in the vmkernel log.
The hostd log will record "Path redundancy to storage device naa.################################ degraded. Path vmhba3:C0:T1:L7 is down. Affected datastores: Datastore1.".
Are you able to file a Broadcom VMware ticket by any chance?
If not, please see if you can upload the hostd and vmkernel logs of the ESXi host on which the alert appeared.
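
When going through the vmkernel log, a small script can pull out the failed commands that carry the H:0x1 response. The Python sketch below assumes the status triplet appears in the log as `H:0x.. D:0x.. P:0x..` and that the device is named with its naa identifier on the same line. The sample lines are simplified and hand-written for illustration, not taken from a real host, and `commands_with_host_status` is a made-up name.

```python
import re

# Status triplet as assumed to appear in vmkernel log lines: host, device, plugin.
STATUS_RE = re.compile(
    r"H:0x(?P<host>[0-9a-fA-F]+) D:0x(?P<device>[0-9a-fA-F]+) P:0x(?P<plugin>[0-9a-fA-F]+)"
)
DEVICE_RE = re.compile(r"naa\.[0-9a-f]+")

# Simplified, hand-written lines for illustration only.
SAMPLE_LINES = [
    'ScsiDeviceIO: Cmd 0x28 to dev "naa.624a93709cac436e0b074d33000f4019" failed H:0x1 D:0x0 P:0x0',
    'ScsiDeviceIO: Cmd 0x28 to dev "naa.624a93709cac436e0b074d33000f4019" failed H:0x0 D:0x2 P:0x0',
    "some unrelated log line",
]


def commands_with_host_status(lines, host_status=0x1):
    """Return (device_or_None, line) for lines whose H: value equals host_status."""
    hits = []
    for line in lines:
        status = STATUS_RE.search(line)
        if status and int(status.group("host"), 16) == host_status:
            device = DEVICE_RE.search(line)
            hits.append((device.group(0) if device else None, line.rstrip()))
    return hits


if __name__ == "__main__":
    for device, line in commands_with_host_status(SAMPLE_LINES):
        print(device, "|", line)
```

Matching the timestamps of the lines it finds against the alarm times in vCenter should show whether the host actually saw a path failure at that moment.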

Regards,
Bharath G

Original Message

Original Message:
Sent: Apr 09, 2025 05:01 PM
From: Jennifer Nguyen
Subject: lost connectivity to storage device affected datastores unknown