vSphere Storage Appliance

LUN disconnects, vmkernel errors

  • 1.  LUN disconnects, vmkernel errors

    Posted Jan 25, 2010 11:01 AM

    Hello, we have a problem with our ESX servers' connections to a DS4700.

    Several times now we have seen errors about LUN disconnects in the vmkwarning log, on several paths.

    Can anyone help us understand what these errors mean?

    The errors are similar to this:

    Jan 17 02:07:27 v0054esx021 vmkernel: 32:14:20:02.996 cpu5:4258)WARNING: VMW_SATP_LSI: satp_lsi_issueSyncPathCommandWithRetries: MODE_SENSE10 command to path "vmhba2:C0:T0:L29" failed with status = 0/1 0x0 0x0 0x0

    Jan 21 02:07:13 v0054esx021 vmkernel: 36:14:19:05.589 cpu3:4180)WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: Could not write mode page data for path "vmhba2:C0:T0:L21"

    Jan 17 02:07:28 v0054esx021 vmkernel: 32:14:20:35.930 cpu5:4173)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "naa.600a0b8000508ce4000008f24a98d9b5" is blocked. Not starting I/O from device.

    Jan 17 02:07:28 v0054esx021 vmkernel: 32:14:20:47.619 cpu2:4111)WARNING: NMP: nmp_SelectPathAndIssueCommand: PSP selected path "vmhba1:C0:T0:L28" in a bad state (standby)on device "naa.600a0b8000508ce40000375d4b3316d5".

    Jan 17 02:07:28 v0054esx021 vmkernel: 32:14:20:47.619 cpu2:4111)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.600a0b8000508ce40000375d4b3316d5": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

    WARNING: vmw_psp_fixed: psp_fixedSelectPathToActivateInt: Switching to preferred path vmhba2:C0:T0:L26 in STANDBY state - on device naa.600a0b8000508ce4000037594b3316ad. VMware does not support this configuration.

    Jan 17 02:09:04 v0054esx021 vmkernel: 32:14:22:32.857 cpu5:4222)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.600a0b8000508ce4000008f24a98d9b5": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
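    To see which paths and devices are affected most often, warnings like the ones above can be tallied per path and per device. The following is a minimal Python sketch, assuming the log format shown in this post; the function name `tally_warnings` is illustrative and not part of any VMware tool.

```python
import re
from collections import Counter

# Runtime path names such as vmhba2:C0:T0:L29 (quoted or unquoted in the log).
PATH_RE = re.compile(r"\bvmhba\d+:C\d+:T\d+:L\d+\b")
# NAA device identifiers such as naa.600a0b8000508ce4000008f24a98d9b5.
DEVICE_RE = re.compile(r"\bnaa\.[0-9a-f]+\b")


def tally_warnings(lines):
    """Count how often each path and each device appears in WARNING lines."""
    paths, devices = Counter(), Counter()
    for line in lines:
        if "WARNING:" not in line:
            continue  # skip informational messages
        paths.update(PATH_RE.findall(line))
        devices.update(DEVICE_RE.findall(line))
    return paths, devices


if __name__ == "__main__":
    sample = [
        'WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: '
        'Could not write mode page data for path "vmhba2:C0:T0:L21"',
        'WARNING: NMP: nmp_DeviceStartLoop: NMP Device '
        '"naa.600a0b8000508ce4000008f24a98d9b5" is blocked.',
    ]
    paths, devices = tally_warnings(sample)
    for name, count in (paths + devices).most_common():
        print(count, name)
```

    Feeding it the whole vmkwarning file (for example `tally_warnings(open(path))`) shows whether the warnings cluster on one HBA, one target, or a few LUNs.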



  • 2.  RE: LUN disconnects, vmkernel errors

    Posted Jan 25, 2010 11:27 AM

    And also like this:

    Jan 17 02:07:27 v0054esx015 vmkernel: 32:16:47:12.902 cpu5:4908)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410002089240) to NMP device "naa.600a0b8000508ce4000008cc4a98ce32" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.

    The DS4700 itself reports no errors on the array.

    But I think these errors mean that the ESX servers lose connectivity to the array...
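    The `H:`, `D:` and `P:` fields in such a line are the host, device and plugin status, followed by the sense key, ASC and ASCQ. The sketch below decodes them with deliberately small lookup tables that cover only codes seen in this thread. The device status and sense values come from the SCSI standard (0x2 is CHECK CONDITION, sense key 0x6 is UNIT ATTENTION, ASC/ASCQ 0x29/0x00 is "power on, reset, or bus device reset occurred"), and host status 0x1 is what ESX reports as no-connect. The function name `decode_status` is an assumption for illustration.

```python
import re

# Partial tables: only the codes that appear in this thread.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEY = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
}

STATUS_RE = re.compile(
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"(?:Valid|Possible) sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)"
)


def decode_status(line):
    """Return a dict describing the status fields, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc, ascq = (int(g, 16) for g in match.groups())
    if asc >= 0x80:
        # ASC values 0x80 and above are reserved for the array vendor.
        additional = "vendor specific"
    else:
        additional = ASC_ASCQ.get((asc, ascq), "not in table")
    return {
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": hex(plugin),
        "sense_key": SENSE_KEY.get(key, hex(key)),
        "additional": additional,
    }


if __name__ == "__main__":
    print(decode_status("H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0."))
```

    For the line quoted in this post, the result is a CHECK CONDITION with a UNIT ATTENTION for a power-on or reset, which would be consistent with the array side having been reset rather than with a cabling fault, though it does not prove either.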



  • 3.  RE: LUN disconnects, vmkernel errors

    Posted Jan 25, 2010 12:49 PM

    This could be a problem with the multipath policy setting.

    Have you manually changed it to round robin?

    What is the current policy?

    Andre



  • 4.  RE: LUN disconnects, vmkernel errors

    Posted Jan 25, 2010 09:07 PM

    The DS4700 is an active/passive device, so your path selection policy should be set to MRU, Most Recently Used (VMware).

    -C



  • 5.  RE: LUN disconnects, vmkernel errors

    Broadcom Employee
    Posted Jan 26, 2010 01:34 AM

    Correct.

    Hi, jasco

    Are you using vmw_psp_fixed? It is not the default for active/passive arrays; please change it to the default, vmw_psp_mru.

    Also, the message below indicates "no connection"; please check your physical connections.

    Jan 17 02:07:27 v0054esx021 vmkernel: 32:14:20:02.996 cpu5:4258)WARNING: VMW_SATP_LSI: satp_lsi_issueSyncPathCommandWithRetries: MODE_SENSE10 command to path "vmhba2:C0:T0:L29" failed with status = 0/1 0x0 0x0 0x0

    binoche, VMware VCP, Cisco CCNA



  • 6.  RE: LUN disconnects, vmkernel errors

    Posted Feb 15, 2010 02:24 PM

    Hello Jasco,

    We are experiencing the same problem. We just received our new DS5020 storage, and every 30 seconds I get these messages in the vmkwarning log of our ESX 4.0 U1 hosts.

    Feb 12 13:56:46 esx-srv-b vmkernel: 3:13:25:05.758 cpu6:4421)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.60080e500017ed5e000007034b711d45": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

    Feb 12 13:56:47 esx-srv-b vmkernel: 3:13:25:07.273 cpu0:4206)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.60080e500017ed5e000007034b711d45" - issuing command 0x410005145f00

    Feb 12 13:59:45 esx-srv-b vmkernel: 3:13:28:05.193 cpu7:4436)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.60080e500017ed5e000007034b711d45": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

    Feb 12 13:59:47 esx-srv-b vmkernel: 3:13:28:06.699 cpu0:4206)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.60080e500017ed5e000007034b711d45" - issuing command 0x410005141a00

    Feb 12 14:01:43 esx-srv-b vmkernel: 3:13:30:02.980 cpu6:4244)WARNING: VMW_SATP_LSI: satp_lsi_issueCmdRetryOnLockViolation: MODE_SELECT10 command to path "vmhba2:C0:T5:L0" failed with status = 2/0 0x5 0x91 0x0

    Feb 12 14:01:43 esx-srv-b vmkernel: 3:13:30:02.980 cpu6:4244)WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: Could not write mode page data for path "vmhba2:C0:T5:L0"

    Feb 12 14:01:43 esx-srv-b vmkernel: 3:13:30:02.981 cpu7:4244)WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: Issuing forced satp_fastt_setPreferredController

    This means the host is constantly trying to change paths. All servers are set to MRU and all connections are working.

    Does anyone have a solution yet? It is affecting performance.
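    To check how often the failover message really recurs for a device, the gaps between consecutive `nmp_DeviceRetryCommand` warnings can be computed from the log. This is a minimal sketch; it assumes the field after `vmkernel:` is the host uptime in the form days:hours:minutes:seconds, and the function name `failover_gaps` is illustrative.

```python
import re
from collections import defaultdict

# Captures the uptime stamp and the device of each retry warning.
EVENT_RE = re.compile(
    r"vmkernel: (\d+):(\d+):(\d+):(\d+\.\d+) .*"
    r'nmp_DeviceRetryCommand: Device "(naa\.[0-9a-f]+)"'
)


def failover_gaps(lines):
    """Map each device to the gaps (in seconds) between its retry warnings."""
    stamps = defaultdict(list)
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        days, hours, minutes = (int(match.group(i)) for i in (1, 2, 3))
        seconds = float(match.group(4))
        # Assumed layout: days:hours:minutes:seconds since boot.
        uptime = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
        stamps[match.group(5)].append(uptime)
    gaps = {}
    for device, times in stamps.items():
        times.sort()
        gaps[device] = [round(b - a, 3) for a, b in zip(times, times[1:])]
    return gaps


if __name__ == "__main__":
    import sys
    for device, values in failover_gaps(sys.stdin).items():
        print(device, values)
```

    Run against the two `nmp_DeviceRetryCommand` lines quoted in this post, it reports one gap of roughly 179 seconds for that device; a full log is needed to see the real pattern.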

    Thanks,

    Didier



  • 7.  RE: LUN disconnects, vmkernel errors

    Posted Feb 15, 2010 03:26 PM

    Well, actually our problem was that Storage Manager had been updated to the latest version and tried to collect statistics every night at 2 am.

    When it did, there were so many statistics that the SPs rebooted.

    If you see path thrashing with MRU set, you might have too many LUNs with too many VMs connected to one host at a time.

    We had that issue too, and we ended up recreating LUNs and moving VMs.

    That is why we use a script that tells the ESX servers to use the first HBA for the first SP and the second HBA for the second SP. It gives a bit of load balancing.
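    The script itself is not shown in this thread. As a purely hypothetical illustration of the pairing rule only (first HBA to first SP, second HBA to second SP), the selection logic could look like the sketch below. It issues no ESX commands, and the names `pair_hbas_to_sps` and `choose_paths` and the (hba, sp, lun) tuple layout are assumptions.

```python
def pair_hbas_to_sps(hbas, sps):
    """Pair HBAs with SPs by position: first with first, second with second."""
    if not sps:
        raise ValueError("at least one storage processor is required")
    # With more HBAs than SPs, the assignment wraps around.
    return {hba: sps[i % len(sps)] for i, hba in enumerate(hbas)}


def choose_paths(paths, pairing):
    """Keep only the paths on which an HBA reaches its paired SP.

    Each path is a hypothetical (hba, sp, lun) tuple.
    """
    return [path for path in paths if pairing.get(path[0]) == path[1]]


if __name__ == "__main__":
    pairing = pair_hbas_to_sps(["vmhba1", "vmhba2"], ["A", "B"])
    all_paths = [
        ("vmhba1", "A", 28), ("vmhba1", "B", 28),
        ("vmhba2", "A", 28), ("vmhba2", "B", 28),
    ]
    for path in choose_paths(all_paths, pairing):
        print(path)
```

    On an active/passive array each LUN is still served by its owning controller, so a rule like this only spreads load if LUN ownership is itself split across the two SPs.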



  • 8.  RE: LUN disconnects, vmkernel errors

    Broadcom Employee
    Posted Feb 16, 2010 03:40 AM

    Hi, Didier

    Are there other ESX hosts besides esx-srv-b also accessing the DS5020? Could you please also upload their vmkernel messages? Thanks.

    binoche, VMware VCP, Cisco CCNA



  • 9.  RE: LUN disconnects, vmkernel errors

    Posted Feb 16, 2010 08:55 AM

    Binoche,

    Some other examples:

    A newly created ESX 4 with update 1

    ESX-F: on lun B-S-VEEAM

    Feb 15 10:41:01 esx-srv-f vmkernel: 4:17:45:57.734 cpu11:4107)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.60080e500017eef800000ae64b78bb67": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

    The only thing running was a copy operation to this LUN (no VMs).

    ESX-C:

    Feb 16 07:26:52 esx-srv-c vmkernel: 69:14:32:10.900 cpu1:19469)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.60080e500017ed5e000007034b711d45": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

    Feb 16 07:26:55 esx-srv-c vmkernel: 69:14:32:13.448 cpu7:4206)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.60080e500017ed5e000007034b711d45" - issuing command 0x41000519ce00

    Feb 16 07:26:56 esx-srv-c vmkernel: 69:14:32:14.773 cpu4:4358)WARNING: VMW_SATP_LSI: satp_lsi_issueCmdRetryOnLockViolation: MODE_SELECT10 command to path "vmhba4:C0:T5:L8" failed with status = 2/0 0x5 0x91 0x0

    Feb 16 07:26:56 esx-srv-c vmkernel: 69:14:32:14.773 cpu4:4358)WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: Could not write mode page data for path "vmhba4:C0:T5:L8"

    Feb 16 07:26:56 esx-srv-c vmkernel: 69:14:32:14.774 cpu4:4358)WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: Issuing forced satp_fastt_setPreferredController

    ESX-A:

    Feb 16 07:56:49 esx-srv-a vmkernel: 7:07:25:09.296 cpu2:4206)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.60080e500017ed5e000007004b711ca3" - issuing command 0x4100054b5880

    Feb 16 07:56:49 esx-srv-a vmkernel: 7:07:25:09.296 cpu2:4206)WARNING: NMP: nmp_SelectPathAndIssueCommand: PSP selected path "vmhba2:C0:T5:L0" in a bad state (standby)on device "naa.60080e500017ed5e000007004b711ca3".

    Feb 16 07:56:49 esx-srv-a vmkernel: 7:07:25:09.296 cpu2:4206)WARNING: NMP: nmp_CompleteRetryForPath: Retry command 0x12 (0x4100054b5880) to NMP device "naa.60080e500017ed5e000007004b711ca3" failed on physical path "vmhba2:C0:T5:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

    Feb 16 07:56:49 esx-srv-a vmkernel: 7:07:25:09.296 cpu2:4206)WARNING: NMP: nmp_CompleteRetryForPath: Logical device "naa.60080e500017ed5e000007004b711ca3": awaiting fast path state update before retrying failed command again...

    Feb 16 07:57:00 esx-srv-a vmkernel: 7:07:25:19.944 cpu4:4176)WARNING: VMW_SATP_LSI: satp_lsi_issueCmdRetryOnLockViolation: MODE_SELECT10 command to path "vmhba2:C0:T5:L0" failed with status = 2/0 0x5 0x91 0x0

    Feb 16 07:57:00 esx-srv-a vmkernel: 7:07:25:19.944 cpu4:4176)WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: Could not write mode page data for path "vmhba2:C0:T5:L0"

    Feb 16 07:57:00 esx-srv-a vmkernel: 7:07:25:19.945 cpu4:4176)WARNING: VMW_SATP_LSI: satp_lsi_setPreferredController: Issuing forced satp_fastt_setPreferredController

    Thanks for taking a look at it.

    Didier



  • 10.  RE: LUN disconnects, vmkernel errors

    Broadcom Employee
    Posted Feb 16, 2010 02:33 PM

    1, Feb 16 07:56:49 esx-srv-a vmkernel: 7:07:25:09.296 cpu2:4206)WARNING: NMP: nmp_CompleteRetryForPath: Retry command 0x12 (0x4100054b5880) to NMP device "naa.60080e500017ed5e000007004b711ca3" failed on physical path "vmhba2:C0:T5:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

    This usually means connection loss; did the DS5020 controller reboot at this time?

    2, Feb 16 07:57:00 esx-srv-a vmkernel: 7:07:25:19.944 cpu4:4176)WARNING: VMW_SATP_LSI: satp_lsi_issueCmdRetryOnLockViolation: MODE_SELECT10 command to path "vmhba2:C0:T5:L0" failed with status = 2/0 0x5 0x91 0x0

    I do not know what this means.

    3, can you recheck that your DS5020 is configured correctly?

    binoche, VMware VCP, Cisco CCNA



  • 11.  RE: LUN disconnects, vmkernel errors

    Posted Feb 18, 2010 08:08 AM

    This issue is a bug in vSphere 4.0. We hit it because we removed a LUN (with no VMs on it) that was still connected to all our ESX servers.

    See this post:

    http://virtualgeek.typepad.com/virtual_geek/2009/12/an-important-vsphere-4-storage-bug-and-workaround.html

    Thanks for your help,

    Didier