ESXi

vSphere 6 U2, PDL errors on VVOL PE device

  • 1.  vSphere 6 U2, PDL errors on VVOL PE device

    Broadcom Employee
    Posted Apr 05, 2016 02:51 PM

    Hi guys

    Since we updated our hosts to vSphere 6.0 U2 with the latest HPE image, we have noticed massive lags on the hosts. vmkernel.log shows thousands of PDL errors on a VVOL PE (protocol endpoint) device from our 3PAR array, but we don't use VVOLs at this time. Is there a known issue, or a way to disable the VVOL functions? Drivers and firmware are up to date.

    [root@vesxvdi01:~] esxcli storage core adapter list

    HBA Name  Driver         Link State  UID                                   Capabilities         Description

    --------  -------------  ----------  ------------------------------------  -------------------  --------------------------------------------------------------------------------

    vmhba0    ata_piix       link-n/a    sata.vmhba0                                                (0000:00:1f.2) Intel Corporation ICH10 4 port SATA IDE Controller

    vmhba1    hpsa           link-n/a    sas.50123456789abcde                                       (0000:04:00.0) Hewlett-Packard Company Smart Array P410i

    vmhba2    lpfc           link-up     fc.20000090fa56bb24:10000090fa56bb24  Second Level Lun ID  (0000:07:00.0) Emulex Corporation Emulex LPe12000 8Gb PCIe Fibre Channel Adapter

    vmhba3    lpfc           link-up     fc.20000090fa56bb25:10000090fa56bb25  Second Level Lun ID  (0000:07:00.1) Emulex Corporation Emulex LPe12000 8Gb PCIe Fibre Channel Adapter

    vmhba32   bnx2i          unbound     iscsi.vmhba32                                              Broadcom NetXtreme II iSCSI Adapter

    vmhba33   bnx2i          unbound     iscsi.vmhba33                                              Broadcom NetXtreme II iSCSI Adapter

    vmhba34   bnx2i          unbound     iscsi.vmhba34                                              Broadcom NetXtreme II iSCSI Adapter

    vmhba35   bnx2i          unbound     iscsi.vmhba35                                              Broadcom NetXtreme II iSCSI Adapter

    vmhba36   ata_piix       link-n/a    sata.vmhba36                                               (0000:00:1f.2) Intel Corporation ICH10 4 port SATA IDE Controller

    [root@vesxvdi01:~]

    [root@vesxvdi01:~] vmkload_mod -s lpfc | grep Version

    Version: 10.4.236.0-1OEM.600.0.0.2159203

    [root@vesxvdi01:~]

    vmkernel.log

    2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover

    2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

    2016-04-05T14:27:06.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

    2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

    2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover

    2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

    2016-04-05T14:27:07.577Z cpu12:33266)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

    2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

    2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover

    2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

    2016-04-05T14:27:07.802Z cpu0:33356)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x2a (0x43a5c0694e40, 53705) to dev "naa.600508b4000af00d0000500001bc0000" on path "vmhba37:C0:T0:L1" Failed: H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:NONE

    2016-04-05T14:27:07.802Z cpu0:33356)ScsiDeviceIO: 2613: Cmd(0x43a5c0694e40) 0x2a, CmdSN 0x80000070 from world 53705 to dev "naa.600508b4000af00d0000500001bc0000" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

    2016-04-05T14:27:08.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

    2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

    2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover

    2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

    2016-04-05T14:27:09.577Z cpu9:33266)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

    2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

    2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover

    2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

    2016-04-05T14:27:10.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

    2016-04-05T14:27:10.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
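To see how the PDL warnings in an excerpt like the one above are distributed, they can be tallied per device and path. The following is only a minimal sketch: the regular expression assumes the `nmp_PathDetermineFailure` message format exactly as it appears in this log, and the function name `count_pdl_errors` is made up for illustration.

```python
import re
from collections import Counter

# Assumes the "PDL error (key/asc/ascq) - path <path> device <device>" layout
# seen in the vmkernel.log excerpt; other builds may log differently.
PDL_RE = re.compile(
    r"PDL error \((0x[0-9a-f]+)/(0x[0-9a-f]+)/(0x[0-9a-f]+)\) - "
    r"path (\S+) device (\S+)",
    re.IGNORECASE,
)

def count_pdl_errors(lines):
    """Return a Counter keyed by (device, path) with one count per PDL warning."""
    counts = Counter()
    for line in lines:
        match = PDL_RE.search(line)
        if match:
            path, device = match.group(4), match.group(5)
            counts[(device, path)] += 1
    return counts

# Three lines copied from the excerpt; only the first and last are PDL warnings.
sample_lines = [
    '2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover',
    '2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.',
    '2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover',
]

for (device, path), n in count_pdl_errors(sample_lines).items():
    print(device, path, n)
```

On a real host the same function could be fed the lines of a copied vmkernel.log file. In this excerpt every PDL warning points at LUN 256 on the same device, alternating between targets T2 and T3.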



  • 2.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 05, 2016 03:39 PM

    Yes, this is a known issue. The logs show the following sense code: failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0. This means the LUN is no longer available or has been unmapped.

    Failover is triggered, but it does not find another available path.

    This table outlines possible SCSI sense codes that determine if a device is in a PDL state:

    SCSI sense code                                   Description
    ------------------------------------------------  --------------------------------------
    H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0  LOGICAL UNIT NOT SUPPORTED
    H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x4c 0x0  LOGICAL UNIT FAILED SELF-CONFIGURATION
    H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x3e 0x3  LOGICAL UNIT FAILED SELF-TEST
    H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x3e 0x1  LOGICAL UNIT FAILURE
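As an illustration of how the table can be applied, the sketch below looks up the sense key / ASC / ASCQ triple from a vmkernel.log line in the four PDL codes listed above. It is an assumption-laden example, not VMware's own logic: the regular expression assumes the `H:... D:... P:... Valid sense data: ...` layout seen in this thread, and `pdl_description` is a hypothetical helper name.

```python
import re

# The four (sense key, ASC, ASCQ) triples from the table, as integers.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x0): "LOGICAL UNIT NOT SUPPORTED",
    (0x4, 0x4C, 0x0): "LOGICAL UNIT FAILED SELF-CONFIGURATION",
    (0x4, 0x3E, 0x3): "LOGICAL UNIT FAILED SELF-TEST",
    (0x4, 0x3E, 0x1): "LOGICAL UNIT FAILURE",
}

# Only lines with "Valid sense data" are considered; "Possible sense data"
# lines are deliberately not matched.
SENSE_RE = re.compile(
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)",
    re.IGNORECASE,
)

def pdl_description(line):
    """Return the table's description if the line carries a PDL sense code, else None."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc, ascq = (int(v, 16) for v in match.groups())
    # The table lists every PDL code with H:0x0 D:0x2 P:0x0.
    if (host, device, plugin) != (0x0, 0x2, 0x0):
        return None
    return PDL_SENSE_CODES.get((key, asc, ascq))

line = ('2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: '
        'Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path '
        '"vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.')
print(pdl_description(line))
```

Lines that report a different device status, or only "Possible sense data", return None and so are not treated as PDL by this sketch.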

    VMware KB: Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x

    VMware KB: SCSI events that can trigger ESX server to fail a LUN over to another path

    You need to request a hotfix from VMware; these are not public and are only available on request. Before that, validate that multipathing is configured correctly.



  • 3.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 27, 2016 10:54 AM

    Hi!

    Did you get this sorted out? We upgraded our test/dev environment and ran into the exact same issue.

    Regards

    Kenth



  • 4.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 27, 2016 06:51 PM

    Experiencing a similar thing on a newly installed host. Any resolution?

    2016-04-27T18:49:03.902Z cpu48:33639)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x85) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T6:L14 device naa.514f0c535640000f - triggering path failover

    2016-04-27T18:49:03.902Z cpu48:33639)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.514f0c535640000f": awaiting fast path state update before retrying failed command again...

    2016-04-27T18:49:04.860Z cpu32:33475)<6>host14: fip: host14: FIP VLAN ID unavail. Retry VLAN discovery.

    2016-04-27T18:49:04.860Z cpu32:33475)<6>host14: fip: fcoe_ctlr_vlan_request() is done

    2016-04-27T18:49:04.890Z cpu10:33481)<6>host15: fip: host15: FIP VLAN ID unavail. Retry VLAN discovery.

    2016-04-27T18:49:04.890Z cpu10:33481)<6>host15: fip: fcoe_ctlr_vlan_request() is done

    2016-04-27T18:49:04.902Z cpu43:55091)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.514f0c535640000f" - issuing command 0x439e538367c0

    2016-04-27T18:49:04.902Z cpu48:33639)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x85 (0x439e538367c0) to dev "naa.514f0c535640000f" failed on path "vmhba2:C0:T6:L14" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.



  • 5.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 27, 2016 06:59 PM

    I found this kb: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2133286

    If I stop smartd, the PDL log entries stop, but I still have this in vmkernel.log:

    2016-04-27T18:57:43.407Z cpu18:33464)<6>host15: fip: host15: FIP VLAN ID unavail. Retry VLAN discovery.

    2016-04-27T18:57:43.407Z cpu18:33464)<6>host15: fip: fcoe_ctlr_vlan_request() is done

    2016-04-27T18:57:45.371Z cpu38:33489)<6>host14: fip: host14: FIP VLAN ID unavail. Retry VLAN discovery.

    2016-04-27T18:57:45.371Z cpu38:33489)<6>host14: fip: fcoe_ctlr_vlan_request() is done



  • 6.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 27, 2016 07:07 PM

    Are you running a 3PAR array as well? ..

    I tried stopping the smart daemon as well, but that didn't help us. A storage refresh on the ESX host took forever even with smartd stopped.

    We got this issue sorted out with this KB :

    VMware KB: Changing the Disk.MaxLUN parameter on ESXi Hosts

    I turned it down to 180 (the right value depends on how many LUNs you have in your environment). In my case this makes the ESX host see only LUN IDs below 180, which means the vVol PE (LUN ID 256) is no longer presented to the host. If you try it, don't forget to reboot the host after you change the value.
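The effect of the workaround can be checked with a little arithmetic on the path names from the logs. This is a minimal sketch under one assumption that should be verified against the Disk.MaxLUN KB: that the host scans LUN IDs 0 through Disk.MaxLUN - 1. The helper names `lun_id_from_path` and `is_scanned` are made up for illustration.

```python
import re

def lun_id_from_path(path):
    """Extract the LUN number from a runtime path name such as 'vmhba2:C0:T2:L256'."""
    match = re.fullmatch(r"vmhba\d+:C\d+:T\d+:L(\d+)", path)
    if match is None:
        raise ValueError(f"unrecognized path name: {path!r}")
    return int(match.group(1))

def is_scanned(path, max_lun):
    """Assumption: LUN IDs 0 .. max_lun - 1 are scanned, so max_lun itself is excluded."""
    return lun_id_from_path(path) < max_lun

# The 3PAR PE path from the first post and the LUN 14 path reported on the
# non-3PAR host, each checked against a Disk.MaxLUN value of 180.
for path in ("vmhba2:C0:T2:L256", "vmhba2:C0:T6:L14"):
    print(path, is_scanned(path, 180))
```

Under that assumption a value of 180 hides the PE at LUN 256, but a device at LUN 14 stays visible, so lowering Disk.MaxLUN only helps when the problem device has a high LUN ID.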

    Regards Kenth



  • 7.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 27, 2016 07:09 PM

    No 3PAR, but all FC storage with EMC XtremIO, EMC VNX, and NetApp. 



  • 8.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 27, 2016 07:19 PM

    Ah okay. Since the vVol PE on a 3PAR has LUN ID 256, I could use the Disk.MaxLUN value to make it disappear.

    The LUN you seem to have a problem with has ID 14, so the workaround I used is probably not the way to go for you.



  • 9.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Apr 27, 2016 07:09 PM

    Did you try to change the Disk.MaxLun value to a value below 256? .. We tried 180 and that did the trick!

    KB

    VMware KB: Changing the Disk.MaxLUN parameter on ESXi Hosts



  • 10.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Broadcom Employee
    Posted Apr 29, 2016 07:48 AM

    Hi guys

    In the meantime I have opened two cases, one with HPE and one with VMware, regarding this issue. The only workaround so far is to limit Disk.MaxLUN to 256 so that the PE device is filtered out. Unfortunately, you cannot use VVOLs with this setting. Disabling the smartd service has no effect on this. I'll let you know if there's a solution.



  • 11.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted May 03, 2016 11:40 AM

    Hi

    I have also seen this during an upgrade from 5.5 to 6.0 Update 2. I had to change Disk.MaxLun to 255 before upgrading; if it's not set, the upgrade stops right at the start, after loading all the modules.

    This is also HPE 3PAR



  • 12.  RE: vSphere 6 U2, PDL errors on VVOL PE device
    Best Answer

    Broadcom Employee
    Posted Jun 22, 2016 09:44 PM

    Hi guys

    Finally, there is a patch available for the 3PAR arrays which fixes this issue by answering the SCSI requests. There are two patches, one for 3.2.1 (P37) and one for 3.2.2 (P24).

    They work correctly.



  • 13.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Posted Jun 23, 2016 08:45 PM

    @EagleB5 - any idea what MU level for P37? Not able to find anything on it...



  • 14.  RE: vSphere 6 U2, PDL errors on VVOL PE device

    Broadcom Employee
    Posted Aug 30, 2016 12:54 PM

    Sorry for the delay, I need to set up notifications for this thread somehow.

    You need 3.2.1 MU3 for P37 or 3.2.2 MU2 for P24.