VMware vSphere

Slow access to datastore on HP ProLiant ML350 Gen10

  • 1.  Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Nov 30, 2021 10:22 AM

    Hi,

    The datastore first slows down, and then no server running on it is accessible. Clone tasks on the datastore run into timeouts. The VMs can't be restarted or shut down. The ESXi host does not respond and the datastore can't be browsed. The only way out is to reset or restart the host via iLO. There are no errors whatsoever in iLO, vCenter, or ESXi monitoring. The iSCSI datastores are still accessible.

    We have reinstalled ESXi and reconfigured the RAID arrays. The machines run on the backup Dell machine with the same ESXi version without any problems.

    There is just one datastore, 2 TB in size. Three VMs, including vCenter, run on the host. The other two are the DC and the app server. The app server is under normal load. We had no problems before. The problem appeared when we moved from 6.5 to some 6.7 update.

    Attached you will find the HDD performance stats (in German), which may point to the storage driver. Also attached is the firmware info.
    Since we have tried everything possible, we have narrowed our problem down to storage.

    Here are the further specs:

     

    • Product Name
      ProLiant ML350 Gen10
    • Server
      ESX7.praxis-bux.local
    • Operating System
      VMware ESXi 7.0.2 Build-17867351 Update 2
    • System ROM
      U41 v2.42 (01/23/2021)
    • System ROM Date
      01/23/2021
    • Redundant System ROM
      U41 v2.22 (11/13/2019)

     

     

    Is anybody having similar issues?

    Thank you in advance for any help in resolving this issue.

     

    Kind Regards

    Sardar



  • 2.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Nov 30, 2021 06:38 PM

    Hello.
    Slow access to internal storage usually depends on the firmware levels of the internal disk controller and of the disks. The firmware level also influences which driver version you should run.
    In your case we see an HPE Smart Array P408i-a SR Gen10 with firmware 3.53.
    The disks have HPD3 firmware; you need to know the part number and model of the disks to check for new firmware levels.
    As you are using version 7.0 Update 2, you should try the latest levels available:
    firmware 4.11 and smartpqi driver version 70.4150.0.119.
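
    To compare the installed driver against the level named above, one option is to filter the output of `esxcli software vib list`. This is only a sketch: it assumes the VIB is named `smartpqi` and that the version sits in the second column of that listing. The helper name `smartpqi_version` is made up for illustration.

```shell
# Read `esxcli software vib list` output on stdin and print the version
# column of the smartpqi VIB (assumption: name in column 1, version in column 2).
smartpqi_version() {
    awk '$1 == "smartpqi" { print $2 }'
}

# Usage on the ESXi host (SSH session as root):
#   esxcli software vib list | smartpqi_version
```

    The printed string should begin with the driver version, usually followed by a build suffix.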

    e_espinel_0-1638297120956.png

     

    https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=43704

    Firmware link:

    https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX-f39382f2be4d450e987c26819e

     

     



  • 3.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 01, 2021 09:19 PM

    Hello,

    Thank you for your reply.

    I will go through the specifics you mentioned and post back.

    Kind Regards

    Khan



  • 4.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 02, 2021 10:22 PM

    Hi,

    We installed the newest available SPP, 2021.10.0, for the ProLiant server.

    With that, we now have the latest firmware, 4.11. This didn't change the situation.

    I have now also updated the smartpqi driver to the latest version. Attached you will find the newest version.

    I will report whether this solves my problem. It takes a day or two to be sure.

    P.S.: I applied the correct firmware image from HP.

    Kind Regards

    Sardar



  • 6.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 03, 2021 12:24 PM

    Hi Enrique,

    Unfortunately, the problem is not resolved even after updating both drivers. Please see the attached file.

    Any further ideas on where to look?

    Kind Regards
    Sardar



  • 7.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 03, 2021 03:04 PM

    Hello.
    Have you updated the firmware of the disks?
    Does the controller have a cache and a battery?
    What are the read/write cache values?
    Do you have the HPE Smart Storage Administrator (HPE SSA) CLI for VMware 7.0 installed?
    attached link:

    If you installed VMware vSphere with the HPE custom image, then you already have the right utility.
    It should be located at a path like one of these:
    /opt/smartstorageadmin/ssacli/bin/ssacli
    /opt/hp/hpssacli/bin/hpssacli
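
    The candidate locations can be probed with a short loop rather than by browsing the file system. This is a minimal sketch for an SSH session on the host; `find_ssacli` is a made-up helper name, and the second path in the usage line is spelled the way older HPE images commonly install it, which is an assumption.

```shell
# Print the first executable SSA CLI found among the candidate paths given
# as arguments; return non-zero if none of them exists.
find_ssacli() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage on the ESXi host (SSH session as root):
#   find_ssacli /opt/smartstorageadmin/ssacli/bin/ssacli /opt/hp/hpssacli/bin/hpssacli
```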

     

     



  • 8.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 11, 2021 02:27 PM

    Hi Enrique,

    No, I have not updated the firmware of the disks. To be honest, I don't know how to do that. Can I do that from iLO?

    I have not seen it physically, but as far as iLO shows, it is supposed to have a cache and a battery. I have attached some screenshots.

    We do have the custom HPE image installed. Where and how should I look for the tools and subdirectories?

    I am not an advanced user, as you can already guess. I would be thankful for any further help.

    P.S.: Is there anywhere in the logs to see why the ESXi host behaves as shown in the image "DISK-IO.jpg"?

     

    Kind Regards
    Sardar



  • 9.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 11, 2021 04:56 PM

    Hello.
    According to what you have sent us:
    Disk model EG001200JWJNQ with HPD4 firmware is at its latest level (2021). Originally the disks had HPD3 firmware; they were upgraded when you applied the latest SPP 2021.
    Controller firmware version 4.11: this is OK.

    If you installed the HPE custom image of VMware ESXi 7.0 Update 2 (Build-17867351), it dates from May 2021 and includes the HPE Agentless Management Bundle for ESXi 7.0.
    HPE has reported problems with AMS (Agentless Management Service) in versions 6.7 and 7.0 and recommends installing the latest version (Nov 2021).

    Here is a link to the latest version:

    https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX_ef3c7b0fd13e4ee486e0263676#tab-history

    Follow these instructions to install it:
    1. Power off any virtual machines that are running on the host and place the host into maintenance mode.
    2. Log in to the ESXi host with an SSH session (you must enable SSH access); the user must be root.
    3. Copy the file xx.zip to an internal ESXi volume.
    4. Run the command
         # esxcli software component apply -d <ESXi local path><component.zip>
    5. After the component is installed, reboot the ESXi host for the updates to take effect.
    6. Log in to the ESXi host, take it out of maintenance mode, and verify its operation.
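
    The command-line part of these steps can be sketched as follows. The helpers default to a dry run that only prints each command, so nothing changes on the host until `DRY_RUN=0` is set. The function names and the datastore path in the usage line are hypothetical, and powering off the VMs and rebooting are deliberately left as manual steps.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Maintenance-mode part of step 1, then step 4.
# $1 is the absolute path to the downloaded component .zip file.
install_component() {
    depot="$1"
    run esxcli system maintenanceMode set --enable true
    run esxcli software component apply -d "$depot"
}

# Step 6, to be run after the reboot from step 5.
leave_maintenance() {
    run esxcli system maintenanceMode set --enable false
}

# Dry-run usage (the datastore and file name are hypothetical):
#   install_component /vmfs/volumes/datastore1/component.zip
```

    Reviewing the printed commands before setting `DRY_RUN=0` gives a chance to catch a wrong path before the host is touched.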

    Since you have the custom HPE image, the utility to manage and monitor the Smart Array controller should be at one of these paths:
    /opt/smartstorageadmin/ssacli/bin/ssacli
    /opt/hp/hpssacli/bin/hpssacli

    These links have more information about HPE SSACLI and its commands:

    https://be-virtual.net/hpe-storage-controller-management-ssacli/

    https://kb.gtkc.net/hp-smart-array-cli-commands/


    Execute these commands and attach the results to this post:
    show config
    show detail
    show config detail
    show status
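
    The four queries can be collected in one pass. This is a sketch that assumes they are issued against all controllers with the `ctrl all` prefix; `collect_ssacli_report` and the output file name are made up for illustration.

```shell
# Run the four requested queries against all controllers and print each
# result under a header line. $1 is the path to the ssacli binary.
collect_ssacli_report() {
    ssacli="$1"
    for query in "show config" "show detail" "show config detail" "show status"; do
        printf '### ctrl all %s\n' "$query"
        # $query is left unquoted on purpose so it splits into separate arguments
        "$ssacli" ctrl all $query
    done
}

# Usage on the ESXi host (the output file name is hypothetical):
#   collect_ssacli_report /opt/smartstorageadmin/ssacli/bin/ssacli > /tmp/ssacli-report.txt
```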

     

     

     



  • 10.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 19, 2021 03:16 PM

    Hi,

     

    Thanks a lot for the instructions. I will give it a shot this week and post back the results.

     

    Kind Regards

    Khan



  • 11.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 19, 2021 09:47 PM

    Hi,

     

    Have you checked the logs for any hints or error messages?
    In my experience, driver or firmware issues can cause such problems, but most of the time they have other causes.

    The vmkernel, vmkwarning, or vobd logs should contain messages from the time the I/Os got stuck.
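
    A quick first pass over those three logs is to count matching lines per file before reading anything in detail. The sketch below assumes the logs live under /var/log on the host; the helper name `search_logs` and the search term in the usage line are examples only, not known error strings from this system.

```shell
# Print, for each readable log file, how many lines match the pattern
# (case-insensitive). $1 is the pattern; the remaining arguments are files.
search_logs() {
    pattern="$1"
    shift
    for log in "$@"; do
        [ -r "$log" ] || continue
        # grep -c exits non-zero when nothing matches, so ignore its status
        hits=$(grep -ci -- "$pattern" "$log" || true)
        printf '%s: %s\n' "$log" "$hits"
    done
}

# Usage on the ESXi host; the search term is only an example:
#   search_logs 'timed out' /var/log/vmkernel.log /var/log/vmkwarning.log /var/log/vobd.log
```

    A file with a non-zero count is the one to open first, around the timestamps when the datastore stopped responding.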

     

    Just my 2 cents.

    Ralf



  • 12.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 31, 2021 09:43 PM

    Hi Ralf,

     

    Thanks for the tip. I have checked the logs, but not thoroughly. Maybe it's time to do that. Do you know what (or which log file) I should look for in particular?

    Kind Regards

    Khan



  • 13.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Dec 31, 2021 09:40 PM

    Hi,

    Apologies for the delay. Here are the results (you will also find them as an attachment):

    [root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config

    HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded) (sn: PEYHC0DRHCXC65)

     

    Internal Drive Cage at Port 1I, Box 3, OK

     

    Internal Drive Cage at Port 2I, Box 0, OK


    Port Name: 1I (Mixed)

    Port Name: 2I (Mixed)

    Array A (SAS, Unused Space: 1 MB)

    logicaldrive 1 (2.18 TB, RAID 1+0, OK)

    physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS HDD, 1.2 TB, OK)

    SEP (Vendor ID HPE, Model Smart Adapter) 379 (WWID: 51402EC0144D1C28)

     

    [root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show detail

    HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
    Bus Interface: PCI
    Slot: 0
    Serial Number: PEYHC0DRHCXC65
    RAID 6 Status: Enabled
    Controller Status: OK
    Hardware Revision: B
    Firmware Version: 4.11
    Firmware Supports Online Firmware Activation: True
    Driver Supports Online Firmware Activation: False
    Rebuild Priority: High
    Expand Priority: Medium
    Surface Scan Delay: 3 secs
    Surface Scan Mode: Idle
    Parallel Surface Scan Supported: Yes
    Current Parallel Surface Scan Count: 1
    Max Parallel Surface Scan Count: 16
    Queue Depth: Automatic
    Monitor and Performance Delay: 60 min
    Elevator Sort: Enabled
    Degraded Performance Optimization: Disabled
    Inconsistency Repair Policy: Disabled
    Write Cache Bypass Threshold Size: 1040 KiB
    Wait for Cache Room: Disabled
    Surface Analysis Inconsistency Notification: Disabled
    Post Prompt Timeout: 0 secs
    Cache Board Present: True
    Cache Status: OK
    Cache Ratio: 10% Read / 90% Write
    Configured Drive Write Cache Policy: Default
    Unconfigured Drive Write Cache Policy: Default
    Total Cache Size: 2.0
    Total Cache Memory Available: 1.8
    Battery Backed Cache Size: 1.8
    No-Battery Write Cache: Disabled
    SSD Caching RAID5 WriteBack Enabled: True
    SSD Caching Version: 2
    Cache Backup Power Source: Batteries
    Battery/Capacitor Count: 1
    Battery/Capacitor Status: OK
    SATA NCQ Supported: True
    Spare Activation Mode: Activate on physical drive failure (default)
    Controller Temperature (C): 68
    Capacitor Temperature (C): 57
    Number of Ports: 2 Internal only
    Encryption: Not Set
    Express Local Encryption: False
    Driver Name: smartpqi
    Driver Version: VMware 70.4150.0.119
    PCI Address (Domain:Bus:Device.Function): 0000:65:00.0
    Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
    Controller Mode: Mixed
    Port Max Phy Rate Limiting Supported: False
    Latency Scheduler Setting: Disabled
    Current Power Mode: MaxPerformance
    Survival Mode: Enabled
    Host Serial Number: CZJ94704DV
    Sanitize Erase Supported: True
    Sanitize Lock: None
    Sensor ID: 0
    Location: Capacitor
    Current Value (C): 57
    Max Value Since Power On: 59
    Sensor ID: 1
    Location: ASIC
    Current Value (C): 68
    Max Value Since Power On: 70
    Sensor ID: 2
    Location: Unknown
    Current Value (C): 53
    Max Value Since Power On: 56
    Primary Boot Volume: None
    Secondary Boot Volume: None

     


    [root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config detail

    HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
    Bus Interface: PCI
    Slot: 0
    Serial Number: PEYHC0DRHCXC65
    RAID 6 Status: Enabled
    Controller Status: OK
    Hardware Revision: B
    Firmware Version: 4.11
    Firmware Supports Online Firmware Activation: True
    Driver Supports Online Firmware Activation: False
    Rebuild Priority: High
    Expand Priority: Medium
    Surface Scan Delay: 3 secs
    Surface Scan Mode: Idle
    Parallel Surface Scan Supported: Yes
    Current Parallel Surface Scan Count: 1
    Max Parallel Surface Scan Count: 16
    Queue Depth: Automatic
    Monitor and Performance Delay: 60 min
    Elevator Sort: Enabled
    Degraded Performance Optimization: Disabled
    Inconsistency Repair Policy: Disabled
    Write Cache Bypass Threshold Size: 1040 KiB
    Wait for Cache Room: Disabled
    Surface Analysis Inconsistency Notification: Disabled
    Post Prompt Timeout: 0 secs
    Cache Board Present: True
    Cache Status: OK
    Cache Ratio: 10% Read / 90% Write
    Configured Drive Write Cache Policy: Default
    Unconfigured Drive Write Cache Policy: Default
    Total Cache Size: 2.0
    Total Cache Memory Available: 1.8
    Battery Backed Cache Size: 1.8
    No-Battery Write Cache: Disabled
    SSD Caching RAID5 WriteBack Enabled: True
    SSD Caching Version: 2
    Cache Backup Power Source: Batteries
    Battery/Capacitor Count: 1
    Battery/Capacitor Status: OK
    SATA NCQ Supported: True
    Spare Activation Mode: Activate on physical drive failure (default)
    Controller Temperature (C): 67
    Capacitor Temperature (C): 57
    Number of Ports: 2 Internal only
    Encryption: Not Set
    Express Local Encryption: False
    Driver Name: smartpqi
    Driver Version: VMware 70.4150.0.119
    PCI Address (Domain:Bus:Device.Function): 0000:65:00.0
    Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
    Controller Mode: Mixed
    Port Max Phy Rate Limiting Supported: False
    Latency Scheduler Setting: Disabled
    Current Power Mode: MaxPerformance
    Survival Mode: Enabled
    Host Serial Number: CZJ94704DV
    Sanitize Erase Supported: True
    Sanitize Lock: None
    Sensor ID: 0
    Location: Capacitor
    Current Value (C): 57
    Max Value Since Power On: 59
    Sensor ID: 1
    Location: ASIC
    Current Value (C): 67
    Max Value Since Power On: 70
    Sensor ID: 2
    Location: Unknown
    Current Value (C): 52
    Max Value Since Power On: 56
    Primary Boot Volume: None
    Secondary Boot Volume: None

     

    Internal Drive Cage at Port 1I, Box 3, OK

    Drive Bays: 4
    Port: 1I
    Box: 3
    Location: Internal

    Physical Drives
    physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS HDD, 1.2 TB, OK)

     

    Internal Drive Cage at Port 2I, Box 0, OK

    Drive Bays: 4
    Port: 2I
    Box: 0
    Location: Internal

    Physical Drives
    None attached


    Port Name: 1I
    Port ID: 0
    Port Mode: Mixed
    Port Connection Number: 0
    SAS Address: 51402EC0144D1C20
    Port Location: Internal

    Port Name: 2I
    Port ID: 1
    Port Mode: Mixed
    Port Connection Number: 1
    SAS Address: 51402EC0144D1C24
    Port Location: Internal

    Array: A
    Interface Type: SAS
    Unused Space: 1 MB (0.00%)
    Used Space: 4.37 TB (100.00%)
    Status: OK
    MultiDomain Status: OK
    Array Type: Data
    Smart Path: disable


    Logical Drive: 1
    Size: 2.18 TB
    Fault Tolerance: 1+0
    Heads: 255
    Sectors Per Track: 32
    Cylinders: 65535
    Strip Size: 256 KB
    Full Stripe Size: 512 KB
    Status: OK
    Unrecoverable Media Errors: None
    MultiDomain Status: OK
    Caching: Enabled
    Unique Identifier: 600508B1001C35C6E76289131F3BF536
    Logical Drive Label: Logical Drive 1
    Mirror Group 1:
    physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
    Mirror Group 2:
    physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
    physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS HDD, 1.2 TB, OK)
    Drive Type: Data
    LD Acceleration Method: Controller Cache


    physicaldrive 1I:3:1
    Port: 1I
    Box: 3
    Bay: 1
    Status: OK
    Drive Type: Data Drive
    Interface Type: SAS
    Size: 1.2 TB
    Drive exposed to OS: False
    Logical/Physical Block Size: 512/512
    Rotational Speed: 10500
    Firmware Revision: HPD4
    Serial Number: WFK54QCF
    WWID: 5000C5009A62B229
    Model: HPE EG001200JWJNQ
    Current Temperature (C): 53
    Maximum Temperature (C): 56
    PHY Count: 2
    PHY Transfer Rate: 12.0Gbps, Unknown
    PHY Physical Link Rate: 12.0Gbps, Unknown
    PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
    Drive Authentication Status: OK
    Carrier Application Version: 11
    Carrier Bootloader Version: 6
    Sanitize Erase Supported: True
    Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
    Unrestricted Sanitize Supported: True
    Shingled Magnetic Recording Support: None
    Drive Unique ID: 5000C5009A62B22B
    Self Encrypting Drive: False

    physicaldrive 1I:3:2
    Port: 1I
    Box: 3
    Bay: 2
    Status: OK
    Drive Type: Data Drive
    Interface Type: SAS
    Size: 1.2 TB
    Drive exposed to OS: False
    Logical/Physical Block Size: 512/512
    Rotational Speed: 10500
    Firmware Revision: HPD4
    Serial Number: WFK58NSL
    WWID: 5000C5009A5A6399
    Model: HPE EG001200JWJNQ
    Current Temperature (C): 58
    Maximum Temperature (C): 61
    PHY Count: 2
    PHY Transfer Rate: 12.0Gbps, Unknown
    PHY Physical Link Rate: 12.0Gbps, Unknown
    PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
    Drive Authentication Status: OK
    Carrier Application Version: 11
    Carrier Bootloader Version: 6
    Sanitize Erase Supported: True
    Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
    Unrestricted Sanitize Supported: True
    Shingled Magnetic Recording Support: None
    Drive Unique ID: 5000C5009A5A639B
    Self Encrypting Drive: False

    physicaldrive 1I:3:3
    Port: 1I
    Box: 3
    Bay: 3
    Status: OK
    Drive Type: Data Drive
    Interface Type: SAS
    Size: 1.2 TB
    Drive exposed to OS: False
    Logical/Physical Block Size: 512/512
    Rotational Speed: 10500
    Firmware Revision: HPD4
    Serial Number: WFK54QE4
    WWID: 5000C5009A62B001
    Model: HPE EG001200JWJNQ
    Current Temperature (C): 57
    Maximum Temperature (C): 59
    PHY Count: 2
    PHY Transfer Rate: 12.0Gbps, Unknown
    PHY Physical Link Rate: 12.0Gbps, Unknown
    PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
    Drive Authentication Status: OK
    Carrier Application Version: 11
    Carrier Bootloader Version: 6
    Sanitize Erase Supported: True
    Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
    Unrestricted Sanitize Supported: True
    Shingled Magnetic Recording Support: None
    Drive Unique ID: 5000C5009A62B003
    Self Encrypting Drive: False

    physicaldrive 1I:3:4
    Port: 1I
    Box: 3
    Bay: 4
    Status: OK
    Drive Type: Data Drive
    Interface Type: SAS
    Size: 1.2 TB
    Drive exposed to OS: False
    Logical/Physical Block Size: 512/512
    Rotational Speed: 10500
    Firmware Revision: HPD4
    Serial Number: WFK54QEN
    WWID: 5000C5009A62AFA1
    Model: HPE EG001200JWJNQ
    Current Temperature (C): 51
    Maximum Temperature (C): 55
    PHY Count: 2
    PHY Transfer Rate: 12.0Gbps, Unknown
    PHY Physical Link Rate: 12.0Gbps, Unknown
    PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
    Drive Authentication Status: OK
    Carrier Application Version: 11
    Carrier Bootloader Version: 6
    Sanitize Erase Supported: True
    Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
    Unrestricted Sanitize Supported: True
    Shingled Magnetic Recording Support: None
    Drive Unique ID: 5000C5009A62AFA3
    Self Encrypting Drive: False


    SEP (Vendor ID HPE, Model Smart Adapter) 379
    Device Number: 379
    Firmware Version: 4.11
    WWID: 51402EC0144D1C28
    Vendor ID: HPE
    Model: Smart Adapter

     


    [root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show status

    HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
    Controller Status: OK
    Cache Status: OK
    Battery/Capacitor Status: OK

    Attachment: ESXiAnalyse.txt (12 KB)


  • 14.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Jan 14, 2022 12:29 PM

    Hi Enrique,

     

    I have uploaded the logs. Is there anything suspicious?

    Kind Regards
    Sardar



  • 15.  RE: Slow access to datastore on HP ProLiant ML350 Gen10

    Posted Jan 14, 2022 01:30 PM

    Hi Sardar,

     

    It looks like you still have an issue here.

    I created a list of questions that should give us a better picture of the current situation, along with some recommendations on what to check now.

    • You mentioned "the iSCSI datastores are still available".
      So the problem only exists on the "local" VMFS datastores and not on those accessed via iSCSI?
    • Cloning VMs might use VAAI commands. Did you check whether VAAI is used and supported by the datastore in question?
    • Is such a task the only "trigger" for the performance problem, or does the problem also appear unexpectedly?
    • What about the reported high temperatures? 50+ Celsius (51 up to 68) seems to be way too high.
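
    To pull just the temperature readings out of the ssacli output, a small filter is enough. This is a sketch: `hot_readings` is a made-up helper name, and the threshold of 60 in the usage line is an arbitrary example, not a vendor limit.

```shell
# Read ssacli output on stdin and print every temperature line whose value
# is at or above the threshold in degrees Celsius (first argument).
hot_readings() {
    awk -v limit="$1" -F': ' '
        /Temperature \(C\)|Current Value \(C\)/ {
            if ($2 + 0 >= limit + 0) print
        }'
}

# Usage on the ESXi host:
#   /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config detail | hot_readings 60
```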

    • Check the vmkwarning, vmkernel, and vobd logs to see whether they report problems related to
      • 600508B1001C35C6E76289131F3BF536  (the naa identifier for the 2.18 TB datastore)
      • search for "performance" in vobd.log to figure out whether datastore performance issues are reported
    • Check the SCSI stats of the controller used to access the datastore