VMware vSphere

 View Only
  • 1.  High Disk Queue Length on SQL2005

    Posted Sep 09, 2008 10:37 PM

    We have a Virtual SQL2005 server on Windows 2003 which is getting very high Average Disk Queue Lengths and the database drive. It can spike up to 450, and sit around 50-100 for several minutes. Just wondering if anyone has come across this and has any ideas.

    Host Machine: HP Proliant BL480c G1, 8CPU E5320 @ 1.86Ghz, 32Gb RAM

    Virtual Machine: 2003 Standard 32bit, 2vCPU, 3.5Gb RAM

    Database drive is a SAN, fibre attached LUN, 3 disk RAID 5 50Gb. Separate physical LOG and OS drives which perform normally

    Database is 8Gb and no other databases or logs reside on this drive. has a peak throughput of 10MB/s.

    CPU and Memory usage low.

    VMWare Tools is installed.

    Any suggestions appreciated



  • 2.  RE: High Disk Queue Length on SQL2005

    Posted Sep 10, 2008 03:29 AM

    First of all, you should look at the disk

    latency before wasting any time on average disk queue length in a SAN

    environment. But it really does depend on what you are trying to figure

    out. You'll see why in a moment.

    The
    explanation of this counter (from Perfmon) is "Avg. Disk Queue Length
    is the average number of both read and write requests that were queued
    for the selected disk during the sample interval." This is either a
    physical disk or logical disk counter, so the number that you get
    depends on how the underlying storage is presented to the operating
    system.




    f the Avg. Disk Queue Length says 10,
    then the OS queued 10 I/Os to the disk array. Theoretically, that could
    be one I/O to each of the 10 mirrored sets in the stripe, or it could
    have been 10 I/Os all to the one disk. You have no way of knowing which
    it was.

    This is where the
    missing information comes in. Let's assume your stripe size is 64KB,
    your write size is 8KB, and you're going to do a whole chunk of
    sequential writes. This is a typical scenario for SQL Server storage
    activity. In this case, there is a good chance that eight of the I/Os
    all went to the first disk, and the next two I/Os went to the next
    disk. So, if you're trying to work out the disk queue length per disk
    for this scenario, it is 8 for the first disk, 2 for the second disk,
    and 0 for the remaining eight disks in the array.


    Now
    let's change the scenario to a theoretical stripe size of 8KB and a
    write block size of 64KB-and keep the disk queue length at 10. In this
    case, each 64KB block gets spread over 8 disks, so one I/O is written
    to 8 disks and the 10 queued I/Os are spread across 80 disk writes over
    all 10 disks in the array. If you're trying to calculate the disk queue
    length per disk in the array, it's going to be 8 for each disk in the
    array.


    Let's be realistic
    and add another level of uncertainty into the picture. In most cases,
    your SAN storage is going to be connected to the server using one or
    more HBAs in the server, a number of fibers to connect the HBA to the
    SAN, a number of ports on the front end of the SAN, and perhaps some
    kind of fiber switch in the fabric between the server and the SAN. Then
    we get into the internal architecture of the buses inside the SAN, and
    how the disks are connected to the ports on the front of the SAN.


    Any
    queuing you are seeing reported in Perfmon can be a symptom of high
    latency or queuing at any of these points between where the OS measures
    its disk queue length and the surface of the disks. This is why you
    should look at the latency and base any decisions on that counter
    rather than the average disk queue length.




    http://technet.microsoft.com/en-us/magazine/cc162466.aspx



  • 3.  RE: High Disk Queue Length on SQL2005

    Posted Sep 10, 2008 03:39 AM

    A statement was published many years ago that sustained disk queue length greater than 2 is an indication of an I/O bottleneck. This statement is still true if the application is not designed to handle the situation. SQL Server is designed to push disk queue lengths above 2 when it is appropriate.

    SQL Server uses async I/O to help maximize resource usage. SQL Server understands that it can hand off an I/O request to the I/O subsystem and continue with other activity. Let’s look an example of this.

    SQL Server checkpoint posts up to 100 I/O requests and monitors the I/O response time in order to properly throttle checkpoint impact. When the I/O response time exceeds the target the number of I/Os is throttled. The disk queue length can easily exceed 2 and not be an indication of a subsystem problem. SQL Server is attempting to maximize the I/O channel.

    SQL Server does read ahead or tempdb spooling actions. The workers seldom directly wait for the I/O to complete. Instead the workers continue on with other actives taking advantage of CPU processing power instead of waiting for the I/O response. You would not want SQL Server to read a page, add it to a sort, and write the sort page during a large sort. Instead SQL Server will post the sort I/O in motion and return to the next read page and continue sorting. The sort operations track the outstanding I/Os and check completion status at later intervals.

    The next time someone looks at the disk counters on the system and states the disk queue length is above 2 and this is a problem take a minute to look at other data points. The key for SQL Server is the I/O response times. Start by looking at the Average Disk Seconds Per Transfer for the same I/O path. Then look at the SQL Server sysprocesses and waitstats for buffer based I/O wait information. You want to see evidence from SQL Server that the disk queue length is related to an I/O bottleneck. Don’t just take the depth of 2 at face value.



  • 4.  RE: High Disk Queue Length on SQL2005

    Posted Sep 10, 2008 04:31 AM

    Many thanks for your reply, some good reading there and somewhat away from my core skillset.

    The Latency is also high 400ms Average Disk sec/Transfer during heavy times, I was reading it should be below 50ms.

    Stripe size is 128KB the default.

    Maybe this vmware forum is now not the best place for this.



  • 5.  RE: High Disk Queue Length on SQL2005

    Posted Sep 11, 2008 10:30 PM

    I changed the array from a 3 disk RAID5 to a 6 disk RAID1 it the latency and queue length became normal again.

    I've read conflicting opinions on RAID configurations for SQL DBs, but this resolved the issue.

    Thanks for your help RParker.



  • 6.  RE: High Disk Queue Length on SQL2005

    Posted Sep 17, 2008 09:53 AM

    I've recently come accross this issue in the physical world - depending on the type of database you are running , best practices for SQL are to have RAID 1 storage for Log files , tempdb and for sequential write heavy (OLTP type) environments. If you are running a more query based workload (OLAP) then RAID 5 seems to be the better option ( also the Raid 5 Disk group on our EVA's is much larger so actually runs faster than the Raid 1 group )