vSphere Storage Appliance

Need help with performance issues with HP EVA 4400

  • 1.  Need help with performance issues with HP EVA 4400

    Posted Jan 17, 2011 01:54 PM

    Hi,

    we currently have the following EVA 4400 configuration, which is used for both a DWH system and our VMware vSphere environment:

    - EVA 4400 (09534000), 4 enclosures, 32x 10k FC 300GB disks, 1 disk group
    - attached to two independent fabrics (2x HP StorageWorks 8/8 SAN switches)

    The EVA is accessed by the following hosts:

    - DWH: 2x DL380 G6 (W2K8 SP2, 1x QLogic FC1242SR)

    Each server accesses two Vdisks on the EVA, one RAID5 for MSSQL data, one RAID1 for MSSQL transaction logs. All four Vdisks are presented to Controller 1 and there are four paths to each LUN. However, the servers are told via HP MPIO DSM Manager to only use Controller 1.

    - VMware: 3x DL380 G6 (ESXi 4.1, 2x QLogic FC1142SR)

    The ESXi servers access four RAID5 Vdisks. All Vdisks are presented to Controller 2 and there are four paths to each LUN. The path policy on the ESXi hosts is set to Round Robin which, because of ALUA, only chooses the two paths to Controller 2 as active paths. Each Vdisk holds 6-8 virtual machines (mainly Windows), 27 overall.

    From time to time we have trouble with slow response times of virtual machines. In fact, this happens every time the DWH servers generate what looks to me like heavy load on the EVA. As I am no expert in debugging storage performance, I am not sure whether it really is heavy load, but when looking at EVAperf during those times I can see the following:

    http://www.abload.de/image.php?img=evaperf_12o78.png
    http://www.abload.de/image.php?img=evaperf_2qo46.png

    As you can see, compared to the VMware Vdisks there is a lot of I/O on one DWH Vdisk. On the second screen you can see that there is a lot of load on Controller 1, which is serving the DWH Vdisks. Controller 2, which is used for VMware Vdisks only, is nearly idle compared to Controller 1.

    Even though Controller 2 is nearly idle and there is not much I/O happening on the VMware Vdisks, all VMs feel extremely sluggish during these times. For example, when working via RDP on a Windows VM, it feels like working on a computer whose virus scanner is running and slowing the hard disk down: opening the Control Panel takes seconds and you can watch every single icon appear slowly. The impact on end-user applications running in these VMs is noticeable (though not in all cases) but not a critical issue so far.

    The problem affects all VMs that are stored on the EVA. VMs running on the same hosts but stored on an MSA2312fc are not affected and continue to run fine. The problem disappears immediately when the I/O on the DWH Vdisks lowers.

    While the problem occurs, the disk latency of an ESXi host fluctuates between 10 and 50 ms, which is higher than usual (between 10 and 15 ms):

    http://www.abload.de/image.php?img=esx_1zhd2.png

    Is there a way to debug this deeper to find out what exactly is limiting here? As I have written above, I am no expert in measuring and analyzing storage performance. But according to my tests the problems are definitely caused by the storage system.

    Maybe 32 spindles are just too few to serve 27 (even though lightly utilized) VMs and one fully loaded DWH system?

    Any suggestions are highly appreciated!
    Michael



  • 2.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 21, 2011 09:32 AM

    You can compare your performance with this white paper: [http://h20195.www2.hp.com/V2/GetPDF.aspx/4AA1-8473ENW.pdf]

    In Figure 2 you can see the performance of an EVA4400 with 96 15k disks, so you must divide the values by at least 3, because you only have 32 10k disks. Your DWH is doing 50% read and 50% write IOs with a block size of 64 KB. All your IOs are "misses", so you have no sequential reads; all reads are random, so you get no benefit from the cache (your prefetch rate is 0). The performance in the graph of Figure 2 is for 60% read and 40% write with 8 KB IO size and RAID 5. Because of the larger IO size and the higher write ratio of your DWH you can't reach a third of these values (you are also using only 10k disks, which makes it even worse). So I'm sure this is the maximum your EVA can do with this configuration.

    If you buy more spindles, your DWH may run even faster, but your problem will stay the same. So I would try to reduce the queue depth of the Fibre Channel adapters in your DWH hosts to a very low value (8 or 4, maybe even try 1) to reduce the IO load. Then you can hope that the VMware environment gets more resources and performs much better. As you can see in the white paper, the latencies grow very fast beyond a specific point of load. You can also try to set the speed of the Fibre Channel adapters in your DWH hosts to 1 Gbit to reduce the load.

    For a long-term solution I would virtualize the DWH; then you can reduce the IO load with the new feature in V4.1 (Storage IO Control). You can then set any IO limit, even for a raw device, and so reduce the load to the point where the DWH doesn't cause any performance penalties for your other applications. (But you need an Enterprise Plus license for this feature.)

    I hope you have success and that this helps you a little bit.

    Paul



  • 3.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 21, 2011 12:40 PM

    Thanks a lot!

    So bottom line: either throttle the DWH systems (the quick and dirty way) or create a second disk group. However, we only have 16 free slots left in the EVA and it is questionable whether 16 additional disks would deliver enough performance for both DWH systems. The other option would be to additionally ungroup disks from the current group and add them to the new one, so we would end up with two groups of 24 disks each. But as the vSphere environment grows quickly, I am not sure whether this will lead to similar performance problems when more virtual machines are in place. Another option would be to buy additional shelves and more disks, which however would be quite expensive!

    Maybe we should go for another MSA2312fc, completely move the DWH systems to it and keep the EVA just for VMware. With the MSA, we could create two conventional RAID5 arrays of e.g. 12 disks each, one for each DWH system.

    I'm not sure at this point; I guess I will have to talk to the DWH team. Maybe they don't need that much performance at all and we can go for another 16-disk DG or even for local storage in the DL380 G6... (hey, that would be great!).

    Virtualizing the DWH systems is not possible, as they are both equipped with 64 GB RAM each (and utilize it!) and the ESXi servers only have 64 GB as well.

    I have a few additional (for me important) questions:

    1. Are the total IOPS (referenced as Throughput IO/s in the white paper) calculated by adding the Write Req/s and Read Miss Req/s values in EVAperf?

    2. Does the growing latency in the white paper refer to the Read Miss Latency (ms) and Write Latency (ms) values in EVAperf? As can be seen in the screenshot, there is one VMware LUN with 40ms Read Miss Latency (ms) and one with 80ms. I guess this is an indicator for the poor performance?

    3. As I have written, I am not an expert in measuring and analyzing storage performance. Actually, this is the first time I am confronted with it. Is there a good document that clearly describes the impact of changing the queue depth?

    Thanks

    Michael



  • 4.  RE: Need help with performance issues with HP EVA 4400

    Broadcom Employee
    Posted Jan 21, 2011 06:11 PM

    Why not add additional disks, create a second disk group and move the DWH Vdisks to it, so that at least they aren't hitting the same spindles?

    Duncan (VCDX)

    Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive



  • 5.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 21, 2011 06:45 PM

    Thanks Duncan. As written above, I am considering this but am not sure yet. I need to discuss this with the DWH team first; moving the DWH databases to local storage in the DWH servers might really be an option, as apart from performance there is no need for them to reside on the EVA. Investing in local storage is cheaper than buying additional drives for the EVA (even more so when additional enclosures are needed as well).

    Regarding Storage I/O Control (seems to have been the content of the post before you edited it) I actually have read your article this morning, but unfortunately this isn't an option for us as we have vSphere Advanced.

    I guess best practice would really be to have the EVA dedicated for the vSphere environment, especially when considering the growth of the environment (it is growing faster than expected in the first place).



  • 6.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 24, 2011 10:42 AM

    Making two disk groups is of course a way to separate the load, but with such a low number of disks it's a waste of capacity and in my opinion not a best practice on an EVA with only 32 equally sized disks.

    1. Are the total IOPS (referenced as Throughput IO/s in the white paper) calculated by adding the Write Req/s and Read Miss Req/s values in EVAperf?

         Yes, in the graph you will see the sum of 60% reads and 40% writes (labeled as OLTP).

    2. Does the growing latency in the white paper refer to the Read Miss Latency (ms) and Write Latency (ms) values in EVAperf? As can be seen in the screenshot, there is one VMware LUN with 40ms Read Miss Latency (ms) and one with 80ms. I guess this is an indicator for the poor performance?

         Yes, you are absolutely right. Every latency above 10 ms on the EVA indicates an overload situation.

    3. As I have written, I am not an expert in measuring and analyzing storage performance. Actually, this is the first time I am confronted with it. Is there a good document that clearly describes the impact of changing the queue depth?

         There is some documentation on the Storage IO Control feature, which works by reducing the queue depth. But you can also set an IOPS limit for a specific disk to any value you want. So in your example you could set the IOPS value for your DWH disks to maybe 500; then your DWH systems can't generate more load than this. Then you can tune it to the value you need, so that you satisfy both the runtime requirements of the DWH application and the performance of your VMware environment. The value can very easily be changed online and you can vary it for special purposes, such as backup windows. So I would compare the costs of virtualizing the DWH servers (maybe with a memory upgrade) with the costs of more disks; then you can decide what is best for you.

    Here are the links to some documentation for SIOC:

    [http://www.vmware.com/files/pdf/techpaper/VMW-vSphere41-SIOC.pdf]

    [http://www.vmware.com/files/pdf/techpaper/vsp_41_perf_SIOC.pdf]

    Paul



  • 7.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 24, 2011 11:25 AM

    Thanks Paul, your information is very helpful! If possible, I have one more question.

    In figure 2 of the EVA performance whitepaper I can see that the 96x 15k disk configuration reaches around 10,000 IOPS at 15ms response time with RAID 5 and 8K 60% read / 40% write random transfers. In my scenario (32x 10k disks), the DWH systems are generating about 3,000 IOPS at about 18ms response time with RAID5 and 64k 50% read / 50% write random transfers. This is a bit less than 1/3 of the 96-disk performance and indicates that we're just maxed out (as you have written as well).
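
    As a rough sketch of that comparison: the snippet below assumes that performance scales linearly with the number of disks and ignores the differences in disk speed, block size and read/write mix. The function name is made up for illustration; the numbers are the ones quoted above.

```python
def scale_iops_by_disks(reference_iops, reference_disks, actual_disks):
    """Scale a reference IOPS figure linearly with the number of disks."""
    return reference_iops * actual_disks / reference_disks


# Read from Figure 2 at about 15 ms (96x 15k disks, RAID 5, 8 KB, 60/40)
whitepaper_iops = 10_000
# DWH load seen in EVAperf at about 18 ms (32x 10k disks, RAID 5, 64 KB, 50/50)
measured_iops = 3_000

expected_iops = scale_iops_by_disks(whitepaper_iops, reference_disks=96, actual_disks=32)

print(f"scaled whitepaper value: {expected_iops:.0f} IOPS")
print(f"measured / scaled:       {measured_iops / expected_iops:.0%}")
```

    This only puts the two figures on the same disk count; it does not say how much the larger block size or the slower disks should cost.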

    So my general question is: is this the correct way to compare the EVA results with the ones in the white paper, i.e. always add up all IOPS and then compare with the values at the same response time? And the 64K block size you mentioned, I guess this is because we are using Windows?

    One last thing: you wrote that I can limit a disk to a given number of IOPS. I guess you are referring to a setting in Windows? I don't see any option in Command View that lets me do this for a given Vdisk!

    Thanks!



  • 8.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 24, 2011 12:02 PM

    So my (general question) is...is this the correct way to compare the EVA results to the one in the white paper?

         Yes, the performance of the EVA4400 scales linearly with the number of disks you have. Your EVA can't do any more; your performance data are very good.

    I think the 64 KB block size may come from the DB application; it's the default for SQL Server (I don't know what you are using). You reach nearly 1/3 of the maximum number of IOs with an 8 times larger IO size, so you have reached the limit of this configuration.

    The IOPS limit is a new feature in VMware V4.1 with an Enterprise Plus license. Neither the EVA nor Windows has an option to reduce the number of IOs. The only way to reduce IO is the queue depth in the Fibre Channel adapter. But you can't control this very precisely, so I would prefer virtualizing the DWH application, although I think you are not a fan of this solution... But this new feature is really great and you can save a lot of money when you have the option to reduce the IOs exactly to the value you want. Every time you buy more disks, your application gets faster. But do you really want this? In many cases it's no problem when a DWH application runs for a few hours. With more disks it runs faster, and you have the same performance problem as before with your other applications. So the IO limit is a very powerful feature to work against this problem, and instead of buying more disks you can tune the runtime of your IO-intensive applications.

    Paul



  • 9.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 24, 2011 12:27 PM

    Thanks again! I thought you were assuming the 64K block size? I can't see anything indicating the block size in EVAperf?



  • 10.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 24, 2011 01:54 PM

    You see 1411 IOs per second, which make 92.5 MB/s, so dividing the throughput by the number of IOs gives me the size of one IO, which is about 64 KB.
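
    The same division as a small sketch. The function name is made up for illustration, and whether EVAperf counts a MB as 1000 or 1024 KB is an assumption here, so both are shown. Either way the result lands at roughly 65 to 67 KB, and the nearest common IO size is 64 KB.

```python
def average_io_size_kb(throughput_mb_s, iops, kb_per_mb=1024):
    """Average size of one IO in KB, from throughput (MB/s) and IO rate (IO/s)."""
    return throughput_mb_s * kb_per_mb / iops


# Values read from the EVAperf screenshot
throughput_mb_s = 92.5
iops = 1411

for kb_per_mb in (1000, 1024):  # assumption: the unit convention is not known
    size_kb = average_io_size_kb(throughput_mb_s, iops, kb_per_mb)
    print(f"1 MB = {kb_per_mb} KB -> {size_kb:.1f} KB per IO")
```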



  • 11.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 24, 2011 02:12 PM

    Thanks, why didn't I think of this! :-)



  • 12.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 26, 2011 11:00 AM

    I've got another question regarding the IOPS values and the influence of the block size. As discussed, with 32 10k disks we reach 3,000 IOPS at 18 ms response time with RAID 5 and 64k block size, 50% read / 50% write random transfers.

    In figure 2 of the EVA performance whitepaper we can see that the 96x 15k disk configuration reaches around 10,000 IOPS at 15ms response time with RAID 5 and 8K block size, 60% read / 40% write random transfers. So with 1/3 of the disks it should be 3,333 IOPS.

    What influence does the block size actually have? I am wondering because the IOPS values are almost identical, even though we are using a much larger block size (64k instead of 8k). Is there a formula to calculate this?



  • 13.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 28, 2011 08:58 AM

    That's an interesting question; I can't answer it. But you can test it very easily with IO-Meter. I ran a test in my environment with an EVA5000 and a disk group with 80x 140 GB 15k disks, with RAID1 and RAID5, 16 outstanding IOs, 50% read, 50% write and 100% random access:

    RAID level   Block size   IOPS   Throughput   Latency   EVA CPU
    RAID1          8 KB       5888    46.0 MB/s    2.7 ms     25 %
    RAID1         64 KB       2403   150.2 MB/s    6.7 ms     28 %
    RAID5          8 KB       5500    43.0 MB/s    2.9 ms     29 %
    RAID5         64 KB       1670   104.4 MB/s    9.6 ms     34 %

    So with IO-Meter you have your own results in a few minutes and can compare.
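
    As a plausibility check, the measurements above also fit Little's law (outstanding IOs = IOPS x latency). That is a general queueing relation and nothing EVA-specific. The sketch below uses only the IO-Meter values listed above; the function name is chosen for illustration.

```python
# (label, measured IOPS, measured latency in ms) from the IO-Meter runs above
RUNS = [
    ("RAID1  8 KB", 5888, 2.7),
    ("RAID1 64 KB", 2403, 6.7),
    ("RAID5  8 KB", 5500, 2.9),
    ("RAID5 64 KB", 1670, 9.6),
]
OUTSTANDING_IOS = 16  # IO-Meter setting used for all four runs


def iops_from_littles_law(outstanding_ios, latency_ms):
    """Little's law rearranged: IOPS = outstanding IOs / latency in seconds."""
    return outstanding_ios / (latency_ms / 1000.0)


for label, measured_iops, latency_ms in RUNS:
    predicted = iops_from_littles_law(OUTSTANDING_IOS, latency_ms)
    print(f"{label}: measured {measured_iops} IOPS, predicted {predicted:.0f} IOPS")
```

    The same relation shows why a lower queue depth on a host caps the load it can generate: with fewer IOs in flight at a given latency, fewer IOPS are possible.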

    Paul



  • 14.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 28, 2011 09:25 AM

    Thanks. Are these results shown by EVAperf or by IO-Meter?

    Generally, I am really surprised that there is not much information about the impact of the block size on IOPS, e.g.:

    http://www.techrepublic.com/blog/datacenter/calculate-iops-in-a-storage-array/2182
    http://blog.aarondelp.com/2009/10/its-now-all-about-iops.html
    http://www.yellow-bricks.com/2009/12/23/iops/
    http://virtuall.eu/download-document/vdi-storage-deep-impact

    All articles mention a number of IOPS that is possible with a drive and also mention the impact of the RAID write penalty. However, let's say a 10k RPM SAS drive really delivers 125 IOPS. But at what block size? 125x8KB would be 1MB/s, whereas 125x64KB would be 8MB/s.

    Or is 125 IOPS just the value that "mechanically" is possible?
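
    The arithmetic behind those two figures, as a minimal sketch. The 125 IOPS per drive is the assumed value from the articles, not a measurement, and 1 MB is counted as 1000 KB to match the rounded numbers above.

```python
def throughput_mb_s(iops, block_size_kb, kb_per_mb=1000):
    """Throughput in MB/s for a given IO rate and block size."""
    return iops * block_size_kb / kb_per_mb


DRIVE_IOPS = 125  # assumed per-drive figure for a 10k RPM drive

for block_size_kb in (8, 64):
    mb_s = throughput_mb_s(DRIVE_IOPS, block_size_kb)
    print(f"{DRIVE_IOPS} IOPS x {block_size_kb} KB = {mb_s:.0f} MB/s")
```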

    So generally, applications which need to read and write small amounts of data (like databases and email systems) will always perform slower and will need more spindles. Applications which read and write larger amounts of data, like file servers, will always perform better and can also be realized with fewer disks. In addition, if I understand correctly, workloads with small block sizes always seem to be random workloads, whereas workloads with larger block sizes seem to be sequential workloads?

    But then again, I still don't understand the following:

    Our configuration: 32x 10k rpm disks, 50% read / 50% write, 64KB block size, RAID5 == 2,900 IOPS at 15ms

    EVA best practice whitepaper config: 96x 15k rpm disks, 60% read / 40% write, 8KB block size, RAID5 == 10,000 IOPS at 15ms

    1/3 of 10,000 IOPS would be 3,333 IOPS (32 instead of 96 disks).

    That makes 2,900 IOPS with 64KB block size (our EVA) versus 3,333 IOPS with 8KB block size (whitepaper EVA). Shouldn't the EVA configuration in the whitepaper perform much better with 8KB? I mean, it actually seems to be nearly the same, and the EVA in the whitepaper uses 15k rpm disks and a higher share of reads!



  • 15.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 28, 2011 10:33 AM

    Are these results shown by EVAperf or by IO-Meter?

         The results are from IO-Meter, except the EVA CPU value, which is the average value from perfmon.

    Is 125 IOPS just the value that "mechanically" is possible?

         It's a value that is calculated from the average seek time and the average rotational latency, so it is an average and not an exact value. A disk has much more throughput at the beginning and gets slower towards the end of the blocks. It's not like a CD-ROM, which rotates faster towards the center of the disc; the rotation speed is fixed, so the throughput drops towards the end of the disk. So the first volume you create on the EVA will be faster than the later ones. But because of the virtualization and the leveling you can't say exactly where your data is on the EVA. With more data on your disks, your performance will generally drop. So you can't compare the values of different EVAs exactly; it always depends on what you have done with your EVA before and how much space you have used. But as I have mentioned before, your values are very good and I don't think you can get more out of your configuration. So instead of comparing performance data from different EVAs I would measure your own configuration with IO-Meter; then you have the best results for your decisions.
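
         A minimal sketch of that calculation. The seek times below are assumed example values, not data sheet figures for the disks in this thread; only the rotational latency follows directly from the RPM (half a revolution on average).

```python
def estimated_disk_iops(rpm, avg_seek_ms):
    """Rough random IOPS of one disk: 1 / (average seek + average rotational latency)."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, in ms
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


# Assumption: average seek times chosen for illustration only
examples = [
    ("10k rpm", 10_000, 5.0),
    ("15k rpm", 15_000, 3.5),
]

for label, rpm, seek_ms in examples:
    print(f"{label}: about {estimated_disk_iops(rpm, seek_ms):.0f} IOPS per disk")
```

         With an assumed 5 ms average seek, a 10k rpm disk comes out at the 125 IOPS mentioned in the question. The block size does not appear in this formula, because for small random IOs the time is dominated by positioning the head, not by transferring the data.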

    Paul



  • 16.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 21, 2011 11:39 AM

    I would suggest adding more spindles... 32 sounds like too few. You can create disk groups of up to around 120 disks before HP recommends creating a new disk group.

    In a disk group all disks are actively used by the Vdisks, and the Vdisks are spread over all disks. If you create two disk groups to separate the load and gain performance that way, you will at the same time face a performance loss because each group has fewer spindles. So that might not be the best solution.

    AWo



  • 17.  RE: Need help with performance issues with HP EVA 4400

    Posted Jan 24, 2011 03:13 PM

    I think we are going for the following now:

    - increase the EVA from 32 disks to 40 disks

    - use the additional space to convert our four 512GB RAID5 VMFS LUNs to RAID1

    As can be seen in the HP performance whitepaper, random performance is much better with RAID1. If this doesn't help, we will buy eight more drives (going to 48 then) and split up the disk groups.