VMware vSphere


Slow Read/Write Performance over iSCSI SAN.

  • 1.  Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 03:27 AM

    This is a new setup of ESXi 4.0 running VMs off of a Cybernetics miSAN D iSCSI SAN.

    Doing a large data read test on a VM, it took 8 minutes, versus 1.5 minutes for the same VM located on a slower VMware Server 1.0 host with the VM on local disk. I'm watching my read speeds from the SAN, and it's getting just over 3MB/s max read, and Disk Usage on the VM matches at just over 3MB/s... horribly slow.

    The server and SAN are both connected to the same 1Gb switch. I have followed this guide

    virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html

    to get multipathing set up properly, but I'm still not getting good performance with my VMs. I know the SAN and the network should be able to handle over 100MB/s, but I'm just not getting it. I have two GbE NICs on the SAN multipathed to two GbE NICs on the ESXi host, one NIC per VMkernel port. Is there something else I can check or do to improve my speed? Thanks in advance for any tips.
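
    For reference, the binding step from that guide boils down to roughly the following from Tech Support Mode (or the vSphere CLI). This is only a sketch of what I ran - vmhba33 is the software iSCSI adapter on my host and vmk1/vmk2 are my two iSCSI VMkernel ports; yours may be named differently:

        # list VMkernel ports and the software iSCSI adapter name
        esxcfg-vmknic -l
        esxcfg-scsidevs -a

        # bind one VMkernel NIC per path to the software iSCSI adapter
        esxcli swiscsi nic add -n vmk1 -d vmhba33
        esxcli swiscsi nic add -n vmk2 -d vmhba33

        # confirm both vmknics are bound
        esxcli swiscsi nic list -d vmhba33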



  • 2.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 08:51 AM

    My advice would be to simplify everything as much as possible before attempting multipathing. Use a single NIC (at both ends), disable jumbo frames throughout, create a new LUN and test that. As you say, a single GigE link should be good for about 110MB/s anyway.

    Please award points to any useful answer.



  • 3.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 12:31 PM

    That SAN hardware is certified for VMware, so get your support to look into it. A common cause of bad performance is overloading the interfaces of the SAN hardware: if you have multiple connections to the same SAN, not all of them can be served at maximum speed.

    Also, your local disk will always be faster than your SAN in this setup, because even a SATA disk has a maximum of 3Gb/s of bandwidth to itself, so your SAN over GigE will never match the speed of your local disks. You are also using Ethernet instead of Fibre Channel, which doesn't help performance either.

    You use a SAN not only for speed, but to have a centrally managed place where you can put all your important data and make sure a suitable RAID level is applied. There are also features like replication, which is one of the advantages of having a SAN.

    from http://serverfault.com/questions/106352/slow-read-write-performance-over-iscsi-san

    Starwind Software Developer

    www.starwindsoftware.com



  • 4.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 12:37 PM

    Ok, thanks. I'll try some of those things. Is there a best practice for testing data throughput between a SAN and Host?



  • 5.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 12:51 PM

    Take a look at IOMeter for that.


  • 6.  RE: Slow Read/Write Performance over iSCSI SAN.
    Best Answer

    Posted Jan 26, 2010 01:57 PM

    Another vote for IOMeter.

    Try a 32K, 100% sequential read test (and then write) with 64 outstanding IOs; this will give you sequential performance. It should be close to 100MB/s per active GigE path, depending on how much the storage system can push out.

    Then a 32K, 0% sequential (i.e. random) read (and then write) test with 64 outstanding IOs against a good-sized test LUN (say 4GB+) will give a value for IOPS, which is the main driving factor for virtualisation. Look at the latency; this usually needs to stay below about 50ms, so you can work out whether the default 32 outstanding IOs (per host) is OK (say you had six hosts: the array would need to deliver random IO with latency <50ms at 192 outstanding IOs (=32*6)).

    Don't use the 'test connect rate' option, as this effectively tests only cached throughput, which we're not so interested in anyway.
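
    If you have a Linux VM handy and prefer a command line, roughly the same workloads can be driven with fio instead of IOMeter - just a sketch, assuming fio with the libaio engine is installed in the guest and /mnt/iotest sits on the LUN under test:

        # 32K sequential read, 64 outstanding IOs (switch --rw to write for the write pass)
        fio --name=seqread --filename=/mnt/iotest/testfile --size=4g \
            --rw=read --bs=32k --iodepth=64 --direct=1 --ioengine=libaio \
            --runtime=60 --time_based

        # 32K random read, 64 outstanding IOs (randwrite for the write pass); watch IOPS and latency
        fio --name=randread --filename=/mnt/iotest/testfile --size=4g \
            --rw=randread --bs=32k --iodepth=64 --direct=1 --ioengine=libaio \
            --runtime=60 --time_based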

    Please award points to any useful answer.



  • 7.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 03:52 PM


    Multipathing may be causing your issue. Are you able, and have you tried, to disable multipathing and just have one 1Gb connection to your SAN? VMware may be path thrashing when put under load because of a bad link, or a delay in packet delivery...


    BTW, your maximum throughput with a single 1Gb link is going to be roughly 110MBytes/sec, and that's if your SAN and ESXi host were the only two devices on that link...

    StarWind Software R&D



  • 8.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 05:36 PM

    Ok, over a single connection I'm getting 117MB/s read through IOMeter. I enabled multipathing, and with 64 outstanding IOs I'm now getting:

    135 MB/s 100% Sequential Read - 17ms response time

    132 MB/s 100% Sequential Write - 15ms response

    26 MB/s 100% Random Read - 76ms response

    15 MB/s 100% Random Writes - 140ms response

    Are the random numbers ok?

    I believe my initial assumption was mistaken. The operation I was performing still isn't using more than 10MB/s; it is 100% read, and probably mostly random. Is there any setting that throttles a VM's disk access speed?

    Off-topic question: should I have the swapfile located on the SAN with the VMs, or on a local 7200rpm datastore?

    Thanks for all your help, I'm getting a much better picture of what's going on.



  • 9.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 08:57 PM

    Glad things are progressing. Your results show that the array cannot handle that level of outstanding IOs with reasonable latency - I would re-run the random IO tests, reducing the outstanding IOs until the write latency is about 50ms to find this (I guess about 22).

    Then consider how many hosts will be accessing each LUN. If you have two hosts potentially accessing this LUN, you would probably want to limit each of their queues to, say, 11 or 12 via Disk.SchedNumReqOutstanding in the advanced settings in ESX.
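
    If it helps, that setting can also be changed from the console rather than the GUI - a rough sketch, assuming the stock advanced-settings path (check the current value first):

        # show the current per-LUN outstanding request limit
        esxcfg-advcfg -g /Disk/SchedNumReqOutstanding

        # drop it to 12 so two hosts sharing the LUN stay within the array's comfort zone
        esxcfg-advcfg -s 12 /Disk/SchedNumReqOutstanding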

    HTH



  • 10.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 09:03 PM

    To add (sorry!): from the numbers I guess this is a RAID-5 array? Assuming this is all still in test at the moment, if you can afford the space difference it may be worth reconfiguring the iSCSI device as RAID-10 and re-running the tests; the difference can be dramatic, especially for writes. As a point, random tests are usually measured with the IOPS number rather than MB/s.

    Re swap files, there are several considerations. Perhaps most importantly, if the vswap is on local storage, the VM cannot be vMotioned or recovered by HA. Also, vswap is really a last resort for an ESX box short of physical RAM; I would make mighty sure it is never normally used, since the massive disk IO will absolutely destroy the performance of everything running. This is probably why it's always recommended to allocate only the minimum RAM that a VM needs. Also make sure VMware Tools are installed, since the balloon driver provides ESX with a much more gentle mechanism to get RAM back from VMs (using the running OS's native paging capabilities).

    Again, HTH :smileyhappy:



  • 11.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 26, 2010 10:49 PM

    You're good Jimbo... 22 IOs brought it right to 50ms. It is RAID 5, and I unfortunately can't sacrifice the space. Perhaps next year I'll get new drives and RAID 10 it. It will be one host accessing it at a time, with about 6 VMs on it. I have another host that will be able to access it, but in most cases it won't unless the main server goes down for maintenance. Do you think this performance will be OK for 5 or 6 VMs on a single 350GB LUN? The VMs will be a web server, an accounting server, a file server, and a couple of servers with very low resource needs. In hindsight this may not have been a good plan, but the file server will have its files located on the same SAN, just on a different 1.5TB LUN. I'm stuck doing it this way for now, so hopefully it's not horribly slow.



  • 12.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 12:33 AM

    Nothing in your list looks to be overly IO intensive. Since you don't mention databases or Exchange servers, it's a stretch to suggest RAID 5 will perform too poorly for this scenario. If there is an issue, it's likely somewhere else.

    What sort of switch are you using? Are your server's NICs dedicated to iSCSI, and are they on a separate VLAN/physical network from your data LAN?

    Edit: You mentioned using the same switch, so they obviously aren't physically separated. If you haven't separated the VLANs, I would suggest looking there.



  • 13.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 03:27 AM

    We use several MySQL databases for web programs, but only the main website database is heavily trafficked... and I use the word 'heavily' loosely.

    I'm using an HP ProCurve 1400 24-port GigE switch, unmanaged. I have two NICs dedicated to iSCSI, both on separate subnets, plugged into the same switch, and two NICs on the iSCSI SAN with subnets to match each of those. The only thing between the host and the iSCSI SAN is that one switch, and the only things on those two subnets are those 4 NICs. However, that switch also passes packets for my data LAN, which is on a different subnet altogether. I could get a separate 8-port GigE switch just for the host and SAN if you think the 24-port switch may slow things down, since it's used for my data LAN as well. There are only 7 or 8 things plugged into that switch in total; most of our data traffic happens in the building across the way. VLANs are something I don't fully understand - is there a good write-up on them somewhere? Thanks for your suggestions.



  • 14.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 03:52 AM

    Yeah, I agree. Any other new alternative solutions out there?



  • 15.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 04:03 AM

    Yeah, I agree. Any other new alternative solutions out there?

    What exactly are you agreeing with? That the original poster has a performance issue?

    Edit: An alternative to iSCSI?

    Sure: Fibre Channel and 10GbE iSCSI. The former has been around for a while.

    Message was edited by: Josh26



  • 16.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 03:53 AM

    What you are describing is a network where both data and iSCSI traffic flow over one network.

    I can't say this is definitely your problem, but it's not best practice and it's something you should look to fix. By placing the storage network on a separate VLAN, you effectively break up your switch into two smaller ones, fixing this issue. However, as you have said your switch is unmanaged, that won't be an option.

    Utilising a separate switch would be useful, but be careful here too. If a smaller switch happens to be a cheaper switch, you'll run into the limits of its capabilities. Really, you need something with flow control, jumbo frames and a fast backplane to run iSCSI with decent performance.



  • 17.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 04:02 AM

    Ok, I'll look into getting a decent separate switch. I read that ESXi doesn't support jumbo frames, so I disabled them on my SAN. I would love to be able to use them, though, if I'm mistaken.



  • 18.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 04:05 AM

    Ok, I'll look into getting a decent separate switch. I read that ESXi doesn't support jumbo frames, so I disabled them on my SAN. I would love to be able to use them, though, if I'm mistaken.

    Refer here:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012454

    ESXi free does not support Jumbo Frames - although you can actually enable them during your initial evaluation period and have them stay enabled.

    That said, with the network you describe, you would benefit from a purchased version of ESXi, and any edition will provide this support. Jumbo frames make a world of difference to performance with iSCSI.
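
    If you do go that route, enabling jumbo frames end to end on ESX/ESXi 4.0 is a console job - a sketch only, assuming a vSwitch called vSwitch1 and an iSCSI port group called "iSCSI-1" (the names and addresses here are made up; the VMkernel port has to be recreated to change its MTU, and the physical switch and SAN ports need MTU 9000 as well):

        # raise the MTU on the vSwitch carrying the iSCSI VMkernel ports
        esxcfg-vswitch -m 9000 vSwitch1

        # recreate the VMkernel port with a 9000-byte MTU
        esxcfg-vmknic -d "iSCSI-1"
        esxcfg-vmknic -a -i 10.10.1.11 -n 255.255.255.0 -m 9000 "iSCSI-1"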



  • 19.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 04:11 AM

    Thanks for your help Josh. I'll look into the pricing on it.

    Would something like the Netgear JGS516 GigE switch (2MB buffer, jumbo frames, flow control, unmanaged) be good enough? It's about $170 on Newegg.



  • 20.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 05:38 AM

    I'd always suggest looking at HP, Juniper or Cisco, but if budget's a factor that's probably a reasonable choice. Just be sure to check whether the features you mention need to be turned on manually.



  • 21.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 08:37 AM

    Agree the network should be physically split to isolate iSCSI, but the primary reason is security, above anything else. Anyone with physical access to a network that carries iSCSI traffic - the mere presence of which is easily sniffed - can, through an ARP poisoning attack (which is simple in the scale of things), reassemble the entire volume, given enough time, for offline analysis at their leisure.

    I would personally get the very best switches (a pair thereof) that you can get budget for. As said, HP, Juniper, Force10 and of course Cisco are the brands to look at (but steer well clear of 'Linksys by Cisco'). That said, D-Link do have some OK switches if you're on a budget.

    Re jumbo frames - I wouldn't personally get too excited about them. Yes, they will increase throughput, but we're talking the last few percentage points here, and as has been demonstrated, this particular iSCSI device doesn't come close to saturating GigE with random workloads anyway.

    HTH



  • 22.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 10:00 PM

    Security aside, I would still suggest performance is a reason to do so. Referring to your brand statements: if a user's budget doesn't get them into Cisco territory, the backplane will rarely manage wire speed on all ports. The end result is usually that an application hammering the network starts to impact iSCSI performance.

    In other words: VLANs on a quality switch, for security. For any other switch, a separate physical switch buys you more than just security.

    Re jumbo frames - I wouldn't personally get too excited about them. Yes, they will increase throughput, but we're talking the last few percentage points here, and as has been demonstrated, this particular iSCSI device doesn't come close to saturating GigE with random workloads anyway.

    The poster wasn't utilising GigE if I read correctly. At Gigabit speeds, there's a significant difference in my experience.

    HTH




  • 23.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 28, 2010 07:01 PM

    Everything is connected via Gigabit; the host and SAN just aren't on their OWN GigE switch - they're on a switch shared with my data network. I'm getting a switch in to dedicate to them, though.

    I ran IOMeter on two VMs while converting a machine on the host, and it seemed to really slow the other servers down. I'm getting about 5MB/s on each computer with 100% random writes and 11 outstanding IOs. I set the queue depth to 32, and it seemed to help some. Since my outstanding IO count is so low with a lot of random IO, what exactly is my bottleneck? Is it the SAN? Would changing the RAID from 5 to 10 help, or is it the controller that's slowing things down? Would jumbo frames help here, since fewer packets would be sent? Thanks for your advice.



  • 24.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 28, 2010 08:40 PM

    Increasing queue depth might help the overall throughput but it is always at the expense of response time.

    Converting to RAID-10 will certainly improve the throughput for write-intensive operations (random write IO is roughly twice as fast). It's a question of whether this type of workload is representative of anything your production VMs will actually be doing, coupled with whether you can afford the space reduction of RAID-10.

    Consider also how much throughput is actually needed - 5MB/s with an 8K block size is still around 600 IOPS (5MB/s divided by 8KB per IO), which is about the same as a 15k SAS mirror in that respect.

    If this is a SATA-based array, RAID-10 is however a lot safer than RAID-5, since rebuild times on RAID-5 can really drag on with large SATA-based arrays, increasing the risk of a second failure (which would obviously be catastrophic). By contrast, rebuilding RAID-10 is a straight disk copy.

    More thoughts to chew on!

    Cheers



  • 25.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 28, 2010 08:43 PM

    Sorry - in answer to your two questions: the bottleneck is the physical disk throughput, and jumbo frames will make no difference, unfortunately.



  • 26.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 03:52 AM

    I am looking for new stuff out there that improves this iSCSI SAN thing. Any ideas?



  • 27.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 10:00 AM

    Which load-balancing policy are you using for your iSCSI NICs? In almost every situation, one device talking to another device - with either end using any of the forms of bonding/EtherChannel/teaming - will only ever use one of however many links are available, i.e. never more than 1Gbps, which by the way is almost never fully usable, especially with iSCSI due to all the encapsulation involved.



  • 28.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 11:57 AM

    I think I'll get a pair of HP ProCurve 1810G managed switches - a small buffer cache, but they have all the other goodies needed and are within my budget.

    Regarding load balancing, I have two VMkernel ports with one NIC dedicated to each, purely for iSCSI. The multipathing policy I'm using is Round Robin... if that's what you mean?



  • 29.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 02:29 PM

    By default with Round Robin, ESX will send 1000 commands down one channel before moving to the next. In practice, as stated above, this means that the bandwidth of only one channel is available per VM. However, this can be adjusted using the esxcli nmp roundrobin setconfig command, as detailed here.

    But I think not all storage systems will support this. Also, I don't think your configuration is correct for that either - both vmks and the iSCSI target interfaces would need to be on the same IP subnet, with a high-capacity trunk (say a 4-port LAG) created between the two switches. I'm guessing, since it is configured with different subnets, that this manufacturer doesn't support this.
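
    For what it's worth, the tweak itself looks something like this from the console - just a sketch; the naa identifier below is made up and you'd substitute the one shown for your iSCSI LUN:

        # find the device identifier (naa....) of the iSCSI LUN
        esxcli nmp device list

        # switch that LUN to one IO per path before rotating, instead of the default 1000
        esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device naa.6006000000000000000000000000abcd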

    Please award points to any useful answer.



  • 30.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 03:17 PM

    On the VMware compatibility list, it says the model I have supports Active/Active mode with Fixed paths, and there's a note that with ESX 4.0 it may support Round Robin depending on the vendor's config. I have the Cybernetics miSAN D8; I'm asking them if it can be configured that way - from what you said, it sounds like it can't be. I'm not clear as to what is better, though: when I enable Round Robin, half the traffic goes down one NIC and half down the other. When I set it to Fixed, all the traffic goes down one pipe with the same total throughput. I'm not really gaining or losing anything either way. What does this mean?



  • 31.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 04:09 PM

    With Round Robin as you have it, 1000 iSCSI commands are sent on one path, then the next 1000 via the other path, and so on. Viewed under some load with the ESX default sample window of 20 seconds, yes, the paths will appear balanced. But there is practically no concurrency, other than fulfilment of outstanding IOs when the paths are switched.

    By changing IOPS to 1 (instead of 1000), each alternate command is routed through a different path and hence the overall bandwidth is aggregated, at the expense of some CPU effort at each end. Depending on the array's sequential speed, you may see significantly more than 110MB/s throughput with IOMeter in this case.



  • 32.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 05:39 PM

    Wow... I set the IOPS to 1, and I'm getting 215MB/s sequential read throughput with 9ms response. Writes are about the same, and random R/W is also about the same. So I guess, as long as that's not too taxing on my CPUs, I should leave it there?



  • 33.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 05:46 PM

    Yeah, you could try backing it off a bit - EqualLogic suggest a value of 3, for example, to reduce the CPU hit.

    One note - there is a bug at the moment with ESX, as pointed out by depping: when the host restarts, this tweak is lost. There is an automation script example on this thread which could be applied at host start-up; a rough sketch of the idea is below.
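
    This sketch assumes all the LUNs you want to touch show up with identifiers starting "naa." and that you're happy applying the setting to each of them - adjust the grep to match only your iSCSI LUNs, and drop the loop into something like /etc/rc.local so it reruns after a reboot:

        # reapply iops=1 Round Robin to every naa.* device after each boot
        for dev in `esxcli nmp device list | grep '^naa.'` ; do
            esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $dev
        done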

    From your results, it seems that my comments about subnet etc were incorrect in your example, sorry about that.



  • 34.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 05:49 PM

    This is great, so should I be content with my performance now, or is there anything else I should try to tweak (aside from getting the dedicated switches put in)?



  • 35.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 08:08 PM

    The only thing left to check really is how the array deals with serious contention from competing VMs.

    You can test by cloning your test VM several times and setting off IOMeter in each - with different workloads if you like - whilst also performing an intensive host operation like a VM clone. When testing concurrency, divide the IOMeter outstanding IOs accordingly - with four workloads, give each a quarter of the count that produced the 50ms "limit".

    ESX has dynamic queue depth throttling, which is enabled by setting the advanced setting Disk.QFullSampleSize to something non-zero. It is only actually "supported" on a few storage arrays, but it does work even on local storage like the PERC array controllers. Whether it will actually offer you any advantage is something to test - a rough sketch of turning it on is below.
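
    From the console it's just another pair of advanced settings - a sketch only; the values here (32 and 4) are just commonly quoted starting points, not anything your array vendor has blessed:

        # start sampling queue-full/busy conditions and throttle the LUN queue depth in response
        esxcfg-advcfg -s 32 /Disk/QFullSampleSize
        esxcfg-advcfg -s 4 /Disk/QFullThreshold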

    Note also that if you have multiple hosts accessing the array in the future, all of them must use the exact same settings for this, otherwise hosts without the dynamic control will absorb all the available bandwidth.

    HTH



  • 36.  RE: Slow Read/Write Performance over iSCSI SAN.

    Posted Jan 27, 2010 08:35 PM

    Awesome. I'll do those tests and see how it performs. Thanks for all your help!