VMware vSphere


NFS performance

  • 1.  NFS performance

    Posted Oct 29, 2010 04:22 AM

    On my test setup, NFS beats iSCSI by about 10%, but it's still not as fast as the back-end infrastructure allows. Iozone with a large test file shows that the local RAID array on the storage server is able to sustain >950 Mb/s of writes and >2.5 Gb/s of reads (all numbers are bits, not bytes), while TTCP tests show that the ESXi host and the Linux storage server can push >980 Mb/s of network traffic in each direction (they are next to each other in the rack, with a crossover cable connecting unrouted dedicated interfaces for storage traffic).

    Using Iozone with somewhat smaller test files (2x the VM memory), openSUSE VMs with the VMDK on the NFS volume are able to sustain 400 Mb/s writes and 560 Mb/s reads. That's pretty good, but it's only about half of what is available.

    Worse, XP/SP3 VMs with the VMDK on the NFS volume are only able to sustain ~240 Mb/s on writes and 420 Mb/s on reads, or about half of what the Linux VMs get. If I load up 4 of the XP VMs and run the Iozone tests simultaneously, overall throughput only goes back up to the Linux level.

    It would seem that I am hitting some kind of limit here. My feeling is that something with the NFS session is preventing better performance, but I'm not sure where to begin looking. I am able to run Iozone from the ESXi console against the NFS store, but the patterns are very odd and do not jibe with the guest performance data, so I'm not sure what's going on there. I am doing more tests before publishing the numbers. Any ideas here? It's not network bandwidth or latency: I'm able to saturate the wire and ping times are 0.3 ms (300 microseconds).

    Also, are there any tricks for improving the XP VMDK performance on NFS? I would like to get that closer to par with the Linux boxes.

    Thanks



  • 2.  RE: NFS performance

    Posted Oct 29, 2010 09:12 AM

    Is the NFS export set to sync or async? Async is much faster, but less safe for writes (acknowledged data can be lost if the server goes down before it reaches disk).


    AWo

    VCP 3 & 4

    [:o]===[o:]

    =Would you like to have this posting as a ringtone on your cell phone?=

    =Send "Posting" to 911 for only $999999,99!=



  • 3.  RE: NFS performance

    Posted Oct 29, 2010 01:00 PM

    The export is tweaked for performance with "rw,no_root_squash,no_subtree_check,async"
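    For reference, the full line in /etc/exports looks something like this sketch (substitute your own export path and client subnet), and exportfs -ra re-reads the file without a full NFS restart:

    /srv/vmstore  192.168.10.0/24(rw,no_root_squash,no_subtree_check,async)   # example path and subnet

    exportfs -ra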

    I got curious and mounted the datastore export from inside one of the Linux guests using the vmnic/vswitch data interface, then ran the Iozone tests against that mount point (as opposed to testing the "local drive" performance from the vmkernel's NFS mount). Writes get 640 Mb/s and reads saturate the wire at 960 Mb/s. This is with no additional tweaking.

    Searching for other posts on this topic, I see that I am one of hundreds with this problem. I think at this point it is pretty much proven that the vmkernel has some problems and I am unlikely to get any better numbers. What's interesting is that iSCSI performance is also choked down, so it's not just a problem with the NFS implementation but instead appears to be some kind of datastore transport limitation.



  • 4.  RE: NFS performance

    Posted Oct 29, 2010 02:29 PM

    so it's not just a problem with the NFS implementation but instead appears to be some kind of datastore transport limitation.

    I have been saying this for more than 3 years now... VMware MUST be limiting bandwidth somewhere.



  • 5.  RE: NFS performance

    Posted Oct 29, 2010 06:04 PM

    I am waiting for them to support NFS v4.

    Craig

    vExpert 2009 & 2010

    Malaysia VMware Communities -



  • 6.  RE: NFS performance

    Posted Oct 29, 2010 06:44 PM

    As a Windows tech going way back to the early 90's, I have seen many an OS come and go. I have seen flashes of brilliance from some of them, but then they just die on the vine for no reason and seem to stall at some point:

    OS/2

    NextWave

    BeOS (yes it may still be here, but not developed)

    And a few others I can't remember.

    The point is, only one stands out, and I really hate to say it (or prove it) but it appears to be true even NOW.

    Windows has stood the test of time. Case in point: NFS. Windows supports NFS v4 and can host NFS data, and we don't see this performance degradation on Hyper-V VMs, even running unsupported Red Hat (Hyper-V officially supports SUSE).

    So where are we now? People bash Microsoft for many things, and you can say whatever you want, but even Linux has its shortcomings, and once they get a stable OS they forget the rest of it and don't care whether it's completely done.

    Windows has never stalled. Maybe it hasn't always been great and has had a few black eyes, but that didn't stop MS from getting better. Windows Me and Windows Vista (maybe DOS 4) were horrible OSes, but Windows 7 and XP have been the most stable and powerful to date. So when these things don't work in an enterprise product like ESX, I have to question what they are thinking.

    Microsoft is just looking for ANY excuse to eat their lunch. Apparently VMware is content with that, because like you said, NFS v4 isn't supported, and why not? It's been out a while. Why are we stuck with 2TB LUN limits? Windows isn't; yes, it has LUN limits, but ESX also limits the number of 2TB LUNs it can have. WHY?!? Windows has limits well beyond what ESX can handle, so what I don't understand is: what gives? I have to believe they are happy (or complacent) with their CURRENT virtualization ranking, because it won't last much longer if this keeps up.

    I am not an Apple fan (not against it either), but many people complain that there is no VI client for the Mac. I am not for or against this argument, but even Microsoft ships Apple software; what does that tell you? MS seems to listen to their customers, while VMware is concerned only with their own agenda, and it's becoming more and more clear.



  • 7.  RE: NFS performance

    Posted Oct 29, 2010 08:36 PM

    Perhaps your partitions are not properly aligned.

    http://www.vmware.com/pdf/esx3_partition_align.pdf

    Or perhaps there is a difference between the Windows and the Linux versions of iozone.



  • 8.  RE: NFS performance

    Posted Oct 29, 2010 10:26 PM

    The data is on a RAID-10 array that was wholly assigned to LVM. All data is in zero-aligned 4MB blocks and there are no partitions to mis-align. Linux and Windows VMDK files are read from the same directory tree in the same logical volume on that array.

    It's possible that there are differences in the iozone builds (or maybe an error in the Windows port), however the tests don't exercise Windows application space so much as they report on local disk performance (files are written and read, and times are recorded), so it seems unlikely. But it's certainly possible that Iozone under Windows is throwing away half the requests or something like that.

    Right now I am exploring different SCSI drivers to see about differences there. I thought maybe the default disk I/O parameters might be causing problems or that there would be some well-known performance tweaks. Thanks for the reply.



  • 9.  RE: NFS performance

    Posted Oct 30, 2010 11:51 PM

    I captured some of the NFS traffic between ESXi and the storage server during iozone operations, and the most obvious difference is that the XP writes cause ESXi to sync after every 65k of data, while the Linux guest only causes ESXi to sync after 512k. In both cases, the outgoing TCP segments are 4k in length.

    I decided to simplify things a bit and copied a 4GB ISO file to the local drives, then used "cat dvd.iso > /dev/null" and "type dvd.iso > NUL" respectively to force the large data file to be read from the VMDK, and captured 1000 packets of NFS traffic from each. What that shows is that the Linux guest issues multiple parallel reads for 64k of data, which the NFS server provides to ESXi in VERY large segments (sometimes as much as 57k per segment!). The Windows guest, on the other hand, issues single (synchronous) requests for 64k of file data, which the NFS server provides to ESXi in sequences of 4k segments. Remember, this is the same NFS client and server (just different guests), so the difference in NFS behavior may be an important clue; perhaps thread handling is different for multiple parallel requests versus a single request.

    So just with these two data points, it seems that I should look into increasing the size of the writes and the number of outstanding reads for XP.

    I am beginning to suspect that some of this is due to XP itself. Large servers (like for Exchange or SQL Server) will have much larger cluster sizes by default, which would be different from the 4k clusters that are the default for this small guest. Multiple outstanding I/O requests should also improve the read results, assuming they are interpreted that way by the filesystem driver.



  • 10.  RE: NFS performance

    Posted Nov 02, 2010 04:25 AM

    I was able to recover the performance that was lost when the partitions were aligned (loss of caching) by bumping the read_ahead_kb option on the RAID volume to 1024, which is recommended for database-like I/O anyway. That does not really "fix" the XP VM; it boosted all of the VMs by a few percentage points, so the improvements to XP are incidental.

    http://www.eric-a-hall.com/dumpster/benchmarks/XP-SP3-VM-RA-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/XP-SP3-VM-RA-IOPS.png
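    For anyone who wants to try the same change, read-ahead is set through sysfs; the device name below is just an example, and the value resets at reboot unless you persist it in a boot script or udev rule:

    cat /sys/block/sdb/queue/read_ahead_kb            # check the current value
    echo 1024 > /sys/block/sdb/queue/read_ahead_kb    # raise read-ahead to 1024 KB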

    I also experimented with cluster sizes on the VM a little bit, but did not get much out of it. I was able to increase performance by 0-7% on 2/3rds of the transactions by bumping the cluster size to 16K, but the other 1/3rd of the transactions decreased by 0-7%... not really a wash, but not good enough to put up with the problems from non-standard cluster sizes (I was unable to boot the VM with other sizes). It may be that combining 16K clusters with the paravirtual driver would improve Windows performance by 10% total, but then I wouldn't be able to work on the partition with any tools whatsoever.

    I thought I'd give Win7 a try, and it is much better than XP but still not competitive with Linux. On Win7 the ESXi client is able to write 512KB before forcing a sync (same as with Linux), but it does not produce the same data rate, so it's still not as fast. On reads it is now using 2 threads for linear reads, and the server will periodically use large messages in the replies, which is better than XP using 1 read thread, but still not on par with Linux and its multiple threads. Win7 will ramp up to ~440 Mb/s writes and ~700 Mb/s reads, but Linux pretty much starts off higher than that.

    http://www.eric-a-hall.com/dumpster/benchmarks/Win7-VM-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/Win7-VM-IOPS.png

    The disk I/O in the Windows clients is the bottleneck. Just to prove the point, I did some tests with a file size of 10MB (small enough to easily cache in VM memory) and performance was 2-3 Gb/s. I tested incrementally larger sizes, and basically anything that goes to client disk kills performance, even if the NFS server has the dataset completely cached in its memory. Basically the disk I/O on Windows clients is crap. I am almost interested enough to test whether the server flavors are implemented any differently, but that is beyond the scope of my current project.



  • 11.  RE: NFS performance

    Posted Nov 02, 2010 09:03 PM

    Still nailing down some loose items here. The LSI SAS uses a storport Windows driver instead of a scsiport driver, so in theory it should be much faster, and it is for Win7 clients (and Linux too) but is a couple of points slower on XP. Also weird is that Server 2003 R2 does not produce any better throughput than XP, even though it uses the storport driver model. Final best numbers came from Win7 with the bundled LSI SAS driver (which is not shown as storport), with average throughput of 400 Mb/s on writes and 528 on reads, and peak throughput of 480 Mb/s on write and 712 Mb/s on read. IOPS were roughly 11k with 4k blocks.

    On the other hand, with all of the other optimizations in place, and with the LSI SAS kernel module compiled into the initrd, the Linux VM is showing AVERAGE writes of 848 Mb/s and reads of 928 Mb/s, and peak writes of 904 Mb/s and reads of 968 Mb/s. These are using iozone's data throughput numbers, and with the NFS/TCP/IP overhead included I am bouncing off the wire limit. IOPS are pushing 27-28k for 4k blocks.

    So basically, after everything, Linux performance is still roughly 2x that of average Windows performance. Everything in consideration (small number of threads, lack of benefit from the storport driver model, etc.), I suspect that there is a problem with the interaction between the Windows disk I/O and the VMware storage subsystem. Clearly the ESXi NFS client is able to push the traffic... there is something peculiar with the Windows subsystem in particular.

    ps--As an aside, I also experimented with jumbo frame sizes, and the best numbers come from 4k frames (4136 MTU, which leaves 4096 bytes of payload after 40 bytes of header overhead). 8k frames increase the maximum application-layer throughput due to the reduction in overhead, but cache operations drop and the final number of IOPS is lower by about 5%.
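    If anyone wants to reproduce the 4k-frame setup, this is roughly what it takes (interface and vSwitch names are examples; the vSwitch, the vmkernel port, and the storage server all have to agree on the MTU):

    esxcfg-vswitch -m 4136 vSwitch1    # ESXi tech support console: raise the vSwitch MTU
    # the NFS vmkernel port needs the same MTU; on 4.x that may mean
    # removing it and re-adding it with esxcfg-vmknic -a ... -m 4136

    ip link set dev eth1 mtu 4136      # on the Linux storage server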



  • 12.  RE: NFS performance

    Posted Nov 02, 2010 10:42 PM

    This is great info, please keep it coming :smileyhappy:






    http://blog.peacon.co.uk




  • 13.  RE: NFS performance

    Posted Nov 03, 2010 05:22 AM

    I'm glad somebody is finding this info useful. Unfortunately I'm out of ideas and am now catching myself testing the same things all over again.

    Is anyone here seeing any Windows guests get near gigabit wire speeds on "local" disk I/O with the VMDK mounted over NFS (and without excess client-side caching)?



  • 14.  RE: NFS performance

    Posted Nov 03, 2010 06:49 AM

    I was looking at some of the tweaks that have been done by the VMmark winners; you might get some ideas from the VMmark results.

    For example, on this particular one I found a few that tweak network and disk I/O.

    http://www.vmware.com/files/pdf/vmmark/VMmark-Dell-2010-09-21-R715.pdf

    Disk.SchedNumReqOutstanding=256 (default 32)

    Net.MaxNetifRxQueueLen=300 (default 100)

    Net.MaxNetifTxQueueLen=1000 (default 500)

    Net.vmxnetThroughputWeight=255 (default 0)

    BufferCache.SoftMaxDirty=85 (default 15)

    Net.TcpipHeapMax=120 (default 64)

    There are a number of settings you can tweak for disk caching and network buffer size; adjusting these might affect both or push one closer to an optimal configuration, so I'm not sure which way to adjust them.
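    If you want to experiment with these, they can be changed in the vSphere Client under Configuration > Advanced Settings, or from the tech support console with esxcfg-advcfg; the option paths below are my reading of the names above, so double-check them with -g before setting anything:

    esxcfg-advcfg -g /Disk/SchedNumReqOutstanding      # read the current value
    esxcfg-advcfg -s 256 /Disk/SchedNumReqOutstanding  # set it
    esxcfg-advcfg -s 120 /Net/TcpipHeapMax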

    You could also switch the data file system from NTFS to exFAT, which Windows 7 supports.



  • 15.  RE: NFS performance

    Posted Nov 03, 2010 08:48 PM

    Could you post your iozone command line so the tests can be replicated?






    http://blog.peacon.co.uk




  • 16.  RE: NFS performance

    Posted Nov 04, 2010 12:57 AM

    Could you post your iozone command line so the tests can be replicated?

    I'm using the 2.0.24 Windows build, which is actually a Cygwin port. I've thought about compiling my own, but I get the same basic numbers from other Windows disk benchmarks (iozone is reproducible and the charts are more informative, which is why I use it).

    iozone -L64 -S1024 -i0 -i1 -a -s1G -f iozone.dat -b output.xls	

    The -s1G says to generate a 1GB test file, while the -a tells it to use block sizes from 4KB to 16MB. For the file size, use 2x the VM memory (or anything that is large enough not to be cached by the VM), since you want to test the storage network throughput.

    Insert -O somewhere in there to log IOPS instead of KB/s, for example:
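    iozone -L64 -S1024 -i0 -i1 -a -s1G -O -f iozone.dat -b output-iops.xls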



  • 17.  RE: NFS performance

    Posted Nov 04, 2010 10:24 AM

    For your reference, here are some results I've just gathered. I can't see any way of controlling the queue depth with iozone though.

    The auto testing program per the above command line:

    A manual test with two threads:






    http://blog.peacon.co.uk

    Please award points to any useful answer.

    Unofficial List of USB Passthrough Working Devices



  • 18.  RE: NFS performance

    Posted Nov 04, 2010 12:02 PM

    Thanks. The numbers from your single-thread broad-spectrum test are within striking distance of what I see here.

    It is interesting that your dual-thread test got such high numbers though; it looks like near wire speed for each thread independently. You did not force constant processor behavior with -L and -S, so there may be a difference there (I use those options so that all platforms test the same).

    I have been forcing the thread count up by running the single broad-spectrum test with multiple VMs simultaneously, which keeps a favored block size from producing exaggerated numbers and also keeps the per-VM throughput constant. In my tests the overall throughput goes up quite a bit (see the original post with 4x VMs) but the per-node results stay pretty much constant.



  • 19.  RE: NFS performance

    Posted Nov 04, 2010 04:45 PM

    J1mbo, is that version of iozone a Cygwin compile or a native port?

    I have been revisiting IOmeter tests, and it seems that performance is directly related to the number of outstanding threads, which is something that has been talked around at several points. Trying to mesh that observation with the fast Linux and slow Windows readings under iozone, I am wondering if Cygwin has a limited number of threads in comparison to native compilations. Maybe a native iozone compile would actually produce "correct" numbers.



  • 20.  RE: NFS performance

    Posted Nov 04, 2010 12:48 AM

    There are a number of settings you can tweak for disk caching and network buffer size; adjusting these might affect both or push one closer to an optimal configuration, so I'm not sure which way to adjust them.

    This is a good suggestion; however, all of these tweaks are for overall performance, which isn't my problem (at my scale anyway).

    It would probably be a good idea to investigate tweaking the individual *.vmx files. I'll have to drum up some guides.



  • 21.  RE: NFS performance

    Posted Nov 05, 2010 05:20 AM

    I spent some time getting IOmeter to work right (mostly involving an unpublished patch to the Linux agent), and did some tests against the guests. VMware uses 16 outstanding I/Os when they publish "max performance" numbers, so I used that as a static value. Then I did 100% write and 100% read tests of various block sizes against the guests, and converted the data into charts.

    With all of the past optimizations in place, and using the above settings, iometer shows XP/SP3 getting an average of 776 Mb/s on reads and 500 Mb/s on writes, with peak of 908 Mb/s on read and 561 Mb/s on write. That's much better than iozone was showing.

    http://www.eric-a-hall.com/dumpster/benchmarks/XP-SP3-VM-IOM-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/XP-SP3-VM-IOM-IOPS.png

    For the Win7 guest, IOmeter showed an average of 751 Mb/s for reads and 427 on writes, with peaks of 895 Mb/s on read and 479 on write. That is also much better than iozone. What's weird is that this is slower than XP, when iozone was showing Win7 consistently faster. Also, the Win7 guest has the LSI SAS storport driver, so it should be faster there too. Maybe it's just a bad sample.

    http://www.eric-a-hall.com/dumpster/benchmarks/Win7-VM-IOM-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/Win7-VM-IOM-IOPS.png

    Once I got the Linux dynamo to process more than 1 outstanding I/O, it reported average read of 700 Mb/s and write of 425 Mb/s, with peak read of 986 Mb/s and peak write of 529 Mb/s. The peak numbers are close to what iozone reported, although the average numbers from iometer are lower. On the other hand the iozone tests went up to 16 MB block size, while I stopped testing iometer at 64 KB.

    http://www.eric-a-hall.com/dumpster/benchmarks/Linux-VM-IOM-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/Linux-VM-IOM-IOPS.png

    At this point it seems that the problem is with iozone for Windows limiting the number of threads or the buffer size. I pulled down the current version of iozone for Windows and it is also compiled under Cygwin, so perhaps there is a limitation there. I will bring it up on their mailing list.

    Overall it seems that iometer is able to get the vmkernel NFS client to crank out 900+ Mb/s of application data on the wire with XP and Win7 guests (this is not counting TCP/IP and NFS overhead), and is able to push wire speeds with Linux guests (confirmed with iometer and iozone both). Hard to complain about that

    Thanks for the discussion

    (wobbe the forum won't let me upgrade your "Helpful" response to an "Answer")



  • 22.  RE: NFS performance

    Posted Nov 05, 2010 12:58 PM

    Please publish details of the IOmeter Linux fix; this is something I've been looking at for a while (without success).

    Also the sustained numbers seem low, especially on writes (assuming these are sequential workloads).






    http://blog.peacon.co.uk




  • 23.  RE: NFS performance

    Posted Nov 05, 2010 03:25 PM

    In the 2006-07-27 common source archive, grep for O_DIRECT. There is one file that uses it as an option twice. Remove the O_DIRECT option (and the pipe characters separating it from the other options), save the file, copy the correct Makefile-* to Makefile, and then make dynamo.
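    Roughly, the steps look like the sketch below (the directory and Makefile names are just examples; pick whichever Makefile-* matches your platform):

    cd iometer-2006_07_27-src/src
    grep -l O_DIRECT *.cpp         # locate the one file that ORs in O_DIRECT
    # edit that file and delete both O_DIRECT references, along with the '|' in front of each
    cp Makefile-Linux Makefile     # example name; use the Makefile-* for your platform
    make dynamo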

    I agree that the writes are slow; however, I think that is on the back end. Also, all three guests showed the same curve in IOmeter.



  • 24.  RE: NFS performance

    Posted Nov 05, 2010 10:16 PM

    Server performance measures somewhat slower under IOmeter than it does under iozone, about 2.5 Gb/s reads and 960 Mb/s writes. That is the raw data rate, since there is no TCP/IP or NFS overhead when running IOmeter locally.

    http://www.eric-a-hall.com/dumpster/benchmarks/Linux-Server-IOM-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/Linux-Server-IOM-IOPS.png

    NFS reads are bouncing off the wire limit (900+ Mb/s without counting overhead). NFS writes are averaging about 550 Mb/s, so that is about 30% slower than direct I/O. I will have to look into this some more, but it's probably good enough for my application.



  • 25.  RE: NFS performance

    Posted Nov 12, 2010 07:39 PM

    A few more tests to close out some loose ends. First, the reason for going to NFS in the first place was that it was faster than iSCSI by about 10%, so I wanted to see how iSCSI performed after the tweaks. Surprisingly, it now beats the snot out of NFS on writes.

    All of the following data is with Server 2003 R2, using the LSI SAS driver. Tests were performed with IOmeter using a 2GB test file (2x the VM RAM but small enough to fit into the server RAM cache), with block sizes between 4KB and 64KB, and a fixed 16 outstanding I/Os. The server storage pool is the same for all tests. Numbers are in bits (not bytes) since I am probing the SAN throughput ceiling of 1000 Mb/s.

    For NFS, write performance peaked at 576 Mb/s with an average of 489 Mb/s, while read performance peaked at 902 Mb/s with an average of 796 Mb/s. That's pretty close to wire speed on big-block reads and about 3/5ths of wire speed on big-block writes.

    Then I tested iSCSI with the guest mapped to a raw LUN, and then again with the host using the LUN for VMFS. For raw iSCSI mapping at the VM, write performance peaked at 880 Mb/s :smileyalert: with an average of 774 Mb/s, while read performance peaked at 900 Mb/s with an average of 792 Mb/s. This is a huge jump over NFS writes (approx 65% improvement), while reads stayed about the same. I also tested iSCSI mapping on the host with VMFS, and it was about 5% slower than raw iSCSI mapping at the guest, but still way faster than NFS. The improvement is not confined to a single area: write performance on iSCSI follows the same curve as NFS, but it's much faster at each data point.

    http://www.eric-a-hall.com/dumpster/benchmarks/W2K3-R2-VM-NFS-IOM-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/W2K3-R2-VM-Raw-IOM-KBs.png

    http://www.eric-a-hall.com/dumpster/benchmarks/W2K3-R2-VM-VMFS-IOM-KBs.png

    In terms of IOPS, VMDK on the NFS mount showed about 9k writes and 15k reads with 4KB block sizes, while raw iSCSI showed 14.5k reads and writes, and host-mapped iSCSI showed about 14k reads and writes (give or take a few hundred ops). So, read IOPS with NFS are still faster, although not by much (2% maybe).

    http://www.eric-a-hall.com/dumpster/benchmarks/W2K3-R2-VM-NFS-IOM-IOPS.png

    http://www.eric-a-hall.com/dumpster/benchmarks/W2K3-R2-VM-Raw-IOM-IOPS.png

    http://www.eric-a-hall.com/dumpster/benchmarks/W2K3-R2-VM-VMFS-IOM-IOPS.png

    Altogether it would appear that the write performance throttling I saw earlier is because of something with NFS, not because of the back-end storage server. I don't know if it's because of the VMware NFS client or the Linux NFS server. I will test the server with raw NFS Linux clients to check its ceiling.

    If I could get NFS writes to move at the same speed as iSCSI writes, then NFS overall would be faster due to the higher number of IOPS.

    If I can't get NFS writes to improve I will definitely switch back to iSCSI.



  • 26.  RE: NFS performance

    Posted Nov 13, 2010 04:48 AM

    I can get better numbers with a Linux NFS client on bare metal, but not a whole lot better. So either the Linux NFS server is the bottleneck, or ESXi and Linux share the same NFS client code.

    FWIW, the optimal rsize/wsize for me is 256KB. NFS starts to hit wire speed on reads for all block sizes there, and tops out at about 680 Mb/s on writes. Going larger does not provide any benefits.
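    For anyone replicating this, the bare-metal client mount would look something like the line below (the server name and paths are placeholders, and the client kernel will cap rsize/wsize at whatever the server allows):

    mount -t nfs -o rsize=262144,wsize=262144,tcp,hard,intr storage1:/srv/vmstore /mnt/vmstore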

    iSCSI is so much faster I am definitely going to switch over. It's ridiculous that little old IET with fileio is ~65% faster than NFS.



  • 27.  RE: NFS performance

    Posted Nov 13, 2010 07:34 AM

    Have you tuned your NFS server (128 threads, 256K rmem_default, appropriate IO scheduler, optimum underlying FS mount options)? Also, I find much better performance with ext4 or xfs than ext3.
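    To be concrete, on a typical Linux distro that tuning looks roughly like this (file locations vary by distro, and the block device is just an example):

    # more nfsd threads: RPCNFSDCOUNT=128 in /etc/default/nfs-kernel-server
    # (Debian/Ubuntu) or /etc/sysconfig/nfs (RHEL/CentOS)

    sysctl -w net.core.rmem_default=262144    # 256K default socket buffers
    sysctl -w net.core.wmem_default=262144

    echo noop > /sys/block/sdb/queue/scheduler   # let the RAID controller do the re-ordering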




    http://blog.peacon.co.uk




  • 28.  RE: NFS performance

    Posted Nov 13, 2010 04:10 PM

    Yes, I spent a few days tuning the RAID array, the filesystem, and the network options before my original post (which was looking for optimizations within ESXi and the VMs directly). Rather than rehash all of the prior decisions, look at the results: the filesystem tests show consistent performance across a variety of tools, and the TCP/IP and iSCSI throughput testing also shows that I can fill the wire in both directions. That suggests that the write limit with NFS is probably in the NFS application layer. Unfortunately, that is also the area that is the least tunable, and I have not found many options for tweaking other than the stuff I already have in /etc/exports ("no_root_squash,no_subtree_check,async").

    I'm sure I could get a few more points out of the filesystem and the network and even the VM but I don't think it will matter until I can figure out why NFS is slower than other protocols.



  • 29.  RE: NFS performance

    Posted Nov 13, 2010 09:27 PM

    I've tried to match your testing except that I ran the tests with physical disk IO on the NFS server instead of from cache:

    - Windows Server 2003 guest VM, VMDK provided by NFS, guest partition 64KB aligned

    - 30GB test file

    - 16 outstanding IOs

    - Test definitions all 100% sequential and 4KB aligned

    - 120s ramp and 5 minute run time per test

    IO-Meter results, in MB/s:

    IO Size    Read     Write
    4K         69.0     68.4
    8K         102.4    87.5
    16K        108.9    103.6
    32K        111.4    104.8
    64K        112.0    101.6

    With a 1514 byte MTU this equates to network throughput in Mbps (considering data & ethernet overheads, but not requests or ACKs):

    IO Size    Read    Write
    4K         634     629
    8K         926     793
    16K        979     932
    32K        994     936
    64K        998     905


    http://blog.peacon.co.uk


    Message was edited by: J1mbo - error found in formula converting NFS MB/s to network Mbps.



  • 30.  RE: NFS performance

    Posted Nov 14, 2010 01:29 AM

    Looks like you ramp up pretty well. What are the numbers for the storage filesystem directly?



  • 31.  RE: NFS performance

    Posted Nov 14, 2010 09:41 AM

    Unfortunately I can't get IO-Meter to run properly on this host, even after building it with the O_DIRECT references removed, but measured with bonnie++ directly, sustained sequential throughput varies from ~220MB/s at the inside of the disks to ~300MB/s at the outside (read and write the same). The machine has 6 SATA drives.




    http://blog.peacon.co.uk




  • 32.  RE: NFS performance

    Posted Nov 14, 2010 07:25 PM

    I also get about 280MB/s (bytes) in sustained raw (local) writes across the board if I disable strict journaling on the Ext4 partition. But like you, I don't get near wire speed on application-layer network writes until the messages get large.

    In my ongoing tests (which have all been with the data=journal mount option), I am pulling about 60% of the local rate over the network. With Ext4's default setting (data=ordered), raw writes go way up (2x or more), and application-layer writes over the network also go up significantly, but they do not get to wire speed until the write sizes get larger. Actually, with 4KB write sizes it's the same speed. I am pushing 100 MB/s (bytes) a little sooner than you, but that's probably explainable by the difference in local writes.

    It looks like 60% of local performance is a hard limit on NFS, and it can go as low as 30% (assuming best local and worst NFS at the same time). That appears to be the norm.

    I did make a boo-boo on the earlier iSCSI numbers, in that those tests were using a file on another partition that was not using data=journal, which is why the writes were so much faster. With the files on the same partition as the vmdk files, the overall throughput numbers are still a little better (5-10%) but are not massively better. At 4KB they are pretty much the same.



  • 33.  RE: NFS performance

    Posted Nov 14, 2010 08:52 PM

    I'm not sure I follow exactly, but for completeness my mount options for ext4 are rw,noatime,nobh,nobarrier.

    I was cloning some VMs today to another NFS server (Debian, XFS, 4-drive RAID-10), which was peaking at 110MB/s. It would be interesting to run some tests with 9K frames on 10GbE.






    http://blog.peacon.co.uk




  • 34.  RE: NFS performance

    Posted Nov 15, 2010 05:16 PM

    The default journalling type is "ordered", and the other options provide more or less journalling relative to that (I have been using the strictest, data=journal, so that my VMs are not nuked by a power supply failure, which I am somewhat afraid of).
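    For reference, I set that through the mount options in /etc/fstab (the device path below is just an example for an LVM volume); note that the data= mode generally cannot be changed on a live remount, so switching it takes a clean unmount/remount or a reboot:

    /dev/vg_store/lv_vmstore  /srv/vmstore  ext4  noatime,data=journal  0  2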

    Here are some benches you might be interested in. They show the penalty (or lack of one) for using certain options in timed operations, and there is very little argument for disabling barriers there: http://natzo.com/doku.php?id=categories:linux:filesystem However, the real question for guys like us is the latency introduced to every operation, so we need to see small-operation times, and these kinds of single-user local I/O tests are not really useful for that.



  • 35.  RE: NFS performance

    Posted Nov 16, 2010 10:24 AM

    Since my machines have BBWC, the use of nobarrier seems to make sense to me. Benchmarking is complex though - for example, the difference between ext3 and ext4 file delete performance is almost non-existent in that paper, yet on my hardware ext3 will fail to delete files as small as 50GB in the time allowed by ESX, whilst ext4 will delete up to 1.3TB, basically because of its use of extents (the operation is then CPU bound).

    I've updated the Mbps numbers in the table in the above post because I'd failed to take into account ethernet preamble and frame spacing.






    http://blog.peacon.co.uk




  • 36.  RE: NFS performance

    Posted Nov 16, 2010 07:59 PM

    I also have a BBU. It's not a guarantee of anything, especially if the battery is old. My hope was that if the local write limits were above the wire speed limits, then I could use strict journalling with no performance penalty. Since we are seeing limits around 60% of wire speed, I'm starting to think that plan is not going to be workable.

    I ran some tests with IOmeter to gather latency data (this is with the default journalling). The readings in the chart are for local I/O on the storage server, NFS I/O from a Linux PC on the same network, and "local" I/O from a Linux VM (i.e. NFS I/O from the ESXi host). These tests are with 4K and 8K writes, mixed r/w, and reads. I used rsize=4,wsize=4 on the PC NFS mount to force common data sizes (larger sizes would impose queue penalties, and seem to break IOmeter anyway). Ping shows RTT latency for the Linux PC is approx 115us, and for the ESXi server approx 175us. I subtracted those values from the recorded readings, so what is left is NFS latency, not network latency.

    http://www.eric-a-hall.com/dumpster/benchmarks/Latency-Comparison-1.png

    ESXi shows a whole 1ms penalty on 4K writes, and a significant penalty on 4K reads too. That probably explains why small operations have been showing up so slow in all of the other tests. From my packet traces earlier, I think ESXi is using an rsize/wsize of 64, so this may be a queuing penalty.

    The other significant observation is that the ESXi and Linux NFS clients both show about the same performance at 8K, and they both have a 1ms penalty over local I/O. That suggests a processing penalty on the NFS server.

    I am pretty sure that the mixed scores are high due to seek times. The latency penalties for remote access are not extremely high relative to local I/O (half a millisecond of variance against 4-5 seconds of overall wait time), so even though the numbers are high they are not wildly different from the other numbers.



  • 37.  RE: NFS performance

    Posted Dec 20, 2010 11:24 PM

    I set up an ESXi 4.1 lab server to experiment with, and thought the NFS writes when using ghettoVCB were a little slow, so I did some performance testing of my own today.  I was glad to see I'm not the only one unhappy with NFS performance.

    Below are my test results.  They aren't as nicely done as ehall's, but it's pretty clear to see that things aren't very good.

    Note that my tests were done from the ESXi 4.1 tech support console, not from a VM, as that's where ghettoVCB runs.

    So this is my setup:

    Server1 : Asus P5NT WS, Q6600, 8GB, Open Solaris 2009.06, ZFS, 2TB Seagate LP drive, tuned with 200 NFSD threads
    Server2 : Asus X58 Sabertooth, Core I7 950, 8GB, Ubuntu 10.10 X64, EXT4,  2TB Seagate LP drive, tuned with 200 NFSD threads, using   nfs-kernel-server package

    Client: MSI 785GB, Athlon II X4 640, 8GB, VMware ESXi 4.1

    All three servers use Intel Pro 1000 NICs on PCI-E interfaces.

    I'm using the following command to test write performance:
    time dd if=/dev/zero of=pt count=5000000
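    (Since no bs= is given, dd uses its default 512-byte blocks, so that's 5,000,000 x 512 bytes, roughly a 2.4GB file. The same amount of data can be written with fewer, larger requests, which is itself worth comparing since the request size affects the result:)

    time dd if=/dev/zero of=pt bs=1M count=2400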

    Solaris local disk:  1min 5 sec
    Ubuntu local disk:  1min 25 sec

    ESXi -> Solaris : 2 min 2 sec
    ESXi -> Ubuntu:  6min  22 sec <- Holy heck what's going on here?
    ESXi -> Ubuntu: 1 min 21 sec when exported with async option

    Why is Ubuntu so slow unless the async option is used?  Solaris uses noasync and only has a ~25% performance penalty compared to Ubuntu's async.

    OK, let's try some reads:

    Solaris -> ESXi:  1min 55sec

    Ubuntu -> ESXi:  2 min 35 sec

    I was surprised to see such a difference in read performance here.

    So for this little test, it seems like OpenSolaris 2009.06 running on "ancient" hardware has overall better NFS performance than Ubuntu running on "modern" hardware.  Unfortunately, I don't have any disks other than the slow 2TB Seagate LP drive to test with.

    I'm going to try Solaris 11 Express as an NFS server sometime soon and see how that goes.  It has ZFS sync configuration options now that 2009.06 doesn't have, which are supposed to make it work more like Linux's async option, so we'll see what that does to get the write performance closer to Linux's when async is used.



  • 38.  RE: NFS performance

    Posted Dec 21, 2010 06:22 PM

    I upgraded my Solaris NFS server to Solaris 11 Express build 154 last night.

    This version now adds a new sync option to ZFS data sets to allow you to turn async on or off.

    With sync=standard, the write speed was slightly higher, around 2min 20 seconds.

    with sync=disabled, writing a 2400M file went from 2 minutes to 39 seconds over NFS!

    Running the same command locally took only 32 seconds.  Therefore, with sync=disabled I was almost able to achieve the same speed over NFS as I was on the local disk.

    I take this to mean that my bottleneck is now the single disk.

    So if you want decent NFS performance, it looks like Solaris beats Linux by a 2:1 margin in write speeds and is about 25% faster in reads, at least for this test scenario.

    I wish ESXi would do NFSv4....



  • 39.  RE: NFS performance

    Posted Oct 29, 2010 10:56 PM

    I just wanted to comment on RP-Parker's post.

    I really agree with the points about vmware being blinkered by their success. The 2TB limit in particular is a complete joke.

    However, Microsoft are themselves seemingly on a self-destruct mission too... all will change in the next decade, I feel!






    http://blog.peacon.co.uk




  • 40.  RE: NFS performance

    Posted Oct 30, 2010 05:56 AM

    Two changes with some results.

    First, I rebuilt the drives/VMDKs using LSI Parallel controllers, which seems to have helped a great deal (they were originally imported from VM Server and had been whacked at quite a bit). I also tested with the Paravirtual SCSI controller, and while it yielded consistently better numbers it wasn't a huge increase (maybe 2-5%), which does not justify the extra difficulty in managing the systems.

    Second, I read through the link from wobbe98, which advised aligning the guest partitions too, but that did not seem to make any statistical difference except that cached operations fell a bit (probably because fewer underlying blocks and stripes are being processed). I'll have to look into that more. It may be that I can improve things by boosting read-ahead, or by using larger blocks, which would mimic some of the earlier sloppy behavior.

    For the Linux VMs, local disk performance increased by 60% just by rebuilding the drives (2.6 kernel drivers). I'm currently pushing 650 Mb/s on writes and 900 Mb/s on reads, which is pretty good. This puts the write performance of the vmkernel NFS client above the raw NFS mount write numbers, and close to the same on the read numbers. I'm still missing about 30% of the write capacity, but at this point I'm much less worried about it.

    The Windows VMs also improved from recreating the drives (LSI-provided drivers), but only by 15-20%, and that is only if you look at the data cross-eyed. Write performance stabilized but did not go much higher than it was originally, while read performance did improve noticeably in some areas. The performance is clearly better, but it's still bad, and still below the original Linux numbers. I bet if I bumped the cluster size up to 64k it would jump, but I'd like to know what's holding it back right now.

    Keep the ideas coming



  • 41.  RE: NFS performance

    Posted Oct 30, 2010 07:57 AM

    The big problem I see here is that the testing methodology is looking only at sequential throughput.

    I've been using NFS quite a bit and have found that the underlying NFS server configuration is hugely important. File system choice is also critical in some configurations, as are the partition (or volume) mount options. In general I tend to mount with noatime,nodiratime plus fs-specific options such as data=writeback, nobarrier, etc. (but my machines have both BBWC and a UPS). When formatting, ext3/4 can be tuned to the underlying RAID volume, and for XFS you can specify bigger log buffers.

    The NFS server can also be tweaked, for example by providing enough threads, 256k window sizes, and an IO scheduler appropriate for the hardware (use noop for RAID controllers, as these will do the re-ordering).

    However, IMO the testing needs to be focused on random workloads, for which I use IOmeter. Testing 8K random, say 70% read, over a good-sized test file to avoid cache (maybe 8GB, maybe 30GB depending) with various queue depths will show (possibly profoundly) the effect of guest partition alignment. Set the access pattern to be 4KB aligned (presumably your guest file systems are using 4KB blocks) and hit aligned and unaligned partitions with moderate queue depths (say 16 IOs).

    Another issue I've come up against recently is the interaction between file systems and software RAID (i.e. mdadm). Although I usually use XFS, it had truly awful random performance specifically when running on an mdadm array, for reasons I don't yet understand, whilst JFS worked well except that with sequential write workloads jfscommit used progressively more and more CPU, eventually bringing performance to its knees (<10MB/s). Ext4 has proved immune to both of these problems but takes too long to delete files approaching 2TB, causing timeouts in ESX - nothing, it seems, is perfect!

    As it happens sequential write speed for vmkernel type operations (for example, copying a vmdk from local to NFS) seems to run at about 60MB/s tops for me. Writing 32k sequential in a guest with IO meter can saturate the link however, with sufficient queue depth.

    Anyway, a bit off topic, but HTH





    http://blog.peacon.co.uk




  • 42.  RE: NFS performance

    Posted Oct 30, 2010 09:15 PM

    J1mbo, my guests aren't email servers or database servers; they are clients that are used for various kinds of profiling tests. I need to know the limitations of the infrastructure to do that work. I haven't found them yet, because whatever limit I'm bumping into is clearly lower than it should be, especially in comparison to the numbers that are obtainable from other tests (such as local I/O and network throughput). Thanks for the thoughts though.



  • 43.  RE: NFS performance

    Posted Dec 21, 2010 10:54 AM

    Good job. I think NFS vStorage will have a good future.