We have two identical servers:
IBM System x3650 M4, dual Xeon E5-2650 CPUs, 128 GB RAM
ServeRAID M5110e RAID controller with 1 GB cache
8× 600 GB 10k SAS disks in a RAID 6 configuration
Server 1 is running Debian Linux 8.0.
Server 2 is running ESXi 5.5 Update 2.
I ran some I/O benchmarks with iozone, both on Server 1 (Physical) and on a Debian 7 VM on Server 2 (Virtual).
Results:

    Test                Physical    Virtual
    Sequential read     848 MB/s    387 MB/s
    Sequential write    957 MB/s     58 MB/s
    Random read         3.5 MB/s    1.3 MB/s
    Random write         46 MB/s      3 MB/s
The tests were done with a file size exceeding the OS cache, using a 4 kB record size.
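For reference, the iozone invocation looked roughly like this (the file size and target path here are illustrative, not the exact values used; the essential parts are the 4 kB record size and a test file larger than the 128 GB of RAM):

    # -i 0: write/rewrite, -i 1: read/reread, -i 2: random read/write
    # -r 4k: 4 kB record size; -s 192g: file size well above the 128 GB of RAM
    iozone -i 0 -i 1 -i 2 -r 4k -s 192g -f /mnt/test/iozone.tmp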
There are some other VMs running on the ESXi host, but according to esxtop they are quite idle.
Is such poor performance in a VM, compared to the physical server, 'normal'? I appreciate that some hypervisor overhead is expected, but this seems excessive.
The sequential write and random write results are especially interesting. It almost looks as though these tests benefit from the RAID controller's write cache on the physical server, but on the virtual server the cache is not being used...
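For what it's worth, the logical drive's cache policy and the BBU state can be verified with LSI's MegaCli, since the M5110e is LSI-based (a sketch, assuming the MegaCli binary is installed on the Debian server; LSI also ships a build that runs on ESXi):

    # Cache policy (WriteBack/WriteThrough, ReadAhead, Direct/Cached I/O)
    # for all logical drives on all adapters
    MegaCli -LDGetProp -Cache -LAll -aAll
    # Battery status; a failed or charging BBU typically forces WriteThrough
    MegaCli -AdpBbuCmd -GetBbuStatus -aAll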
I have verified that the system BIOS and RAID controller firmware are at the same current levels on both servers, and that the ServeRAID M5110e controller is on the VMware HCL.
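To rule out a driver mismatch on the ESXi side, the driver the host bound to the controller can also be checked and compared against the HCL entry (a sketch; the module name megaraid_sas is an assumption, use whatever the adapter list reports):

    # List storage adapters and the driver bound to each (vmhba0 etc.)
    esxcli storage core adapter list
    # Show info, including version, for the loaded driver module
    vmkload_mod -s megaraid_sas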
There are some error messages in vmkernel.log that at first glance look like they may be relevant:
    2015-05-12T12:32:21.312Z cpu10:32868)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412ec5c685c0, 0) to dev "naa.500507603e831429" on path "vmhba0:C0:T11:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
But device naa.500507603e831429 is the enclosure device, not a disk, and the error itself looks harmless: command 0x1a is MODE SENSE(6), and the sense data 0x5/0x20/0x0 decodes to ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE, i.e. the enclosure is simply rejecting a command it does not support. So it doesn't seem likely that this affects performance.
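(The device type is easy to confirm from the host; for this NAA ID the output should show an enclosure services device rather than a disk:)

    # Show details for the suspect device; the "Device Type" field
    # identifies it as an enclosure services device rather than a disk
    esxcli storage core device list -d naa.500507603e831429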
I don't know where to look next...