Hi,
to transfer a disk IO from the ESX Server to the array, the HBA uses a command queue.
Even when an IO is immediately answered/acknowledged (read or write) by the array, it still occupies a slot in the HBA command queue.
With default settings, the queue depth per LUN is 30 (Emulex) or 32 (QLogic).
So if you need to push a larger number of parallel IOs to your array, you should have enough physical LUNs to handle the load.
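As a rough sketch of where those defaults live: on classic ESX the per-LUN queue depth is a driver module parameter. The module names below are examples from ESX 3.x; they vary by ESX version and HBA model, so check `esxcfg-module -l` on your own host before copying anything.

```shell
# Hypothetical example: raising the per-LUN queue depth to 64.
# Module names differ per ESX version/HBA - verify with "esxcfg-module -l".

# QLogic HBA (qla2300_707 is the ESX 3.x module name):
esxcfg-module -s ql2xmaxqdepth=64 qla2300_707

# Emulex HBA (lpfc_740 is the ESX 3.x module name):
esxcfg-module -s lpfc_lun_queue_depth=64 lpfc_740

# Keep the VMkernel's outstanding-request limit in sync,
# otherwise it caps the effective depth:
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding

# A reboot (or module reload) is needed for the driver setting to apply.
```

Raising the depth only helps if the array ports can absorb the extra outstanding commands; otherwise you just move the bottleneck.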
Here's a good document about ESX and queues, but keep in mind that the virtualised OS also does its own queuing, which isn't covered in the document.
Storage Queues and Performance
JumpR wrote:
So does it mean that each LUN (aka SAN datastore) is allocated a thread, rather than each individual vmdk inside the LUN?
This way, having LUNs with fewer vmdks would improve performance.
No. A datastore with fewer VMs isn't faster than a datastore with more VMs on it when both are created with the same specs.
If the SAN LUN could, for example, serve 150 IO/s, it wouldn't serve 20 IO/s any faster than 100 IO/s; the IO response time stays nearly constant at around 6 ms (when the IOs aren't served from the array cache).
If your VMs generated more IOs than the LUN could deliver, you would see a performance impact, but the LUN doesn't care whether the load comes from a few VMs or many.
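A quick back-of-envelope sketch of that point, using the hypothetical numbers from above (150 IO/s capacity, ~6 ms service time):

```shell
# Hypothetical LUN: capacity 150 IO/s, ~6 ms service time per IO.
# Below capacity the response time stays flat, regardless of how
# many VMs generate the load; above capacity, queuing kicks in.
for load in 20 100 150 200; do
  awk -v cap=150 -v svc=6 -v load="$load" 'BEGIN {
    if (load <= cap)
      printf "%3d IO/s offered -> ~%d ms per IO (no queuing)\n", load, svc
    else
      printf "%3d IO/s offered -> over capacity, queue and latency grow\n", load
  }'
done
```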
Performance design is an art; that's why a lot of storage vendors implement features that let their arrays automatically move hot LUNs (or, even better, hot tracks) between storage tiers.
I totally agree with mcowger that these features are much better than a manual design, simply because they are dynamic while a manual design is static.
Hth
Ralf