We run ESX 4.0 on Dell R710s connecting back to an IBM N-Series SAN on FC.
This week we had a major outage due to a Red Hat MySQL server falling over.
First off, the server would not boot past the virtual POST, so, knowing it had an RDM disk attached, I removed the RDM from the configuration. The server would then boot to the OS, but every time the RDM was reconnected it would either fall over or fail to boot.
I remembered we had fairly recently expanded the space on this server and the RDM was now 300GB in size, although only about 85% of the LUN (roughly 255GB) was actually in use. Then I remembered the block size limitations: the VM was sitting on a VMFS datastore with a 1MB block size (256GB maximum file size). So I removed the RDM, svMotioned the server to a datastore with a 2MB block size, reconnected the RDM, and the server booted fine. After a DB consistency check we were back up and running.
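For context, the documented VMFS-3 per-file limits scale with block size (1MB block = 256GB maximum file, doubling from there), which is why the 300GB disk only came back once its pointer file sat on a 2MB-block datastore. A trivial sanity check of those numbers in plain Python - assuming the Configuration Maximums limits are the ones that actually apply to the RDM mapping file, which is part of what I'm asking:

# VMFS-3 block size (MB) -> maximum file size (GB), per the vSphere 4 docs,
# ignoring the small per-file overhead.
VMFS3_MAX_FILE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}

def fits(block_size_mb, file_size_gb):
    # True if a file of this size is allowed on a VMFS-3 datastore
    # with the given block size.
    return file_size_gb <= VMFS3_MAX_FILE_GB[block_size_mb]

print(fits(1, 300))  # False - the original 1MB-block datastore
print(fits(2, 300))  # True  - the 2MB-block datastore after svMotion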
Following this fault investigation I went on to read the Configuration Maximums document
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
which, to be honest, had only vague information on RDMs,
and then I found this VMware KB article
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1029697
which, surprisingly to me, clearly states that RDM pointer files are affected by VMFS block sizes.
However, there is a difference of opinion in this thread:
http://communities.vmware.com/message/1507498
Looking further at my own environment, we have multiple production servers running with connected RDMs that exceed the stated block size limits.
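For anyone who wants to run the same check in their own environment, here is a rough sketch using the pyVmomi Python bindings (untested against ESX 4.0; vcenter.example.local, user and password are placeholders) that lists each connected RDM, its size, and the block size of the VMFS datastore holding its pointer file:

#!/usr/bin/env python
# Rough audit sketch: flag RDMs whose size exceeds the VMFS-3 file limit
# implied by the block size of the datastore holding their pointer file.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER, USER, PASSWORD = "vcenter.example.local", "user", "password"

ctx = ssl._create_unverified_context()  # lab use only
si = SmartConnect(host=VCENTER, user=USER, pwd=PASSWORD, sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    if vm.config is None:
        continue
    for dev in vm.config.hardware.device:
        if not isinstance(dev, vim.vm.device.VirtualDisk):
            continue
        backing = dev.backing
        if not isinstance(backing, vim.vm.device.VirtualDisk.RawDiskMappingVer1BackingInfo):
            continue  # only interested in RDMs
        size_gb = dev.capacityInKB / (1024.0 * 1024.0)
        ds = backing.datastore
        block_mb = ds.info.vmfs.blockSizeMb if hasattr(ds.info, "vmfs") else None
        limit_gb = 256 * block_mb if block_mb else None
        flag = "OVER LIMIT?" if limit_gb and size_gb > limit_gb else ""
        print("%s: %s RDM %.0f GB, pointer on %s (block %s MB, file limit %s GB) %s"
              % (vm.name, backing.compatibilityMode, size_gb,
                 ds.name, block_mb, limit_gb, flag))

Disconnect(si)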
I need definitive clarity on this, because as it stands we could have many production machines in an unstable state, or I am simply left with no explanation for the disk lock we witnessed and an outage that caused a substantial loss to the business.
regards
wolfsonmicro
Edit: accidental click - this is still an unanswered query.