Hi John,
OK, so that's consistent with what I've seen. Annoying, but not causing downtime. If it really bothers you, you can try something: reduce the number of IOs that go down each path before the pathing switches to the next one. The default is 1000. There was a joint storage vendor "paper" (Dell/EMC/NetApp) that suggested changing that to three (3) instead. Since your IO load is so low at that time, you're tripping over the bug. Getting more consistent IO going over the available paths will likely reduce the frequency of the alerts.
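Before changing anything, it's worth recording what a device is set to today. A minimal sketch, assuming the same ESX 4.x vCLI syntax as the setconfig command further down (the server name and LUN ID are placeholders):

esxcli --server <servername> nmp roundrobin getconfig --device <lun ID>

That shows the current IO Operation Limit and whether the limit type is iops or bytes, so you have the defaults on hand if you want to go back later.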
*Question 3: “I’ve configured Round Robin – but the paths aren’t evenly used”*
Answer: The Round Robin policy doesn’t issue I/Os in a simple “round
robin” between paths in the way many expect. By default the Round Robin PSP
sends 1,000 commands down each path before moving to the next path; this is
called the IO Operation Limit. With that default, some configurations don't
see much path aggregation, because quite often some of the thousand commands
will have completed before the last command is sent. That means the paths
aren't full (even though the queue at the storage array might be). With
1 Gbit iSCSI, the physical path is often the limiting factor on throughput,
and making use of multiple paths at the same time shows better throughput.
You can reduce the number of commands issued down a particular path before
moving on to the next path all the way to 1, thus ensuring that each subsequent
command is sent down a different path. In a Dell EqualLogic configuration, Eric
has recommended a value of 3.
You can make this change by using this command:
esxcli --server <servername> nmp roundrobin setconfig --device <lun ID> --iops <IOOperationLimit_value> --type iops
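If you have a lot of volumes, setting them one LUN at a time gets tedious. One way to sketch a bulk change (assuming the same 4.x esxcli syntax, run from a shell on the host, and only against devices that are already using the Round Robin PSP):

for dev in $(esxcli nmp device list | grep '^naa.'); do
  esxcli nmp roundrobin setconfig --device $dev --iops 3 --type iops
done

Check the device list first so the loop only touches the LUNs you intend to change.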
Note that cutting down the IO Operation Limit does present some potential problems.
With some storage arrays, caching is done per path. By spreading the requests across
multiple paths, you are defeating any caching optimization at the storage end
and could end up hurting your performance. Luckily, most modern storage systems
don't cache per port. There's still a minor path-switch penalty in ESX, so
switching this often probably represents a little more CPU overhead on the
host.
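If the change doesn't help (or hurts), reverting is straightforward. Again a sketch using the same 4.x syntax, with the LUN ID as a placeholder; the --type option also accepts "default" to put the device back on the standard Round Robin behavior:

esxcli --server <servername> nmp roundrobin setconfig --device <lun ID> --type default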
-don