ESX4 swiscsi MPIO to Equallogic dropping

  • 1.  ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 11, 2009 07:31 AM

    We've updated to ESX4 and have implemented round robin MPIO to our EQL boxes (we didn't use round robin under 3.5). However, I'm seeing 3-4 entries per day in the EQL log that indicate a dropped connection. See the logs below for the EQL and vCenter views of the event.

    EQL Log Entry

    INFO 10/06/09 23:50:32 EQL-Array-1

    iSCSI session to target '192.168.2.240:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.111:58281, iqn.1998-01.com.vmware:esxborga-2b57cd4e' was closed.

    iSCSI initiator connection failure.

    Connection was closed by peer.

    vCenter Event

    Lost path redundancy to storage device naa.6090a018005964bc9384643a2a0060cf.

    Path vmhba34:C1:T3:L0 is down. Affected datastores: "VM_Exchange".

    warning

    6/10/2009 11:54:47 PM

    I'm aware that the EQL box will shuffle connections from time to time, but those appear in the logs as follows (although vCenter will still display a "Lost path redundancy" event):

    INFO 10/06/09 23:54:47 EQL-Array-1

    iSCSI session to target '192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.126:59880, iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed.

    Load balancing request was received on the array.

    Should we be concerned, or is it now normal operation for the ESX iSCSI initiator to drop and re-establish connections?



  • 2.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Broadcom Employee
    Posted Jun 11, 2009 06:40 PM

    If the initiator gets the load balancing event (i.e. an async logout request) from the array, then the initiator has to honor it by dropping and re-establishing the connection. If the connection drop is not due to an async logout event, then it is a problem.



  • 3.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 14, 2009 08:12 AM

    Thanks for the response, I have reverted from round robin to fixed and will monitor to see if that solves the problem.

    I understand Equallogic are developing their own MPIO module for vsphere so if the above works I will probably wait for that to be released.

    Regards,

    Iain



  • 4.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 14, 2009 08:40 AM

    I understand Equallogic are developing their own MPIO module for vsphere so if the above works I will probably wait for that to be released.

    True. See this thread for some info:

    Andre

    **if you found this or any other answer useful please consider allocating points for helpful or correct answers



  • 5.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Broadcom Employee
    Posted Jun 14, 2009 10:40 AM

    Yes, they are. The beta has just started. I received an invitation a week ago :)

    Duncan

    VMware Communities User Moderator | VCP | VCDX




  • 6.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 14, 2009 01:15 PM

    The beta has just started. I received an invitation a week ago

    Good for you ;)

    I sent an email last week asking to join the beta program...

    But still no reply...

    Andre




  • 7.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 16, 2009 09:25 PM

    I was told the new EqualLogic MPIO module for VMware will require the new VMware Enterprise Plus license though. That could limit the usage to those willing to upgrade though there do seem to be a couple of nice features with the Plus license.

    *Edit*

    Thanks.



  • 8.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:35 AM

    I was told the new EqualLogic MPIO module for VMware will require the new VMware Enterprise Plus license though.

    Actually, the only third-party module so far is the Cisco Nexus, but in that case there is also a technical requirement: distributed vSwitch support (available only in Enterprise Plus).

    We'll have to wait for the product to see where it can be used ;)

    Andre



  • 9.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 01:46 PM

    Are the clocks sync'd? The vCenter event lines up with the load balancing event. However, the connection failure shows up earlier.

    Best thing is to open a case with Equallogic and let them look at the array diags and try to line up the events. Having the servers and array sync'd to an NTP server would be helpful as well.

    Don
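
Lining up the two logs by hand is error-prone because they use different date conventions. The following is a minimal Python sketch of that comparison, assuming the EQL log is day/month/year on a 24-hour clock and the vCenter event is month/day/year on a 12-hour clock, as the samples in the first post suggest. The format strings and the function name are illustrative assumptions, not part of any EqualLogic or VMware tool.

```python
from datetime import datetime

# Assumed formats, inferred from the samples quoted in this thread:
#   EQL log:  "10/06/09 23:54:47"      (day/month/year, 24-hour clock)
#   vCenter:  "6/10/2009 11:54:47 PM"  (month/day/year, 12-hour clock)
EQL_FORMAT = "%d/%m/%y %H:%M:%S"
VCENTER_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def seconds_apart(eql_stamp: str, vcenter_stamp: str) -> float:
    """Return the vCenter time minus the EQL time, in seconds."""
    eql_time = datetime.strptime(eql_stamp, EQL_FORMAT)
    vcenter_time = datetime.strptime(vcenter_stamp, VCENTER_FORMAT)
    return (vcenter_time - eql_time).total_seconds()


if __name__ == "__main__":
    vcenter_event = "6/10/2009 11:54:47 PM"
    # Session closed by a load balancing request on the array.
    print(seconds_apart("10/06/09 23:54:47", vcenter_event))
    # Session closed by an initiator connection failure.
    print(seconds_apart("10/06/09 23:50:32", vcenter_event))
```

With clocks in sync, an offset near zero suggests the vCenter warning belongs to that array entry, while a gap of several minutes suggests it does not.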



  • 10.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:02 PM

    Are the clocks sync'd? The vCenter event lines up with the load balancing event. However, the connection failure shows up earlier.

    Thanks, yes the clocks are NTP synced. The vCenter event is always the same whether it is a load balancing or connection failure.

    I've pulled more detailed vmkernel logs (attached) from one of the hosts that relate to the following sequence of EQL logs...

    INFO 14/06/09 14:06:00 EQL-Array-1 iSCSI session to target '192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-350eb8f01-25d000000484a27a-vm-vcenter' from initiator '192.168.2.126:61651, iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed. iSCSI initiator connection failure. No response on connection for 6 seconds.

    INFO 14/06/09 14:06:11 EQL-Array-1 iSCSI session to target '192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-07a459001-9cc0005391b48e48-vm-store-workstation' from initiator '192.168.2.126:61160, iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed. iSCSI initiator connection failure. Connection was closed by peer.

    INFO 14/06/09 14:06:35 EQL-Array-1 iSCSI login to target '192.168.2.241:3260, iqn.2001-05.com.equallogic:0-8a0906-07a459001-9cc0005391b48e48-vm-store-workstation' from initiator '192.168.2.126:55046, iqn.1998-01.com.vmware:esxborgb-6d1c1540' successful, using standard frame length.

    INFO 14/06/09 14:06:53 EQL-Array-1 iSCSI login to target '192.168.2.242:3260, iqn.2001-05.com.equallogic:0-8a0906-350eb8f01-25d000000484a27a-vm-vcenter' from initiator '192.168.2.126:57993, iqn.1998-01.com.vmware:esxborgb-6d1c1540' successful using standard-sized frames. NOTE: More than one initiator is now logged in to the target.
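
Because vCenter reports the same warning for every kind of drop, the array log is the place to tell them apart. Below is a rough Python sketch that sorts "session was closed" entries by the reason text, using only the three phrases that appear in the excerpts in this thread. The regular expression, the labels, and the function names are assumptions made for illustration; real array logs may contain other wording.

```python
import re
from collections import Counter

# Reason phrases copied from the log excerpts in this thread.
# Anything that does not contain one of them is labelled "other".
REASONS = {
    "Load balancing request was received": "load_balancing",
    "No response on connection for": "keepalive_timeout",
    "Connection was closed by peer": "closed_by_peer",
}

# Matches single-line "session ... was closed" entries (hypothetical pattern).
ENTRY = re.compile(
    r"INFO (?P<when>\d\d/\d\d/\d\d \d\d:\d\d:\d\d) \S+ iSCSI session to target "
    r"'[^']*' from initiator '[^,]+, (?P<initiator>[^']+)' was closed\. "
    r"(?P<rest>.*)"
)

SAMPLE_LOG = [
    "INFO 14/06/09 14:06:00 EQL-Array-1 iSCSI session to target "
    "'192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-350eb8f01-"
    "25d000000484a27a-vm-vcenter' from initiator '192.168.2.126:61651, "
    "iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed. iSCSI initiator "
    "connection failure. No response on connection for 6 seconds.",
    "INFO 14/06/09 14:06:11 EQL-Array-1 iSCSI session to target "
    "'192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-07a459001-"
    "9cc0005391b48e48-vm-store-workstation' from initiator "
    "'192.168.2.126:61160, iqn.1998-01.com.vmware:esxborgb-6d1c1540' was "
    "closed. iSCSI initiator connection failure. Connection was closed by peer.",
]


def classify(line):
    """Return (timestamp, initiator, label), or None if not a close entry."""
    match = ENTRY.match(line)
    if match is None:
        return None
    for phrase, label in REASONS.items():
        if phrase in match["rest"]:
            return match["when"], match["initiator"], label
    return match["when"], match["initiator"], "other"


def summarize(lines):
    """Count close reasons across all lines that parse as close entries."""
    parsed = (classify(line) for line in lines)
    return Counter(entry[2] for entry in parsed if entry is not None)


if __name__ == "__main__":
    for label, count in summarize(SAMPLE_LOG).items():
        print(label, count)
```

Entries labelled load_balancing are the expected kind described in post 2; the other two labels are the ones worth raising with support.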



  • 11.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:08 PM

    Thanks.

    The '6 second timeouts' mean that the array and server couldn't communicate and the initiator didn't respond to the EQL Keepalive Packets. That typically is a problem on the network.

    What kind of switches are you using? If more than one, how are they interconnected?

    Best thing to do is open a case and they'll review the diags from those arrays.



  • 12.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:16 PM

    The '6 second timeouts' mean that the array and server couldn't communicate and the initiator didn't respond to the EQL Keepalive Packets. That typically is a problem on the network.

    What kind of switches are you using? If more than one, how are they interconnected?

    It's a stack of 2 ProCurve 2900s with a single connection from each ESX host to each switch (2 iSCSI ports per ESX host), and the 2 EQL boxes are connected to the same switches.

    Yes, it's probably best to throw it at EQL.

    Thanks.



  • 13.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:26 PM

    It's a stack of 2 ProCurve 2900s with a single connection from each ESX host to each switch (2 iSCSI ports per ESX host), and the 2 EQL boxes are connected to the same switches.

    Yes, it's probably best to throw it at EQL.

    Thanks.

    I have a very similar setup with 2 ProCurve 2824 switches. I have 2 PS100E arrays and 4 NIC ports connected to the switches from each of my 3 ESX servers. Most likely overkill at this point. We also have a 3 GB trunk setup between the switches (though it should be larger). I just implemented the multipathing last night and I am not seeing any problems with connections dropping yet.

    This may be a dumb question but do you have your 2 2900 switches linked together? The EqualLogic team also has lots of good tips for how to optimize the ProCurve configurations but a 6 second timeout would seem to be more than just a lack of an optimized configuration. I could compare configs with you though.



  • 14.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:47 PM

    On your 2824s, have you enabled #qos-passthrough-mode one-queue? If you have the latest HP firmware, that setting will improve performance. By default, the 2824/2848s divide the buffer memory into four pools for QoS, so you only get 1/4 of the available buffers for iSCSI. Enabling passthrough realigns them into one large memory pool. You have to reboot to make the change effective (same with firmware upgrades). Increasing the number of trunked ports would definitely be a good idea.

    Don



  • 15.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:45 PM

    The interswitch trunking is VERY important for proper operation. How many ports have you trunked between the two 2900's? With two arrays, four would be minimum.

    Also, is flow control enabled on the switch ports used in the SAN (array and servers)?



  • 16.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 03:51 PM

    No problem with trunk capacity, the 2900's are stacked via 2 x 10Gb connections.

    And yes flow control is enabled on all ESX & EQL ports, but no jumbo.

    Regards,

    Iain



  • 17.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jun 17, 2009 04:01 PM

    Then definitely, please open a case. They'll need diags from both members. If you have a map of the connections that would be helpful.

    Don



  • 18.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 07, 2009 01:12 PM

    Posted in case anyone with a similar problem discovers this thread. We believe we have finally resolved this problem thanks to Arnaud in VMware tech support.

    It appears that Dynamic Discovery is the source of the problem. No errors have appeared in the 48 hours since we removed the EQL host IP from the Dynamic Discovery configuration screen.

    Regards,

    Iain



  • 19.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 14, 2009 02:57 PM

    Hi Iain,

    Did you leave your MPIO setting at fixed or go back to round robin? I am having similar issues, but removing the SAN IP from Dynamic Discovery did not help.

    thx



  • 20.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 15, 2009 07:59 AM

    Hi

    We are having the same issues here too. Yesterday VMware tech support changed our MPIO to fixed, but they left the SAN IP in Dynamic Discovery.

    This didn't fix our problem.

    Have you experienced any data loss because of this?



  • 21.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 16, 2009 04:51 PM

    This server is a new build with no VMs, so we have had no data loss, but during our tests I have had many failures just trying to create a datastore on the SAN. If I switch to one NIC/path, everything is lightning fast; with two NICs and MPIO round robin, the whole ESX server is dog slow and SAN connections drop in and out. I notice FCS errors on our switch for the ports the SAN uses during this time, but only when the ESX server is thrashing them. I have two other servers using this SAN for another volume (non-VM) and it's been rock solid with no FCS errors. Not sure why this is happening; I am going to open a call with either VMware or Dell next week to get to the bottom of it. Many users online have this setup with no problems, but I can't even get a test server up before we go to production. All settings on the switch have been verified: jumbo frames, flow control, no spanning tree. All cables go into one switch (Nortel 5698).

    John



  • 22.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 28, 2009 03:50 PM

    For anyone who does find this thread: we determined that the issue is with our network switch. When we use 1500 MTU on the initiator, everything works fine; 9000 MTU causes connectivity issues. Jumbo frames are enabled on our switch. If our SAN and clients are placed on the production VLAN, 9000 MTU works fine; when we move everything back to the SAN VLAN, problems arise. We are still troubleshooting this issue with Equallogic support, but it looks like it will point to the Nortel 5698 switch settings. Jumbo frames are enabled for the whole switch, not per VLAN, so it's confusing...



  • 23.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 28, 2009 05:14 PM




  • 24.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 28, 2009 06:55 PM

    I have a ticket open with Dell/EqualLogic support for the exact same issue. I was told that this is a known issue and that VMware is supposed to be working on a fix for it. However, they would not give me a timetable.

    Also, why is this thread marked as answered??? It is most definitely not resolved and should be kept open!



  • 25.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 28, 2009 07:58 PM

    My issue is more with the switch and jumbo frames. I was able to connect a workstation using an iSCSI initiator and had the same issue with dropped connections, with VMware not even in the picture. My failure is pretty consistent, though, not intermittent.

    I do have the occasional drop with VMware, so I am guessing this is the issue they are working on for you. Please post if they resolve it.

    Thanks.



  • 26.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 30, 2009 02:01 PM

    I am just curious: I just purchased a PS4000 and I have been doing tons of research on how to correctly configure the vSwitches for use with it. Could someone post a screenshot of their vSwitch configuration for their iSCSI network?

    Thanks!!



  • 27.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 30, 2009 02:04 PM

    Hi

    I'm on vacation 30 Oct - 16 Nov.

    - Riku



  • 28.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 30, 2009 02:12 PM

    There is a guide and a video from Dell/EqualLogic on how to configure iSCSI with vSphere and EqualLogic:

    http://www.equallogic.com/resourcecenter/assetview.aspx?id=8453

    http://www.delltechcenter.com/page/A%E2%80%9CMultivendorPost%E2%80%9DonusingiSCSIwithVMwarevSphere should also be interesting.



  • 29.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Oct 30, 2009 02:53 PM

    I don't want to hijack this thread with my different questions; I have a separate post for my iSCSI setup questions. Thanks for the videos though... I ran through them and they don't completely answer my iSCSI setup questions with the EQL.



  • 30.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 17, 2009 05:37 PM

    2 helpful posts: #1 and #10.

    I'm also experiencing the same issue. I already called Dell/EqualLogic and they referred me to 2 VMware case IDs. I have yet to call VMware for more info.



  • 31.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Nov 24, 2009 04:22 PM

    Well...

    Update 1 didn't fix the problem.



  • 32.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Nov 24, 2009 04:32 PM

    Hi

    I'm in training on 25 and 26 Nov, so I will be reading my messages irregularly.

    - Riku



  • 33.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Nov 24, 2009 04:41 PM

    I am going to hopefully be firing up my eql array over the holiday weekend so I will let you guys know if I see the same problems. (PS4000x / 2x5424 switches / separate network / esxi 4.0 u1 / r710 / 2x intel pro 1000pt nics)

    Jonathan




  • 34.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 12:02 AM

    Hello!

    I was wondering if anyone has had any success figuring out this issue? I am experiencing the same thing. I am using a pair of VMkernel ports shared between 2 physical NICs (Dell R710 server), connecting to a PS4000 storage array. It is interesting, because I am also connecting the server to a LeftHand SAN on the same vmhba, and I have not seen any issues with connectivity to the LH. But each of my 4 servers seems to randomly drop paths to the PS4000.

    I also will probably be opening a ticket in the next day or so, and will update if I get any info.

    Thanks!

    Andrew Watson

    Sr. Systems Administrator

    The Colorado College



  • 35.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 12:39 AM

    Just an update for anyone's benefit on my jumbo frames issue: we have an original rev of Nortel's 5698 switch, and it turns out this early rev of the hardware (not the code, but the hardware) didn't play nice with our PS100E SAN. When we used Nortel's 5520 or a newer rev 5698, the jumbo frame issue was fixed. I still have intermittent dropping of the paths even after all this. I am very interested in its resolution. Thank you in advance for any future post with a fix.

    John Z.



  • 36.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 15, 2009 08:40 PM

    John,

    What hardware rev of the 5698 did you have? We have 2 5698's here and are seeing Frame Too Long errors on the iSCSI ports on the switch. Running 6.1.1.017 code.



  • 37.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 02:28 PM

    Hi Steve,

    We tried several versions of code, but I was told it was the actual hardware rev and an incompatibility with the EQL controller. We saw the same thing you did with Frame Too Long errors, but only on the ports the EQL was attached to, not the client-side ports. They did a packet trace and must have discovered the issue between the EQL and the switch. Our Nortel rep/tech will be coming out this Friday with our replacements, so I will ask him and get the answer for you. We received our equipment in early spring, when this model was first released. It's the first I have ever heard of a hardware rev within the same model instead of just a code update.

    John Z




  • 38.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 03:51 PM

    Ours have a 20091018 manufacture date, so I'm guessing they're a pretty recent hardware rev (if not the latest). I would appreciate you updating the thread; it would be helpful. In our case we're not presently seeing performance issues (although the system is barely on its feet). In my packet captures there are TCP packets with 802.1q tagging headers (we're not tagging on these ports). I'm wondering if that might be incrementing the FTL error counter. Still trying to determine why these packets are there.

    On a somewhat related note, we're not trunking any of our multi-port uplinks (iSCSI or vmNetwork). We've got 4 iSCSI to the Equallogic, 4 iSCSI to each ESX and 3 vmNetwork to each ESX. It's working but I can't wrap my head around why. The ESX's are configured to do NIC teaming with a "round-robin" configuration. Do you guys have a similar set-up?



  • 39.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 04:08 PM

    Steve,

    I have manufacture dates of 5/2009 and 6/2009, with HDW:01, which I think is hardware rev 01, shown just above the date. We are not trunking either. Right now all my servers and iSCSI are in the same switch, with iSCSI on its own VLAN. I have six ports from the EQL box (3 redundant, across 2 controllers), and we have two ports for each ESX host, with 3 paths on two physical NICs (3:1). When jumbo frames are enabled and it's failing, you will know it. When I connect to my SAN and create/delete a test VMFS3 volume from ESX, my FCS errors start incrementing by the tens. The volume will fail to create 9 out of 10 times with jumbo frames on; without them it's rock solid. We moved my setup to 5520 switches using jumbo frames and it's solid as well. We tried removing the VLAN on another 5698 and it also failed badly with defaults set. I will let you know what rev we get this Friday.

    John Z




  • 40.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 04:34 PM

    We had issues with connection stability using jumbo frames when we first connected our EQL PS4000 up. We had all our MTU sizes (ESX & DELL powerconnect) set to 9000. As far as I know, 9000 is the max for vmware anyway. The connections were showing as "jumbo" on the EQL, but they repeatedly dropped and then reconnected, and SAN operations were disrupted. In the end, we fixed it by raising the DELL powerconnect MTU values to the switch's max (9216). Everything was fine from that point onwards. Still not sure why it wouldn't stay connected at 9000 though which matches the ESX setting.
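
One possible explanation, which nobody in this thread confirms: if the switch's MTU setting limits the whole Ethernet frame rather than the IP packet, then a host sending 9000-byte IP packets produces frames slightly larger than 9000 bytes, and a switch set to 9000 would reject them. The Python sketch below only does that arithmetic, using the standard Ethernet header, 802.1Q tag, and FCS sizes. Whether a given PowerConnect model counts its MTU this way is an assumption to check against its documentation, and the function names are illustrative.

```python
# Standard Ethernet overheads, in bytes.
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4          # 802.1Q tag, present only on tagged frames
FCS = 4               # frame check sequence trailer


def frame_size(ip_mtu: int, tagged: bool = False) -> int:
    """Size on the wire of a full-sized frame carrying ip_mtu bytes of IP."""
    return ip_mtu + ETHERNET_HEADER + FCS + (VLAN_TAG if tagged else 0)


def fits(ip_mtu: int, switch_max_frame: int, tagged: bool = False) -> bool:
    """True if the frame passes a switch whose limit is a whole-frame size.

    Assumption: the switch limit counts header and trailer bytes.
    """
    return frame_size(ip_mtu, tagged) <= switch_max_frame


if __name__ == "__main__":
    for limit in (9000, 9216):
        print(limit, fits(9000, limit), fits(9000, limit, tagged=True))
```

Under that assumption a 9000-byte host MTU needs at least 9018 bytes of frame (9022 when tagged), which would be consistent with 9216 working where 9000 did not.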



  • 41.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 04:38 PM

    Ben...

    How did you change the jumbo frame size on the PowerConnect switches? I was looking for that option and couldn't find anything... I am sure I just missed it.

    Thanks!



  • 42.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 04:52 PM

    Hi,

    You should just be able to change the MTU value through the switch's web GUI, but I used the console. Here is an example of the commands you would use, assuming you want to apply the MTU value to ports 1-6 on a PowerConnect 6224. It also assumes the switch is standalone or trunked, not stacked using the 10G stacking ports.

    enable
    config
    interface range ethernet 1/g1-1/g6
    mtu 9216
    exit
    exit

    I also applied the following 3 commands to our iSCSI ports as per EQL documentation best practices:

    storm-control broadcast
    storm-control multicast
    spanning-tree disable



  • 43.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 08:34 PM

    Yeah I am not sure if you can do it from the GUI or not. I am running 5424 switches. I will increase my jumbo frame size on my switches to see if that helps with the problem via the command line.



  • 44.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 17, 2009 05:58 PM

    John,

    Thanks for the update. We're running HW#02 on ours. We haven't seen any showstopping issues, but the system is barely on its feet. I'll check with the sysadmins as they get further along. I'd like to know what rev you end up with and whether you see the Frame Too Long errors go away. We're not tagging on those interfaces, but we are seeing packets with 802.1q headers on the ports (always from the other devices on the iSCSI VLAN, not the device on the ports being mirrored).



  • 45.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 18, 2009 02:20 PM

    Any word from VMware on a patch yet? I am hoping we see some update in the next two weeks; if we don't, we are going to have to wait until January... which stinks.



  • 46.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 20, 2009 05:08 PM

    Steve,

    Our new 5698s were hardware versions 2 and 3. My SAN and ESX hosts are on the version 2 5698 and my jumbo frame issues are a thing of the past. I did not have time to check frame errors yet, as it was late Friday when we finished up. I did create/delete some 100GB VMFS3 volumes and it was lightning fast with no errors at all, so you should be OK with your rev 2 hardware. The tech said the issue was with an oscillator in the switch: at higher frame sizes the timing was an issue with certain NICs, but only with jumbo frames, so we are going to leave the other closets for now. I still have intermittent paths dropping from the SAN to ESX, as everyone else in this forum is experiencing, so this issue was not related to the main topic of this thread. Hopefully VMware will have a patch soon to address that.

    John Z




  • 47.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 23, 2009 09:51 AM

    Well, it looks like the patch won't be out until 2010 now. Let's hope it's released quickly, and hopefully the MPIO module from Equallogic as well.



  • 48.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 23, 2009 12:54 PM

    John,

    Good news on the Jumbo Frame issue. Seems like the 1st Rev of hardware/software these days is what we used to call beta hardware/code (or alpha). Have you seen Frame Too Long errors since the switch upgrade? We've still got them but my admins tell me the system seems to be running fine.



  • 49.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 28, 2009 01:38 PM

    Steve,

    Sorry, just got back from vaca. I did a quick look at my switch stack and there are no errors at all on those ports now.......very nice...I will be doing some P2V conversions today so I will check Tuesday night and see how it looks.

    John Z




  • 50.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 28, 2009 01:43 PM

    Anyone that has a ticket open with VMware on the ROOT cause of this post, have you heard anything on an ETA for a patch??



  • 51.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 28, 2009 03:21 PM

    John,

    Thanks for the update and congrats on getting things working. What version of code are you guys running on the stack now?



  • 52.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 28, 2009 04:38 PM

    Steve,

    We are running what I think is the latest code v6.1.1.016. I reset all the port stats today so I have a clean slate going forward.

    John Z




  • 53.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 30, 2009 01:46 PM

    I was just searching for this error online and found this thread...great stuff guys.

    My situation is as follows:

    • 2xPE1950 with vSphere

    • 1xEQL PS400

    • 1xEnterasys N7 switch (will be adding another switch later on)

    I've followed the documents for setting up the EQL with vSphere...but I do notice that I have seen some of these disconnections happening, especially after testing taking part of the network down...it seemed to take a couple of hours for the messages to stop and everything to settle down. Looking in vSphere...in general it seemed ok, but wanted to look into the messages.

    If I find anymore information on this over the next few weeks, I'll post back on here.

    Thanks



  • 54.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 30, 2009 05:11 PM

    We are still waiting for a patch from VMware. Hopefully it will be in January's patch release schedule. Hopefully.



  • 55.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 30, 2009 05:28 PM

    I hope no one minds a quick side question somewhat related to MPIO. I noticed you can only enable vmotion on one of the six paths. Is this true, or am I missing something? It seems to me you should be able to enable it on all paths for redundancy during vmotion in case one path fails, but that is not the case. Has anyone else noticed this? Thanks.

    John Z




  • 56.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 30, 2009 07:43 PM

    You shouldn't be using your iSCSI network for vmotion. You need a separate network or VLAN for vmotion. Many on here recommend a dedicated pNIC on its own layer 2 VLAN, which is also what I recommend.



  • 57.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 30, 2009 08:48 PM

    Wow, I didn't realize. I guess I have some reading to do. I have a separate VLAN just for iSCSI, separate from our production network (soon to be separate switches). I have two physical NICs for the console on that VLAN; maybe I will use one of these for vmotion. I have six NICs: 2 for console, 2 for production, 2 for iSCSI. Console and iSCSI are on the same VLAN for isolation from the production network.

    John Z




  • 58.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 31, 2009 01:50 AM

    I would assume that by console you mean the service console. In the cluster of 3 servers that I run, I also have 6 GbE NIC ports (onboard dual plus add-in quad). Until I added iSCSI, 2 NIC ports (one in each NIC) were for the service console and vMotion with the remaining 4 for production networks (in one vSwitch with multiple VLANs). At that time, I moved 2 ports (one on each NIC) from the production networks to iSCSI as I felt that iSCSI traffic is more important than the production networks as I feel data loss on the storage would be more likely to lead to data loss and corruption.

    Since you have 6 NIC ports, I'd think that this configuration would be best. If you had fewer, it would be acceptable to separate iSCSI from production networks by VLAN if the production network is throttled, with the understanding that it is not a recommended configuration (if not an unsupported one). I'm pretty sure this is documented. I will try to find it and post.



  • 59.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 31, 2009 02:03 AM

    vsp_40_iscsi_san_cfg.pdf p30 ("Configuring iSCSI Initiators and Storage", "Setting Up Software iSCSI Initiators", "Networking Configuration for Software iSCSI Storage"):

    "VMware recommends that you designate a separate network adapter entirely for iSCSI." I believe the recommendation is also that vMotion be separated from iSCSI and production networks but is commonly combined with the service console.



  • 60.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 31, 2009 04:08 AM

    Yeah, what I recommend is dedicating two physical nics for your service console vSwitch traffic (i.e. vmnic0 and vmnic1). What you can do is assign the service console port to use vmnic0 as the active adapter and make vmnic1 a standby on the nic teaming tab. Then, make a vmotion port and use vmnic1 as active and vmnic0 as standby. This way your service console traffic will always be on one adapter and vmotion traffic on another yet you still have redundancy in case of a path failure.

    I prefer this setup because when performing a vMotion you can max out the bandwidth on the physical network card, and you still want to be able to communicate with your ESX host via the service console port. So if you only use 1 physical NIC for both SC and vMotion traffic, in my opinion you're taking an unnecessary risk.
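
    The active/standby arrangement described above can be sketched as a small model. This is only an illustration: the port group names and vmnic numbers follow the post, while the failover rule (the first healthy uplink in the order wins) is a simplifying assumption, not VMware's actual teaming logic.

```python
# Hypothetical model of the active/standby teaming described in the post.
# Each port group lists its uplinks in failover order: active first, standby second.
FAILOVER_ORDER = {
    "Service Console": ["vmnic0", "vmnic1"],  # vmnic0 active, vmnic1 standby
    "VMotion": ["vmnic1", "vmnic0"],          # vmnic1 active, vmnic0 standby
}


def uplink_in_use(port_group, failed=frozenset()):
    """Return the first uplink in the failover order that has not failed."""
    for nic in FAILOVER_ORDER[port_group]:
        if nic not in failed:
            return nic
    return None  # every uplink for this port group is down


def placement(failed=frozenset()):
    """Map each port group to the uplink it would use, given a set of failed NICs."""
    return {pg: uplink_in_use(pg, failed) for pg in FAILOVER_ORDER}


if __name__ == "__main__":
    print(placement())            # normal operation: traffic types on separate NICs
    print(placement({"vmnic0"}))  # vmnic0 down: both share vmnic1
```

    In normal operation the two traffic types sit on different NICs; they only share a NIC after a failure.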



  • 61.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 31, 2009 04:16 PM

    Never thought to do that, but it makes even more sense from a predictability standpoint while still offering the redundancy. Since I only have two virtual ports on that vSwitch, it is probably already separating the traffic, but that behavior is less predictable than the manual assignment. Thanks for the idea.



  • 62.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 03, 2010 03:30 PM

    Regarding the dropping issue...I am only in test currently, but wanting to go live ASAP. What are the implications of this drop in connections I am seeing every few days?

    Will I risk losing data? Will it actually cut connections and disconnect clients/services when this happens?

    I want to go live with this, but if there are chances of data loss, then obviously I will have to delay until this patch arrives.

    Thanks for everyone's comments.



  • 63.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 03, 2010 07:31 PM

    Edificom -

    Are you using a 3:1 iSCSI configuration or a 1:1? If you are using a 3:1 setup I would switch to 1:1, as you won't see any drops in the connections. If you decide to keep it at 3:1 you should not have any data problems, as the array will always have other connections to use. I went the safe route and just removed 2 of the vmK ports until a patch is released. Re-adding the ports is a simple process anyway.

    Hopefully we see a patch in this month's (January) patch release.



  • 64.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 07, 2010 01:42 PM

    Well, we now have the January patches...and no iSCSI fixes mentioned. What gives?!?!



  • 65.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 07, 2010 02:09 PM

    I logged a call with vmware and was told yesterday:

    "At the moment there is no confirmed release date for a fix to this issue."

    I am still trying to see what this really means. They did ask if I would be interested in testing a possible fix; I'm waiting to hear back on that as well.

    p.s. unrelated but is there a way to get alerted on the new patches? I added my email somewhere on the vmware site to get alerts but it doesn't seem to be working.



  • 66.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 07, 2010 04:06 PM

    That stinks. I think this is a more serious issue with the MPIO than originally expected. Really disappointed. The only good side to this problem is that I can still use MPIO in a 1:1 configuration and not have drops. Still, there is no reason why it shouldn't work the way they intended.

    I also signed up to be notified about patch releases via email and also did not receive anything. I follow a few VM guys on Twitter and I saw them post the patch releases this morning.



  • 67.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 07, 2010 04:20 PM

    Hmmm...I don't want to scare you, but I am using the 1:1 scenario and seeing drops. Not very often, but every couple of days.

    I'm having to hold back on going live with this now, and the client is obviously getting restless.

    I REALLY hope vmware get this sorted soon.



  • 68.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 08, 2010 01:30 PM

    I haven't seen any drops in my 1:1 .... they definitely need to get this fixed soon. I still have tickets open with Equallogic on this and they are being told the same thing: there is a patch 'in the works' but no release date or time frame.



  • 69.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 10:58 AM

    Well, I just talked to the VMware support rep in charge of our case and he confirmed that in our 1:1 scenario we do indeed have the same issue. He confirmed it is with vSphere only, so reinstalling 3.5 is a possible "workaround". Other than that, he said at least 1 client is testing a possible fix, but they have no idea when a fix could be available; it could be weeks or months.

    Not really the news I/we were looking for.



  • 70.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 01:39 PM

    I have an open case with Dell EqualLogic support and yesterday I received an email regarding this issue:

    "I have heard that VMware will not put the fix for this problem in their VMware V4.1 release. Which means that the fix won't be available for quite a while."



  • 71.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 01:48 PM

    So what is the recommendation, that we run 1 path across two NICs? I am running production now on 6 paths/two NICs; scary, but it's working.

    John Z



  • 72.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 01:51 PM

    I couldn't get any recommendation apart from "it works fine in 3.5" but then 3.5 has lower iSCSI performance anyway, so I'd rather not rebuild everything again.

    Amazed this is happening really.

    As before, I am waiting to go into production using the 1:1 approach, but I really don't want to risk this if I am going to have connection drops and possible data issues.

    Then again, waiting months isn't an option either...



  • 73.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 02:17 PM

    I will be writing a blog post on this issue sometime this week. It's really a shame, given how widely iSCSI is used, how much VMware has PUSHED the use of MPIO in their marketing, and the benefits of this rewritten iSCSI initiator. For those of us using Equallogic boxes, the only option is to wait and hope EQL comes out with their 3rd-party plugin, and from what I have heard recently that is not going so smoothly.

    I am surprised people are still seeing drops in a 1:1. I haven't seen any drops in my configuration, running Dell PC5424 switches and an EQL PS4000. Anything above that, though, and I get drops constantly. Let's not forget that even with drops there is always an active path available for communication. If you are seeing a complete drop and losing your volumes from the host, then you may have other issues going on. When I was experiencing the drops in a 3:1 I never lost a volume. Let's also not forget that 1000 commands go down each path before switching to another path in RR.

    I haven't heard anything from EQL regarding my ticket, which is also open, but I have a feeling we won't be seeing a fix for this until U2.



  • 74.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 04:57 AM

    I'd be curious to know who you are working with at Dell/Eql that's saying that 4.1 won't have the fix. Since even VMware support hasn't been saying exactly when it will be released. However, IMO, I doubt it will take very long for them to release the patch. Also there are EFFECTIVE workarounds, other than going back to 3.5 until they release the fix.

    Let's make sure we're talking about the same issue: you have multiple GbE physical interfaces in one vSwitch for iSCSI, and you've followed the guide that Dell issued (one that is also found in the wild on the web). VMware reference number: PR484220.

    First thing: the dropped connections occur during extremely LOW levels of IO. If there's IO going on all the time, you don't see the drops. That is why I suspect that people running 1:1 VMkernel ports to GbE interfaces aren't seeing the problem as much, or at all, since they are more likely to have traffic going across the links. Also, some customers have lowered the number of IOs in the RR config from 1000 IOs before switching to as low as 3 or even 1, which also forces more IO across all the VMkernel ports. In /var/log/vmkiscsid.log you will see "no-op" errors and terminated connections, and later you will see the connections get re-established. In the Dell/EQL event log you will see "reset" on iSCSI connections at the same time.

    One sure way to avoid the issue comes from VMware support: don't have more than one GbE interface per vSwitch. If you have three GbE interfaces for iSCSI, create three vSwitches, each with one VMkernel port and one GbE interface. If you have already configured it the other way, remove the iSCSI vSwitch and all the iSCSI VMkernel ports, then reboot the ESX server and recreate from scratch. Otherwise, some people have reported terrible performance.

    -don



  • 75.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 10:04 AM

    Thanks for the info Don, looks really interesting. You said this info came from vmware support?

    Was that from a KB article or via a support tech?

    Cheers,



  • 76.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 01:49 PM

    Don- Very interesting information. Did you get this work around from VMware? If so is there a KB yet?

    I have to agree with you: I saw drops in a 3:1 configuration under very low I/O. The array wasn't in production yet and I was testing it out. With only three VMs running on the array, the I/O was only about 10 IOPS, so I was still seeing drops. It makes sense that when I switch to 1:1 the connection is stable, because the same low I/O is spread across fewer paths.

    You state that VMware is recommending using TWO vSwitches instead of having one vSwitch with multiple vmK ports under it and the nics teamed across, exactly as stated by the Dell PR document. How does this fix the problem? Have you tried this yet and created two vSwitches each with one nic and each with 3 vmK ports to see if the drops still occurred? I am trying to wrap my head around WHY this configuration would NOT have drops but the other configuration would. All we are doing is separating the vSwitches.

    Thoughts?

    I may try this configuration on one of my lab machines connected to the same PS storage to see if I see any drops. Has anyone tried this yet?



  • 77.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 04:08 PM

    The reason the change "fixes" the problem is that the problem is NOT in the MPIO code. Where it is, I can't say as I'm under NDA. However, if you talk to VMware support they will probably tell you.

    I don't have a KB article number. I didn't even think to ask the VMware support guy. duh. I understood why they suggested it and continued on. I've setup several customer sites this way and had no issues.

    So yes, if you are using 2x GbE interfaces for iSCSI with one vSwitch, you would remove that entirely. Reboot, then create two vSwitches, each with a vmkernel port and one GbE interface. Then re-do the binding of the vmkernel ports to the iSCSI HBA, then enable Round Robin.
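
    The layout described above (one vSwitch per physical NIC, each with its own VMkernel port or ports) can be sketched as a small planner. The vSwitch and port names are made up for illustration, and the path count assumes each bound VMkernel port opens exactly one session per volume; it does not issue any ESX commands.

```python
def plan_iscsi_layout(nics, vmk_per_nic=1):
    """Build one vSwitch entry per physical NIC, each with vmk_per_nic VMkernel ports.

    Names such as 'vSwitch-iSCSI0' and 'iSCSI-vmk0' are hypothetical labels.
    """
    layout = []
    vmk_index = 0
    for i, nic in enumerate(nics):
        ports = []
        for _ in range(vmk_per_nic):
            ports.append(f"iSCSI-vmk{vmk_index}")
            vmk_index += 1
        layout.append(
            {"vswitch": f"vSwitch-iSCSI{i}", "uplink": nic, "vmk_ports": ports}
        )
    return layout


def paths_per_volume(layout):
    """Assumption: every bound VMkernel port contributes one path to each volume."""
    return sum(len(vswitch["vmk_ports"]) for vswitch in layout)


if __name__ == "__main__":
    one_to_one = plan_iscsi_layout(["vmnic2", "vmnic3"])               # 1:1
    three_to_one = plan_iscsi_layout(["vmnic2", "vmnic3"], vmk_per_nic=3)  # 3:1
    print(paths_per_volume(one_to_one), paths_per_volume(three_to_one))
```

    With two NICs this gives 2 paths per volume at 1:1 and 6 paths at 3:1, which matches the path counts people report in this thread.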



  • 78.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 04:27 PM

    Don....

    Would you recommend having only ONE vmK port per NIC rather than multiple vmK ports per NIC (for example, 3 vmK ports on each NIC)?



  • 79.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 04:41 PM

    To start I would use 1:1. Measure the results, verify that you're not seeing the problem. Then add the additional vmkernel ports and check again.

    -don



  • 80.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 04:44 PM

    Would doing this 1 Vswitch per Physical Nic be the full time solution, or would the recommendation be to put everything back within one vswitch once the patch/fix has been released?



  • 81.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 05:02 PM

    That would be up to each administrator. The benefit of one vSwitch is that it's less "clutter" in the GUI and slightly less memory overhead. Each vSwitch takes up X amount of memory.

    -don



  • 82.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 13, 2010 05:18 PM

    So with that being said, this is more of a 'fix' for NOW, until they fix the problem with multiple NICs on a single vSwitch?



  • 83.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 19, 2010 03:42 PM

    Update:

    Dell / Equallogic have been very responsive to my recent blog post regarding this problem. I have just received an official response from them on my blog.

    If anyone has any additional questions for EQL I would respond to them there as they will be monitoring any readers questions on the matter, but in the end we are just going to have to wait for a patch from VMware on this issue and use the work around that was posted above. (which I personally will be testing shortly).

    Thanks!!



  • 84.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 06:30 PM

    I reconfigured two of my test servers that connected to a PS4000 yesterday afternoon and set them up with two separate vSwitches and two vmK ports per vSwitch. So far I have not seen any drops in RR/MPIO under my EQL event log and the iSCSI connection times confirm this. I will continue to monitor.



  • 85.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 07:08 PM

    Thanks for the update. I have been considering this option. I have three hosts and 12 VMs in production, configured per the original Dell document. I only see one or two drops during the wee hours of the morning and have noticed no degradation or data loss yet. I have these in an HA cluster, so I am not too keen on changing the config on all three just yet. If you have no issues over the next few weeks I may consider doing this. We are a K-12 school, so I have a week coming up when I could get this done. I was hoping that the patch would be out before I needed to consider this. I would definitely keep my 3:1 but have each physical NIC on a different vSwitch as suggested. Has anyone tried this new config with 3:1?

    John Z



  • 86.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 07:23 PM

    Johnz-

    I actually have these two hosts in an HA setup also, running a couple production VM's. I did one server at a time to make sure there were no issues. I just added another vmK port to each vSwitch for 3 vmk's to 1 vSwitch to see if there are any connection drops for a total of 6 active i/o paths. I will keep you posted on my results through the week after I perform more tests and monitoring.



  • 87.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 07:28 PM

    Thank you! That is awesome.

    John Z



  • 88.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 08:04 PM

    s1xth - Is the limitation by Dell/EqualLogic six active paths per volume on the SAN, or six active paths from each ESX host?

    I currently have 3 ESX hosts in a HA environment with each host having two VMkernel ports in RR for a total of 6 active paths.



  • 89.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 08:18 PM

    tWiZzLeR,

    I have your setup now. According to the Dell setup doc, 8 was a limit from the VM side. I want to say that the limit on the EQL side was 256/LUN. I have 20 connections to one LUN on my box now: 1 backup (VCB), 18 VM (6 paths from each host), and 1 from an older (ESX4) host with only one NIC for iSCSI.

    John Z



  • 90.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 08:22 PM

    What limitation are you referring to? The array has a limit of 512 connections per pool, up to 2048 connections using 4x pools. You can add more paths to the storage. However, at some point you're not going to go any faster.

    Are you talking about the VMware Round-Robin or the Dell/EQL MPIO beta?

    -don



  • 91.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 08:36 PM

    What limitation are you referring to? The array has a limit of 512 connections per pool, up to 2048 connections using 4x pools. You can add more paths to the storage. However, at some point you're not going to go any faster.

    Are you talking about the VMware Round-Robin or the Dell/EQL MPIO beta?

    -don

    VMware Round Robin. On page 9 of Dell's vSphere Configuration Guide it states "VMware vCenter has a maximum of 8 connections to a single volume". Am I misreading this or what does the 8 connections mean?

    I attached the guide.

    Message was edited by: tWiZzLeR



  • 92.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 20, 2010 10:01 PM

    That's a VMware limit, not an array one. All iSCSI vendors have this limit, since it's with the OS. See Page 4 on the attached PDF from VMware.

    -don



  • 93.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 21, 2010 12:17 AM

    It would be very hard to even use more than 8 connections unless you have a huge array setup...so I never looked at this as a 'limitation', especially with the advancements of 10Gb Ethernet.

    I am now at 1 day + and no disconnects....which is good.



  • 94.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 21, 2010 01:19 AM

    OK, so if 8 active connections is a VMware limit then if I have 3 ESX hosts all accessing the same volume on the SAN then really I can only have 2 VMkernel ports for iSCSI on each host, right? (3 hosts x 2 VMK = 6 connections).



  • 95.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 21, 2010 12:10 PM

    That's 8 connections per host not total number of connections.

    -don



  • 96.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 21, 2010 01:27 PM

    That's 8 connections per host not total number of connections.

    -don

    Ahhhh, thanks for the info! I have asked that question before in other threads and did not get a clear answer. So, with my 3 ESX hosts then I can have up to 24 connections to a single shared volume on the SAN (8x3=24). Now that makes more sense!!!
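
    The arithmetic in this exchange can be written out as a quick check. This is a rough sketch that uses only the figures quoted in the thread (8 connections per host to a single volume on the VMware side, 512 connections per pool on the array side) and assumes each VMkernel port opens one connection per volume and that all volumes live in one pool.

```python
VMWARE_MAX_PER_HOST_PER_VOLUME = 8   # figure quoted from the Dell guide in this thread
EQL_MAX_CONNECTIONS_PER_POOL = 512   # figure given by Don in this thread


def connection_budget(hosts, vmk_ports_per_host, volumes):
    """Estimate iSCSI connection counts.

    Assumptions (not confirmed by the thread): one connection per VMkernel
    port per volume, and every volume is in the same pool.
    """
    per_host_per_volume = vmk_ports_per_host
    per_volume = hosts * per_host_per_volume
    pool_total = per_volume * volumes
    return {
        "connections_per_volume": per_volume,
        "connections_in_pool": pool_total,
        "within_vmware_limit": per_host_per_volume <= VMWARE_MAX_PER_HOST_PER_VOLUME,
        "within_pool_limit": pool_total <= EQL_MAX_CONNECTIONS_PER_POOL,
    }


if __name__ == "__main__":
    # 3 hosts at the per-host maximum of 8, one shared volume: 8 x 3 = 24
    print(connection_budget(hosts=3, vmk_ports_per_host=8, volumes=1))
```

    The per-host limit caps how many VMkernel ports are useful on one host; the pool limit only becomes relevant as hosts and volumes multiply.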



  • 97.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 21, 2010 02:23 PM

    I am currently set up 1:1 as in the picture below. I have two physical NICs in each ESX server for iSCSI traffic and I created one VMkernel port for each vSwitch with Jumbo Frames enabled. I do see a few paths being dropped (maybe once a day) but no actual connectivity loss, as I have these set up in RR MPIO and it just fails over to the other path. I can say that since I created a second VMkernel port and a second vSwitch, and moved the second NIC to that vSwitch, the number of dropped paths has been greatly reduced.

    BTW, the reason that I also have a VM Port Group connected to each vSwitch is so that I can also use guest iSCSI access for VMs that have SQL and Exchange installed in them in order to use the Microsoft iSCSI Initiator and Dell's Auto-Snapshot Manager Microsoft Edition (ASMME) for quiesced snapshots.



  • 98.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 21, 2010 03:13 PM

    I have 2 hosts, one running as before using multiple Nics to one vSwitch, and the other setup as in the above diagram from tWiZzLeR with 2x vSwitches with 1 nic each.

    I will run this in test and see how it goes, and let you guys know what the results are.



  • 99.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 22, 2010 09:54 AM

    It's good that this is proving to be a good workaround. However, I think VMware should at least be giving us an indication of when they expect to release this patch, given that this has been a bug since vSphere was released.



  • 100.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 25, 2010 01:10 AM

    Well, after about 4.5 days of monitoring, it seems the connections are slightly more stable in this configuration. I have one vmK port that drops out, but this may be because of low I/O on my array. It still shouldn't happen, but it's better. So in the end, I am still seeing a drop...but much less frequently.



  • 101.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 25, 2010 02:08 PM

    To be honest, I can't see much difference with and without the fix. I have very little IO across the SAN as it's not in production, and maybe that is why, but I'm getting quite a lot of drops every day.

    Running out of patience here as not getting a response from vmware and wondering if I am going to have to downgrade everything to 3.5 just to get this working now.



  • 102.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 25, 2010 03:50 PM

    I guess after hearing this I will continue to run the way I have mine setup, 3:1 on a single switch. I see one or two drops per night, usually not from the same host. I now have 10 production VM's across three hosts in HA cluster and no degradation in performance or data loss noticed yet......I am using Backup Exec 12.5 with Virtual Infr. Agent using VCB and have full backups to restore...knock on wood...

    John Z



  • 103.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 25, 2010 04:17 PM

    I just made a post on my blog about the configuration changes I made. I will be posting my testing results shortly. http://bit.ly/8SYsGD

    I agree with the others, this is definitely not a 'fix' or a GOOD 'workaround', but it is what VMware is telling its customers to do. Hopefully we see a patch for this soon...really soon, because this is getting out of hand.



  • 104.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 27, 2010 05:40 PM

    Ok well I tried to escalate this today with vmware and they told me they have found the problem, they are testing the fix currently and it is planned for release in U5. That will be several months away yet.

    I was told it is a problem only with Dell Equallogic series, which was interesting...

    Unfortunately a few months is too long for me, so when I asked what they suggested as a workaround, they recommended downgrading the systems to ESX 3.5.

    Not at all what I want to do, but I don't want to risk any data loss with clients either.



  • 105.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 27, 2010 06:23 PM

    What!?! U5!? As in update 5?! That could be a year OFF.....I can't believe it! What do others think about this? I am going to try to get EQL's opinion on this.

    Thanks so much for posting your information.



  • 106.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 27, 2010 06:30 PM

    For me, the issue does not occur that frequently and there is NO way that I'm going back to 3.5! Again, whenever I have had a path fail it instantly switches over to the redundant path.



  • 107.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 28, 2010 10:13 AM

    Yes update 5. This is only what I got from the vmware support guy I talked to about our case, if anyone else can get more/different information then please post here.

    I'm going to try to hold off from the rebuild and see what other feedback people get, maybe from Dell also.

    Rebuilding everything as 3.5 is not at all ideal, especially as I wanted to take advantage of the improved iSCSI performance in vSphere, but not at the risk of data loss.



  • 108.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 28, 2010 11:50 AM

    One odd thing is that I don't see this disconnect issue. Configuration is 3x ESX servers build 208167 each having,

    - One vSwitch for iSCSI with 2 physical NICs

    - 2 vmk iSCSI ports on same subnet

    - 2x Force10 switches with 4-port LAG between them

    - PS4000 with both interfaces active, management interface on separate network

    - Several LUNs configured as RR and IOPS set to 3 on each host

    It's not live yet so mostly idle with occasional massive IO for testing. No errors reported??



  • 109.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 28, 2010 12:00 PM

    When you say 2 vmk ports...is that total? If so, you might not see much immediately but you will eventually.

    Add another vmk port so you have two or three PER nic...then you should see a lot more drops.



  • 110.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 10:56 AM

    Yes there is 1 vmk per pNIC. Not too sure I understand why n:1 would be beneficial anyway, since both ends are the same @ 2x GbE.



  • 111.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 10:57 AM

    Yes there is 1 vmk per pNIC. I don't understand why n:1 is beneficial anyway, since in the case of the PS4000 both ends are 2x GbE.



  • 112.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 10:58 AM

    Hi, yes there is 1 vmk per pNIC. I don't understand why n:1 is beneficial anyway, since in the case of the PS4000 both ends are 2x GbE.



  • 113.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 11:03 AM

    Hi, yes there is 1 vmk per pNIC. I don't understand why n:1 is beneficial anyway, since in the case of the PS4000 both ends are 2x GbE.



  • 114.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 11:04 AM

    Hi Sixth, yes there is 1 vmk per pNIC. I don't understand why n:1 is beneficial anyway, since in the case of the PS4000 both ends are 2x GbE.



  • 115.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 11:44 AM

    Hi S1xth, indeed I have 1 vmk per pNIC. As I have PS4000, there are two GigE's at each end, so I don't think there is any advantage to increasing that.



  • 116.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 10:40 AM

    So, has anyone got any more information on this as yet?

    Would be interesting to know if vmware/Dell have told you guys anything new, or how the prolonged testing is going.



  • 117.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 02:49 PM

    IMO, you will not have to wait for U5 for the fix to be released.

    No one that I have ever talked to with a properly configured system has had data loss of any kind. Again, very LOW I/O periods are when you see the issue on a single path. It recovers, and if you have redundant paths you don't lose connectivity to the storage. If you have more VMkernel ports than physical NICs you might actually see the issue occur more often, since with more VMkernel ports you're less likely to be able to keep them all busy all the time; but with multiple VMkernel ports per NIC you're also less likely to suffer an all-paths-down scenario.

    The alerts are annoying and generate a lot of noise, but I haven't heard that someone has gone down because of it. Has anyone on the list suffered an All Paths Down (APD) due to this bug, where the log (/var/log/vmkiscsid.log) shows that all the ports were disconnected by NOOP failures at the same time?

    -don



  • 118.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 04:43 PM

    I have never had more than one path fail per day per host so far. I have 6 virt/2 Phys nic, so I have 6 paths per host for iSCSI. I have a log entry once per day usually the wee morning hours and only 1 path is lost per host. Most times only 1 host has a failed path overnight.

    John Z



  • 119.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 05:33 PM

    Hi John,

    OK, so that's consistent with what I've seen. Annoying, but not causing downtime. If it really bothers you, you can try something: reduce the minimum number of IOs that go down each path. The default is 1000. There was a joint storage vendor "paper" (Dell/EMC/NetApp) that suggested changing that to three (3) instead. Since your IO load is so low at that time, you're tripping over the bug. Getting more consistent IO going over the available paths will likely reduce the frequency of the alerts.

    Question 3: “I’ve configured Round Robin - but the paths aren’t evenly used”

    Answer: The Round Robin policy doesn’t issue I/Os in a simple “round robin” between paths in the way many expect. By default the Round Robin PSP sends 1,000 commands down each path before moving to the next path; this is called the IO Operation Limit. In some configurations, this default configuration doesn't demonstrate much path aggregation because quite often some of the thousand commands will have completed before the last command is sent. That means the paths aren't full (even though queue at the storage array might be). When using 1 Gbit iSCSI, quite often the physical path is often the limiting factor on throughput, and making use of multiple paths at the same time shows better throughput.

    You can reduce the number of commands issued down a particular path before moving on to the next path all the way to 1, thus ensuring that each subsequent command is sent down a different path. In a Dell EqualLogic configuration, Eric has recommended a value of 3.

    You can make this change by using this command:

    esxcli --server <servername> nmp roundrobin setconfig --device <lun ID> --iops <IOOperationLimit_value> --type iops

    Note that cutting down the number of iops does present some potential problems. With some storage arrays caching is done per path. By spreading the requests across multiple paths, you are defeating any caching optimization at the storage end and could end up hurting your performance. Luckily, most modern storage systems don't cache per port. There's still a minor path-switch penalty in ESX, so switching this often probably represents a little more CPU overhead on the host.
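
    To see why the IO Operation Limit matters at low load, here is a toy simulation. It is a deliberately simplified model (send a fixed number of commands down one path, then move to the next) and not the actual Round Robin PSP implementation; the 600-command workload is an assumed example of roughly 10 IOPS for one minute.

```python
def distribute_commands(total_commands, num_paths, io_operation_limit):
    """Count commands per path under a simplified Round Robin policy.

    The policy sends io_operation_limit commands down the current path,
    then switches to the next path, wrapping around at the end.
    """
    counts = [0] * num_paths
    path = 0
    remaining = total_commands
    while remaining > 0:
        burst = min(io_operation_limit, remaining)
        counts[path] += burst
        remaining -= burst
        path = (path + 1) % num_paths
    return counts


if __name__ == "__main__":
    # About one minute at 10 IOPS across three paths.
    print(distribute_commands(600, 3, 1000))  # [600, 0, 0]: two paths stay idle
    print(distribute_commands(600, 3, 3))     # [201, 201, 198]: all paths carry IO
    print(distribute_commands(600, 3, 1))     # [200, 200, 200]
```

    With the default limit and a nearly idle host, the policy can go a long time without touching the other paths, which fits the observation that the drops show up during very low IO.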

    -don



  • 120.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 05:58 PM

    Don,

    I have done some research on the IOPS setting mentioned in the multi-vendor iSCSI post, and talked to Dell and EqualLogic support about it, and they don't know where the recommendation for setting this to 3 came from. If anything they recommended 300. I experimented with the setting and found that changing it from the default left a random number as the setting. So in short I believe leaving it at the default is best, and if you are going to change it, do it on a single test volume. There was also another thread on the delltechcenter.com site where someone used IOmeter to test the various settings; if you are interested in seeing the results, see the thread at the bottom of the multi-vendor post.

    As you have mentioned the drops don't appear to cause data loss. That is true for us as well.

    -Rob



  • 121.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 29, 2010 06:10 PM

    Rob...

    Even if you leave the value at the default of 1000 and reboot the host with RR configured, the number changes to a crazy value. That is what I have seen and read, and this is supposedly a reported bug in U1.

    Jonathan



  • 122.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 31, 2010 10:59 PM

    We run multiple PS storage arrays and have not had any issues as of yet. However, we have not switched to jumbo frames. It seems that the load on the wire is a factor in the drops, so once the patch is released we will recreate the vSwitches with jumbo frames.

    Keep the article below in mind when using jumbo frames.

    The excerpt below is from http://www.networkworld.com/forum/0223jumbono.html

    Although proponents claim larger packets improve performance "on the wire," the impact is relatively insignificant. Compare the efficiency of a 9,000-byte large-packet system with a standards-based 1,500-byte system. The standard packet gets 1,500 data bytes out of 1,538 bytes of frame and overhead, or 97.5% efficiency. The nonstandard packet gets 9,000 data bytes out of 9,038 bytes, or 99.6% efficiency. To put it another way, the difference in time required to send a 1M-byte file is only 0.1 msec.
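
    The percentages in that excerpt are easy to reproduce. A small Python sketch, assuming the same 38 bytes of per-frame overhead the article uses (1538 - 1500); the function name is just for illustration:

```python
# Per-frame overhead in bytes as used in the excerpt: 1538 - 1500 = 38.
FRAME_OVERHEAD = 38

def wire_efficiency(payload_bytes, overhead_bytes=FRAME_OVERHEAD):
    """Fraction of the bytes on the wire that are payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)

standard = wire_efficiency(1500)  # standard 1,500-byte frames
jumbo = wire_efficiency(9000)     # 9,000-byte jumbo frames

print(f"standard: {standard:.1%}")
print(f"jumbo:    {jumbo:.1%}")
print(f"gain:     {(jumbo - standard) * 100:.1f} percentage points")
```

    This only covers efficiency on the wire. It says nothing about host CPU load from handling more frames per second, which is a separate argument.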






    -Dwayne Lessner



  • 123.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Feb 01, 2010 04:47 PM

    Hi Dwayne,

    I think several users, including myself, had the issue even when using standard frames. We did not see an increase in the number of drops between non-Jumbo and Jumbo setups on our end.

    John Z.

    P.S. Is the date of that article correct? Feb. 98? Or is that a typo? With all the recommendations out there and the newest tech, I can't believe that this article is still valid. They are talking about 10M connections when we are pushing 10G today. Just wondering; I'm no network expert by any stretch.



  • 124.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Feb 01, 2010 04:54 PM

    I agree. Even in my testing without Jumbo frames the drops still occur, so this is not part of the problem. Jumbo Frames are fully supported across the storage stack, from the host (running ESXi 4) to the switches, to the EQL hardware. There is NO problem running Jumbo frames with the MS initiator and using MPIO with the HIT kit.

    This IS a VMware problem. Period.



  • 125.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Feb 01, 2010 07:41 PM

    I haven't enabled Jumbo frames and normally notice at least 1 host dropping at least one path overall every early morning. I've got 5 active EqualLogic LUNs and 3 ESX 4 servers with the standard 6 VMKs per host (for a total of 90 paths) and haven't noticed anything other than the usual path drops.

    Correct me if I'm wrong on this as this is my impression and understanding from what I've read about Jumbo frames and iSCSI:

    I believe that Jumbo frames are primarily relevant for squeezing the last few percent of throughput (as stated before by DwayneL; about 2% per that calculation) and slightly relevant for reducing the CPU overhead from the TCP and iSCSI calculations (including checksums/digests). Changes from the iSCSI data digests should be zero (or near zero) as the same amount of data must still be processed.

    There is the possibility that it also helps in reducing the potential delay incurred from these calculations and (probably least of all) reducing the overhead and delays on the network switch. The changes in these last two I believe would be so small as to be indistinguishable from statistical noise, especially considering the delays of a HDD head seek should be several orders of magnitude larger.

    Overall, the difference of everything, except possibly the throughput, should be negligible except in memory to memory (or memory-like, ie SSD) or large sequential transfers where delays like a head seek can be negated.



  • 126.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Feb 05, 2010 05:45 PM

    You are correct about Jumbo Frames. A good example where I've seen it help, is on larger scale Exchange servers. Say 5000 active users. When doing simulations at that level, even a high end server was running out of CPU cycles. Enabling Jumbo Frames freed up the CPU enough to pass jetstress as a first step, then loadgen after that to better verify that the configuration (server/network/SAN) would support the expected load.

    Also in the mix, is the OS itself. In my direct experience, not lab tested, I've seen more improvement with Jumbo Frames enabled on Linux vs. Windows for example.

    What I suggest is that, unless the configuration is known to work well with Jumbo Frames, you leave them off initially. Generate a baseline, then enable them and observe the results. Jumbo Frames and standard frames can co-exist on the same SAN without a problem, since it's done on a per-session basis.

    -=Don=-



  • 127.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Feb 17, 2010 10:40 AM

    Has anyone heard or received any updates on this recently?

    It seems from this thread that people are running this in production, and although they see drops occurring, no one has reported any "all paths down" situations. Would that be a fair assessment of the situation now?

    Some update from VMware would be nice for sure.



  • 128.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Feb 18, 2010 01:43 AM

    I'll be attending the Dell/EqualLogic User Conference in Dallas, TX in 2 weeks. I'll be making it a point to speak with the EQL engineers on this issue, since VMware is now saying it's an 'EqualLogic' problem. I'll let you all know what I hear!



  • 129.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Feb 19, 2010 10:48 AM

    It would be good to get some updates from EqualLogic/VMware. It's bad that this has dragged on for so long.

    Not to mention the MPIO Module from Equallogic.



  • 130.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Mar 02, 2010 02:06 PM

    Update! - Thought I would share this information with you guys, from what I am being told the fix for this problem will be released in PATCH 5. NOT Update 5 which has been passed on incorrectly by Dell/VMware. We should hopefully see this patch included in the next patch release cycle or following.

    I mentioned the EQL MPIO module in one session already and got a response "we are working on it". I am going to be in an upcoming MPIO dedicated session soon and I will mention more!!



  • 131.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Mar 02, 2010 02:46 PM

    Thanks for the update!

    John Z




  • 132.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Mar 02, 2010 08:55 PM

    Someone on the ESXi forum mentioned that patch-5 is due mid-March.



  • 133.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Mar 03, 2010 08:42 PM

    Between e-mail messages on Wednesday and Saturday of last week, I did get an update from VMware Support regarding my case.

    1) "unfortunately the Equallogic documentation has an error. There is a limitation in ESX4.0 that causes disconnects under certain iSCSI configurations, including the one they chose. As per that limitation, and our guide (http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_iscsi_san_cfg.pdf page 34), you may only use 1-1 mapping of vmkernel ports to physical nics." "This limitation will not be present in the next version of ESX (4.1)."

    Oops. That really causes some issues for Dell/EqualLogic customers when Dell/EqualLogic says something that VMware specifically says not to do. For reference, vsp_40_iscsi_san_cfg.pdf p30,p32 appears to be the same documentation but unfortunately it seems like it merely alludes to the fact that you're only supposed to do 1:1 (VMK:pNIC) rather than being explicit.

    2) "In addition, there is currently a bug open due to a second issue with connectivity with the 100E series from Equallogic. To use multipathing with vmkernel ports on the same subnet, you will need to be on Patch P03 (the latest out), or go to a single vmkernel port and disable multipathing."

    I'm uncertain if this applies to the newer PS models or not but I'm waiting to hear back.

    Thanks for the news on Patch 5.



  • 134.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Mar 10, 2010 01:17 PM

    I received some pretty 'solid' information regarding the patch release date. At this time, it is tentatively scheduled for the end of March. I don't want to give the exact date specified as I am under NDA. This MAY change, but at this time the patch is ready to go and just didn't make the latest round of patches released a few days ago.



  • 135.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 01, 2010 11:13 PM

    Finally...after months of anticipation and countless posts discussing workarounds, the most recent Patch 5 from VMware for vSphere 4 has been released!! The very first fix mentioned is related to our exact problem: multiple vmk's used for the swiscsi adapter connecting to the same subnet. There is no mention of EqualLogic or any other vendor that this affects (as I expected). This patch release just came out late this afternoon, April 1st. I haven't applied this patch to my environment yet, but I will try it on a host in an HA cluster over the holiday weekend.

    Has anyone else tried the patch yet? I will post any results if I see any difference!! Let's hope the drops are gone for good.



  • 136.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 02, 2010 02:53 PM

    Just installed the update on 3 hosts in my HA cluster....now to watch my email for the drops that will never come again.......I will post back Monday the results.....

    John Z




  • 137.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 02, 2010 04:28 PM

    S1xth, I'm pretty sure I speak for most out here: Thank you for the notification of the update. I thought I had subscribed myself in the last 1-2 weeks but either I didn't or VMware hasn't sent notification.

    I thought I heard from someone that the issue is within the software iSCSI initiator, and that even customers with just one vmKernel port assigned to iSCSI could experience what we've been seeing for months. Supposedly it gets worse as you go from 1 vmk, to 2 vmks on independent switches, to 2 vmks on one switch, to the 4-8 vmks on one switch that the Dell/EQL document recommended in the first place, and most of those seeing the issue had multiple vmks assigned. I also heard that the procedure Dell/EQL recommended was reached after working with VMware, and that having 4-8 vmks in a single switch would become the recommended practice once this patch was released.

    Can anyone out there speak on this rumor?

    Also, considering how wide spread this issue is and the severity of its impact, I'm surprised it's not -Update2. Maybe this is coming soon? Thoughts, comments?



  • 138.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 05, 2010 12:03 AM

    grcumm - Glad I could be some help! That patch alert email never worked for me, I was alerted to the release from a twitter feed.

    The issue itself should be corrected with this patch. EQL supports both methods of configuring the vmK's on vswitches, either a single vswitch or multiple vswitches. The main benefit of having a single vSwitch is slightly less memory overhead along with a better looking networking GUI (less clutter).

    I just updated one of my hosts in a cluster this evening, I will be monitoring to see if this issue is resolved, and hopefully we can close this forum post out for good! :smileyhappy:

    -sixth

    www.virtualizationbuster.com



  • 139.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 05, 2010 08:13 PM

    So far so good. Went all weekend through backups and a whole day of normal production activity without a single connection dropped! Looks good so far.

    John Z




  • 140.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 15, 2010 01:40 PM

    After about a week of normal use, absolutely no issues noticed.

    It looks like it's fixed and the Dell/EqualLogic document will probably be the recommended direction, assuming this patch is installed.



  • 141.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 15, 2010 01:53 PM

    I agree. I have had the patch installed on production servers for over 1 week now without a single issue. This appears to be a solid fix.

    John Z




  • 142.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 02:33 PM

    Hi everybody

    Some questions:

    - How many LUNs do you have on your EQL?

    - How many VMkernel ports do you have on your vSwitch for the swiscsi traffic?

    - How many ports do you have configured on the vSwitch?

    The reason I'm asking is following:

    I've had the same problems as everybody in this thread. The vSwitch for the swiscsi traffic, created with 8 ports, contains 6 VMkernel ports. We have 16 LUNs on our PS5000. I never saw all LUNs when rescanning and I had to reduce the number of VMkernel ports. However, in the document Equallogic_vSphere.pdf they tested with the same config (6 VMkernel ports, 2 pNICs), but with one difference: they had 56 ports configured on the vSwitch. They didn't mention how many LUNs they had.

    Is it possible that the vSwitch port value matters? With 16 LUNs and each LUN has 6 possibilities (VMkernel ports) this gives 96 connections.

    Today I changed the port value to 120, which is the next higher value after 96. I haven't had any errors so far, but this isn't a scientific explanation.
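
    For anyone who wants to run the same numbers for their own setup, here is the arithmetic as a small Python sketch. It assumes one iSCSI connection per LUN per VMkernel port, and it takes my theory (that each connection needs its own vSwitch port) at face value, which is unconfirmed. The function name is made up, and the candidate sizes are only the vSwitch port values mentioned in this post.

```python
def iscsi_connections(num_luns, vmk_ports, hosts=1):
    """Connections if every VMkernel port logs in to every LUN (assumption)."""
    return num_luns * vmk_ports * hosts

# vSwitch port counts mentioned in this post (not a complete list).
sizes_mentioned = [8, 56, 120]

needed = iscsi_connections(num_luns=16, vmk_ports=6)
big_enough = [size for size in sizes_mentioned if size >= needed]

print(f"connections needed: {needed}")
print(f"sizes that would fit under this theory: {big_enough}")
```

    The same function gives the 90 paths quoted earlier in the thread for 5 LUNs, 6 VMKs and 3 hosts.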

    Maybe someone with deeper knowledge of the internals of vswitches can explain if there is a correlation between port value and swiscsi connections or not?

    Bye

    Daniel



  • 143.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 02:37 PM

    Good thinking there, but I had the drops with just one 'LUN' or volume behind the PS4000/vSphere, so there were definitely enough vSwitch ports.



  • 144.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 02:59 PM

    Ok, if you don't have a vm network group with a couple of VMs on the same vswitch as the swiscsi vmkernel ports (which wouldn't be best practice...) or a couple of service console ports (which doesn't make sense for vsphere...) then the theory was wrong.

    Until now I don't see any drops anymore, but this may change during week. If VMware confirms that this is a known issue, so it is one...



  • 145.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 01, 2010 12:28 AM

    Excellent. Thanks for the suggestion. I am going to set it up this way.

    John Z




  • 146.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 01, 2010 12:18 AM

    Thanks for the info. I was doing some reading today on my day off, and I think I will combine the vMotion and service console across the 2 onboard NICs. I have two dual-port Intels that I split for production/iSCSI on separate VLANs.

    John Z




  • 147.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 16, 2009 05:48 PM

    Steve,

    I spoke to our Nortel tech today and this is what he had to say: the issue was specific to 5650 and 5698 HW Rev 0 & Rev 1 and Jumbo Frames. It may or may not have been NIC-dependent on the EqualLogic boxes as well; I have read that Jumbo frames worked fine with some products.

    On Nortel 5650, 5698 you only have the option to enable Jumbo's for the entire switch regardless of Vlan's. The MTU size is preset at 9216, which is what it should be. Hope this helps. When we have our new gear I will post a rev. I have 12 of these switches building wide so I hope we don't end up having to replace all the gear......

    John Z




  • 148.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 12:40 AM

    When you contact VMWare support reference PS484220.

    Equallogic seems to think it's an issue with multiple nics on the same subnet.

    All I can say, from my testing, is that the Microsoft iSCSI initiator doesn't have an issue on the same hardware.

    R710 -> Broadcom 5709 -> Cisco 3750G -> PS6000

    It's inexcusable that this wasn't fixed months ago.



  • 149.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 12:50 AM

    Edit-

    iqn.2001-05.com.equallogic:0-8a0906-7325b9d04-168000cfdd44b155-esxvol1' from initiator '10.10.5.18:55563, iqn.1998-01.com.vmware:vh3psrv3-1903c5bc' was closed. iSCSI initiator connection failure. Connection was closed by peer.

    Happening to me too...FRESH setup...brand new everything. Using Intel nics PT1000's, PS4000x, 2x5424 switches, jumbo frames, 4GB LAG between switches.

    Guess I will open a ticket with EQL.



  • 150.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 01:07 AM

    Yes, I am using 2 of the 4 on-board Broadcom NIC's. That would be very weird if that was the issue, but i have seen stranger things :smileyhappy: I'll see if I can dig up an extra Intel NIC to test with, and let you know.

    Thanks!

    Andrew



  • 151.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 01:07 AM

    I only use the internal Broadcoms for console mgmt. I have two dual-port Intel Pro NICs, with one port on each card for production VMs and one on each for iSCSI. I did set up one single Intel NIC with two vmknics and I still experienced drops. I currently have two physical ports with 3 virtual NICs bound to each, for 6 paths. It always drops one path, usually the highest IP but not always. No effect on performance that I can see yet, but I get 2 drops per day consistently. Jumbo frames or not, I get drops: I have two identical servers, one set up with jumbo frames and one without, and both drop equally from the SAN.

    John Z




  • 152.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 01:10 AM

    John/Andrew...

    Have either of you opened a ticket yet with EQL??



  • 153.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 01:18 AM

    I opened a call to troubleshoot the Jumbo Frames issue, which an open call with Nortel has fixed. As soon as they replace my switches and I am back up, I will open another call with Dell EQL. I was hoping an update from VMware was going to turn up. I was also hoping Dell would release its beta plug-in for MPIO, as I have an Enterprise Plus license for VMware. As soon as I have my new 5698s and I am satisfied that the Jumbo issue is fixed, I will start a call with EQL, and maybe by then the MPIO plug-in will be ready.....

    John Z.



  • 154.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 01:20 AM

    I am planning on opening a ticket with them tomorrow. I will be sure to update.

    Andrew



  • 155.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 01:28 AM

    Excellent...let's keep the thread updated with what EQL says to us all. It seems that we are all having the same issues and have done all the usual troubleshooting. I just opened a case via the EQL customer portal site; I probably won't hear anything until tomorrow.

    I will also ask about a beta of the MPIO plugin. It would be great if that came out soon, as it seems like it is taking forever. *Not saying that would fix the problem, though.

    I really pushed this EQL box to upper management, EQL better fix this...quick.



  • 156.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 06:10 PM

    Not sure if this is related, but for others in this thread, there is a post here on the storage forums regarding "Change the value of DefaultTimeToWait parameters":

    http://communities.vmware.com/thread/243091

    According to Andy in the post, there is a patch coming out to fix this problem. There are a couple of EQL users posting in that thread with a very similar scenario. The two may be related.....or not. Just wanted to point it out.



  • 157.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 06:47 PM

    Thanks for the update. After reading that thread it looks like we will see an ESX patch soon.......maybe...

    John Z




  • 158.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 06:55 PM

    Yeah, I just got off the phone with the EqualLogic support folks. They indicated that this was a known problem with the SW iSCSI initiator in ESX 4, and that I should contact VMware support with the reference number PR484220 to be placed on a list to be notified when a patch is released. I will give them a call this afternoon to verify... He also suggested that I review the attached document just to be sure I did not miss anything in my config, which I will also do this afternoon.

    Hope this helps!

    Andrew



  • 159.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 07:37 PM

    I just noticed that everyone is mentioning ESX as their version. Is anyone running ESXi 4? That is what I am running and I am still seeing the issue, so I am assuming this patch will take care of both versions, ESX and ESXi.




  • 160.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 03:36 AM

    I'm running ESXi and see this bug. I too contacted EQL and they referenced the same VMware bug. I opened a case with VMware and am on the waiting list to be notified when a patch is available.



  • 161.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 10, 2009 08:56 PM

    Just for reference, how many vmKernel ports do you guys have configured for your iSCSI traffic?

    I am currently using the 3:1 setup as described above in the document. I am using two pNICs with 3 vmk ports assigned to each NIC for a total of (6) paths, and enabling RR gives me (6) active I/O paths.

    What is everyone else doing? 1:1 or 3:1 or other?



  • 162.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 01:07 AM

    My setup is exactly that: 6 paths, 3:1 on two NICs. One server with jumbo frames, one without. I used one port on each of two separate dual-port NICs in case one fails.

    John Z




  • 163.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 01:27 AM

    John...same thinking here...3:1..separate cards.

    This is definitely not an EQL issue; it looks more like a VMware one. I called Dell (I bought my VMware licenses through Dell when I bought my server) and they are pushing the case right to VMware with high priority. I can't believe this hasn't been fixed yet!!! iSCSI is the most used protocol and MPIO is being used with all newer deployments. Can't believe this wasn't fixed in U1 either. Really not happy about this.




  • 164.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 01:47 PM

    Same here, we bought ours through Dell. I was surprised too as iscsi is the meat of why virtualization is so awesome. I have to say that MPIO is fairly new with vSphere 4 but this is no excuse.

    John Z




  • 165.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 02:08 PM

    John..

    Agreed. I would love to know VMware's thinking behind this. I am hoping this 'patch' will be included in the next round of updates, hopefully this month. Given how big iSCSI is and, like you said, that shared storage is what makes virtualization, I just can't believe this isn't a top priority. Let alone the fact that VMware has been pushing their 'totally rewritten' initiator for iSCSI.

    Let's all cross our fingers that we see something this month. I don't want to push my deployment back any further; there is no way I can sign off saying that everything is solid and working when I have connections dropping.



  • 166.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 02:13 PM

    It's ridiculous that this patch was not included with Patch 1 a few weeks ago.

    I brought this issue up with VMware not long after vSphere went gold. VMware claimed it must be an issue with my storage, or the network switches or config. Quite ironic that it turns out to be a VMware issue after all.

    As you say, let's hope this is fixed during December. Hopefully it won't be long before EqualLogic also releases the MPIO module for vSphere.



  • 167.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 07:27 PM

    I wanted to run something else by you guys as well. I came across a document that explains that ESX sends 1000 or so commands down each path during the round robin. Some commands are short and therefore do not benefit from this round robin. The document had a suggestion from a Dell EQL tech that the optimum setting for each path was 3, and that you could tweak this in the ESX advanced settings on the initiator. I would be reluctant to do it while we are dropping paths, but it sounds logical.

    The command is this: esxcli -- server




  • 168.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 08:01 PM

    I recommend that anyone with questions about the IOPS setting mentioned in that multivendor blog post read this thread.

    The gist of the thread is that 3 is probably too low for the IOPs setting, and that 300 might be better, but that it also depends on your storage system.

    We are running ESX4/Dell PE R710/Intel quad port/PS5000s/Cisco 3750/Jumbo Frames with the latest firmware, and generally we don't have any iSCSI connection problems. We are using Round Robin and my iSCSI port group to physical NIC ratio is 1:1, not 3:1, as I had vSphere up-and-running before that TR doc came out of EqualLogic.

    I have set a test volume to 300 IOPs with no detrimental effects. I haven't done any i/o meter testing and cannot tell you if it increased or decreased performance (yet), but the setting has not caused connectivity issues to that volume.

    I have from time to time had the "connection closed by peer" error message, and an EqualLogic tech informed me that was the system load balancing. It does not seem to actually cause a disconnection, and I've noticed it rarely happens now. I run Virtu-Al's daily PowerShell script report and most days it is clear of errors. If you are not using this script I highly recommend it: Virtu-Al daily report v2 PS script, http://www.virtu-al.net/2009/08/18/powercli-daily-report-v2/

    And for those that want to further info about SAN storage for VMware, there is a great post on the subject by Chad Sakac, VMware I/O queues, “micro-bursting”, and multipathing



  • 169.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 08:40 PM

    Interesting RobVM. Thanks for the post.

    I just made the change to my setup and set the iops to 300 like you have yours set to. Is there a way to view this setting to confirm the change was made successfully? Am I correct in saying that the default iops is 1000?

    I think the thing that is throwing us all off is that VMware is now saying that this IS indeed a bug in the SW initiator, that it is causing these drops, and that they shouldn't be happening.

    I also want to add that I believe we have all been told the same thing: that this is the EQL balancing the connections. This is not true, as the connection drop IS being registered on the VMware side as a drop, and paths are being lost as a result.



  • 170.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 08:50 PM

    Yes, the default is 1000 as far as I know.

    To check your setting for a particular volume, use the getconfig command instead of setconfig:

    esxcli --server esxhostname nmp roundrobin getconfig --device naa.xxxxx

    To list all of your devices

    esxcli nmp device list



  • 171.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 09:09 PM

    Thanks...those commands work. I will monitor to see if this helps at all...but I am assuming the connections will still drop.



  • 172.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 09:17 PM

    I have much the same setup as RobVM, except with two PS4000s and two Dell PowerConnect 5424 switches with 4 gigabit ports set up in a LAG between them; 1:1 iSCSI port to physical NIC, using three physical NIC ports, jumbo frames, etc. I see the same drops everyone else is seeing, but when I contacted EqualLogic support a couple of months ago while first setting them up, they told me it was part of the load balancing that the PS4000s do. I've not seen any performance issues or any other issues so far, but our VMs are not that demanding.



  • 173.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 15, 2009 08:02 PM

    I just wanted to mention that I switched from a 3:1 configuration to a 1:1 configuration, and the drops haven't happened as often as before. I am sure this will hurt my performance somewhat, but at least I don't have connection drops as frequently as before.



  • 174.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 11, 2009 01:10 AM

    I am using 1:1 with 2 nics. No jumbo frames (yet).

    Andrew



  • 175.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Dec 14, 2009 11:08 AM

    We're seeing this exact same issue with ESX4 U1 / DELL R710 poweredge servers / 2 x DELL powerconnect 6224 switches / Broadcom NICs / PS4000. Path redundancy being lost from time to time. I'm glad it's been recognised as a bug and a patch is in the pipeline. We've deployed 3 VMs so far, but I'll probably postpone any further P2V until this is resolved. Good to know I'm not the only one having these issues though.



  • 176.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 02:26 PM

    This latest development about the long ETA on a fix is definitely concerning. We have been in production with 3 VMs since before Christmas, and are still undecided whether to proceed with more P2V or hold at our current position.

    Has anyone ever seen all paths to a LUN drop at the same time? I'm assuming that this would be the (only) nightmare scenario where data loss could occur. I've gone over our SAN logs, and so far we have never seen an instance where we completely lost all paths between an ESX4 host and a LUN, but maybe that's just luck. I would have thought that the more paths that are configured, the less of an issue this is, as the iSCSI commands will just be delivered via a different connection in the event of a drop. Even so, the only really safe option has to be falling back to fixed paths instead of round robin, and losing the performance. I guess that's our most likely course of action until this is sorted.

    I remember an earlier poster mentioned that Dell were pursuing this aggressively with VMware, but that seems hard to believe after hearing this latest development. Surprising really, as the EQL SAN is Dell's flagship iSCSI product range, and I know they're pushing it along with the R710 servers and vSphere at the moment. I'm in agreement with others: it really does seem amazing that this issue is still unresolved, as it seems fairly critical to me.



  • 177.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 02:39 PM

    I have two PS4000s in a RAID 6 group; three ESXi boxes with three 1:1 iSCSI setups each; and two Dell 5424 switches dedicated to the iSCSI network. The Dell switches have four ports etherchanneled between them. I do see drops, but I have not seen any problems caused by the drops. Performance is still very good and so is throughput. I have ~17 VMs running on this setup.

    fyi,

    Brian



  • 178.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Jan 12, 2010 02:42 PM

    Sorry, we have 11 LUNs defined presently on the two PS4000s.



  • 179.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 15, 2010 02:42 PM

    I've been running the patch on a single ESX host for the past week, and it looks like it's done the trick. No disconnects at all. I will be deploying to our remaining ESX hosts at the weekend.

    Thanks everyone (especially s1xth) for all the updates and commentary on this, it's been most helpful (and reassuring) to know that there are others in the same boat.



  • 180.  RE: ESX4 swiscsi MPIO to Equallogic dropping

    Posted Apr 15, 2010 02:49 PM

    I am just glad that everyone is experiencing the same results and that the problem has been resolved. It took a while, but I'm glad that it is fixed!

    Blog: www.virtualizationbuster.com

    Twitter: s1xth