ESXi

  • 1.  Software iSCSI Multipath and VMKping tests not working

    Posted Apr 05, 2011 09:44 PM

    I think I am going crazy... I spent much of Sunday trying to figure this out, and after rebuilding an ESX host from scratch I get the same problem.

    I have setup the host for software iSCSI multipathing with jumbo frames using these instructions

    http://www.daemonchild.com/iscsi-multipathing-with-jumbo-frames

    I thought it would be a good idea to test connectivity using vmkping, so I ran a test with both vmnics connected (vmnic2 & vmnic3) and I get a response. Each vmnic is associated with a single vmk port in active/unused mode, as per the instructions:

    vmk1 -> vmnic2

    vmk2 -> vmnic3

    When I run the test with vmnic3 disconnected, pings still work. When I disconnect vmk2 I get no pings.

    Pings are being sent to other devices on the same iSCSI network. I have checked the VLAN config about a million times now and tested vmnic2 in the port vmnic3 is using (vmnic2 still works in this config).
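
    Since the setup uses jumbo frames, a plain vmkping (56-byte payload) does not exercise the larger MTU. A minimal sketch, assuming IPv4 without header options and a 9000-byte MTU (neither is stated in the thread), of how the matching payload size could be worked out. The helper names are made up for illustration, and the second helper only prints the command instead of running it:

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented frame for a given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
# (Assumes IPv4 with no header options.)
icmp_payload_for_mtu() {
    mtu=$1
    echo $((mtu - 28))
}

# Print (do not run) a vmkping with the don't-fragment bit set (-d)
# and the payload size (-s) sized for the MTU. MTU defaults to 9000,
# which is an assumption about the jumbo frame setup.
jumbo_vmkping_cmd() {
    target=$1
    mtu=${2:-9000}
    echo "vmkping -d -s $(icmp_payload_for_mtu "$mtu") $target"
}
```

    For example, jumbo_vmkping_cmd 10.9.8.1 prints a command that can be pasted into the ESXi console. If that ping fails while a default-size ping succeeds, the MTU is probably not set end to end.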

    I have done some searching and see other users complaining about this, but no official response on whether this is normal.

    Please help, I have been working on this for too long and think I am chasing a ghost.



  • 2.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 06, 2011 06:16 PM

    Please post your results of:

    1. vmkping -D with all NICs connected, vmnic3 disconnected and vmnic2 disconnected.

    2. esxcfg-nics -l

    3. esxcfg-vswitch -l

    4. esxcfg-vmknic -l

    5. esxcli swiscsi nic list -d <iscsi hba name>, e.g. esxcli swiscsi nic list -d vmhba33
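
    The five commands above could be gathered in one pass with something like the following. This is only a sketch: the function name is made up, and vmhba33 is just the example adapter name from this post, so substitute your own.

```shell
#!/bin/sh
# Run each diagnostic requested above and print its output under a header.
# $1 is the software iSCSI adapter name; it defaults to vmhba33, which is
# only the example name used in the post.
collect_iscsi_diag() {
    hba=${1:-vmhba33}
    for cmd in \
        "vmkping -D" \
        "esxcfg-nics -l" \
        "esxcfg-vswitch -l" \
        "esxcfg-vmknic -l" \
        "esxcli swiscsi nic list -d $hba"
    do
        echo "===== $cmd ====="
        # Unquoted on purpose so the string splits into command + arguments.
        $cmd 2>&1 || echo "(command failed or not available)"
        echo
    done
}
```

    Run it once per cabling state (all NICs connected, vmnic3 disconnected, vmnic2 disconnected) and redirect each run to its own file, e.g. collect_iscsi_diag vmhba33 > all-connected.txt.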



  • 3.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 06, 2011 07:39 PM

    beovax,

    I ran through the setup in my lab. There is a typo in the article.

    # esxcli swiscsi nic add -n vmk0 -d vmhba35 >> should be # esxcli swiscsi nic add -n vmk1 -d vmhba35. vmk0 is management by default.
    # esxcli swiscsi nic add -n vmk1 -d vmhba35

    At first I had the same symptoms as you did.

    I disconnected one of my NICs and did a vmkping -D, with the ping failing for the vmk in that portgroup.

    I re-enabled the NIC and did a vmkping -D again. All up.

    I then disabled the second NIC and vmkping worked for everything. All up.

    I went back to disabling the first NIC again, but ping worked this time.

    I have a feeling it was my ARP tables, but I did not check them until after the fact.  --# esxcli network neighbor list

    I will remove the setup and run through the setup again and report back.



  • 4.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 06, 2011 08:02 PM

    Hi,

    Thanks for the test. To confirm, I adjusted the vmk ports to the ones being used in our configuration. I think in our case it is actually vmk2 and vmk3.

    I won't be able to test again until Monday, but I will update the thread.

    Thanks again for taking a look.



  • 5.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 06, 2011 08:08 PM

    beovax,

    It's working as intended. It makes sense when you think about it: if that NIC fails, that path is down and so is the vmk port, but the second path will be up, which should still provide access to storage.

    It's definitely an ARP refresh issue that explains why you can still ping a specific vmk IP when you disable the NIC.

    I had my NIC disabled during my first reply and just checked the ARP tables again.  My interface is missing and I get no reply when pinging that vmk portgroup.

    I'll post my example below:

    .204 is not in the ARP table and can't be pinged.

    ~ # esxcli network neighbor list
    Neighbor    Mac Address        vmknic  Expiry(sec)  State
    10.9.8.1    00:50:56:c0:00:01  vmk0    1198
    10.9.8.202  00:0c:29:ed:b7:52  vmk0    1148
    10.9.8.205  00:0c:29:22:a3:46  lo0     4294962794
    ~ # vmkping -D


    --edited:

    PING 10.9.8.203 (10.9.8.203): 56 data bytes
    64 bytes from 10.9.8.203: icmp_seq=0 ttl=64 time=0.648 ms
    64 bytes from 10.9.8.203: icmp_seq=1 ttl=64 time=0.319 ms
    64 bytes from 10.9.8.203: icmp_seq=2 ttl=64 time=0.385 ms

    --- 10.9.8.203 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.319/0.451/0.648 ms

    PING 10.9.8.204 (10.9.8.204): 56 data bytes

    --- 10.9.8.204 ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss

    ---edited

    ~ #

    .204 is back in the ARP table after reconnecting and can be pinged.

    ~ # esxcli network neighbor list
    Neighbor    Mac Address        vmknic  Expiry(sec)  State
    10.9.8.1    00:50:56:c0:00:01  vmk0    1180
    10.9.8.202  00:0c:29:ed:b7:52  vmk0    1167
    10.9.8.203  00:50:56:73:e8:71  vmk0    224
    10.9.8.204  00:50:56:73:cf:e9  vmk0    1193
    10.9.8.205  00:0c:29:22:a3:46  lo0     4294961801

    ~ # vmkping -D

    --edited

    --- 10.9.8.203 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.376/0.398/0.431 ms

    PING 10.9.8.204 (10.9.8.204): 56 data bytes
    64 bytes from 10.9.8.204: icmp_seq=0 ttl=64 time=0.583 ms
    64 bytes from 10.9.8.204: icmp_seq=1 ttl=64 time=0.387 ms
    64 bytes from 10.9.8.204: icmp_seq=2 ttl=64 time=0.309 ms

    --- 10.9.8.204 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.309/0.426/0.583 ms

    PING 10.9.8.1 (10.9.8.1): 56 data bytes
    64 bytes from 10.9.8.1: icmp_seq=0 ttl=128 time=0.275 ms
    64 bytes from 10.9.8.1: icmp_seq=1 ttl=128 time=0.596 ms
    64 bytes from 10.9.8.1: icmp_seq=2 ttl=128 time=0.524 ms

    --- 10.9.8.1 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.275/0.465/0.596 ms

    --edited

    ~ #
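
    The before/after comparison of the neighbor list above can be scripted. A rough sketch, assuming the column layout shown in the pasted output (neighbor IP in the first column, vmknic in the third); the function names are made up for illustration:

```shell
#!/bin/sh
# Exit 0 if the IP given as $1 appears in the Neighbor column of
# `esxcli network neighbor list` output supplied on stdin, else exit 1.
# The header row (line 1) is skipped; the match is exact, so 10.9.8.20
# does not match 10.9.8.202.
neighbor_present() {
    awk -v ip="$1" 'NR > 1 && $1 == ip { found = 1 } END { exit !found }'
}

# Print the vmknic column for the given neighbor IP, if it is listed.
neighbor_vmknic() {
    awk -v ip="$1" 'NR > 1 && $1 == ip { print $3 }'
}
```

    Usage on the host would look like: esxcli network neighbor list | neighbor_present 10.9.8.204 && echo cached || echo missing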

    When you say that you get no ping, do you mean to that specific vmk port IP or to all vmk portgroups on that vSwitch?

    The only other method I can see is to set the second NIC in each vmk portgroup to standby instead of unused.  That way, if the active NIC fails, traffic for both portgroups is sent over that standby NIC.  It works, but I do not know how efficient that is.



  • 6.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 06, 2011 08:52 PM

    As you say, an active/standby configuration may get around the ping issue, but this isn't supported for software iSCSI multipathing. The two vmnics must be in active/unused if you are using a single switch with 2 port groups and 2 vmknics.

    I am pinging from the host with the disabled vmnic to a vmk port in another host. Both vmk ports are on the same network, so in theory the pings should go out of the second vmk port when it sees the "primary" has lost connectivity.

    I think the test you ran was to ping the vmk port with the disabled vmnic, which as you say won't work.

    In these tests the paths to the storage seem OK; it's just vmkpings which fail.



  • 7.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 06, 2011 09:51 PM

    Correct. My test did not involve pinging outside of the host. I just did a diagnostic ping (vmkping -D).  In theory the ping should go over the second vmk port and update the ARP table as required. I'll see about testing that tomorrow.



  • 8.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 07, 2011 08:33 PM

    Hi Robert

    Did you get a chance to test? I found some people at the end of this guide with the same problem.

    http://www.yellow-bricks.com/2009/03/18/iscsi-multipathing-with-esxcliexploring-the-next-version-of-esx/?replytocom=11539

    They are saying only the first vmk port on the iSCSI subnet can ping, regardless of how many vmk ports are on that subnet.

    Cheers

    Michael



  • 9.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 07, 2011 09:59 PM

    Yes. I did some more testing. I was in the middle of composing this message but got caught up with some other system that I manage.

    I was able to confirm the issue going to an outside node.  It is definitely on the ESXi host side.

    Here is what I see happening:

    1. When the host chooses one of the vmk ports (it seems to be the first one created) and stores it in the ARP table (neighbor list) for the destination node (neighbor), it does not refresh the list when that interface goes down, even if there is a second interface connected to the destination network.

    2. The neighbor entry is set to expire after 20 minutes. After those 20 minutes a ping still did not work.  ESXi did not add the second iSCSI vmknic to the ARP list.

    3. The esxtop networking view (press n) shows that it is still trying to send traffic over the disconnected NIC, listed as "failback".

    Here comes the big piece.  We cannot trust vmkping/ping with this setup.

    I connected to my netappsim and disconnected one of the NICs.  vmkping/ping fails to all IPs on the iSCSI segment, but I was able to create/clone VMs and browse the iSCSI LUN.  It showed the second path (2 paths; I had a total of 4) was up and was the active path.

    The network performance chart and esxtop showed both NICs actively reading and writing when both NICs were connected.

    It works, but troubleshooting with ping fails.
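
    Since ping is unreliable here, path state is the better health check. A small sketch for summarizing it, assuming the compact listing of esxcfg-mpath -L carries a state:<value> token on each path line; I have not confirmed that format against this ESXi build, so compare it with your own output first. The function name is made up:

```shell
#!/bin/sh
# Count storage paths per state from a compact path listing on stdin.
# Assumption: each path line contains a token of the form state:<value>
# (e.g. state:active, state:dead), as `esxcfg-mpath -L` is believed to print.
count_path_states() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^state:/) {
                s = substr($i, 7)   # text after the 6-character "state:" prefix
                n[s]++
            }
    }
    END { for (s in n) print s, n[s] }' | sort
}
```

    Usage on the host would look like: esxcfg-mpath -L | count_path_states. After pulling one NIC you would expect some paths to leave the active state while the rest stay active, which matches what the datastore browsing test showed.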



  • 10.  RE: Software iSCSI Multipath and VMKping tests not working

    Posted Apr 07, 2011 10:23 PM

    Thank you for your detailed findings. This is exactly the issue we see. Spent hours in a horrible datacenter trying to get vmkpings to work for no reason :)

    I will probably log a call with VMware tomorrow to see if they are aware of this. Since vmkpings are recommended for checking network health, it could be worth adding this behaviour to the documentation to save other people time chasing problems that are not there.

    I'll post back here with their response.

    Thanks again.