VMware vSphere

High Ping Latency to VMs

  • 1.  High Ping Latency to VMs

    Posted Aug 17, 2010 09:53 PM

    I have just recently seen an increase in network latency for a large number of virtual machines on our 4 ESX hosts.

    It seems quite sporadic, with spikes of very high ping times. The latency is seen between the hosts' service consoles (SCs), between VMs on each host, and out to the physical network. It is most apparent on a couple of servers that run web applications, where users are logged out or receive network errors.

    At first my investigation centred around the physical infrastructure, but I think I have ruled out any issues there, as pinging physical servers on the LAN shows <1ms latency with little variation. So the issue seems isolated to the ESX environment.

    I can plug directly in to the ESX host switch and still see some latency when pinging hosts or VMs.

    The drop in performance just began over a weekend and I am struggling to find the cause.

    Can anyone advise what could be causing this, and what can I do about it?

    Are there any troubleshooting steps I can take to confirm whether this is a problem within the ESX environment?

    Any help much appreciated



  • 2.  RE: High Ping Latency to VMs

    Posted Aug 18, 2010 04:44 AM

    I doubt that there is an issue on the ESX side and would think it's more likely a physical issue. What changed, and who changed it?

    What type of switches are you running? How many uplinks? How are these uplinks configured?

    What policies are you running on the vSwitches? Do you have any virtual firewalls you are hopping across?

    Once the packet leaves the physical layer there are no boundaries to slow things down except for your server itself. How are resources on your hosts? What version of ESX are you running?

    Have you tried a trace? Are packets bouncing around?



  • 3.  RE: High Ping Latency to VMs

    Posted Aug 18, 2010 08:58 AM

    I must admit I have thought it unlikely to be an ESX issue but I just want to rule it out altogether.

    As far as I am aware there have not been any changes made; it started over a weekend (early hours of Saturday morning).

    In terms of the network it is fairly uncomplicated, to be honest. We have 4 gigabit Netgear switches in total. These house our physical servers and the physical connections to the ESX hosts. One of these switches houses our gateway connection out, and the other 3 connect in to that switch.

    We also have 7 client switches and each one of these uplinks to one of the 4 server switches.

    In terms of vSwitches, the policies are default and we are not crossing any firewalls.

    We are running ESX 4.0.0 and host resources are fine.



  • 4.  RE: High Ping Latency to VMs

    Posted Aug 18, 2010 10:09 AM

    Ok sounds straightforward enough!

    Correct me if I am wrong, but it sounds like a fairly flat network, no VLANs etc.?

    Is the latency perhaps only affecting VMs on one host? How many vmnics per vSwitch / VDS uplinks?

    Are any of the physical connections flapping? Perhaps a faulty cable/NIC/port?

    Try eliminating complexity, perhaps by removing any secondary uplink interfaces from the vSwitch and testing each step of the way. Could you move one of the ESX hosts off the switch to another just to test? Place the vmnics in a predictable failover order on the vSwitch, patch the redundant link to another switch/port, then change the failover order. This keeps packet loss to a minimum, and you do not have to wait for network protocols like spanning tree to allow the port. Or do it out of hours :)

    Do trace routes show any traffic bouncing? Do you have anything monitoring traffic, like MRTG, to give you an idea of the amount of data traversing the network?



  • 5.  RE: High Ping Latency to VMs

    Posted Aug 18, 2010 11:08 AM

    Yes, it is a very flat structure, no VLANs.

    The latency seems to be affecting all hosts. Each host has 4 vSwitches - Production, DMZ, Cluster and Vmotion.

    Vmotion and Production each have 2 vmnics, spread across 2 dual port physical NICs for redundancy.

    From what I have seen on the switches, hosts, physical servers and within vCenter there does not appear to be a faulty NIC or port - although it does have the feel of something churning out unnecessary traffic somewhere.

    I have attempted moving hosts across 3 different switches, and it has made no real difference. As we speak, all host connections are sat in a completely different switch than yesterday, with a different uplink.

    I have noted there is no latency on the Vmotion network - these connections go to a separate switch with an unroutable subnet.

    I carried out a vmkping between vmnics here and I only saw a max of 1.660 ms over 500 pings, whereas I have seen max responses of 30/40 ms and upward on the production network.

    I currently only use PRTG for network monitoring of all hardware devices at main and remote sites. The problem is I don't have a baseline of the data traversing the network from before the issue started, so I would have nothing to compare against. I will take a look at MRTG. Trace routes do not show anything up.



  • 6.  RE: High Ping Latency to VMs

    Posted Aug 18, 2010 12:54 PM

    Ok, just so I am clear: you have 3 x ESX hosts connected to a Netgear 24-port Gig switch, with 4 vSwitches: Production, DMZ, Cluster and Vmotion. I am guessing Cluster is the management interface (the old Service Console). There are 2 physical NICs each on Vmotion and Production, with Vmotion non-routable. I am also assuming that there are 2 interfaces for DMZ and Production? Is DMZ on the same IP network or isolated to another switch?

    Can you try moving one of the Prod interfaces to another switch to eliminate the switch?

    Is there any latency to the physical servers at all? Even the ESX Service Console?

    I note you have moved servers around, and I guess that the issue remains. How about the uplinks between switches? How many are there? Cables ok?

    What hardware is this on?

    This is very perplexing. Have you rebooted the switches? I am guessing these are managed switches; how do they respond to a ping?



  • 7.  RE: High Ping Latency to VMs

    Posted Aug 18, 2010 02:48 PM

    Nearly, 4 ESX Hosts connected to a 24 port Gig switch.

    The Cluster vSwitch is used for Double-Take replication for the file server and also for the Lotus Domino cluster.

    We only have 1 interface on each host for DMZ and yes, it is isolated to another switch. Each ESX host has 2 interfaces for the Production network.

    They were spread with each host having one production connection to each of 2 different gigabit switches; currently they are all in 1 switch (8 production connections).

    I have already tried moving connections amongst 3 different gigabit switches to eliminate any of the switches involved.

    Switches have all been rebooted - client and server.

    Uplinks have all been checked and appear ok, usually 1 uplink per switch apart from the main gateway switch which has a couple of server switch uplinks and a couple of client uplinks in to it. However I can ping a physical server which also sits in this switch with no latency whatsoever.

    I have also done a couple of interesting pings from the service console on ESX02 (see below):

    From ESX02 Console

    --- Brunt-TS ping statistics --- (Physical Server, connected in to another switch)

    500 packets transmitted, 500 received, 0% packet loss, time 499395ms

    rtt min/avg/max/mdev = 0.139/0.265/5.937/0.364 ms

    --- BruntDC02 ping statistics --- (Virtual Server, all hosts production connections currently in the same switch)

    500 packets transmitted, 500 received, 0% packet loss, time 499037ms

    rtt min/avg/max/mdev = 0.202/0.721/39.989/1.960 ms

    I am going to try pinging all the managed switches now and see how they respond individually.
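    For anyone wanting to compare summaries like the two above side by side, here is a minimal sketch in Python. It assumes the summary line has the Linux `ping` format shown in this post; the function name `parse_rtt` is just an illustrative choice, not part of any tool.

```python
import re

# Matches the summary line printed by ping on the ESX service console, e.g.
# "rtt min/avg/max/mdev = 0.139/0.265/5.937/0.364 ms"
RTT_LINE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(line):
    """Return the min/avg/max/mdev values (in ms) from a ping summary line."""
    match = RTT_LINE.search(line)
    if match is None:
        raise ValueError("no rtt summary found in: %r" % line)
    rtt_min, rtt_avg, rtt_max, rtt_mdev = (float(v) for v in match.groups())
    return {"min": rtt_min, "avg": rtt_avg, "max": rtt_max, "mdev": rtt_mdev}


# The two summaries quoted in this post.
physical = parse_rtt("rtt min/avg/max/mdev = 0.139/0.265/5.937/0.364 ms")
virtual = parse_rtt("rtt min/avg/max/mdev = 0.202/0.721/39.989/1.960 ms")

# How much worse the virtual server is on each measure.
for key in ("avg", "max", "mdev"):
    ratio = virtual[key] / physical[key]
    print("%-4s physical=%.3f ms  virtual=%.3f ms  ratio=%.1fx"
          % (key, physical[key], virtual[key], ratio))
```

    The averages are both under 1 ms, so the difference shows up mainly in the max and mdev columns, which is consistent with sporadic spikes rather than a constant slowdown.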



  • 8.  RE: High Ping Latency to VMs

    Posted Aug 19, 2010 02:21 AM

    Running out of ideas here. How about the NICs themselves, are they set to auto-negotiate? Have they done something stupid like come up at half duplex?



  • 9.  RE: High Ping Latency to VMs

    Posted Aug 19, 2010 11:48 AM

    Yeah, I am too. I had another look at isolating switches last night and re-running uplinks.

    I have my suspicions about 1 switch which constantly spikes every 10 or so pings, and it's a core gigabit switch with uplinks and gateway traffic running through it.

    I am going to try and swap it out OOH (out of hours) and continue to investigate.

    I have checked all the ESX NICs and they are set to auto-negotiate and all look to be running at full duplex.

    I think this must be a physical issue somewhere, but it's a nightmare to get to the bottom of. Thanks for all your input, it's much appreciated.
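    A rough way to check whether spikes really recur "every 10 or so pings" is to flag the outliers in a list of round-trip times and look at the spacing between them. The sketch below is only an illustration: the sample data is synthetic, and the threshold of 5 times the median is an arbitrary assumption, not a recommended value.

```python
from statistics import median


def find_spikes(rtts_ms, factor=5.0):
    """Return the indices of samples more than `factor` times the median RTT."""
    baseline = median(rtts_ms)
    return [i for i, rtt in enumerate(rtts_ms) if rtt > factor * baseline]


def spike_gaps(spike_indices):
    """Return the number of pings between each pair of consecutive spikes."""
    return [later - earlier
            for earlier, later in zip(spike_indices, spike_indices[1:])]


# Synthetic data: a 0.3 ms baseline with a 35 ms spike on every 10th ping.
samples = [0.3] * 30
for index in (9, 19, 29):
    samples[index] = 35.0

spikes = find_spikes(samples)
gaps = spike_gaps(spikes)
print("spike positions:", spikes)
print("pings between spikes:", gaps)
```

    Evenly spaced gaps would point at something periodic on the device, while irregular gaps look more like load-related congestion.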



  • 10.  RE: High Ping Latency to VMs

    Posted Sep 22, 2010 07:52 PM

    I am having the same problems with my ESXi hosts. The VMs are also very unresponsive and take FAR too long to reboot. I have all of my servers hooked up with two NICs. One is for our regular network and the other is our iSCSI connection to a Dell MD3000i. The MD3000i is the shared storage between the ESXi hosts and holds all of our vmdk and vmx files. The iSCSI network is on its own switch, separate from the normal LAN switches. I am also noticing that when I ping the host on the LAN it gets 1ms, while some of the VMs get 400ms+. Both of these are on the same NIC, so I don't understand how they can be different.

    Has there been any headway with your servers, VirtualRed?



  • 11.  RE: High Ping Latency to VMs

    Posted Sep 24, 2010 02:23 PM

    Hi Ned

    It turned out the latency I experienced was down to an issue with network hardware and not the hosts themselves.

    Obviously it's not much help, but it may be worthwhile looking at your network again and eliminating this as a cause of the problem.

    Hope you get things sorted.



  • 12.  RE: High Ping Latency to VMs

    Posted Aug 05, 2011 06:26 AM

    Hi there,

    I know it is a bit late, but could you share with us what the physical problem was?

    We are having high pings on our VMs and not on our physical machines. But from reading your posts, it seems you can have physical network issues that mainly affect VMs?



  • 13.  RE: High Ping Latency to VMs

    Posted Aug 08, 2011 08:42 AM

    Hi

    From memory, I think it was just a single failing switch that was generating broadcast traffic. Once we had located that and disconnected it, we saw normal ping times again.

    We had some rather strange uplinks between physical switches, and this caused some confusion at first as we had varying ping times from different VMs.

    This meant I was questioning the virtual environment, but in the end it still turned out to be physical.

    Sorry I can't be of any more help.

    Matthew Oldham | Network Engineer




  • 14.  RE: High Ping Latency to VMs

    Posted Aug 08, 2011 09:24 AM

    Thanks for the info.

    I don't think the switches are at fault on our side; they are all new Catalyst switches and have just been checked by a network specialist. However, I think he did mention we have above-average broadcast traffic.

    Maybe VMs are a bit more sensitive to broadcast traffic than a physical machine. We'll look in to it :)



  • 15.  RE: High Ping Latency to VMs

    Posted Aug 19, 2011 01:47 PM

    So VLANs will help with the broadcast traffic by creating multiple broadcast domains, but if you're seeing a ton of broadcast traffic I would put a sniffer on the network and try to find the source of it. Wireshark works well for this on a laptop.

    Also, you can turn on storm-control on the switches, which prevents broadcast storms (large spikes in broadcast traffic) from destroying the bandwidth of the network. Also, check your port-channeling and trunking protocols: make sure that you have dot1q for your trunking protocol and that the ports you have coupled to a single vSwitch are port-channeled together.
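    As a rough illustration of hunting for the source of broadcast traffic, the sketch below tallies broadcast frames per source MAC address from a capture file, using only the Python standard library. It assumes the capture was saved in the classic pcap format (not pcapng) with an Ethernet link type. The function names and the MAC addresses in the demo are made up for the example.

```python
import io
import struct
from collections import Counter

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination address


def broadcast_sources(stream):
    """Count broadcast Ethernet frames per source MAC in a classic pcap stream."""
    header = stream.read(24)  # fixed-size pcap global header
    if len(header) < 24:
        raise ValueError("not a pcap file: global header too short")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written on a big-endian machine
    else:
        raise ValueError("unsupported format (classic microsecond pcap expected)")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures (link type 1) are handled")

    counts = Counter()
    while True:
        record = stream.read(16)  # per-packet header
        if len(record) < 16:
            break  # end of file
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = stream.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated capture
        # Destination MAC is bytes 0-5, source MAC is bytes 6-11.
        if len(frame) >= 12 and frame[:6] == BROADCAST:
            counts[frame[6:12].hex(":")] += 1
    return counts


def demo_capture():
    """Build a tiny in-memory capture with made-up MAC addresses."""
    noisy = bytes.fromhex("00aabbccdd01")
    quiet = bytes.fromhex("00aabbccdd02")
    payload = b"\x08\x06" + b"\x00" * 46  # EtherType plus padding
    frames = [BROADCAST + noisy + payload] * 5
    frames.append(BROADCAST + quiet + payload)
    frames.append(quiet + noisy + payload)  # unicast, should be ignored
    data = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for frame in frames:
        data += struct.pack("<IIII", 0, 0, len(frame), len(frame)) + frame
    return io.BytesIO(data)


top_talkers = broadcast_sources(demo_capture()).most_common()
for mac, count in top_talkers:
    print(mac, count)
```

    With a real capture you would pass a file opened in binary mode instead of `demo_capture()`. A single MAC address dominating the tally is the kind of thing a failing switch or misbehaving NIC would produce.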