ESXi

 View Only
  • 1.  Vmotion and NTP drift

    Posted Jul 10, 2013 08:27 PM

    Hello; Long time lurker here and first time posting.

    I have a couple of beefy clusters: 128 GB of RAM and 16+ cores per host, three servers per cluster, and a few different clusters in different environments, all running off 10G fiber to a V7000 SAN appliance.

    I built these clusters a few months back, and so far it's been a fairly painless process. I have a few hard-hitting VMs on them, with many more to come.

    When I move VMs from host to host, my servers run into NTP drift issues, some nearly 5 seconds. This is on ESXi 5.1 on IBM System x servers, and the servers are RHEL/CentOS 6.3. The guest OSes are syncing to a local NTP source, not the web.

    No matter how I search, the only mentions I see of NTP drift are from 2004, for ESX 3.x if I'm not mistaken. Has anyone encountered this? Any advice? To be fair, I do not have VMware Tools on the guests, as I really don't see much benefit to them.

    Thank you.

    -Bruno



  • 2.  RE: Vmotion and NTP drift

    Posted Jul 10, 2013 08:32 PM

    Welcome to the Community! First off, best practice is to have the VMs get their time from the ESXi hosts and the hosts point to a common NTP source. Do you have your ESXi hosts using the same NTP servers as your VMs?



  • 3.  RE: Vmotion and NTP drift

    Posted Jul 11, 2013 06:43 AM

    "...best practice is to have the VMs get their time from the ESXi hosts and the hosts point to a common NTP source..."

    I respect your opinion, but what you can find in the VMware KB "Timekeeping best practices for Linux guests" is different:

    Note: VMware recommends you to use NTP instead of VMware Tools periodic time synchronization.

    And I agree with the KB. If we want to follow the principle of a lightweight bare-metal hypervisor, then every non-critical service should be offloaded from the ESXi host. If you put this task (time-syncing the VMs) on the ESXi host, you are just adding a component which might fail or cause trouble. And a problem with an ESXi host is much more serious than a problem with a VM.

    Moreover, with VMware Tools your VMs can never have time of the same "quality" as the ESXi host, because they are (in NTP language) "one stratum higher" than the ESXi host. No time-syncing is perfect (be it VMware Tools or NTP), so if you chain them (first ESXi syncs its time with NTP, then the time in the VM is synced using the ESXi time), you multiply the error. And what is even more important, this error (the deviation of local system time from accurate time) on the VMs is different from that of the ESXi host.

    What I recommend is to set up your own NTP server (it can be, for example, the vMA). Then both the ESXi hosts and the VMs will use it to synchronise their local system time. Any time deviation will then be consistent across the whole local network...
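    A concrete sketch of that layout, with placeholder names: the dedicated internal NTP server chases an upstream source, and every ESXi host and guest points at it (the pool names and `ntp.internal.example.com` are hypothetical; the hosts take the same server via their Time Configuration settings rather than an ntp.conf edit):

    ```
    # /etc/ntp.conf on the internal NTP server (upstream names are examples):
    server 0.pool.ntp.org iburst
    server 1.pool.ntp.org iburst

    # /etc/ntp.conf on each guest (and the same address entered in the
    # ESXi hosts' Time Configuration):
    server ntp.internal.example.com iburst
    ```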



  • 4.  RE: Vmotion and NTP drift

    Posted Jul 10, 2013 11:27 PM

    Hi Bruno,

    VMware Tools is like a set of drivers for a VM. Without them, the VM cannot communicate optimally with the virtual hardware. I highly recommend installing them, but that's not really going to prevent your VMs from talking to a time server.

    This document has some good settings - have you looked at this at all?

    VMware KB: Timekeeping best practices for Linux guests

    Not being as familiar with *nix as I am with Windows, I wonder if perhaps the system is somehow reverting to the local hardware clock? Maybe you could run a cron job to capture the time source every n minutes and output it to a file? Maybe also run netcat and dump the output to a file? I know I've seen w32tm decide it's going to revert to a hardware or free-running clock from time to time.
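    The cron-job idea above can be sketched as a small filter over `ntpq -pn` output. This is a minimal example, not a vetted monitoring script: the 100 ms threshold and the captured sample output are assumptions standing in for a live query.

    ```shell
    #!/bin/sh
    # Sketch: flag peers whose absolute offset exceeds 100 ms.
    # Offset is column 9 of `ntpq -pn` output, in milliseconds.
    # A captured sample stands in for a live `ntpq -pn` call here.
    sample='     remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *10.0.0.1        .GPS.            1 u   34   64  377    0.412    2.113   0.207
    +10.0.0.2        .GPS.            1 u   12   64  377    0.398  210.554   5.912'

    # Skip the two header lines; print any peer drifting past the threshold.
    echo "$sample" | awk 'NR > 2 { o = $9; if (o < 0) o = -o;
                                   if (o > 100) print "DRIFT:", $1, $9 }'
    # Prints: DRIFT: +10.0.0.2 210.554
    ```

    On a live guest you would replace the sample with `ntpq -pn` itself and call the script from cron, e.g. `*/5 * * * * root /usr/local/bin/ntp-log.sh >> /var/log/ntp-drift.log` (script name and log path are hypothetical).
    
    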

    Something that may be of interest to read as well:

    http://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

    -Brian



  • 5.  RE: Vmotion and NTP drift

    Posted Jan 27, 2015 03:59 PM

    Hi Bruno,

    I too have found my Linux VMs suffer the same fate when a vMotion occurs.  I have been following VMware's best practices (guest NTP clients syncing to a time server that is neither an ESXi host nor a VM) as well as shutting off VMware Tools time sync.  Still no luck.  I have tried many things with many permutations (different OSes, different time servers, using VMware Tools time sync, etc.).  Same issues.  I have found the problem doesn't last as long if you set the minpoll/maxpoll options in ntp.conf on the guests, however.  But it is not a solution.

    I have a support case open with VMware and they have not been able to provide any more insight or options.

    Frustrating.

    Charlie



  • 6.  RE: Vmotion and NTP drift

    Broadcom Employee
    Posted Jan 27, 2015 11:08 PM

    The best way to address tight guest time-synchronization requirements after a vMotion is to enable one-time time sync via the Tools.

    To do this, ensure that your ESXi servers are properly synchronized to your upstream NTP time servers.

    This is necessary so that a guest running on the ESXi server will get the proper time when the Tools running in the guest sync it to the host clock.

    The easiest way to get a consistent time across your virtual infrastructure is to have your guests and ESXi hosts use the same upstream NTP servers.

    Next, in the guest configuration file, add:

    time.synchronize.continue = "1"

    time.synchronize.restore = "1"

    time.synchronize.resume.disk = "1"

    time.synchronize.shrink = "1"

    time.synchronize.tools.startup = "1"

    time.synchronize.tools.enable = "1"

    time.synchronize.resume.host = "1"

    These will enable tools time sync for the various events affecting your guest VM.

    Then, be sure that periodic time sync via the tools is disabled:

    tools.syncTime = "0"

    You'll want to do this because you are already running an NTP client in your guest VM.

    The last thing to do in the guest configuration is to set the guest time lag preference.

    If the Tools running in the guest detect a time difference greater than this value, the Tools synchronization will run and step the guest clock to the correct time.

    pref.timeLagInMilliseconds=500

    For example, this line in your guest configuration sets the preference to 0.5 seconds.

    So, when the guest moves onto the target ESXi server, if the time difference in the guest is greater than 0.5 seconds, the Tools will step the guest clock instead of waiting for NTP to slew (slowly adjust) the time.

    If you don't set this parameter, the default value is 1000 msec or 1 second.

    Finally, in your guest NTP configuration, configure the poll interval to something tight, such as 16 seconds (minpoll/maxpoll of 4, since 2^4 = 16):

    server ntp1.example.com minpoll 4 maxpoll 4

    Increasing the NTP poll frequency will allow the NTP daemon to more quickly slew any minor guest clock adjustments after a vMotion.

    You may want or need to choose different values for the time lag or NTP poll interval depending on your environment and guest workload.
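    Putting the settings above together, here is a sketch of applying them to a powered-off VM's .vmx over SSH on the ESXi host. The datastore path and VM name are illustrative, not from this thread:

    ```shell
    # On the ESXi host, with the VM powered off (path is an example):
    VMX=/vmfs/volumes/datastore1/guest01/guest01.vmx
    cat >> "$VMX" <<'EOF'
    time.synchronize.continue = "1"
    time.synchronize.restore = "1"
    time.synchronize.resume.disk = "1"
    time.synchronize.shrink = "1"
    time.synchronize.tools.startup = "1"
    time.synchronize.tools.enable = "1"
    time.synchronize.resume.host = "1"
    tools.syncTime = "0"
    pref.timeLagInMilliseconds = "500"
    EOF
    vim-cmd vmsvc/getallvms       # find the VM's ID
    vim-cmd vmsvc/reload <vmid>   # make the host re-read the edited .vmx
    ```

    Reloading (or unregistering/re-registering) the VM matters here; as noted in the reply below, edits to the .vmx can otherwise appear to revert.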



  • 7.  RE: Vmotion and NTP drift

    Posted Jan 28, 2015 03:45 PM

    Thanks for the timely and detailed reply.

    All this makes sense.  I have been in a monitoring world that never checked NTP output, and since doing so a lot of alerts got triggered, seemingly at random.  After a bunch of troubleshooting, isolation, and network investigation, I stumbled upon some VMware KB articles and other Google finds.  (It took a while to see it was just virtual machines.)

    Up until my posting on this thread I had tried a few things, including setting "tools.syncTime = 0" and updating NTP settings (min/max poll specifically), and engaged VMware support.

    The other thing I tried (at VMware support's request, and per other KB articles) was to set the "time.synchronize.*" values to "0", not "1".  "1" seems to make more sense to me, and your post validates that (again, thank you).

    The odd thing is, when I shut the VM down and updated its VMX file, all the values got reverted after I did a vMotion (or perhaps it was a VM boot-up).  I did it again and everything stuck.  Weird (not the problem for this thread, but something to note).

    Indeed, my ESXi hosts use the same in-house corporate time servers as the guests, so that all jibes.

    A couple of vMotions after updating seem to kind of address the issue.  I will have to do more tests (currently investigating with a single CentOS 6 VM).  After a vMotion, the offset and jitter values (ntpq -p) do spike, but the jitter goes back down to normal levels fairly quickly (much quicker than without this update).  The offset still stays high, though not at critical levels (200 ms-ish).  However, one vMotion caused the NTP daemon to lose "lock" (i.e., it had to re-sync with the NTP servers completely).

    Perhaps this is all "normal" or expected NTP behavior after such actions.  Again, this has probably been going on forever, but I was blind to it until the recent change in monitoring solutions.  Perhaps some tweaks to the sensitivity of the monitoring checks are what is needed as well.



  • 8.  RE: Vmotion and NTP drift

    Posted Oct 30, 2015 04:06 PM

    It seems I'm dealing with the same issues here.

    VMware Tools is not installed, and the client OS is RHEL 6.x.

    I have 2 internal NTP servers (GPS-based) for the clients to synchronize with.

    We closely monitor the NTP sync parameters (offset/jitter) on our VMs.

    After a vMotion, the jitter increases, and after a while the time server is marked as a 'falseticker'.
    The same thing happens with the second time server, so after approximately 30 seconds to 1 minute the client loses its NTP synchronization, since both servers are flagged as falsetickers.

    A few minutes later, once ntpd re-runs its selection process, everything comes back to green, but in the meantime we get a lot of alerts.

    During that window, the observed offsets are up to 200 ms.

    I tried installing the latest VMware Tools with the timesync plugin (hoping that synchronizing the guest clock after the vMotion suspend/resume would help), but no luck.

    Has anyone managed to solve this problem?

    As a workaround for the first part, we put the local clock in ntp.conf at stratum 10, so we never lose sync "from a monitoring perspective".

    That's not a "sexy" solution (and not recommended by the VMware KB), but it simply works.  :smileyhappy:
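    For reference, a sketch of that workaround in ntp.conf (ntpd addresses its undisciplined local clock via the driver address 127.127.1.0):

    ```
    # Last-resort local clock at stratum 10: if both GPS servers get
    # flagged as falsetickers, ntpd still reports itself synchronized
    # from a monitoring perspective.
    server 127.127.1.0          # ntpd local clock driver
    fudge  127.127.1.0 stratum 10
    ```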