vCenter

operation timed out... or did it?

  • 1.  operation timed out... or did it?

    Posted Jan 30, 2007 08:13 PM

    My problem is the following:

    I recently started upgrading to VI3. I'm currently running VC 2 and have two ESX 3.01 hosts and two ESX 2.x hosts.

    Until this morning everything was working fine and I was about to continue upgrading the ESX hosts.

    But then strange things started to happen. I could no longer edit VM settings, use VMotion, power on VMs, etc. Almost everything I try to do ends in "Operation timed out" after sitting "in progress" for 10-15 minutes.

    Funny thing though: if I connect directly to my ESX host through the VI client, it all seems fine.

    Some examples:

    1. I want to do some hot migrations. I choose a machine and try to migrate it in VC; it goes to 10% and then stalls. Meanwhile, looking at the hosts through a VI client connected directly to them, everything shows up with no error messages and completes in less than a minute. In VC, though, it just sits at 10% for 15 minutes, then times out, and the VM becomes orphaned in VC.

    2. I want to increase the memory of a powered-off VM in VC. I choose Edit Settings, increase the memory, and click OK. It then says "in progress" for fifteen minutes and times out, but watching it through a VI client connected directly to the host, it works like a charm.

    I'm quite a novice with VMware, as you might have already guessed, and this is my first post, so please be gentle ;) Any help at all on resolving this problem would be highly appreciated.

    Many thanks,

    John



  • 2.  RE: operation timed out... or did it?

    Posted Jan 31, 2007 12:35 AM

    As a first step, you could try either restarting the Virtual Center agents on the affected hosts or removing and re-adding the hosts in Virtual Center.

    Is this happening to all of your hosts, or just the 2.5 hosts, or just the 3.x hosts?



  • 3.  RE: operation timed out... or did it?

    Posted Jan 31, 2007 08:22 AM

    OK, here's what I did. I ran:

    service mgmt-vmware restart

    service vmware-vpxa restart

    Those commands did bring back the servers listed as orphaned after VMotion operations, but they did not fix the underlying problem.

    I will try removing the host from VC soon. It seems it needs to be in maintenance mode to be removed, and that's going to be a pain: to get it into maintenance mode I need to migrate all running VMs off it, I can only do 2 at a time, and each takes 15 minutes to time out before I can start new VMotions.

    To answer your question:

    The problems I'm having only seem to be on one of the upgraded hosts. Let's call my hosts ESX1-4; 1 & 2 are running 3.01, 3 & 4 are 2.x.

    Host 1 is the one with the problems. 2 seems to be working fine, except for VMotions, where the following happens:

    A migration from 1 -> 2 stalls at 10% and stays at 10% until it times out.

    A VMotion from 2 -> 1 goes to 90% and stays there until it times out.

    VMotion from 3 & 4 to 2 works fine (and probably to 1 as well, I can't remember; I don't want to do any more until this is solved, since migrations from the old hosts are irreversible).

    Editing hardware settings, powering on hosts, etc. all work fine on 2, 3 & 4, so as stated above, it seems to be a problem with host 1.

    I'll let you know how the removal/re-adding of the host in VC went once it's done.



  • 4.  RE: operation timed out... or did it?

    Posted Jan 31, 2007 08:54 AM

    I have seen symptoms like this on our systems. What fixed it for me was stopping and starting the Virtual Center service on the Virtual Center server, so that the whole of VC is refreshed.



  • 5.  RE: operation timed out... or did it?

    Posted Jan 31, 2007 10:56 AM

    Starting and stopping Virtual Center does nothing in my case. I have done that so many times now I've lost count ;)

    Anyway, as suggested, I tried to remove the host and then re-add it.

    I removed it successfully, but now I'm not able to re-add it! It just shows up as disconnected, and when I try to connect it, the Add Host wizard comes up, and when it finishes it says "Not enough licenses to add host" or something along those lines.

    Funny thing is, though, I have 4 available CPU licenses for ESX Standard, VMotion, etc. I've restarted the license server and the Virtual Center server; nothing works.

    I tried to change the license server configuration in VC to connect to the IP instead of localhost, and I can't do that either! The "not enough licenses" error message pops up when I try that as well.

    Something seems to be seriously wrong with Virtual Center / the license server ...

    Any help at all on how to proceed is highly appreciated!!



  • 6.  RE: operation timed out... or did it?

    Posted Jan 31, 2007 07:03 PM

    Hi John,

    I looked in my VC server as I added a host earlier today. Here is what I see in my VC logs and despite the errors, it was added and licensed successfully. Can you look in your logs and post the entries when you try to add your host back into VC?

    ... does not like SSL cert when I add it with just the host name vs FQDN.

    then...

    "failed to connect to host xxxx, check that authd is running correctly (lib/connect error 11)", then about 30-45 seconds later it states

    ..synchronizing host xxxx

    ..updating vpxa for host xxxx

    and then a few more synchronizing host xxxx.
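    For anyone who wants to rule out plain network reachability before digging into authd itself, here is a minimal sketch. It only checks whether a TCP connection to the host can be opened; it does not prove that authd is answering correctly. The default port 902 is taken from the "Failed to connect to host ...:902" line that shows up in a log later in this thread, and the function name and the example host name are made up for illustration.

```python
import socket


def can_connect(host, port=902, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds.

    Port 902 is an assumption taken from a log line quoted in this thread.
    A successful connect only shows the port is reachable, nothing more.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example (hypothetical host name, run from the VC server):
#   can_connect("esx1.example.com")
```

    If this returns False from the VC server but True from another machine, the problem is more likely a firewall or routing issue than the host agents.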



  • 7.  RE: operation timed out... or did it?

    Posted Feb 01, 2007 06:12 AM

    hi,

    My logs are full of errors when trying to add the failing host:

    Virtual Center log:

    [2007-02-01 06:39:32.764 'App' 4844 info] [VpxLRO] -- BEGIN task-813 -- group-h4 -- vim.Folder.addStandaloneHost
    [2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:33.045 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:33.045 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:33.045 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:33.045 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:33.045 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:33.045 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:33.389 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:33.389 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:33.389 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:33.389 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:33.389 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:33.389 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:33.498 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:33.498 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:33.498 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:33.498 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:
    * A certificate in the host's chain is based on an untrusted root.
    [2007-02-01 06:39:33.498 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0
    [2007-02-01 06:39:33.498 'BaseLibs' 4844 warning] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
    [2007-02-01 06:39:34.139 'App' 4844 warning] ============BEGIN FAILED METHOD CALL DUMP============
    [2007-02-01 06:39:34.139 'App' 4844 warning] Invoking [createUser] on [vim.host.LocalAccountManager:ha-localacctmgr]
    [2007-02-01 06:39:34.139 'App' 4844 warning] Arg user:
    (vim.host.LocalAccountManager.AccountSpecification) {
       dynamicType =

    [2007-02-01 06:39:14.876 'ha-eventmgr' 14687152 info] Event 131 : User root logged out
    [2007-02-01 06:39:14.876 'VmdbAdapter' 14687152 verbose] Removed vmdb connection /db/connection/#44/
    [2007-02-01 06:39:23.992 'App' 102939568 verbose] Accepted authd connection from: 192.10.10.141:2137
    [2007-02-01 06:39:24.191 'TaskManager' 3076452480 info] Task Created : haTask--vim.SessionManager.login-4151
    [2007-02-01 06:39:24.195 'Vimsvc' 3076452480 info] [Auth]: User root
    [2007-02-01 06:39:24.196 'ha-eventmgr' 3076452480 info] Event 132 : User root@192.10.10.141 logged in
    [2007-02-01 06:39:24.197 'TaskManager' 3076452480 info] Task Completed : haTask--vim.SessionManager.login-4151
    [2007-02-01 06:39:24.409 'TaskManager' 12823472 info] Task Created : haTask-ha-folder-root-vim.host.LocalAccountManager.createUser-4152
    [2007-02-01 06:39:24.409 'TaskManager' 12823472 info] Task Completed : haTask-ha-folder-root-vim.host.LocalAccountManager.createUser-4152
    [2007-02-01 06:39:24.409 'Vmomi' 12823472 info] Activation [N5Vmomi10ActivationE:0xa501d88] : Invoke done [createUser] on [vim.host.LocalAccountManager:ha-localacctmgr]
    [2007-02-01 06:39:24.410 'Vmomi' 12823472 info] Throw vim.fault.AlreadyExists
    [2007-02-01 06:39:24.410 'Vmomi' 12823472 info] Result:
    (vim.fault.AlreadyExists) {
       name = "vpxuser"
       msg = ""
    }
    [2007-02-01 06:39:24.628 'TaskManager' 102939568 info] Task Created : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-4153
    [2007-02-01 06:39:24.682 'TaskManager' 102939568 info] Task Completed : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-4153
    [2007-02-01 06:39:24.849 'TaskManager' 14687152 info] Task Created : haTask--vim.AuthorizationManager.setEntityPermissions-4154
    [2007-02-01 06:39:24.852 'TaskManager' 14687152 info] Task Completed : haTask--vim.AuthorizationManager.setEntityPermissions-4154
    [2007-02-01 06:39:24.852 'Vimsvc' 14687152 info] [Auth]: User vpxuser
    [2007-02-01 06:39:25.969 'ha-eventmgr' 18717616 info] Event 133 : User root logged out
    [2007-02-01 06:39:25.970 'VmdbAdapter' 18717616 verbose] Removed vmdb connection /db/connection/#45/
    [2007-02-01 06:39:28.696 'Locale' 18717616 warning] Default resource used for 'FirewallInfo.activeDirectorKerberos.label' expected in module 'host'.
    [2007-02-01 06:39:28.696 'Locale' 18717616 warning] Default resource used for 'FirewallInfo.kerberos.label' expected in module 'host'.
    [2007-02-01 06:39:31.023 'EnvironmentBrowser' 10021808 info] Hw info file: /etc/vmware/hostd/hwInfo.xml
    [2007-02-01 06:39:31.030 'EnvironmentBrowser' 10021808 info] Config target info loaded

    So it seems to me that the host itself syncs fine, which is why I can connect to it directly with the VI client, but when I try to add it to VC, it doesn't ....
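    Since most of a log like this is the same SSL warning repeated, a small script can help to see what is actually in there. The following is only a rough sketch: it assumes lines shaped like the ones quoted in this thread (timestamp, quoted source, thread id, level, then the message), and the function name is made up for illustration. It counts identical messages while ignoring timestamps and thread ids, so a single error line stands out from the noise.

```python
import re
from collections import Counter

# Matches lines such as:
#   [2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] message text
# The leading backslash and the brackets are optional, because pasted
# logs sometimes arrive as "\[..." or with the brackets stripped.
LINE_RE = re.compile(
    r"^\s*\\?\[?(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"'(?P<source>[^']+)' (?P<thread>\d+) (?P<level>\w+)\]?\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Count (level, source, message) triples; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m["level"], m["source"], m["msg"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2007-02-01 06:39:32.936 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: "
        "failed to read registry value. Assuming verification is disabled. LastError = 0",
        "[2007-02-01 06:39:33.045 'BaseLibs' 4844 warning] SSLVerifyIsEnabled: "
        "failed to read registry value. Assuming verification is disabled. LastError = 0",
        "[2007-02-01 06:39:45.483 'App' 4844 error] [LicMgr] Bad host information for host-1432",
    ]
    # Most frequent messages first; rare error lines end up at the bottom.
    for (level, source, msg), n in summarize(sample).most_common():
        print(f"{n:3d}x {level:8s} {source}: {msg}")
```

    Continuation lines (the "* A certificate ..." lines and the dumped structures) are dropped on purpose, so this is a way to get an overview, not a replacement for reading the log.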



  • 8.  RE: operation timed out... or did it?

    Posted Feb 01, 2007 10:37 AM

    Some updates:

    I've tried to repair the VC installation. I could re-add all hosts except esx1, which is still failing with "not enough licenses" etc. I then tried a complete reinstall of VC and the license server. Still the same problem: I can add all hosts except esx1.

    I'm completely lost at this point :heart:



  • 9.  RE: operation timed out... or did it?

    Posted Feb 01, 2007 11:28 PM

    John, this may be a way of determining if it is a problem with the host or with your license server or VC server or the license file itself.

    Try removing all of your hosts from VC. Then make sure you do not see any remnants of that host anywhere in VC, in the datastore view for example. Then try adding esx1 (the problem one) as the first host, if you haven't already. If you've already tried that, do you get the same error?

    In the log was this line: "[2007-02-01 06:39:45.483 'App' 4844 error] [LicMgr] Bad host information for host-1432". Not sure what to make of this.

    Another speculation >> Is the time synced on this host with VC? ntpd working?

    Finally, if you haven't already, I would try support. Sorry I haven't been of much help.



  • 10.  RE: operation timed out... or did it?

    Posted Feb 02, 2007 08:35 PM

    Removed all hosts and re-added them as you said. Result: same error.

    I have no idea what that line in the log means. I tried googling it and didn't get any results.

    ntpd only works if I take iptables down, but the time is synchronized (the same on all servers, at least).

    I've been in contact with VMware support. I filed an SR the same day I started this thread and still haven't heard from them, except for a message on my voicemail saying they would try me on my mobile (which they haven't) :/

    You have been plenty of help though, since I ran out of ideas on how to keep going myself 2 days ago ;)

    First thing Monday morning I'm going to file a new SR with VMware, this time marked as critical. Maybe then they'll put some effort into trying to help me.

    thanks for all the help.

    /john



  • 11.  RE: operation timed out... or did it?

    Posted Feb 15, 2007 09:23 PM

    Did you ever find out anything about the not enough licenses error? I'm having the same problem all of a sudden.



  • 12.  RE: operation timed out... or did it?

    Posted Jan 31, 2007 01:00 PM

    Hi John

    I did not mention it in my last post as it seemed so drastic, but there was one time when restarting the VC service did not work and we also had the license error. As a last resort we removed all hosts from the database, uninstalled VC from the server, reinstalled it, and added them all back in again. As I say, it was a last resort, so if someone here has a solution before you have to do this, great. But if you run out of ideas it will be worth a go. We have had no problems since, and that was about a month ago now.



  • 13.  RE: operation timed out... or did it?

    Posted Jan 31, 2007 02:38 PM

    That's probably what I'll end up doing eventually, as a last resort. First I want to give this thread a day or two and see if it can be solved, but I'm kind of running out of hope :~



  • 14.  RE: operation timed out... or did it?

    Posted Feb 19, 2007 09:19 PM

    How are things going?

    I have the same problem.

    Regards

    Daniel



  • 15.  RE: operation timed out... or did it?

    Posted Jun 20, 2007 10:12 PM

    Did you ever get any resolution to this? Does anyone else know any solutions? I've got exactly the same scenario - three hosts, of which one can't join the VC server.

    Regards,

    Ed.



  • 16.  RE: operation timed out... or did it?

    Posted Jul 12, 2007 05:10 AM

    Can you check to see if there are any lock files on the host? I had a similar issue in the past and it was related to the passwd file having a stale lock floating around.

    Check for /etc/ptmp or WRITELOCK files hanging around.

    Removing them allowed me to rejoin the host to VC.

    Just a thought since this thread is not getting much traction
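    If Python is available on the box, the check can be sketched roughly like this. It is only a sketch under a few assumptions: I did not note exactly where the WRITELOCK files were, so it walks the whole of /etc and reports any file named ptmp or with WRITELOCK in its name, and the function name is made up. It only lists candidates and deletes nothing; check that no process is really editing passwd before removing anything.

```python
import os


def find_lock_files(etc_dir="/etc"):
    """Return sorted paths under etc_dir that look like leftover lock files.

    A file counts as a candidate if it is named "ptmp" or has "WRITELOCK"
    in its name. Searching the whole tree is an assumption, not a rule.
    """
    hits = []
    # os.walk skips directories it cannot read instead of raising.
    for dirpath, _dirnames, filenames in os.walk(etc_dir):
        for name in filenames:
            if name == "ptmp" or "WRITELOCK" in name:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_lock_files():
        print(path)
```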



  • 17.  RE: operation timed out... or did it?

    Posted Oct 25, 2007 01:56 PM

    I had the same problem today. The cause was a network link that kept going up and down, and it was the one carrying VMotion.

    As soon as I fixed the issue, the Service Console CPU time went from 100% to 0% and I had no more timeouts.

    I'm planning to install an additional card to put the VMotion network in active/standby, just to prevent this problem on production servers :)



  • 18.  RE: operation timed out... or did it?

    Posted Nov 16, 2007 08:54 AM

    It looks like we have the same problem. It first appeared on ESX 3.01 + VC 2.0.2. Last weekend we installed ESX 3.0.2 U1 + VC 2.0.2 U1 from scratch (a clean VC database and fresh installs of both ESX and VC). The problem is still there.

    Any ideas?

    2007-11-10 10:12:30.909 'App' 4004 info VpxLRO -- BEGIN task-internal-14 -- datacenter-2 -- vim.Datacenter.queryConnectionInfo

    2007-11-10 10:12:31.034 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:

    A certificate in the host's chain is based on an untrusted root.

    2007-11-10 10:12:31.034 'BaseLibs' 4004 warning SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0

    2007-11-10 10:12:31.034 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error

    2007-11-10 10:12:31.049 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:

    A certificate in the host's chain is based on an untrusted root.

    2007-11-10 10:12:31.049 'BaseLibs' 4004 warning SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0

    2007-11-10 10:12:31.049 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error

    2007-11-10 10:12:31.143 'App' 4004 error VpxVmdbCnx Authd error: 551 There is no VMware process running for config file vmware-vpxa

    2007-11-10 10:12:31.143 'App' 4004 error VpxVmdbCnx Failed to connect to host esx3.test1.spb.cbr.ru:902. Check that authd is running correctly (lib/connect error 11)

    2007-11-10 10:12:31.252 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:

    A certificate in the host's chain is based on an untrusted root.

    2007-11-10 10:12:31.252 'BaseLibs' 4004 warning SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0

    2007-11-10 10:12:31.252 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error

    2007-11-10 10:12:31.252 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:

    A certificate in the host's chain is based on an untrusted root.

    2007-11-10 10:12:31.252 'BaseLibs' 4004 warning SSLVerifyIsEnabled: failed to read registry value. Assuming verification is disabled. LastError = 0

    2007-11-10 10:12:31.252 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error

    2007-11-10 10:12:31.362 'BaseLibs' 4004 warning SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:

    A certificate in the host's chain is based on an untrusted root.