

Recovering vCenter 7

Dr_Virt, Jul 24, 2023 01:42 PM

  • 1.  Recovering vCenter 7

    Posted Jul 21, 2023 03:35 PM

    Had a lab vCenter crash and am trying to figure out why.

    Current symptoms:

    * If I start vCenter from the shell with "service-control --start --all", the process fails with an error that vPostgres couldn't start.

    * If I start vPostgres manually ("service-control --start vmware-vpostgres") and then start vCenter ("service-control --start --all"), the process fails with vpxd-svcs failing to start.

    * I logged into the vCenter database (VCDB) and verified the administrator account.

    * I reset the vCenter certificates and validated them with lsdoctor.

    * vpxd.log shows "Failed to connect to Authz service" and "Failed to initialize authorizeManager".

    Anyone seen something like this?
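    A minimal sketch of the manual workaround above (vPostgres first, then everything else), assuming a POSIX shell on the VCSA and the standard service-control syntax (`--start <service>`, `--start --all`, `--status --all`). The function name `staged_start` and the `SERVICE_CONTROL` override are hypothetical and exist only so the sequence can be dry-run or tested:

```shell
# Hypothetical helper: start vPostgres on its own first, then the rest of the
# VCSA services, and dump service status if either step fails.
# SERVICE_CONTROL is an illustrative override (e.g. for dry runs or tests);
# it defaults to the real service-control command.
staged_start() {
  local sc="${SERVICE_CONTROL:-service-control}"

  # Step 1: the database (VCDB) has to be up before vpxd/vpxd-svcs can start.
  if ! "$sc" --start vmware-vpostgres; then
    echo "vmware-vpostgres failed to start" >&2
    return 1
  fi

  # Step 2: start everything else; on failure show which services are down.
  if ! "$sc" --start --all; then
    echo "service-control --start --all failed; current status:" >&2
    "$sc" --status --all >&2
    return 1
  fi
}
```

    On the appliance you would just run `staged_start`; if step 2 fails, the `--status --all` output shows which services (for example vpxd-svcs) are still stopped.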



  • 2.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 01:32 PM

    Hi,

    Looks like a certificate issue.

    Have you checked all certificates with the following loop?

    for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do
        echo "[*] Store : $store"
        /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store "$store" --text | grep -ie "Alias" -ie "Not After"
    done

    Does lsdoctor report everything as good after the trustfix?
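    If the plain grep output is hard to scan, the Not After dates can be turned into days remaining. A minimal sketch, assuming the `vecs-cli entry list --text` output contains `Alias :` lines and OpenSSL-style `Not After :` lines (which is what the grep above relies on) and that GNU `date -d` is available, as on the VCSA. `cert_expiry_report` is a hypothetical name; the optional argument is a reference time in epoch seconds, mainly for testing:

```shell
# Hypothetical helper: read `vecs-cli entry list --text` output on stdin and
# print "ALIAS<TAB>NOT_AFTER<TAB>DAYS_LEFT" for every certificate entry.
# Assumes GNU date, which can parse OpenSSL dates like "Jan  1 00:00:00 2026 GMT".
cert_expiry_report() {
  local now="${1:-$(date +%s)}"   # reference time, defaults to "now"
  local alias="" line notafter expiry
  while IFS= read -r line; do
    case "$line" in
      *"Alias :"*|*"Alias:"*)
        # Keep everything after the first colon, minus whitespace.
        alias=$(printf '%s' "${line#*:}" | tr -d '[:space:]')
        ;;
      *"Not After"*)
        # Drop the "Not After :" label and leading blanks.
        notafter=$(printf '%s' "${line#*"Not After"*:}" | sed 's/^[[:space:]]*//')
        expiry=$(date -u -d "$notafter" +%s) || continue
        # Negative values mean the entry has already expired.
        printf '%s\t%s\t%d\n' "$alias" "$notafter" $(( (expiry - now) / 86400 ))
        ;;
    esac
  done
}
```

    For example: `/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text | cert_expiry_report`.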

     

     



  • 3.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 01:37 PM

     

    After running the suggested command, all certificates expire in 2025 or later. There is a BACKUP_STORE entry for __MACHINE_CERT dated December 2022, but my understanding is that backup entries are inactive.

    lsdoctor reports everything as good.



  • 4.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 01:42 PM

    [Screenshot]

     



  • 5.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 01:44 PM

    Can you verify with this command that the hostname (PNID) matches the one in the certificate?

    /usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost
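    To compare the PNID against what the Machine SSL certificate actually contains, here is a minimal sketch, assuming the certificate is exported with `vecs-cli entry getcert` and decoded with `openssl x509 -text`. The helper name `pnid_in_san` and the path /tmp/machine_ssl.crt are hypothetical:

```shell
# Hypothetical helper: succeed (return 0) if the given PNID appears in the
# Subject Alternative Name of a certificate, reading `openssl x509 -text`
# output on stdin. The comparison is case-insensitive.
pnid_in_san() {
  local pnid
  pnid=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # The SAN values sit on the line after the "X509v3 Subject Alternative Name" header.
  grep -A1 'Subject Alternative Name' \
    | tail -n 1 \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^DNS://' -e 's/^IP Address://' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -qxF "$pnid"
}

# Example on the VCSA (path is illustrative):
#   /usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store MACHINE_SSL_CERT \
#       --alias __MACHINE_CERT --output /tmp/machine_ssl.crt
#   openssl x509 -in /tmp/machine_ssl.crt -noout -text \
#       | pnid_in_san "$(/usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost)" \
#       && echo "PNID found in SAN" || echo "PNID missing from SAN"
```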

     



  • 6.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 01:53 PM

     

    Yes. It returns the FQDN of the VCSA.



  • 7.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 01:56 PM

    Which build of vCenter are you running?

    How did you reset the certificates? With

    /usr/lib/vmware-vmca/bin/certificate-manager

    using option 8 (Reset all Certificates)?

    If not, please do it with option 8.



  • 8.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 01:59 PM

     

    VCSA - 7.0.3.01000
    BUILD - 20395099


    Yes, it was Certificate Manager with option 8.



  • 9.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 02:06 PM

    Got a new error:

    VPXD - Failed to read X509 cert

    [Screenshot]

     



  • 10.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 03:00 PM


  • 11.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 03:10 PM

     

    First, thank you for all of the assistance. 

    I have executed that KB. The STS was in good standing, but I replaced it anyway.



  • 12.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 03:01 PM

    I would reset the certificates to the default VMware certificates and then create a new CSR.



  • 13.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 03:11 PM

     

    All certificates are VMware self-signed certificates.



  • 14.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 05:48 PM

    What about this option?

    https://kb.vmware.com/s/article/82332 



  • 15.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 06:00 PM

     

     

    All certs are in good standing and the STS was replaced today.



  • 16.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 08:06 PM

    Something is wrong between vPostgres and the certificates. When I try to start vPostgres on its own, it logs a long list of messages about trying to build the root_crl.pem file. It makes many requests to the auth service but eventually fails.

    [Screenshot]

     

     



  • 17.  RE: Recovering vCenter 7

    Posted Jul 24, 2023 09:24 PM

    Well, I can get most of the services up, but the vSphere UI just won't play nice.

    [Screenshot]

     



  • 18.  RE: Recovering vCenter 7

    Posted Jul 25, 2023 03:30 PM

    Interesting. I wonder what is causing these issues.



  • 19.  RE: Recovering vCenter 7

    Posted Jul 26, 2023 08:44 PM

    Does anyone know whether we can just deploy a new vCenter and have it discover or re-register the existing cluster (vSAN, NSX, etc.)?

    If not, I will have to plan a big "new deployment and migration". 

    1) Remove host from existing cluster

    2) Clean host

    3) Deploy vCenter to single host

    4) Enable vSAN

    5) Enable NSX

    6) Begin migration of workloads (how without a working vCenter?)

    7) Roll hosts between clusters



  • 20.  RE: Recovering vCenter 7

    Posted Jul 27, 2023 05:33 AM

    Hi  

    First, is this a stretched cluster with a witness host or a standard cluster? And is it OSA or ESA?

    vSAN can keep running without vCenter, so in my opinion it's not necessary to destroy everything.

    The important thing is to install a new vCenter. Do you have a local datastore on one of your ESXi hosts, for example a boot device with about 200 GB of space? You could temporarily deploy a vCenter there.

    Then follow these steps:

    1. Create a new cluster and enable vSAN on it.
    2. Check vSAN health first (esxcli vsan cluster get, ...).
    3. Move each host into the new cluster.
    4. Reapply the storage policies to the VMs.
    5. Re-enable the stretched cluster if necessary.
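    The health check in step 2 could be run against each host before moving it. A minimal sketch, assuming SSH is enabled on the hosts and that `esxcli vsan cluster get` prints `Local Node Health State` and `Sub-Cluster Member Count` fields, as it does on vSAN 7; `check_vsan_node` and the host name in the example are hypothetical:

```shell
# Hypothetical helper: parse `esxcli vsan cluster get` output on stdin and
# succeed only if the node reports HEALTHY and sees the expected number of
# cluster members. Prints a one-line summary either way.
check_vsan_node() {
  awk -F': *' -v want="$1" '
    /Local Node Health State/  { health = $2 }
    /Sub-Cluster Member Count/ { count = $2 }
    END {
      printf "health=%s members=%s (expected %s)\n", health, count, want
      exit !(health == "HEALTHY" && count == want)
    }'
}

# Example from an admin workstation (host name is illustrative):
#   ssh root@esx01.lab.local esxcli vsan cluster get | check_vsan_node 4
```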

    Which NSX version do you use? The nodes must be redeployed.



  • 21.  RE: Recovering vCenter 7

    Posted Jul 27, 2023 12:44 PM

     

    It is a standard cluster linked to another (management & compute vCenters).

    I have done a single-node vSAN deployment before. The premise is that a single host is taken out of the existing cluster (4 hosts) and cleaned, and then a single-node vSAN deployment and a new vCenter are used to start the new, rebuilt cluster.

    NSX is 3.2.2.



  • 22.  RE: Recovering vCenter 7
    Best Answer

    Posted Aug 02, 2023 04:42 PM

    Well, I was able to recover. VMware sent a certificate tool (vCert) that identified some trust issues and registrations that the standard tools didn't address.

    Then I found an issue with the logging setup in the Tomcat instance. I commented out the "isAccessLogCreated" and "accessLogCleaner" beans in the Tomcat config.

    I also had to manually rebuild the vPostgres certificate store.

    I restarted the services and got the core up and running. I took a good snapshot of the VCSA. I attempted a VCSA backup and it failed repeatedly, so I decided to attempt an upgrade to repair the VCSA. It took about 2 hours, but the upgrade from 7.0 U3f to U3g completed. I then continued along the update path all the way to the latest 7.0 U3 release.

    I tested the VCSA backup and it ran successfully. 

    I tested Tomcat by uncommenting the previously commented-out beans. It ran successfully.

    In summary, there was corruption at multiple points within the VCSA. With the help here and from VMware, I was able to recover it. Thank you all.