VCSA services not starting

  • 1.  VCSA services not starting

    Posted Dec 12, 2017 10:50 PM

    I've got vCenter 6.5 in the lab. After a power cut I restarted it; the appliance boots, but the services are failing to start.

    root@vcentre65 [/]# service-control --start --all

    Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start vmware-vpostgres, vapi-endpoint services. Error: Operation timed out

    I've checked for lock files and it all looks fine, and the appliance has plenty of disk space, so I'm not sure what's causing it. Starting the services manually doesn't help, and the errors I'm getting are not exactly helpful:

    root@vcentre65 [ / ]# service-control --start vmware-vpostgres

    Perform start operation. vmon_profile=None, svc_names=['vmware-vpostgres'], include_coreossvcs=False, include_leafossvcs=False

    2017-12-12T22:46:37.009Z Service vmware-vpostgres state STOPPED

    Error executing start on service vmware-vpostgres. Details {

      "resolution": null,

      "detail": [

      {

      "args": [

      "vmware-vpostgres"

      ],

      "id": "install.ciscommon.service.failstart",

      "localized": "An error occurred while starting service 'vmware-vpostgres'",

      "translatable": "An error occurred while starting service '%(0)s'"

      }

      ],

      "componentKey": null,

      "problemId": null

    }

    Service-control failed. Error {

      "resolution": null,

      "detail": [

      {

      "args": [

      "vmware-vpostgres"

      ],

      "id": "install.ciscommon.service.failstart",

      "localized": "An error occurred while starting service 'vmware-vpostgres'",

      "translatable": "An error occurred while starting service '%(0)s'"

      }

      ],

      "componentKey": null,

      "problemId": null

    }

    So, any suggestions as to what might be stopping VCSA 6.5 from playing ball?

    Thanks.
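
    The JSON body that service-control prints after "Error" can be pulled out programmatically, which helps when several services fail at once. This is a minimal Python sketch, assuming the output keeps the shape shown above (free text followed by a single JSON object). The helper name `failed_services` is hypothetical and not part of any VMware tooling.

```python
import json


def failed_services(output):
    """Return (message id, args) pairs from the first JSON object in the output."""
    start = output.find("{")
    if start == -1:
        return []
    # raw_decode parses one JSON value and ignores any text that follows it.
    payload, _ = json.JSONDecoder().raw_decode(output[start:])
    return [(item["id"], item["args"]) for item in payload.get("detail") or []]


# Sample text copied from the error in this post.
SAMPLE = """Service-control failed. Error {
  "resolution": null,
  "detail": [
    {
      "args": ["vmware-vpostgres"],
      "id": "install.ciscommon.service.failstart",
      "localized": "An error occurred while starting service 'vmware-vpostgres'",
      "translatable": "An error occurred while starting service '%(0)s'"
    }
  ],
  "componentKey": null,
  "problemId": null
}"""

print(failed_services(SAMPLE))
```

    The payload only names the service that failed, not the reason, so the service's own log is still the place to look next.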



  • 2.  RE: VCSA services not starting

    Posted Dec 12, 2017 11:01 PM

    First thing is to check the vPostgres logs and see why the daemon isn't starting.



  • 3.  RE: VCSA services not starting

    Posted Dec 12, 2017 11:07 PM

    Sorry, I should have mentioned that: there isn't anything in the vPostgres log.

    root@vcentre65 [ /tmp ]# more /storage/db/vpostgres/pg_xlog

    *** /storage/db/vpostgres/pg_xlog: directory ***

    root@vcentre65 [ /tmp ]#



  • 4.  RE: VCSA services not starting

    Posted Dec 13, 2017 03:36 AM

    Hi,

    Check the free space on the root partition and see if it's full.

    df -h

    If it is full, you need to extend the root partition. Follow the article below; I faced the same issue in VCSA 6.0.

    VCSA 6.0 failed to stage for patch update and failed to connect after reboot also - VMware Diary
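
    With as many mount points as a VCSA has, a full one is easy to miss by eye. The following is a minimal Python sketch that flags mount points at or above a usage threshold in `df` text. It assumes the usual six-column `df -h` layout and mount points without spaces; `mounts_over` and the sample text are illustrative, not output from any particular appliance.

```python
def mounts_over(df_output, threshold=90):
    """Return (mount point, use %) pairs at or above `threshold` from `df` text."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # not a regular data row
        percent = int(fields[4].rstrip("%"))
        if percent >= threshold:
            flagged.append((fields[5], percent))
    return flagged


# Illustrative sample in the same layout as `df -h`.
SAMPLE = """Filesystem Size Used Avail Use% Mounted on
/dev/sda3 11G 5.3G 4.8G 53% /
/dev/mapper/db_vg-db 9.8G 165M 9.1G 2% /storage/db
/dev/mapper/log_vg-log 9.8G 1.8G 7.5G 20% /storage/log"""

print(mounts_over(SAMPLE))
print(mounts_over(SAMPLE, threshold=50))
```

    An empty result at the default threshold suggests disk space is not the problem and the service logs deserve a closer look.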



  • 5.  RE: VCSA services not starting

    Posted Dec 13, 2017 07:54 AM

    It's not. Disk space was one of the first things I checked, as I hit space issues with VCSA 5.5.

    root@vcentre65 [ ~ ]# df -h

    Filesystem                                Size  Used Avail Use% Mounted on
    devtmpfs                                  4.9G     0  4.9G   0% /dev
    tmpfs                                     4.9G     0  4.9G   0% /dev/shm
    tmpfs                                     4.9G  696K  4.9G   1% /run
    tmpfs                                     4.9G     0  4.9G   0% /sys/fs/cgroup
    /dev/sda3                                  11G  5.3G  4.8G  53% /
    tmpfs                                     4.9G  912K  4.9G   1% /tmp
    /dev/mapper/netdump_vg-netdump            985M  1.3M  932M   1% /storage/netdump
    /dev/sda1                                 120M   27M   87M  24% /boot
    /dev/mapper/imagebuilder_vg-imagebuilder  9.8G   23M  9.2G   1% /storage/imagebuilder
    /dev/mapper/dblog_vg-dblog                 15G  102M   14G   1% /storage/dblog
    /dev/mapper/db_vg-db                      9.8G  165M  9.1G   2% /storage/db
    /dev/mapper/autodeploy_vg-autodeploy      9.8G   23M  9.2G   1% /storage/autodeploy
    /dev/mapper/updatemgr_vg-updatemgr         99G  731M   93G   1% /storage/updatemgr
    /dev/mapper/seat_vg-seat                  9.8G  198M  9.1G   3% /storage/seat
    /dev/mapper/core_vg-core                   25G  5.9G   18G  26% /storage/core
    /dev/mapper/log_vg-log                    9.8G  1.8G  7.5G  20% /storage/log



  • 6.  RE: VCSA services not starting

    Posted Dec 13, 2017 01:47 PM

    Do a clean power off (not a restart), then power on.

    If the PSC is external, power off vCenter first, then the PSC, and power them on in reverse order.



  • 7.  RE: VCSA services not starting

    Posted Dec 13, 2017 02:41 PM

    If vPostgres isn't starting, a shutdown and restart isn't going to do anything.

    @OP, check the /var/log/vmware/vpostgres/postgresql-##.log file that corresponds to the latest date stamp. Also check serverlog.stderr. Anything in there?



  • 8.  RE: VCSA services not starting

    Posted Dec 13, 2017 10:34 PM

    A full shutdown and restart didn't fix anything.

    serverlog.stderr shows nothing

    root@vcentre65 [ ~ ]# tail /storage/log/vmware/vpostgres/serverlog.stderr

    Starting service process with pid: 57064.

    LOG: skipping missing configuration file "/storage/db/vpostgres/postgresql.conf.repl"

    LOG: skipping missing configuration file "/storage/db/vpostgres/postgresql.conf.repl"

    2017-12-13 22:19:40.594 UTC 5a31a77c.dee8 0 LOG: registering background worker "health_status_worker"

    2017-12-13 22:19:40.765 UTC 5a31a77c.dee8 0 LOG: redirecting log output to logging collector process

    2017-12-13 22:19:40.765 UTC 5a31a77c.dee8 0 HINT: Future log output will appear in directory "/var/log/vmware/vpostgres".

    There is nothing in /var/log/vmware/vpostgres/; however, a search for vPostgres logs shows something interesting:

    root@vcentre65 [ /storage/log/vmware/vpostgres ]# pwd

    /storage/log/vmware/vpostgres

    root@vcentre65 [ /storage/log/vmware/vpostgres ]# tail 25 postgresql-13.log

    tail: cannot open '25' for reading: No such file or directory

    ==> postgresql-13.log <==

    2017-12-13 15:12:34.800 UTC 5a314362.3a24 0 LOG: invalid secondary checkpoint record

    2017-12-13 15:12:34.800 UTC 5a314362.3a24 0 PANIC: could not locate a valid checkpoint record

    2017-12-13 15:12:38.466 UTC 5a314362.3a22 0 LOG: startup process (PID 14884) was terminated by signal 6: Aborted

    2017-12-13 15:12:38.467 UTC 5a314362.3a22 0 LOG: aborting startup due to startup process failure

    2017-12-13 22:19:40.768 UTC 5a31a77c.deec 0 LOG: database system was interrupted; last known up at 2017-12-11 10:54:22 UTC

    2017-12-13 22:19:41.142 UTC 5a31a77c.deec 0 LOG: invalid primary checkpoint record

    2017-12-13 22:19:41.142 UTC 5a31a77c.deec 0 LOG: invalid secondary checkpoint record

    2017-12-13 22:19:41.142 UTC 5a31a77c.deec 0 PANIC: could not locate a valid checkpoint record

    2017-12-13 22:19:44.926 UTC 5a31a77c.dee8 0 LOG: startup process (PID 57068) was terminated by signal 6: Aborted

    2017-12-13 22:19:44.926 UTC 5a31a77c.dee8 0 LOG: aborting startup due to startup process failure
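
    The lines that matter in the log above are the PANIC entries, and they are easy to lose among the LOG entries. The following is a minimal Python sketch that picks the most recently modified postgresql-*.log in a directory and prints only its PANIC and FATAL lines. The directory is the one from this thread; `newest_log` and `severe_lines` are hypothetical helper names, and the "most recent" choice assumes file modification times are trustworthy.

```python
import glob
import os
import re

# Matches the PostgreSQL severity tags that mean the server gave up.
SEVERE = re.compile(r"\b(?:PANIC|FATAL):")


def newest_log(log_dir, pattern="postgresql-*.log"):
    """Return the most recently modified log in `log_dir`, or None if there is none."""
    candidates = glob.glob(os.path.join(log_dir, pattern))
    return max(candidates, key=os.path.getmtime) if candidates else None


def severe_lines(path):
    """Return the PANIC and FATAL lines from one log file, in file order."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if SEVERE.search(line)]


if __name__ == "__main__":
    # Directory taken from the thread; adjust if the logs live elsewhere.
    latest = newest_log("/storage/log/vmware/vpostgres")
    if latest:
        print(latest)
        for line in severe_lines(latest):
            print(line)
```

    Reading the file with Python also sidesteps the `tail 25` slip above; the shell equivalent is `tail -n 25 postgresql-13.log`.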



  • 9.  RE: VCSA services not starting

    Posted Dec 13, 2017 10:41 PM

    I've done some more digging and it looks like corruption in the database. I'll try a few things and report back. Thanks daphnissov, you've pointed me in the right direction, much appreciated!



  • 10.  RE: VCSA services not starting
    Best Answer

    Posted Dec 13, 2017 11:42 PM

    Good deal, let me know what you find out (for everyone's benefit on this thread). I did some experimenting in my lab, and it looks like you'll want to check out the pg_resetxlog command to see whether the corruption can be overridden.
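
    Since pg_resetxlog rewrites control data and can leave the database inconsistent, a dry run is a sensible first step. The following is a minimal Python sketch that only builds the command line and does not execute it. It relies on the `-n` (no-update) option that pg_resetxlog provides in PostgreSQL 9.x. The data directory is the one that appears in this thread's serverlog.stderr output; the assumption that the binary is on the PATH, and the helper name `resetxlog_command`, are illustrative. PostgreSQL expects the tool to be run with the server stopped and as the user that owns the data directory, and a copy of that directory beforehand is cheap insurance.

```python
import shlex


def resetxlog_command(data_dir, binary="pg_resetxlog", dry_run=True):
    """Build a pg_resetxlog argument list; with dry_run it only reports values."""
    command = [binary]
    if dry_run:
        # -n prints the values pg_resetxlog would write, then exits unchanged.
        command.append("-n")
    command.append(data_dir)
    return command


# Data directory as seen in the thread; the binary location is an assumption.
print(shlex.join(resetxlog_command("/storage/db/vpostgres")))
```

    Dropping `dry_run` produces the command that actually resets the write-ahead log, so it is worth reading the dry-run values before going that far.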



  • 11.  RE: VCSA services not starting

    Posted Dec 14, 2017 02:13 PM

    You're spot on. I had to mess around with /etc/passwd, as I couldn't su to the vpostgres user because its shell was set to nologin.

    I eventually ran pg_resetxlog and that brought the DB up but now vpxd is complaining:

    2017-12-14T14:01:08.663Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '40' is not found in the inventory for VM id: '49'

    2017-12-14T14:01:08.663Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '40' is not found in the inventory for VM id: '53'

    2017-12-14T14:01:08.664Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '56' is not found in the inventory for VM id: '75'

    2017-12-14T14:01:08.664Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '56' is not found in the inventory for VM id: '73'

    2017-12-14T14:01:08.664Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '523' is not found in the inventory for VM id: '263'

    I think the database is toast, so I'm going to restore it from backup. It's been an interesting lesson :)
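
    To gauge how much of the inventory is affected, the vpxd warnings above can be grouped by the missing host id. The following is a minimal Python sketch, assuming the message wording stays exactly as in the lines quoted above; `orphaned_vms` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Message wording copied from the vpxd warnings in this thread.
ORPHAN = re.compile(
    r"Host id: '(\d+)' is not found in the inventory for VM id: '(\d+)'"
)


def orphaned_vms(lines):
    """Map each missing host id to the VM ids that still reference it."""
    by_host = defaultdict(list)
    for line in lines:
        match = ORPHAN.search(line)
        if match:
            by_host[int(match.group(1))].append(int(match.group(2)))
    return dict(by_host)


# Tail ends of the warnings quoted above.
SAMPLE = [
    "Failed to load VPX_VM record from DB. Host id: '40' is not found in the inventory for VM id: '49'",
    "Failed to load VPX_VM record from DB. Host id: '40' is not found in the inventory for VM id: '53'",
    "Failed to load VPX_VM record from DB. Host id: '56' is not found in the inventory for VM id: '75'",
    "Failed to load VPX_VM record from DB. Host id: '56' is not found in the inventory for VM id: '73'",
    "Failed to load VPX_VM record from DB. Host id: '523' is not found in the inventory for VM id: '263'",
]

for host_id, vm_ids in orphaned_vms(SAMPLE).items():
    print(host_id, vm_ids)
```

    Several hosts missing at once, as here, points to lost rows rather than a single bad record, which fits the decision to restore from backup.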



  • 12.  RE: VCSA services not starting

    Posted Dec 14, 2017 02:23 PM

    Argh, if it can't load those records from the table they're probably hosed, unfortunately. But a DB restore should be good to try.



  • 13.  RE: VCSA services not starting

    Posted Dec 14, 2017 02:32 PM

    Yeah, as soon as I saw those errors I thought that vCenter was probably a mess. I'll see if I can restore the database, and if that doesn't fix it I'll just restore the whole VM. Not a big deal, but an interesting experiment :)