I have a customer who is running APM 10.5.2 on Linux 7 servers. They installed the APM DB on PostgreSQL, and that has been working. However, the Linux administrators patched the servers over the weekend and restarted them. The APM DB did not start because it was not configured to start automatically through either init.d or systemd.
Does anyone have instructions for setting up PostgreSQL to restart automatically on a Linux 7 server using systemd? The scripts provided with APM 10.5.2 are for Linux 6 servers because they use init.d.
The server startup is vendor-related. We just wrap the installer.
Thanks, Hiko. Is this our answer to the customer: go to that link, read through it, and try to figure out how to make it work with systemd? It would be very useful to have instructions for setting up auto-start on reboot for Linux 7 using systemd, like the ones we provide for Linux 6 systems.
The same applies to Linux 6 and 7. The systemd setup is covered in this section of the page:
"On Linux systems either add
/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data
to /etc/rc.d/rc.local or /etc/rc.local or look at the file contrib/start-scripts/linux in the PostgreSQL source distribution.
When using systemd, you can use the following service unit file (e.g., at /etc/systemd/system/postgresql.service):
[Unit]
Description=PostgreSQL database server
Documentation=man:postgres(1)

[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0

[Install]
WantedBy=multi-user.target
Using Type=notify requires that the server binary was built with configure --with-systemd.
Consider carefully the timeout setting. systemd has a default timeout of 90 seconds as of this writing and will kill a process that does not notify readiness within that time. But a PostgreSQL server that might have to perform crash recovery at startup could take much longer to become ready. The suggested value of 0 disables the timeout logic.
While the server is running, its PID is stored in the file postmaster.pid in the data directory. This is used to prevent multiple server instances from running in the same data directory and can also be used for shutting down the server."
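For anyone scripting this, here is a minimal Python sketch (not an APM-supplied tool) that renders the unit file quoted above and uses Type=notify only when `pg_config --configure` reports `--with-systemd`. The /usr/local/pgsql paths and the `postgres` user are the documentation's defaults and are assumptions here; an APM-bundled install will likely use different paths. Falling back to Type=simple for a binary built without systemd support is our assumption, not something the quoted docs state, and the function names are hypothetical.

```python
import shlex

# Assumed defaults, copied from the documentation's /usr/local/pgsql layout.
PG_BIN = "/usr/local/pgsql/bin"
PG_DATA = "/usr/local/pgsql/data"


def built_with_systemd(configure_output: str) -> bool:
    """True if `pg_config --configure` output includes --with-systemd."""
    # pg_config prints each configure option single-quoted, e.g.
    # '--prefix=/usr/local/pgsql' '--with-systemd'
    return "--with-systemd" in shlex.split(configure_output)


def render_unit(pg_bin: str = PG_BIN, pg_data: str = PG_DATA,
                user: str = "postgres", notify: bool = True) -> str:
    """Return postgresql.service text matching the unit quoted above."""
    # Type=notify needs a --with-systemd build. Type=simple is an assumed
    # fallback: `postgres -D ...` stays in the foreground, so it still works.
    service_type = "notify" if notify else "simple"
    return "\n".join([
        "[Unit]",
        "Description=PostgreSQL database server",
        "Documentation=man:postgres(1)",
        "",
        "[Service]",
        f"Type={service_type}",
        f"User={user}",
        f"ExecStart={pg_bin}/postgres -D {pg_data}",
        "ExecReload=/bin/kill -HUP $MAINPID",
        "KillMode=mixed",
        "KillSignal=SIGINT",
        "TimeoutSec=0",  # disable systemd's start timeout (crash recovery can be slow)
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]) + "\n"
```

As root, one would feed the output of `pg_config --configure` into `built_with_systemd`, then write `render_unit(notify=...)` to /etc/systemd/system/postgresql.service, the location the docs suggest.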
The next section on troubleshooting startup failures is also worth noting:
There are several common reasons the server might fail to start. Check the server's log file, or start it by hand (without redirecting standard output or standard error) and see what error messages appear. Below we explain some of the most common error messages in more detail.
LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
FATAL: could not create TCP/IP listen socket
This usually means just what it suggests: you tried to start another server on the same port where one is already running. However, if the kernel error message is not Address already in use or some variant of that, there might be a different problem. For example, trying to start a server on a reserved port number might draw something like:
$ postgres -p 666
LOG: could not bind IPv4 socket: Permission denied
HINT: Is another postmaster already running on port 666? If not, wait a few seconds and retry.
FATAL: could not create TCP/IP listen socket
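To tell "Address already in use" apart from "Permission denied" before starting the server, a minimal Python sketch can attempt the same bind() the postmaster would. It assumes the server listens on 127.0.0.1 (PostgreSQL's default listen_addresses is localhost), and `probe_port` is a hypothetical name.

```python
import errno
import socket


def probe_port(port: int, host: str = "127.0.0.1") -> str:
    """Try to bind host:port the way a new postmaster would.

    Returns "free", "in use" (EADDRINUSE), or "permission denied"
    (EACCES, e.g. a port below 1024 for a non-root user).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return "in use"
        if e.errno == errno.EACCES:
            return "permission denied"
        raise  # some other problem; let the caller see it
    finally:
        s.close()  # release the port immediately; we only wanted the bind result
    return "free"
```

Run as the account that starts PostgreSQL, `probe_port(5432)` returning "in use" points at another running server, while `probe_port(666)` would typically return "permission denied" for a non-root user.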
A message like:
FATAL: could not create shared memory segment: Invalid argument
DETAIL: Failed system call was shmget(key=5440001, size=4011376640, 03600).
probably means your kernel's limit on the size of shared memory is smaller than the work area PostgreSQL is trying to create (4011376640 bytes in this example). Or it could mean that you do not have System-V-style shared memory support configured into your kernel at all. As a temporary workaround, you can try starting the server with a smaller-than-normal number of buffers (shared_buffers). You will eventually want to reconfigure your kernel to increase the allowed shared memory size. You might also see this message when trying to start multiple servers on the same machine, if their total space requested exceeds the kernel limit.
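Using the size from the DETAIL line above (4011376640 bytes, about 3.74 GiB), a short Python sketch can compare the request with the kernel's per-segment limit, SHMMAX. Reading /proc/sys/kernel/shmmax is Linux-specific, the helper names are hypothetical, and the check ignores the system-wide SHMALL limit.

```python
SHMMAX_PATH = "/proc/sys/kernel/shmmax"  # Linux-specific location


def read_shmmax(path: str = SHMMAX_PATH) -> int:
    """Return the kernel's maximum System V shared memory segment size in bytes."""
    with open(path) as f:
        return int(f.read().split()[0])


def segment_fits(requested_bytes: int, shmmax_bytes: int) -> bool:
    # shmget() fails with EINVAL ("Invalid argument") when a new segment
    # would be larger than SHMMAX.
    return requested_bytes <= shmmax_bytes


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB for readable messages."""
    return n_bytes / 2**30
```

For example, `segment_fits(4011376640, read_shmmax())` returning False would match the error above.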
An error like:
FATAL: could not create semaphores: No space left on device
DETAIL: Failed system call was semget(5440126, 17, 03600).
does not mean you've run out of disk space. It means your kernel's limit on the number of System V semaphores is smaller than the number PostgreSQL wants to create. As above, you might be able to work around the problem by starting the server with a reduced number of allowed connections (max_connections), but you'll eventually want to increase the kernel limit.
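PostgreSQL allocates semaphores in sets of 16, each with a 17th semaphore holding a "magic number", which is why the semget call above asks for 17. The Python sketch below estimates the sets required and compares them with /proc/sys/kernel/sem. It assumes you already know roughly how many server processes need a semaphore (about max_connections plus autovacuum workers and a few background processes; the exact formula varies by version). It also ignores semaphores that other programs are already using, and the helper names are hypothetical.

```python
import math


def parse_sem(text: str) -> dict:
    """Parse /proc/sys/kernel/sem, whose four fields are SEMMSL SEMMNS SEMOPM SEMMNI."""
    semmsl, semmns, semopm, semmni = (int(x) for x in text.split())
    return {"SEMMSL": semmsl, "SEMMNS": semmns, "SEMOPM": semopm, "SEMMNI": semmni}


def semaphores_needed(n_procs: int) -> tuple:
    """Return (sets, total semaphores) for n_procs, using sets of 16 + 1."""
    sets = math.ceil(n_procs / 16)
    return sets, sets * 17


def sem_limits_ok(limits: dict, n_procs: int) -> bool:
    sets, total = semaphores_needed(n_procs)
    return (limits["SEMMSL"] >= 17          # semaphores allowed per set
            and limits["SEMMNI"] >= sets    # number of sets system-wide
            and limits["SEMMNS"] >= total)  # semaphores system-wide
```

For example, 100 processes need 7 sets, or 119 semaphores in total.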
If you get an "illegal system call" error, it is likely that shared memory or semaphores are not supported in your kernel at all. In that case your only option is to reconfigure the kernel to enable these features.
Details about configuring System V IPC facilities are given in Section 18.4.1.
The likely reason is that the EM and/or PostgreSQL installation was done as a non-root user. This is covered in the installation guide. You'll need root permissions to set up the daemon.
Also, have the root user create the 'postgres' user if the installation was done from a non-root account. The vendor installer cannot create an account with those permissions, which is also described in the guide.
The customer was not able to restart PostgreSQL after the patch. When they tried, they were prompted for the root password, even though they had originally installed it as a non-root user. The last time they got that prompt, they had to delete and reinstall the APM DB. So now I don't know whether this is a result of the patch applied to the server or something in the database.
The entire installation and the service unit file likely need to be rechecked for proper permissions, and they should confirm that the user 'postgres' exists, since it wasn't created during the installation. After checking all of that, I'd look at the logs and the systemd messages to identify the issue.
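A quick way to confirm whether the account exists (the same check `id postgres` does from a shell) is a few lines of Python using the standard pwd module. This only reads the passwd database; creating the account still has to be done by root, for example with useradd. The function name is hypothetical.

```python
import pwd


def postgres_account(name: str = "postgres"):
    """Return the passwd entry for `name`, or None if the account does not exist."""
    try:
        return pwd.getpwnam(name)
    except KeyError:  # getpwnam raises KeyError for unknown users
        return None
```

If `postgres_account()` returns None, the account is missing. Otherwise its `pw_uid` is the owner the installation files should have.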
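Rechecking permissions across the whole installation is tedious by hand. A minimal Python sketch can list every path not owned by the expected account. The install path in the usage note is an assumption, and the scan must run as root to read everything under the data directory.

```python
import os


def scan_owners(root: str):
    """Yield (path, owner uid) for root and everything beneath it."""
    yield root, os.lstat(root).st_uid
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            yield path, os.lstat(path).st_uid  # lstat: don't follow symlinks


def not_owned_by(entries, uid: int) -> list:
    """Return the paths whose owner differs from the expected uid."""
    return [path for path, owner in entries if owner != uid]
```

Usage, assuming the documentation's layout: `not_owned_by(scan_owners("/usr/local/pgsql"), pwd.getpwnam("postgres").pw_uid)`. The scan and the filter are kept separate so the filter can be checked without touching the filesystem.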
Prior to the patch applied over the weekend, the system was functioning with no problems. I'm very concerned about the fragility of the APM DB (PostgreSQL on Linux 7) if a patch causes it to stop working.
Had the server been rebooted before patching to confirm that it would restart successfully? I bet it wasn't, and it would have failed then too. The 'postgres' user was likely specified during the installation but never created. Have them check. If it's not there, have a Linux admin create it and give it full permissions on the database installation.
Once the server is up and running, connect to the database server using psql or pgAdmin to validate connectivity.
No, the database had never been set up to restart on server reboot. That is one of the original asks in this question: setting up restart on a Linux 7 server using systemd. They didn't get it set up before the patch and reboot. After the reboot it was not running, so they tried to start it manually and it failed (it prompted for the root password). When given the script from the database directory in APM, the Linux admin replied that it was a Linux 6 script, not a Linux 7 script, because it used init.d rather than systemd.
So the bottom line is that they want a script, like the one we provide now for Linux 6 servers, that will work on Linux 7 servers using systemd. Of course, they first have to get the database working again, and they think they will need to reinstall it from scratch.
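Before reaching for psql or pgAdmin, a rough liveness check can combine the postmaster.pid file mentioned in the quoted docs with a TCP probe. This Python sketch assumes the default port 5432 and a local connection, and all function names are hypothetical. A live PID does not prove the process is postgres (PIDs can be reused), and an open port does not prove that authentication works, so psql or pgAdmin remains the real test.

```python
import os
import socket


def parse_postmaster_pid(text: str):
    """Return the PID on the first line of postmaster.pid text, or None."""
    first = text.splitlines()[0].strip() if text else ""
    return int(first) if first.isdigit() else None


def pid_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user (e.g. postgres)
    return True


def accepts_tcp(host: str = "127.0.0.1", port: int = 5432, timeout: float = 3.0) -> bool:
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def server_up(data_dir: str, port: int = 5432) -> bool:
    """Rough check: postmaster.pid names a live process and the port answers."""
    try:
        with open(os.path.join(data_dir, "postmaster.pid")) as f:
            pid = parse_postmaster_pid(f.read())
    except FileNotFoundError:
        return False
    return pid is not None and pid_alive(pid) and accepts_tcp(port=port)
```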
No need for a script; it's all in the documentation I just posted.
Once the setup is validated, they can run either the 'service' or the 'systemctl' command to check the status:
service postgresql-9.2 status
systemctl status postgresql-9.2.service
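The status commands only report on the service. What makes it start after a reboot is enabling the unit, which links it into multi-user.target according to its WantedBy= line. Below is a minimal Python sketch of that sequence. It assumes the unit was saved as postgresql.service; the postgresql-9.2 name above appears to come from a package-installed unit, so substitute whichever name actually exists. The `run` parameter exists only so the logic can be exercised without systemd, and the helper names are hypothetical.

```python
import subprocess

UNIT = "postgresql.service"  # assumption: the name given to the unit file


def enable_commands(unit: str = UNIT) -> list:
    """Commands (run as root) to register the unit, start it at boot, and start it now."""
    return [
        ["systemctl", "daemon-reload"],  # pick up the new or changed unit file
        ["systemctl", "enable", unit],   # link into multi-user.target.wants
        ["systemctl", "start", unit],
    ]


def run_all(commands, run=subprocess.run) -> None:
    for cmd in commands:
        run(cmd, check=True)  # raises CalledProcessError if a step fails


def is_active(unit: str = UNIT, run=subprocess.run) -> bool:
    # `systemctl is-active` prints the unit state and exits 0 only when active.
    result = run(["systemctl", "is-active", unit], capture_output=True, text=True)
    return result.returncode == 0 and result.stdout.strip() == "active"
```

Separate `enable` and `start` steps are used instead of `enable --now`, in case the installed systemd predates that option.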
I would recommend the customer follow the directions to create the service unit file for the database server.
If they still insist on using a control script, they can use the attached one as a reference. They will have to modify it, since it's based on my test server install.
Have any customers or partners built such a script meeting Cary's requirements?