Shell scripting: using lock-files
Many developers write a shell-script to work around a small issue they don't want to handle in the main program, or simply because it is easier.
Very often, this little shell-script is then spawned as a child process from within a loop, and that's when the problem appears: the loop iterates faster than the previous run of the shell-script exits. The runs pile up until the system runs out of memory, at which point the kernel either refuses to start new processes or starts killing running applications to free up resources.
The other case where a shell-script must not run twice is when the same data-set or file is being manipulated, or when an application is allowed to run only once.
This is where lock-files come in.
The lockfile function
Usually, the lock-file is named "[script-name].lock", is located in /var/lock or /var/run, and contains the PID of the currently running script.
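As a small illustration of this convention, the sketch below builds a lock-file name from a script name and stores the current PID in it. It uses a temporary directory instead of /var/lock so it can run without special permissions; the names script_name, lock_dir and lock_file are made up for this example.

```shell
#!/bin/sh
# Stand-alone demo; all names below are illustrative only.
script_name="sample"
lock_dir=$(mktemp -d)                     # stand-in for /var/lock
lock_file="$lock_dir/$script_name.lock"

# Store the PID of the current shell in the lock-file.
echo $$ > "$lock_file"

# Read it back, as a later run of the script would do.
stored_pid=$(cat "$lock_file")
echo "Lock-file $lock_file holds PID $stored_pid"

# Clean up the demo directory.
rm -rf "$lock_dir"
```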
The lock-file function needs to handle 3 conditions:
- Check the PID stored in the lock-file and see if an instance is already running.
If it is, wait a while and try to set the lock again. If that is not possible, bail out with a warning.
- Set the lock-file
- Remove the lock-file
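The first condition boils down to asking whether a given PID still belongs to a live process. One common way to ask that is "kill -0", which sends no signal and only reports whether the process could be signalled. The helper name pid_is_running is hypothetical and not part of the include file discussed here; treat this as a minimal sketch.

```shell
#!/bin/sh
# pid_is_running is a hypothetical helper, shown for illustration only.
# Returns 0 if a process with the given PID exists and can be signalled
# by the current user, non-zero otherwise.
pid_is_running() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not running
    esac
    kill -0 "$1" 2>/dev/null
}

if pid_is_running "$$"; then
    echo "PID $$ is alive"
fi
```

Note that "kill -0" also fails when the process exists but belongs to another user, so this check only suits scripts that always run under the same account.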
The function below has served well for a long time. It handles the actual locking:
##############################################################################
#
# Lockfile Generation
# tolock: LockFile location with full path, define in main script.
# Action: number, exit-code to be fed to the error-handler function.
#         If omitted, wait until the existing lock-file disappears.
#
# Call with: Lock tolock Action
Lock() {
    # Lockfile to create; falls back to $LockFile when called without arguments
    tolock="${1:-$LockFile}"
    Action="$2"    # number, exit-code to be fed to the error-handler function

    if [ -s "$tolock" ]
    then
        # Oops, we found a lockfile.
        if [ -z "$Action" ]
        then
            # No Action provided: loop while the lockfile still exists.
            while [ -s "$tolock" ]
            do
                MSG="Lockfile exists. Waiting 5secs until next check"
                errlvl=15    # Send out Warning message
                errors
                sleep 5
            done
            MSG="Creating lockfile $tolock failed after waiting"
            # write PID into Lock-File.
            echo $$ > "$tolock"
            errlvl=$?
            errors
        else
            # Action provided: check whether the PID stored in the lockfile
            # still belongs to a running instance of this script.
            Pid="$(cat "$tolock")"
            Exists="$(ps -p "$Pid" -o args= 2>/dev/null | grep -c "$PROGNAME")"
            if [ "$Exists" -ge 1 ]
            then
                MSG="$PROGNAME already running. Exiting..."
                errlvl=$Action
                errors
            else
                MSG="Found stale lockfile... Removing it..."
                errlvl=15    # Send out Warning message
                errors
                MSG="Removing stale lockfile $tolock failed"
                rm -f "$tolock"
                errlvl=$?
                errors
                MSG="Creating lockfile $tolock failed"
                echo $$ > "$tolock"
                errlvl=$?
                errors
            fi
        fi
    else
        # No lockfile yet: lock it
        MSG="Creating lockfile $tolock failed"
        echo $$ > "$tolock"
        errlvl=$?
        errors
    fi
} # Lock
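The Lock function checks for the lock-file and writes to it in two separate steps, so two instances started at the same moment could in principle both pass the check. One common way to narrow that window is the shell's noclobber option (set -C), which makes a ">" redirection fail if the file already exists. The helper name try_lock and the temporary paths are assumptions made for this sketch; it is not a replacement for the function above.

```shell
#!/bin/sh
# try_lock is a hypothetical helper illustrating atomic lock creation.
# With noclobber (set -C) active, ">" fails if the file already exists.
# The subshell keeps the option from leaking into the rest of the script.
try_lock() {
    ( set -C; echo $$ > "$1" ) 2>/dev/null
}

demo_dir=$(mktemp -d)
demo_lock="$demo_dir/demo.lock"

if try_lock "$demo_lock"; then first="acquired"; else first="busy"; fi
if try_lock "$demo_lock"; then second="acquired"; else second="busy"; fi
echo "first attempt: $first, second attempt: $second"

rm -rf "$demo_dir"
```

The first attempt creates the file, and the second one is refused because the file is already there.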
The Unlock function is much simpler, as all we have to do is remove the current lock-file before exiting the script.
##############################################################################
#
# Lockfile removal
# No arguments required. Works as is.
#
Unlock() {
    # Name of Lockfile to unlock
    unlock="$LockFile"
    # Unlock the file, but only if it belongs to this process.
    if [ -s "$unlock" ]
    then
        PID=$$
        if [ "$(cat "$unlock")" != "$PID" ]
        then
            # Lockfile holds another PID: warn and leave it alone
            errlvl=15
            MSG="Wrong lock-file PID. Probably a race-condition happened..."
            errors
        else
            # Removing Lockfile
            rm -f "$unlock"
        fi
    fi
} # Unlock
To use the functions, all we have to do in our main script is to add the LockFile variable to provide the path of the lock-file, and make sure the Lock/Unlock functions are provided by our shared include file.
# Define Lock-File
LockFile=/var/lock/${PROGNAME}.lock
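The LockFile definition relies on PROGNAME having been set earlier. How the include file sets it is not shown in this article; one possibility, consistent with "sample" appearing in the output of ./sample.sh, is to strip the directory and the .sh suffix from the script path. The variable script_path below is a stand-in for "$0" and is an assumption of this sketch.

```shell
#!/bin/sh
# Assumption: PROGNAME is the script file name without its .sh suffix.
script_path="./sample.sh"                 # stand-in for "$0"
PROGNAME=$(basename "$script_path" .sh)

# Define Lock-File
LockFile="/var/lock/${PROGNAME}.lock"
echo "$LockFile"
```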
We now add the Lock and Unlock calls to our sample-script:
#######################################################################
# Actual script - do not modify anything below this point
#######################################################################
# Prevent double execution
Lock "$LockFile" 1
# One way of using the log-message, put it all into the current MSG
# variable and invoke the internal function "log"
MSG="Starting program $PROGNAME"
log "$MSG"
if [ -f file.ba ]
then
    # Apply error message assuming the worst-case.
    MSG="Move file.ba to /tmp/foo.ba failed"
    mv -f file.ba /tmp/foo.ba 2>/dev/null
    # Assign the return code of the last program execution
    errlvl=$?
    errors
    # The move worked, so replace the message with the success text.
    MSG="Move file.ba to /tmp/foo.ba Succeeded"
else
    MSG="file.ba does not exist"
    errlvl=1
    errors
fi
# Log the operation. In case an error occurred, we have already exited
log "$MSG"
# remove lock-file
Unlock
Testing the Lock functionality
The first run executes as expected, but the second run shows a WARNING:
jmertin@calypso:~/work/2020/Blogs$ ./sample.sh
>>> sample[22213]: Starting program sample
========================================================================
*** FATAL: An error occured in "sample()" code 1. Bailing out...
>>> sample[22213]: FATAL: file.ba does not exist
jmertin@calypso:~/work/2020/Blogs$ ./sample.sh
>>> sample[22220]: WARNING: Found stale lockfile... Removing it...
>>> sample[22220]: Starting program sample
========================================================================
*** FATAL: An error occured in "sample()" code 1. Bailing out...
>>> sample[22220]: FATAL: file.ba does not exist
The warning is actually a wanted side-effect in this example; making it go away requires a modification to our error-handler function.
What happened here is that when executed the first time, the lock-file was set, but the error condition made the script exit without removing the lock-file.
Checking the lock-file content, we see that it exists, was set and provides the PID of the last execution:
jmertin@calypso:~/work/2020/Blogs$ cat /var/lock/sample.lock
22220
Applying this small change to our errors function inside our shmod.inc file:
--- shmod.inc_old 2020-02-06 12:05:39.617855140 +0100
+++ shmod.inc 2020-02-06 12:05:42.729838586 +0100
@@ -86,6 +86,7 @@
echo "========================================================================"
echo "*** FATAL: An error occured in \"${PROGNAME}(${FUNCTION})\" code $errlvl. Bailing out..."
log "FATAL: $MSG"
+ Unlock
exit $errlvl
fi
fi
will take care of the lock-file left behind after an unwanted exit condition.
This matters because if the script is killed or crashes, the lock-file remains, and that is our indicator that something needs to be investigated!
After this modification, the execution of our sample script shows that the lock-file is removed by the controlled exit. The first run below still reports the stale lock-file left over from the previous runs, but the second run no longer shows the stale lock-file message, which tells us the lock-file was correctly removed.
jmertin@calypso:~/work/2020/Blogs$ ./sample.sh
>>> sample[22380]: WARNING: Found stale lockfile... Removing it...
>>> sample[22380]: Starting program sample
========================================================================
*** FATAL: An error occured in "sample()" code 1. Bailing out...
>>> sample[22380]: FATAL: file.ba does not exist
jmertin@calypso:~/work/2020/Blogs$ ./sample.sh
>>> sample[22397]: Starting program sample
========================================================================
*** FATAL: An error occured in "sample()" code 1. Bailing out...
>>> sample[22397]: FATAL: file.ba does not exist
jmertin@calypso:~/work/2020/Blogs$ ls -l /var/lock/sample.lock
ls: cannot access '/var/lock/sample.lock': No such file or directory
Testing the time functionality
Let's test the time-function. We'll add a sleep timer just before the file check. While the first instance is sleeping, we go to another console and execute the same script again.
MSG="Starting program $PROGNAME"
log "$MSG"
echo "Sleeping 30secs"
sleep 30
if [ -f file.ba ]
then
In this example, because we passed the Lock function an exit code as its action, the second instance exits immediately:
jmertin@calypso:~/work/2020/Blogs$ ./sample.sh
========================================================================
*** FATAL: An error occured in "sample()" code 1. Bailing out...
>>> sample[22484]: FATAL: sample already running. Exiting...
>>> sample[22484]: WARNING: Wrong lock-file PID. Probably a race-condition happened...
The trailing warning appears because the errors function now calls Unlock, which refuses to remove a lock-file that holds another PID.
If we now remove the "Action" from the lock-call of our sample-script and change it to simply "Lock", the script waits until the previous instance stops running. Only use this mode if multiple instances of the same script are allowed to run, one after the other.
jmertin@calypso:~/work/2020/Blogs$ ./sample.sh
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: Starting program sample
========================================================================
*** FATAL: An error occured in "sample()" code 1. Bailing out...
>>> sample[22677]: FATAL: file.ba does not exist
And of course, if the log configuration is still valid, the syslog will show the following entries:
Feb 6 12:19:11 calypso sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
Feb 6 12:19:36 calypso sample[22677]: message repeated 5 times: [WARNING: Lockfile exists. Waiting 5secs until next check]
Feb 6 12:19:38 calypso sample[22669]: FATAL: file.ba does not exist
Feb 6 12:19:41 calypso sample[22677]: Starting program sample
Feb 6 12:20:11 calypso sample[22677]: FATAL: file.ba does not exist
Note that the 3rd line comes from the first script, while lines 1, 2, 4 and 5 show the second script waiting and then running (the PID appended to "sample" tells them apart).
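The waiting mode tested above loops until the lock-file disappears, with no upper bound. If an upper bound is wanted, the loop can count its checks and give up after a maximum. The helper wait_for_lock and its parameters are hypothetical; this is a sketch of the idea, not part of the include file.

```shell
#!/bin/sh
# wait_for_lock is a hypothetical helper with a bounded wait.
# Usage: wait_for_lock <lock-file> <max-checks> <delay-seconds>
# Returns 0 once the lock-file is gone or empty,
# 1 if it is still present after <max-checks> checks.
wait_for_lock() {
    tries=0
    while [ -s "$1" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$2" ]; then
            return 1
        fi
        sleep "$3"
    done
    return 0
}

demo_dir=$(mktemp -d)
demo_lock="$demo_dir/demo.lock"
echo $$ > "$demo_lock"

# Lock-file present: the bounded wait gives up.
if wait_for_lock "$demo_lock" 2 0; then held="free"; else held="timeout"; fi

# Lock-file removed: the wait returns immediately.
rm -f "$demo_lock"
if wait_for_lock "$demo_lock" 2 0; then released="free"; else released="timeout"; fi

echo "while held: $held, after release: $released"
rm -rf "$demo_dir"
```

A caller could feed the non-zero return into the same error handling that the Action argument uses, so a stuck lock-file ends in a logged failure instead of an endless wait.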
Conclusion
The lock-file is very important to make sure that scripts run correctly and can't be spawned in an uncontrolled manner that effectively DoSes the local system. Using the provided functions helps handle the lock-file automatically as long as its destination is defined. As a side effect, it is linked to the log-function, so it can report issues to whatever destination is configured.
Next scripting: CLI UI handling - or how can I use some functions to generate a pseudo UI on the CLI