Shell scripting: using lock-files

By Jörg Mertin posted Feb 26, 2020 09:07 AM

  


Many developers write a shell script to work around a small issue they do not want to handle in the program itself, or simply because it is easier.
Very often, this little shell script is then spawned as a child process from within a loop, and that is where the trouble starts: the loop iterates faster than the previous run of the script exits. The runs pile up until the system runs out of memory, at which point the kernel refuses to start new processes or begins killing running applications to free up resources.

The other case where a shell script must not run twice at the same time is when the same data set or file is being manipulated, or when an application is only allowed to run as a single instance.

This is where lock-files come in.

The lockfile function

Usually, the lock-file is named "[script-name].lock", lives in /var/lock or /var/run, and contains the PID of the currently running script.

The lock-file functions need to handle three tasks:

  1. Read the process ID stored in the lock-file and check whether that instance is still running.
    If it is, wait a while and try to set the lock again. If that is not possible, bail out with a warning.
  2. Set the lock-file.
  3. Remove the lock-file.

The following function has served well for a long time. It handles the actual locking:

##############################################################################
#
# Lockfile Generation
# tolock: LockFile location with full path, define in main script.
# Action: number, exit-code to be fed to the error-handler function.
#         If omitted, Lock waits until the existing lock-file is gone.
#
# Call with: Lock tolock Action

Lock() {
# Lockfile to create (full path). Falls back to $LockFile if not given.
tolock="${1:-$LockFile}"
Action="$2" # number, exit-code to be fed to the error-handler function
#
# Check whether a lock-file already exists.
if [ -s "$tolock" ]
then
    # No Action provided: wait for the lock-file to disappear.
    if [ -z "$Action" ]
    then
        # Oops, we found a lockfile. Loop while it still exists.
        while [ -s "$tolock" ]
        do
            MSG="Lockfile exists. Waiting 5secs until next check"
            errlvl=15 # Send out Warning message
            errors
            sleep 5
        done
        MSG="Creating lockfile $tolock failed"
        # Write PID into lock-file.
        echo $$ > "$tolock"
        errlvl=$?
        errors
    else
        # Action provided: exit with it if the owner is still running.
        Pid="$(cat "$tolock")"
        Exists="$(ps auxw | grep " $Pid " | grep -c "$PROGNAME")"
        if [ "$Exists" -ge 1 ]
        then
            MSG="$PROGNAME already running. Exiting..."
            errlvl=$Action
            errors
        else
            MSG="Found stale lockfile... Removing it..."
            errlvl=15 # Send out Warning message
            errors
            MSG="Removing stale lockfile $tolock failed"
            rm -f "$tolock"
            errlvl=$?
            errors
            MSG="Creating lockfile $tolock failed"
            echo $$ > "$tolock"
            errlvl=$?
            errors
        fi
    fi
else
    # Lock it
    MSG="Creating lockfile $tolock failed"
    echo $$ > "$tolock"
    errlvl=$?
    errors
fi
} # Lock
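
One caveat with this approach: the test with [ -s ] and the later echo are two separate steps, so two instances started at the same moment can both pass the check. A minimal sketch of an atomic alternative, assuming a POSIX shell: with noclobber (set -C) the redirection fails if the file already exists, so checking and creating happen in one step. The name try_lock and the demo path are hypothetical and not part of the original include file.

```shell
#!/bin/sh
# Sketch only: try_lock is a hypothetical helper, not part of shmod.inc.
# With noclobber (set -C) the ">" redirection refuses to overwrite an
# existing file, so "does it exist?" and "create it" become a single step.
try_lock() {
    # Run in a subshell so noclobber does not leak into the caller.
    ( set -C; echo "$$" > "$1" ) 2>/dev/null
}

# Demo with a throw-away path (assumption: mktemp is available).
demo_dir="$(mktemp -d)"
demo_lock="$demo_dir/demo.lock"

if try_lock "$demo_lock"; then
    echo "first attempt: lock acquired"
fi
if ! try_lock "$demo_lock"; then
    echo "second attempt: lock already held"
fi
```

A function like this could replace the plain "echo $$ > file" lines, with the return code fed to errlvl as before. Stale-lock detection would still be needed on top of it.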

The Unlock function is much simpler: all it has to do is remove the current lock-file before the script exits, provided the file still holds our own PID.

##############################################################################
#
# Lockfile removal
# No arguments required. Works as is.
#
Unlock(){
# Name of Lockfile to unlock
unlock="$LockFile"
# Unlock the file.
if [ -s "$unlock" ]
then
    PID=$$
    if [ "$(cat "$unlock")" != "$PID" ]
    then
        # Not our lock-file: warn and leave it in place.
        errlvl=15
        MSG="Wrong lock-file PID. Probably a race-condition happened..."
        errors
    else
        # Removing Lockfile
        rm -f "$unlock"
    fi
fi
#
} # Unlock

To use the functions, all we have to do in our main script is to add the LockFile variable to provide the path of the lock-file, and make sure the Lock/Unlock functions are provided by our shared include file.

# Define Lock-File
LockFile=/var/lock/${PROGNAME}.lock
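
Lock and Unlock call the log and errors helpers from the shared include file, which this post does not list. To follow along without it, a minimal stand-in could look like the sketch below. It assumes the conventions visible in this post (errlvl 0 means success, 15 means warning, anything else is fatal) and models its output on the sample runs. Syslog forwarding is left out, and the real shmod.inc will differ.

```shell
#!/bin/sh
# Stand-in sketch for the helpers from the shared include file.
# Assumptions (not the original code): errlvl 0 = OK, 15 = warning,
# anything else = fatal; messages go to the console only.
PROGNAME="${PROGNAME:-sample}"

# Print a message prefixed with program name and PID.
log() {
    echo ">>> ${PROGNAME}[$$]: $*"
}

# Evaluate errlvl and MSG as set by the caller.
errors() {
    if [ "${errlvl:-0}" -eq 0 ]; then
        return 0                      # nothing to report
    elif [ "$errlvl" -eq 15 ]; then
        log "WARNING: $MSG"           # warn and carry on
    else
        echo "========================================================================"
        echo "*** FATAL:  An error occurred in \"${PROGNAME}()\" code $errlvl. Bailing out..."
        log "FATAL: $MSG"
        exit "$errlvl"
    fi
}

# Usage: set MSG first, run the command, hand its exit code to errors.
MSG="Listing / failed"
ls / >/dev/null
errlvl=$?
errors
log "still running after a successful command"
```

Setting MSG before the command keeps the call sites short: the message is only printed if the exit code that follows is non-zero.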

With the Lock and Unlock calls added, our sample script looks like this:

#######################################################################
# Actual script - do not modify anything below this point
#######################################################################

# Prevent double execution
Lock "$LockFile" 1

# One way of using the log-message: put it all into the current MSG
# variable and invoke the internal function "log"
MSG="Starting program $PROGNAME"
log "$MSG"

if [ -f file.ba ]
then
    # Set the error message up front, assuming the worst case.
    MSG="Move file.ba to /tmp/foo.ba failed"
    mv -f file.ba /tmp/foo.ba 2>/dev/null
    # Assign the return code of the last program execution
    errlvl=$?
    errors

    # Still here, so the move worked. Replace the message for the log below.
    MSG="Move file.ba to /tmp/foo.ba Succeeded"

else
    MSG="file.ba does not exist"
    errlvl=1
    errors
fi

# Log the operation. In case an error occurred, we have already exited
log "$MSG"

# remove lock-file
Unlock

Testing the Lock functionality

The first run executes normally; the second run additionally shows a WARNING:

jmertin@calypso:~/work/2020/Blogs$ ./sample.sh 
>>> sample[22213]: Starting program sample

========================================================================
*** FATAL:  An error occured in "sample()" code 1. Bailing out...
>>> sample[22213]: FATAL: file.ba does not exist

jmertin@calypso:~/work/2020/Blogs$ ./sample.sh 
>>> sample[22220]: WARNING: Found stale lockfile... Removing it...
>>> sample[22220]: Starting program sample

========================================================================
*** FATAL:  An error occured in "sample()" code 1. Bailing out...
>>> sample[22220]: FATAL: file.ba does not exist

The warning is intended in this example; making it go away requires a modification to our error-handler function.
What happened is that the first run set the lock-file, but the error condition made the script exit without removing it. The second run found the leftover file, saw that no process with that PID was running any more, and replaced it.

Checking the lock-file content, we see that it exists, was set and provides the PID of the last execution:

jmertin@calypso:~/work/2020/Blogs$ cat /var/lock/sample.lock 
22220
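
A quick way to check by hand whether the PID stored in a lock-file still belongs to a live process is kill -0, which sends no signal and only reports whether the process can be signalled. The sketch below is an alternative to the ps/grep test, not what the Lock function in this post does, and lock_holder_alive is a hypothetical name. Keep in mind that kill -0 also fails for a live process owned by another user, and that PIDs can be reused.

```shell
#!/bin/sh
# Sketch only: lock_holder_alive is a hypothetical helper.
# Returns 0 if the PID stored in the lock-file belongs to a process
# that the current user can signal, non-zero otherwise.
lock_holder_alive() {
    [ -s "$1" ] || return 1            # missing or empty lock-file
    pid="$(cat "$1")"
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # content is not a plain PID
    esac
    kill -0 "$pid" 2>/dev/null         # signal 0: existence check only
}

# Demo with a throw-away lock-file holding our own PID.
demo_lock="$(mktemp)"
echo "$$" > "$demo_lock"
if lock_holder_alive "$demo_lock"; then
    echo "PID $(cat "$demo_lock") is alive, lock is in use"
else
    echo "lock-file is stale"
fi
rm -f "$demo_lock"
```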

Applying this small change to our errors function inside our shmod.inc file:

--- shmod.inc_old       2020-02-06 12:05:39.617855140 +0100
+++ shmod.inc   2020-02-06 12:05:42.729838586 +0100
@@ -86,6 +86,7 @@
            echo "========================================================================" 
             echo "*** FATAL:  An error occured in "${PROGNAME}(${FUNCTION})" code $errlvl. Bailing out..."
             log "FATAL: $MSG"
+           Unlock
             exit $errlvl
        fi
     fi

will take care of the lock-file remaining after a controlled error exit.
This matters because a lock-file that is still present afterwards can now only mean that the script was killed or crashed, which is our indicator that something needs to be investigated!
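
The patch covers exits that go through errors. If the script should also clean up when it is interrupted with Ctrl-C or stopped with a plain kill, a trap is one option. The following is a sketch under assumptions, not part of the original include file: cleanup_lock is a hypothetical stand-in for Unlock, and the lock path is a throw-away demo file. SIGKILL (kill -9) cannot be trapped, so a lock-file left behind by it remains the indicator described above.

```shell
#!/bin/sh
# Sketch only: cleanup_lock is a hypothetical stand-in for Unlock,
# and demo_lock is a throw-away file, not a real /var/lock entry.
demo_lock="$(mktemp)"

cleanup_lock() {
    # Remove the lock-file only if it still carries our own PID.
    if [ -s "$demo_lock" ] && [ "$(cat "$demo_lock")" = "$$" ]; then
        rm -f "$demo_lock"
    fi
}

# The job runs in a subshell so the traps stay local to the demo.
(
    echo "$$" > "$demo_lock"
    trap cleanup_lock EXIT        # runs on every regular exit
    trap 'exit 130' INT           # Ctrl-C: exit, which fires the EXIT trap
    trap 'exit 143' TERM          # plain kill: same
    echo "working while holding $demo_lock"
    exit 0
)

if [ -e "$demo_lock" ]; then
    echo "lock-file left behind"
else
    echo "lock-file removed on exit"
fi
```

In a real script the three trap lines would sit at the top level, right after the call to Lock.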

After this modification, our sample script removes the lock-file on a controlled exit. The first run below still reports the stale lock-file left over from the earlier runs; the second run no longer shows the message, which confirms the file was removed correctly.

jmertin@calypso:~/work/2020/Blogs$ ./sample.sh 
>>> sample[22380]: WARNING: Found stale lockfile... Removing it...
>>> sample[22380]: Starting program sample

========================================================================
*** FATAL:  An error occured in "sample()" code 1. Bailing out...
>>> sample[22380]: FATAL: file.ba does not exist

jmertin@calypso:~/work/2020/Blogs$ ./sample.sh 
>>> sample[22397]: Starting program sample

========================================================================
*** FATAL:  An error occured in "sample()" code 1. Bailing out...
>>> sample[22397]: FATAL: file.ba does not exist

jmertin@calypso:~/work/2020/Blogs$ ls -l /var/lock/sample.lock
ls: cannot access '/var/lock/sample.lock': No such file or directory


Testing the time functionality

Let's test the wait behaviour. We add a sleep just before the file check, start the script, then switch to another console and start the same script again while the first one is still sleeping.

MSG="Starting program $PROGNAME"
log "$MSG"

echo "Sleeping 30secs"
sleep 30

if [ -f file.ba ]
then

Because the sample script calls Lock with an Action (exit code 1), the second instance exits immediately when it finds the lock-file of the first one:

jmertin@calypso:~/work/2020/Blogs$ ./sample.sh 

========================================================================
*** FATAL:  An error occured in "sample()" code 1. Bailing out...
>>> sample[22484]: FATAL: sample already running. Exiting...
>>> sample[22484]: WARNING: Wrong lock-file PID. Probably a race-condition happened...

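The trailing warning comes from Unlock, which errors now calls before exiting: the lock-file holds the PID of the first instance, so Unlock leaves it in place.

If we now remove the Action from the Lock call in our sample script, so that it reads simply "Lock", the second instance waits until the first one has finished. Only use this mode if it is acceptable for several instances of the same script to queue up and run one after another.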

jmertin@calypso:~/work/2020/Blogs$ ./sample.sh 
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
>>> sample[22677]: Starting program sample

========================================================================
*** FATAL:  An error occured in "sample()" code 1. Bailing out...
>>> sample[22677]: FATAL: file.ba does not exist
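
In this mode the second instance waits for as long as the lock-file exists. If an upper bound is wanted, the loop can count its attempts. A sketch under assumptions: wait_for_lock, its arguments and its default values are hypothetical and not part of the original functions.

```shell
#!/bin/sh
# Sketch only: wait_for_lock is a hypothetical helper.
# Usage: wait_for_lock FILE [MAX_TRIES] [DELAY_SECONDS]
# Returns 0 once FILE is gone (or empty), 1 if it is still there
# after MAX_TRIES checks.
wait_for_lock() {
    file="$1"
    max_tries="${2:-6}"
    delay="${3:-5}"
    tries=0
    while [ -s "$file" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1                   # give up, caller decides what to do
        fi
        sleep "$delay"
    done
    return 0
}

# Demo: a lock-file that nobody removes makes the wait time out.
demo_lock="$(mktemp)"
echo "12345" > "$demo_lock"
if wait_for_lock "$demo_lock" 3 1; then
    echo "lock is free"
else
    echo "gave up waiting for $demo_lock"
fi
rm -f "$demo_lock"
```

The return code could be handed to errlvl, so that a timeout goes through the same error handler as every other failure.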

And of course, if the log configuration is still in place, the syslog will show the following entries:

Feb  6 12:19:11 calypso sample[22677]: WARNING: Lockfile exists. Waiting 5secs until next check
Feb  6 12:19:36 calypso sample[22677]: message repeated 5 times: [WARNING: Lockfile exists. Waiting 5secs until next check]
Feb  6 12:19:38 calypso sample[22669]: FATAL: file.ba does not exist
Feb  6 12:19:41 calypso sample[22677]: Starting program sample
Feb  6 12:20:11 calypso sample[22677]: FATAL: file.ba does not exist

Here the 3rd line belongs to the first script, while lines 1, 2, 4 and 5 show the second script waiting and then running (the PID appended to "sample" tells them apart).

Conclusion

The lock-file is an important tool to make sure that scripts run correctly and cannot be spawned in an uncontrolled manner until they overload the local system. The provided functions handle the lock-file automatically as long as its location is defined. As a side effect, they are tied to the log function, so they can report issues to whatever destination is configured.

Next in scripting: CLI UI handling, or how to use some functions to generate a pseudo UI on the CLI
