
Handling errors in shell scripts

Avoiding disasters
Now and then users encounter unexpected errors in shell scripts. Common causes are files or networks that are unavailable, and executables that crash because of bugs or bad data. Shell scripts can also be written with unanticipated dependencies on the environment, and they vary in how robust their syntax and error handling are. As an example, consider the following Bourne/Korn shell command sequence:
        # do not do this!
        cd $subdir
        rm -rf *
There are two potential problems with this.
  • If the variable "subdir" is not defined, the cd sees no argument and the directory is changed to the user's HOME directory. All files and sub-directories there are then recursively removed (except dot files and directories).
  • If the variable "subdir" is defined but does not specify a valid directory, the cd will fail and the directory will not be changed - then all files etc. in the current directory will be deleted.
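The two failure modes above can be reproduced harmlessly. The sketch below deletes nothing; it only records the directory in which "rm -rf *" would have run. The scratch directory layout and the variable names unset_case and invalid_case are assumptions made for this illustration.

```shell
#!/bin/sh
# Harmless demonstration of the two failure modes: nothing is deleted,
# the script only records which directory "rm -rf *" would have run in.
# HOME is pointed at a scratch directory inside the first subshell.

scratch=$(mktemp -d) || exit 1
mkdir "$scratch/home" "$scratch/work" || exit 1

# Case 1: subdir is unset, so "cd $subdir" is a plain "cd" and lands in $HOME.
unset_case=$(
    HOME="$scratch/home"
    unset subdir
    cd "$scratch/work" || exit 1
    cd $subdir                  # deliberately unquoted, as in the original
    basename "$(pwd)"
)

# Case 2: subdir names a missing directory, so cd fails and we stay put.
invalid_case=$(
    subdir="$scratch/no-such-dir"
    cd "$scratch/work" || exit 1
    cd $subdir 2>/dev/null      # fails, but the script carries on regardless
    basename "$(pwd)"
)

echo "subdir unset:   rm would run in '$unset_case'"
echo "subdir invalid: rm would run in '$invalid_case'"

rm -rf "$scratch"
```

In the first case the working directory ends up being the stand-in HOME directory, and in the second it is still the directory the script started in. In neither case is it the intended target.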
Fortunately for the user from whom this example was taken, HPCCC staff were able to recover the lost files from a backup. But remember, not all file systems on HPCCC systems have backups.
Strategies
Here are several strategies suggested by HPCCC staff for catching errors using shell syntax. Using them will lessen the chance of such disasters happening.
  1. Probably the best solution in this case is to refer to the variable as
       cd ${subdir:?}
    In this case the script will exit with an error message if subdir is not defined.
  2. Use && or || to test the exit code (only good if $subdir is defined)
       cd $subdir || exit 1 # will exit if cd $subdir fails
       rm -rf *             # or continue on
  3. Test for existence of $subdir first
       test -n "$subdir" || exit 1
       test -d "$subdir" && cd "$subdir" || exit 1
       [ -d "$subdir" ] && cd "$subdir" || exit 1 # same check written with [ ]
  4. You can also use if rather than && or ||, but that may be awkward if you want to check many commands
        if cd $subdir
        then rm -rf *
        else exit 1
        fi
     or, checking the variable as well:
        if test -n "$subdir"
        then
            if cd "$subdir"
            then
                rm -rf *
            else
                echo cd failed - bailing out
                exit 1
            fi
        else
            echo variable '$subdir' is not defined - bailing out
            exit 1
        fi
Additional notes
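  • For this specific case the following command would be better, since it does not depend on cd succeeding. If subdir is unset, however, it expands to rm -rf /*, which removes everything you have permission to remove, so check the variable first (for example with ${subdir:?}).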
       rm -rf $subdir/*
  • It would be best if your current directory was not under $subdir ...
  • In Bourne or Korn shell, variable expansion can be modified to catch unset values or to use a default if unset.
         ${parameter:?}       - exit with an error message if parameter is unset or null
         ${parameter:?word}   - print word and exit if parameter is unset or null
         ${parameter:-word}   - use word if parameter is unset or null
  • The korn shell command "set -u" causes errors to be generated if variables are used but not defined. You still need to catch the error.
  • In C shell, the behaviour is different. The script will exit if a command fails (not in a test). If the variable is not defined that will also trigger an error by default.
  • For Korn shell (or bash), use trap to set up the shell to catch the error and run a command. The simplest command to run is "exit" or "exit $?", so that the script is aborted. The following command should be executed somewhere early in the script; it applies to all following commands whose exit value is not already checked by built-in shell control structures (if, &&, ||, ! etc.).
        trap 'exit $?' ERR
    However, this would not have helped in the example above when subdir is unset, because cd with no argument succeeds and no error is generated.
  • We continue to recommend the use of Korn shell (or bash) for batch jobs, but recommend that care be taken to catch errors.
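The parameter expansion forms and "set -u" from the notes above can be tried out safely in subshells, so that only the subshell is aborted and not the calling script. This is a sketch under assumptions: the default path /tmp/scratch and the result variable names are made up for the example.

```shell
#!/bin/sh
unset subdir

# ${parameter:-word}: substitute a default; subdir itself stays unset.
default_value=${subdir:-/tmp/scratch}
echo "default used: $default_value"

# ${parameter:?word}: a non-interactive shell prints word and exits.
# The subshell takes the exit, so this script can inspect the outcome.
if ( : "${subdir:?subdir is not set}" ) 2>/dev/null; then
    guard_result="continued"
else
    guard_result="aborted"
fi
echo "with \${subdir:?}: $guard_result"

# set -u: expanding an unset variable becomes an error.
if ( set -u; : "$subdir" ) 2>/dev/null; then
    nounset_result="continued"
else
    nounset_result="aborted"
fi
echo "with set -u: $nounset_result"
```

In a real batch job the expansion would be used directly, for example cd "${subdir:?}", so that the job itself stops before reaching the rm.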

Last updated: 16 Mar, 2012
Thanks to NCI-NF for the userguide structure.