Saturday 27 October 2007

Geeky experiments with bash functions

return doesn't mean what you think it does

Modern UNIX shells like bash have the ability to define functions. Functions are a great way to factorise parts of code that you need to use in several areas of your script or isolate discrete pieces of logic. In most programming languages, one fundamental aspect of functions is that they can return a value which is the result of whatever computation they were doing. And indeed a shell like bash has a built in return command. But, hang on, if you read up on return, you realise that it can only return integer values. The reason for this is that return works the same way as exit: it sets the $? variable with the value given as argument, or 0 if no argument is given, and aborts the function. exit aborts the whole script instead. So, if you use return, use it to provide the calling code with an error code. This doesn't solve the original problem though: how can we return a value from a function, such as a character string?

As often with UNIX, the answer is deceptively simple and consistent with everything you know about scripts: just echo the value you want to return and call your function as if it was a full blown script, with inverted quotes or the $(...) construct, as in the example below.

#!/bin/bash

function f {
  echo "[ $1 ]"
  return 1
}

s=`f "abc"`
echo "\$?=$?"
echo "\$s=$s"

Save this in a file called fn.sh, make it executable and run it:

$ ./fn.sh
$?=1
$s=[ abc ]

As you can see, the $? special variable was set with the value 1 and the $s variable was updated with the result of the function.

Recursive fun

Once you know how to return a value from your function, the next thing you need is to know how to pass it some parameters. Once again, it works exactly the same as in a full blown script: you just use the $n positional variables. Recursion works as you expect as well. So let's demonstrate with a classic textbook example: a recursive factorial.

#!/bin/bash

function fact {
  if [ $# -lt 1 ]; then
    return 1
  elif [ $1 -lt 1 ]; then
    return 2
  elif [ $1 -eq 1 ]; then
    r=1
  else
    r=$(( $1 * `fact $(( $1 - 1 ))` ))
  fi
  echo "$r"
}

fact $1

Save it, run it and you should get something like the following. Don't give it too high a value though, we'll see why in a second: 10 should be enough to demonstrate that it works.

$ ./fact.sh 10
3628800

While we're here, let's have a quick look at this function as it has a couple of interesting constructs. It does the following:

  • check the number of parameters it has been passed, using the $# variable, and returns an error if less than 1,
  • check that the first parameter is positive, as a negative value is invalid and return an error code in this case,
  • check the termination condition of the recursion and set the result if we have reached that condition,
  • finally calculate the factorial by calling itself recursively.

Note the use of the $((...)) construct to do the relevant arithmetic calculations: one is needed inside the recursive call to the function, to tell ensure the value passed is the result if $1 - 1 rather than the three parameters $1, - and 1; another one is needed outside the call to calculate the product.

This script also proves that when using functions in this way, the variables defined inside the function are local and not overwritten by a subsequent call. This is because the use of the back quotes actually forks a new process in which the function is called. You can verify this by adding a sleep statement inside the function, running the script in the background and running ps:

$ ps
  PID  TT  STAT      TIME COMMAND
  394  p1  S      0:00.07 -bash
 2414  p1  S      0:00.01 /bin/bash ./fact.sh 10
 2415  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2416  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2417  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2418  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2419  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2420  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2421  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2422  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2423  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2424  p1  S      0:00.00 sleep 5

Each child process has its own context and variables and doesn't interfere with the other ones. However, this means that you have to be extremely careful when using functions this way as you could quite easily spawn a large number of processes. Recursion in particular could be deadly.

Pipe dreams

Finally, if a function generally works like a script, can we pipe it? yes but if you want it to be on the consuming side of the pipe, you will need to adapt the function to take its input from stdin rather than a parameter. And you can even make it work so that it can do both. Here is a modified version of the very first script:

#!/bin/bash

function f {
  if [ $# -ge 1 ]; then
    echo "[ $1 ]"
  else
    while read line; do
      if [ -n "$line" ]; then
        echo `f "$line"`
      fi
    done
  fi
}

find ~ -type f -print | f

You could apply this construct to most functions: check if there are any parameters, in which case you can use them normally, otherwise read each input line and call the function recursively using the line as parameter. Don't forget to enclose it between quotes though, so that it is passed as a single parameter and blank lines don't trigger an infinite recursion. Run this script and you should get a list of all files in your home directory, with each file enclosed in square brackets.

That's it for functions. Please tell me if any of the examples above don't work for you. I have tested them on Ubuntu Linux, Sun Solaris Express and Mac OS-X so they should be fairly portable but you never know. They may not work with shells other than bash but feel free to experiment.

No comments: