Evaluating the Tokens, or Order of Expansion in Bash

July 23rd, 2012 by Allister Banks

Previously in our series on commonly overlooked things in bash, we spoke about being specific with the binaries our script will call, and mentioned conventions to use after deciding what to variable-ize. The rigidity and lack of convenience afforded by bash starts to poke through when we’re trying to abstract re-usable inputs through making them into variables, and folks are commonly tripped up when trying to have everything come out as intended on the other side, when that line runs. You may already know to put quotes around just about every variable to catch the possibility of spaces messing things up, and we’re not even touching on complex ‘sanitization’ of things like non-roman alphabets and/or UTF-8 encoding. Knowing the ‘order of operation expansion’ that the interpreter will use when running our scripts is important. It’s not all drudgery, though, as we’ll uncover features available to bash that you may not have realized exist.

For instance, you may know curly braces can be used did you know there’s syntax to, for example, expand to multiple extensions for the same filenames by putting them in curly braces, comma-separated? An interactive example(with set -x)
cp veryimportantconfigfile{,-backup}
+ cp veryimportantconfigfile veryimportantconfigfile-backup

That’s referred to as filename (or just) brace expansion, and is the first in order of the (roughly) six types of expansion the bash interpreter goes through when evaluating lines and ‘token-ized’ variables in a script.

Since you’re CLI-curious and go command line (trademark @thespider) all the time, you’re probably familiar with not only that you can use tilde(~) for a shortcut to the current logged-in users home directory, but also that just cd alone will assume you meant you wanted to go to that home directory? A users home gets a lot of traffic, and while the builtin $HOME variable is probably more reliable if you must include interaction with home directories in your script, tilde expansion (including any subdirectories tagged to the end) is the next in our expansion order.

Now things get (however underwhelmingly) more interesting. Third in the hit parade, each with semi-equal weighting, are
a. the standard “variable=foo, echo $variable” style ‘variable expressions’ we all know and love,
b. backtick-extracted results of commands, which can also be achieved with $(command) (and, worse came to worse, you could force another expansion of a variable with the eval command)
c. arithmetic expressions (like -gt for greater than, equal, less than, etc.) as we commonly use for comparison tests,
and an interesting set of features that are actually convenient (and mimic some uses of regular expressions), called (misleadingly)
$. dollar sign substitution. All of the different shorthand included under this category has been written about elsewhere in detail, but one in particular is an ad-hoc twist on a catchall that you could use via the ‘shell options’, or shopt command(originally created to expand on ‘set‘, which we mentioned in our earlier article when adding a debug option with ‘set -x‘). All of the options available with shopt are also a bit too numerous to cover now, but one that you’ll see particularly strict folks use is ‘nounset‘, to ensure that variables have always been defined if they’re going to be evaluated as the script runs. It’s only slightly confusing that a variable can have an empty string for a value, which would pass this check. Often, it’s the other way around, and we’ll have variables that are defined without being used; the thing we’d really like to look out for is when a variable is supposed to have a ‘real’ value, and the script could cause ill affects by running without one – so the question becomes how do we check for those important variables as they’re expanded?
A symbol used in bash that will come up later when we cover getopt is the colon, which refers to the existence of an argument, or the variables value (text or otherwise) that you’d be expecting to have set. Dollar sign substitution mimics this concept when allowing you to ad hoc check for empty (or ‘null’) variables by following a standard ‘$variable‘ with ‘:?’ (finished product: ${variable:?})- in other words, it’s a test if the $variable expanded into a ‘real’ value, and it will exit the script at that point with an error if unset, like an ejector seat.

Moving on to the less heavy expansions, the next is… command lookups in the run environments PATH, which are evaluated like regular(western) sentences, from left to right.
As it traipses along down a line running a command, it follows that commands rules regarding if it’s supposed to expect certain switches and arguments, and assumes those are split by some sort of separation (whitespace by default), referred to as the Internal Field Separator. The order of expansion continues with this ‘word splitting’.

And finally, there’s regular old pathname pattern matching – if you’re processing a file or folder in a directory, it find the first instance that matches to evaluate that – pretty straightforward. You may notice we’re often linking to the Bash Guide for Beginners site, as hosted by The Linux Documentation Project. Beyond that resource, there’s also videos from 2011(iTunesU link) and 2012(youtube) Penn State Mac Admins conference on this topic if you need a refresher before we forge ahead for a few more posts.

Tags: , ,

Comments are closed.