Evaluating the Tokens, or Order of Expansion in Bash

Monday, July 23rd, 2012

Previously in our series on commonly overlooked things in bash, we spoke about being specific with the binaries our script will call, and mentioned conventions to use after deciding what to variable-ize. Bash's rigidity and lack of convenience start to poke through when we abstract reusable inputs into variables, and folks are commonly tripped up trying to get everything to come out as intended on the other side, when that line actually runs. You may already know to put quotes around just about every variable to catch the possibility of spaces messing things up, and we're not even touching on complex 'sanitization' of things like non-roman alphabets and/or UTF-8 encoding. Knowing the order of expansion the interpreter uses when running our scripts is just as important. It's not all drudgery, though, as we'll uncover features available in bash that you may not have realized exist.

For instance, you may know curly braces have special meaning in bash, but did you know there's syntax to, for example, expand to multiple extensions for the same filename by putting them in curly braces, comma-separated? An interactive example (with set -x):
cp veryimportantconfigfile{,-backup}
+ cp veryimportantconfigfile veryimportantconfigfile-backup

That's referred to as brace expansion (sometimes called filename brace expansion), and it's the first of the (roughly) six types of expansion the bash interpreter goes through when evaluating lines and 'tokenized' variables in a script.

Since you're CLI-curious and go command line (trademark @thespider) all the time, you probably know not only that tilde (~) is a shortcut to the logged-in user's home directory, but also that a bare cd assumes you want to go to that home directory. A user's home gets a lot of traffic, and while the builtin $HOME variable is probably more reliable if your script must interact with home directories, tilde expansion (including any subdirectories tagged onto the end) is next in our expansion order.
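A quick sketch of that equivalence (the path under ~ is just an illustration):

```shell
#!/bin/bash
# Tilde expansion happens early, before your own variables are touched.
echo ~                # the logged-in user's home directory
echo "$HOME"          # the builtin variable, safer inside scripts

# Subdirectories tagged onto the end ride along with the tilde:
[ ~/Library = "$HOME/Library" ] && echo "tilde and \$HOME agree"
```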

Now things get (however underwhelmingly) more interesting. Third in the hit parade, each with semi-equal weighting, are
a. the standard "variable=foo, echo $variable" style 'variable expansions' we all know and love,
b. command substitution, the backtick-extracted results of commands, which can also be written $(command) (and, if worst comes to worst, you could force another round of expansion with the eval command),
c. arithmetic expansion, the $((...)) syntax for doing math inline (the -gt, -eq and friends we commonly use in comparison tests belong to the test command, not to this expansion),
and an interesting set of features that are actually convenient (and mimic some uses of regular expressions), called (misleadingly)
$. dollar sign substitution. All of the different shorthand included under this category has been written about elsewhere in detail, but one in particular is an ad-hoc twist on a catchall you could otherwise enable via shell options. Bash manages these with the shopt command and with 'set -o' (shopt was originally created to expand on 'set', which we mentioned in our earlier article when adding a debug option with 'set -x'). The available options are a bit too numerous to cover now, but one you'll see particularly strict folks use is 'nounset' (enabled with set -o nounset, or set -u for short), to ensure that variables have always been defined if they're going to be evaluated as the script runs. It's only slightly confusing that a variable can have an empty string for a value, which would pass this check. Often it's the other way around, and we'll have variables that are defined without being used; the thing we'd really like to look out for is when a variable is supposed to have a 'real' value and the script could cause ill effects by running without one. So the question becomes: how do we check for those important variables as they're expanded?
A symbol used in bash that will come up later when we cover getopt is the colon, which refers to the existence of an argument, or the variable's value (text or otherwise) you'd expect to have been set. Dollar sign substitution mimics this concept by letting you check ad hoc for unset (or 'null') variables: follow a standard '$variable' with ':?' (finished product: ${variable:?}). In other words, it's a test of whether $variable expanded into a 'real' value, and if not, it will exit the script at that point with an error, like an ejector seat.
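Here's a minimal sketch of those third-tier expansions side by side; the variable names are made up for illustration:

```shell
#!/bin/bash
# a. plain variable expansion
greeting="hello"
echo "$greeting"

# b. command substitution, new style and backtick style
this_year="$(date +%Y)"
also_this_year=`date +%Y`
[ "$this_year" = "$also_this_year" ] && echo "substitutions agree"

# c. arithmetic expansion ($((...)) does the math; -gt belongs to test)
count=$(( 2 + 3 ))
if [ "$count" -gt 4 ]; then echo "count is $count"; fi

# $. the ejector seat: abort with an error if the variable is unset/null
volume="backups"   # comment this line out and the next one kills the script
echo "${volume:?volume must be set}"
```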

Moving on to the lighter expansions, next comes command lookup in the run environment's PATH, evaluated like regular (western) sentences, from left to right.
As the interpreter traipses along down a line running a command, it follows that command's rules about which switches and arguments to expect, and assumes those are split by some sort of separator (whitespace by default), referred to as the Internal Field Separator. The order of expansion continues with this 'word splitting'.
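Word splitting is easiest to see by counting the words it produces; a small sketch:

```shell
#!/bin/bash
# Unquoted expansions split on the default IFS (space, tab, newline):
args="-l -a"
set -- $args          # deliberately unquoted, so it splits
echo "$#"             # prints 2: two separate arguments

# Changing IFS changes where the splits land:
record="disk0:APFS:500G"
IFS=':' read -r device fs size <<< "$record"
echo "$fs"            # prints APFS
```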

And finally, there's regular old pathname pattern matching: if you're processing files or folders in a directory, a glob pattern expands to every name that matches, pretty straightforward. You may notice we often link to the Bash Guide for Beginners, as hosted by The Linux Documentation Project. Beyond that resource, there are also videos from the 2011 (iTunesU link) and 2012 (YouTube) Penn State Mac Admins conferences on this topic, if you need a refresher before we forge ahead for a few more posts.
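To drive home that a glob returns all matches rather than just the first, here's a self-contained sketch using a throwaway directory:

```shell
#!/bin/bash
# Pathname expansion returns every match in the directory.
tmpdir="$(mktemp -d)"
touch "$tmpdir/a.conf" "$tmpdir/b.conf" "$tmpdir/notes.txt"

set -- "$tmpdir"/*.conf   # the glob expands to both .conf files
echo "$# files matched"   # prints: 2 files matched

rm -r "$tmpdir"
```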

Back to Basics with Bash

Tuesday, July 17th, 2012

The default shell for Macs has been bash for as long as some of us can remember (as long as we forget it was tcsh through 10.2.8... and before that... there was no shell, it was OS 9!). Bash as a scripting language doesn't get the best reputation: it's certainly suboptimal, and generally unoptimized, for modern workflows. To get common things done you need to care about procedural tasks, and things can become very 'heavy' very quickly. With more modern programming languages that have niceties like APIs and libraries, the catchphrase you'll hear is that you get loads of functionality 'for free', but it's good to know how far we can get with bash, and why those object-oriented folks keep telling us we're missing out. And although most of us use bash every time we open a shell (zsh users probably know all this stuff anyway), there are things a lot of us aren't doing in scripts that could be done better. Bash is not going away, and is plenty serviceable for 'lighter', one-off tasks, so over the course of a few posts we'll touch on bash-related topics.

Something even a long-time scripter may easily overlook is how we might set variables more smartly and more often, making good decisions and being specific about what we choose to variable-ize. If the purpose of a script is to customize things in a reusable way, making a variable out of that customization (say, a hostname or notification email address) lets us easily re-set it in the future. And in our line of work, if you do something once, it's highly probable you'll do it again.
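In practice that looks something like the following sketch; NOTIFY_EMAIL and NEW_HOSTNAME are hypothetical customizations, parked at the top of the script so they're easy to re-set for the next environment:

```shell
#!/bin/bash
# Customizations live up top, in one obvious place:
NOTIFY_EMAIL="admin@example.com"
NEW_HOSTNAME="lab-mac-01"

echo "Renaming to ${NEW_HOSTNAME}, will notify ${NOTIFY_EMAIL}"
```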

Something else you may have seen in certain scripts is the PATH variable being explicitly set or overridden, under the assumption that it may not be set in the environment the script runs in, or that the droids, er, binaries we're looking for will definitely be found once we set the path directories specifically. This is well-intentioned, but imprecise to put it one way, clunky to put it another. Setting a custom path, or picking up customized binaries that could end up interacting with our script, may cause unintended issues, so some paranoia should be exhibited. As scientists and troubleshooters, being as specific as possible always pays returns, so a guiding principle we should consider adopting is this: instead of setting the path and assuming, make a variable for each binary called as part of a script.
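That principle, sketched out (the paths shown are the usual locations, but verify them on your own systems):

```shell
#!/bin/bash
# One lowercase variable per binary, full path and all, instead of
# trusting (or overriding) whatever $PATH happens to be:
echo_bin="/bin/echo"
ls_bin="/bin/ls"

"$echo_bin" "calling exactly the binary we intended"
"$ls_bin" /
```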

Now would probably be a good time to mention a few things that assist us when setting variables for binaries. As conventions go, it helps to keep variable names for binaries lowercase, and reserve all caps for the customizations we're shoving in, which lets us visually pick out only our customized info as we debug/inspect the script and when we go in to update those variables for a new environment. /usr/bin/which tells us the path to the first matching binary discovered in our path; for example, 'which which' tells us we first found a version of 'which' in /usr/bin. Similarly, you may guess from its name what /usr/bin/whereis does (man pages, as a mini-topic, come up here too). However, a more useful way to tell if you're using the most efficient version of a command is to check it with the 'type' builtin. If it's a shell builtin, like echo, it may be faster than alternatives found at other paths, and you may not even find it necessary to make a variable for it, since there is little chance someone has decided to replace bash's builtin 'cd'...
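Checking how a name resolves before deciding whether to variable-ize it looks like this:

```shell
#!/bin/bash
# Where does the name resolve, and what kind of command is it?
which which         # first `which` found in $PATH, e.g. /usr/bin/which
type echo           # reports "echo is a shell builtin"
type -t cd          # prints "builtin": no variable needed for cd
```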

The last practice we'll try to spread the adoption of is using declare when setting variables. Again, while a lazy sysadmin is a good sysadmin, a precise one doesn't have to worry about as many failures. Declare's lack of portability across shells has helped folks overlook it, but it's useful even if it is bash-specific. When you use declare with -r for read-only, you're ensuring your variable doesn't accidentally get overwritten later in the script. And just as 'set' turns shell settings off again with a plus (e.g. set +x to end the xtrace debugging we used for tracing how variables are expanded and executed), you can remove an attribute from a variable with a +, e.g. declare +r. Integers can be enforced with -i (which frees us from using 'let' when we're simply setting a number), arrays with -a, and when you need a variable to be visible beyond the current script, in child processes it launches, you can export it with -x. Alternately, if you must reuse the same variable name with a different value inside a function, you can declare the variable local so you don't 'cross the streams'. We hope this starts a conversation on proper bash-ing; look forward to more 'back to basics' posts like this one.
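The declare attributes above, gathered into one sketch (the variable names and values are made up for illustration):

```shell
#!/bin/bash
declare -r config="/etc/myapp.conf"   # read-only: reassignment would error
declare -i total=2+3                  # integer: arithmetic on assignment
declare -a disks=("disk0" "disk1")    # indexed array
declare -x EXPORTED="seen by child processes"

echo "$total"        # prints 5, no `let` required
echo "${disks[1]}"   # prints disk1

shadow() {
  local total=10     # a local copy, so the caller's value survives
  echo "$total"      # prints 10 inside the function
}
shadow
echo "$total"        # still 5 out here
```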