The Unix Shell

(Re)Introducing the Shell

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • What is a command shell and why would I use one?

Objectives
  • Explain how the shell relates to the keyboard, the screen, the operating system, and users’ programs.

  • Explain when and why command-line interfaces should be used instead of graphical interfaces.

The Shell

This course assumes you have either taken our introductory shell course already, or have broadly similar background knowledge. Here is a reminder of what we mean by the shell.

The shell is a program where users can type commands. With the shell, it’s possible to invoke complicated programs like climate modeling software or simple commands that create an empty directory with only one line of code. The most popular Unix shell is Bash. Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.

The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically. Sequences of commands can be written into a script, improving the reproducibility of workflows.

In addition, the command line is often the easiest way to interact with remote machines and supercomputers. Familiarity with the shell is near essential to run a variety of specialized tools and resources including high-performance computing systems. As clusters and cloud computing systems become more popular for scientific data crunching, being able to interact with the shell is becoming a necessary skill. We can build on the command-line skills covered here to tackle a wide range of scientific questions and computational challenges.

Phillipa’s Pipeline: A Typical Problem

Phillipa Frogg, an ecologist, wants to use the Living Planet Index dataset to help her with her research. However, she is unable to use the raw data directly; instead, she has to edit the data so it’s in a suitable format for her to make best use of. Although she could do this by hand in a text editor, this would be laborious, time-consuming, and error-prone. With the shell, Phillipa can instead assign her computer this mundane task while she focuses her attention on writing her latest paper.

The next few lessons will explore the ways Phillipa can achieve this. More specifically, they explain how she can use a command shell to run shell programs, and use loops to automate the repetitive steps of entering file names, so that her computer can work while she writes her paper.

As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.

In order to achieve her task, Phillipa needs to know how to:

  • look up how commands work using manual pages;

  • use loops to apply the same commands to many files;

  • save commands in shell scripts so they can be re-run and shared;

  • find files, and find things within files.

Key Points

  • A shell is a program whose primary purpose is to read commands and run other programs.

  • This lesson uses Bash, the default shell in many implementations of Unix.

  • Programs can be run in Bash by entering commands at the command-line prompt.

  • The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.

  • The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.


Manual Pages

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • How can I use man pages?

Objectives
  • Use man to display the manual page for a given command.

  • Explain how to read the synopsis of a given command while using man.

  • Search for specific options or flags in the manual page for a given command.

We can get help for any Unix command with the man (short for manual) command. For example, here is the command to look up information on cp:

$ man cp

The output displayed is referred to as the “man page”.

Note that if you are using Git Bash for Windows, man pages are not available. However, you can find them on the Web if you search for a term such as “man cp”. You can also get help for many commands with the --help option, whether you are using Git Bash or another system:

$ cp --help

Most man pages contain more information than can fit in one terminal screen.
To make them easier to read, the man command tries to use a “pager” to move and search through the information screenful by screenful. The most common pager is called less. Detailed information is available using man less. less is typically the default pager for Unix systems, and other tools may use it for output paging as well.

When less displays a colon ‘:’, we can press the space bar to get the next page, the letter ‘h’ to get help, or the letter ‘q’ to quit.

man’s output is typically complete but concise, as it is designed to be used as a reference rather than a tutorial. Most man pages are divided into sections, typically beginning with NAME, SYNOPSIS, and DESCRIPTION.

Other sections we might see include AUTHOR, REPORTING BUGS, COPYRIGHT, HISTORY, (known) BUGS, and COMPATIBILITY.

How to Read the Synopsis

Here is the synopsis for the cp command on Ubuntu Linux:

SYNOPSIS
   cp [OPTION]... [-T] SOURCE DEST
   cp [OPTION]... SOURCE... DIRECTORY
   cp [OPTION]... -t DIRECTORY SOURCE...

This tells the reader that there are three ways to use the command. Let’s look at the first usage:

cp [OPTION]... [-T] SOURCE DEST

[OPTION] means the cp command can be followed by one or more optional flags. We can tell they’re optional because of the square brackets, and we can tell that one or more are welcome because of the ellipsis (…). For example, the fact that [-T] is in square brackets, but after the ellipsis, means that it’s optional, but if it’s used, it must come after all the other options.

SOURCE refers to the source file or directory, and DEST to the destination file or directory. Their precise meanings are explained at the top of the DESCRIPTION section.

The other two usage examples can be read in similar ways. Note that to use the last one, the -t option is mandatory (because it isn’t shown in square brackets).
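As a concrete (hypothetical) illustration, suppose we have files a.txt and b.txt and an existing directory called backup; the three forms then correspond to commands like these:

$ cp a.txt c.txt              # first form: copy one file to a new name
$ cp a.txt b.txt backup       # second form: copy several files into a directory
$ cp -t backup a.txt b.txt    # third form: the same, but naming the target directory first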

The DESCRIPTION section starts with a few paragraphs explaining the command and its use, then expands on the possible options one by one.

Finding Help on Specific Options

If we want to skip ahead to the option we’re interested in, we can search for it using the slash key ‘/’. (This isn’t part of the man command: it’s a feature of less.) For example, to find out about -t, we can type /-t and press return. After that, we can use the ‘n’ key to navigate to the next match until we find the detailed information we need:

-t, --target-directory=DIRECTORY
     copy all SOURCE arguments into DIRECTORY

This means that this option has the short form -t and the long form --target-directory and that it takes an argument. Its meaning is to copy all the SOURCE arguments into DIRECTORY.
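For example (again assuming a hypothetical directory called backup and file a.txt), the short and long forms are interchangeable:

$ cp -t backup a.txt
$ cp --target-directory=backup a.txt

Both commands copy a.txt into the backup directory.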

Limitations of Man Pages

Man pages can be useful for a quick confirmation of how to run a command, but they are not famous for being readable. If you can’t find what you need in the man page— or you can’t understand what you’ve found— try entering “unix command copy file” into your favorite search engine: it will often produce more helpful results.

You May Also Enjoy…

The explainshell.com site does a great job of breaking complex Unix commands into parts and explaining what each does. Sadly, it doesn’t work in reverse…

Key Points

  • man command displays the manual page for a given command.

  • [OPTION]... means the given command can be followed by one or more optional flags.

  • Flags specified after ellipsis are still optional but must come after all other flags.

  • While inside the manual page, use / followed by your pattern to do interactive searching.


Loops

Overview

Teaching: 35 min
Exercises: 15 min
Questions
  • How can I perform the same actions on many different files?

Objectives
  • Write a loop that applies one or more commands separately to each file in a set of files.

  • Trace the values taken on by a loop variable during execution of the loop.

  • Explain the difference between a variable’s name and its value.

  • Explain why spaces and some punctuation characters shouldn’t be used in file names.

  • Demonstrate how to see what commands have recently been executed.

  • Re-run recently executed commands without retyping them.

Loops are a programming construct which allow us to repeat a command or set of commands for each item in a list. As such they are key to productivity improvements through automation. Similar to wildcards and tab completion, using loops also reduces the amount of typing required (and hence reduces the number of typing mistakes).

Suppose we have several hundred files containing population time series data. For this example, we’ll use the exercise-data/populations directory, which has only six such files, but the principles can be applied to many more files at once. Each file contains population time series for one species, from the Living Planet Database of the Living Planet Index.

The structure of these files is the same: each line gives data for one population time series, as tab-delimited text.

Column headings are given on the first line of the combined data file six-species.csv, which can be displayed as follows:

$ head -n 1 six-species.csv

Let’s look at the files:

$ head -n 5 bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt

Due to the amount of data in each line, the output is visually confusing.

We would like to print out the class (high-level classification) for the species in each file. Class is given in the fifth column. For each file, we would need to execute the command cut -f 5 and pipe this to sort and uniq. We’ll use a loop to solve this problem, but first let’s look at the general form of a loop, using the pseudo-code below:

for thing in list_of_things
do
    operation_using $thing    # Indentation within the loop is not required, but aids legibility
done

and we can apply this to our example like this:

$ for filename in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> do
>     cut -f 5 $filename | sort | uniq
> done
Aves
Aves
Reptilia
Elasmobranchii
Amphibia
Mammalia

This shows us the first two files contain data on a species in the class Aves, the third contains data on a species in Reptilia, and so on.

Follow the Prompt

The shell prompt changed from $ to > and back again as we were typing in our loop. The second prompt, >, is different to remind us that we haven’t finished typing a complete command yet. A semicolon, ;, can be used to separate two commands written on a single line.
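For example, the loop above could equally well be written on a single line; the shell treats the semicolons exactly like the line breaks in the multi-line version:

$ for filename in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt; do cut -f 5 $filename | sort | uniq; done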

When the shell sees the keyword for, it knows to repeat a command (or group of commands) once for each item in a list. Each time the loop runs (called an iteration), an item in the list is assigned in sequence to the variable, and the commands inside the loop are executed, before moving on to the next item in the list. Inside the loop, we call for the variable’s value by putting $ in front of it. The $ tells the shell interpreter to treat the variable as a variable name and substitute its value in its place, rather than treat it as text or an external command.

In this example, the list is six filenames: bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt. Each time the loop iterates, it will assign a file name to the variable filename and run the cut command. The first time through the loop, $filename is bowerbird.txt. The interpreter runs the command cut -f 5 on bowerbird.txt and pipes the output to the sort command. Then it pipes the output of the sort command to the uniq command, which prints its output to the terminal. For the second iteration, $filename becomes dunnock.txt. The interpreter runs the command cut -f 5 on dunnock.txt and pipes the output to the sort command. Then it pipes the output of the sort command to the uniq command, which prints its output to the terminal. This continues until each of the filenames in turn has been assigned to the variable $filename. After the final item, wildcat.txt, the shell exits the for loop.

Same Symbols, Different Meanings

Here we see > being used as a shell prompt, whereas > is also used to redirect output. Similarly, $ is used as a shell prompt, but, as we saw earlier, it is also used to ask the shell to get the value of a variable.

If the shell prints > or $ then it expects you to type something, and the symbol is a prompt.

If you type > or $ yourself, it is an instruction from you that the shell should redirect output or get the value of a variable.

When using variables it is also possible to put the names into curly braces to clearly delimit the variable name: $filename is equivalent to ${filename}, but is different from ${file}name. You may find this notation in other people’s programs.
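The braces matter when the variable name is immediately followed by other characters. For example, suppose we wanted to print each file name with a _backup suffix added (using echo, so nothing is actually created, and assuming the six population files are the only .txt files present): ${filename}_backup gives what we want, whereas $filename_backup would make the shell look for a non-existent variable called filename_backup.

$ for filename in *.txt
> do
>     echo ${filename}_backup
> done
bowerbird.txt_backup
dunnock.txt_backup
python.txt_backup
shark.txt_backup
toad.txt_backup
wildcat.txt_backup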

We have called the variable in this loop filename in order to make its purpose clearer to human readers. The shell itself doesn’t care what the variable is called; if we wrote this loop as:

$ for x in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> do
>     cut -f 5 $x | sort | uniq
> done

or:

$ for temperature in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> do
>     cut -f 5 $temperature | sort | uniq
> done

it would work exactly the same way. Don’t do this. Programs are only useful if people can understand them, so meaningless names (like x) or misleading names (like temperature) increase the odds that the program won’t do what its readers think it does.

In the above examples, the variables (thing, filename, x and temperature) could have been given any other name, as long as it is meaningful to both the person writing the code and the person reading it.

Note also that loops can be used for things other than filenames, like a list of numbers or a subset of data.

Write your own loop

How would you write a loop that echoes all 10 numbers from 0 to 9?

Solution

$ for loop_variable in 0 1 2 3 4 5 6 7 8 9
> do
>     echo $loop_variable
> done
0
1
2
3
4
5
6
7
8
9

Variables in Loops

This exercise refers to the shell-lesson-data/exercise-data/populations directory. ls *.txt gives the following output:

bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt

What is the output of the following code?

$ for datafile in *.txt
> do
>     ls *.txt
> done

Now, what is the output of the following code?

$ for datafile in *.txt
> do
>     ls $datafile
> done

Why do these two loops give different outputs?

Solution

The first code block gives the same output on each iteration through the loop. Bash expands the wildcard *.txt within the loop body (as well as before the loop starts) to match all files ending in .txt and then lists them using ls. The expanded loop would look like this:

$ for datafile in bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
> do
>     ls bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
> done
bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt

The second code block lists a different file on each loop iteration. The value of the datafile variable is evaluated using $datafile, and then listed using ls.

bowerbird.txt
dunnock.txt
python.txt
shark.txt
toad.txt
wildcat.txt

Limiting Sets of Files

What would be the output of running the following loop in the shell-lesson-data/exercise-data/populations directory?

$ for filename in t*
> do
>     ls $filename
> done
  1. No files are listed.
  2. All files are listed.
  3. Only python.txt, toad.txt and wildcat.txt are listed.
  4. Only toad.txt is listed.

Solution

4 is the correct answer. * matches zero or more characters, so any file name starting with the letter t, followed by zero or more other characters, will be matched.

How would the output differ from using this command instead?

$ for filename in *t*
> do
>     ls $filename
> done
  1. The same files will be listed.
  2. The files bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt will be listed.
  3. No files are listed this time.
  4. The files python.txt and toad.txt will be listed.
  5. Only the file six-species.csv will be listed.

Solution

2 is the correct answer. * matches zero or more characters, so a file name with zero or more characters before a letter t and zero or more characters after the letter t will be matched. In other words, any file name containing at least one t will be listed.

Saving to a File in a Loop - Part One

In the shell-lesson-data/exercise-data/populations directory, what is the effect of this loop?

for species in *.txt
do
    echo $species
    cat $species > species.txt
done
  1. Prints bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt, and the text from wildcat.txt will be saved to a file called species.txt.
  2. Prints bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt, and the text from all six files would be concatenated and saved to a file called species.txt.
  3. Prints bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt, and the text from bowerbird.txt will be saved to a file called species.txt.
  4. None of the above.

Solution

1 is the correct answer. The text from each file in turn gets written to the species.txt file. However, the file gets overwritten on each loop iteration, so the final content of species.txt is the text from the wildcat.txt file.

Saving to a File in a Loop - Part Two

Also in the shell-lesson-data/exercise-data/populations directory, remove the file you created above:

rm species.txt

Use ls to check you only have the files we provided, i.e.

bowerbird.txt  dunnock.txt  python.txt  shark.txt  six-species.csv  toad.txt  wildcat.txt

Now, what would be the output of the following loop?

for datafile in *.txt
do
    cat $datafile >> all.txt
done
  1. All of the text from bowerbird.txt, dunnock.txt, python.txt, shark.txt and toad.txt would be concatenated and saved to a file called all.txt.
  2. The text from bowerbird.txt will be saved to a file called all.txt.
  3. All of the text from bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt would be concatenated and saved to a file called all.txt.
  4. All of the text from bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt would be printed to the screen and saved to a file called all.txt.

Solution

3 is the correct answer. >> appends to a file, rather than overwriting it with the redirected output from a command. Since the output from the cat command has been redirected to all.txt, nothing is printed to the screen.
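To see the difference between > and >> on a small scale, we could run the following (test-redirect.txt is just a hypothetical, throwaway file name):

$ echo hello > test-redirect.txt
$ echo hello > test-redirect.txt
$ cat test-redirect.txt
hello
$ echo hello >> test-redirect.txt
$ cat test-redirect.txt
hello
hello

The first cat shows a single line because > overwrites the file each time; the second shows two lines because >> appends.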

Here’s a slightly more complicated loop:

$ for filename in *.txt
> do
>     echo $filename
>     head -n 10 $filename | tail -n 1
> done

The shell starts by expanding *.txt to create the list of files it will process. The loop body then executes two commands for each of those files. The first command, echo, prints its command-line arguments to standard output. For example:

$ echo hello there

prints:

hello there

In this case, since the shell expands $filename to be the name of a file, echo $filename prints the name of the file. Note that we can’t write this as:

$ for filename in *.txt
> do
>     $filename
>     head -n 10 $filename | tail -n 1
> done

because then the first time through the loop, when $filename expanded to bowerbird.txt, the shell would try to run bowerbird.txt as a program. Finally, the head and tail combination selects line 10 from whatever file is being processed (assuming the file has at least 10 lines; otherwise it selects the last line of the file).

Spaces in Names

Spaces are used to separate the elements of the list that we are going to loop over. If one of those elements contains a space character, we need to surround it with quotes, and do the same thing to our loop variable. Suppose our data files are named:

red dragon.txt
purple unicorn.txt

To loop over these files, we would need to add double quotes like so:

$ for filename in "red dragon.txt" "purple unicorn.txt"
> do
>     head -n 10 "$filename" | tail -n 1
> done

It is simpler to avoid using spaces (or other special characters) in filenames.

The files above don’t exist, so if we run the above code, the head command will be unable to find them; however, the error message returned will show the names of the files it is expecting:

head: cannot open 'red dragon.txt' for reading: No such file or directory
head: cannot open 'purple unicorn.txt' for reading: No such file or directory

Try removing the quotes around $filename in the loop above to see the effect of the quote marks on spaces.

head: cannot open 'red' for reading: No such file or directory
head: cannot open 'dragon.txt' for reading: No such file or directory
head: cannot open 'purple' for reading: No such file or directory
head: cannot open 'unicorn.txt' for reading: No such file or directory

We would like to modify each of the six files for individual species in shell-lesson-data/exercise-data/populations, but also save a version of the original files, naming the copies original-bowerbird.txt, original-dunnock.txt, original-python.txt, and so on. We can’t use:

$ cp *.txt original-*.txt

because that would expand to:

$ cp bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt original-*.txt

This wouldn’t back up our files; instead, we get an error:

cp: target `original-*.txt' is not a directory

This problem arises when cp receives more than two inputs. When this happens, it expects the last input to be a directory where it can copy all the files it was passed. Since there is no directory named original-*.txt in the populations directory we get an error.

Instead, we can use a loop:

$ for filename in *.txt
> do
>     cp $filename original-$filename
> done

This loop runs the cp command once for each filename. The first time, when $filename expands to bowerbird.txt, the shell executes:

cp bowerbird.txt original-bowerbird.txt

The second time, the command is:

cp dunnock.txt original-dunnock.txt

The third time, the command is:

cp python.txt original-python.txt

and so on, until a copy of each of the six files has been made.

Since the cp command does not normally produce any output, it’s hard to check that the loop is doing the correct thing. However, we learned earlier how to print strings using echo, and we can modify the loop to use echo to print our commands without actually executing them, so that we can check what commands the unmodified loop would run.

Running the modified loop prints each cp command that would have been executed, and demonstrates how the judicious use of echo is a good debugging technique:

$ for filename in *.txt
> do
>     echo cp $filename original-$filename
> done
cp bowerbird.txt original-bowerbird.txt
cp dunnock.txt original-dunnock.txt
cp python.txt original-python.txt
cp shark.txt original-shark.txt
cp toad.txt original-toad.txt
cp wildcat.txt original-wildcat.txt

Keyboard shortcuts for moving around the command line

We can move to the beginning of a line in the shell by typing Ctrl+A and to the end using Ctrl+E. This may be easier and faster than using the left and right cursor keys.

An extensive range of shortcuts is provided by the shell. To discover more, try a Web search for “bash keyboard shortcuts”.

Those Who Know History Can Choose to Repeat It

Another way to repeat previous work is to use the history command to get a list of the last few hundred commands that have been executed, and then to use !123 (where ‘123’ is replaced by the command number) to repeat one of those commands. For example, if a user types this:

$ history | tail -n 5

and happens to see this in the output:

  456  ls -l NENE0*.txt
  457  rm stats-NENE01729B.txt.txt
  458  bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
  459  ls -l NENE0*.txt
  460  history

then she can re-run goostats.sh on NENE01729B.txt simply by typing !458.

Other History Commands

There are a number of other shortcut commands for getting at the history.

  • Ctrl+R enters a history search mode ‘reverse-i-search’ and finds the most recent command in your history that matches the text you enter next. Press Ctrl+R one or more additional times to search for earlier matches. You can then use the left and right arrow keys to choose that line and edit it, then hit Return to run the command.
  • !! retrieves the immediately preceding command (you may or may not find this more convenient than pressing the up-arrow key).
  • !$ retrieves the last word of the last command. That’s useful more often than you might expect: after bash goostats.sh NENE01729B.txt stats-NENE01729B.txt, you can type less !$ to look at the file stats-NENE01729B.txt, which is quicker than pressing the up-arrow key and editing the previous command line.
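For example, using the goostats.sh command mentioned above:

$ bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
$ less !$

Here the shell replaces !$ with stats-NENE01729B.txt, the last word of the previous command, so less opens the statistics file.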

Doing a Dry Run

A loop is a way to do many things at once — or to make many mistakes at once if it does the wrong thing. One way to check what a loop would do is to echo the commands it would run instead of actually running them.

Suppose we want to preview the commands the following loop will execute without actually running those commands:

$ for datafile in *.txt
> do
>     cat $datafile >> all.txt
> done

What is the difference between the two loops below, and which one would we want to run?

# Version 1
$ for datafile in *.txt
> do
>     echo cat $datafile >> all.txt
> done
# Version 2
$ for datafile in *.txt
> do
>     echo "cat $datafile >> all.txt"
> done

Solution

The second version is the one we want to run. This prints to screen everything enclosed in the quote marks, expanding the loop variable name because we have prefixed it with a dollar sign. It also does not modify or create the file all.txt, as the >> is treated literally as part of a string rather than as a redirection instruction.

The first version appends the output from the command echo cat $datafile to the file all.txt. This file will just contain the list: cat bowerbird.txt, cat dunnock.txt, cat python.txt, and so on.

Try both versions for yourself to see the output! Be sure to open the all.txt file to view its contents.

Nested Loops

Suppose we want to set up a directory structure to organize records for different species observed on different continents. What would be the result of the following code:

$ for species in bowerbird dunnock python
> do
>     for continent in Africa Asia Europe
>     do
>         mkdir $species-$continent
>     done
> done

Solution

We have a nested loop, i.e. a loop contained within another loop, so for each species in the outer loop, the inner loop (the nested loop) iterates over the list of three continents and creates a new directory for each combination.

Try running the code for yourself to see which directories are created!

Key Points

  • A for loop repeats commands once for every thing in a list.

  • Every for loop needs a variable to refer to the thing it is currently operating on.

  • Use $name to expand a variable (i.e., get its value). ${name} can also be used.

  • Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as it complicates variable expansion.

  • Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.

  • Use the up-arrow key to scroll up through previous commands to edit and repeat them.

  • Use Ctrl+R to search through the previously entered commands.

  • Use history to display recent commands, and ![number] to repeat a command by number.


Shell Scripts

Overview

Teaching: 30 min
Exercises: 15 min
Questions
  • How can I save and re-use commands?

Objectives
  • Write a shell script that runs a command or series of commands for a fixed set of files.

  • Run a shell script from the command line.

  • Write a shell script that operates on a set of files defined by the user on the command line.

  • Create pipelines that include shell scripts you, and others, have written.

We are finally ready to see what makes the shell such a powerful programming environment. We are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command. For historical reasons, a bunch of commands saved in a file is usually called a shell script, but make no mistake: these are actually small programs.

Not only will writing shell scripts make your work faster — you won’t have to retype the same commands over and over again — it will also make it more accurate (fewer chances for typos) and more reproducible. If you come back to your work later (or if someone else finds your work and wants to build on it) you will be able to reproduce the same results simply by running your script, rather than having to remember or retype a long list of commands.

For this example, we’ll again use the exercise-data/populations directory containing population time series for six species, from the Living Planet Database of the Living Planet Index.

Let’s start by going back to populations/ and creating a new file, middle.sh, which will become our shell script. If required, use cd to change to this directory, then use pwd to check you are in the right directory. Then:

$ nano middle.sh

The command nano middle.sh opens the file middle.sh within the text editor ‘nano’ (which runs within the shell). If the file does not exist, it will be created. We can use the text editor to directly edit the file – we’ll simply insert the following line:

head -n 10 shark.txt | tail -n 2

This is a variation on the pipe we constructed earlier: it selects lines 9-10 of the file shark.txt. Remember, we are not running it as a command just yet: we are putting the commands in a file.

Then we save the file (Ctrl-O in nano), and exit the text editor (Ctrl-X in nano). Check that the directory populations now contains a file called middle.sh.

Once we have saved the file, we can ask the shell to execute the commands it contains. Our shell is called bash, so we run the following command:

$ bash middle.sh
19586   Carcharodon_carcharias  0       Dicken_M._L._M._J._Smale_et_al._(2013)._White_sharks_Carcharodon_carcharias_at_Bird_Island_Algoa_Bay_South_Africa._African_Journal_of_Marine_Science_35(2):_175-182     Elasmobranchii  Lamniformes    Lamnidae Carcharodon     carcharias              (Linnaeus_1758) Great_white_shark       Bird_Island_Algoa_Bay_Eastern_Cape      South_Africa    South_Africa    Africa  NULL    NULL    -33.5   25.775554       1       Marine  NULL    NULL   NULL     NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       Sightings_per_unit_effort_SPUE_(**hr)   Visual_census   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    0.225   0.487   0       NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
19587   Carcharodon_carcharias  0       Ryklief_R._P._A._Pistorius_et_al._(2014)._Spatial_and_seasonal_patterns_in_sighting_rate_and_life-history_composition_of_the_white_shark_Carcharodon_carcharias_at_Mossel_Bay_South_Africa._African_Journal_of_Marine_Science_36(4):_449-453    Elasmobranchii  Lamniformes     Lamnidae        Carcharodon     carcharias     (Linnaeus_1758)  Great_white_shark       Seal_Island_Mossel_Bay_Western_Cape     South_Africa    South_Africa    Africa NULL     NULL    -34.151089      22.119689       1       Marine  NULL    NULL    NULL    NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       Sightings_per_unit_effort_SPUE_(**hr)   Visual_census_Feb-Dec   NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    1.6809  1.0745  2.1702 NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

Sure enough, our script’s output is exactly what we would get if we ran that pipeline directly.

Text vs. Whatever

We usually call programs like Microsoft Word or LibreOffice Writer “text editors”, but we need to be a bit more careful when it comes to programming. By default, Microsoft Word uses .docx files to store not only text, but also formatting information about fonts, headings, and so on. This extra information isn’t stored as characters and doesn’t mean anything to tools like head. When editing programs, therefore, you must either use a plain text editor, or be careful to save files as plain text.

What if we want to select lines from an arbitrary file? We could edit middle.sh each time to change the filename, but that would probably take longer than typing the command out again in the shell and executing it with a new file name. Instead, let’s edit middle.sh and make it more versatile:

$ nano middle.sh

Now, within “nano”, replace the text shark.txt with the special variable called $1:

head -n 10 "$1" | tail -n 2

Inside a shell script, $1 means ‘the first filename (or other argument) on the command line’. We can now run our script like this:

$ bash middle.sh shark.txt
19586   Carcharodon_carcharias  0       Dicken_M._L._M._J._Smale_et_al._(2013)._White_sharks_Carcharodon_carcharias_at_Bird_Island_Algoa_Bay_South_Africa._African_Journal_of_Marine_Science_35(2):_175-182     Elasmobranchii  Lamniformes    Lamnidae Carcharodon     carcharias              (Linnaeus_1758) Great_white_shark       Bird_Island_Algoa_Bay_Eastern_Cape      South_Africa    South_Africa    Africa  NULL    NULL    -33.5   25.775554       1       Marine  NULL    NULL   NULL     NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       Sightings_per_unit_effort_SPUE_(**hr)   Visual_census   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    0.225   0.487   0       NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
19587   Carcharodon_carcharias  0       Ryklief_R._P._A._Pistorius_et_al._(2014)._Spatial_and_seasonal_patterns_in_sighting_rate_and_life-history_composition_of_the_white_shark_Carcharodon_carcharias_at_Mossel_Bay_South_Africa._African_Journal_of_Marine_Science_36(4):_449-453    Elasmobranchii  Lamniformes     Lamnidae        Carcharodon     carcharias     (Linnaeus_1758)  Great_white_shark       Seal_Island_Mossel_Bay_Western_Cape     South_Africa    South_Africa    Africa NULL     NULL    -34.151089      22.119689       1       Marine  NULL    NULL    NULL    NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       Sightings_per_unit_effort_SPUE_(**hr)   Visual_census_Feb-Dec   NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    1.6809  1.0745  2.1702 NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

or on a different file like this:

$ bash middle.sh toad.txt
9084    Bufo_bufo       0       Cooke_A._S._and_R._S._Oldham_(1995)._Establishment_of_populations_of_the_common_frog_Rana_temporaria_and_common_toad_Bufo_bufo_in_a_newly_created_reserve_following_translocation._Herpetological_Journal_5(1):_173-180.        Amphibia        Anura   Bufonidae       Bufo    bufo    NULL    (Linnaeus_1758) Common_toad     The_Boardwalks_Reserve_north_bank_of_the_River_Nene_near_the_western_edge_of_Peterborough       United_Kingdom  United_Kingdom Europe   Europe_and_Central_Asia Central_and_Western_Europe      52.55444        -0.26444        0       Freshwater     NULL     NULL    Palearctic      Temperate_floodplain_rivers_and_wetlands        NULL    NULL    NULL    0       Peak_total_toad_count   Counts_during_breeding_season   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    0       NULL    127    311      181     328     306     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL
18832   Bufo_bufo       0       Jedrzejewska_B._et_al._(2002)._Seasonal_dynamics_and_breeding_of_amphibians_in_pristine_forests_(Bialowieza_National_Park_E_Poland)_in_dry_years._Folia_Zoologica_52(1):_77-86. Amphibia        Anura   Bufonidae       Bufo    bufo            (Linnaeus_1758) Common_toad     Oak-hornbeam-lime_forests_Bia?�owie??a_National_Park_East_Poland        Poland  Poland  Europe  Europe_and_Central_Asia Central_and_Western_Europe      52.75   23.916667      Terrestrial      Palearctic      Temperate_broadleaf_and_mixed_forests   NULL    NULL    NULL    NULL    NULL    0      Number_of_individuals*ha Live_trapping_on_8_30x30m_grids NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    71.5953 45.1319 NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL

Double-Quotes Around Arguments

We surround $1 with double-quotes for the same reason that we put the loop variable inside double-quotes: in case the filename happens to contain any spaces.
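For example, if we had a (hypothetical) data file called red dragon.txt, we could run:

$ bash middle.sh 'red dragon.txt'

With "$1" in the script, head receives the single file name red dragon.txt. With an unquoted $1, head would instead receive two arguments, red and dragon.txt, and fail with ‘No such file or directory’ errors like the ones we saw earlier when looping over file names containing spaces.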

Currently, we need to edit middle.sh each time we want to adjust the range of lines that is returned. Let’s fix that by configuring our script to instead use three command-line arguments. Each additional argument that we provide will be accessible via the special variables $1, $2, $3, which refer to the first, second, third command-line arguments, respectively.

Knowing this, we can use additional arguments to define the range of lines to be passed to head and tail respectively:

$ nano middle.sh
head -n "$2" "$1" | tail -n "$3"

We can now run:

$ bash middle.sh shark.txt 10 2
19586   Carcharodon_carcharias  0       Dicken_M._L._M._J._Smale_et_al._(2013)._White_sharks_Carcharodon_carcharias_at_Bird_Island_Algoa_Bay_South_Africa._African_Journal_of_Marine_Science_35(2):_175-182     Elasmobranchii  Lamniformes    Lamnidae Carcharodon     carcharias              (Linnaeus_1758) Great_white_shark       Bird_Island_Algoa_Bay_Eastern_Cape      South_Africa    South_Africa    Africa  NULL    NULL    -33.5   25.775554       1       Marine  NULL    NULL   NULL     NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       Sightings_per_unit_effort_SPUE_(**hr)   Visual_census   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    0.225   0.487   0       NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
19587   Carcharodon_carcharias  0       Ryklief_R._P._A._Pistorius_et_al._(2014)._Spatial_and_seasonal_patterns_in_sighting_rate_and_life-history_composition_of_the_white_shark_Carcharodon_carcharias_at_Mossel_Bay_South_Africa._African_Journal_of_Marine_Science_36(4):_449-453    Elasmobranchii  Lamniformes     Lamnidae        Carcharodon     carcharias     (Linnaeus_1758)  Great_white_shark       Seal_Island_Mossel_Bay_Western_Cape     South_Africa    South_Africa    Africa NULL     NULL    -34.151089      22.119689       1       Marine  NULL    NULL    NULL    NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       Sightings_per_unit_effort_SPUE_(**hr)   Visual_census_Feb-Dec   NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    1.6809  1.0745  2.1702 NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

By changing the arguments to our command we can change our script’s behaviour:

$ bash middle.sh shark.txt 4 3
7701    Carcharodon_carcharias  0       Dudley_S._F._J._(2002)._Shark_Catch_Trends_and_Effort_Reduction_in_the_Beach_Protection_Program_KwaZulu-Natal_South_Africa._SCIENTIFIC_COUNCIL_MEETING_-_SEPTEMBER_2002_NAFO._*_Dudley_S._F._J._and_C._A._Simpfendorfer_(2006)._Population_status_of_14_shark_species_caught_in_the_protective_gillnets_off_KwaZulu-Natal_beaches_South_Africa_1978-2003._Marine_and_Freshwater_Research_57:_225-240.   Elasmobranchii  Lamniformes     Lamnidae       Carcharodon      carcharias              (Linnaeus_1758) Great_white_shark       Beaches_of_KwaZulu-Natal_province_South_Africa  South_Africa    South_Africa    Africa  NULL    NULL    -29.25  33.08333        0       Marine  NULL    NULL   NULL     NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       number*km-net_year      shark_net_catch_rates   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   1.8      1.6     1.4     0.9     0.6     0.6     1.4     1.1     0.75    0.7     0.9     1.4     0.9     0.5     0.7    0.9      1.12    1.19    0.99    0.65    0.25    1.1     0.55    0.65    0.87    1.37    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
9057    Carcharodon_carcharias  1       Cliff_G._S._F._J._Dudley_et_al._(1996)._Catches_of_white_sharks_in_KwaZulu-Natal_South_Africa_and_environmental_influences._Great_white_sharks:_the_biology_of_Carcharodon_carcharias._A._P._Klimley_and_D._G._Ainley:_351-362. Elasmobranchii  Lamniformes     Lamnidae        Carcharodon     carcharias      NULL    (Linnaeus_1758) Great_white_shark       Natal_Coast_South_Africa        South_Africa    South_Africa    Africa  NULL    NULL   -31.71667        30.38333        0       Marine  NULL    NULL    NULL    NULL    Tropical_and_subtropical_Indo-Pacific  Indian_Ocean     Unknown 0       CPUE_(no.*km-net*yr)    Shark_nets      NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    3.9     1.9     3.5     1.3     0.9    0.6      0.3     1.8     1.1     1.5     1.7     0.9     2.2     1.8     1.3     0.7     0.6     0.4     1.5     1.2    0.7      0.8     1       1.5     1       0.8     1.6     1.8     NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL
9058    Carcharodon_carcharias  0       Cliff_G._S._F._J._Dudley_et_al._(1996)._Catches_of_white_sharks_in_KwaZulu-Natal_South_Africa_and_environmental_influences._Great_white_sharks:_the_biology_of_Carcharodon_carcharias._A._P._Klimley_and_D._G._Ainley:_351-362. Elasmobranchii  Lamniformes     Lamnidae        Carcharodon     carcharias      NULL    (Linnaeus_1758) Great_white_shark       Richards_Bay_South_Africa       South_Africa    South_Africa    Africa  NULL    NULL   -28.85   32.23333        0       Marine  NULL    NULL    NULL    NULL    Tropical_and_subtropical_Indo-Pacific   Indian_Ocean    Unknown 0       CPUE_(no.*km-net*yr)    Shark_nets      NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    5.2     3.5     1.5     2       2       2       1.1    1.8      3.2     1.4     0.5     1.1     2.9     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL

This works, but it may take the next person who reads middle.sh a moment to figure out what it does. We can improve our script by adding some comments at the top:

$ nano middle.sh
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"

A comment starts with a # character and runs to the end of the line. The computer ignores comments, but they’re invaluable for helping people (including your future self) understand and use scripts. The only caveat is that each time you modify the script, you should check that the comment is still accurate: an explanation that sends the reader in the wrong direction is worse than none at all.

What if we want to process many files in a single pipeline? For example, if we want to sort our .txt files by length, we would type:

$ wc -l *.txt | sort -n

because wc -l lists the number of lines in the files (recall that wc stands for ‘word count’, adding the -l option means ‘count lines’ instead) and sort -n sorts things numerically. We could put this in a file, but then it would only ever sort a list of .txt files in the current directory. If we want to be able to get a sorted list of other kinds of files, we need a way to get all those names into the script. We can’t use $1, $2, and so on because we don’t know how many files there are. Instead, we use the special variable $@, which means, ‘All of the command-line arguments to the shell script’. We also should put $@ inside double-quotes to handle the case of arguments containing spaces ("$@" is special syntax and is equivalent to "$1" "$2" …).

Here’s an example:

$ nano sorted.sh
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
$ bash sorted.sh *.txt ../numbers.txt
    1 python.txt
    3 bowerbird.txt
    4 wildcat.txt
    5 ../numbers.txt
   11 dunnock.txt
   18 shark.txt
   20 toad.txt
   62 total

List Unique Species

Remember, you can see the column headings for our population time series files as follows:

$ head -n 1 six-species.csv

Count manually to confirm that “Binomial” (the binomial species name) is the second column, “Country” is the 14th column and “System” is the 22nd column.

We can use the command cut -f 2,14,22 shark.txt | sort | uniq to display the unique combinations of species, country and system in shark.txt. (Note, the columns appear ragged due to the positioning of tab stops. But all the data are there.) In order to avoid having to type out this series of commands every time, a scientist may choose to write a shell script instead.

Write a shell script called species.sh that takes any number of filenames as command-line arguments, and uses a variation of the above command to print a list of the unique species appearing in each of those files separately.

Solution

# Script to find unique combinations of species, country and
# system in tab-delimited text files where the data are in
# columns 2, 14 and 22.
# This script accepts any number of file names as command line arguments.

# Loop over all files
for file in "$@"
do
   echo "Unique combinations of species, country and system within $file:"
   # Extract binomial species names, countries and systems
   cut -f 2,14,22 "$file" | sort | uniq
done

Suppose we have just run a series of commands that did something useful — for example, that created a graph we’d like to use in a paper. We’d like to be able to re-create the graph later if we need to, so we want to save the commands in a file. Instead of typing them in again (and potentially getting them wrong) we can do this:

$ history | tail -n 5 > redo-figure-3.sh

Depending on which commands we have typed recently, the file redo-figure-3.sh might now contain:

297 bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
298 bash goodiff.sh stats-NENE01729B.txt /data/validated/01729.txt > 01729-differences.txt
299 cut -d ',' -f 2-3 01729-differences.txt > 01729-time-series.txt
300 ygraph --format scatter --color bw --borders none 01729-time-series.txt figure-3.png
301 history | tail -n 5 > redo-figure-3.sh

After a moment’s work in an editor to remove the serial numbers on the commands, and to remove the final line where we called the history command, we have a completely accurate record of how we created that figure.
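After that editing, redo-figure-3.sh contains just the four commands that produced the figure:

bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
bash goodiff.sh stats-NENE01729B.txt /data/validated/01729.txt > 01729-differences.txt
cut -d ',' -f 2-3 01729-differences.txt > 01729-time-series.txt
ygraph --format scatter --color bw --borders none 01729-time-series.txt figure-3.png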

Why Record Commands in the History Before Running Them?

If you run the command:

$ history | tail -n 5 > recent.sh

the last command in the file is the history command itself, i.e., the shell has added history to the command log before actually running it. In fact, the shell always adds commands to the log before running them. Why do you think it does this?

Solution

If a command causes something to crash or hang, it might be useful to know what that command was, in order to investigate the problem. Were the command only recorded after running it, we would not have a record of the last command run in the event of a crash.

In practice, most people develop shell scripts by running commands at the shell prompt a few times to make sure they’re doing the right thing, then saving them in a file for re-use. This style of work allows people to recycle what they discover about their data and their workflow with one call to history and a bit of editing to clean up the output and save it as a shell script.

Variables in Shell Scripts

In the populations directory, imagine you have a shell script called script.sh containing the following commands:

head -n $2 $1
tail -n $3 $1

While you are in the populations directory, you type the following command:

$ bash script.sh '*.txt' 1 1

Which of the following outputs would you expect to see?

  1. All of the lines between the first and the last lines of each file ending in .txt in the populations directory
  2. The first line of each file ending in .txt in the populations directory, followed by the last line of each such file
  3. The first and the last line of each file in the populations directory
  4. An error because of the quotes around *.txt

Solution

The correct answer is 2.

The special variables $1, $2 and $3 represent the command line arguments given to the script, such that the commands run are:

$ head -n 1 bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt
$ tail -n 1 bowerbird.txt  dunnock.txt  python.txt  shark.txt  toad.txt  wildcat.txt

The shell does not expand '*.txt' because it is enclosed by quote marks. As such, the first argument to the script is '*.txt', which gets expanded within the script by head and tail.

Note, python.txt only contains a single line, so for this file the line is output twice (being both the first line and the last line.)

Find the Longest File With a Given Extension

Write a shell script called longest.sh that takes the name of a directory and a filename extension as its arguments, and prints out the name of the file with the most lines in that directory with that extension. For example:

$ bash longest.sh shell-lesson-data/exercise-data/populations txt

would print the name of the .txt file in shell-lesson-data/exercise-data/populations that has the most lines.

Feel free to test your script on another directory e.g.

$ bash longest.sh shell-lesson-data/exercise-data/writing txt

Solution

# Shell script which takes two arguments:
#    1. a directory name
#    2. a file extension
# and prints the name of the file in that directory
# with the most lines which matches the file extension.

wc -l $1/*.$2 | sort -g | tail -n 2 | head -n 1

The first part of the pipeline, wc -l $1/*.$2 | sort -g, counts the lines in each file and sorts them numerically (largest last). When there’s more than one file, wc also outputs a final summary line, giving the total number of lines across all files. We use tail -n 2 | head -n 1 to throw away this last line.

With wc -l $1/*.$2 | sort -g | tail -n 1 we would see the final summary line instead; building the pipeline up in pieces like this is a good way to be sure we understand the output.
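For example, running the first two stages from inside the populations directory (assuming only the original six .txt files are present, and using the line counts we saw earlier with sorted.sh) would give output roughly like this:

$ wc -l *.txt | sort -g
  1 python.txt
  3 bowerbird.txt
  4 wildcat.txt
 11 dunnock.txt
 18 shark.txt
 20 toad.txt
 57 total

tail -n 2 | head -n 1 then picks out the 20 toad.txt line, so toad.txt is reported as the longest file.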

Script Reading Comprehension

For this question, consider the shell-lesson-data/exercise-data/populations directory once again. This contains a number of files containing population time series data, in addition to any other files you may have created. Explain what each of the following three scripts would do when run as bash script1.sh *.txt, bash script2.sh *.txt, and bash script3.sh *.txt respectively.

# Script 1
echo *.*
# Script 2
for filename in $1 $2 $3
do
    cat $filename
done
# Script 3
echo $@.txt

Solutions

In each case, the shell expands the wildcard in *.txt before passing the resulting list of file names as arguments to the script.

Script 1 would print out a list of all files containing a dot in their name. The arguments passed to the script are not actually used anywhere in the script.

Script 2 would print the contents of the first 3 files with a .txt file extension. $1, $2, and $3 refer to the first, second, and third argument respectively.

Script 3 would print all the arguments to the script (i.e. all the .txt files), followed by .txt. $@ refers to all the arguments given to a shell script. With the six population files, the output would be:

bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt.txt

Debugging Scripts

Suppose you have saved the following script in a file called do-errors.sh in Phillipa’s north-pacific-gyre/scripts directory:

# Calculate stats for data files.
for datafile in "$@"
do
    echo $datfile
    bash goostats.sh $datafile stats-$datafile
done

When you run it from the north-pacific-gyre directory:

$ bash do-errors.sh NENE*A.txt NENE*B.txt

the output is blank. To figure out why, re-run the script using the -x option:

$ bash -x do-errors.sh NENE*A.txt NENE*B.txt

What is the output showing you? Which line is responsible for the error?

Solution

The -x option causes bash to run in debug mode. This prints out each command as it is run, which will help you to locate errors. In this example, we can see that echo isn’t printing anything. We have made a typo in the loop variable name, and the variable datfile doesn’t exist, hence returning an empty string.
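As a rough sketch of what the trace might look like (assuming NENE01729B.txt is one of the matched files; the exact names depend on your data), one iteration of the loop would produce lines such as:

+ echo

+ bash goostats.sh NENE01729B.txt stats-NENE01729B.txt

Each line starting with + shows a command just before bash runs it. The + echo line is followed by a blank line of output because $datfile expands to nothing, which is the clue that the variable name is wrong.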

Key Points

  • Save commands in files (usually called shell scripts) for re-use.

  • bash [filename] runs the commands saved in a file.

  • $@ refers to all of a shell script’s command-line arguments.

  • $1, $2, etc., refer to the first command-line argument, the second command-line argument, etc.

  • Place variables in quotes if the values might have spaces in them.

  • Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.


Finding Things

Overview

Teaching: 30 min
Exercises: 20 min
Questions
  • How can I find files?

  • How can I find things in files?

Objectives
  • Use grep to select lines from text files that match simple patterns.

  • Use find to find files and directories whose names match simple patterns.

  • Use the output of one command as the command-line argument(s) to another command.

  • Explain what is meant by ‘text’ and ‘binary’ files, and why many common tools don’t handle the latter well.

In the same way that many of us now use ‘Google’ as a verb meaning ‘to find’, Unix programmers often use the word ‘grep’. ‘grep’ is a contraction of ‘global/regular expression/print’, a common sequence of operations in early Unix text editors. It is also the name of a very useful command-line program.

grep finds and prints lines in files that match a pattern. For our examples, we will use a file that contains three haiku taken from a 1998 competition in Salon magazine (credit to authors Joy Rothke, Howard Korder, and Margaret Segall, respectively; see the archived Haiku Error Messages Page 1 and Page 2). For this set of examples, we’re going to be working in the writing subdirectory:

$ cd
$ cd Desktop/shell-lesson-data/exercise-data/writing
$ cat haiku.txt
The Web site you seek
cannot be located but
endless others exist.

With searching comes loss
and the presence of absence:
"My Thesis" not found.

Yesterday it worked
Today it is not working
Software is like that.

Let’s find lines that contain the word ‘not’:

$ grep not haiku.txt
cannot be located but
"My Thesis" not found.
Today it is not working

Here, not is the pattern we’re searching for. The grep command searches through the file, looking for matches to the pattern specified. To use it, type grep, then the pattern we’re searching for, and finally the name of the file (or files) we’re searching in.

The output is the three lines in the file that contain the letters ‘not’.

By default, grep searches for a pattern in a case-sensitive way. In addition, the search pattern we have selected does not have to form a complete word, as we will see in the next example.

Let’s search for the pattern: ‘The’.

$ grep The haiku.txt
The Web site you seek
"My Thesis" not found.

This time, two lines that include the letters ‘The’ are output, one of which contained our search pattern within a larger word, ‘Thesis’.

To restrict matches to lines containing the word ‘The’ on its own, we can give grep the -w option. This limits matches to word boundaries.

Later in this lesson, we will also see how we can change the search behavior of grep with respect to its case sensitivity.

$ grep -w The haiku.txt
The Web site you seek

Note that a ‘word boundary’ includes the start and end of a line, so not just letters surrounded by spaces. Sometimes we don’t want to search for a single word, but a phrase. This is also easy to do with grep by putting the phrase in quotes.

$ grep -w "is not" haiku.txt
Today it is not working

We’ve now seen that quotes aren’t required around single words, but they are useful when searching for multiple words, and they make it easier to distinguish the search term or phrase from the name of the file being searched. We will use quotes in the remaining examples.

Another useful option is -n, which numbers the lines that match:

$ grep -n "it" haiku.txt
1:The Web site you seek
5:With searching comes loss
9:Yesterday it worked
10:Today it is not working

Here, we can see that lines 1, 5, 9, and 10 contain the letters ‘it’.

We can combine options (i.e. flags) as we do with other Unix commands. For example, let’s find the lines that contain the word ‘the’. We can combine the option -w to find the lines that contain the word ‘the’ and -n to number the lines that match:

$ grep -n -w "the" haiku.txt
6:and the presence of absence:

Now we want to use the option -i to make our search case-insensitive:

$ grep -n -w -i "the" haiku.txt
1:The Web site you seek
6:and the presence of absence:

Now, we want to use the option -v to invert our search, i.e., we want to output the lines that do not contain the word ‘the’.

$ grep -n -w -v "the" haiku.txt
1:The Web site you seek
2:cannot be located but
3:endless others exist.
4:
5:With searching comes loss
7:"My Thesis" not found.
8:
9:Yesterday it worked
10:Today it is not working
11:Software is like that.

If we use the -r (recursive) option, grep can search for a pattern recursively through a set of files in subdirectories.

Let’s search recursively for Yesterday in the shell-lesson-data/exercise-data/writing directory:

$ grep -r Yesterday .
./haiku.txt:Yesterday it worked
./LittleWomen.txt:"Yesterday, when Aunt was asleep and I was trying to be as still as a
./LittleWomen.txt:Yesterday at dinner, when an Austrian officer stared at us and then
./LittleWomen.txt:Yesterday was a quiet day spent in teaching, sewing, and writing in my

grep has lots of other options. To find out what they are, we can type:

$ grep --help
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
PATTERN is, by default, a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c

Regexp selection and interpretation:
  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -F, --fixed-strings       PATTERN is a set of newline-separated fixed strings
  -G, --basic-regexp        PATTERN is a basic regular expression (BRE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -e, --regexp=PATTERN      use PATTERN for matching
  -f, --file=FILE           obtain PATTERN from FILE
  -i, --ignore-case         ignore case distinctions
  -w, --word-regexp         force PATTERN to match only whole words
  -x, --line-regexp         force PATTERN to match only whole lines
  -z, --null-data           a data line ends in 0 byte, not newline

Miscellaneous:
...        ...        ...

Using grep

Which command would result in the following output:

and the presence of absence:
  1. grep "of" haiku.txt
  2. grep -E "of" haiku.txt
  3. grep -w "of" haiku.txt
  4. grep -i "of" haiku.txt

Solution

The correct answer is 3, because the -w option looks only for whole-word matches. The other options will also match ‘of’ when part of another word (in this case, the word Software).

Wildcards

grep’s real power doesn’t come from its options, though; it comes from the fact that patterns can include wildcards. (The technical name for these is regular expressions, which is what the ‘re’ in ‘grep’ stands for.) Regular expressions are both complex and powerful; if you want to do complex searches, please look at the lesson on our website. As a taster, we can find lines that have an ‘o’ in the second position like this:

$ grep -E "^.o" haiku.txt
Today it is not working
Software is like that.

We use the -E option and put the pattern in quotes to prevent the shell from trying to interpret it. (If the pattern contained a *, for example, the shell would try to expand it before running grep.) The ^ in the pattern anchors the match to the start of the line. The . matches a single character (just like ? in the shell), while the o matches an actual ‘o’.
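As a further small example, the $ character anchors a match to the end of a line, so we can find the haiku line that ends in ‘ing’ (again quoting the pattern so the shell passes it to grep untouched):

$ grep -E "ing$" haiku.txt
Today it is not working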

Little Women

You and your friend, having just finished reading Little Women by Louisa May Alcott, are in an argument. Of the four sisters in the book, Jo, Meg, Beth, and Amy, your friend thinks that Jo was the most mentioned. You, however, are certain it was Amy. Luckily, you have a file LittleWomen.txt containing the full text of the novel (shell-lesson-data/exercise-data/writing/LittleWomen.txt). Using a for loop, how would you tabulate the number of times each of the four sisters is mentioned?

Hint: one solution might employ the commands grep and wc and a |, while another might utilize grep options. There is often more than one way to solve a programming task, so a particular solution is usually chosen based on a combination of yielding the correct result, elegance, readability, and speed.

Solutions

for sis in Jo Meg Beth Amy
do
    echo $sis:
    grep -ow $sis LittleWomen.txt | wc -l
done

Alternative, slightly inferior solution:

for sis in Jo Meg Beth Amy
do
    echo $sis:
    grep -ocw $sis LittleWomen.txt
done

This solution is inferior because grep -c only reports the number of lines matched. The total number of matches reported by this method will be lower if there is more than one match per line.

Perceptive observers may have noticed that character names sometimes appear in all-uppercase in chapter titles (e.g. ‘MEG GOES TO VANITY FAIR’). If you wanted to count these as well, you could add the -i option for case-insensitivity (though in this case, it doesn’t affect the answer to which sister is mentioned most frequently).
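If you wanted to try that, the counting line from the first solution would simply gain the -i option, for example:

grep -iow $sis LittleWomen.txt | wc -l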

While grep finds lines in files, the find command finds files themselves. Again, it has a lot of options; to show how the simplest ones work, we’ll use the shell-lesson-data/exercise-data directory tree shown below.

.
├── numbers.txt
├── populations/
│   ├── bowerbird.txt
│   ├── dunnock.txt
│   ├── python.txt
│   ├── shark.txt
│   ├── six-species.csv
│   ├── toad.txt
│   └── wildcat.txt
│
└── writing/
    ├── haiku.txt
    └── LittleWomen.txt

The exercise-data directory contains one file, numbers.txt, and two directories, populations and writing, which contain various files.

For our first command, let’s run find . (remember to run this command from the shell-lesson-data/exercise-data folder).

$ find .
.
./numbers.txt
./populations
./populations/bowerbird.txt
./populations/dunnock.txt
./populations/python.txt
./populations/shark.txt
./populations/six-species.csv
./populations/toad.txt
./populations/wildcat.txt
./writing
./writing/haiku.txt
./writing/LittleWomen.txt

As always, the . on its own means the current working directory, which is where we want our search to start. find’s output is the names of every file and directory under the current working directory. This can seem useless at first but find has many options to filter the output and in this lesson we will discover some of them.

The first option in our list is -type d, which means ‘things that are directories’. Sure enough, find’s output is the names of the three directories (including .):

$ find . -type d
.
./populations
./writing

Notice that the objects find finds are not listed in any particular order. If we change -type d to -type f, we get a listing of all the files instead:

$ find . -type f
./numbers.txt
./populations/bowerbird.txt
./populations/dunnock.txt
./populations/python.txt
./populations/shark.txt
./populations/six-species.csv
./populations/toad.txt
./populations/wildcat.txt
./writing/haiku.txt
./writing/LittleWomen.txt

Now let’s try matching by name:

$ find . -name *.txt
./numbers.txt

We expected it to find all the text files, but it only prints out ./numbers.txt. The problem is that the shell expands wildcard characters like * before commands run. Since *.txt in the current directory expands to ./numbers.txt, the command we actually ran was:

$ find . -name numbers.txt

find did what we asked; we just asked for the wrong thing.

To get what we want, let’s do what we did with grep: put *.txt in quotes to prevent the shell from expanding the * wildcard. This way, find actually gets the pattern *.txt, not the expanded filename numbers.txt:

$ find . -name "*.txt"
./numbers.txt
./populations/bowerbird.txt
./populations/dunnock.txt
./populations/python.txt
./populations/shark.txt
./populations/toad.txt
./populations/wildcat.txt
./writing/haiku.txt
./writing/LittleWomen.txt

Listing vs. Finding

ls and find can be made to do similar things given the right options, but under normal circumstances, ls lists everything it can, while find searches for things with certain properties and shows them.

As we said earlier, the command line’s power lies in combining tools. We’ve seen how to do that with pipes; let’s look at another technique. As we just saw, find . -name "*.txt" gives us a list of all text files in or below the current directory. How can we combine that with wc -l to count the lines in all those files?

The simplest way is to put the find command inside $():

$ wc -l $(find . -name "*.txt")
      5 ./numbers.txt
      3 ./populations/bowerbird.txt
     11 ./populations/dunnock.txt
      1 ./populations/python.txt
     18 ./populations/shark.txt
     20 ./populations/toad.txt
      4 ./populations/wildcat.txt
     11 ./writing/haiku.txt
  21022 ./writing/LittleWomen.txt
  21095 total

When the shell executes this command, the first thing it does is run whatever is inside the $(). It then replaces the $() expression with that command’s output. Since the output of find is the nine filenames ending in .txt (./numbers.txt, ./populations/bowerbird.txt, ./populations/dunnock.txt, and so on), the shell constructs the command:

$ wc -l ./numbers.txt ./populations/bowerbird.txt ./populations/dunnock.txt ./populations/python.txt ./populations/shark.txt ./populations/toad.txt ./populations/wildcat.txt ./writing/haiku.txt ./writing/LittleWomen.txt

which is what we wanted. This expansion is exactly what the shell does when it expands wildcards like * and ?, but lets us use any command we want as our own ‘wildcard’.
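Command substitution can be used anywhere on a command line, not just to supply file names. As a small illustrative sketch, we could embed the number of matching files in a message:

$ echo There are $(find . -name "*.txt" | wc -l) .txt files below this directory
There are 9 .txt files below this directory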

It’s very common to use find and grep together. The first finds files that match a pattern; the second looks for lines inside those files that match another pattern. Here, for example, we can find txt files that contain the word “searching” by looking for the string ‘searching’ in all the .txt files in the current directory:

$ grep "searching" $(find . -name "*.txt")
./writing/haiku.txt:With searching comes loss
./writing/LittleWomen.txt:sitting on the top step, affected to be searching for her book, but was

Matching and Subtracting

The -v option to grep inverts pattern matching, so that only lines which do not match the pattern are printed. Given that, which of the following commands will find all .txt files in populations except toad.txt? Once you have thought about your answer, you can test the commands in the shell-lesson-data/exercise-data directory.

  1. find populations -name "*.txt" | grep -v toad
  2. find populations -name *.txt | grep -v toad
  3. grep -v "toad" $(find populations -name "*.txt")
  4. None of the above.

Solution

Option 1 is correct. Putting the match expression in quotes prevents the shell expanding it, so it gets passed to the find command.

Option 2 would only work if there were no files matching *.txt in the current directory: in that case the shell finds no match for the wildcard, so the expression *.txt gets passed to find unchanged. Here, however, *.txt expands to numbers.txt in shell-lesson-data/exercise-data, so the command actually run is find populations -name numbers.txt | grep -v toad, which finds nothing. (We first encountered this expansion behaviour in Episode 3.)

Option 3 is incorrect because it searches the contents of the files for lines which do not match ‘toad’, rather than searching the file names.

Binary Files

We have focused exclusively on finding patterns in text files. What if your data is stored as images, in databases, or in some other format?

A handful of tools extend grep to handle a few non-text formats. A more generalizable approach, though, is to convert the data to text, or to extract the text-like elements from the data. Alternatively, we might recognize that the shell and text processing have their limits, and use another programming language.
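A quick way to check what kind of data a file contains is the file command, available on most Unix-like systems. Its exact wording varies between systems, but the output looks something like this:

$ file writing/LittleWomen.txt
writing/LittleWomen.txt: ASCII text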

find Pipeline Reading Comprehension

Write a short explanatory comment for the following shell script:

wc -l $(find . -name "*.csv") | sort -n

Solution

  1. Find all files with a .csv extension recursively from the current directory
  2. Count the number of lines each of these files contains
  3. Sort the output from step 2 numerically

Key Points

  • find finds files with specific properties that match patterns.

  • grep selects lines in files that match patterns.

  • --help is an option supported by many bash commands, and programs that can be run from within Bash, to display more information on how to use these commands or programs.

  • man [command] displays the manual page for a given command.

  • $([command]) inserts a command’s output in place.


Transferring Files

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • How to use wget to transfer a file?

Objectives
  • Learn what wget is

  • Use wget to transfer a remote file to your local computer

Many of us use remote repositories with git (e.g. GitHub). This allows us to download code or other files written by colleagues.

What about files that do not exist in a git repository? If we wish to download online files from the shell, we can use tools such as Wget.

Wget

Wget is a simple tool developed for the GNU Project that downloads files over the HTTP, HTTPS and FTP protocols. It is widely used on Unix-like systems and is available with most Linux distributions. (However, it is not part of Git Bash for Windows.)

To download this lesson (located at https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html) from the Web, we can simply type:

$ wget https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html

We will see something similar to this:

--2023-05-15 08:08:03--  https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html
Resolving edcarp.github.io (edcarp.github.io)... 185.199.110.153, 185.199.108.153, 185.199.109.153, ...
Connecting to edcarp.github.io (edcarp.github.io)|185.199.110.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15409 (15K) [text/html]
Saving to: ‘index.html’

index.html                    100%[=================================================>]  15.05K  --.-KB/s    in 0.001s

2023-05-15 08:08:03 (20.5 MB/s) - ‘index.html’ saved [15409/15409]

We can then view or edit index.html using the usual tools, for example cat, nano or head. We can also double-click it in our laptop GUI to open it in a Web browser.

Other commands

Alternatively, we can use curl, which supports a much larger range of protocols, including common mail protocols such as POP3 and SMTP. Or we might use lftp.
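For example, a roughly equivalent curl command to the wget example above uses the -O option, which saves the downloaded file under its remote name:

$ curl -O https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html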

Please refer to the man pages by typing man wget, man curl, and man lftp in the shell for more information.

Applications

wget, curl and lftp allow a degree of automation that is not possible using a Web browser. Possible applications include downloading a long list of data files named in a text file, fetching the latest version of a dataset as part of an automated analysis pipeline, or scheduling regular downloads of files that are updated frequently.

Key Points

  • There are multiple ways to copy remote files at the command-line.

  • Compared to using a Web browser to access and save files, these allow greater opportunities for automation.


Working Remotely

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I use ‘ssh’ and ‘scp’?

Objectives
  • Learn what SSH is

  • Learn how to work remotely using ssh and scp

What if we want to run some commands on another machine, such as the server in the basement that manages our database of experimental results? To do this, we have to first log in to that machine. We call this a remote login.

Once our local client is connected to the remote server, everything we type into the client is passed on, by the server, to the shell running on the remote computer. That remote shell runs those commands on our behalf, just as a local shell would, then sends back output, via the server, to our client, for our computer to display.

The SSH protocol uses encryption to ensure that outsiders can’t see what’s in the messages going back and forth between different computers.

The remote login server which accepts connections from client programs is known as the SSH daemon, sshd.

The client program we use to login remotely is the secure shell or ssh.

The ssh login client has a companion program called scp, which allows us to copy files to or from a remote computer using the same kind of encrypted connection.

A remote login using ssh

Depending on security settings on the server and network, we may have to be connected to the local network; or if working remotely, we may have to use the institution’s VPN.

Then, we issue the command ssh username@computer, which tries to make a connection to the SSH daemon running on the remote computer we have specified.

Typing exit, or Control-D on an empty line, terminates the remote shell.

In the example below, the remote machine’s command prompt is moon> instead of just $. To make it clearer which machine is doing what, we’ll indent the commands sent to the remote machine and their output.

$ pwd
/c/Users/nelle/Desktop
$ ssh nelle@moon.euphoric.edu
Password: ********

Assuming this connection works (this specific example will not!), any commands we issue, for example hostname, pwd, ls or some scientific analysis software, will run on the remote server. This will continue until we exit the remote shell, for example:

    moon> exit

pwd confirms we are now running commands on the local computer again (not the remote server):

$ pwd
/c/Users/nelle/Desktop

Copying files to and from a remote machine using scp

To copy a file, we specify the source and destination paths, either of which may include computer names.

For example, this command might be used to copy our latest results to Nelle’s backups directory on the server backupserver.euphoric.edu, printing out its progress as it does so:

$ scp results.dat nelle@backupserver.euphoric.edu:backups/results-2023-05-16.dat
Password: ********
results.dat              100%  9  1.0 MB/s 00:00

Note the colon : separating the hostname of the server from the pathname of the file we are copying to.

Copying a whole directory to or from a remote machine uses the same syntax as the cp command: we just add the -r option to signal that we want the copy to be recursive. For example, this command copies all of our results from the backups directory on the backupserver.euphoric.edu server to our laptop:

$ scp -r nelle@backupserver.euphoric.edu:backups ./backups
Password: ********

Key Points

  • SSH is a secure means to access a remote Linux computer.

  • The ‘ssh’ and ‘scp’ utilities are secure alternatives to walking over to a machine, logging into it, and copying files off it.

  • ‘ssh’ and ‘scp’ are essential for using remote Linux servers.


Permissions

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What are file and directory permissions, and how do I view and change them?

Objectives
  • Explain what file and directory permissions are.

  • View the permissions of files and directories using ls -l.

  • Change permissions using chmod.

  • Recognise that Windows handles file and directory permissions differently.

Unix controls who can read, modify, and run files using permissions.

We’ll discuss how Windows handles permissions at the end of the section: the concepts are similar, but the rules are different. (The examples in this lesson do work with Git Bash for Windows. But where control of permissions is crucial under Windows, further effort may be required.)

Every user has a unique user name. They also have an integer user ID.

Users can belong to any number of groups. Each group has a unique name and integer group IDs.

Every file and directory on a Unix computer belongs to one owner and one group.

For each file on the system, every user falls into one of three categories: the owner of the file, someone in the file’s group, and everyone else.

For each of these three categories, the computer keeps track of whether people in that category can read the file, write to the file, or run the file as a program. (In the case of directories, ‘execute’ has a different meaning: whether people in that category can search the directory.)

For example, if a file had the following set of permissions:

           user    group   all
read       yes     yes     no
write      yes     no      no
execute    no      no      no

it would mean that:

  • the file’s owner can read and write the file, but not run it;

  • people in the file’s group can read the file, but not write to it or run it;

  • everybody else cannot read, write, or run the file at all.

Permissions are revealed with ls -l, for example in the exercise-data directory:

$ ls -l
total 5
-rw-r--r-- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt
drwxr-xr-x 1 nelle 1049089  0 Mar 21 09:06 populations/
drwxr-xr-x 1 nelle 1049089  0 Feb 16 17:40 writing/

On the right side, we have the files’ names. On the left, we see permissions.

Let’s have a closer look at one of those permission strings: -rw-r--r--. The first character tells us what type of thing this is: ‘-‘ means it’s a regular file, while ‘d’ means it’s a directory.

The next three characters tell us what permissions the file’s owner has. Here, the owner can read and write the file: rw-. The middle triplet shows us the group’s permissions. If the permission is turned off, we see a dash, so r-- means “read, but not write or execute”. The final triplet shows us what everyone who isn’t the file’s owner, or in the file’s group, can do. In this case, it’s r-- again, so everyone on the system can look at the file’s contents but do nothing else.

To change permissions, we use the chmod command (whose name stands for “change mode”).

In research contexts, removing write permission can help reduce the chance of accidental damage to important files.

This command removes write permission on numbers.txt for all users; running ls -l again confirms the change (chmod itself prints nothing):

$ chmod -w numbers.txt
$ ls -l numbers.txt
-r--r--r-- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt

Now, all users of the system can read the file but nothing else.

To restore write permission for the user who owns the file (only), the command is:

$ chmod u+w numbers.txt

The ‘u’ signals that we’re changing the privileges of the user (i.e., the file’s owner), and +w means we should add write permission. A quick ls -l shows us that it worked, because the owner’s permissions are now set to read and write:

-rw-r--r-- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt
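As a further illustration (not needed for the rest of the lesson), we could remove all permissions for users who are neither the owner nor in the file’s group, using o for ‘others’:

$ chmod o-rwx numbers.txt
$ ls -l numbers.txt
-rw-r----- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt

Running chmod o+r numbers.txt would restore read permission for everyone else.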

What about Windows?

Those are the basics of permissions on Unix. As we said at the outset, though, things work differently on Windows. There, permissions are defined by access control lists, or ACLs. An ACL is a list of pairs, each of which combines a “who” with a “what”. For example, you could give a user permission to append data to a file without giving permission to read or delete it.

This is more flexible than the Unix model, but it’s also more complex to administer and understand on small systems.

Key Points

  • File permissions describe who and what can read, write, modify, and access a file.

  • Use ls -l to view the permissions for a specific file.

  • Use chmod to change permissions on a file or directory.


Shell Variables

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How are variables set and accessed in the Unix shell?

Objectives
  • Understand how variables are implemented in the shell

  • Explain how the shell uses the PATH variable to search for executables

  • Read the value of an existing variable

  • Create new variables and change their values

The shell is just a program, and like other programs, it has variables. Those variables control its execution, so by changing their values you can change how the shell and other programs behave.

Let’s start by running the command set and looking at some of the variables in a typical shell session:

$ set | less

What you see is highly system-dependent. As an example, the start of the output might look something like this:

ACLOCAL_PATH=/mingw64/share/aclocal:/usr/share/aclocal
ALLUSERSPROFILE='C:\ProgramData'
APPDATA='C:\Users\nelle\AppData\Roaming'
BASH=/usr/bin/bash
BASHOPTS=cmdhist:complete_fullquote:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:login_shell:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="4" [1]="4" [2]="23" [3]="1" [4]="release" [5]="x86_64-pc-msys")
BASH_VERSION='4.4.23(1)-release'
COLUMNS=80
COMMONPROGRAMFILES='C:\Program Files\Common Files'
COMPLETION_PATH='C:/Program Files/Git/mingw64/share/git/completion'
COMPUTERNAME=MY-COMPUTER
COMP_WORDBREAKS=$' \t\n"\'@><=;|&(:'
COMSPEC='C:\WINDOWS\system32\cmd.exe'
CONFIG_SITE=/etc/config.site
CommonProgramW6432='C:\Program Files\Common Files'

Every variable has a name. By convention, variables that are always present are given upper-case names. All shell variables’ values are strings.

Some variables (like PATH) store lists of values. To see the value stored in your PATH variable, do:

$ echo $PATH

In the case of PATH, the convention is to use a colon ‘:’ as a separator. If a program wants the individual elements of such a list, it’s the program’s responsibility to split the variable’s string value into pieces.
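For example, one way to split the value ourselves is to translate each colon into a newline with the tr command; this produces a listing like the one shown in the next section:

$ echo $PATH | tr ':' '\n'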

The PATH Variable

Let’s have a closer look at that PATH variable. Its value defines the shell’s search path, i.e., the list of directories that the shell looks in for runnable programs when you type in a program name without specifying what directory it is in.

The rule it uses is simple: the shell checks each directory in the PATH variable in turn, looking for a program with the requested name in that directory. As soon as it finds a match, it stops searching and runs the program.

To show how this works, here are the components of the above PATH listed one per line:

/c/Users/nelle/bin
/mingw64/bin
/usr/local/bin
/usr/bin
/bin
/mingw64/bin
/usr/bin
/c/Users/nelle/bin
/c/Program Files (x86)/Common Files/Oracle/Java/javapath
/c/Program Files (x86)/Common Files/Intel/Shared Libraries/redist/intel64/compiler
/c/WINDOWS/system32
/c/WINDOWS
/c/WINDOWS/System32/Wbem
/c/WINDOWS/System32/WindowsPowerShell/v1.0
/c/WINDOWS/System32/OpenSSH
/cmd
/c/Users/nelle/AppData/Local/anaconda3
/c/Users/nelle/AppData/Local/anaconda3/Library/mingw-w64/bin
/c/Users/nelle/AppData/Local/anaconda3/Library/usr/bin
/c/Users/nelle/AppData/Local/anaconda3/Library/bin
/c/Users/nelle/AppData/Local/anaconda3/Scripts
/c/Users/nelle/AppData/Local/Microsoft/WindowsApps
/c/Users/nelle/AppData/Local/Programs/Microsoft VS Code/bin
/usr/bin/vendor_perl
/usr/bin/core_perl

Let’s say there are two programs called analyze in two different directories: /c/Users/nelle/bin/analyze and /bin/analyze. Since the shell searches the directories in the order they’re listed in PATH, it finds /c/Users/nelle/bin/analyze first and runs that.
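We can check which copy of a program the shell will actually run using the type built-in (or the which command). For example, on the system above we might see:

$ type bash
bash is /usr/bin/bash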

Showing the Value of a Variable

Let’s show the value of the variable HOME:

$ echo HOME
HOME

That just prints “HOME”, which isn’t what we wanted. Let’s try this instead:

$ echo $HOME
/c/Users/nelle

The dollar sign tells the shell that we want the value of the variable rather than its name.

This works just like wildcards: the shell does the replacement before running the program we’ve asked for. Thanks to this expansion, what we actually run is echo /c/Users/nelle, which displays the right thing.
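The same expansion happens inside double quotes, so we can mix variable values into longer messages:

$ echo "My home directory is $HOME"
My home directory is /c/Users/nelle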

Creating and Changing Variables

Creating a variable is easy: we just assign a value to a name using “=”, without any spaces on either side:

$ SECRET_IDENTITY=Daniel
$ echo $SECRET_IDENTITY
Daniel

To change the value, just assign a new one:

$ SECRET_IDENTITY=Chris
$ echo $SECRET_IDENTITY
Chris

If we want to set some variables automatically every time we run a shell, they can go in files called .bashrc and/or .bash_profile in our home directory. (The ‘.’ character at the front prevents ls from listing these files unless we specifically ask it to using -a.)

For example, this provides a mechanism to add custom software to the search path so it can be run like any other command.
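For example, a line like the following in .bashrc would put a (hypothetical) directory of our own scripts at the front of the search path; adjust the directory name to wherever your tools actually live:

# prepend a personal scripts directory to the shell's search path
export PATH="$HOME/my-scripts:$PATH"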

Key Points

  • The PATH variable defines the shell’s search path.

  • Variables are assigned using “=” and recalled using the variable’s name prefixed by “$”.