(Re)Introducing the Shell
Overview
Teaching: 5 min
Exercises: 0 min
Questions
What is a command shell and why would I use one?
Objectives
Explain how the shell relates to the keyboard, the screen, the operating system, and users’ programs.
Explain when and why command-line interfaces should be used instead of graphical interfaces.
The Shell
This course assumes you have either taken our introductory shell course already, or have broadly similar background knowledge. Here is a reminder of what we mean by the shell.
The shell is a program where users can type commands. With the shell, it’s possible to invoke complicated programs like climate modeling software or simple commands that create an empty directory with only one line of code. The most popular Unix shell is Bash. Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.
The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically. Sequences of commands can be written into a script, improving the reproducibility of workflows.
In addition, the command line is often the easiest way to interact with remote machines and supercomputers. Familiarity with the shell is near essential to run a variety of specialized tools and resources including high-performance computing systems. As clusters and cloud computing systems become more popular for scientific data crunching, being able to interact with the shell is becoming a necessary skill. We can build on the command-line skills covered here to tackle a wide range of scientific questions and computational challenges.
Phillipa’s Pipeline: A Typical Problem
Phillipa Frogg, an ecologist, wants to use the Living Planet Index dataset to help her with her research. However, she is unable to use the raw data directly; instead, she has to edit the data so it’s in a suitable format for her to make best use of. Although she could do this by hand in a text editor, this would be laborious, time-consuming, and error-prone. With the shell, Phillipa can instead assign her computer this mundane task while she focuses her attention on writing her latest paper.
The next few lessons will explore the ways Phillipa can achieve this. More specifically, they explain how she can use a command shell to run shell programs, and use loops to automate the repetitive steps of entering file names, so that her computer can work while she writes her paper.
As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.
In order to achieve her task, Phillipa needs to know how to:
- navigate to a file/directory
- create a file/directory
- check the length of a file
- chain commands together
- retrieve a set of files from a remote location
- connect to a powerful Linux server
- iterate over files
- run a shell script containing her pipeline
- control file permissions, so colleagues can read her work but not alter it
Key Points
A shell is a program whose primary purpose is to read commands and run other programs.
This lesson uses Bash, the default shell in many implementations of Unix.
Programs can be run in Bash by entering commands at the command-line prompt.
The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.
The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.
Manual Pages
Overview
Teaching: 5 min
Exercises: 0 min
Questions
How do I use man pages?
Objectives
Use man to display the manual page for a given command.
Explain how to read the synopsis of a given command while using man.
Search for specific options or flags in the manual page for a given command.
We can get help for any Unix command with the man (short for manual) command. For example, here is the command to look up information on cp:
$ man cp
The output displayed is referred to as the “man page”.
Note, if you are using Git Bash for Windows, man pages are not available. However, you can find them on the Web if you search for a term such as "man cp". You can also get help for many commands with the --help option, whether using Git Bash or other systems:
$ cp --help
Most man pages contain more information than can fit in one terminal screen. To make reading easier, the man command uses a "pager" to move and search through the information one screenful at a time. The most common pager is called less. Detailed information about it is available using man less. less is typically the default pager on Unix systems, and other tools may use it for output paging as well.
When less displays a colon ':', we can press the space bar to get the next page, the letter 'h' to get help, or the letter 'q' to quit.
man's output is typically complete but concise, as it is designed to be used as a reference rather than a tutorial.
Most man pages are divided into sections:
- NAME: gives the name of the command and a brief description
- SYNOPSIS: how to run the command, including optional and mandatory parameters. (We will explain the syntax later.)
- DESCRIPTION: a fuller description than the synopsis, including a description of all the options to the command. This section may also include example usage or details about how the command works.
- EXAMPLES: self-explanatory.
- SEE ALSO: lists other commands that we might find useful, or other sources of information that might help us.
Other sections we might see include AUTHOR, REPORTING BUGS, COPYRIGHT, HISTORY, (known) BUGS, and COMPATIBILITY.
How to Read the Synopsis
Here is the synopsis for the cp command on Ubuntu Linux:
SYNOPSIS
cp [OPTION]... [-T] SOURCE DEST
cp [OPTION]... SOURCE... DIRECTORY
cp [OPTION]... -t DIRECTORY SOURCE...
This tells the reader that there are three ways to use the command. Let’s look at the first usage:
cp [OPTION]... [-T] SOURCE DEST
[OPTION] means the cp command can be followed by one or more optional flags. We can tell they're optional because of the square brackets, and we can tell that one or more are welcome because of the ellipsis (…). For example, the fact that [-T] is in square brackets, but after the ellipsis, means that it's optional, but if it's used, it must come after all the other options.
SOURCE refers to the source file or directory, and DEST to the destination file or directory. Their precise meanings are explained at the top of the DESCRIPTION section.
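As a small sketch of our own, a command matching this first usage form might look like the following (the file names here are invented purely for illustration):
$ cp -v notes.txt notes-backup.txt
Here -v (verbose) stands in for [OPTION], notes.txt is SOURCE and notes-backup.txt is DEST.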
The other two usage examples can be read in similar ways. Note that to use the last one, the -t option is mandatory (because it isn't shown in square brackets).
The DESCRIPTION section starts with a few paragraphs explaining the command and its use, then expands on the possible options one by one.
Finding Help on Specific Options
If we want to skip ahead to the option we're interested in, we can search for it using the slash key '/'. (This isn't part of the man command: it's a feature of less.)
For example, to find out about -t, we can type /-t and press return. After that, we can use the 'n' key to navigate to the next match until we find the detailed information we need:
-t, --target-directory=DIRECTORY
copy all SOURCE arguments into DIRECTORY
This means that this option has the short form -t and the long form --target-directory, and that it takes an argument. Its meaning is to copy all the SOURCE arguments into DIRECTORY.
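For example, a command matching the third usage form from the synopsis might look like this (the backups directory is invented for illustration and would need to exist already):
$ cp -t backups bowerbird.txt dunnock.txt
This copies both bowerbird.txt and dunnock.txt into the backups directory.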
Limitations of Man Pages
Man pages can be useful for a quick confirmation of how to run a command, but they are not famous for being readable. If you can’t find what you need in the man page— or you can’t understand what you’ve found— try entering “unix command copy file” into your favorite search engine: it will often produce more helpful results.
You May Also Enjoy…
The explainshell.com site does a great job of breaking complex Unix commands into parts and explaining what each does. Sadly, it doesn’t work in reverse…
Key Points
man command displays the manual page for a given command.
[OPTION]... means the given command can be followed by one or more optional flags.
Flags specified after the ellipsis are still optional but must come after all other flags.
While inside the manual page, use / followed by your pattern to do interactive searching.
Loops
Overview
Teaching: 35 min
Exercises: 15 min
Questions
How can I perform the same actions on many different files?
Objectives
Write a loop that applies one or more commands separately to each file in a set of files.
Trace the values taken on by a loop variable during execution of the loop.
Explain the difference between a variable’s name and its value.
Explain why spaces and some punctuation characters shouldn’t be used in file names.
Demonstrate how to see what commands have recently been executed.
Re-run recently executed commands without retyping them.
Loops are a programming construct which allow us to repeat a command or set of commands for each item in a list. As such they are key to productivity improvements through automation. Similar to wildcards and tab completion, using loops also reduces the amount of typing required (and hence reduces the number of typing mistakes).
Suppose we have several hundred files containing population time series data. For this example, we'll use the exercise-data/populations directory, which only has six such files, but the principles can be applied to many more files at once. Each file contains population time series for one species, from the Living Planet Database of the Living Planet Index.
The structure of these files is the same: each line gives data for one population time series, as tab-delimited text.
Column headings are given on the first line of the combined-data file six-species.csv, which can be displayed as follows:
$ head -n 1 six-species.csv
Let’s look at the files:
$ head -n 5 bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
Due to the amount of data in each line, the output is visually confusing.
We would like to print out the class (high-level classification) for the species in each file. Class is given in the fifth column.
For each file, we would need to execute the command cut -f 5 and pipe this to sort and uniq.
We’ll use a loop to solve this problem, but first let’s look at the general form of a loop,
using the pseudo-code below:
for thing in list_of_things
do
operation_using $thing # Indentation within the loop is not required, but aids legibility
done
and we can apply this to our example like this:
$ for filename in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> do
> cut -f 5 $filename | sort | uniq
> done
Aves
Aves
Reptilia
Elasmobranchii
Amphibia
Mammalia
This shows us the first two files contain data on a species in the class Aves, the third contains data on a species in Reptilia, and so on.
Follow the Prompt
The shell prompt changes from $ to > and back again as we were typing in our loop. The second prompt, >, is different to remind us that we haven't finished typing a complete command yet. A semicolon, ;, can be used to separate two commands written on a single line.
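For example, here is a small sketch of our own (not part of the loop) showing two commands on one line separated by a semicolon:
$ echo "first command"; echo "second command"
first command
second command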
When the shell sees the keyword for, it knows to repeat a command (or group of commands) once for each item in a list. Each time the loop runs (called an iteration), an item in the list is assigned in sequence to the variable, and the commands inside the loop are executed, before moving on to the next item in the list. Inside the loop, we call for the variable's value by putting $ in front of it. The $ tells the shell interpreter to treat the variable as a variable name and substitute its value in its place, rather than treat it as text or an external command.
In this example, the list is six filenames: bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt.
Each time the loop iterates, it will assign a file name to the variable filename and run the cut command.
The first time through the loop, $filename is bowerbird.txt. The interpreter runs the command cut -f 5 on bowerbird.txt and pipes the output to the sort command. Then it pipes the output of the sort command to the uniq command, which prints its output to the terminal.
For the second iteration, $filename becomes dunnock.txt. The interpreter runs the command cut -f 5 on dunnock.txt and pipes the output to the sort command. Then it pipes the output of the sort command to the uniq command, which prints its output to the terminal.
This continues until each of the filenames in turn has been assigned to the variable $filename. After the final item, wildcat.txt, the shell exits the for loop.
Same Symbols, Different Meanings
Here we see > being used as a shell prompt, whereas > is also used to redirect output. Similarly, $ is used as a shell prompt, but, as we saw earlier, it is also used to ask the shell to get the value of a variable.
If the shell prints > or $ then it expects you to type something, and the symbol is a prompt.
If you type > or $ yourself, it is an instruction from you that the shell should redirect output or get the value of a variable.
When using variables it is also possible to put the names into curly braces to clearly delimit the variable name: $filename is equivalent to ${filename}, but is different from ${file}name. You may find this notation in other people's programs.
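A small sketch of our own to illustrate the difference (the variable file and the value shark are made up for this example):
$ file=shark
$ echo ${file}name
sharkname
Here ${file}name expands the variable file and then appends the literal text name, whereas $filename would instead look for a (different) variable called filename.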
We have called the variable in this loop filename in order to make its purpose clearer to human readers. The shell itself doesn't care what the variable is called; if we wrote this loop as:
$ for x in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> do
> cut -f 5 $x | sort | uniq
> done
or:
$ for temperature in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> do
> cut -f 5 $temperature | sort | uniq
> done
it would work exactly the same way.
Don't do this. Programs are only useful if people can understand them, so meaningless names (like x) or misleading names (like temperature) increase the odds that the program won't do what its readers think it does.
In the above examples, the variables (thing, filename, x and temperature) could have been given any other name, as long as it is meaningful to both the person writing the code and the person reading it.
Note also that loops can be used for things other than filenames, like a list of numbers or a subset of data.
Write your own loop
How would you write a loop that echoes all 10 numbers from 0 to 9?
Solution
$ for loop_variable in 0 1 2 3 4 5 6 7 8 9
> do
> echo $loop_variable
> done
0
1
2
3
4
5
6
7
8
9
Variables in Loops
This exercise refers to the shell-lesson-data/exercise-data/populations directory. ls *.txt gives the following output:
bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
What is the output of the following code?
$ for datafile in *.txt
> do
> ls *.txt
> done
Now, what is the output of the following code?
$ for datafile in *.txt
> do
> ls $datafile
> done
Why do these two loops give different outputs?
Solution
The first code block gives the same output on each iteration through the loop. Bash expands the wildcard *.txt within the loop body (as well as before the loop starts) to match all files ending in .txt and then lists them using ls. The expanded loop would look like this:
$ for datafile in bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> do
> ls bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
> done
bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
The second code block lists a different file on each loop iteration. The value of the datafile variable is evaluated using $datafile, and then listed using ls.
bowerbird.txt
dunnock.txt
python.txt
shark.txt
toad.txt
wildcat.txt
Limiting Sets of Files
What would be the output of running the following loop in the shell-lesson-data/exercise-data/populations directory?
$ for filename in t*
> do
> ls $filename
> done
- No files are listed.
- All files are listed.
- Only python.txt, toad.txt and wildcat.txt are listed.
- Only toad.txt is listed.
Solution
4 is the correct answer. * matches zero or more characters, so any file name starting with the letter t, followed by zero or more other characters, will be matched.
How would the output differ from using this command instead?
$ for filename in *t*
> do
> ls $filename
> done
- The same files will be listed.
- The files bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt will be listed.
- No files are listed this time.
- The files python.txt and toad.txt will be listed.
- Only the file six-species.csv will be listed.
Solution
2 is the correct answer. * matches zero or more characters, so a file name with zero or more characters before a letter t and zero or more characters after the letter t will be matched. In other words, any file name containing at least one t will be listed.
Saving to a File in a Loop - Part One
In the shell-lesson-data/exercise-data/populations directory, what is the effect of this loop?
for species in *.txt
do
echo $species
cat $species > species.txt
done
- Prints bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt, and the text from wildcat.txt will be saved to a file called species.txt.
- Prints bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt, and the text from all six files would be concatenated and saved to a file called species.txt.
- Prints bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt, and the text from bowerbird.txt will be saved to a file called species.txt.
- None of the above.
Solution
1 is the correct answer. The text from each file in turn gets written to the species.txt file. However, the file gets overwritten on each loop iteration, so the final content of species.txt is the text from the wildcat.txt file.
Saving to a File in a Loop - Part Two
Also in the shell-lesson-data/exercise-data/populations directory, remove the file you created above:
rm species.txt
Use ls to check you only have the files we provided, i.e.
bowerbird.txt dunnock.txt python.txt shark.txt six-species.csv toad.txt wildcat.txt
Now, what would be the output of the following loop?
for datafile in *.txt
do
cat $datafile >> all.txt
done
- All of the text from bowerbird.txt, dunnock.txt, python.txt, shark.txt and toad.txt would be concatenated and saved to a file called all.txt.
- The text from bowerbird.txt will be saved to a file called all.txt.
- All of the text from bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt would be concatenated and saved to a file called all.txt.
- All of the text from bowerbird.txt, dunnock.txt, python.txt, shark.txt, toad.txt and wildcat.txt would be printed to the screen and saved to a file called all.txt.
Solution
3 is the correct answer. >> appends to a file, rather than overwriting it with the redirected output from a command. Given the output from the cat command has been redirected, nothing is printed to the screen.
Here’s a slightly more complicated loop:
$ for filename in *.txt
> do
> echo $filename
> head -n 10 $filename | tail -n 1
> done
The shell starts by expanding *.txt to create the list of files it will process. The loop body then executes two commands for each of those files. The first command, echo, prints its command-line arguments to standard output. For example:
$ echo hello there
prints:
hello there
In this case, since the shell expands $filename to be the name of a file, echo $filename prints the name of the file.
Note that we can’t write this as:
$ for filename in *.txt
> do
> $filename
> head -n 10 $filename | tail -n 1
> done
because then the first time through the loop, when $filename expanded to bowerbird.txt, the shell would try to run bowerbird.txt as a program.
Finally, the head and tail combination selects line 10 from whatever file is being processed (assuming the file has at least 10 lines; otherwise it selects the last line of the file).
Spaces in Names
Spaces are used to separate the elements of the list that we are going to loop over. If one of those elements contains a space character, we need to surround it with quotes, and do the same thing to our loop variable. Suppose our data files are named:
red dragon.txt
purple unicorn.txt
To loop over these files, we would need to add double quotes like so:
$ for filename in "red dragon.txt" "purple unicorn.txt"
> do
> head -n 10 "$filename" | tail -n 1
> done
It is simpler to avoid using spaces (or other special characters) in filenames.
The files above don't exist, so if we run the above code, the head command will be unable to find them; however, the error messages returned will show the names of the files it is expecting:
head: cannot open 'red dragon.txt' for reading: No such file or directory
head: cannot open 'purple unicorn.txt' for reading: No such file or directory
Try removing the quotes around $filename in the loop above to see the effect of the quote marks on spaces.
head: cannot open 'red' for reading: No such file or directory
head: cannot open 'dragon.txt' for reading: No such file or directory
head: cannot open 'purple' for reading: No such file or directory
head: cannot open 'unicorn.txt' for reading: No such file or directory
We would like to modify each of the six files for individual species in shell-lesson-data/exercise-data/populations, but also save a version of the original files, naming the copies original-bowerbird.txt, original-dunnock.txt, original-python.txt, and so on.
We can’t use:
$ cp *.txt original-*.txt
because that would expand to:
$ cp bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt original-*.txt
This wouldn't back up our files; instead, we get an error:
cp: target `original-*.txt' is not a directory
This problem arises when cp receives more than two inputs. When this happens, it expects the last input to be a directory where it can copy all the files it was passed. Since there is no directory named original-*.txt in the populations directory, we get an error.
Instead, we can use a loop:
$ for filename in *.txt
> do
> cp $filename original-$filename
> done
This loop runs the cp command once for each filename. The first time, when $filename expands to bowerbird.txt, the shell executes:
cp bowerbird.txt original-bowerbird.txt
The second time, the command is:
cp dunnock.txt original-dunnock.txt
The third time, the command is:
cp python.txt original-python.txt
and so on, until a copy of each of the six files has been made.
Since the cp command does not normally produce any output, it's hard to check that the loop is doing the correct thing. However, we learned earlier how to print strings using echo, and we can modify the loop to use echo to print our commands without actually executing them. As such we can check what commands would be run in the unmodified loop.
The example below shows what happens when the modified loop is executed, and demonstrates how the judicious use of echo is a good debugging technique.
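Here is a sketch of that dry run. It assumes the populations directory still contains only the six original .txt files; if the original-*.txt copies from the previous loop already exist, they will appear in the wildcard expansion too.
$ for filename in *.txt
> do
> echo cp $filename original-$filename
> done
cp bowerbird.txt original-bowerbird.txt
cp dunnock.txt original-dunnock.txt
cp python.txt original-python.txt
cp shark.txt original-shark.txt
cp toad.txt original-toad.txt
cp wildcat.txt original-wildcat.txt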
Keyboard shortcuts for moving around the command line
We can move to the beginning of a line in the shell by typing Ctrl+A and to the end using Ctrl+E. This may be easier and faster than using the left and right cursor keys.
An extensive range of shortcuts is provided by the shell. To discover more, try a Web search for “bash keyboard shortcuts”.
Those Who Know History Can Choose to Repeat It
Another way to repeat previous work is to use the history command to get a list of the last few hundred commands that have been executed, and then to use !123 (where '123' is replaced by the command number) to repeat one of those commands. For example, if a user types this:
$ history | tail -n 5
and happens to see this in the output:
456  ls -l NENE0*.txt
457  rm stats-NENE01729B.txt.txt
458  bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
459  ls -l NENE0*.txt
460  history
then she can re-run goostats.sh on NENE01729B.txt simply by typing !458.
Other History Commands
There are a number of other shortcut commands for getting at the history.
- Ctrl+R enters a history search mode ‘reverse-i-search’ and finds the most recent command in your history that matches the text you enter next. Press Ctrl+R one or more additional times to search for earlier matches. You can then use the left and right arrow keys to choose that line and edit it then hit Return to run the command.
- !! retrieves the immediately preceding command (you may or may not find this more convenient than using ↑).
- !$ retrieves the last word of the last command. That's useful more often than you might expect: after bash goostats.sh NENE01729B.txt stats-NENE01729B.txt, you can type less !$ to look at the file stats-NENE01729B.txt, which is quicker than doing ↑ and editing the command-line.
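As a quick sketch of our own, using shark.txt from the populations directory (when a history expansion such as !! or !$ runs, the shell first prints the expanded command before executing it):
$ wc -l shark.txt
18 shark.txt
$ !!
wc -l shark.txt
18 shark.txt
$ echo !$
echo shark.txt
shark.txt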
Doing a Dry Run
A loop is a way to do many things at once — or to make many mistakes at once if it does the wrong thing. One way to check what a loop would do is to echo the commands it would run instead of actually running them.
Suppose we want to preview the commands the following loop will execute without actually running those commands:
$ for datafile in *.txt
> do
> cat $datafile >> all.txt
> done
What is the difference between the two loops below, and which one would we want to run?
# Version 1
$ for datafile in *.txt
> do
> echo cat $datafile >> all.txt
> done
# Version 2
$ for datafile in *.txt
> do
> echo "cat $datafile >> all.txt"
> done
Solution
The second version is the one we want to run. This prints to screen everything enclosed in the quote marks, expanding the loop variable name because we have prefixed it with a dollar sign. It also does not modify or create the file all.txt, as the >> is treated literally as part of a string rather than as a redirection instruction.
The first version appends the output from the command echo cat $datafile to the file all.txt. This file will just contain the list: cat bowerbird.txt, cat dunnock.txt, cat python.txt etc.
Try both versions for yourself to see the output! Be sure to open the all.txt file to view its contents.
Nested Loops
Suppose we want to set up a directory structure to organize some data files for different species from different continents. What would be the result of the following code:
$ for species in bowerbird dunnock python
> do
> for continent in Africa Asia Europe
> do
> mkdir $species-$continent
> done
> done
Solution
We have a nested loop, i.e. a loop contained within another loop, so for each species in the outer loop, the inner loop (the nested loop) iterates over the list of three continents, and creates a new directory for each combination.
Try running the code for yourself to see which directories are created!
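For reference (assuming none of these directories already exist), the nine directories created would be:
bowerbird-Africa bowerbird-Asia bowerbird-Europe
dunnock-Africa dunnock-Asia dunnock-Europe
python-Africa python-Asia python-Europe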
Key Points
A for loop repeats commands once for every thing in a list.
Every for loop needs a variable to refer to the thing it is currently operating on.
Use $name to expand a variable (i.e., get its value). ${name} can also be used.
Do not use spaces, quotes, or wildcard characters such as '*' or '?' in filenames, as it complicates variable expansion.
Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.
Use the up-arrow key to scroll up through previous commands to edit and repeat them.
Use Ctrl+R to search through the previously entered commands.
Use history to display recent commands, and ![number] to repeat a command by number.
Shell Scripts
Overview
Teaching: 30 min
Exercises: 15 min
Questions
How can I save and re-use commands?
Objectives
Write a shell script that runs a command or series of commands for a fixed set of files.
Run a shell script from the command line.
Write a shell script that operates on a set of files defined by the user on the command line.
Create pipelines that include shell scripts you, and others, have written.
We are finally ready to see what makes the shell such a powerful programming environment. We are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command. For historical reasons, a bunch of commands saved in a file is usually called a shell script, but make no mistake: these are actually small programs.
Not only will writing shell scripts make your work faster — you won’t have to retype the same commands over and over again — it will also make it more accurate (fewer chances for typos) and more reproducible. If you come back to your work later (or if someone else finds your work and wants to build on it) you will be able to reproduce the same results simply by running your script, rather than having to remember or retype a long list of commands.
For this example, we'll again use the exercise-data/populations directory containing population time series for six species, from the Living Planet Database of the Living Planet Index.
Let's start by going back to populations/ and creating a new file, middle.sh, which will become our shell script. Use cd, if required, to change to this directory, then pwd to check you are in the right directory. Then:
$ nano middle.sh
The command nano middle.sh opens the file middle.sh within the text editor 'nano' (which runs within the shell). If the file does not exist, it will be created. We can use the text editor to directly edit the file – we'll simply insert the following line:
head -n 10 shark.txt | tail -n 2
This is a variation on the pipe we constructed earlier: it selects lines 9-10 of the file shark.txt. Remember, we are not running it as a command just yet: we are putting the commands in a file.
Then we save the file (Ctrl-O in nano), and exit the text editor (Ctrl-X in nano). Check that the directory populations now contains a file called middle.sh.
Once we have saved the file, we can ask the shell to execute the commands it contains. Our shell is called bash, so we run the following command:
$ bash middle.sh
19586 Carcharodon_carcharias 0 Dicken_M._L._M._J._Smale_et_al._(2013)._White_sharks_Carcharodon_carcharias_at_Bird_Island_Algoa_Bay_South_Africa._African_Journal_of_Marine_Science_35(2):_175-182 Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias (Linnaeus_1758) Great_white_shark Bird_Island_Algoa_Bay_Eastern_Cape South_Africa South_Africa Africa NULL NULL -33.5 25.775554 1 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 Sightings_per_unit_effort_SPUE_(**hr) Visual_census NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 0.225 0.487 0 NULL NULL NULL NULL NULL NULL NULL NULL NULL
19587 Carcharodon_carcharias 0 Ryklief_R._P._A._Pistorius_et_al._(2014)._Spatial_and_seasonal_patterns_in_sighting_rate_and_life-history_composition_of_the_white_shark_Carcharodon_carcharias_at_Mossel_Bay_South_Africa._African_Journal_of_Marine_Science_36(4):_449-453 Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias (Linnaeus_1758) Great_white_shark Seal_Island_Mossel_Bay_Western_Cape South_Africa South_Africa Africa NULL NULL -34.151089 22.119689 1 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 Sightings_per_unit_effort_SPUE_(**hr) Visual_census_Feb-Dec NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1.6809 1.0745 2.1702 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Sure enough, our script’s output is exactly what we would get if we ran that pipeline directly.
Text vs. Whatever
We usually call programs like Microsoft Word or LibreOffice Writer "text editors", but we need to be a bit more careful when it comes to programming. By default, Microsoft Word uses .docx files to store not only text, but also formatting information about fonts, headings, and so on. This extra information isn't stored as characters and doesn't mean anything to tools like head. When editing programs, therefore, you must either use a plain text editor, or be careful to save files as plain text.
What if we want to select lines from an arbitrary file?
We could edit middle.sh each time to change the filename, but that would probably take longer than typing the command out again in the shell and executing it with a new file name. Instead, let's edit middle.sh and make it more versatile:
$ nano middle.sh
Now, within "nano", replace the text shark.txt with the special variable called $1:
head -n 10 "$1" | tail -n 2
Inside a shell script, $1 means 'the first filename (or other argument) on the command line'. We can now run our script like this:
$ bash middle.sh shark.txt
19586 Carcharodon_carcharias 0 Dicken_M._L._M._J._Smale_et_al._(2013)._White_sharks_Carcharodon_carcharias_at_Bird_Island_Algoa_Bay_South_Africa._African_Journal_of_Marine_Science_35(2):_175-182 Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias (Linnaeus_1758) Great_white_shark Bird_Island_Algoa_Bay_Eastern_Cape South_Africa South_Africa Africa NULL NULL -33.5 25.775554 1 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 Sightings_per_unit_effort_SPUE_(**hr) Visual_census NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 0.225 0.487 0 NULL NULL NULL NULL NULL NULL NULL NULL NULL
19587 Carcharodon_carcharias 0 Ryklief_R._P._A._Pistorius_et_al._(2014)._Spatial_and_seasonal_patterns_in_sighting_rate_and_life-history_composition_of_the_white_shark_Carcharodon_carcharias_at_Mossel_Bay_South_Africa._African_Journal_of_Marine_Science_36(4):_449-453 Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias (Linnaeus_1758) Great_white_shark Seal_Island_Mossel_Bay_Western_Cape South_Africa South_Africa Africa NULL NULL -34.151089 22.119689 1 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 Sightings_per_unit_effort_SPUE_(**hr) Visual_census_Feb-Dec NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1.6809 1.0745 2.1702 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
or on a different file like this:
$ bash middle.sh toad.txt
9084 Bufo_bufo 0 Cooke_A._S._and_R._S._Oldham_(1995)._Establishment_of_populations_of_the_common_frog_Rana_temporaria_and_common_toad_Bufo_bufo_in_a_newly_created_reserve_following_translocation._Herpetological_Journal_5(1):_173-180. Amphibia Anura Bufonidae Bufo bufo NULL (Linnaeus_1758) Common_toad The_Boardwalks_Reserve_north_bank_of_the_River_Nene_near_the_western_edge_of_Peterborough United_Kingdom United_Kingdom Europe Europe_and_Central_Asia Central_and_Western_Europe 52.55444 -0.26444 0 Freshwater NULL NULL Palearctic Temperate_floodplain_rivers_and_wetlands NULL NULL NULL 0 Peak_total_toad_count Counts_during_breeding_season NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 0 NULL 127 311 181 328 306 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
18832 Bufo_bufo 0 Jedrzejewska_B._et_al._(2002)._Seasonal_dynamics_and_breeding_of_amphibians_in_pristine_forests_(Bialowieza_National_Park_E_Poland)_in_dry_years._Folia_Zoologica_52(1):_77-86. Amphibia Anura Bufonidae Bufo bufo (Linnaeus_1758) Common_toad Oak-hornbeam-lime_forests_Bia?�owie??a_National_Park_East_Poland Poland Poland Europe Europe_and_Central_Asia Central_and_Western_Europe 52.75 23.916667 Terrestrial Palearctic Temperate_broadleaf_and_mixed_forests NULL NULL NULL NULL NULL 0 Number_of_individuals*ha Live_trapping_on_8_30x30m_grids NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 71.5953 45.1319 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Double-Quotes Around Arguments
For the same reason that we put the loop variable inside double-quotes, in case the filename happens to contain any spaces, we surround $1 with double-quotes.
Currently, we need to edit middle.sh each time we want to adjust the range of lines that is returned. Let's fix that by configuring our script to instead use three command-line arguments. Each additional argument that we provide will be accessible via the special variables $1, $2, $3, which refer to the first, second and third command-line arguments, respectively. Knowing this, we can use additional arguments to define the range of lines to be passed to head and tail respectively:
$ nano middle.sh
head -n "$2" "$1" | tail -n "$3"
We can now run:
$ bash middle.sh shark.txt 10 2
19586 Carcharodon_carcharias 0 Dicken_M._L._M._J._Smale_et_al._(2013)._White_sharks_Carcharodon_carcharias_at_Bird_Island_Algoa_Bay_South_Africa._African_Journal_of_Marine_Science_35(2):_175-182 Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias (Linnaeus_1758) Great_white_shark Bird_Island_Algoa_Bay_Eastern_Cape South_Africa South_Africa Africa NULL NULL -33.5 25.775554 1 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 Sightings_per_unit_effort_SPUE_(**hr) Visual_census NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 0.225 0.487 0 NULL NULL NULL NULL NULL NULL NULL NULL NULL
19587 Carcharodon_carcharias 0 Ryklief_R._P._A._Pistorius_et_al._(2014)._Spatial_and_seasonal_patterns_in_sighting_rate_and_life-history_composition_of_the_white_shark_Carcharodon_carcharias_at_Mossel_Bay_South_Africa._African_Journal_of_Marine_Science_36(4):_449-453 Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias (Linnaeus_1758) Great_white_shark Seal_Island_Mossel_Bay_Western_Cape South_Africa South_Africa Africa NULL NULL -34.151089 22.119689 1 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 Sightings_per_unit_effort_SPUE_(**hr) Visual_census_Feb-Dec NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1.6809 1.0745 2.1702 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
By changing the arguments to our command we can change our script’s behaviour:
$ bash middle.sh shark.txt 4 3
7701 Carcharodon_carcharias 0 Dudley_S._F._J._(2002)._Shark_Catch_Trends_and_Effort_Reduction_in_the_Beach_Protection_Program_KwaZulu-Natal_South_Africa._SCIENTIFIC_COUNCIL_MEETING_-_SEPTEMBER_2002_NAFO._*_Dudley_S._F._J._and_C._A._Simpfendorfer_(2006)._Population_status_of_14_shark_species_caught_in_the_protective_gillnets_off_KwaZulu-Natal_beaches_South_Africa_1978-2003._Marine_and_Freshwater_Research_57:_225-240. Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias (Linnaeus_1758) Great_white_shark Beaches_of_KwaZulu-Natal_province_South_Africa South_Africa South_Africa Africa NULL NULL -29.25 33.08333 0 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 number*km-net_year shark_net_catch_rates NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1.8 1.6 1.4 0.9 0.6 0.6 1.4 1.1 0.75 0.7 0.9 1.4 0.9 0.5 0.7 0.9 1.12 1.19 0.99 0.65 0.25 1.1 0.55 0.65 0.87 1.37 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
9057 Carcharodon_carcharias 1 Cliff_G._S._F._J._Dudley_et_al._(1996)._Catches_of_white_sharks_in_KwaZulu-Natal_South_Africa_and_environmental_influences._Great_white_sharks:_the_biology_of_Carcharodon_carcharias._A._P._Klimley_and_D._G._Ainley:_351-362. Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias NULL (Linnaeus_1758) Great_white_shark Natal_Coast_South_Africa South_Africa South_Africa Africa NULL NULL -31.71667 30.38333 0 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 CPUE_(no.*km-net*yr) Shark_nets NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 3.9 1.9 3.5 1.3 0.9 0.6 0.3 1.8 1.1 1.5 1.7 0.9 2.2 1.8 1.3 0.7 0.6 0.4 1.5 1.2 0.7 0.8 1 1.5 1 0.8 1.6 1.8 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
9058 Carcharodon_carcharias 0 Cliff_G._S._F._J._Dudley_et_al._(1996)._Catches_of_white_sharks_in_KwaZulu-Natal_South_Africa_and_environmental_influences._Great_white_sharks:_the_biology_of_Carcharodon_carcharias._A._P._Klimley_and_D._G._Ainley:_351-362. Elasmobranchii Lamniformes Lamnidae Carcharodon carcharias NULL (Linnaeus_1758) Great_white_shark Richards_Bay_South_Africa South_Africa South_Africa Africa NULL NULL -28.85 32.23333 0 Marine NULL NULL NULL NULL Tropical_and_subtropical_Indo-Pacific Indian_Ocean Unknown 0 CPUE_(no.*km-net*yr) Shark_nets NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 5.2 3.5 1.5 2 2 2 1.1 1.8 3.2 1.4 0.5 1.1 2.9 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
This works, but it may take the next person who reads middle.sh a moment to figure out what it does. We can improve our script by adding some comments at the top:
$ nano middle.sh
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
A comment starts with a # character and runs to the end of the line. The computer ignores comments, but they're invaluable for helping people (including your future self) understand and use scripts. The only caveat is that each time you modify the script, you should check that the comment is still accurate: an explanation that sends the reader in the wrong direction is worse than none at all.
What if we want to process many files in a single pipeline? For example, if we want to sort our .txt files by length, we would type:
$ wc -l *.txt | sort -n
because wc -l lists the number of lines in the files (recall that wc stands for 'word count'; adding the -l option means 'count lines' instead) and sort -n sorts things numerically.
We could put this in a file, but then it would only ever sort a list of .txt files in the current directory. If we want to be able to get a sorted list of other kinds of files, we need a way to get all those names into the script. We can't use $1, $2, and so on because we don't know how many files there are. Instead, we use the special variable $@, which means 'all of the command-line arguments to the shell script'. We also should put $@ inside double-quotes to handle the case of arguments containing spaces ("$@" is special syntax and is equivalent to "$1" "$2" …).
Here’s an example:
$ nano sorted.sh
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
$ bash sorted.sh *.txt ../numbers.txt
1 python.txt
3 bowerbird.txt
4 wildcat.txt
5 ../numbers.txt
11 dunnock.txt
18 shark.txt
20 toad.txt
62 total
List Unique Species
Remember, you can see the column headings for our population time series files as follows:
$ head -n 1 six-species.csv
Count manually to confirm that "Binomial" (the binomial species name) is the second column, "Country" is the 14th column, and "System" is the 22nd column.
We can use the command cut -f 2,14,22 shark.txt | sort | uniq to display the unique combinations of species, country and system in shark.txt. (Note, the columns appear ragged due to the positioning of tab stops, but all the data are there.) In order to avoid having to type out this series of commands every time, a scientist may choose to write a shell script instead.
Write a shell script called species.sh that takes any number of filenames as command-line arguments, and uses a variation of the above command to print a list of the unique species appearing in each of those files separately.
Solution
# Script to find unique combinations of species, country and
# system in tab-delimited text files where the data are in
# columns 2, 14 and 22.
# This script accepts any number of file names as command line arguments.

# Loop over all files
for file in $@
do
echo "Unique combinations of species, country and system within $file:"
# Extract binomial species names, countries and systems
cut -f 2,14,22 $file | sort | uniq
done
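The script could then be run on any of the data files, for example (data output not shown here):
$ bash species.sh shark.txt toad.txt
For each file, it prints the "Unique combinations of species, country and system within ..." header followed by the unique combinations found by the cut | sort | uniq pipeline.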
Suppose we have just run a series of commands that did something useful — for example, that created a graph we’d like to use in a paper. We’d like to be able to re-create the graph later if we need to, so we want to save the commands in a file. Instead of typing them in again (and potentially getting them wrong) we can do this:
$ history | tail -n 5 > redo-figure-3.sh
Depending on which commands we have typed recently, the file redo-figure-3.sh might now contain:
297 bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
298 bash goodiff.sh stats-NENE01729B.txt /data/validated/01729.txt > 01729-differences.txt
299 cut -d ',' -f 2-3 01729-differences.txt > 01729-time-series.txt
300 ygraph --format scatter --color bw --borders none 01729-time-series.txt figure-3.png
301 history | tail -n 5 > redo-figure-3.sh
After a moment's work in an editor to remove the serial numbers on the commands, and to remove the final line where we called the history command, we have a completely accurate record of how we created that figure.
Why Record Commands in the History Before Running Them?
If you run the command:
$ history | tail -n 5 > recent.sh
the last command in the file is the history command itself, i.e., the shell has added history to the command log before actually running it. In fact, the shell always adds commands to the log before running them. Why do you think it does this?
Solution
If a command causes something to crash or hang, it might be useful to know what that command was, in order to investigate the problem. Were the command recorded only after it had run, we would not have a record of the last command run in the event of a crash.
In practice, most people develop shell scripts by running commands at the shell prompt a few times to make sure they're doing the right thing, then saving them in a file for re-use. This style of work allows people to recycle what they discover about their data and their workflow with one call to history and a bit of editing to clean up the output and save it as a shell script.
Variables in Shell Scripts
In the populations directory, imagine you have a shell script called script.sh containing the following commands:
head -n $2 $1
tail -n $3 $1
While you are in the populations directory, you type the following command:
$ bash script.sh '*.txt' 1 1
Which of the following outputs would you expect to see?
- All of the lines between the first and the last lines of each file ending in .txt in the populations directory
- The first line of each file ending in .txt in the populations directory, followed by the last line of each such file
- The first and the last line of each file in the populations directory
- An error because of the quotes around *.txt
Solution
The correct answer is 2.
The special variables $1, $2 and $3 represent the command line arguments given to the script, such that the commands run are:
$ head -n 1 bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
$ tail -n 1 bowerbird.txt dunnock.txt python.txt shark.txt toad.txt wildcat.txt
The shell does not expand '*.txt' because it is enclosed by quote marks. As such, the first argument to the script is '*.txt', which gets expanded within the script by head and tail.
Note, python.txt only contains a single line, so for this file the line is output twice (being both the first line and the last line).
Find the Longest File With a Given Extension
Write a shell script called longest.sh that takes the name of a directory and a filename extension as its arguments, and prints out the name of the file with the most lines in that directory with that extension. For example:
$ bash longest.sh shell-lesson-data/exercise-data/populations txt
would print the name of the .txt file in shell-lesson-data/exercise-data/populations that has the most lines.
Feel free to test your script on another directory, e.g.
$ bash longest.sh shell-lesson-data/exercise-data/writing txt
Solution
# Shell script which takes two arguments:
# 1. a directory name
# 2. a file extension
# and prints the name of the file in that directory
# with the most lines which matches the file extension.

wc -l $1/*.$2 | sort -g | tail -n 2 | head -n 1
The first part of the pipeline, wc -l $1/*.$2 | sort -g, counts the lines in each file and sorts them numerically (largest last). When there's more than one file, wc also outputs a final summary line, giving the total number of lines across all files. We use tail -n 2 | head -n 1 to throw away this last line.
With wc -l $1/*.$2 | sort -n | tail -n 1 we'll see the final summary line: we can build our pipeline up in pieces to be sure we understand the output.
Script Reading Comprehension
For this question, consider the shell-lesson-data/exercise-data/populations directory once again. This contains a number of files containing population time series data, in addition to any other files you may have created. Explain what each of the following three scripts would do when run as bash script1.sh *.txt, bash script2.sh *.txt, and bash script3.sh *.txt respectively.
# Script 1
echo *.*
# Script 2
for filename in $1 $2 $3
do
cat $filename
done
# Script 3
echo $@.txt
Solutions
In each case, the shell expands the wildcard in *.txt before passing the resulting list of file names as arguments to the script.
Script 1 would print out a list of all files containing a dot in their name. The arguments passed to the script are not actually used anywhere in the script.
Script 2 would print the contents of the first 3 files with a .txt file extension. $1, $2, and $3 refer to the first, second, and third argument respectively.
Script 3 would print all the arguments to the script (i.e. all the .txt files), followed by .txt. $@ refers to all the arguments given to a shell script.
bowerbird.txt dunnock.txt python.txt script.txt shark.txt toad.txt wildcat.txt.txt
Debugging Scripts
Suppose you have saved the following script in a file called do-errors.sh in Phillipa's north-pacific-gyre/scripts directory:
# Calculate stats for data files.
for datafile in "$@"
do
echo $datfile
bash goostats.sh $datafile stats-$datafile
done
When you run it from the north-pacific-gyre directory:
$ bash do-errors.sh NENE*A.txt NENE*B.txt
the output is blank. To figure out why, re-run the script using the -x option:
$ bash -x do-errors.sh NENE*A.txt NENE*B.txt
What is the output showing you? Which line is responsible for the error?
Solution
The -x option causes bash to run in debug mode. This prints out each command as it is run, which will help you to locate errors. In this example, we can see that echo isn't printing anything. We have made a typo in the loop variable name, and the variable datfile doesn't exist, hence returning an empty string.
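A corrected version of the script simply uses the loop variable's actual name in the echo command:
# Calculate stats for data files.
for datafile in "$@"
do
echo $datafile
bash goostats.sh $datafile stats-$datafile
done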
Key Points
Save commands in files (usually called shell scripts) for re-use.
bash [filename] runs the commands saved in a file.
$@ refers to all of a shell script's command-line arguments.
$1, $2, etc., refer to the first command-line argument, the second command-line argument, etc.
Place variables in quotes if the values might have spaces in them.
Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.
Finding Things
Overview
Teaching: 30 min
Exercises: 20 min
Questions
How can I find files?
How can I find things in files?
Objectives
Use grep to select lines from text files that match simple patterns.
Use find to find files and directories whose names match simple patterns.
Use the output of one command as the command-line argument(s) to another command.
Explain what is meant by ‘text’ and ‘binary’ files, and why many common tools don’t handle the latter well.
In the same way that many of us now use ‘Google’ as a verb meaning ‘to find’, Unix programmers often use the word ‘grep’. ‘grep’ is a contraction of ‘global/regular expression/print’, a common sequence of operations in early Unix text editors. It is also the name of a very useful command-line program.
grep finds and prints lines in files that match a pattern. For our examples, we will use a file that contains three haiku taken from a 1998 competition in Salon magazine (credit to authors Joy Rothke, Howard Korder, and Margaret Segall, respectively; see Haiku Error Messages, archived Page 1 and Page 2). For this set of examples, we're going to be working in the writing subdirectory:
$ cd
$ cd Desktop/shell-lesson-data/exercise-data/writing
$ cat haiku.txt
The Web site you seek
cannot be located but
endless others exist.

With searching comes loss
and the presence of absence:
"My Thesis" not found.

Yesterday it worked
Today it is not working
Software is like that.
Let’s find lines that contain the word ‘not’:
$ grep not haiku.txt
cannot be located but
"My Thesis" not found.
Today it is not working
Here, not is the pattern we're searching for. The grep command searches through the file, looking for matches to the pattern specified. To use it, type grep, then the pattern we're searching for, and finally the name of the file (or files) we're searching in. The output is the three lines in the file that contain the letters 'not'.
By default, grep searches for a pattern in a case-sensitive way. In addition, the search pattern we have selected does not have to form a complete word, as we will see in the next example.
Let’s search for the pattern: ‘The’.
$ grep The haiku.txt
The Web site you seek
"My Thesis" not found.
This time, two lines that include the letters ‘The’ are outputted, one of which contained our search pattern within a larger word, ‘Thesis’.
To restrict matches to lines containing the word 'The' on its own, we can give grep the -w option. This will limit matches to word boundaries. Later in this lesson, we will also see how we can change the search behavior of grep with respect to its case sensitivity.
$ grep -w The haiku.txt
The Web site you seek
Note that a 'word boundary' includes the start and end of a line, so not just letters surrounded by spaces.
Sometimes we don't want to search for a single word, but a phrase. This is also easy to do with grep by putting the phrase in quotes.
$ grep -w "is not" haiku.txt
Today it is not working
We’ve now seen that you don’t have to have quotes around single words, but it is useful to use quotes when searching for multiple words. It also helps to make it easier to distinguish between the search term or phrase and the file being searched. We will use quotes in the remaining examples.
Another useful option is -n, which numbers the lines that match:
$ grep -n "it" haiku.txt
1:The Web site you seek
5:With searching comes loss
9:Yesterday it worked
10:Today it is not working
Here, we can see that lines 1, 5, 9, and 10 contain the letters ‘it’.
We can combine options (i.e. flags) as we do with other Unix commands.
For example, let’s find the lines that contain the word ‘the’.
We can combine the option -w to find the lines that contain the word 'the' and -n to number the lines that match:
$ grep -n -w "the" haiku.txt
6:and the presence of absence:
Now we want to use the option -i to make our search case-insensitive:
$ grep -n -w -i "the" haiku.txt
1:The Web site you seek
6:and the presence of absence:
Now, we want to use the option -v
to invert our search, i.e., we want to output
the lines that do not contain the word ‘the’.
$ grep -n -w -v "the" haiku.txt
1:The Web site you seek
2:cannot be located but
3:endless others exist.
4:
5:With searching comes loss
7:"My Thesis" not found.
8:
9:Yesterday it worked
10:Today it is not working
11:Software is like that.
If we use the -r
(recursive) option,
grep
can search for a pattern recursively through a set of files in subdirectories.
Let’s search recursively for Yesterday
in the shell-lesson-data/exercise-data/writing
directory:
$ grep -r Yesterday .
./haiku.txt:Yesterday it worked
./LittleWomen.txt:"Yesterday, when Aunt was asleep and I was trying to be as still as a
./LittleWomen.txt:Yesterday at dinner, when an Austrian officer stared at us and then
./LittleWomen.txt:Yesterday was a quiet day spent in teaching, sewing, and writing in my
grep
has lots of other options. To find out what they are, we can type:
$ grep --help
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
PATTERN is, by default, a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c
Regexp selection and interpretation:
-E, --extended-regexp PATTERN is an extended regular expression (ERE)
-F, --fixed-strings PATTERN is a set of newline-separated fixed strings
-G, --basic-regexp PATTERN is a basic regular expression (BRE)
-P, --perl-regexp PATTERN is a Perl regular expression
-e, --regexp=PATTERN use PATTERN for matching
-f, --file=FILE obtain PATTERN from FILE
-i, --ignore-case ignore case distinctions
-w, --word-regexp force PATTERN to match only whole words
-x, --line-regexp force PATTERN to match only whole lines
-z, --null-data a data line ends in 0 byte, not newline
Miscellaneous:
... ... ...
Using
grep
Which command would result in the following output:
and the presence of absence:
grep "of" haiku.txt
grep -E "of" haiku.txt
grep -w "of" haiku.txt
grep -i "of" haiku.txt
Solution
The correct answer is 3, because the -w option looks only for whole-word matches. The other options will also match ‘of’ when it is part of another word (in this case, the word ‘Software’).
Wildcards
grep
’s real power doesn’t come from its options, though; it comes from the fact that patterns can include wildcards. (The technical name for these is regular expressions, which is what the ‘re’ in ‘grep’ stands for.) Regular expressions are both complex and powerful; if you want to do complex searches, please look at the lesson on our website. As a taster, we can find lines that have an ‘o’ in the second position like this:
$ grep -E "^.o" haiku.txt
Today it is not working
Software is like that.
We use the -E option and put the pattern in quotes to prevent the shell from trying to interpret it. (If the pattern contained a *, for example, the shell would try to expand it before running grep.) The ^ in the pattern anchors the match to the start of the line. The . matches a single character (just like ? in the shell), while the o matches an actual ‘o’.
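As one more small taster, a $ in the pattern anchors the match to the end of a line, so this should print only the line that ends in ‘working’:
$ grep -E "working$" haiku.txt
Today it is not working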
Little Women
You and your friend, having just finished reading Little Women by Louisa May Alcott, are in an argument. Of the four sisters in the book, Jo, Meg, Beth, and Amy, your friend thinks that Jo was the most mentioned. You, however, are certain it was Amy. Luckily, you have a file
LittleWomen.txt
containing the full text of the novel (shell-lesson-data/exercise-data/writing/LittleWomen.txt
). Using a for loop, how would you tabulate the number of times each of the four sisters is mentioned?
Hint: one solution might employ the commands grep and wc and a |, while another might utilize grep options. There is often more than one way to solve a programming task, so a particular solution is usually chosen based on a combination of yielding the correct result, elegance, readability, and speed.
Solutions
for sis in Jo Meg Beth Amy
do
    echo $sis:
    grep -ow $sis LittleWomen.txt | wc -l
done
Alternative, slightly inferior solution:
for sis in Jo Meg Beth Amy
do
    echo $sis:
    grep -ocw $sis LittleWomen.txt
done
This solution is inferior because grep -c only reports the number of lines matched. The total number of matches reported by this method will be lower if there is more than one match per line.
Perceptive observers may have noticed that character names sometimes appear in all-uppercase in chapter titles (e.g. ‘MEG GOES TO VANITY FAIR’). If you wanted to count these as well, you could add the -i option for case-insensitivity (though in this case, it doesn’t affect the answer to which sister is mentioned most frequently).
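For instance, a case-insensitive variant of the first solution might look like this (a sketch only; the counts themselves are not shown here):
for sis in Jo Meg Beth Amy
do
    echo $sis:
    grep -iow $sis LittleWomen.txt | wc -l
done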
While grep
finds lines in files,
the find
command finds files themselves.
Again,
it has a lot of options;
to show how the simplest ones work, we’ll use the shell-lesson-data/exercise-data
directory tree shown below.
.
├── numbers.txt
├── populations/
│ ├── bowerbird.txt
│ ├── dunnock.txt
│ ├── python.txt
│ ├── shark.txt
│ ├── six-species.csv
│ ├── toad.txt
│ └── wildcat.txt
│
└── writing/
├── haiku.txt
└── LittleWomen.txt
The exercise-data
directory contains one file, numbers.txt
, and two directories:
populations
and writing
containing various files.
For our first command,
let’s run find .
(remember to run this command from the shell-lesson-data/exercise-data
folder).
$ find .
.
./numbers.txt
./populations
./populations/bowerbird.txt
./populations/dunnock.txt
./populations/python.txt
./populations/shark.txt
./populations/six-species.csv
./populations/toad.txt
./populations/wildcat.txt
./writing
./writing/haiku.txt
./writing/LittleWomen.txt
As always, the .
on its own means the current working directory,
which is where we want our search to start.
find
’s output is the names of every file and directory
under the current working directory.
This can seem useless at first but find
has many options
to filter the output and in this lesson we will discover some
of them.
The first option in our list is
-type d
which means ‘things that are directories’.
Sure enough, find
’s output is the names of the three directories (including .
):
$ find . -type d
.
./populations
./writing
Notice that the objects find
finds are not listed in any particular order.
If we change -type d
to -type f
,
we get a listing of all the files instead:
$ find . -type f
./numbers.txt
./populations/bowerbird.txt
./populations/dunnock.txt
./populations/python.txt
./populations/shark.txt
./populations/six-species.csv
./populations/toad.txt
./populations/wildcat.txt
./writing/haiku.txt
./writing/LittleWomen.txt
Now let’s try matching by name:
$ find . -name *.txt
./numbers.txt
We expected it to find all the text files,
but it only prints out ./numbers.txt
.
The problem is that the shell expands wildcard characters like *
before commands run.
Since *.txt
in the current directory expands to ./numbers.txt
,
the command we actually ran was:
$ find . -name numbers.txt
find
did what we asked; we just asked for the wrong thing.
To get what we want,
let’s do what we did with grep
:
put *.txt
in quotes to prevent the shell from expanding the *
wildcard.
This way,
find
actually gets the pattern *.txt
, not the expanded filename numbers.txt
:
$ find . -name "*.txt"
./numbers.txt
./populations/bowerbird.txt
./populations/dunnock.txt
./populations/python.txt
./populations/shark.txt
./populations/toad.txt
./populations/wildcat.txt
./writing/haiku.txt
./writing/LittleWomen.txt
Listing vs. Finding
ls and find can be made to do similar things given the right options, but under normal circumstances, ls lists everything it can, while find searches for things with certain properties and shows them.
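For example, from the exercise-data directory, ls *.txt lists only the matching files in the current directory, whereas find . -name "*.txt" (shown above) also descends into subdirectories:
$ ls *.txt
numbers.txt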
As we said earlier,
the command line’s power lies in combining tools.
We’ve seen how to do that with pipes;
let’s look at another technique.
As we just saw,
find . -name "*.txt"
gives us a list of all text files in or below the current directory.
How can we combine that with wc -l
to count the lines in all those files?
The simplest way is to put the find
command inside $()
:
$ wc -l $(find . -name "*.txt")
5 ./numbers.txt
3 ./populations/bowerbird.txt
11 ./populations/dunnock.txt
1 ./populations/python.txt
18 ./populations/shark.txt
20 ./populations/toad.txt
4 ./populations/wildcat.txt
11 ./writing/haiku.txt
21022 ./writing/LittleWomen.txt
21095 total
When the shell executes this command,
the first thing it does is run whatever is inside the $()
.
It then replaces the $()
expression with that command’s output.
Since the output of find
is the nine filenames ending in .txt
– ./numbers.txt
, ./populations/bowerbird.txt
,
./populations/dunnock.txt
, and so on – the shell constructs the command:
$ wc -l ./numbers.txt ./populations/bowerbird.txt ./populations/dunnock.txt ./populations/python.txt ./populations/shark.txt ./populations/toad.txt ./populations/wildcat.txt ./writing/haiku.txt ./writing/LittleWomen.txt
which is what we wanted.
This expansion is exactly what the shell does when it expands wildcards like *
and ?
,
but lets us use any command we want as our own ‘wildcard’.
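As a small illustration of the same idea, run from the exercise-data directory, we can embed one command’s output inside another command’s arguments:
$ echo There are $(find . -type f | wc -l) files here
There are 10 files here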
It’s very common to use find
and grep
together.
The first finds files that match a pattern;
the second looks for lines inside those files that match another pattern.
Here, for example, we can find txt files that contain the word “searching”
by looking for the string ‘searching’ in all the .txt
files in the current directory:
$ grep "searching" $(find . -name "*.txt")
./writing/haiku.txt:With searching comes loss
./writing/LittleWomen.txt:sitting on the top step, affected to be searching for her book, but was
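If we only want to know which files contain a match, rather than see the matching lines themselves, grep’s -l (‘files with matches’) option prints just the file names:
$ grep -l "searching" $(find . -name "*.txt")
./writing/haiku.txt
./writing/LittleWomen.txt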
Matching and Subtracting
The -v option to grep inverts pattern matching, so that only lines which do not match the pattern are printed. Given that, which of the following commands will find all .txt files in populations except toad.txt? Once you have thought about your answer, you can test the commands in the shell-lesson-data/exercise-data directory.
find populations -name "*.txt" | grep -v toad
find populations -name *.txt | grep -v toad
grep -v "toad" $(find populations -name "*.txt")
- None of the above.
Solution
Option 1 is correct. Putting the match expression in quotes prevents the shell expanding it, so it gets passed to the find command.
Option 2 would also work, but only if there were no *.txt files in the current directory: in that case, the shell tries to expand *.txt, finds no match, and passes the wildcard expression unchanged to find. (We first encountered this in Episode 3.)
Option 3 is incorrect because it searches the contents of the files for lines which do not match ‘toad’, rather than searching the file names.
Binary Files
We have focused exclusively on finding patterns in text files. What if your data is stored as images, in databases, or in some other format?
A handful of tools extend
grep
to handle a few non-text formats. A more general approach, though, is to convert the data to text, or to extract the text-like elements from the data. Alternatively, we might recognize that the shell and text processing have their limits, and use another programming language.
find
Pipeline Reading Comprehension
Write a short explanatory comment for the following shell script:
wc -l $(find . -name "*.csv") | sort -n
Solution
- Find all files with a .csv extension recursively from the current directory
- Count the number of lines each of these files contains
- Sort the output from step 2 numerically
Key Points
find
finds files with specific properties that match patterns.
grep
selects lines in files that match patterns.
--help
is an option supported by many Bash commands, and programs that can be run from within Bash, to display more information on how to use these commands or programs.
man [command]
displays the manual page for a given command.
$([command])
inserts a command’s output in place.
Transferring Files
Overview
Teaching: 5 min
Exercises: 0 minQuestions
How do I use wget to transfer a file?
Objectives
Learn what wget is
Use wget to transfer a remote file to your local computer
Many of us use remote repositories with git
(e.g. GitHub). This allows us to download code or
other files written by colleagues.
What about files that do not exist in a git repository? If we wish to download online files from the shell, we can use tools such as Wget.
Wget
Wget is a simple tool developed for the GNU Project that downloads files over the HTTP, HTTPS and FTP protocols. It is widely used on Unix-like systems and is available with most Linux distributions. (However, it is not part of Git Bash for Windows.)
To download this lesson (located at https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html) from the Web, we can simply type:
$ wget https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html
We will see something similar to this:
--2023-05-15 08:08:03-- https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html
Resolving edcarp.github.io (edcarp.github.io)... 185.199.110.153, 185.199.108.153, 185.199.109.153, ...
Connecting to edcarp.github.io (edcarp.github.io)|185.199.110.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15409 (15K) [text/html]
Saving to: ‘index.html’
index.html 100%[=================================================>] 15.05K --.-KB/s in 0.001s
2023-05-15 08:08:03 (20.5 MB/s) - ‘index.html’ saved [15409/15409]
We can then view or edit index.html
using the usual tools, for example cat
, nano
or head
. We can also double-click it in our laptop GUI to open it in a Web browser.
Other commands
Alternatively, we can use curl
, which supports a much larger range of protocols, including common mail-based protocols like POP3 and SMTP. Or we might use lftp
.
Please refer to the man pages by typing man wget
, man curl
, and man lftp
in the shell for more information.
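For example, a roughly equivalent download using curl might look like this; its -O option saves the file under its remote name:
$ curl -O https://edcarp.github.io/shell-intermediate-esces/05-file-transfer/index.html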
Applications
wget
, curl
and lftp
allow a degree of automation that is not possible using a Web browser.
Possible applications include:
- Downloading large files direct to a server (rather than via your laptop)
- Mirroring a Web site
- Automatically updating local data files to keep them up to date with a top copy online.
Key Points
There are multiple ways to copy remote files at the command-line.
Compared to using a Web browser to access and save files, these allow greater opportunities for automation.
Working Remotely
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How do I use ‘
ssh
’ and ‘scp
’ ?Objectives
Learn what SSH is
Learn how to work remotely using
ssh
and scp
What if we want to run some commands on another machine, such as the server in the basement that manages our database of experimental results? To do this, we have to first log in to that machine. We call this a remote login.
Once our local client is connected to the remote server, everything we type into the client is passed on, by the server, to the shell running on the remote computer. That remote shell runs those commands on our behalf, just as a local shell would, then sends back output, via the server, to our client, for our computer to display.
The SSH protocol uses encryption to ensure that outsiders can’t see what’s in the messages going back and forth between different computers.
The remote login server which accepts connections from client programs
is known as the SSH daemon, sshd
.
The client program we use to login remotely is
the secure shell
or ssh
.
The ssh
login client has a companion program called scp
,
which allows us to copy files to or from a remote computer using the same kind of encrypted connection.
A remote login using ssh
Depending on security settings on the server and network, we may have to be connected to the local network; or if working remotely, we may have to use the institution’s VPN.
Then, we issue the command ssh username@computer
,
which tries to make a connection to the SSH daemon running on the remote computer we have specified.
Typing exit
, or Control-D on an empty line,
terminates the remote shell.
In the example below,
the remote machine’s command prompt is moon>
instead of just $
.
To make it clearer which machine is doing what,
we’ll indent the commands sent to the remote machine
and their output.
$ pwd
/c/Users/nelle/Desktop
$ ssh nelle@moon.euphoric.edu
Password: ********
Assuming this connection works (this specific example will not!), any commands we issue,
for example hostname
, pwd
, ls
or some scientific analysis software, will run on the
remote server. This will continue until we exit the remote shell, for example:
moon> exit
pwd
confirms we are now running commands on the local computer again (not the remote server):
$ pwd
/c/Users/nelle/Desktop
Copying files to, and from a remote machine using scp
To copy a file, we specify the source and destination paths, either of which may include computer names.
For example, this command might be used to copy our latest results to Nelle’s backups
directory of server backupserver.euphoric.edu
,
printing out its progress as it does so:
$ scp results.dat nelle@backupserver.euphoric.edu:backups/results-2023-05-16.dat
Password: ********
results.dat 100% 9 1.0 MB/s 00:00
Note the colon :
, separating the hostname of the server from the pathname of
the file we are copying to.
Copying a whole directory between machines uses the same syntax as the cp
command:
we just use the -r
option to signal that we want the copy to be recursive.
For example,
this command copies all of our results from the backups
directory on the backupserver.euphoric.edu
server to our laptop:
$ scp -r nelle@backupserver.euphoric.edu:backups ./backups
Password: ********
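To copy a single file back from the server into our current directory, we simply swap the source and destination, for example:
$ scp nelle@backupserver.euphoric.edu:backups/results-2023-05-16.dat .
Password: ********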
Key Points
SSH is a secure means to access a remote Linux computer
The ‘ssh’ and ‘scp’ utilities are secure alternatives to walking over to a machine, logging into it, and copying files off it
‘ssh’ and ‘scp’ are essential for using remote Linux servers
Permissions
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How do I view and change file and directory permissions?
Objectives
Understand what file and directory permissions are
View the permissions of a file or directory
Change permissions on a file or directory
File/directory permissions in Windows are different
Unix controls who can read, modify, and run files using permissions.
We’ll discuss how Windows handles permissions at the end of the section: the concepts are similar, but the rules are different. (The examples in this lesson do work with Git Bash for Windows. But where control of permissions is crucial under Windows, further effort may be required.)
Every user has a unique user name and an integer user ID.
Users can belong to any number of groups. Each group has a unique name and an integer group ID.
Every file and directory on a Unix computer belongs to one owner and one group.
For each file on the system, every user falls into one of three categories: the owner of the file, someone in the file’s group, and everyone else.
For each of these three categories, the computer keeps track of whether people in that category can read the file; write to the file; or run the file as a program. (In the case of directories, the latter has a different meaning - whether people in that category can search the directory.)
For example, if a file had the following set of permissions:
| | user | group | all |
|---|---|---|---|
| read | yes | yes | no |
| write | yes | no | no |
| execute | no | no | no |
it would mean that:
- the file’s owner can read and write it, but not run it;
- other people in the file’s group can read it, but not modify it or run it; and
- everybody else can do nothing with it at all.
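In the compact notation used by ls -l (explained below), this particular set of permissions would appear as:
-rw-r-----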
Permissions are revealed with ls -l
, for example in the exercise-data
directory:
$ ls -l
total 5
-rw-r--r-- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt
drwxr-xr-x 1 nelle 1049089 0 Mar 21 09:06 populations/
drwxr-xr-x 1 nelle 1049089 0 Feb 16 17:40 writing/
On the right side, we have the files’ names. On the left, we see permissions.
Let’s have a closer look at one of those permission strings:
-rw-r--r--
.
The first character tells us what type of thing this is:
‘-‘ means it’s a regular file,
while ‘d’ means it’s a directory.
The next three characters tell us what permissions the file’s owner has.
Here, the owner can read and write the file: rw-
.
The middle triplet shows us the group’s permissions.
If the permission is turned off, we see a dash, so r--
means “read, but not write or execute”.
The final triplet shows us what everyone who isn’t the file’s owner, or in the file’s group, can do.
In this case, it’s r-- again, so everyone on the system can look at the file’s contents but do nothing else.
To change permissions, we use the chmod
command
(whose name stands for “change mode”).
In research contexts, removing write permission can help reduce the chance of accidental damage to important files.
This command removes write permission on numbers.txt
for all users:
$ chmod -w numbers.txt
$ ls -l numbers.txt
-r--r--r-- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt
Now, all users of the system can read the file but nothing else.
To restore write permission for the user who owns the file (only), the command is:
$ chmod u+w numbers.txt
The ‘u’ signals that we’re changing the privileges
of the user (i.e., the file’s owner),
and +w
means we should add write permission.
A quick ls -l
shows us that it worked,
because the owner’s permissions are now set to read and write:
-rw-r--r-- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt
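We could go further and remove read permission for everyone outside the owner and the file’s group, for example:
$ chmod o-r numbers.txt
$ ls -l numbers.txt
-rw-r----- 1 nelle 1049089 18 Feb 16 17:40 numbers.txt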
What about Windows?
Those are the basics of permissions on Unix. As we said at the outset, though, things work differently on Windows. There, permissions are defined by access control lists, or ACLs. An ACL is a list of pairs, each of which combines a “who” with a “what”. For example, you could give a user permission to append data to a file without giving permission to read or delete it.
This is more flexible than the Unix model, but it’s also more complex to administer and understand on small systems.
Key Points
File permissions describe who and what can read, write, modify, and access a file.
Use ls -l to view the permissions for a specific file.
Use chmod to change permissions on a file or directory.
Shell Variables
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How are variables set and accessed in the Unix shell?
Objectives
Understand how variables are implemented in the shell
Explain how the shell uses the PATH variable to search for executables
Read the value of an existing variable
Create new variables and change their values
The shell is just a program, and like other programs, it has variables. Those variables control its execution, so by changing their values you can change how the shell and other programs behave.
Let’s start by running the command set
and looking at some of the variables in a typical shell session:
$ set | less
What you see is highly system-dependent. As an example, the start of the output might look something like this:
ACLOCAL_PATH=/mingw64/share/aclocal:/usr/share/aclocal
ALLUSERSPROFILE='C:\ProgramData'
APPDATA='C:\Users\nelle\AppData\Roaming'
BASH=/usr/bin/bash
BASHOPTS=cmdhist:complete_fullquote:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:login_shell:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="4" [1]="4" [2]="23" [3]="1" [4]="release" [5]="x86_64-pc-msys")
BASH_VERSION='4.4.23(1)-release'
COLUMNS=80
COMMONPROGRAMFILES='C:\Program Files\Common Files'
COMPLETION_PATH='C:/Program Files/Git/mingw64/share/git/completion'
COMPUTERNAME=MY-COMPUTER
COMP_WORDBREAKS=$' \t\n"\'@><=;|&(:'
COMSPEC='C:\WINDOWS\system32\cmd.exe'
CONFIG_SITE=/etc/config.site
CommonProgramW6432='C:\Program Files\Common Files'
Every variable has a name. By convention, variables that are always present are given upper-case names. All shell variables’ values are strings.
Some variables (like PATH
) store lists of values. To see the value stored in your PATH
variable, do:
$ echo $PATH
In the case of PATH
, the convention is to use a colon ‘:’ as a separator.
If a program wants the individual elements of such a list,
it’s the program’s responsibility to split the variable’s string value into pieces.
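One way to inspect the elements yourself is to swap each colon for a newline with tr, which prints the search path one directory per line:
$ echo $PATH | tr ':' '\n'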
The PATH
Variable
Let’s have a closer look at that PATH
variable.
Its value defines the shell’s search path,
i.e., the list of directories that the shell looks in for runnable programs
when you type in a program name without specifying what directory it is in.
The rule it uses is simple:
the shell checks each directory in the PATH
variable in turn,
looking for a program with the requested name in that directory.
As soon as it finds a match, it stops searching and runs the program.
To show how this works,
here are the components of the above PATH
listed one per line:
/c/Users/nelle/bin
/mingw64/bin
/usr/local/bin
/usr/bin
/bin
/mingw64/bin
/usr/bin
/c/Users/nelle/bin
/c/Program Files (x86)/Common Files/Oracle/Java/javapath
/c/Program Files (x86)/Common Files/Intel/Shared Libraries/redist/intel64/compiler
/c/WINDOWS/system32
/c/WINDOWS
/c/WINDOWS/System32/Wbem
/c/WINDOWS/System32/WindowsPowerShell/v1.0
/c/WINDOWS/System32/OpenSSH
/cmd
/c/Users/nelle/AppData/Local/anaconda3
/c/Users/nelle/AppData/Local/anaconda3/Library/mingw-w64/bin
/c/Users/nelle/AppData/Local/anaconda3/Library/usr/bin
/c/Users/nelle/AppData/Local/anaconda3/Library/bin
/c/Users/nelle/AppData/Local/anaconda3/Scripts
/c/Users/nelle/AppData/Local/Microsoft/WindowsApps
/c/Users/nelle/AppData/Local/Programs/Microsoft VS Code/bin
/usr/bin/vendor_perl
/usr/bin/core_perl
Let’s say there are two programs called analyze
,
in two different directories:
/bin/analyze
and
/c/Users/nelle/bin/analyze
.
Since the shell searches the directories in the order they’re listed in PATH
,
it finds /c/Users/nelle/bin/analyze
first and runs that.
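To check which program the shell would actually run, we can ask type (or which); with the hypothetical analyze programs above, the result might look like:
$ type analyze
analyze is /c/Users/nelle/bin/analyze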
Showing the Value of a Variable
Let’s show the value of the variable HOME
:
$ echo HOME
HOME
That just prints “HOME”, which isn’t what we wanted. Let’s try this instead:
$ echo $HOME
/c/Users/nelle
The dollar sign tells the shell that we want the value of the variable rather than its name.
This works just like wildcards:
the shell does the replacement before running the program we’ve asked for.
Thanks to this expansion, what we actually run is echo /c/Users/nelle
,
which displays the right thing.
Creating and Changing Variables
Creating a variable is easy—we just assign a value to a name using “=”, without any spaces either side:
$ SECRET_IDENTITY=Daniel
$ echo $SECRET_IDENTITY
Daniel
To change the value, just assign a new one:
$ SECRET_IDENTITY=Chris
$ echo $SECRET_IDENTITY
Chris
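A variable created this way is visible only to the current shell. As a brief sketch, exporting it also makes it available to programs started from this shell:
$ export SECRET_IDENTITY
$ bash -c 'echo $SECRET_IDENTITY'
Chris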
If we want to set some variables automatically every time we run a shell,
they can go in files called .bashrc
and/or .bash_profile
in our home directory.
(The ‘.’ character at the front prevents ls
from listing this file
unless we specifically ask it to using -a
.)
For example, this provides a mechanism to add custom software to the search path so it can be run like any other command.
Key Points
The PATH variable defines the shell’s search path.
Variables are assigned using “=” and recalled using the variable’s name prefixed by “$”.