Command line basics
Basic proficiency with the Unix shell is essential for anyone who wants to start doing computational work outside of their laptop and Excel. Unix shells provide an interface for interacting with Unix-like (macOS, Linux, etc.) operating systems and a scripting language for controlling the system. The Unix philosophy is a set of software engineering norms and concepts that guide how the tools of the Unix shell interact with one another. Learning a few of these command line tools, and how they can be strung together into what are called “pipelines”, is a powerful skill for developing quick and composable bioinformatics programs. Here, we’ll describe some essential commands to get you started using the command line.
Accessing the terminal
First, you’ll have to open the Terminal application. If you’re on macOS, the quickest way to access your terminal is to press “command + space”, type “terminal”, and press Enter. On Windows, you’ll have to install Windows Subsystem for Linux, which will allow you to interact with a Linux system (Ubuntu by default).
Once you’ve opened the terminal app, you’re ready to start typing commands at the command line.
Where am I?
The first command you should know is pwd. pwd will print your current working directory, which shows where you currently are in the file system. For example, if I open a terminal window in my “Downloads” directory, type the following, and hit Enter:
pwd
It will return
/home/gennaro/Downloads
indicating that I am in my “Downloads” directory.
Listing files
Now that I’m in my “Downloads” directory I want to see what files I’ve downloaded. To do this, I can use the ls command to list the files in the directory.
ls
which returns:
BDNF-data.tsv CORI_Candidate_SNP_draft_250528_clean.docx differential-expression2.tsv
Your “Downloads” directory will of course have different files. If I need to display more information about these files, such as when they were last modified or how large they are, I can supply the ls command with arguments.
For example
ls -lah
Returns
total 15M
drwxr-xr-x 2 gennaro gennaro 4.0K Jun 1 14:52 .
drwxr-x--- 51 gennaro gennaro 4.0K Jun 1 09:34 ..
-rw-rw-r-- 1 gennaro gennaro 6.8K May 30 18:00 BDNF-data.tsv
-rw-rw-r-- 1 gennaro gennaro 973K May 30 17:34 CORI_Candidate_SNP_draft_250528_clean.docx
-rw-rw-r-- 1 gennaro gennaro 14M May 30 12:54 differential-expression2.tsv
This provides information about the file permissions, the file sizes, and when the files were last modified.
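The letters in -lah are three separate flags that can also be used on their own. As a rough sketch of what each one contributes, the block below builds a scratch directory with mktemp so that nothing in “Downloads” is touched; the file names are made up for illustration.

```shell
# Build a throwaway directory with one visible and one hidden file.
demo=$(mktemp -d)
printf 'gene\tcount\n' > "$demo/counts.tsv"
touch "$demo/.hidden-notes"

ls "$demo"        # names only; hidden files (leading dot) are skipped
ls -a "$demo"     # -a: include hidden files, plus . and ..
ls -l "$demo"     # -l: long format (permissions, owner, size, modification time)
ls -lh "$demo"    # -h: with -l, print sizes in human-readable units (K, M, G)

# Capture two of the listings so they can be compared.
visible=$(ls "$demo")
all=$(ls -a "$demo")
```

The hidden file only shows up in the -a listing, which is why the example output above includes the . and .. entries.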
Learning more about a command
To learn more about the arguments available to any command line program you run, you can use the man (short for manual) command. This command will open the user manual for the given command.
Try typing
man ls
to view all of the options available when listing files with ls.
Moving around
Let’s say I want to move from my “Downloads” directory to my “Documents” directory. The command I have to use is cd, short for “change directory”. We can use the cd command with the target directory as its argument. For example, to move to my “Documents” directory:
cd /home/gennaro/Documents
Directory shortcuts
The shell has a few shortcuts that make moving around a little easier. Running cd without any arguments will bring you back to your home directory.
cd
In Bash, there is also a shortcut for the path to your home directory. You can use ~ in place of your own home directory (“/home/gennaro” in my case, not “/home” itself). For example, to move into my “Documents” folder I can use
cd ~/Documents
instead of typing the full path. To go up one level in the directory tree you can use .. (two dots). So to go from my “Documents” directory ‘up’ into my home directory I can use
cd ..
Finally, to go back to the same directory that you were just in you can use
cd -
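To see these shortcuts working together, here is a small sketch that walks around a scratch directory tree. The folder names (projects, rnaseq) are invented for the example, and pwd is called after each move so you can follow along.

```shell
# Scratch tree so the example does not depend on your real folders.
demo=$(mktemp -d)
mkdir -p "$demo/projects/rnaseq"

cd "$demo/projects/rnaseq"   # absolute path: works from anywhere
pwd
cd ..                        # up one level, into projects
here=$(basename "$PWD")
cd rnaseq                    # relative path: resolved from the current directory
pwd
cd - > /dev/null             # back to the previous directory (cd - also prints where it lands)
back=$(basename "$PWD")
pwd
```

Both here and back end up holding “projects”: the first because of cd .., the second because cd - returned to the directory visited just before rnaseq.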
Creating files
You can create files with the touch command. For example, to create an empty file in my “Downloads” directory called “A.txt” I can run
touch ~/Downloads/A.txt
Redirection
I’ll add some content to this file using the echo command. echo simply prints its arguments back out to the terminal. I’ll also use what is called redirection to append the output of the echo command to the text file.
Redirection is a core concept on the Unix command line. It allows you to send the output of a program somewhere other than the terminal, such as into a file or into another program. In this example, I’ll take the output from echo and redirect it to the file “A.txt” that we just created.
echo "This is a new line in the file" >> ~/Downloads/A.txt
echo "Here is another new line in the file" >> ~/Downloads/A.txt
The >> operator took the output of the echo command and added it as a new line at the end of “A.txt”. Importantly, >> appended these lines to “A.txt”. If I were instead to use > like
echo "This will replace the current contents of A.txt" > ~/Downloads/A.txt
“A.txt” will be overwritten with the new contents. The final essential operator is the pipe, written |. The pipe lets you take the output from one program and use it as input to another. I’ll show an example of this later.
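The difference between the operators is easiest to see by counting lines after each step. This is a minimal sketch on a scratch file (notes.txt is a made-up name); it also uses <, which is not covered above and reads a file as a program’s input.

```shell
demo=$(mktemp -d)
log="$demo/notes.txt"

echo "first line"  >  "$log"   # > creates the file (or empties an existing one first)
echo "second line" >> "$log"   # >> adds to the end
echo "third line"  >> "$log"
appended=$(wc -l < "$log")     # < feeds the file to wc on standard input

echo "start over" > "$log"     # > again: the three earlier lines are gone
overwritten=$(cat "$log")

# A pipe connects two programs directly, with no file in between.
piped=$(echo "start over" | tr 'a-z' 'A-Z')
echo "$appended lines, then: $overwritten, piped: $piped"
```

After the three echo commands the file holds three lines; after the final > it holds only “start over”.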
Displaying the content of files
The simplest way to display the contents of a file on the command line is by using the cat command. The cat command is actually designed to concatenate files together, but running it on a single file will print the entire contents of the file to the command line. For example, to print the contents of “A.txt”
cat ~/Downloads/A.txt
Will print
This will replace the current contents of A.txt
to the console. If you have a lot of text to display, cat can flood the screen with too much information. Instead, you can use the less command. less shows the contents of the file one screen at a time. You can use the d key to scroll down half a screen, the u key to scroll up half a screen, and the q key to quit.
Another way to display only some of the contents of a file is to use the head or tail commands. head -n10 will print the first 10 lines of a file, whereas tail -n10 can be used to print the last 10 lines of a file.
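A quick way to try head and tail is on a file whose contents are predictable. The sketch below uses seq to write the numbers 1 to 100, one per line, into a scratch file (numbers.txt is a made-up name).

```shell
demo=$(mktemp -d)
seq 1 100 > "$demo/numbers.txt"          # 100 lines: 1, 2, ..., 100

first=$(head -n 3 "$demo/numbers.txt")   # lines 1 to 3
last=$(tail -n 3 "$demo/numbers.txt")    # lines 98 to 100

# Chain the two to pull out a single line:
# take the first 42 lines, then keep only the last of those.
line42=$(head -n 42 "$demo/numbers.txt" | tail -n 1)

echo "$first"
echo "$last"
echo "$line42"
```

Chaining head into tail like this is a common way to peek at one specific line of a large file without opening it.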
Copying files
You can copy a file using the cp command. For example, to copy the “A.txt” file into a new file “B.txt” I can use
cp ~/Downloads/A.txt ~/Downloads/B.txt
To copy an entire directory you need to supply the -r, or recursive, argument to the cp command. For example, to create a copy of my “Downloads” directory inside of my “Documents” directory
cp -r ~/Downloads ~/Documents/Downloads-copy
Moving and renaming files
The mv command can be used to move files and rename them. For example, to move the “A.txt” file from my “Downloads” directory into my “Documents” directory I can use
mv ~/Downloads/A.txt ~/Documents
If I now want to change the name of that file I can also use the mv command. This time, specify the new file name instead of the directory to move the file to
mv ~/Documents/A.txt ~/Documents/C.txt
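Whether mv moves or renames depends only on what the last argument is. The following sketch shows cp and mv side by side in a scratch directory; inbox and archive are made-up folder names standing in for “Downloads” and “Documents”.

```shell
demo=$(mktemp -d)
mkdir "$demo/inbox" "$demo/archive"
echo "sample" > "$demo/inbox/A.txt"

cp "$demo/inbox/A.txt" "$demo/inbox/B.txt"       # copy: A.txt and B.txt both exist
mv "$demo/inbox/A.txt" "$demo/archive"           # last argument is a directory: move, keep the name
mv "$demo/archive/A.txt" "$demo/archive/C.txt"   # last argument is a file name: rename
cp -r "$demo/inbox" "$demo/inbox-copy"           # -r copies the directory and everything in it

ls "$demo/inbox" "$demo/archive" "$demo/inbox-copy"
```

Both cp and mv silently overwrite an existing destination file. Adding the -i flag to either one makes it ask before overwriting.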
Making new directories
To make a new directory you can use the mkdir command. To make a new directory inside of my “Downloads” directory I can use
mkdir ~/Downloads/textfiles
By default, the mkdir command won’t create a nested directory if its parent directories don’t exist yet. To enable this, pass the -p flag. For example, I can create a parent folder and a subfolder in one step using
mkdir -p ~/Downloads/imagefiles/jpegs
Shell expansion
Another useful trick is shell expansion. Before running a command, the shell ‘expands’ certain patterns in the arguments into longer lists of words. Expanding a comma-separated list in braces, for example, can be a shortcut when creating new project directories. For example
mkdir -p data doc scripts results/{figures,data-files,rds-files}
The shell expands results/{figures,data-files,rds-files} before mkdir runs, turning this command into
mkdir -p data doc scripts results/figures results/data-files results/rds-files
Which saves some typing. Shell expansion can also be used in other contexts. For example, I can create 260 empty text files using the following command
touch ~/Downloads/textfiles/{A..Z}{1..10}.txt
Another useful shell expansion is *, which matches any run of characters in existing file names. For example, if I needed to display the contents of each of the files we just created I could run
cat ~/Downloads/textfiles/*.txt
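Because the shell performs expansion before the command runs, you can preview the result by putting echo in front. A small sketch, assuming Bash (brace expansion is not part of plain POSIX sh) and using a scratch directory with fewer files than the 260 above:

```shell
# echo prints its arguments after the shell has already expanded them.
dirs=$(echo results/{figures,data-files,rds-files})
echo "$dirs"

files=$(echo {A..C}{1..2}.txt)
echo "$files"

# * matches existing file names, so it needs real files to expand against.
demo=$(mktemp -d)
touch "$demo"/{A..C}{1..2}.txt
count=$(ls "$demo"/*.txt | wc -l)        # all six files
a_only=$(cd "$demo" && echo A*.txt)      # only names starting with A
echo "$count files, starting with A: $a_only"
```

Brace expansion generates words whether or not matching files exist, whereas * only ever produces names of files that are already there.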
Removing files
Removing files on the command line can be done with the rm command. Unlike deleting files in a GUI, there is no trash folder to restore from: files removed with rm are gone, so use rm wisely. To remove one of the empty files I just created I can use
rm ~/Downloads/textfiles/A1.txt
If I want to remove the entire “textfiles” directory I can use the -r, or recursive, flag with rm.
rm -r ~/Downloads/textfiles
Be careful when using rm. A single misplaced space can mean removing entire directory trees by mistake!
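To make the danger of a stray space concrete, the sketch below counts how many arguments a command receives when a path containing a space is quoted versus unquoted. count_args is a helper invented for this example, and the directory names are made up.

```shell
demo=$(mktemp -d)
mkdir "$demo/old results" "$demo/old"
target="$demo/old results"

# count_args (hypothetical helper) reports how many arguments it was given.
count_args() { echo $#; }

unquoted=$(count_args $target)     # the space splits the path into more than one argument
quoted=$(count_args "$target")     # quotes keep the path together as one argument
echo "unquoted: $unquoted, quoted: $quoted"

# Preview with ls before deleting, then remove exactly what was listed.
ls -d "$target"
rm -r "$target"
```

Had the rm line been written without quotes, rm would have been asked to delete the separate “old” directory as well. Quoting paths and listing them with ls first are cheap habits, and rm -i, which asks for confirmation before each removal, is another safety net.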
Finding files
One incredibly useful but often overlooked command line tool is find. find does exactly what you expect it to do: it finds files and folders. find has many arguments, but the simplest usage is finding files whose names match a pattern. For example, to find all text (.txt) files in the current directory and all of its subdirectories you can use:
find . -name "*.txt" -type f
This command says, “starting from the current directory (.), find any file (-type f) whose name ends in ‘.txt’”. find is especially powerful when combined with the -exec argument, which runs a command on each match. For example, to remove all .txt files in a directory and its subdirectories you can use:
find . -name "*.txt" -type f -exec rm {} \;
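Since -exec rm deletes without asking, it can help to rehearse the command first. One possible approach, sketched here on a scratch directory with made-up file names, is to list the matches, then run the same command with echo in front of rm, and only then run it for real.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/raw/batch1"
touch "$demo/notes.txt" "$demo/raw/reads.fastq" "$demo/raw/batch1/log.txt"

# Step 1: list the matches.
matches=$(find "$demo" -name "*.txt" -type f | sort)
echo "$matches"

# Step 2: dry run. "echo rm" prints each command instead of running it.
find "$demo" -name "*.txt" -type f -exec echo rm {} \;

# Step 3: once the list looks right, run the real thing.
find "$demo" -name "*.txt" -type f -exec rm {} \;
remaining=$(find "$demo" -type f | wc -l)
echo "files left: $remaining"
```

In the -exec clause, {} stands for the current match and \; marks the end of the command to run. Only reads.fastq is left afterwards, including in the nested batch1 folder.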
Downloading files
curl and wget are both command line utilities for downloading files from remote servers. By default, curl streams what it downloads to your terminal, whereas wget saves it to a file.
curl https://www.gutenberg.org/cache/epub/100/pg100.txt
wget https://www.gutenberg.org/cache/epub/100/pg100.txt
Searching the contents of files
grep is a tool that’s used to search the contents of files for specific text patterns. For example, if you wanted to find every line in a text file that contains the text “the” (including inside longer words such as “other”) you could use:
grep "the" pg100.txt
grep also has many useful arguments. One of the most useful is -c, which makes grep return the number of matching lines instead of the lines themselves. For example, to count the number of lines in a text file that contain “the”:
grep -c "the" pg100.txt
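A few other grep flags change what counts as a match. The sketch below runs them on a three-line scratch file (the sentences are made up) so the counts are easy to check by eye.

```shell
demo=$(mktemp -d)
printf '%s\n' \
    'The quick brown fox' \
    'jumps over the lazy dog' \
    'and then rests with another fox' > "$demo/sample.txt"

substring=$(grep -c "the" "$demo/sample.txt")    # lines 2 and 3 ("the", "then", "another")
whole=$(grep -cw "the" "$demo/sample.txt")       # -w: whole words only, so just line 2
anycase=$(grep -ciw "the" "$demo/sample.txt")    # -i: ignore case, so "The" on line 1 counts too
grep -n "fox" "$demo/sample.txt"                 # -n: prefix each match with its line number

echo "substring: $substring, whole word: $whole, any case: $anycase"
```

Note that -c counts matching lines, not individual matches: line 3 contains “the” twice but is counted once.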
Replacing file contents
sed 's/find/replace/' myfile.txt
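The sed command above has the form s/pattern/replacement/: on every line it substitutes the first piece of text matching the pattern with the replacement. It prints the result to the terminal and leaves “myfile.txt” itself unchanged. A minimal sketch on a made-up scratch file (regions.txt) shows the difference the trailing g flag makes:

```shell
demo=$(mktemp -d)
printf '%s\n' 'chr1 chr1 100' 'chr2 chr2 200' > "$demo/regions.txt"

first=$(sed 's/chr/chromosome/' "$demo/regions.txt")    # first match on each line only
every=$(sed 's/chr/chromosome/g' "$demo/regions.txt")   # g: every match on each line
echo "$first"
echo "$every"

# sed writes to standard output; redirect to a NEW file to keep the result.
sed 's/chr/chromosome/g' "$demo/regions.txt" > "$demo/regions-renamed.txt"
```

Redirecting back into the same file you are reading from would empty it before sed gets to read it, which is why the result goes to a different file name here.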
Piping commands
The Unix pipe is what makes the command line so powerful. You can string together small programs to build up solutions to complex problems. The pipe allows you to take the output from one program and use it as input to another program directly.
For example, suppose we wanted to find the 10 most frequently used words across all of the works of Shakespeare
curl https://www.gutenberg.org/cache/epub/100/pg100.txt | \
sed 's/[^a-zA-Z ]/ /g' | \
tr 'A-Z ' 'a-z\n' | \
grep '[a-z]' | \
sort | \
uniq -c | \
sort -nr -k1 | \
head -n10
- curl downloads the text file from Project Gutenberg and streams it to stdout
- sed replaces all characters that are not spaces or letters with spaces
- tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each ‘word’ is now on a separate line)
- grep includes only lines that contain at least one lowercase alphabetical character (removing any blank lines)
- sort sorts the list of ‘words’ into alphabetical order
- uniq counts the occurrences of each word
- sort sorts the occurrences numerically in descending order
- head shows the top 10 lines
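A good way to understand a pipeline like this is to run it on a tiny input and look at the stream partway through. The sketch below does that without downloading anything: it wraps the same stages in a function (count_words is a name made up here) and feeds it one short sentence.

```shell
# Same stages as the pipeline above, reading text from standard input.
count_words() {
    sed 's/[^a-zA-Z ]/ /g' |
    tr 'A-Z ' 'a-z\n' |
    grep '[a-z]' |
    sort |
    uniq -c |
    sort -nr -k1
}

text='The cat and the dog. The end.'

# Peek at the stream after two stages: one lowercase word per line, plus some blank lines.
echo "$text" | sed 's/[^a-zA-Z ]/ /g' | tr 'A-Z ' 'a-z\n'

# Full pipeline: the most frequent word comes out on top.
top=$(echo "$text" | count_words | head -n 1)
echo "$top"
```

Cutting the pipeline off after any | and reading the output is the usual way to debug a stage that misbehaves. Here the top line reports a count of 3 for “the”.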
Compressing and uncompressing
Bioinformatics tools and command line tools can generally work with compressed data. Compression saves disk space and speeds up transferring files over the internet. gzip is an old but commonly used compression utility that is supported by many command line tools. For example, to compress a file:
gzip pg100.txt
This replaces “pg100.txt” with a compressed version called “pg100.txt.gz”. Many Unix tools can work directly with gzipped files. For example,
zcat pg100.txt.gz
will decompress the file and print its contents to the terminal, and zgrep can be used to search the contents of a gzipped file directly, without first writing a decompressed copy to disk
zgrep -c "the" pg100.txt.gz
To decompress a file you use the ‘un’ version of the compression command, gunzip.
gunzip somefile.txt.gz
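By default gzip and gunzip swap one file for the other, which is not always what you want. As a rough sketch, the -c flag writes to standard output instead, so the original can be kept. The example uses a scratch file of numbers (numbers.txt is a made-up name).

```shell
demo=$(mktemp -d)
seq 1 1000 > "$demo/numbers.txt"

# -c sends the compressed data to standard output, so redirecting it
# keeps numbers.txt alongside the new compressed copy.
gzip -c "$demo/numbers.txt" > "$demo/numbers.txt.gz"
ls -lh "$demo"

# gzip -dc decompresses to standard output (the same idea as zcat).
first=$(gzip -dc "$demo/numbers.txt.gz" | head -n 1)

# Count the lines containing "00" without unpacking the file on disk.
hits=$(zgrep -c "00" "$demo/numbers.txt.gz")
echo "first line: $first, lines containing 00: $hits"
```

The ls -lh listing shows both files side by side, which makes it easy to compare their sizes.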
For-loops
The shell is also a scripting language and, like any scripting language, it provides some basic control flow. One of the more useful constructs is the for loop. In Bash, the for loop takes the form of a for-each loop: it runs its body once for every item in a list. The looping variable is referred to inside the loop with the $ syntax. For example, to loop through every .txt file in the current directory and append the most frequent line of each one to “top-words.txt”:
for F in *.txt; do sort "$F" | uniq -c | sort -nr -k1 | head -n1 >> top-words.txt; done
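The same loop is often easier to read spread over several lines. Here is a sketch on two small scratch files with made-up names; one name deliberately contains a space to show why the loop variable is quoted.

```shell
demo=$(mktemp -d)
printf 'b\na\nb\n' > "$demo/one.txt"
printf 'x\ny\ny\ny\n' > "$demo/two words.txt"   # a space in the name, on purpose

# "$F" is quoted so that file names containing spaces stay in one piece.
for F in "$demo"/*.txt; do
    echo "Processing $F"
    sort "$F" | uniq -c | sort -nr -k1 | head -n1 >> "$demo/top-words.out"
done

cat "$demo/top-words.out"
```

The output file here is given a name that does not end in .txt. If it did, a second run of the loop would pick up the results file as one of its inputs.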
GNU parallel
GNU parallel is a command line tool that can replace many for loops. parallel is extremely powerful and feature-rich. Importantly, it lets you run several jobs at the same time. For example, instead of writing a for loop we can process the text files above, 8 jobs at a time, with parallel
parallel --jobs 8 "sort {} | uniq -c | sort -nr -k1 | head -n1 >> top-words.txt" ::: *.txt
Editors
You’ll eventually need to edit some code or files from the terminal. Two options for editing from the command line are vim and nano. vim can be more difficult for a beginner to use but is very powerful.
vim
Once you’re in vim you can press the i key to enter “insert” mode, which lets you type in new characters. Once you’ve typed away, save your work and exit by pressing Esc, typing :wq, and pressing Enter. vim can be difficult to get used to, which led to one of the most famous Stack Overflow questions.
nano provides a more user-friendly interface.
nano
Resources
- Terminus is a fun game designed to get you comfortable navigating the command line
- OverTheWire is another game designed to teach you command line tools through the lens of a ‘hacker’
- vimtutor can be used to learn vim