Command line basics
Basic proficiency with the Unix shell is essential for anyone who wants to start doing computational work outside of their laptop and Excel. Unix shells provide an interface for interacting with Unix-like (Mac OS, Linux, etc.) operating systems and a scripting language for controlling the system. The Unix philosopy is a set of software engineering norms and concepts that guide how the tools of the Unix shell interact with one another. Learning a few of these command line tools, and how they can be strung together into what are called “pipes”, is a powerful skill for developing quick and composable bioinformatics programs. Here, we’ll describe some essential commands to get you started using the command line.
Accessing the terminal
First, you’ll have to open the Terminal application. If you’re on Mac OS, the quickest way to access your terminal is: “command + space”, typing “terminal” and pressing Enter. On Windows, you’ll have to install Windows Subsystem for Linux which will allow you to interact with a (default) Ubuntu OS.
Once you’ve opened the terminal app, you’re ready to start typing commands at the command line.
Where am I?
The first command you should know is pwd. pwd will print your current working directory. This command is used to display where you are currently in the file system. For example, if I open a terminal window in my “Downloads” directory and type and hit Enter:
pwd It will return
/home/gennaro/Downloadsindicating that I am in my “Downloads” directory.
Listing files
Now that I’m in my “Downloads” directory I want to see what files I’ve downloaded. To do this, I can use the ls command to list files in the directory.
lswhich returns:
BDNF-data.tsv  CORI_Candidate_SNP_draft_250528_clean.docx  differential-expression2.tsvYour “Downloads” directory will of course have different files. If I need to display more information about these files, such as the time that they were created or how large they are, I can supply the ls command with arguments.
For example
ls -lahReturns
total 15M
drwxr-xr-x  2 gennaro gennaro 4.0K Jun  1 14:52 .
drwxr-x--- 51 gennaro gennaro 4.0K Jun  1 09:34 ..
-rw-rw-r--  1 gennaro gennaro 6.8K May 30 18:00 BDNF-data.tsv
-rw-rw-r--  1 gennaro gennaro 973K May 30 17:34 CORI_Candidate_SNP_draft_250528_clean.docx
-rw-rw-r--  1 gennaro gennaro  14M May 30 12:54 differential-expression2.tsvWhich provides information about the file permissions, the file sizes, and when the files were created.
Learning more about a command
To learn more about what arguments are available to any of the command line programs you run, you can use the man, or manual, command. This command will open the user manual for the given command.
Try typing
man lsto view all of the options available when listing files with ls.
Moving around
Let’s say I want to move from my “Downloads” directory to my “Documents” directory. The command I have to use is cd, short for “change directory”. We can use the cd command with the argument for the target directory we want to go to. For example, to move to my “Documents” directory
cd /home/gennaro/DocumentsDirectory shortcuts
The shell has a few shortcuts that make moving around a little easier. Running cd without any arguments will bring you back into your home directory.
cdIn Bash, there is an additional shortcut to specify the “/home” as well. You can use ~ in place of “/home”. For example, to move into my “Documents” folder I can use
cd ~/Documentsinstead of typing the full path. To go up one level in the directory you can use ... So to go from my Documents directory ‘up’ into my “/home” directory I can use
cd ..Finally, to go back to the same directory that you were just in you can use
cd -Creating files
You can create files with the touch command. For example, to create an empty file in my “Downloads” directory called “A.txt” I can run
touch ~/Downloads/A.txtRedirection
I’ll add some content to this file using the echo command. echo simply prints it’s arguments back out to the terminal. I’ll also use what is called redirection to append the results of the echo command into the text file.
Redirection is a core concept in Unix pipes. It allows you to take the output from one program and use it as input to another program. In this example, I’ll take the output from echo and redirect it to the file “A.txt” that we just created.
echo "This is a new line in the file" >> ~/Downloads/A.txt
echo "Here is another new line in the file" >> ~/Downloads/A.txtThe >> took the output of the echo command and inserted it as a new line in “A.txt”. Importantly, >> appended these lines into “A.txt”. If I were instead to use > like
echo "This will replace the current contents of A.txt" > ~/Downloads/A.txt“A.txt” will be overwritten with the new contents. The final essential redirection operator is the pipe |. The pipe lets you take the output from one program and use it as input to another. I’ll show an example of this later.
Displaying the content of files
The simplest way to display the contents of a file on the command line is by using the cat command. The cat command is actually designed to concatenate file together, but running it on a single file will print the entire contents of the file to the command line. For example, to print the contents of “A.txt”
cat ~/Downloads/A.txtWill print
This will replace the current contents of A.txtto the console. If you have a lot of text that you would like to display cat can result in too much information being displayed on the screen. Instead, you can use the less command. less will print the contents of the file as pages on the screen. You can use the d key to scroll down a page, or the u key to scroll up a page.
Another way to display only some of the contents of a file is to use the head or tail commands. head -n10 will print the first 10 lines of a file, whereas tail -n10 can be used to print the last 10 lines of a file.
Copying files
You can copy a file using the cp command. For example, to copy the “A.txt” file into a new file “B.txt” I can use
cp ~/Downloads/A.txt ~/Downloads/B.txtTo copy an entire directory you need to supply the -r, or recursive, argument to the cp command. For example, to create a copy of my Downloads directory inside of my Documents directory
cp -r ~/Downloads ~/Documents/Downloads-copyMoving and renaming files
The mv command can be used to move files and rename them. For example, to move the “A.txt” file into my Documents directory I can use
mv A.txt ~/DocumentsIf I now want to change the name of that file I can also use the mv command. Now you need to specify the new file name instead of the location to move the file to
mv ~/Documents/A.txt ~/Documents/C.txtMaking new directories
To make a new directory you can use the mkdir command. To make a new directory inside of my Downloads directory I can use
mkdir ~/Downloads/textfilesBy default, the mkdir command doesn’t allow you to create nested directories. To enable this, set the mkdir -p flag. For example I can create a parent folder and subfolders using
mkdir -p ~/Downloads/imagefiles/jpegsShell expansion
Another useful trick is to learn shell expansion. Shell expansion ‘expands’ the arguments. Shell expansion can be a shortcut when creating new project directories. For example
mkdir -p data doc scripts results/{figures,data-files,rds-files}The results/{figures,data-files,rds-files} expands this command into
mkdir -p data doc scripts results/figures results/data-files results/rds-filesWhich saves some typing. Shell expansion can also be used in other contexts. For example, I can create 260 empty text files using the following command
touch ~/Downloads/textfiles/{A..Z}{1..10}.txtAnother useful shell expansion is *. For example, if I needed to display the contents of each of the file we just created I could run
cat ~/Downloads/textfiles/*.txtRemoving files
Removing files on the command line can be done with the rm command. Unlike when using a GUI, when you remove files on the command line you cannot get them back so use rm wisely. To remove one of the empty files I just created I can use
rm ~/Downloads/textfiles/A1.txtIf I want to remove the entire “textfiles” directory I can use the -r, or recursive flag with rm.
rm -r ~/Downloads/textfilesBe careful when using rm. A simple space can mean removing entire file systems by mistake!
Finding files
One incredibly useful but often overlooked command line tools is find. find does exactly what you expect it to do, it finds files and folders. find has many arguments but the simplest usage is for finding files using a specific pattern. For example, to find all text (.txt) files in a particular directory and all of its subdirectories you can use:
find . -name "*.txt" -type fThis command says, “find any file (-type f) that has a name like ‘.txt’”. find is especially powerful when combined with the -exec argument. For example, to remove all .txt file in a directory you can use:
find . -name "*.txt" -type f -exec rm {} \;Downloading files
curl and wget are both command line utilities for downloading files from remote resources. curl will download and stream the results to your terminal by default. wget will save the result to a file by default.
curl https://www.gutenberg.org/cache/epub/100/pg100.txt
wget https://www.gutenberg.org/cache/epub/100/pg100.txtSearching the contents of files
grep is a tool that’s use to search the contents of files for specific text patterns. For example, if you wanted to find every line in a text file that contains the word “the” you could use:
grep "the" pg100.txtgrep also has many useful arguments. One of the most useful is that grep can return the count of the number of lines that are returned. For example, to count the number of lines in a text file that contain the word “the”:
grep -c "the" pg100.txtReplacing file contents
sed 's/find/replace/' myfile.txtPiping commands
The Unix pipe is what makes the command line so powerful. You can string together small programs to build up solutions to complex problems. The pipe allows you to take the output from one program and use it as input to another program directly.
For example, suppose we wanted count the top 10 most frequently used words across all of the works of Shakespeare
curl https://www.gutenberg.org/cache/epub/100/pg100.txt | \
sed 's/[^a-zA-Z ]/ /g' | \
tr 'A-Z ' 'a-z\n' | \
grep '[a-z]' | \
sort | \
uniq -c | \
sort -nr -k1 | \
head -n10curldownloads the text file from Project Gutenberg and streams it to stdoutsedreplaces all characters that are not spaces or letters, with spaces.trchanges all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each ‘word’ is now on a separate line)grepincludes only lines that contain at least one lowercase alphabetical character (removing any blank lines)sortsorts the list of ‘words’ into alphabetical orderuniqcounts the occurrences of each wordsortsorts the occurrences numerically in descending orderheadshows the top 10 lines
Compressing and uncompressing
Bioinformatics and command line tools can generally work with compressed data. Data compression saves space which can be really beneficial when transferring files over the internet. gzip is an old but commonly used compression utility that is compatible with many command line utilities. To compress a file for example.
gzip pg100.txtWill produce a compressed version of the “pg100.txt” file called “pg100.txt.gz”. Many Unix tools can work directly with gzipped files. For example,
zcat pg100.txt.gzWill unzip and print the contents of the file to the terminal and zgrep can be used to directly search the contents of a gzipped file without the need to decompress the entire file
zgrep -c "the" pg100.txt.gzTo decompress a file you use the ‘un’ version of the compression command, gunzip.
gunzip somefile.txt.gzFor-loops
The command line is also a scripting language and like any scripting language, it provides some basic control flow utilities. One of the more useful of these is the basic for loop. In Bash, the for-loop takes the form of a for each loop. the looping variable can be referred to in the loop by using the $ syntax. For example, to loop through
for F in *.txt; do sort $F | uniq -c | sort -nr -k1 | head -n1 >> top-words.txt; doneGNU parallel
GNU parallel is a command line tool that takes away the need to use for-loops entirely. parallel is extremely powerful and feature filled. Importantly, it lets you run commands across multiple jobs. For example, instead of writing a for loop we can process the text files above using 8 jobs at once with parallel
parallel --jobs 8 "sort {} | uniq -c | sort -nr -k1 | head -n1 >> top-words.txt" ::: *.txtEditors
You’ll eventually need to edit some code or files from the terminal. Two options for code editing from the command line are vim and nano. vim can be more difficult to use for a beginner but is very powerful.
vimOnce you’re in vim you can use the i key to enter “input” mode. “input” mode let’s you type in new characters. Once you’ve typed away, save your work and exit with esc + :wq. vim can be difficult to get used to which lead to the most famous of StackOverflow questions: nano provides a more user friendly interface.
nanoResources
- Terminus is a fun game designed to get you comfortable navigating the command line
 - OverTheWire is another game designed to tech you command line tools through the lense of a ‘hacker’
 - vimtutor can be used to learn 
vim