Learning to copy files using the command line is one of the most difficult tasks some students will encounter during Workshop practicals.
The faculty are not forcing students to copy files using the command line out of a “that’s the way we did it” mentality, but because of our current experience. Much of the data analysis done today happens on computing clusters where the users’ only interaction with the computer is through the command line. Learning to use the command line effectively is an extremely important skill to have in the toolbox of anybody with data analysis ambitions.
This document is long because it attempts to explain things from as near to first principles as possible. You are likely familiar with many of the concepts discussed in the next sections; if so, skip ahead to the topics that will be useful.
Quick start
If you are already familiar with navigating directories and using the command line to copy files, then you should find getting started with the practicals to be straightforward.
You will create some directories to organize things, and then, in most cases, copy scripts and data out of the appropriate directory in /faculty to the directories you’ve created.
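That routine can be tried without touching anything real. The sketch below is a minimal version that rests on a few assumptions. Because /faculty only exists on the Workshop cloud nodes, it builds a hypothetical stand-in called faculty-demo inside a throwaway directory made with mktemp -d. The names faculty-demo, 2022, Day1, example.R and data.csv are all illustrative.

```shell
# Scratch space so nothing in your real home directory is touched.
# mktemp -d creates a new, empty temporary directory and prints its path.
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical stand-in for a practical directory under /faculty.
# echo '...' > file writes one line of text into a new file.
mkdir faculty-demo
mkdir faculty-demo/2022
echo 'x <- 1' > faculty-demo/2022/example.R
echo 'id,value' > faculty-demo/2022/data.csv

# Step 1: create a directory of your own to organize the practical.
mkdir Day1

# Step 2: copy the practical directory, and everything in it, into Day1.
cp -r faculty-demo/2022 Day1/

# Check what arrived.
ls Day1/2022
```

On a cloud node the same two steps would use the real practical directory under /faculty in place of faculty-demo, and ~/Day1 in place of the scratch Day1.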
Some definitions
Because this may be a completely new topic for many of the students, we’ll start with some definitions.
Computer words
- command line: The command line refers to a text-based interface with a computer. Examples include MS Windows’ Command Shell and PowerShell, and macOS’ Terminal. In this course, connections to the cloud computing environment’s command line will be made using SSH.
- cli: An abbreviation for “command line interface”. This is often used when describing programs that are run at the command line. For example, R has a cli, but there are also other ways to interact with R, such as RStudio. plink and PRSice also have a cli.
- SSH: A versatile tool for securely transferring data between computers. In this course, students may use SSH to access a command line in the cloud computing environment, or to copy files to or from the cloud computing environment.
- directory: A directory, also called a “folder”, is an organizational unit for computer files. Files exist within a directory. Directories may contain files, other directories (subdirectories), or nothing at all.
- folder: Synonym for “directory”. The two terms can be used interchangeably.
- directory you’re in: This, and other phrases involving “in”, refer to the current working directory of the command line. Commands that do not specify a different directory will act on the directory you are in. For example,
less foo
will show the contents of a file named “foo” in the directory you are in, while
less /faculty/foo
will show the contents of a file named “foo” in the /faculty directory.
- cd “change directory”: The command used to change what directory you are in. It is similar to setwd() in R.
- home directory: Each user has a “home directory” where all of their files are stored. This can be abbreviated as ~ (the tilde symbol).
- faculty directory: For convenience, all of the faculty members’ home directories are assembled in a folder called /faculty.
- subdirectory: A directory that is within another directory. All directories (except the “root” directory) are subdirectories of other directories. Usually “subdirectory” will be used when this relationship is important. For example, instructions may say “copy the ‘HW2’ directory into a subdirectory of your ‘Day1’ directory.”
- / “forward slash”: The forward slash is the Unix/Linux/macOS directory separator. When writing out directory names, the “/” is used to separate directories and subdirectories. For example, ~/Day1/HW2 refers to a subdirectory named “HW2”, which is inside a directory named “Day1”, which is inside your home directory.
- path: “Path” is used to refer to a series of directories and subdirectories. ~/Day1/HW2 is a path.
- file: An entity on a computer file system. Files may contain text, data, program instructions, or application-specific data, such as a PowerPoint slide deck.
- . “dot”: (A single period, or “dot”.) This represents the directory the command line is operating in, that is, the directory you are in.
- .. “dot dot”: (Two periods, or “dots”.) This represents the directory one level higher in the hierarchy.
- copy: The act of duplicating a file or directory from its origin to a different location or name. This action is usually done with the cp command.
- move: The act of removing a file or directory from its origin and putting it in a different location, or changing its name. This action is usually done with the mv command.
- cp “copy”: The Unix/Linux/macOS command used to copy files or directories.
- mv “move”: The Unix/Linux/macOS command used to move or rename files or directories.
- ls “list”: The Unix/Linux/macOS list command. It shows the names of files and directories, and can also show other information about them.
- less “sometimes less is more”: A general-purpose tool for looking at the contents of a text file.
- mkdir “make directory”: The command to create a directory. For example,
mkdir foo
will create an empty directory named “foo”.
- * / wild cards / globbing: These are characters that can stand in for other characters; for example, * matches any run of characters (including none). They are a powerful way to avoid typing multiple file names when an action is to be performed on several files. For example, foo.* could be used to match foo.bar, foo.baz, and foo.boz. The collective name of the characters used is “wild cards,” and the action of matching wild cards to files is called globbing.
- command line switches or options: Extra text given to a command to affect its behavior. Switches are often preceded by - or --. For example, in cp -v the “-v” is a switch to the “cp” command.
- command line arguments: The text after a command that tells the command what to operate on. For example, in cp foo bar, “foo” and “bar” are arguments to the “cp” command. Some commands may require switches before some arguments.
- ENTER or RETURN: After typing a command at the command line, the ENTER or RETURN key must be pressed to submit the command.
Display conventions in this document
Text will be shown in several different fonts and formats to express meaning.
Text in a fixed, or typewriter, font represents text on a command line: either something that the user types, or something that the computer outputs.
A screen shot of a terminal will show a sequence of command line entries and responses.
An example screen shot
Required arguments to a command will be represented by text surrounded by pointy brackets < >. For example, in
cp <source> <destination>
it is shown that some argument must be provided in the “source” and “destination” locations. When substituting real values for the arguments, the pointy brackets are not included. So the typed command would look like cp source destination, to copy the file “source” to a file named “destination”.
Optional arguments are shown with square brackets [ ]. These are arguments that are not necessary for the command to function, but may be provided by the user to achieve desired results.
Anatomy of a command line
A command line ready for input
There are several items on the default command line used at the Workshop.
- The first part is your username. In this case the example username is student.
- @ is a separator.
- Then comes the computer name. In this example it is ip-10-0-201-191, but the exact name will be different depending on which cloud node you are connected to.
- : is a separator.
- Next is the current directory path. Here it is ~, which is a shorthand for the current user’s home directory.
- $ marks the end of the prompt. Anything typed will appear after the $. Instructions later may show, for example, $ ls, which means the user has typed ls at the command line.
- The green rectangle is the cursor. Depending on your SSH client and terminal settings, the color and shape of the cursor will vary.
Putting that all together, if you see a command line showing
smith12@ip-10-0-200-233:~/day2/R-files$
that means the user smith12 is logged into the compute node ip-10-0-200-233 and is currently in the R-files subdirectory of day2, which is itself a subdirectory of their home directory.
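The ~ in that prompt is the same shorthand you can type yourself. This small check assumes a bash-like shell. echo just prints its arguments after the shell has expanded them, and HOME is the standard environment variable holding your home directory's path. The day2/R-files part is the example path from the prompt above and does not need to exist for echo to print it.

```shell
# The shell replaces a leading ~ with the path of your home directory,
# which is also stored in the standard HOME environment variable.
echo ~
echo "$HOME"

# ~/day2/R-files is the home directory path with /day2/R-files added.
# echo only prints the text; the directories do not need to exist.
echo ~/day2/R-files
```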
Looking at files and directories
The list command
ls is the command used to list the names of files and directories.
ls with output
The command ls is run at the command line, and it shows that two things are in the current directory, named “R” and “foo”.
Switches can be given to ls to have it provide more information.
ls -l with output
The “d” in drwxr-xr-x shows that the thing named “R” is a directory, and the first “-” in -rw-r--r-- shows that “foo” is a regular file. The letters following the first one have to do with permissions, and aren’t important at the moment.
Next come the owner of the file, “student”, and the group of the file, “students”. These also aren’t important for what we’re doing.
Next come the size of the file (in bytes), the date and time the file was last modified, and finally the name of the file or directory.
ls and ls -l are extremely useful for seeing what files and directories exist.
ls can be given a directory as an argument, and it will show the contents of that directory.
In all of these examples, ls shows directories in blue. That will probably be how your screen looks, but depending on which terminal and SSH client you use, directories may be shown in the same color as regular files.
ls /faculty/ with output
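The examples above can be recreated in a scratch directory. This minimal sketch assumes the same setup as the screenshots, a directory named R and a file named foo; the file contents and the R/example.R script are hypothetical.

```shell
# Scratch directory so nothing real is touched.
WORK=$(mktemp -d)
cd "$WORK"

# Recreate the example: a directory "R" and a regular file "foo".
mkdir R
echo 'some random text' > foo
echo 'x <- 1' > R/example.R

ls          # names only
ls -l       # long listing; first character is d for a directory, - for a file
ls R        # given a directory, ls lists that directory's contents
```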
Looking inside a file
The less command can be used to view the contents of a text file. Many files, such as R scripts and some data files, are just text, and can be easily viewed with less.
To view the contents of a file, run less <file>.
student@ip-10-0-200-228:~$ less foo
will show
Using less
The text shown is the contents of foo, which is literally filled with some random text. The final line, foo (END), is a status message from less. It gives the name of the file being viewed and shows the position in the file.
If the file is long enough, it can be scrolled by pressing the arrow keys.
To exit less, press the q (quit) key.
Navigating directories
Moving between directories is done using the cd “change directory” command. The syntax of the command is
cd [destination]
where the destination is the name of the directory you want to move into. The destination is optional, because running cd with no destination will return you to your home directory.
An example of cd
Entering cd R has moved the user into the “R” directory, and the command prompt has been updated to reflect this change.
A full path can be given as the argument to cd, and you will be moved to the final directory in the path.
An example of cd with a complete path
The effect is the same as using multiple cd commands.
An example of cd in separate steps
As can be seen in the previous few examples, the “/” (forward slash) character is extremely important, and it has different meanings depending on where it is in the path.
When at the start of a name, it tells the computer to look in the “root” directory for that item. For example, “/faculty” is in the “root” folder.
When in between names, it tells the computer that those are different directories or files. For example, “elizabeth/2022/corrs.csv” refers to something named “corrs.csv”, which is in the “2022” subdirectory of the “elizabeth” directory.
Leaving out a leading “/” means that you are referencing something in the directory you are currently in. For example:
Importance of the /
There is no directory called “/R”, so it is not possible to change there. An error is shown, “No such file or directory”. This error is not serious and does not cause any problems. It just means that the change directory command could not complete, and you should check for typos, a misplaced /, or other problems.
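The rules for cd and “/” can be practised in a scratch directory. This minimal sketch makes a few assumptions. The elizabeth/2022 directories are hypothetical copies of the path used above, built locally. pwd, which prints the directory you are in, is not otherwise covered in this document. The exact wording of the cd error depends on the shell (bash prints “No such file or directory”).

```shell
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical directories mirroring the elizabeth/2022 path above.
mkdir elizabeth
mkdir elizabeth/2022

# No leading /: the path starts from the directory you are in.
cd elizabeth/2022
pwd                      # pwd prints the directory you are in

# The same destination reached in separate steps.
cd "$WORK"
cd elizabeth
cd 2022
pwd

# ".." moves up one level, back to elizabeth.
cd ..
pwd

# A leading / starts at the root directory. There is (almost certainly)
# no /R, so cd reports an error and you stay where you were.
cd /R || echo "cd failed; still in $(pwd)"

# cd with no destination returns to your home directory.
cd
pwd
```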
Actually copying files
Using cp
“cp” is the primary command used to copy files at the command line.
The basic syntax
The basic syntax for cp is
cp <source> <destination>
cp creates a duplicate of the source file (or of a directory, when the -r switch is used) at the destination.
Copying the file “foo” to another file called “bar” is done with the command
cp foo bar
This will result in two identical files, foo and bar, in the current directory.
Copying a file
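A minimal sketch of that copy in a scratch directory, assuming a hypothetical file foo containing one line of text. cmp, which compares two files byte by byte, is not otherwise covered in this document. It is used here only to confirm that the copies are identical.

```shell
WORK=$(mktemp -d)
cd "$WORK"
echo 'some random text' > foo

# Copy foo to a new file named bar in the same directory.
cp foo bar

ls
# cmp prints nothing and succeeds when the two files are identical.
cmp foo bar && echo "foo and bar are identical"
```

Be aware that if a file named bar already exists, cp normally overwrites it without asking.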
cp can be combined with wild cards to copy multiple files at the same time. For example,
cp /faculty/elizabeth/2022/*.R .
will copy all of the files in /faculty/elizabeth/2022 that end in “.R” to the current directory, which is referenced by “.”, usually spoken as “dot”.
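The same wild card copy can be tried away from /faculty. This minimal sketch assumes a hypothetical local directory, source-2022, standing in for /faculty/elizabeth/2022, and reaches it with “..” instead of a full path.

```shell
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical stand-in for /faculty/elizabeth/2022.
mkdir source-2022
echo 'x <- 1' > source-2022/one.R
echo 'y <- 2' > source-2022/two.R
echo 'id,value' > source-2022/data.csv

# Move into a directory of your own.
mkdir mine
cd mine

# Copy every file ending in .R into "." (the directory you are in).
cp ../source-2022/*.R .
ls          # one.R and two.R, but not data.csv
```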
cp can be given the -r (“recursive”) switch to make it copy a directory and everything in that directory. For example:
Copying a directory
Here cp -r has copied everything in the directory /faculty/elizabeth/2022 to the current directory. There is now a new 2022 directory which contains a copy of everything that is in the /faculty/elizabeth/2022 directory.
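The difference the -r switch makes shows up clearly in a scratch directory. This minimal sketch assumes a hypothetical faculty-demo/2022 directory standing in for /faculty/elizabeth/2022. It first shows cp refusing to copy a directory without -r, and then the recursive copy succeeding. The exact error wording varies between systems.

```shell
WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical stand-in for /faculty/elizabeth/2022.
mkdir faculty-demo
mkdir faculty-demo/2022
echo 'x <- 1' > faculty-demo/2022/one.R
echo 'id,value' > faculty-demo/2022/data.csv

mkdir Day1
cd Day1

# Without -r, cp will not copy a directory; it reports an error instead.
cp ../faculty-demo/2022 . || echo "cp needs -r to copy a directory"

# With -r, the directory and everything in it are copied into ".".
cp -r ../faculty-demo/2022 .
ls 2022
```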
At the start of most practicals, you will use either cp -r or cp with wild cards to copy files out of the appropriate directory under /faculty to one of your directories.
Using mv
“mv” is the primary command used to move files at the command line. mv is used in a similar way to cp, but there are some very important differences. The most important is that mv removes the source file. After running mv you still have the same number of files or directories you started with; they are just located someplace else or have a different name.
For example,
mv foo bar
renames the file “foo” to “bar”. “foo” could also have been a directory, in which case it would be renamed to a directory called “bar” (as long as no directory called “bar” already exists; if one does, foo is moved inside it).
When moving multiple files (or files and directories), the destination must be a directory.
mv *.R My-R
will move all of the files in the current directory that end in “.R” into the directory “My-R”. The destination directory, “My-R” in the example, must exist before running the mv command; it will not be created automatically.
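Both uses of mv can be tried in a scratch directory. This minimal sketch assumes hypothetical files foo, one.R and two.R. It shows a rename, a failed multi-file move into a missing directory, and the successful move once My-R exists. Error wording varies between systems.

```shell
WORK=$(mktemp -d)
cd "$WORK"

# Renaming: after this, only "bar" exists.
echo 'some random text' > foo
mv foo bar
ls

# Two hypothetical R scripts to move.
echo 'x <- 1' > one.R
echo 'y <- 2' > two.R

# My-R does not exist yet, so moving several files into it fails
# and nothing is moved.
mv *.R My-R || echo "create My-R first"

# Create the destination, then move the scripts into it.
mkdir My-R
mv *.R My-R
ls My-R
```

If only one file had matched *.R, mv would instead have renamed that file to My-R. That is another good reason to create the destination directory first.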
Creating directories with mkdir
The command to “make directories” is mkdir. It is very simple to use: just run mkdir <directory>.
To create a directory called “foo”, just run
mkdir foo
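A minimal sketch of mkdir in a scratch directory. It uses the long listing from earlier to confirm the new entry is a directory, and shows that mkdir reports an error rather than replacing a directory that already exists. The error wording varies between systems.

```shell
WORK=$(mktemp -d)
cd "$WORK"

# Create an empty directory named foo.
mkdir foo

# In the long listing, the leading d marks foo as a directory.
ls -l

# mkdir will not replace an existing directory; it reports an error instead.
mkdir foo || echo "foo already exists"
```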
Your friends TAB and Up Arrow
Two huge time savers are the use of the TAB key and the Up Arrow key.
TAB
TAB is used to complete text on the command line. For example, if I want to copy files from /faculty/elizabeth/2022, I don’t need to type out all of those characters. This is what my typing will actually look like, with #TAB# for each time I press the TAB key.
cp /f#TAB#
completes to
cp /faculty/
and then I continue typing
cp /faculty/el#TAB#
which completes to
cp /faculty/elizabeth/
If a completion isn’t unique, then pressing TAB a second time will list the possible completions. If nothing is listed after pressing TAB repeatedly, then there aren’t any possible completions.
This is what that might look like at a terminal, with a red mark inserted each time I pressed the TAB key, and the rest of the line being what was automatically added by the computer.
Using TAB to complete text
Use of the TAB key is highly recommended to avoid typos in long file and directory names.
Up Arrow
The Up Arrow is used to recall previously typed commands. Those commands can then be edited or run again as they are. The red arrow shows where I pressed the Up Arrow.
ls with output
I pressed the Up Arrow once to recover ls, pressed ENTER, and then pressed the Up Arrow again to recover ls, but edited the line to add -l before pressing ENTER.
That is a very trivial example, and it is hardly worth pressing the Up Arrow to recover ls, but on long and complicated commands the Up Arrow is a large time saver.
After pressing the Up Arrow multiple times and getting into your command “history”, it is possible to use the Down Arrow to move to more recent commands. You can return to an empty command line either by pressing the Down Arrow until you are back at a bare prompt, or by pressing ctrl-c.
Long commands can be edited by using the left and right arrows, and when modified to your satisfaction, pressing the ENTER (or RETURN) key will submit the command line.