Basic Unix File System Concepts
Unix stores information in files;
a file is essentially just a bunch of computer bytes with a name assigned
to them. Programs, data, text, images, and a million other things
are all just files as far as Unix is concerned. A file may be
a short laundry list you typed in, a huge web page you downloaded, a complex
Microsoft Word document, an e-mail message you saved, an executable computer
program, etc. Unix works on files. One major way to get Unix
to do things for you with your files is to use a command
line interface to type in commands. Most commands operate on (do
something to or with) files. gcc,
for example, is the name of the Unix command that runs the C compiler;
File names: Although modern Unix systems let almost any collection of characters serve as a file name, it's a good idea to follow some basic guidelines (not absolute rules; once you know how and why, you can violate any of these with near impunity):
Extensions: Many (not all, not even most) of the programs that make up Unix expect file names to end with what is called an extension. An extension is a period followed by a few characters (traditionally up to three). Extensions are often optional; it's usually not a big deal to work around missing or misleading extensions, but it can be more work, and in any case extensions help you keep track of what type of data you've got in your files. So a file named good_name.txt would be presumed to be a text file (just characters about which Unix knows nothing else; your laundry list, for example), while good_name.c might be the source code of a C program you've written. After compilation, the resulting executable program (also known as a binary file or binary image; it's what you'll actually tell Unix to execute to run your program) might be stored in a file named good_name.exe.
Extensions are more for your benefit than the computer's; they're customs I recommend you follow. When you have hundreds of files, how will you remember, just by looking at its name, whether a file named my_first_program contains a C program or an Ada program or a Basic program or whatever? And every so often you'll run into a command that is touchy about extensions, like our current Ada compiler. Naming an Ada program source file good_name won't work here; to be compiled by our Ada compiler, an Ada program must be stored in a file whose name has an .adb extension.
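To make the idea of a name plus an extension concrete, here is a minimal shell sketch that splits a file name at its last period. The file names are the ones used above; the variable names are mine, purely for illustration, and the syntax assumes a POSIX-style shell such as sh or bash.

```shell
# Split a file name into the part before the last period and the extension.
name="good_name.c"

stem="${name%.*}"        # strip the shortest trailing ".something"
extension="${name##*.}"  # strip everything up to and including the last "."

echo "name:      $name"
echo "stem:      $stem"
echo "extension: $extension"

# A name with no period in it has no extension, so check for that case first.
plain="my_first_program"
case "$plain" in
  *.*) plain_ext="${plain##*.}" ;;
  *)   plain_ext="" ;;
esac
echo "extension of $plain: '$plain_ext'"
```

For good_name.c this gives a stem of good_name and an extension of c, while my_first_program comes back with an empty extension, which is exactly why such a name tells you nothing about what the file holds.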
Directories and Paths: To make it easier for you to keep track of files, Unix allows you to organize them into a hierarchical structure of directories (some other systems call them folders). You might, for example, keep all your C programs in a directory named "CS125" and all your personal correspondence in a directory named "letters". If you had a lot of letters, you might have a sub-directory for letters/home, another for letters/girl_friends, yet another for letters/IRS, and so on. These sub-directories can continue for as many levels downwards as you wish, as in letters/girl_friends/Mary, letters/girl_friends/Jill, etc. Technically, to Unix, a directory is just a file that contains the names of other files along with some information about them, most importantly where to find them on the hard disk. (Almost everything to Unix is just a file of some sort; data is in files, directories are files, your programs will be files, even the commands you'll use are [mostly] just programs stored in files.)
So a directory is a file that contains information about other files, possibly including other directories, which can then contain other files and directories, and so on. Computer scientists call such hierarchical structures, where one container can contain other containers, "trees", but draw them upside down, as shown in Figure 1 below, with the root (or base) of the tree at the top and the branches spreading down instead of up. (I don't want to hear any comments here about computer scientists getting everything ass backwards; there are good historical reasons for drawing trees like this; you're undoubtedly just too young to know the history ;-) A directory can contain any mix of both "ordinary" files and other directories (known as sub-directories). Unix doesn't care, so long as everything (except one directory, called the root directory, which we'll define in a few moments) is contained within some higher level directory.
Figure 1. Sample Directory Hierarchy. The directory hierarchy forms an upside down tree.
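If you'd like to see such a tree for yourself, the following sketch builds a small one in a scratch directory and lists it. The directory names come from the text, but the exact layout (including where I've put April_9.txt) is my assumption and not a copy of Figure 1; the scratch location comes from mktemp, so nothing in your real files is touched.

```shell
# Work in a throwaway directory created just for this experiment.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# -p creates any missing parent directories along the way.
mkdir -p CS125 letters/home letters/IRS

# Put one ordinary (empty) file in a sub-directory; the placement is assumed.
touch letters/IRS/April_9.txt

# List everything below the current directory, one path per line.
listing=$(find . -print | sort)
echo "$listing"
```

Each line of the listing is a path that starts at the scratch directory (shown as a single period) and walks down the tree, one directory per slash.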
In Unix, every file has a basic name, which just differentiates it from other files in the same directory (every file is stored in some directory, remember), and a "full" name, which allows it to be differentiated from every other file in the entire Unix system. (The analogy to people's first names and full names is pretty close.) In Figure 1, for example, there are two files that appear to have the same name, namely April_9.txt, but actually don't because their full names are different. The full name of one of the files there is
while the full name of the other is
I'm sure you've noticed that that full name is fairly long and would be irritating to have to type out in full any time you wanted to do something with it. Fortunately, you won't have to type a file's full name very often; usually you won't even know it. You and Unix are going to agree on a default directory, and as long as you and Unix don't get confused about what directory you're both talking about, you'll just use basic file names. In the jargon of a Unix shell, we are said to always be "in" some directory (known as the "current" or "working" directory). Obviously there will have to be commands to allow you to ask what directory Unix thinks it's in, to tell it to change the directory it's in, to get it to tell you the full name of something, etc. We'll come to all those commands later. But first we've got to define more precisely some of these types of file names we've been talking about (basic and full, for example), and there is another type as well.
File names can be either qualified or basic (a basic name is sometimes known as simple or unqualified). A qualified file name starts with either a fully or partially qualified path, which is a list of nested directories; basic file names do not include any path information at all. A path is fairly analogous to your street address: it tells Unix how to find the file. So a qualified file name is something like a snail mail address: John Doe, 1 Main Street, Anytown, AZ. Note, however, that in snail mail addresses, reading from left to right moves in the direction of increasing generality, while for a Unix path, left to right is increasingly specific, moving down the directory tree. Anyway, the fully qualified file name
A path whose very first character is a slash,
In any event, nobody wants to type those
long, fully qualified file names
out all the time, so Unix has a convention regarding a "working
directory", also known as the current
directory, and sometimes even as various
combinations of those terms (like current working directory). At any given instant, you and Unix agree on a working directory
(I'll explain how in a moment), and when you enter a command and give it just
an unqualified (or basic or simple) file name like prog1.c,
Unix looks for a file by that name only in the current working directory.
(If there is no file by that name in the current directory, Unix gives you some
sort of error message.) So, referring back to Figure 1, if the current
working directory were CS125,
you could display that file named prog1.c
with the cat command, by entering
cat prog1.c
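Here is a small experiment showing that a basic file name is looked up only in the working directory. It is a sketch under stated assumptions: the scratch directory comes from mktemp, and the one-line contents of prog1.c are a placeholder I made up so that cat has something to display.

```shell
# Build a scratch copy of a CS125 directory holding one small file.
scratch=$(mktemp -d)
mkdir "$scratch/CS125"
echo "int main(void) { return 0; }" > "$scratch/CS125/prog1.c"  # placeholder contents

# With CS125 as the working directory, the basic name is enough.
cd "$scratch/CS125" || exit 1
inside=$(cat prog1.c)
echo "from CS125: $inside"

# One level up, the same basic name no longer refers to anything.
cd "$scratch" || exit 1
if cat prog1.c 2>/dev/null; then
  found_from_parent=yes
else
  found_from_parent=no
fi
echo "found from parent directory: $found_from_parent"
```

The file never moved; only the working directory changed, and that alone decides whether the basic name prog1.c means anything to cat.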
You control the working directory. When you log in to
Unix, your initial working directory, before you start moving around, is some
directory created for you by a system administrator when your Unix account was
created. (Your system administrator is said to have created a home
directory for you). The full name of my home directory here on the campus
Unix file system is /facstaff/
Anyway, back to directories: You can make new sub-directories with the mkdir command. You can change the current working directory via the cd command. You can get Unix to tell you the name of the current working directory by means of the pwd command. You can get a list of the files (including other directories) within the current directory by means of the ls command. All these commands and others are explained more fully in the Basic Unix Shell Commands section. But first, there's one last type of path name to learn.
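As a quick taste of those four commands, here is a short session you could try. It runs in a scratch directory made by mktemp so it can't disturb your own files; the directory and file names are borrowed from the earlier examples or made up for illustration.

```shell
# Start in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir letters            # make a new sub-directory
cd letters || exit 1     # make it the working directory
where=$(pwd)             # ask for the full name of the working directory
echo "working directory: $where"

mkdir home IRS           # two sub-directories of letters
touch notes.txt          # and one ordinary (empty) file
contents=$(ls)           # list what the working directory contains
echo "$contents"
```

Notice that ls shows the two sub-directories and the ordinary file side by side; in its plain form it doesn't tell you which is which.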
A partially qualified
(sometimes just called qualified) or relative
path is a set of nested directories not starting with a
would work. But if the current working directory were CS125,
the same command wouldn't work at all since Unix wouldn't be able
to find the relative path
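The way a relative path depends on the working directory can be checked with a short sketch. The helper kind_of_path is a hypothetical name of my own, the relative path CS125/prog1.c is an assumed example rather than the one from the text, and the scratch directory again comes from mktemp.

```shell
# Build a scratch directory containing CS125/prog1.c.
scratch=$(mktemp -d)
mkdir "$scratch/CS125"
echo "hello" > "$scratch/CS125/prog1.c"

# Hypothetical helper: classify a file name by how it is written.
kind_of_path() {
  case "$1" in
    /*)  echo "fully qualified" ;;      # starts with a slash
    */*) echo "partially qualified" ;;  # has directories, but no leading slash
    *)   echo "basic" ;;                # no path information at all
  esac
}

kind_of_path "$scratch/CS125/prog1.c"
kind_of_path "CS125/prog1.c"
kind_of_path "prog1.c"

# From the parent of CS125, the relative path leads to the file...
cd "$scratch" || exit 1
if [ -f CS125/prog1.c ]; then from_parent=found; else from_parent=missing; fi

# ...but from inside CS125 it would mean CS125/CS125/prog1.c, which does not exist.
cd "$scratch/CS125" || exit 1
if [ -f CS125/prog1.c ]; then from_inside=found; else from_inside=missing; fi

echo "from parent: $from_parent, from inside CS125: $from_inside"
```

A relative path is always followed starting from the working directory, so the same characters can name a real file from one place and nothing at all from another.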