Basic Unix File System Concepts  


Unix stores information in files; a file is essentially just a bunch of computer bytes with a name assigned to them.  Programs, data, text, images, and a million other things are all just files as far as Unix is concerned.   A file may be a short laundry list you typed in, a huge web page you downloaded, a complex Microsoft Word document, an e-mail message you saved, an executable computer program, etc.  Unix works on files.  One major way to get Unix to do things for you with your files is to use a command line interface to type in commands.  Most commands operate on (do something to or with) files.  gcc, for example, is the name of the Unix command that runs the C compiler; so  gcc my_program.c  commands Unix to compile the C program stored in a file named  my_program.c   (You'll use an editor to create your program files.) 
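To make the "bunch of bytes with a name" idea concrete, here is a minimal sketch you can try at a shell prompt. The file name laundry_list.txt and its contents are made up for illustration, and the scratch directory from mktemp is only there so nothing real gets touched.

```shell
# Work in a scratch directory so no real files are affected.
dir=$(mktemp -d)
cd "$dir"

# Store two short lines of text under the (made-up) name laundry_list.txt.
printf 'socks\nshirts\n' > laundry_list.txt

# wc -c counts the bytes stored under that name; tr strips any padding blanks.
size=$(wc -c < laundry_list.txt | tr -d ' ')
echo "laundry_list.txt holds $size bytes"
```

The name and the byte count are all Unix itself knows here; that the bytes happen to be a laundry list is your business, not the system's.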

File names:  Although modern Unix systems can let almost any collection of characters be a file name, it's a good idea to follow some basic guidelines (not absolute rules,  when you know how and why, you can violate any of these with near impunity): 

  • Start with a letter; after that first letter, you can use numbers, letters, and (carefully) some special characters, but see the last guideline, on special characters, below.   Note:  Unix, like most modern systems, actually lets you start file names with numbers or even other characters, but you may someday need to move a file to an older or less forgiving system (like DOS, for example) and there can be other difficulties with odd file names as well.
  • Unlike Microsoft Windows, Unix is case sensitive, so AAA and AAa would be two different Unix filenames.
  • If the letters you type are being echoed (displayed) in upper case even though you're not depressing the "Shift" key when you type, you have probably previously inadvertently hit the "Caps Lock" key at the left of your keyboard.  Hit it again to turn off  "Caps Lock".  Now your keystrokes should be displayed in lower case unless you explicitly hit the "Shift" key.
  • Don't embed blank spaces.  The string of characters "my really cute filename"  might or might not work on any given Unix system; but even if it did, you would have to wrap it in quote marks in some contexts and not in others.  Besides, as above, there are other, older systems that you might someday want to move your files to without renaming them.  Instead of a blank, use the underscore ( _ , typed as a shifted minus sign, just to the right of the zero on most keyboards).  my_really_cute_filename is a good, safe Unix file name.
  • Don't use special characters (besides the underscore) until you know enough about Unix to do so safely.  The asterisk, question mark, slash, period, square brackets, single or double quote marks, and other such characters can have special meanings to Unix or some of its commands. 
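The guidelines on case and blanks can be tried out with a short experiment. This is only a sketch: it assumes a typical shell and a case-sensitive file system (the usual situation on Unix, though some systems are configured otherwise), and it uses the touch command, which simply creates empty files with the names given.

```shell
# Work in a scratch directory so no real files are affected.
dir=$(mktemp -d)
cd "$dir"

# Case matters: on a case-sensitive file system these are two different files.
touch AAA AAa

# Unquoted, the shell splits the words at the blanks,
# so touch is handed four separate names: my, really, cute, filename.
touch my really cute filename

# Quoted, the same characters form a single name with embedded blanks.
touch "my really cute filename"

# With underscores there is nothing to quote and nothing to go wrong.
touch my_really_cute_filename

# One name per line, so the names containing blanks are easy to spot.
ls -1
```

The unquoted line is the trap: nothing fails, you just end up with four files you didn't intend to create.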

Extensions:   Many (not all, not even most) of the programs that make up Unix expect file names to end with what is called an extension.  An extension is a period followed by a few characters (traditionally up to three, though Unix itself imposes no such limit).  Extensions are often optional; it's usually not a big deal to work around missing or misleading extensions, but it can be more work and in any case, extensions help you keep track of what type of data you've got in your files.  So a file name of  good_name.txt would be presumed to be a text file (just  characters about which Unix knows nothing else; your laundry list, for example), while good_name.c might be the source code of a C program you've written. After compilation, the resulting executable program (also known as a binary file, or binary image; it's what you'll actually tell Unix to execute to run your program)  might be stored in a file named good_name.exe    Extensions are more for your benefit than the computer's; they're customs I recommend you follow.  When you have hundreds of files, how will you remember just by looking at its name whether a file named  my_first_program contains a C program or an Ada program or a Basic program or whatever?   And every so often you'll run into a command that is touchy about extensions, like our current Ada compiler.  Naming an Ada program source file good_name won't work here; to be compiled by our Ada compiler, an Ada program must be stored in a file whose name has an .adb extension.
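Since an extension is nothing more than the tail end of the name, the shell can split it off with ordinary string handling. The sketch below uses standard shell parameter expansion; the variable names name, ext, and stem are just labels chosen for this example.

```shell
name=good_name.txt     # example file name from the text above

ext=${name##*.}        # drop everything up to and including the last period
stem=${name%.*}        # drop the last period and everything after it

echo "stem: $stem  extension: $ext"
```

Note that this is pure text manipulation; no file named good_name.txt needs to exist, and nothing checks that the contents match the extension. For a name with no period at all, the first expansion finds nothing to remove and hands back the whole name, so a careful script would test for the period first.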

Directories and Paths:  To make it easier for you to keep track of files, Unix allows you to organize them into a hierarchical structure of directories (some other systems call them folders).  You might, for example, keep all your C programs in a directory named "CS125" and all your personal correspondence in a directory named "letters".  If you had a lot of letters, you might have a sub-directory for letters/home, another for letters/girlfriends, yet another for letters/IRS, and so on.  These sub-directories can continue for as many levels downwards as you wish, as in letters/girlfriends/Mary, letters/girlfriends/Jill, etc.   Technically, to Unix, a directory is just a file that contains the names of other files as well as some information about them, most importantly where to find them on the hard disk.  (Almost everything to Unix is just a file of some sort; data is in files, directories are files, your programs will be files, even the commands you'll use are [mostly] just  programs stored in files.)

So a directory is a file that contains information about other files, possibly including other directories which can then contain other files and directories, and so on.  Computer scientists call such hierarchical structures, where one container can contain other containers, "trees" but draw them upside down as shown in Figure 1, below, with the root (or base) of the tree at the top and the branches spreading down instead of up.  (I don't want to hear any comments here about computer scientists getting everything ass backwards; there are good historic reasons for drawing trees like this; you're just undoubtedly too young to know the history  ;-)     A directory can contain any mix of both "ordinary" files and other directories (known as sub-directories). Unix doesn't care, so long as everything (except one directory, called the root directory, which we'll define in a few moments) is contained within some higher level directory.  

Figure 1. Sample Directory Hierarchy. The directory hierarchy forms an upside down tree.

In Unix, every file has a basic name, which just differentiates it from other files in the same directory (every file is stored in some directory, remember), and a "full" name which allows it to be differentiated from every other file in the entire Unix system.  (The analogy to people's first names and full names is pretty close.)  In Figure 1, for example, there are two files that appear to have the same name, namely April_9.txt, but actually don't because their full names are different.  The full name of one of the files there is

/letters/girlfriends/Jill/April_9.txt

while the full name of the other is

/letters/IRS/April_9.txt
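You can reproduce this situation yourself. The sketch below builds just the two branches named above inside a scratch directory, which stands in for the real root (an assumption made so the experiment needs no special privileges), and then asks find for every file whose basic name is April_9.txt.

```shell
# A scratch directory stands in for the real root directory.
root=$(mktemp -d)

# Build the two branches mentioned in the text and put a file in each.
mkdir -p "$root/letters/girlfriends/Jill" "$root/letters/IRS"
touch "$root/letters/girlfriends/Jill/April_9.txt" "$root/letters/IRS/April_9.txt"

# Search downward from the stand-in root for the basic name April_9.txt.
cd "$root"
matches=$(find . -name April_9.txt)
echo "$matches"
```

find reports two files. Same basic name, different directories, so as far as Unix is concerned they are two unrelated files.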

I'm sure you've noticed that that full name is fairly long and would be irritating to have to type out in full any time you wanted to do something with it.  Fortunately, you won't have to type a file's full name very often; usually you won't even know it. You and Unix are going to agree on a default directory and as long as you and Unix don't get confused about what directory you're both talking about, you'll just use basic file names.  In the jargon of a Unix shell, we are said to always be "in" some directory (known as the "current", or "working" directory).  Obviously there will have to be commands to allow you to ask what directory Unix thinks it's in, to tell it to change the directory it's in,  to get it to tell you the full name of something, etc.  We'll come to all those commands later.  But first we've got to define more precisely some of these types of file names we've been talking about (basic and full, for example), and there is another type as well.

File names can be either qualified or basic (a basic name is sometimes known as simple or unqualified).  A qualified file name starts with either a fully or partially qualified path, which is a list of nested directories; basic filenames do not include any path information at all.  A path is fairly analogous to your street address — it tells Unix how to find the file.  So a qualified file name is something like a snail mail address:  John Doe, 1 Main Street, Anytown, AZ.  Note, however, that in snail mail addresses, reading from left to right moves in the direction of increasing generality, while for a Unix path, left to right is increasingly specific, moving down the directory tree.  Anyway, the fully qualified file name

/letters/girlfriends/Jill/April_9.txt

consists of the path /letters/girlfriends/Jill/ followed by the basic file name April_9.txt.

A path whose very first character is a slash, like /letters/girlfriends/Jill/, is a fully qualified path (we'll cover partially qualified in a moment, be patient).  This one, /letters/girlfriends/Jill/, tells Unix  to start following the path to the file starting in the directory letters which, as shown in Figure 1, is located within the very top level directory of the entire Unix system.  Since the file system is a  hierarchical tree structure, there must be exactly one such topmost directory — no more, no less,  just exactly one.  In conversation, you and I and all other Unix hacks call it the root directory, but when talking to Unix, the name of the root is just a slash, all by itself, like so: /
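One way to convince yourself that every fully qualified name hangs off that single root is to peel the name apart one level at a time. The sketch below uses the standard dirname command, which strips the last component from a path; it only manipulates the text of the name, so no such file has to exist on your system.

```shell
path=/letters/girlfriends/Jill/April_9.txt   # example name from the text

# Strip one trailing component per pass until only the root is left.
while [ "$path" != "/" ]; do
    path=$(dirname "$path")
    echo "$path"
done
```

Each pass moves one level up the tree, and the loop can only stop at the lone slash, the root.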

In any event, nobody wants to type those long, fully qualified file names out all the time so Unix has a convention regarding a "working directory", also known as the current directory, and sometimes even as various combinations of those terms (like, current working directory).  At any given instant, you and Unix agree on a working directory (I'll explain how in a moment) and when you enter a command and give it just an  unqualified (or basic or simple) file name like prog1.c, Unix looks for a file by that name only in the current working directory.  (If there is no file by that name in the current directory, Unix gives you some sort of error message.)  So, referring back to Figure 1,  if the current working directory were CS125, you could display that file named prog1.c with the cat command, by entering cat prog1.c    But you can always give a Unix command a fully qualified file name and it will try to find the file without caring about the current working directory at all; so the command   cat /CS125/prog1.c would display the correct file (the one within the CS125 directory) no matter what the current working directory was.  (That would be true even if the current working directory also contained a file whose basic name was  prog1.c    That  file in the current directory would not be the one displayed,  /CS125/prog1.c would be displayed.)
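Here is a small experiment along those lines. It is a sketch under a couple of assumptions: a scratch directory stands in for the root, so /CS125 becomes $root/CS125, and the second directory, named other, is invented purely so that two files can share the basic name prog1.c.

```shell
# A scratch directory stands in for the root, so /CS125 is $root/CS125 here.
root=$(mktemp -d)
mkdir "$root/CS125" "$root/other"     # "other" is a made-up second directory

# Two different files with the same basic name.
echo 'the CS125 copy' > "$root/CS125/prog1.c"
echo 'the other copy' > "$root/other/prog1.c"

# Make "other" the current working directory.
cd "$root/other"

by_basic_name=$(cat prog1.c)                # looked up in the current directory
by_full_name=$(cat "$root/CS125/prog1.c")   # found wherever we happen to be

echo "basic name: $by_basic_name"
echo "full name:  $by_full_name"
```

The basic name picks up whichever prog1.c lives in the current directory; the full name always reaches the CS125 copy.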

You control the working directory. When you log in to Unix, your initial working directory, before you start moving around,  is some directory created  for you by a  system administrator when your Unix account was created.  (Your system administrator is said to have created a home directory for you).  The full name of my home directory here on the campus Unix file system is /facstaff/j/jaffem;   so if I had a file there whose basic name was  my_resume.doc,  its fully qualified name would be:

/facstaff/j/jaffem/my_resume.doc

Anyway, back to directories:  You can make new sub directories with the mkdir command.  You can change the current working directory via the cd command.  You can get Unix to tell you the name of the current working directory by means of the pwd  command.  You can get a list of the files (including other directories) within the current directory by means of the ls command.  All these commands and others are explained more fully in the Basic Unix Shell Commands section. But first, there's one last type of path name to learn.
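As a quick preview, the four commands might be strung together like this. It is only a sketch: it starts in a scratch directory made by mktemp (so the full name pwd prints will differ on every system), and it borrows the names letters, IRS, and April_9.txt from the examples above.

```shell
cd "$(mktemp -d)"     # start somewhere harmless
mkdir letters         # make a new sub-directory
cd letters            # make it the current working directory
pwd                   # print the full name of the current working directory

touch April_9.txt     # create an empty file so there is something to list
mkdir IRS             # and a sub-directory alongside it

listing=$(ls)         # list what the current directory contains
echo "$listing"
```

Notice that ls shows the ordinary file and the sub-directory side by side; as far as a directory is concerned, both are just entries.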

A partially qualified (sometimes just called qualified)  or relative path is a set of nested directories not starting with a slash (/).   Unix must start following a path from somewhere it knows about.  If it can't start at the root, about the only other place it knows about for sure is the current working directory.  So  AAA/BBB/CCC.dd would tell Unix to look for a directory named AAA within the current working directory and then look for directory BBB within (or beneath) AAA and then look for the file CCC.dd within BBB.  Thus AAA/BBB/CCC.dd would be a partially qualified file name with AAA/BBB/ as a relative (or partially qualified) path and CCC.dd as the basic file name (including the .dd extension). In Figure 1, for example, if the current working directory were letters, the command

cat girlfriends/Jill/June_5.txt

would work.  But if the current working directory were CS125, the same command  wouldn't work at all, since Unix wouldn't be able to find the relative path girlfriends/Jill/ starting from within CS125.
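The difference is easy to check for yourself. The sketch below assumes a scratch directory standing in for the root and spells the directory as Jill, matching the full names given earlier (remember that case matters); the contents of June_5.txt are invented.

```shell
# A scratch directory stands in for the root directory.
root=$(mktemp -d)
mkdir -p "$root/letters/girlfriends/Jill" "$root/CS125"
echo 'Dear Jill' > "$root/letters/girlfriends/Jill/June_5.txt"

# From letters, the relative path leads to the file.
cd "$root/letters"
if cat girlfriends/Jill/June_5.txt > /dev/null 2>&1; then
    from_letters=found
else
    from_letters=missing
fi

# From CS125, the very same relative path leads nowhere.
cd "$root/CS125"
if cat girlfriends/Jill/June_5.txt > /dev/null 2>&1; then
    from_cs125=found
else
    from_cs125=missing
fi

echo "from letters: $from_letters"
echo "from CS125:   $from_cs125"
```

The command text never changed; only the current working directory did, and that alone decided whether the relative path could be followed.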


This page last changed 22 May 2003 by Dr. M.S. Jaffe