Useful Intermediate Shell Stuff
I am not a major league Unix hack; this material describes my current (occasionally
flawed) understanding of some of the more advanced features of Unix that I personally
seem to use most frequently. As such, it represents a rough judgment as
to a useful next level of Unix shell familiarity. Almost all of
this material is probably included in the man pages for the shell (try
man tcsh) but there's way
too much other stuff there for a comfortable next step and no notion of
priority, utility, or importance. These topics below at least have some
personal evaluation behind their selection. I've put a very rough
and highly subjective utility rating in parentheses after each topic:
1 seems to me to be more likely to be immediately useful than 2, and so on.
If you have suggestions for omitted topics that you believe might be more useful
or important than the ones I've covered, email me and let me know --- the same
for any mistakes you spot here. When you want more, and there's lots
more (a Unix shell is an extremely sophisticated tool in its own right), check
one of the other online sources for Unix
information. Topics discussed briefly here include:
- .login and .cshrc (and other rc customization files): .login
and .cshrc are two text files that you can use to help customize your interactions with
your Unix shell. They are stored in your home directory but both are
"hidden" files: Normal use of the ls command does not show them, nor any
other file whose base name starts with a period, including a hidden directory.
(There is an option on the ls command to show hidden files). Both .login and .cshrc
contain the text of command lines that you could type into the shell any time you
wanted. But by putting them in these files, you don't have to type them in again
yourself in the future --- the shell will execute them for you automatically when you
log in to the Unix system, before you ever see a shell prompt. (.cshrc is executed at
other times as well.) You can modify these files with any editor. Many of the
features described below will probably become so desirable to you that you will want to
invoke them in one of these rc files.
- Unless and until you learn the difference (far beyond what I'm going to cover in these
notes), put your commands in .cshrc rather than .login.
- Made a whole bunch of changes to your .cshrc and don't want to wait until your next
session for them to take effect or want to check to see if they work without having to
log out and log in again? Enter source .cshrc,
which executes all the shell commands found in the file named .cshrc, or whatever other file name you type instead.
'rc' stands for "run commands". There are often several other
hidden rc files in your home directory. You can customize the behavior of any
program that uses an rc file by editing the commands in that rc file. Of course you
need to know the syntax and semantics of the appropriate commands (try the man pages for
that program).
- alias. You've discovered that you like some
of the weird options available on some of the Unix commands and you want to use them all
the time but you don't want to type all those stupid little dashes and letters?
Enter alias brilliant "dumb_old_name options
and stuff" (with the "quotes" this
time, if there are any blanks in there). Now when you enter the command brilliant, Unix will substitute all the
characters (including blanks) that were between the "quotes" of the
alias definition. (You can leave out the quotes if there are no blanks.) You
can even use this to redefine an existing command: entering alias ls "ls -alF" means that
from now on when you type ls by itself, it's as if
you typed ls -alF. Obviously
you can confuse yourself pretty easily if you alias stuff improperly. Personally, I
wouldn't want to alias rm to ls; but Unix will allow it if someone is silly enough to
try it.
- Entering just alias by itself will display all
currently defined aliases.
- Entering alias xyz will display just the
current value, if any, of the xyz alias.
- unalias xyz does the obvious. If xyz had some other "normal" meaning to Unix,
it's restored; otherwise xyz now means whatever it
did before you used it in an alias --- i.e., probably nothing.
- Unix aliases and options interact nicely. If I have aliased ls to "ls
-al", entering ls -F
now would be just like entering ls -al -F
before I made the alias.
- That's why Unix allows either form, ls -al -F or ls -alF: with the options kept as separate words, an alias can supply some of them and you can type the rest. Given the alias ls "ls -al", typing lsF would not reduce to ls -alF after alias substitution. In fact, no
alias substitution would be performed at all; instead, the shell would just go looking for
a command named lsF, which probably doesn't
exist.
- There's no way to temporarily remove an option from an alias short of
unaliasing and starting over. So unless you're sure that you'll
always want the three options
alF together, aliasing ls
to ls -alF is not
a good idea. Instead, you could:
- Use a different alias: alias
dir "ls -alF" leaves the original meaning of ls
intact.
- Use a shorter alias containing just the options you're absolutely sure
you'll always want, alias ls "ls
-al", and then type ls
-F when you want the three options alF.
- If you know where the "raw" or original Unix command you want
is stored (most commands are just executable files somewhere, remember),
typing a command with a fully qualified path name prevents aliasing.
So if the original ls command were located in /usr/bin/, typing /usr/bin/ls
(followed by whatever else you want, including options, of course) would
execute the original ls command
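regardless of any alias you might have defined. (In csh and tcsh, putting a
backslash in front of the command name, as in \ls, also skips alias substitution.)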
- Unix has a command whereis
which locates other commands for you. I learned that ls
was stored in /usr/bin/ by typing whereis
ls.
- Most things that can be typed on a command line can be aliased.
alias xyz "abc -x | def >ghi"
creates an alias for a pipeline
with redirected output.
- Aliases don't carry over from one shell session to another. To avoid
having to re-enter your favorite aliases every time you log in (or open a
new shell window), put them in your .cshrc
file.
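A minimal sketch of how the alias suggestions above might look once they are stored in a .cshrc file. It assumes csh or tcsh, and it uses only the two example aliases from these notes; nothing here is meant as a recommended set.

```csh
# Fragment for ~/.cshrc (csh/tcsh syntax); the alias names are illustrative.
alias dir "ls -alF"    # a new name, so plain ls keeps its original meaning
alias ls  "ls -al"     # only the options wanted every time; type ls -F to add F
```

After saving the file, source .cshrc (entered from your home directory) makes the aliases available in the current session, alias by itself lists them, and unalias dir removes one again.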
- file completion. It's
useful to see long file names like CS216_homework_program_1.c,
but it's a pain to type them all the time. Hitting the Esc
key twice will cause the shell to attempt to complete whatever file name you're
currently typing.
If file completion has not been set for you by default when your account
was created (possibly in the original or default .login or .cshrc files the
system administrator created for you), enter set
filec in response to a shell prompt. Then edit
your .cshrc file and put it there too
so it's automatic next time you log on.
- The shell looks for the file names in your current working directory. So if there
are only two files in the directory, say my_homework_program_1.c
and letter_to_IRS.txt, all you'd have to type to
compile your C program would be cc m,
followed by the Esc key twice. But towards the end of the course, you might have CS216_homework_program_1.c, CS216_homework_program_2.c, CS216_homework_program_3.c, CS216_homework_program_4.c, etc.
What happens now if you type cc C
followed by Esc Esc? Well, the shell is pretty sensible. It beeps, lists the
choices available to you that match what you've typed so far, and then retypes the entire
command line, completing the filename as far as it can, cc CS216_homework_program_ in this
case. Type another character (or more), followed by two new hits of Esc and the
process continues until there's a single, unambiguous file name where you wanted it.
To get the benefits of both good long names and easy completion, try naming your programs progN_CS216_homework.c or something like that,
putting up front the variation in your commonly used names.
- Name completion works for commands as well as files.
Why? Because commands are just files as far as the shell is concerned. But,
you say, the files of the commands are not in my current working directory so where does
the shell look for command file names? In all the directories specified by the
current value of your path variable.
- What, you didn't know you had one of those? Read the man pages for csh or
tcsh. The initial or default value for your path
variable was set by the system administrator who created your account.
- The shell parses the command line from left to right and hence knows how to
differentiate command names from other file names --- the command name is everything you
type after the prompt up to the first "white space" (blank, tab or
carriage return). So it looks for command files in all the directories specified by
the path variable but only looks for other files
(arguments to the command) in the current working directory.
- Want to see the current value of path (and
all the other shell variables currently
defined)? Enter just set by itself.
- Want to know what they all are for? RTFM.
- Want to add a new directory to your path? Enter set
path=($path new_directory). You'll almost always want new_directory to be a fully qualified path
name for reasons you should be able to figure out by now.
- If your Unix account (your login name) is weird,
Unix will treat the string ~weird as the fully
qualified path to your home directory. So if you want a shorthand way to write the
fully qualified name of one of your files in some sub-directory of yours, use ~weird/aaa/bb/cccc.
- In general, to set or change the value of a shell variable, tell the shell to set its_name=some_value.
- To set or change the value of an environment variable, use setenv
ITS_NAME value.
- Note that, unlike the set command for shell variables, there's no equals sign in there
(and the command name is setenv, not set).
- It is a custom that environment variable names be ALL CAPS, but it's not a
requirement. Remember that the shell is case sensitive, however, so AAA and aaa
would be two different environment variables. (Shell variables are traditionally
lower case; but that too is custom, not requirement.)
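The variable settings discussed above could be collected in a .cshrc along the following lines. This is only a sketch in csh/tcsh syntax: the directory ~/bin and the choice of the EDITOR environment variable are assumptions made for illustration, not settings these notes prescribe.

```csh
# Fragment for ~/.cshrc (csh/tcsh syntax).
set filec                    # shell variable with no value: enables file completion
set path = ($path ~/bin)     # append a (hypothetical) directory to the search path
setenv EDITOR pico           # environment variable: ALL CAPS by custom, no equals sign
```

Entering set by itself afterwards should list path and filec among the shell variables, and echo $EDITOR should display the environment variable's value.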
- history. Having the shell keep a
history list of the last few commands you've issued can be very useful.
The shell has a great deal of "history" processing logic;
I'll just hit what I think are the most important uses here. If you
have history selected, typing !xyz
causes the shell to search your history stack looking for the most recent
command name starting with the string xyz.
(That's an exclamation point ! there in front of the xyz).
It then substitutes the entire historical command line for the !xyz
string you just typed. So if you type !xyz
and then hit Enter, you'll execute that command line again. Consider
part of a typical debugging cycle. Suppose your C source file is named program_1.c
and you're using the pico editor. First you edit your file to try to
fix the bug, then you try to compile, then you edit again, then you try to
compile again, and so on (until you get it right). Your command history might
look like: pico program_1.c, then cc program_1.c, then pico program_1.c, then
cc program_1.c, and so on.
One gets tired of continuing to type the same old file names --- particularly
if one uses long names to be very descriptive of the contents of the file
(like, CS216_homework_program_1.c).
If history is enabled, you'd have to type the full command and file name only
once (but see file completion to
make even that simpler). After that, you could use the ! history
operator. So all you'd actually have to type would be !p
to edit and !c
to compile.
- Whoops, you copied a file using the cp
command somewhere in the middle of your compile/edit cycle there, so now
!c redoes the copy instead of the
compile? No problem, just add an extra character or so, viz.
!cc, to get to the command
you wanted.
- !?xyz? enters onto your
current command line buffer
the most recent command you entered that contains xyz
anywhere on the line --- !xyz just
enters the most recent command that began with xyz
- !! expands to the last (most recent, previous) command line executed.
- None of these (!abc, !?abc?, or !!) actually enter (execute) the command.
So you can add other material onto the command line afterwards. For
example, your last command generated so much output that it scrolled rapidly
off your screen? Enter !!
| more to re-execute your last command but pipe
its output into the more program. After you hit Enter, the shell
expands the history data and echoes the full command line that it is now
executing so that, now that it's too late, you can say, "darn, that's
not what I wanted" and frantically try to press the ^c cancel key before
the machine, operating several gazillion times faster than you, can complete
ruining whatever it's ruining because of your stupid command. The
point here is that you don't know for sure what the shell is actually going
to substitute for your history reference until it's essentially far too
late for you to stop it if it isn't what you thought it was going to be.
- If you're not sure and you're worried, enter history
to display your history stack. This usually even works across login
sessions so at the beginning of a new session, you can use history
to see what you were doing last time you were logged in. (Good security
feature, that; if somebody has stolen your password and logged in as you,
that's one way you might detect that fact.)
If history has not been enabled for you by default, enable it yourself by
entering set history=n
where n is the number of "back entries" you
want the shell to keep in your history list for you (I usually use 20 or so,
myself). Obviously a good idea to add this set
history=n command to your .cshrc
file. (If you're getting an error message something like, "Event
not found" in response to your !abc and
you're sure that you just used a command that started with abc,
try setting history=n which
will ensure that history is turned on --- and then edit your .cshrc
file as well.)
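The history shortcuts above can be sketched as a short csh/tcsh session. The % prompt is illustrative, the line shown after each ! command is the shell echoing the expanded command line, and editor screens and compiler output are omitted. The :p modifier on the last command is not covered in these notes; it asks the shell to print the expansion without executing it, which is one way to check a history reference before committing to it.

```csh
% set history=20
% pico program_1.c
% cc program_1.c
% !p
pico program_1.c
% !cc
cc program_1.c
% !cc:p
cc program_1.c
```

Only the lines after a % prompt are typed; everything else is what the shell displays in response.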
- The ESC key, printing
and non-printing key codes, local echo, modifiers, keyboard mapping, etc.
This material is not particularly important to Unix productivity, but
I found that once I understood it (if, in fact, I do understand it),
it helped me make sense of a lot of other things I was trying to understand.
The Esc key (an abbreviation
for "escape", I'll explain why in a second) is usually at
the extreme upper left position of the keyboard. Esc sends a non-printing
character code to the shell (or whatever other program you're typing keyboard
input to). "non-printing" means the keyboard sends in
a unique numeric code all right but one that doesn't correspond to any of
the normally visible (printable) characters like a, b, c, ... X, Y, Z, 0,
1, 2, etc. So the shell won't "echo"
any obvious display code back to the CRT. Thus Esc is said to be a non-printable
character --- it has its own unique standard code but it doesn't have a standard
print or display symbol normally associated with it.
Often, the Esc code itself is used to indicate the start of a series of other
codes that are to have a special meaning, different from their usual.
The left arrow key on my keyboard, for example, does not send a single unique
code. Instead it sends the Esc code followed by the code for the letter
'O' and then the code for 'D'. If the Esc code were not sent first,
the receiving program might just display OD. Instead, receiving EscOD,
it will move the cursor one place to the left. Thus Esc tells the receiving
program to "escape" from the normal processing of the subsequent
codes.
- There are lots of other standard non-printing characters besides Esc --- see any
complete table of ASCII codes --- but few of them have their own keys dedicated to them on
the keyboard. (For example, the code that causes a terminal or computer to
"beep" is a single non printing character. Since usually you don't want to
send that code to your computer --- it sends it to you when it wants your attention ---
there's normally not a key on your keyboard that's "bound" to that beep
code.) The set of codes bound to keys is called the keyboard mapping. You can, when
you run out of more productive things to do, find somewhere in the Unix reference manual
instructions about how to remap the keyboard to bind almost any code you want to almost
any physical key.
- I just finished reading a great murder mystery where somebody built a bomb (physical
explosive, not logic --- there is a real world out there, remember) into somebody
else's desktop printer. To make sure that it went off as soon as the keyboard was
touched, the keyboard was remapped so that every physical key was bound to the "print
screen" code. This guy was a true hacker --- he built the bomb and remapped the
keyboard while under small arms fire from the villain whose computer he was booby
trapping. Now that's a real programmer. Nothing distracts us from our
relentless pursuit of pointless perfection. (Well, ok, this guy's hacking was not
pointless.)
- Many editors provide you some way to enter non-printable character codes into your files
(see the next bullet, below) and therefore need some way to show you non printable
character codes when they're there. Usually they assign a string of multiple
printable characters as a pseudonym for a single, non-printable one. If, under the
right circumstances, you typed the Esc key as input into such an editor, you might, for
example, get the five character string <ESC> displayed as part of the contents of
the file (5 separate printable characters displayed: <, E, S, C, and
>). There is unfortunately no standard convention on such things. There
would actually be only one character in the file there, but it would display as a 5
character pseudonym.
- There's also no standard on how you enter non-printable characters into
an editor. You might, for example, have to hit Esc Esc (the Esc key
twice in a row) to enter one Esc code into the edit buffer. (That
makes sense, sort of: the first Esc escapes the next code from its
normal meaning, right? Some editors and other programs use the backslash (\)
character similarly, by the way.) Of course, if you were dealing with
a shell rather than that particular (and hypothetical) editor, Esc Esc would
invoke file name completion
rather than entering the Esc code into the command
line edit buffer. Caveat emptor. (In case you didn't know,
that's Latin for "tough shit", approximately --- literally, "let
the buyer beware".) Each editor has its own conventions.
- modifier keys. There are
some keys that under normal keyboard mappings effectively (not really) transmit no code
whatsoever. The "Shift" key and the "Ctrl" key are usually
interpreted this way. These are known as "modifier"
keys since, whatever else they do (beyond our scope here) they do not usually result in a
code being entered if they're just typed (depressed, activated) by themselves in
isolation. Instead, they alter or modify the unique numeric code sent to the
computer when another key is depressed at the same time. A lower case 'a' has a
different ASCII code than an upper case 'A', for example, despite the fact that there's
only one 'A' key on your keyboard. The shift key makes the difference.
- Echo. A keyboard is not a typewriter --- normally,
when you hit an 'x', for example, the keyboard hardware by itself cannot cause an 'x' to
be displayed anywhere. Instead, the keyboard hardware just sends the code bound to
the 'x' key to some low level keyboard-interrupt-handling-program which then passes it on
to some other more interesting program (like the shell, or an editor, or the more program)
which then may or may not choose to send some display code on to your CRT's display
manager. Each program is free to decide how to respond to a given keystroke
code. But it usually tries to do something visible (or perhaps audible, via the
"bell" code) so that at least you know your keyboard is still working and
the character was received at the intended destination.
- Like some techno-jargon? That response is called lexical feedback, since each
character you type in is a "lexeme". Syntactic feedback is when some
program tells you that a command you just entered (composed of lexemes that you entered)
is syntactically valid or invalid --- indicating whether or not the program can figure out
what it thinks you're asking it to do. Semantic feedback is the result of the actual
execution of some command. The display of files you get in response to an ls command is semantic
feedback. Syntax refers to
form or structure, semantics refers to meaning.
- If it doesn't do something else already big and obvious --- like the more
program's quitting when you type a 'q' --- a program will usually just echo
the correct printable character(s) back to your display. Since this
echo of anything that you type is thus a function of the Unix program receiving
it (the more program doesn't make
the 'q' itself visible to you, while a shell or an editor does), it can't
be your keyboard that makes the decision whether or not to display
that character. So the keyboard, and sometimes (when you're networking)
even your local computer, by itself does not directly echo your keystrokes
--- Unix programs do. Some, like more,
try to treat every received key code as a command and do something big and
obvious (or at least "beep" if they can't figure out what to do
with the key code). Some other program,
like an editor or a shell, may just store your key codes somewhere, "buffering"
them until you tell it what to do with them. Such "buffering"
programs usually provide immediate, visible, lexical feedback by simply
echoing the character code back to the display. The shell, not too surprisingly,
just stores most input key codes (all the printable ones, at any rate) in
a command line buffer until it receives the "Enter"
code you send by hitting your "Enter" key. (Then the shell
parses the buffer from left to right and starts command
line processing.)
- One of the major differences among shell flavors (csh, Bourne shell, tcsh, ksh, bash,
etc.) is the mechanisms provided for your editing of your command line buffer. The
simplest shells don't let you do very much except use the backspace key to erase the
currently last character in the buffer. More sophisticated shells provide for a lot
more. RTFM.
- On some occasions when telnetting or networking in some other fashion, you may
actually wind up with "local echo" accidentally turned on --- a program on your
local machine thinking that it's supposed to provide character echo and a remote shell (or
other) program also thinking that it's supposed to echo back to you. The signs are
obvious: every character you type on a command line gets echoed twice. So
you typed alias but you see aalliiaass or even sometimes aliasalias. You'll have to figure out how to turn
off local echo on your local machine, since the remote Unix shell won't know anything
about it.
- Shell echo also helps explain why networks get saturated sooner than you might think
they should. If you're connected via telnet to a Unix shell on a remote machine,
every single character you type causes a message to be sent to the remote machine which
sends a message back to your machine to echo the character onto the display. What
with network addresses and other protocol overhead, each of those network messages
probably carries dozens of bytes of headers to transmit one "real" character --- an
effective payload-to-overhead ratio of worse than 1/40.
- The "Enter" key itself is a printable character too (sort
of) but it gets echoed oddly and that can cause some problems under certain
circumstances. Under standard keyboard mappings, the enter key transmits a code that
in ASCII vocabulary is referred to as a "carriage return", let's
name that code <CR> for convenience here. If the CRT display manager receives
a <CR> it usually just moves the cursor all the way over to the left, to the
beginning of the current line. Thus, to get the visible effect you're used to, the
shell must send two characters back to the CRT display manager: a <CR>
followed by another code named <LF>, for line feed, which tells the display manager
to move the cursor down one row (and possibly causes an editor to insert a new blank
line). So the keyboard sends the shell one code but the shell "echoes" a
two code response. That can contribute to some problems:
- Sometimes when networking, well intentioned but overly simple-minded
"intermediate" programs along the network see the <CR> and just append an
<LF> without checking to see if one was already there (or maybe it wasn't where it
was supposed to be and the intermediary was not clever enough to notice --- maybe the
original program echoed <LF><CR> thinking the order couldn't matter).
The extra <LF>s are why you'll occasionally see text appearing double or even triple
spaced when it wasn't intended to be. Contrariwise, some intermediaries, augmented in
response to this first problem, try to strip out surplus <LF>s. If they get
confused and strip too many, one line of text may overwrite another.
- That <CR><LF> combo is also why there can sometimes be difficulties in the
interchange between Unix programs and the "text only" output of word processors
(or other programs) such as MS Word that do "word wrapping" or otherwise try to
differentiate between "soft" and "hard" line breaks
(both of which
still have to involve <CR>s and <LF>s somehow.)
- A soft line break is inserted and removed automatically, as necessary, to make
text wrap properly within windows, which can be resized. A hard line break is one
you typed in yourself, explicitly, like at the end of a paragraph, and don't want removed
by any stupid program that comes along and feels like it.
- One reason for learning to use more advanced editors than pico is to be able to
open a file and ask to "see" and manipulate embedded, non-printing characters
like the <CR>s and <LF>s.
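Short of a fancier editor, one way to make codes like Esc, <CR>, and <LF> visible is to dump them byte by byte. The sketch below uses the standard printf and od utilities, neither of which is discussed in these notes, and is written in Bourne-style sh syntax; the byte sequences are made up to match the examples above.

```sh
# Esc (octal 033) followed by the codes for O and D, the sequence
# described above for the left arrow key.
printf '\033OD' | od -c

# One line ending in <CR><LF>, then one line ending in <LF> alone.
printf 'one\r\ntwo\n' | od -c
```

In the od -c output, Esc appears as the octal number 033, <CR> as \r, and <LF> as \n, each standing in for a single non-printing character.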
- pipes (and some other miscellaneous
notes on command line processing). On the command line
you can pipe the output of one command or program into the input of another.
You're debugging your program whose executable you've stored in a file named
I_am_an_idiot and it's spitting
out so much output so fast you can't see it to figure out what happened?
Pipe it into your old friend the more program: I_am_an_idiot
| more. The '|'
character is the pipe operator; its location can vary from keyboard to
keyboard. On mine, it's a shifted backslash (\). Without the
pipe, you'd have to redirect the
output of the first command into a temporary file, run the second
command with its input redirected from the temp file, and then manually
delete the temp file. A pipe does all of that for you.
- Vocabulary: A | B is a pipeline of two commands, A and B, connected via the
pipe operator.
- In the I_am_an_idiot
| more example above, the receiving program
(more) displays the piped output for you; in other cases the receiving
program may not (depends on the receiving program, obviously). The
point is that the stdout output
of the left hand program will be piped into the stdin
input of the second. Unless the second program displays or saves
it for you somehow, you'll never see it.
- To solve that problem (if and when it's a problem for you) the Unix plumbers gave us a tee command that copies its input both to stdout and to a file you name, in effect making two streams of output; I've not
used it in years, but it's around if you need it; RTFM.
- Have a program producing too much output and all you want to see displayed are the lines
(if any) that contain the word "wow"? Pipe the output into a grep flavor (grep, egrep, fgrep --- RTFM) like so:
loquacious_program | grep wow
(grep gets its name from the ed editor command g/re/p, "globally search for a
regular expression and print"; see wild cards for a partial discussion
of regular expressions in Unix.)
- Note that grep will treat the string "wow" as data (a regular expression, in
fact), not a file name. Doesn't that contradict something I said way back in the
beginning of the basic Unix notes to the effect that the shell's command line processing logic parses command
lines as command_name file_name?
In a word, no. I misled you then for simplicity's sake, but I didn't lie. The
full nature of the interaction between the shell and a command (program) it dispatches is
still too complicated to go into here; but I also said earlier what is still
correct: all the stuff after the command name on the command line is passed as
data to the command program. (In fact, even the command name itself is available as
data to the command program, if it wants it.) It is not the shell that ultimately
determines what to do with the rest of the command line data, it's the program. The
shell parses the entire command line and can do some strange and wonderful things with it
(like, for example, in-line command execution, which executes some new command [in the
middle of parsing the earlier one] and places the characters output by the new command in
the middle of the command line of the old command still being parsed --- RTFM); but
ultimately, it just passes all the junk it comes up with to the command (program) for
final disposition as options, data, file names or whatever. A great many shell
command programs expect the stuff that's not options to be a file name; but that's not a
rule. Perhaps grep should have been written to require the expression being searched
for to be entered as an option (something like grep
-ewow), but there are probably good reasons beyond my ken why it wasn't. (It does accept the expression that way, as grep -e wow; it just doesn't insist on it.)
- You can have multiple pipes in a pipeline. You want to select out of the output of your
overly loquacious program just the lines containing, anywhere, a Z followed exactly two characters
later by an A and then have just those lines displayed in alphabetical order, one
screenful at a time? Try:
loquacious_program | grep 'Z..A' | sort |
more
(In a grep regular expression it's the period that matches any single character; the question mark does that job only in the shell's file name wild cards.) Pretty cool, no?
- Use a particular pipeline often? Alias
it:
alias cool "loquacious_program | grep Z..A
| sort | more"
and remember to put that alias definition in your .cshrc
file.
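A small sketch of a multi-stage pipeline, including tee, that can be tried without having a loquacious program on hand. It is written in Bourne-style sh syntax (the pipe operator itself is the same in csh and tcsh). The file names sample_output.txt and matches.txt and the six lines of sample text are invented for the illustration, and cat stands in for the program producing the output.

```sh
# Made-up output standing in for loquacious_program.
printf '%s\n' 'delta Z12A' 'alpha ZooA' 'charlie plain' 'echo ZxA' 'bravo ZabA' > sample_output.txt

# Keep only lines with a Z followed two characters later by an A, then sort them.
cat sample_output.txt | grep 'Z..A' | sort

# Same pipeline, but tee also saves the unsorted matches in matches.txt.
cat sample_output.txt | grep 'Z..A' | tee matches.txt | sort
cat matches.txt

# Clean up the example files.
rm sample_output.txt matches.txt
```

Both pipelines display the alpha, bravo, and delta lines in that order; the charlie line has no Z, and the echo line has only one character between its Z and A. The final cat shows the same three lines in their original, unsorted order, because tee captured them before sort saw them.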
- redirection. On a command line, the
'<' and '>' symbols have a special meaning. They "redirect"
input and output, respectively. When a program (including the
shell itself) reads data from your keyboard, it is probably because it has
been programmed to read data from a file named "stdin" (that's
the conventional abbreviation for "standard input"). When
a program is writing data to your screen it (usually) is writing it to a file
named "stdout". You can temporarily rename or redirect
these files. (Pipes, above, are another
way to redirect.)
- Technically, stdin and stdout are not actually file names but "file
handles". The syntax of many modern programming languages requires programmers
to do I/O from things identified by their handles. Then there is some command
somewhere which temporarily associates a specific named file with a file handle.
Think of it this way: You're a really dumb bouncer in a wild west saloon.
You've been programmed to respond to a command to "eject from the premises the guy
standing next to the door." If I piss off the bar keeper, he or she might
say to me, "Matt, go stand next to the door" and then nod to the bouncer.
Later, if my friend John was a problem, the barkeeper would say, "John, how about
moving away from the bar and standing near the door." The bouncer program would be
unchanged, dealing as it did with handles ("Eject the guy standing
near the door.") and not caring about anybody's actual name.
- Remember much earlier I said that everything is a file to Unix? Even
devices like your keyboard actually have a file name somewhere in the Unix directory named
/dev (for devices). So normally, /dev/your_keyboard_name is mapped to the handle stdin
and all the programs you write and all the shell command programs reading from stdin get
their data from your keyboard. Redirection just (temporarily) reassigns another file
name to the appropriate handle (stdin or stdout).
The first program we all usually write in any new language just prints a cute message
on the screen. I usually print "Hi there!" Suppose I have compiled
and successfully executed my first program to display "Hi there!" and its
executable code is in the file named my_first.exe.
To display "Hi there!" on the screen, I would just enter the command my_first.exe. Were I to enter the
command my_first.exe >new_file, no
output would appear on my screen but Unix would create a file named new_file and write the string "Hi there!" into it. If new_file were the name of a previously existing file,
its previous contents would be overwritten (and hence destroyed).
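Here is a minimal sketch of that experiment, not a definitive recipe. It is written in plain sh syntax (the > operator behaves the same way in tcsh unless you have set noclobber), and since I can't hand you my compiled program, a two-line script stands in for my_first.exe. That stand-in and the scratch directory are my assumptions.

```shell
# Work in a scratch directory so nothing of yours gets overwritten.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Stand-in for my_first.exe (an assumption): a tiny script that prints the greeting.
printf '%s\n' '#!/bin/sh' 'echo "Hi there!"' > my_first.exe
chmod +x my_first.exe

# The leading ./ tells the shell to look in the current directory.
./my_first.exe                  # the greeting appears on the screen
./my_first.exe > new_file       # nothing on the screen; new_file is created
cat new_file                    # the greeting again, read back from the file

echo "something else" > new_file    # > replaces the old contents
cat new_file                        # only "something else" is left
```

The last two lines are the dangerous part: the first greeting in new_file is gone for good once the second > runs.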
- The >> variant of output redirection appends the new output to the end of any
previously existing file rather than overwriting the previous contents. Otherwise,
it's the same as >. (It creates a new file if there's no existing file to
append to.)
- Typing a command name followed by <some_file_name
redirects the input source to the command from stdin to some_file_name.
The command then ignores the keyboard and reads any expected keyboard inputs from
the specified file. This can be really useful for debugging. You may get tired
of retyping the same input test data to your program when you're debugging it. Using
an editor, enter your test data into a file and then run your program with input
redirected from your test data file.
- Note: The end of a line in a Unix text file is indicated by the code
for a newline (line feed) --- a non-printing
character that you can sort of see when you're typing (since the cursor
moves down to the start of the next line), but you can't necessarily tell is
there by itself if you're just looking at text. Newlines (which
you enter by hitting the "Enter" or "Return" key) have
the "normal" meaning when read from a redirected input file.
If entering input from your keyboard required you to hit the "Enter"
key to get some data to your program, then you'd put that data on
its own line in any text file you wanted to use as the source of redirected
input to your program.
- You should be beginning to see why professionals like Unix so much. You
have a program that you're debugging that requires you to enter 10 sets
of numbers as input. So of course you put the test data in a file
with a long, descriptive name and then redirect your input from that file.
But of course you don't even have to type in the full name of the test data
file since you use filename completion
logic. And once you've run your program even once with that test data,
you can make your next run with just a couple of keystrokes since the history
record for a command includes the redirection you used.
- You can redirect either input or output or both in the same command.
- Redirection is per command line. After the command is finished, "stdin"
and "stdout" revert to their normal meanings (usually the keyboard and the CRT
screen --- but of course there's always a way to change even that. I leave it as an
exercise for bored students who really should be doing more productive work).
- Vocabulary: A program that both takes its input from
stdin and makes its output to stdout is called a filter. Filters
are what are connected via pipes to form
the interior of a pipeline. Many (most?) Unix commands are written as filters.
- Even if a program is not a filter, if it uses either stdin or stdout, that (stdin or
stdout) can still be redirected by itself.
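The redirection variants above can be sketched in a few lines. This is only an illustration under some assumptions: it uses sh syntax (tcsh accepts the same >>, < and | here), the standard sort command plays the part of "your program", and the file names log_file, test_data and sorted_data are made up for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# >> appends, creating the file first if it does not exist yet.
echo "first line"  >> log_file
echo "second line" >> log_file
cat log_file

# Put the test data in a file once, one value per line, instead of retyping it.
printf '%s\n' 30 10 20 > test_data

# < makes sort read the file where it would otherwise read the keyboard.
sort -n < test_data

# Input and output can both be redirected in the same command.
sort -n < test_data > sorted_data
cat sorted_data

# sort is a filter (stdin in, stdout out), so it can also sit in a pipeline.
sort -n < test_data | head -n 1
```

Because each line of test_data holds one value, the file supplies exactly what you would have typed, "Enter" key included.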
- Creating new commands Have a bunch of commands that
you always want to run one after the other?
- With an editor, create a new file (let's call it new
here) and enter in the commands, one per line, just as you'd type them into the
shell. Don't type in the prompts that the shell would have shown you; just type in
your commands, one on each line.
- When debugging your new command, use the source
command to execute it --- e.g., source
new. Then, when you're ready to proceed, be a clever
old Unix hack (that's redundant, actually ;-) and have a single directory
where you store the source files for all new commands and add just
that one directory's fully qualified name to your path.
Alternatively, create an alias like alias
new "/fully_qualified_path/new" (Always more
than one way to skin a cat in Unix.) In either case, you'll need to
have made your new file
directly executable via the chmod
command, viz. chmod +x new. (It's
always executable via the source
command; but now you want it executable by itself --- meaning that the name
new all by itself will
be treated as a command; using source,
new is a file name
supplied as an argument to the command named source.)
- Now use new just as you would any other command.
- Since the shell is actually a complete programming language in its own right, you can
even have this new command accept arguments from its own
command line.
- Remember, after it's all done with its command
line processing, the shell just passes all the data there (on the
command line) to the program it starts; but that's getting beyond what
I want to write about here; read the manual pages for your shell or get
a good book.
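The steps above might look roughly like this. Treat it as a sketch under stated assumptions rather than the one true way: it is written for sh, where the source command is spelled "." and the search path is the PATH variable (tcsh users would type source and set path instead). The directory name my_commands, the #!/bin/sh first line, and the greeting the command prints are all made up for illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir my_commands       # hypothetical directory that holds homemade commands

# Normally you would type these lines into the file with an editor.
cat > my_commands/new <<'EOF'
#!/bin/sh
# $1 is the first word typed after the command name, if any.
echo "Hello, ${1:-stranger}!"
EOF

# While debugging, read the file into the current shell.
# (sh spells the command "."; tcsh and bash call it source.)
. ./my_commands/new

# Make the file executable and put its directory on the search path.
chmod +x my_commands/new
PATH="$scratch/my_commands:$PATH"
export PATH

new             # now an ordinary command
new Matt        # prints: Hello, Matt!
```

The #!/bin/sh line is my addition: it tells the system which shell should interpret the file once it runs as a command in its own right.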
- wild cards When typing
in file names as arguments to commands, the shell allows you to enter "wild
card" characters that define patterns (similar in spirit to "regular expressions", though the syntax differs) that match many
different sets (strings) of characters. Want to list all the C programs in your
current directory (but only those you've named "properly" with a .c
extension)? Type ls *.c Want to delete every file you own with only five
keystrokes, wiping out 1000+ files representing the sum total of 4 years of academic
work? Enter rm *
* The asterisk matches any string of 0 or more
characters. a*z matches az, axz, avz, aabbz
and lots, lots more.
? The question mark matches any single
character. a?z matches acz but not az or
abcz, for example.
Note: although a**b
is pointless (since it means the same thing as a*b)
and hence possibly illegal, a??b is quite
different from a?b (and hence perfectly sensible).
[d-q] matches any single character in the specified range --- d
through q, inclusive, in this example.
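[dq] matches just the listed characters --- d and q, in this
example. (List them with no separator: a comma inside the brackets would itself be one of the characters matched.)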
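A quick way to get a feel for these patterns is to try them with echo, which simply prints whatever file names the shell expanded the pattern into. The sketch below uses sh syntax (these wild cards work the same way in tcsh), a throwaway directory, and file names I invented for the purpose. Setting LC_ALL to C is my assumption for keeping the range and the listing order predictable.

```shell
LC_ALL=C; export LC_ALL     # keeps ranges and listing order predictable
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch az axz aabbz acz abcz adz aqz arz main.c util.c notes.txt

echo *.c            # main.c util.c
echo a*z            # aabbz abcz acz adz aqz arz axz az
echo a?z            # acz adz aqz arz axz
echo a??z           # abcz
echo a[d-q]z        # adz aqz
```

Trying a pattern with echo before handing it to rm is a cheap habit that can save those 4 years of academic work.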
- There are other wild card conventions, but these four, above, are probably the most useful.
- Wild cards can be combined. a*b?[0-9]z would match abx3z, acdebb0z, and so
on. Note: In a pattern like a*b*c, the sub string between the a and the
b does not have to be the same as the sub string between the b and the c
for this pattern to match. In fact, from the Unix command line shell itself, I'm not
aware of a way to generate such a pattern (where the two sub strings have to be the same);
although there are several Unix utility programs around that specialize in pattern
matching and could handle that easily. Try perl,
for starters.
- Want to impress me? Tell me how to build a pattern that matches the comma itself
here. For example, what pattern would you feed to the ls command to get it to list just those filenames whose
second character was either a comma or an 'x'? Hint: a comma inside square brackets is just
another listed character, so ?[,x]* is worth trying.
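For the curious, here is a small experiment with combined patterns, the comma question, and the "same sub string twice" problem. It is a sketch under assumptions: sh syntax, invented file names in a scratch directory, and grep (rather than perl) standing in as the pattern-matching utility, using a basic regular expression with a back-reference. In sh, and as far as I know in tcsh too, a comma inside square brackets is matched literally.

```shell
LC_ALL=C; export LC_ALL     # keeps listing order predictable
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch abx3z acdebb0z abx3y a,file axfile abfile a1b1c a1b2c

# Wild cards combined in one pattern.
echo a*b?[0-9]z             # abx3z acdebb0z

# Second character is a comma or an x.
echo ?[,x]*                 # a,file axfile

# The shell pattern a*b*c cannot demand that both sub strings be equal,
# but a regular expression with a back-reference (\1) can.
ls | grep '^a\(.*\)b\1c$'   # a1b1c
```

Note that the grep pattern is a regular expression, not a shell wild card pattern, which is why it is quoted: the quotes keep the shell from trying to expand it.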