CS125 Frequently Asked Questions (FAQ)
1. What is source code? What is a source file?
2. What does machine language really look like?
3. How do I compile my program?
4. How do I interpret the compiler's error messages?
5. Where do my comments go?
6. Why is the first week of this course so confusing?
7. What is syntax?
8. Can I use SSH to connect to sun/prclab from off campus?
9. Why was my homework marked late even though I turned it in before midnight?
10. What's the difference between Unix and Linux?
11. What's the difference between sun and prclab?
12. What does RTFM stand for?
13. How come there are sometimes extra characters in my script file that I didn't type or ask for?
14. What is an ohnosecond?
I'll add to this FAQ as time goes on. I'm always open to suggestions; what do you think I've missed that should be here?
1. What is source code? What is a source file? Code (instructions) written in a high level language like Matlab, C, Ada, Fortran, COBOL, or Basic, for example, is called source code because it is the human-readable description/specification (i.e., source) of the logic in the ultimately executable computer program. But source code itself cannot be directly executed by any computer you'll ever see (there may be some lab curiosities left in the museums that were built to allow direct execution of some simple source language, but nothing as complex as C). To produce an executable program, source code must first be translated into some target machine's language, which is the binary language the hardware designers built into that machine's circuitry and is the only language that that particular machine's hardware can ever actually execute. The file containing the source code used as input to a translator program is said to be a source file. Programmers create and modify source files with an editor. For a compiled language like C, the source file is then provided as the input to some compiler, which then outputs a completely separate file containing the machine language equivalent for the entire file of source code. (The output from the compiler is called the "executable" file, although that's actually a bit of an oversimplification, but we'll deal with that some other day, like in CS420.) Nothing of the programmer's source logic is executed until it is translated and the resultant executable file is executed.
Both compilers and interpreters (like Matlab) can technically be referred to as translators. The difference is when and how often the translation is done. As discussed above, a compiler translates each statement of source code only once and outputs the result (the machine language) for the entire source file as a completely separate executable file. An interpreter alternates between translation and execution, a statement at a time: first it translates (interprets) a source code statement, then it executes the resultant machine code, then it goes on to the next source language statement, interprets it, and so on. When an interpreter is supplied with a source program with a loop in it, it must translate each statement in the loop each time it is encountered; if it goes through the loop one thousand times, each statement will be interpreted (as well as executed) one thousand times. In contrast, a compiler for a compiled language (like C) translates each source statement only once, eventually producing the machine language equivalent of the entire loop. As a result of the extra work (repeat translations), programs in interpreted languages execute more slowly than compiled programs since the real-time execution of each statement must be preceded by its interpretation. Why bother with interpreted languages, then? Several reasons actually (take CS332 to learn more), perhaps the foremost being that if something goes wrong during execution, the interpreter knows exactly which source statement was being translated/executed when the problem occurred and can provide the programmer a great deal of debugging assistance. Unless special arrangements are made (and the compiler is sophisticated enough to support such arrangements) the executable program produced by a compiler has no information whatsoever about the original source code. 
The source code may have been lost years ago, in fact, and the executable program would still be perfectly usable (but sure not easily modifiable ;-) You don't have the source code for Internet Explorer now do you? But you use it all the time, I presume.
2. What does machine language really look like? You really want to see it? It's pretty dull. Here's the machine language for my "Hello world!" program, compiled on and for sun/prclab when sun/prclab was hosted on an Ultrasparc computer built by Sun Microsystems (now part of Oracle). It's actually shown in formatted hexadecimal rather than binary (which is essentially unreadable by human beings) to make it somewhat easier to understand; don't say I never do anything for you. (Later in this course we'll discuss why programmers and computer engineers use hexadecimal numbers so often.) Remember, this particular example of machine language only works for machines based on the Sparc chip set manufactured by Sun Microsystems. This machine language wouldn't mean anything at all to an Intel/Pentium based machine like the Dell computers in the King labs. I could get my "Hello world!" program written in C to run on the Intel/Pentium architecture, but first I'd have to find a compiler that could produce code in the Intel/Pentium machine language. Source code is portable and can (if done right) be moved from one machine to another (although it will have to be recompiled for each new type of machine); machine code is not portable except to a pretty exact duplicate of the original machine it was compiled for.
3. How do I compile my program? Issue to the Unix shell the following command:
gcc -o output_file input_file
where input_file is the name of the source file containing your program's source code written in C and output_file is the name of a file you want the compiler to place the actual executable version (machine language) of the program in. The compiler, gcc, is a translator, remember; it translates from its input file containing source code which you can understand (eventually ;-) but the computer hardware can't, into its output file containing executable machine code, which the computer hardware can understand but you can't (unless you're really, like seriously, weird and strange ;-) Your source files containing source code are created and modified by you using your favorite editor. Any time you change source code, you must recompile it for your changes to take effect. The computer hardware can't execute your source code and doesn't know or care which source file generated (was compiled into) which executable file. Changing a source file has no effect on any executable file unless and until the modified source file is recompiled.
The actual name you choose for your executable output_file is totally arbitrary (as long as it is a valid name to Unix) but for your benefit it should be chosen to "match" its corresponding source file so that you know which executable comes from which source file. For CS125, I'd like your source files to be named program_n.c and the resultant executable (produced by the compiler) to be named program_n.exe, where n is the unique number I give you for each of your programming assignments.
4. How do I interpret the compiler's error messages? With difficulty ;-) Initial interpretation of the error messages is straightforward enough. Here's a simple one, for example, from a compilation of a file named ctime.c:
ctime.c: In function 'main':
ctime.c:16: parse error before '='
The first entry on the first line, before the colon, is the name of the file being compiled (ctime.c, in this example). The rest of that first line tells you which function in that file the error messages pertain to. ("Functions" are the building blocks of programs written in the C programming language. Initially, all your programs will consist of only one function and its name will be main; later, you'll be writing more complicated programs composed of several functions.) Subsequent lines in the error message give you the line numbers containing the errors (line #16, in this example), plus as much information as the compiler can manage about each error itself. Note:
- The line numbers are from the beginning of the source file, not from the beginning of the function.
- Always start correcting your errors starting from the very first one the compiler reports, as it's the only one you can be sure is correct, meaning that there really is an error where the compiler says it is. The compiler usually attempts to continue on (after reporting an error) looking for other problems; but often your first error throws things off so much that its subsequent error messages may be pointless. The result is that fixing the first actual error may in fact make a lot of later, apparently unrelated, error messages disappear. Later, when you're more experienced, you can sometimes ignore this rule a bit and try to fix several problems at a time or fix them "out of order"; but for now, do it as I've recommended here: Fix the first problem first and then recompile and move on to the next "real" error.
You'd think there would be a lot of really good web pages out there describing how to interpret gcc error messages, but there aren't. Here is a decent one valid as of late Dec 2008:
If you're working outside of the scheduled lab time for this course (meaning that I won't be within 50' or so) and can't figure out what some error message means, post your code and the message on the class discussion board on Blackboard and I'll answer it within a few hours. If you find some better web pages on this subject than the one I noted above, be so kind as to let me know please; I and your classmates will be grateful.
5. Where do my comments go? Comments are part of your source code and go in your source file, not your script file that you turn in to me. They show up in the script file, of course, because the entire source file shows up there, along with other information like the execution trace of your program. But comments are part of your C code. They are part of your programming style and are designed to make it easier for people (including you yourself) to read your source code and figure out what it does and how it does it. External documentation (documentation that is not in comments in source files but in separate documents like User Manuals, for example) can and does get lost. But source code usually doesn't. (If it does, you can forget about ever making any further changes to the program, right? If you had a compiled version of the program, you could continue to use it, but without source code, it could never be changed unless you wanted to go back to the good old days when programmers programmed directly in machine code. I've done that; you wouldn't want to; trust me on this.) I recommend that you try early in your career as a programmer (like now!) to get in the habit of routinely commenting your source code as you write it rather than regarding it as a separate step to be undertaken only after the program is working correctly. Many is the time I've done a bunch of code at 3 in the morning with no comments and tried to go back and fix it after getting some sleep only to discover that I couldn't understand my own code from the night before. In the beginning of the CS125 semester, the code may be trivial enough that you can't see how this would be possible (not remember how my "Hello world!" program works?) but by the end of the semester you will be writing much more sophisticated code and you might easily forget what you meant if it were not adequately commented. Start developing good programming habits early. 
Note: Any initial comments you put in as you are working are mostly for your benefit; they don't have to be as complete or as "proper" as the final ones for your "deliverable" code (code that you deliver to someone else, like me). But to avoid those unpleasant mornings after, get in the habit of putting in at least a rough set of comments even at 3 in the morning.
6. Why is the first week of this course so confusing? We are dealing with several different programs designed and built by totally different people for totally different reasons and there is no consistency among them. The ERAU/Prescott lab machines' operating system is Microsoft Windows, then we use a commercial SSH product on Windows to connect to a Linux server where we will run an editor from the University of Washington to produce source code for a compiler provided by the Free Software Foundation. This is nuts, you say; haven't you people (the faculty idiots in charge of this course) ever heard of an Integrated Development Environment (IDE)? Well, sure we have. We even have one already installed on all the lab machines. So it's an integrated development environment for a Windows based system. But the aerospace industry generally does not use Windows for software development. Any effort we (ERAU) asked you to expend to learn a Windows IDE (Visual Studio, in specific) would be essentially worthless as far as the aerospace industry is concerned. Wouldn't learning one IDE make it easier to learn other ones, the ones used in industry, later? Well, not really. If you don't know it already, let me be the first to tell you: Microsoft very rarely invents anything; they borrow the basic ideas of their products from other sources and then have to change them a lot to avoid copyright infringement, patent issues, etc. So Microsoft's IDE doesn't look anything like the Unix-based stuff the aerospace industry uses. Well, you say, how about skipping Microsoft completely and just putting the Unix environment (or Linux, a very close cousin) and a "real" industrial IDE on our lab machines? Which lab machines, just the ones in the King building? So if the King building were closed you couldn't use the machines in building 58 the way you can now? 
And the (hypothetical) Unix machines in King wouldn't be able to run Microsoft Word and PowerPoint and Excel so you wouldn't be able to use the King labs for term papers and other minor stuff like that. And of course you couldn't work from off campus or the dorms since the "real" Unix IDEs are expensive and you probably wouldn't want to have to buy your own copy for your personal computer(s). Serious programmers in the aerospace industry are expected to know how to work in a Unix environment like the one we have here. Maybe, even probably, a big firm will have a Unix-based IDE for its staff, but they usually don't expect you to know it (the IDE they use) at the start and you can always get started with the sort of basic Unix environment we have here; that's almost always available almost anywhere you go in aerospace, and academia too, for that matter, for those of you who will eventually think about graduate school.
OK, you say, but how about this script file business? Isn't that unnecessarily complicated? After all, what does it have to do with writing and compiling C programs? Well, either you're going to submit your programming assignments electronically or you're going to print them out and hand them in hardcopy. Everybody got a printer at home? No? Everybody want to stand in line for the one printer in the King building (and it's usually out of paper anyway)? No? I thought not. And you're going to staple those pages together for me so I don't lose any of them and flunk you and ruin your college career and your life? Staplers always in the lab? Never run out of staples? As I see it, submitting your script file via Blackboard is our only reasonable choice. (It's got to be the same for everyone or I'd go nuts trying to keep track of everyone's work properly.)
Unix is a set of Lego-like building blocks. In the Unix world, which came from (and still is) the world of research labs and universities (at least the engineering colleges), the burden of stringing together a bunch of (essentially) simple freeware tools needed to perform a complex task is up to the user. In the Microsoft world, vendors try to bundle a set of tools together, slap a couple of buttons on the front along with their logo and charge you money for the proprietary result. The more they can put together in one product, the more they can charge you. The "glue" that holds the Unix building blocks together is the file system. Every Unix tool takes input from files and makes output to files and the format of the files is described in the Unix manual so if you want to write a new tool, or better yet, combine a set of tools to do something new, it's easy to figure out how. In the Microsoft world, the vendors don't want you making changes to things; they want to make all changes and charge you for them. If you wanted to write a program that automatically added information to a Microsoft Word document, where would you find a description of the internal layout of a Word document? Right — you can't. So if there's a Microsoft world integrated toolset that does exactly what you want, it's likely to appear to be easier to use than an equivalent set of Unix tools. But (1) you have to pay for it, (2) if it doesn't do exactly what you want, you're out of luck, you can't change it, and (3) if you go somewhere else they'll have a completely different tool that is incompatible with the one you're used to.
I've oversimplified this story, of course, but not by much, really. Learning to take the very first interesting steps with separate Unix tools is probably harder than learning to do something cool with more integrated Windows ones; but the Unix skill set will be useful forever and allows you more power and flexibility and even ease of use once you are used to it. Windows stuff is designed to make it look easy from the start (so as to not scare off the naive potential customers) at the cost of downstream power, flexibility, and productivity. You're in the learning curve phase of Unix now, and it is more confusing; but it will pay off in the end. Trust me.
7. What is syntax? Syntax refers to the structural rules of a language describing what symbols may be used and how they may and may not be combined. ¿Spanish uses some different symbols than English, does it not? And in German, sentences often with a verb will end. Which is not true of English syntax. Each computer language (e.g., C, Fortran, Matlab, Ada) has its own syntax rules. Sometimes languages "borrow" from one another, sometimes they don't. Matlab's 'for' statement is based on C's so their syntax is very similar; Ada's is very different.
8. Can I use SSH to connect to sun/prclab from off campus? Yes.
9. Why was my homework marked late even though I turned it in before midnight? For us software types COB (close of business) usually means before midnight, so when I tell Blackboard that the assignment is due by COB on some date, Blackboard marks everything submitted after 11:59PM of that date as being late. But the Blackboard server ERAU uses is somewhere on the east coast, so Blackboard's COB is 11:59PM Eastern Time. That is 9:59PM Arizona time (Mountain Standard Time, or MST) when the east coast is not on Daylight Saving Time, and 8:59PM MST when the east coast is on Daylight Saving Time (which Arizona doesn't use, except for the Navajo reservation, which does, except for the parts of the Hopi reservation inside the Navajo reservation, which don't). Anyway, consider yourself informed (more or less ;-) as to what COB means in practice to CS125.
10. What's the difference between Unix and Linux? For our purposes in CS125, the answer is, not much. In CS420 here (ERAU), we'll talk about the theory of operating systems and what choices their designers have to make (and so how their working internals may differ). But for CS125, we're interested in an operating system only insofar as it is a (transparent, we hope) host for our programming environment. The two aspects of the programming environment that you'll use the most heavily are the shell and the compiler and neither of them is an intrinsic feature of any operating system. I.e., most OS's can host most common shells and compilers. It's up to their owners (the ERAU Information Technology department, in our case) to choose which ones to install. Unix and Linux both are easily capable of hosting all major shells and compilers, most of which are also available for Windows if you want them. gcc is commonly referred to as a Unix/Linux-based compiler (it's what we'll use for CS125) but I have it installed here (home) on my Windows 7 machine as well. All OS's come bundled with at least one shell (for Windows, it's misnamed DOS; for Unix/Linux, it's usually bash); but a machine like sun/prclab intended for use as a general purpose programming environment will typically have many shells installed so that each programmer can pick his or her favorite (choice of shells tends to be a religious issue with many programmers ;-)
When you hear someone say they are a Linux programmer or a Unix programmer, you can guess that two things are probably true:
- They've done some programming in an environment that forced them to learn a bit about bash or csh or one of the other popular Linux/Unix shells. (IT has set up prclab so that your default shell, the one you see immediately after you login, is tcsh.)
- They're idiots, who don't know the difference between a shell and an OS
There are important technical differences between Unix and Linux from the standpoint of the factors we'll study extensively in CS420; but very few programmers actually deal with the internals of an OS. Unless they are writing what is loosely referred to as "systems" code, programmers deal with editors, shells, and compilers and need to know nothing about the operating system hosting their development environment; and most don't. I think I personally have met only one real Linux programmer in my life, i.e., a programmer who wrote some of the code that's in Linux. Even if you write apps for Linux or Unix, you're likely not to need to know its internal structure, merely the APIs for the service packages you need.
11. What's the difference between sun and prclab? There is none. sun is just another name (technically, an alias) for prclab. Years ago, when I first started preparing web pages like this one for my classes, the main campus Unix server was named sun. It was an unfortunate name, since the hardware manufacturer for that computer was also named Sun (Sun Microsystems, later bought out by Oracle). So the campus had a computer named sun that was a Sun and it also had a computer named moon that was a Sun (that got replaced by a Sun named mercury that I think is still around. At one point I think we even had a Sun named pluto.) At some point, campus IT decided to replace sun with a new machine named prclab, for Prescott lab, and I don't think it was a Sun anymore either, as by then IT started to phase them out in favor of HP servers, I think, with Linux rather than Unix (see FAQ #10, above). But by then I had created close to a hundred web pages with references to sun and was not looking forward to manually changing all references to sun to references to prclab while leaving all references to Sun untouched. (In theory a single one-line command using a Unix tool called sed could have accomplished it for my entire website, but coward that I am, I was afraid to try it. After all, the downside would be screwing up a website of a hundred pages.) Anyway, it's simple enough to create multiple aliases for the same physical machine so I asked IT if they would alias sun to prclab and they kindly agreed. You'll see this a lot out on the web, where one physical server can host many different logical servers or vice versa. I mean, do you really think www.google.com refers to a single piece of hardware somewhere?
12. What does RTFM stand for? "Read the freaking manual." Note that I am defining RTFM for you there, not rudely telling you to look it up for yourself. That is an example of what philosophers refer to as the use-mention distinction.
13. How come there are sometimes extra characters in my script file that I didn't type or ask for? Your keyboard often sends "extra" keycodes to the shell or the programs you're working with (e.g., pico) and they, the shell in particular, may often send "extra" stuff back down to your display. Some of that stuff is captured by the script program and, depending on what program you use to look at the script file, may show up as funny, apparently printable strings of garbage which are really printable versions of unprintable characters (like the ASCII character codes or escape sequences for the DEL, Backspace, or arrow keys, for example). Here's a sample from a script file I just created:
Script started on Sun Aug 19 08:23:16 2012
[36mprclab: [32m~% [0m cat temp.c
[36mprclab: [32m~% [0m gcc temp.c
temp.c: In function 'main':
temp.c:5: error: 'x' undeclared (first use in this function)
temp.c:5: error: (Each undeclared identifier is reported only once
temp.c:5: error: for each function it appears in.)
[36mprclab: [32m~% [0m exit
The [36m and [32m that show up in there represent the escape sequences and control codes the shell sent down to change the colors of my shell prompt; the [0m represents what is sent by the shell to tell my terminal window to display all subsequent characters in the default color. (Note: "Represents" is not the same thing as "this is what actually got sent" because of the problem with unprintable ASCII characters.) There are lots of such escape and control sequences and there's an explanation for all of them, of course, but I certainly don't know what they all mean and why script handles some correctly but not others. If my life depended on it, I'm sure I could figure them all out; but since it doesn't, I haven't. My message to you: Don't worry about them unless you think your script file is so filled with garbage the grader won't be able to make sense of it. In that case, build a new script file and try to avoid making typing mistakes that you have to delete, or using the arrow keys, or cutting and pasting, or inadvertently opening pico within the script, all of which are guaranteed to put funny looking stuff in the script file.
This page last changed 19 Aug 2012 by Dr. M.S. Jaffe