1.3. A description of Example 1.1

1.3.1. What was in it

Even such a small example has introduced a lot of C. Among other things, it contained two functions, a #include ‘statement’, and some comment. Since comment is the easiest bit to handle, let's look at that first.

1.3.2. Layout and comment

The layout of a C program is not very important to the compiler, although for readability it is important to use this freedom to carry extra information for the human reader. C allows you to put space, tab or newline characters practically anywhere in the program without any special effect on the meaning of the program. All of those three characters are the same as far as the compiler is concerned and are called collectively white space, because they just move the printing position without causing any ‘visible’ printing on an output device. White space can occur practically anywhere in a program except in the middle of identifiers, strings, or character constants. An identifier is simply the name of a function or some other object; strings and character constants will be discussed later—don't worry about them for the moment.

Apart from the special cases, the only place that white space must be used is to separate things that would otherwise run together and become confused. In the example above, the fragment void show_message needs space to separate the two words, whereas show_message( could have space in front of the ( or not; that is purely a matter of taste.

Comment is introduced to a C program by the pair of characters /*, which must not have a space between them. From then on, everything found up to and including the pair of characters */ is gobbled up and the whole lot is replaced by a single space. In Old C, this was not the case. The rule used to be that comment could occur anywhere that space could occur: the rule is now that comment is space. The significance of the change is minor and eventually becomes apparent in Chapter 7 where we discuss the preprocessor. A consequence of the rule for the end of comment is that you can't put a piece of comment inside another piece, because the first */ pair will finish all of it. This is a minor nuisance, but you learn to live with it.

It is common practice to make a comment stand out by making each line of multi-line comment always start with a *, as the example illustrates.
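As a rough illustration of the layout rules, the sketch below writes the same function twice: once squeezed together and once spread out with comment. The names add_compact and add_spread are invented for this illustration and do not come from Example 1.1; a compiler should treat the two bodies identically.

```c
/* Minimum white space: only the gap after each int is required. */
int add_compact(int a,int b){return a+b;}

/*
 * The same tokens spread over several lines. Each line of this
 * multi-line comment starts with a * so that it stands out.
 */
int
add_spread ( int a ,
             int b )
{
        return a /* comment counts as space */ + b ;
}
```

Calling either function with the same arguments should give the same result, since only the layout differs.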

1.3.3. Preprocessor statements

The first statement in the example is a preprocessor directive. In days gone by, the C compiler used to have two phases: the preprocessor, followed by the real compiler. The preprocessor was a macro processor, whose job was to perform simple textual manipulation of the program before passing the modified text on to be compiled. The preprocessor rapidly became seen as an essential aspect of the compiler and so has now been defined as part of the language and cannot be bypassed.

The preprocessor only knows about lines of text; unlike the rest of the language it is sensitive to the end of a line and though it is possible to write multi-line preprocessor directives, they are uncommon and a source of some wonder when they are found. Any line whose first visible character is a # is a preprocessor directive.

In Example 1.1 the preprocessor directive #include causes the line containing it to be replaced completely by the contents of another file. In this case the filename is found between the < and > brackets. This is a widely used technique to incorporate the text of standard header files into your program without having to go through the effort of typing it all yourself. The <stdio.h> file is an important one, containing the necessary information that allows you to use the standard library for input and output. If you want to use the I/O library you must include <stdio.h>. Old C was more relaxed on this point.

1.3.3.1. Define statements

Another of the preprocessor's talents which is widely exploited is the #define statement. It is used like this:

#define IDENTIFIER      replacement

which says that the name represented by IDENTIFIER will be replaced by the text of replacement whenever IDENTIFIER occurs in the program text. Almost invariably, the identifier is a name in upper case; this is a stylistic convention that helps the reader to understand what is going on. The replacement part can be any text at all; remember that the preprocessor doesn't know C and just works on text. The most common use of the statement is to define names for constant numbers:

#define PI             3.141592
#define SECS_PER_MIN   60
#define MINS_PER_HOUR  60
#define HOURS_PER_DAY  24

and to use them like this

circumf = 2*PI*radius;
if(timer >= SECS_PER_MIN){
        mins = mins+1;
        timer = timer - SECS_PER_MIN;
}

the output from the preprocessor will be as if you had written this:

circumf = 2*3.141592*radius;
if(timer >= 60){
        mins = mins+1;
        timer = timer - 60;
}
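Because the replacement is plain text, a replacement that is itself an expression is usually wrapped in parentheses. The following is a minimal sketch of that habit; SECS_PER_HOUR, seconds_per_hour and whole_hours are hypothetical names added for the illustration.

```c
#define SECS_PER_MIN   60
#define MINS_PER_HOUR  60

/*
 * The parentheses keep the replacement text together wherever the
 * name is used. SECS_PER_HOUR is a hypothetical name for this sketch.
 */
#define SECS_PER_HOUR  (SECS_PER_MIN * MINS_PER_HOUR)

/* After preprocessing this reads: return (60 * 60); */
int seconds_per_hour(void)
{
        return SECS_PER_HOUR;
}

/*
 * After preprocessing this reads: return seconds / (60 * 60);
 * Without the parentheses it would read seconds / 60 * 60,
 * which is a different calculation.
 */
int whole_hours(int seconds)
{
        return seconds / SECS_PER_HOUR;
}
```

With the parentheses in place, whole_hours(7200) gives 2; with them removed, the same call would give 7200.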

Summary

Preprocessor statements work on a line-by-line basis; the rest of C does not.

#include statements are used to read the contents of a specified file, typically to facilitate the use of library functions.

#define statements are typically used to give names for constants. By convention, the names are in upper case (capitalized).

1.3.4. Function declaration and definition

1.3.4.1. Declaration

After the <stdio.h> file is included comes a function declaration; it tells the compiler that show_message is a function which takes no arguments and returns no values. This demonstrates one of the changes made by the Standard: it is an example of a function prototype, a subject which Chapter 4 discusses in detail. It isn't always necessary to declare functions in advance (C will use some old default rules in such cases), but it is now strongly recommended that you do declare them in advance. The distinction between a declaration and a definition is that the former simply describes the type of the function and any arguments that it might take, whereas the latter is where the body of a function is provided. These terms become more important later.

By declaring show_message before it is used, the compiler is able to check that it is used correctly. The declaration describes three important things about the function: its name, its type, and the number and type of its arguments. The void show_message( part indicates that it is a function and that it returns a value of type void, which is discussed in a moment. The second use of void is in the declaration of the function's argument list, (void), which indicates that there are no arguments to this function.
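To see a declaration doing its job, here is a small sketch in which a function is called before its definition appears. The functions twice and four_times are hypothetical and are not part of Example 1.1.

```c
/* Declaration (prototype): name, return type, argument types; no body. */
int twice(int value);

/* twice can be called here because it has already been declared. */
int four_times(int value)
{
        return twice(twice(value));
}

/* Definition: the same heading again, this time with the body. */
int twice(int value)
{
        return value + value;
}
```

With the prototype in view, a call such as twice(1, 2) should be rejected by the compiler, because the number of arguments does not match the declaration.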

1.3.4.2. Definition

Right at the end of the program is the function definition itself; although it is only three lines long, it usefully illustrates a complete function.

In C, functions perform the tasks that some other languages split into two parts. Most languages use a function to return a value of some sort, typical examples being perhaps trigonometric functions like sin, cos, or maybe a square root function; C is the same in this respect. Other similar jobs are done by what look very much like functions but which don't return a value: FORTRAN uses subroutines, Pascal and Algol call them procedures. C simply uses functions for all of those jobs, with the type of the function's return value specified when the function is defined. In the example, the function show_message doesn't return a value so we specify that its type is void.

The use of void in that way is either crashingly obvious or enormously subtle, depending on your viewpoint. We could easily get involved here in an entertaining (though fruitless) philosophical side-track on whether void really is a value or not, but we won't. Whichever side of the question you favour, it's clear that you can't do anything with a void and that's what it means here—“I don't want to do anything with any value this function might or might not return”.

The type of the function is void, its name is show_message. The parentheses () following the function name are needed to let the compiler know that at this point we are talking about a function and not something else. If the function did take any arguments, then their names would be put between the parentheses. This one doesn't take any, which is made explicit by putting void between the parentheses.

For something whose essence is emptiness, abnegation and rejection, void turns out to be pretty useful.

The body of the function is a compound statement, which is a sequence of other statements surrounded by curly brackets {}. There is only one statement in there, but the brackets are still needed. In general, C allows you to put a compound statement anywhere that the language allows the use of a single simple statement; the job of the brackets being to turn several statements in a row into what is effectively a single statement.

It is reasonable to ask whether or not the brackets are strictly needed, if their only job is to bind multiple statements into one, yet all that we have in the example is a single statement. Oddly, the answer is yes—they are strictly needed. The only place in C where you can't put a single statement but must have a compound statement is when you are defining a function. The simplest function of all is therefore the empty function, which does nothing at all:

void do_nothing(void){}
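The difference between a simple statement and a compound statement can be sketched with the if statement, which accepts either. The two functions below are hypothetical examples; in both, the brackets around the function body itself cannot be left out.

```c
/* One simple statement after the if: no brackets are needed there. */
int magnitude(int value)
{
        if (value < 0)
                value = -value;
        return value;
}

/* Three statements after the if: brackets bind them into one. */
int larger_minus_smaller(int a, int b)
{
        int spare;

        if (a < b) {
                spare = a;      /* swap a and b */
                a = b;
                b = spare;
        }
        return a - b;
}
```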

The statement inside show_message is a call of the library function printf. printf is used to format and print things, this example being one of the simplest of its uses. printf takes one or more arguments, whose values are passed forward from the point of the call into the function itself. In this case the argument is a string. The contents of the string are interpreted by printf and used to control the way the values of the other arguments are printed. It bears a little resemblance to the FORMAT statement in FORTRAN; but not enough to predict how to use it.
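A slightly fuller use of printf may help to show how the string controls the printing of the other arguments. This is only a sketch: show_count and show_range are hypothetical functions, and %d is just one of the conversions that printf understands.

```c
#include <stdio.h>

/*
 * The first argument is the format string. Ordinary characters are
 * printed as they stand; %d is replaced by the next argument, printed
 * as a decimal integer. Returns whatever printf returns: the number of
 * characters written, or a negative value on error.
 */
int show_count(int count)
{
        return printf("count is %d\n", count);
}

/* Two %d markers consume two further arguments, in order. */
int show_range(int low, int high)
{
        return printf("from %d to %d\n", low, high);
}
```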

Summary

Declarations are used to introduce the name of a function, its return type and the type (if any) of its arguments.

A function definition is a declaration with the body of the function given too.

A function returning no value should have its type declared as void. For example, void func(/* list of arguments */);

A function taking no arguments should be declared with void as its argument list. For example, void func(void);

1.3.5. Strings

In C, strings are a sequence of characters surrounded by quote marks:

"like this"

Because a string is a single element, a bit like an identifier, it is not allowed to continue across a line—although space or tab characters are permitted inside a string.

"This is a valid string"
"This has a newline in it
and is NOT a valid string"

To get a very long string there are two things that you can do. You could take advantage of the fact that absolutely everywhere in a C program, the sequence ‘backslash end-of-line’ disappears totally.

"This would not be valid but doesn't have \
a newline in it as far as the compiler is concerned"

The other thing you could do is to use the string joining feature, which says that two adjacent strings are considered to be just one.

"All this " "comes out as "
"just one string"

Back to the example. The sequence ‘\n’ in the string is an example of an escape sequence which in this case represents ‘newline’. The printf function simply prints the contents of the string on the program's output file, so the output will read ‘hello’, followed by a new line.
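The string rules described in this section can be checked with a short sketch. The array names and the two helper functions are invented for the illustration; strcmp and strlen come from the standard header <string.h>. Note that the continuation line after the backslash starts in the first column, because any leading space there would become part of the string.

```c
#include <string.h>

/* Written in one piece. */
static const char plain[] = "one long string\n";

/* Backslash followed by end-of-line vanishes, gluing the two lines. */
static const char spliced[] = "one long \
string\n";

/* Adjacent strings are joined into one. */
static const char joined[] = "one " "long "
                             "string\n";

/* Returns 1 if all three spellings produced the same characters. */
int all_spellings_match(void)
{
        return strcmp(plain, spliced) == 0 && strcmp(plain, joined) == 0;
}

/* The escape sequence for newline stands for a single character. */
int newline_is_one_character(void)
{
        return strlen("hello\n") == 6;
}
```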

To support people working in environments that use character sets which are ‘wider’ than U.S. ASCII, such as the shift-JIS representation used in Japan, the Standard now allows multibyte characters to be present in strings and comments. The Standard defines the 96 characters that are the alphabet of C (see Chapter 2). If your system supports an extended character set, the only place that you may use these extended characters is in strings, character constants, comment and the names of header files. Support for extended character sets is an implementation defined feature, so you will have to look it up in your system's documentation.

1.3.6. The main function

In Example 1.1 there are actually two functions, show_message and main. Although main is a bit longer than show_message it is obviously built in the same shape: it has a name, the parentheses () are there, followed by the opening bracket { of the compound statement that must follow in a function definition. True, there's a lot more stuff too, but right at the end of the example you'll find the matching closing bracket } that goes with the first one to balance the numbers.

This is a much more realistic function now, because there are several statements inside the function body, not just one. You might also have noticed that the function is not declared to be void. There is a good reason for this: it returns a proper value. Don't worry about its arguments yet; they are discussed in Chapter 10.

The most important thing about main is that it is the first function to be called. In a hosted environment your C language system arranges, magically, for a call on the main function (hence its name) when the program is first started. When the function is over, so is the program. It's obviously an important function. Equally important is the stuff inside main's compound statement. As mentioned before, there can be several statements inside a compound statement, so let's look at them in turn.

1.3.7. Declarations

The first statement is this:

int count;

which is not an instruction to do anything, but simply introduces a variable to the program. It declares something whose name is count, and whose type is ‘integer’; in C the keyword that declares integers is unaccountably shortened to int. C has an idiosyncratic approach to these keywords with some having their names spelled in full and some being shortened like int. At least int has a meaning that is more or less intuitive; just wait until we get on to static.

As a result of that declaration the compiler now knows that there is something that will be used to store integral quantities, and that its name is count. In C, all variables must be declared before they are used; there is none of FORTRAN's implicit declarations. In a compound statement, all the declarations must come first; they must precede any ‘ordinary’ statements and are therefore somewhat special.

(Note for pedants: unless you specifically ask, the declaration of a variable like count is also a definition. The distinction will later be seen to matter.)

1.3.8. Assignment statement

Moving down the example we find a familiar thing, an assignment statement. This is where the first value is assigned to the variable count; in this case the value assigned is a constant whose value is zero. Prior to the assignment, the value of count was undefined and unsafe to use. You might be a little surprised to find that the assignment symbol (strictly speaking an assignment operator) is a single = sign. This is not fashionable in modern languages, but hardly a major blemish.
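The order of events (declare first, then assign, then use) can be sketched in a small function. The function seconds_in and its variables are hypothetical names chosen for this illustration.

```c
int seconds_in(int minutes)
{
        int seconds;            /* declarations come first ... */
        int per_minute;         /* neither variable has a usable value yet */

        per_minute = 60;        /* ... followed by ordinary statements */
        seconds = minutes * per_minute;
        return seconds;
}
```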

So far then, we have declared a variable and assigned the value of zero to it. What next?

1.3.9. The while statement

Next is one of C's loop control statements, the while statement. Look carefully at its form. The formal description of the while statement is this:

while(expression)
        statement

Is that what we have got? Yes it is. The bit that reads

count < 10

is a relational expression, which is an example of a valid expression, and the expression is followed by a compound statement, which is a form of valid statement. As a result, it fits the rules for a properly constructed while statement.

What it does must be obvious to anyone who has written programs before. For as long as the relationship count < 10 holds true, the body of the loop is executed and the comparison repeated. If the program is ever to end, then the body of the loop must do something that will eventually cause the comparison to be false: of course it does.

There are just two statements in the body of the loop. The first one is a function call, where the function show_message is invoked. A function call is indicated by the name of the function followed by the parentheses () which contain its argument list—if it takes no arguments, then you provide none. If there were any arguments, they would be put between the parentheses like this:

/* call a function with several arguments */
function_name(first_arg, second_arg, third_arg);

and so on. The call of printf is another example. More is explained in Chapter 4.

The last statement in the loop is another assignment statement. It adds one to the variable count, so that the requirement for the program to stop will eventually be met.
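The shape of the loop can be sketched on its own. This is not Example 1.1 itself: count_passes is a hypothetical function that takes the limit as an argument and reports how many times the body ran.

```c
int count_passes(int limit)
{
        int count;

        count = 0;
        while (count < limit) {
                /* Example 1.1 calls show_message at this point */
                count = count + 1;      /* moves the loop towards its end */
        }
        return count;
}
```

If the comparison is false the first time it is made, as it is for a limit of zero, the body is never executed at all.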

1.3.10. The return statement

The last statement that is left to discuss is the return statement. As it is written, it looks like another function call, but in fact the rule is that the statement is written

return expression;

where the expression is optional. The example uses a common stylistic convention and puts the expression into parentheses, which has no effect whatsoever.

The return causes a value to be returned from the current function to its caller. If the expression is missing, then an unknown value is passed back to the caller; this is almost certainly a mistake unless the function returns void. The function main wasn't declared with any type at all, unlike show_message, so what type of value does it return? The answer is int. There are a number of places where the language allows you to declare things by default: the default type of functions is int, so it is common to see them used in this way. An equivalent declaration for main would have been

int main(){

and exactly the same results would have occurred.

You can't use the same feature to get a default type for variables because their types must be provided explicitly.

What does the value returned from main mean, and where does it go? In Old C, the value was passed back to the operating system or whatever else was used to start the program running. In a UNIX-like environment, the value of 0 meant ‘success’ in some way; any other value (often -1) meant ‘failure’. The Standard has enshrined this, stating that 0 stands for correct termination of the program. This does not mean that 0 itself is passed back to the host environment, but rather whatever is the appropriate ‘success’ value for that system. Because there is sometimes confusion around this, you may prefer to use the defined values EXIT_SUCCESS and EXIT_FAILURE instead, which are defined in the header file <stdlib.h>. Returning from the main function is the same as calling the library function exit with the return value as an argument. The difference is that exit may be called from anywhere in the program, and terminates it at that point, after doing some tidying up activities. If you intend to use exit, you must include the header file <stdlib.h>. From now on, we shall use exit rather than returning from main.
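One way of using the defined values together with exit is sketched below. The helper functions status_for and finish are hypothetical; only exit, EXIT_SUCCESS and EXIT_FAILURE come from <stdlib.h>.

```c
#include <stdlib.h>

/* Map a yes/no outcome onto the portable status values. */
int status_for(int succeeded)
{
        if (succeeded)
                return EXIT_SUCCESS;
        return EXIT_FAILURE;
}

/* Ends the program at the point of the call; exit does not return. */
void finish(int succeeded)
{
        exit(status_for(succeeded));
}
```

A program written this way could call finish(1) as the last statement of main, or finish(0) from any function that discovers an unrecoverable problem.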

Summary

The main function returns an int value.

Returning from main is the same as calling the exit function, but exit can be called from anywhere in a program.

Returning 0 or EXIT_SUCCESS is the way of indicating success; EXIT_FAILURE indicates failure, and the meaning of any other value depends on the implementation.

1.3.11. Progress so far

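This example program, although short, has allowed us to introduce several important language features, amongst them layout and comment, preprocessor directives, function declaration and definition, strings, the main function, variable declarations, assignment, the while statement and the return statement, although of course none of this has been covered rigorously.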