1.3. A description of Example 1.1
1.3.1. What was in it
Even such a small example has introduced a lot of C. Among other
things, it contained two functions, a #include ‘statement’, and some
comment. Since comment is the easiest bit to handle, let's look at that
first.
1.3.2. Layout and comment
The layout of a C program is not very important to the compiler, although for readability it is important to use this freedom to carry extra information for the human reader. C allows you to put space, tab or newline characters practically anywhere in the program without any special effect on the meaning of the program. All of those three characters are the same as far as the compiler is concerned and are called collectively white space, because they just move the printing position without causing any ‘visible’ printing on an output device. White space can occur practically anywhere in a program except in the middle of identifiers, strings, or character constants. An identifier is simply the name of a function or some other object; strings and character constants will be discussed later—don't worry about them for the moment.
Apart from the special cases, the only place that white space must be
used is to separate things that would otherwise run together and become
confused. In the example above, the fragment void show_message needs
space to separate the two words, whereas show_message( could have space
in front of the ( or not; that is purely a matter of taste.
Comment is introduced to a C program by the pair of characters /*, which
must not have a space between them. From then on, everything found up to
and including the pair of characters */ is gobbled up and the whole lot
is replaced by a single space. In Old C, this was not the case. The rule
used to be that comment could occur anywhere that space could occur: the
rule is now that comment is space. The significance of the change is
minor and eventually becomes apparent in Chapter 7 where we discuss the
preprocessor. A consequence of the rule for the end of comment is that
you can't put a piece of comment inside another piece, because the first
*/ pair will finish all of it. This is a minor nuisance, but you learn to
live with it.
It is common practice to make a comment stand out by starting each line
of a multi-line comment with a *, as the example illustrates.
1.3.3. Preprocessor statements
The first statement in the example is a preprocessor directive. In days gone by, the C compiler used to have two phases: the preprocessor, followed by the real compiler. The preprocessor was a macro processor, whose job was to perform simple textual manipulation of the program before passing the modified text on to be compiled. The preprocessor rapidly became seen as an essential aspect of the compiler and so has now been defined as part of the language and cannot be bypassed.
The preprocessor only knows about lines of text; unlike the rest of the
language it is sensitive to the end of a line. Although it is possible to
write multi-line preprocessor directives, they are uncommon and a source
of some wonder when they are found. Any line whose first visible
character is a # is a preprocessor directive.
In Example 1.1 the preprocessor directive #include causes the line
containing it to be replaced completely by the contents of another file.
In this case the filename is found between the < and > brackets. This is
a widely used technique to incorporate the text of standard header files
into your program without having to go through the effort of typing it
all yourself. The <stdio.h> file is an important one, containing the
necessary information that allows you to use the standard library for
input and output. If you want to use the I/O library you must include
<stdio.h>. Old C was more relaxed on this point.
1.3.3.1. Define statements
Another of the preprocessor's talents which is widely exploited is the
#define statement. It is used like this:
#define IDENTIFIER replacement
which says that the name represented by IDENTIFIER will be replaced by
the text of replacement whenever IDENTIFIER occurs in the program text.
The identifier is almost always written in upper case; this is a
stylistic convention that helps the reader to understand what is going
on. The replacement part can be any text at all; remember that the
preprocessor doesn't know C, it just works on text. The most common use
of the statement is to declare names for constant numbers:
#define PI              3.141592
#define SECS_PER_MIN    60
#define MINS_PER_HOUR   60
#define HOURS_PER_DAY   24
and to use them like this
circumf = 2*PI*radius;
if(timer >= SECS_PER_MIN){
        mins = mins+1;
        timer = timer - SECS_PER_MIN;
}
the output from the preprocessor will be as if you had written this:
circumf = 2*3.141592*radius;
if(timer >= 60){
        mins = mins+1;
        timer = timer - 60;
}
Summary
- Preprocessor statements work on a line-by-line basis, the rest of C does not.
- #include statements are used to read the contents of a specified file, typically to facilitate the use of library functions.
- #define statements are typically used to give names for constants. By convention, the names are in upper case (capitalized).
1.3.4. Function declaration and definition
1.3.4.1. Declaration
After the <stdio.h> file is included comes a function declaration; it
tells the compiler that show_message is a function which takes no
arguments and returns no values. This demonstrates one of the changes
made by the Standard: it is an example of a function prototype, a subject
which Chapter 4 discusses in detail. It isn't always necessary to declare
functions in advance (C will use some old default rules in such cases)
but it is now strongly recommended that you do declare them in advance.
The distinction between a declaration and a definition is that the former
simply describes the type of the function and any arguments that it might
take, whereas the latter is where the body of a function is provided.
These terms become more important later.
Because show_message is declared before it is used, the compiler is able
to check that it is used correctly. The declaration describes three
important things about the function: its name, its type, and the number
and type of its arguments. The void show_message( part indicates that it
is a function and that it returns a value of type void, which is
discussed in a moment. The second use of void is in the declaration of
the function's argument list, (void), which indicates that there are no
arguments to this function.
1.3.4.2. Definition
Right at the end of the program is the function definition itself; although it is only three lines long, it usefully illustrates a complete function.
In C, functions perform the tasks that some other languages split into
two parts. Most languages use a function to return a value of some sort,
typical examples being perhaps trigonometric functions like sin, cos, or
maybe a square root function; C is the same in this respect. Other similar
jobs are done by what look very much like functions but which don't return
a value: FORTRAN uses subroutines, Pascal and Algol call them
procedures. C simply uses functions for all of those jobs, with
the type of the function's return value specified when the
function is defined. In the example, the function show_message doesn't
return a value so we specify that its type is void.
The use of void in that way is either crashingly obvious or enormously
subtle, depending on your viewpoint. We could easily get involved here in
an entertaining (though fruitless) philosophical side-track on whether
void really is a value or not, but we won't. Whichever side of the
question you favour, it's clear that you can't do anything with a void
and that's what it means here: “I don't want to do anything with any
value this function might or might not return”.
The type of the function is void, its name is show_message. The
parentheses () following the function name are needed to let the compiler
know that at this point we are talking about a function and not something
else. If the function did take any arguments, then their names would be
put between the parentheses. This one doesn't take any, which is made
explicit by putting void between the parentheses.
For something whose essence is emptiness, abnegation and rejection, void
turns out to be pretty useful.
The body of the function is a compound statement, which is a sequence of
other statements surrounded by curly brackets {}. There is only one
statement in there, but the brackets are still needed. In general, C
allows you to put a compound statement anywhere that the language allows
the use of a single simple statement; the job of the brackets being to
turn several statements in a row into what is effectively a single
statement.
It is reasonable to ask whether or not the brackets are strictly needed, if their only job is to bind multiple statements into one, yet all that we have in the example is a single statement. Oddly, the answer is yes—they are strictly needed. The only place in C where you can't put a single statement but must have a compound statement is when you are defining a function. The simplest function of all is therefore the empty function, which does nothing at all:
void do_nothing(void){}
The statement inside show_message is a call of the library function
printf. printf is used to format and print things, this example being one
of the simplest of its uses. printf takes one or more arguments, whose
values are passed forward from the point of the call into the function
itself. In this case the only argument is a string. In general, the
contents of that first string are interpreted by printf and used to
control the way the values of the other arguments are printed. It bears a
little resemblance to the FORMAT statement in FORTRAN; but not enough to
predict how to use it.
Summary
- Declarations are used to introduce the name of a function, its return type and the type (if any) of its arguments.
- A function definition is a declaration with the body of the function given too.
- A function returning no value should have its type declared as void. For example, void func(/* list of arguments */);
- A function taking no arguments should be declared with void as its argument list. For example, void func(void);
1.3.5. Strings
In C, strings are a sequence of characters surrounded by quote marks:
"like this"
Because a string is a single element, a bit like an identifier, it is not allowed to continue across a line, although space or tab characters are permitted inside a string.
"This is a valid string"
"This has a newline in it
and is NOT a valid string"
To get a very long string there are two things that you can do. You could take advantage of the fact that absolutely everywhere in a C program, the sequence ‘backslash end-of-line’ disappears totally.
"This would not be valid but doesn't have \
a newline in it as far as the compiler is concerned"
The other thing you could do is to use the string joining feature, which says that two adjacent strings are considered to be just one.
"All this " "comes out as "
"just one string"
Back to the example. The sequence ‘\n’ in the string is an example of an
escape sequence which in this case represents ‘newline’. The printf
function simply prints the contents of the string on the program's output
file, so the output will read ‘hello’, followed by a new line.
To support people working in environments that use character sets which are ‘wider’ than U.S. ASCII, such as the shift-JIS representation used in Japan, the Standard now allows multibyte characters to be present in strings and comments. The Standard defines the 96 characters that are the alphabet of C (see Chapter 2). If your system supports an extended character set, the only place that you may use these extended characters is in strings, character constants, comment and the names of header files. Support for extended character sets is an implementation defined feature, so you will have to look it up in your system's documentation.
1.3.6. The main function
In Example 1.1 there are actually two functions, show_message and main.
Although main is a bit longer than show_message it is obviously built in
the same shape: it has a name, the parentheses () are there, followed by
the opening bracket { of the compound statement that must follow in a
function definition. True, there's a lot more stuff too, but right at the
end of the example you'll find the matching closing bracket } that goes
with the first one to balance the numbers.
This is a much more realistic function now, because there are several
statements inside the function body, not just one. You might also have
noticed that the function is not declared to be void. There is a good
reason for this: it returns a proper value. Don't worry about its
arguments yet; they are discussed in Chapter 10.
The most important thing about main is that it is the first function to
be called. In a hosted environment your C language system arranges,
magically, for a call on the main function (hence its name) when the
program is first started. When the function is over, so is the program.
It's obviously an important function. Equally important is the stuff
inside main's compound statement. As mentioned before, there can be
several statements inside a compound statement, so let's look at them in
turn.
1.3.7. Declarations
The first statement is this:
int count;
which is not an instruction to do anything, but simply introduces a
variable to the program. It declares something whose name is count, and
whose type is ‘integer’; in C the keyword that declares integers is
unaccountably shortened to int. C has an idiosyncratic approach to these
keywords with some having their names spelled in full and some being
shortened like int. At least int has a meaning that is more or less
intuitive; just wait until we get on to static.
As a result of that declaration the compiler now knows that there is
something that will be used to store integral quantities, and that its
name is count. In C, all variables must be declared before they are used;
there is none of FORTRAN's implicit declarations. In a compound
statement, all the declarations must come first; they must precede any
‘ordinary’ statements and are therefore somewhat special.
(Note for pedants: unless you specifically ask, the declaration of a
variable like count is also a definition. The distinction will later be
seen to matter.)
1.3.8. Assignment statement
Moving down the example we find a familiar thing, an assignment
statement. This is where the first value is assigned to the variable
count; in this case the value assigned is a constant whose value is zero.
Prior to the assignment, the value of count was undefined and unsafe to
use. You might be a little surprised to find that the assignment symbol
(strictly speaking an assignment operator) is a single = sign. This is
not fashionable in modern languages, but hardly a major blemish.
So far then, we have declared a variable and assigned the value of zero to it. What next?
1.3.9. The while statement
Next is one of C's loop control statements, the while statement. Look
carefully at its form. The formal description of the while statement is
this:
while(expression) statement
Is that what we have got? Yes it is. The bit that reads count < 10 is a
relational expression, which is an example of a valid expression, and the
expression is followed by a compound statement, which is a form of valid
statement. As a result, it fits the rules for a properly constructed
while statement.
What it does must be obvious to anyone who has written programs before.
For as long as the relationship count < 10 holds true, the body of the
loop is executed and the comparison repeated. If the program is ever to
end, then the body of the loop must do something that will eventually
cause the comparison to be false: of course it does.
There are just two statements in the body of the loop. The first one is a
function call, where the function show_message is invoked. A function
call is indicated by the name of the function followed by the parentheses
() which contain its argument list; if it takes no arguments, then you
provide none. If there were any arguments, they would be put between the
parentheses like this:
/* call a function with several arguments */
function_name(first_arg, second_arg, third_arg);
and so on. The call of printf is another example. More is explained in
Chapter 4.
The last statement in the loop is another assignment statement. It adds
one to the variable count, so that the requirement for the program to
stop will eventually be met.
1.3.10. The return statement
The last statement that is left to discuss is the return statement. As it
is written, it looks like another function call, but in fact the rule is
that the statement is written
return expression;
where the expression is optional. The example uses a common stylistic convention and puts the expression into parentheses, which has no effect whatsoever.
The return causes a value to be returned from the current function to its
caller. If the expression is missing, then an unknown value is passed
back to the caller; this is almost certainly a mistake unless the
function returns void. The function main wasn't declared with any type at
all, unlike show_message, so what type of value does it return? The
answer is int. There are a number of places where the language allows you
to declare things by default: the default type of functions is int, so it
is common to see them used in this way. An equivalent declaration for
main would have been
int main(){
and exactly the same results would have occurred.
You can't use the same feature to get a default type for variables because their types must be provided explicitly.
What does the value returned from main mean, and where does it go? In Old
C, the value was passed back to the operating system or whatever else was
used to start the program running. In a UNIX-like environment, the value
of 0 meant ‘success’ in some way, any other value (often -1) meant
‘failure’. The Standard has enshrined this, stating that 0 stands for
correct termination of the program. This does not mean that 0 is to be
passed back to the host environment, but whatever is the appropriate
‘success’ value for that system. Because there is sometimes confusion
around this, you may prefer to use the defined values EXIT_SUCCESS and
EXIT_FAILURE instead, which are defined in the header file <stdlib.h>.
Returning from the main function is the same as calling the library
function exit with the return value as an argument. The difference is
that exit may be called from anywhere in the program, and terminates it
at that point, after doing some tidying up activities. If you intend to
use exit, you must include the header file <stdlib.h>. From now on, we
shall use exit rather than returning from main.
Summary
- The main function returns an int value.
- Returning from main is the same as calling the exit function, but exit can be called from anywhere in a program.
- Returning 0 or EXIT_SUCCESS is the way of indicating success; anything else indicates failure.
1.3.11. Progress so far
This example program, although short, has allowed us to introduce several important language features, amongst them:
- Program structure
- Comment
- File inclusion
- Function definition
- Compound statements
- Function calling
- Variable declaration
- Arithmetic
- Looping
although of course none of this has been covered rigorously.