4.4. Linkage
Although the simple examples have carefully avoided the topic, we now have to look into the effects of scope and linkage, terms used to describe the accessibility of various objects in a C program. Why bother? It's because realistic programs are built up out of multiple files and of course libraries. It is clearly crucial that functions in one file should be able to refer to functions (or other objects) in other files and libraries; naturally there are a number of concepts and rules that apply to this mechanism.
If you are relatively new to C, there are more important subjects to cover first. Come back to this stuff later instead.
There are essentially two types of object in C: the internal and external objects. The distinction between external and internal is to do with functions: anything declared outside a function is external, anything inside one, including its formal parameters, is internal. Since no function can be defined inside another, functions themselves are always external. At the outermost level, a C program is a collection of external objects.
Only external objects participate in this cross-file and library communication.
The term used by the Standard to describe the accessibility of objects
from one file to another, or even within the same file, is
linkage. There are three types of linkage: external
linkage, internal linkage and no linkage.
Anything internal to a function—its arguments, variables and so
on—always has no linkage and so can only be accessed from
inside the function itself. (The way around this is to declare something
inside a function but prefix it with the keyword extern
which says ‘it isn't really internal’, but we needn't worry about
that just yet.)
Objects that have external linkage are all considered to be located at
the outermost level of the program; this is the default linkage for
functions and anything declared outside of a function. All instances
of a particular name with external linkage refer to the same object in
the program. If two or more declarations of the same name have
external linkage but incompatible types, then you've done something very
silly and have undefined behaviour. The most obvious example of external
linkage is the printf function, whose declaration in <stdio.h> is

    int printf(const char *, ...);

From that we can tell that it's a function returning int and with a
particular prototype—so we know everything about its type. We also
know that it has external linkage, because that is the default for every
external object. As a result, everywhere that the name printf
is used with external linkage, we are referring to this function.
Quite often, you want to be able to declare functions and other objects within a single file in a way that allows them to reference each other but not to be accessible from outside that file. This is often necessary in the modules that support library functions, where the additional framework that makes those functions work is not interesting to the user and would be a positive nuisance if the names of those things became visible outside the module. You do it through the use of internal linkage.
Names with internal linkage only refer to the same object within a
single source file. You give a name internal linkage by prefixing its
declaration with the keyword static, which changes the linkage of external
objects from external linkage to internal linkage. It is also possible to
declare internal objects to be static, but that has an
entirely different meaning which we can defer for the moment.
It's confusing that the types of linkage and the types of object are both described by the terms ‘internal’ and ‘external’; this is to some extent historical. C archaeologists may know that at one time the two were equivalent and one implied the other—for us it's unfortunate that the terms remain but the meanings have diverged. To summarize:
Type of linkage | Type of object | Accessibility |
---|---|---|
external | external | throughout the program |
internal | external | a single file |
none | internal | local to a single function |
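
Each row of the table can be seen in a single source file. The following is a minimal sketch; the names visible_total, hidden_calls, clamp and add_sample are hypothetical and chosen only for illustration.

```c
int visible_total;               /* external object, external linkage:
                                    accessible throughout the program */
static int hidden_calls;         /* external object, internal linkage:
                                    accessible in this file only */

static int clamp(int value)      /* function with internal linkage */
{
    return value < 0 ? 0 : value;
}

int add_sample(int sample)       /* function with external linkage */
{
    int clamped = clamp(sample); /* internal object, no linkage:
                                    local to add_sample */

    hidden_calls++;
    visible_total += clamped;
    return visible_total;
}
```

Another file could declare `extern int visible_total;` and `int add_sample(int);` and use both, but it has no way to name hidden_calls or clamp.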
Finally, before we see an example, it is important to know that all objects with external linkage must have one and only one definition, although there can be as many compatible declarations as you like. Here's the example.
    /* first file */

    int i;                              /* definition */

    int main(void){
        void f_in_other_place(void);    /* declaration */

        i = 0;
        return 0;
    }
    /* end of first file */

    /* start of second file */

    extern int i;                       /* declaration */

    void f_in_other_place(void){        /* definition */
        i++;
    }
    /* end of second file */

Example 4.9
Although the full set of rules is a bit more complex, the basic way of working out what constitutes a definition and a declaration is not hard:
- A function declaration without a body for the function is just a declaration.
- A function declaration with a body for the function is a definition.
- At the external level, a declaration of an object (like the variable i) is a definition unless it has the keyword extern in front of it, when it is a declaration only.
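
The three rules can be seen side by side in a short sketch. The names event_count and record_event are hypothetical, used only to illustrate the rules.

```c
extern int event_count;     /* declaration only: extern, no initializer */
int record_event(void);     /* declaration only: no function body */

int event_count;            /* definition: external level, no extern */

int record_event(void)      /* definition: the body is present */
{
    event_count++;
    return event_count;
}
```

The declarations may be repeated in as many files as need them; the two definitions must each appear exactly once in the whole program.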
Chapter 8 revisits the definition and declaration criteria to a depth that will cause decompression sickness when you surface.
In the example it's easy to see that each file is able to access the objects defined in the other by using their names. Just from that example alone you should be able to work out how to construct programs with multiple files and functions and variables declared or defined as appropriate in each of them.
Here's another example, using static to restrict the
accessibility of functions and other things.
    /* example library module */
    /* only 'callable' is visible outside */

    static int buf[100];
    static int length;
    static void fillup(void);

    int callable(void){
        if (length == 0){
            fillup();
        }
        /* decrement first: after fillup() length is 100,
         * and buf[100] would be out of range */
        return (buf[--length]);
    }

    static void fillup(void){
        while (length < 100){
            buf[length++] = 0;
        }
    }

Example 4.10
A user of this module can safely re-use the names declared here,
length, buf, and fillup, without any danger of surprising effects.
Only the name callable is accessible outside this module.
A very useful thing to know is that any external object that has no
other initializer (and except for functions we haven't seen any
initializers yet) is always set to the value of zero before the program
starts. This is widely used and relied on; the previous example
relies on it for the initial value of length.
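
A small sketch of that guarantee follows. The names ticket, pending, next_ticket and pending_sum are hypothetical; the point is only that neither object is given an initializer.

```c
static int ticket;        /* no initializer: zero before the program starts */
static int pending[4];    /* every element is zero as well */

/* Returns 1 on the first call, because ticket started at zero. */
int next_ticket(void)
{
    return ++ticket;
}

/* Adds up the array without anything having stored into it. */
int pending_sum(void)
{
    int total = 0;
    int n;

    for (n = 0; n < 4; n++) {
        total += pending[n];
    }
    return total;
}
```

Called before anything else touches these objects, pending_sum() gives 0 and the first call of next_ticket() gives 1.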
4.4.1. Effect of scope
There's one additional complicating factor beyond simply linkage. Linkage allows you to couple names together on a per-program or a per-file basis, but scope determines the visibility of the names. Fortunately, the rules of scope are completely independent of anything to do with linkage, so you don't have to remember funny combinations of both.
What introduces the complexity is the dreaded extern keyword. The nice
regular block structure gets blown to pieces by it; although at first
glance it is simple and obvious, it does some very nasty things to
the fabric of the language. We'll leave its nasty problems to
Chapter 8, since they only rear up if you deliberately
start to do perverse things with it and then say ‘what does this
mean?’ We've already seen it used to ensure that the declaration of
something at the outer block level (the external level) of the program
is a declaration and not a definition (but beware: you can still
override the extern by, for example, providing an
initializer for the object).
Unless you prefix it with extern, the declaration of any
data object (not a function) at the outer level is also a definition.
Look back to Example 4.9 to see this in use.
All function declarations implicitly have the extern
stuck in front of them, whether or not you put it there too. These two
ways of declaring some_function are equivalent and are
always declarations:

    void some_function(void);
    extern void some_function(void);

The thing that mysteriously turns those declarations into definitions is that when you also provide the body of the function, that is effectively the initializer for the function, so the comment about initializers comes into effect and the declaration becomes a definition. So far, no problem.
Now, what is going on here?

    void some_function(void){
        int i_var;
        extern float e_f_var;
    }

    void another_func(void){
        int i;
        i = e_f_var;        /* scope problem */
    }

What happened was that although the declaration of e_f_var
declares that something called e_f_var is of type float
and is accessible throughout the entire
program, the scope of the name disappears at the end of the
function that contains it. That's why it is meaningless inside
another_func: the name of e_f_var is out
of scope, just as much as i_var is.
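
One way to repair the fragment, sketched here under some assumptions, is to repeat the declaration in every function that uses the name. The definition of e_f_var, its value of 2.5 and the int return type of another_func are additions made for illustration. The definition is placed last so that the name is in scope inside each function only because of the local declaration.

```c
void some_function(void)
{
    extern float e_f_var;   /* in scope up to the closing brace */

    e_f_var = e_f_var * 2;
}

int another_func(void)
{
    extern float e_f_var;   /* repeated here, so the name is in scope again */
    int i;

    i = (int)e_f_var;
    return i;
}

float e_f_var = 2.5f;       /* the single definition (assumed value) */
```

Both local declarations refer to the same object, because all instances of a name with external linkage do.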
So what use is that? It's sometimes handy if you only want to make use of an external object from within a single function. If you followed the usual practice and declared it at the head of the particular source file, then there is no easy way for the reader of that file to see which functions actually use it. By restricting the access and the scope of the name to the place where it is needed, you do communicate to a later reader of the program that this is a very restricted use of the name and that there is no intention to make widespread use of it throughout the file. Of course, any half-way decent cross-reference listing would communicate that anyway, so the argument is a bit hard to maintain.
Chapter 8 is the place to find out more. There's a set of
guidelines for how to get the results that are most often wanted from
multi-file construction, and a good deal more detail on what happens
when you mix extern, static and internal and
external declarations. It isn't the sort of reading that you're likely
to do for pleasure, but it does answer the ‘what if’
questions.
4.4.2. Internal static
You are also allowed to declare internal objects as static.
Internal variables with this attribute have some
interesting properties: they are initialized to zero when the program
starts, they retain their value between exit from and re-entry to the
block containing their declaration, and there is only one copy of
each one, which is shared between all recursive calls of the function
containing it.
Internal statics can be used for a number of things. One is to count the number of times that a function has been called; unlike ordinary internal variables whose value is lost after leaving their function, statics are convenient for this. Here's a function that always returns a number between 0 and 15, but remembers how often it was called.
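
A minimal sketch of such a function is shown below. The name small_val and the choice to derive the return value from the call count are assumptions made for illustration.

```c
int small_val(void)
{
    static unsigned count;      /* zero at program start, kept between calls */

    count++;                    /* remembers how often we have been called */
    return (int)(count % 16);   /* always in the range 0 to 15 */
}
```

If count were an ordinary internal variable it would be created afresh on every call, and the function could not remember anything.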
They can help detect excessive recursion:

    #include <stdio.h>
    #include <stdlib.h>

    void x_func(void);      /* defined elsewhere; may call r_func() */

    void r_func(void){
        static int depth;

        depth++;
        if (depth > 200) {
            printf("excessive recursion\n");
            exit(1);
        }
        else {
            /* do usual thing,
             * not shown here.
             * This last action
             * occasionally results in another
             * call on r_func()
             */
            x_func();
        }
        depth--;
    }

Example 4.12
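
The fact that all recursive calls share one copy of an internal static can be checked with a smaller sketch. The function max_depth_reached is hypothetical; it recurses a given number of levels and reports the largest value the shared counter reached.

```c
int max_depth_reached(int levels)
{
    static int depth;   /* one copy, shared by every active call */
    int deepest;        /* ordinary internal: one copy per call */

    depth++;
    deepest = depth;
    if (levels > 1) {
        int below = max_depth_reached(levels - 1);

        if (below > deepest) {
            deepest = below;
        }
    }
    depth--;            /* undo our increment on the way out */
    return deepest;
}
```

Because every call decrements depth before returning, the counter is back at zero once the outermost call finishes, so a later call starts counting afresh.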