4.4. Linkage

Although the simple examples have carefully avoided the topic, we now have to look into the effects of scope and linkage, terms used to describe the accessibility of various objects in a C program. Why bother? It's because realistic programs are built up out of multiple files and of course libraries. It is clearly crucial that functions in one file should be able to refer to functions (or other objects) in other files and libraries; naturally there are a number of concepts and rules that apply to this mechanism.

If you are relatively new to C, there are more important subjects to cover first. Come back to this stuff later instead.

There are essentially two types of object in C: the internal and external objects. The distinction between external and internal is to do with functions: anything declared outside a function is external, anything inside one, including its formal parameters, is internal. Since no function can be defined inside another, functions themselves are always external. At the outermost level, a C program is a collection of external objects.

Only external objects participate in this cross-file and library communication.

The term used by the Standard to describe the accessibility of objects from one file to another, or even within the same file, is linkage. There are three types of linkage: external linkage, internal linkage and no linkage. Anything internal to a function—its arguments, variables and so on—always has no linkage and so can only be accessed from inside the function itself. (The way around this is to declare something inside a function but prefix it with the keyword extern which says ‘it isn't really internal’, but we needn't worry about that just yet.)

Objects that have external linkage are all considered to be located at the outermost level of the program; this is the default linkage for functions and anything declared outside of a function. All instances of a particular name with external linkage refer to the same object in the program. If two or more declarations of the same name have external linkage but incompatible types, then you've done something very silly and have undefined behaviour. The most obvious example of external linkage is the printf function, whose declaration in <stdio.h> is

int printf(const char *, ...);

From that we can tell that it's a function returning int and with a particular prototype—so we know everything about its type. We also know that it has external linkage, because that is the default for every external object. As a result, everywhere that the name printf is used with external linkage, we are referring to this function.

Quite often, you want to be able to declare functions and other objects within a single file in a way that allows them to reference each other but not to be accessible from outside that file. This is often necessary in the modules that support library functions, where the additional framework that makes those functions work is not interesting to the user and would be a positive nuisance if the names of those things became visible outside the module. You do it through the use of internal linkage.

Names with internal linkage refer to the same object only within a single source file. You give a name internal linkage by prefixing its declaration with the keyword static, which changes the linkage of external objects from external linkage to internal linkage. It is also possible to declare internal objects to be static, but that has an entirely different meaning which we can defer for the moment.

It's confusing that the types of linkage and the types of object are both described by the terms ‘internal’ and ‘external’; this is to some extent historical. C archaeologists may know that at one time the two were equivalent and one implied the other—for us it's unfortunate that the terms remain but the meanings have diverged. To summarize:

Type of linkage   Type of object   Accessibility
external          external         throughout the program
internal          external         a single file
none              internal         local to a single function

Table 4.1. Linkage and accessibility

Finally, before we see an example, it is important to know that all objects with external linkage must have one and only one definition, although there can be as many compatible declarations as you like. Here's the example.

/* first file */

int i; /* definition */
int main (void) {
  void f_in_other_place (void);   /* declaration */
  i = 0;
  return 0;
}
/* end of first file */


/* start of second file */

extern int i; /* declaration */
void f_in_other_place (void){   /* definition */
  i++;
}
/* end of second file */
Example 4.9

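Although the full set of rules is a bit more complex, the basic way of working out what constitutes a definition and a declaration is not hard: a function declaration without a body is just a declaration; a function declaration with a body is a definition; and at the external level, the declaration of a data object (such as a variable) is a definition unless it has the keyword extern in front of it, in which case it is a declaration only.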

Chapter 8 revisits the definition and declaration criteria to a depth that will cause decompression sickness when you surface.

In the example it's easy to see that each file is able to access the objects defined in the other by using their names. Just from that example alone you should be able to work out how to construct programs with multiple files and functions and variables declared or defined as appropriate in each of them.
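The rule of one definition and any number of compatible declarations can also be seen within a single file. The following is only a sketch: the names counter and add_to_counter are invented for illustration and appear nowhere else in this chapter.

```c
/* Single-file sketch: several declarations, exactly one definition. */

extern int counter;      /* declaration only: extern and no initializer */
extern int counter;      /* a repeated, compatible declaration is harmless */

/* Hypothetical helper: adds n to counter and returns the new value. */
int add_to_counter(int n)
{
    counter += n;        /* legal: counter has been declared above */
    return counter;
}

int counter;             /* the one definition (no extern in front) */
```

Moving the final line into a different source file should make no difference to the function, since the extern declarations are all that it needs to see.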

Here's another example, using static to restrict the accessibility of functions and other things.

/* example library module */
/* only 'callable' is visible outside */
static int buf [100];
static int length;
static void fillup(void);

int
callable (void){
      if (length ==0){
              fillup ();
      }
      /* decrement first: valid indices are 0 to 99 */
      return (buf [--length]);
}

static void
fillup (void){
      while (length <100){
              buf [length++] = 0;
      }
}
Example 4.10

A user of this module can safely re-use the names declared here, length, buf, and fillup, without any danger of surprising effects. Only the name callable is accessible outside this module.

A very useful thing to know is that any external object that has no other initializer (and except for functions we haven't seen any initializers yet) is always set to the value of zero before the program starts. This is widely used and relied on; the previous example relies on it for the initial value of length.
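As a small illustration of that guarantee, here is a sketch in which none of the external objects has an initializer. The names total, ratio, table and all_zero are made up for this example.

```c
/* External objects with no initializer start out as zero. */
int total;                 /* external linkage, starts at 0 */
static double ratio;       /* internal linkage, starts at 0.0 */
static int table[4];       /* every element starts at 0 */

/* Hypothetical helper: returns 1 if everything above is still zero. */
int all_zero(void)
{
    int k;

    if (total != 0 || ratio != 0.0) {
        return 0;
    }
    for (k = 0; k < 4; k++) {
        if (table[k] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Called before anything has been assigned to those objects, all_zero should return 1.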

4.4.1. Effect of scope

There's one additional complicating factor beyond simply linkage. Linkage allows you to couple names together on a per-program or a per-file basis, but scope determines the visibility of the names. Fortunately, the rules of scope are completely independent of anything to do with linkage, so you don't have to remember funny combinations of both.

What introduces the complexity is the dreaded extern keyword. Although at first glance it is simple and obvious, it blows the nice regular block structure to pieces and does some very nasty things to the fabric of the language. We'll leave its nasty problems to Chapter 8, since they only rear up if you deliberately start to do perverse things with it and then say ‘what does this mean?’. We've already seen it used to ensure that the declaration of something at the outer block level (the external level) of the program is a declaration and not a definition (but beware: you can still override the extern by, for example, providing an initializer for the object).

Unless you prefix it with extern, the declaration of any data object (not a function) at the outer level is also a definition. Look back to Example 4.9 to see this in use.

All function declarations implicitly have the extern stuck in front of them, whether or not you put it there too. These two ways of declaring some_function are equivalent and are always declarations:

void some_function(void);

extern void some_function(void);

The thing that mysteriously turns those declarations into definitions is that when you also provide the body of the function, that is effectively the initializer for the function, so the comment about initializers comes into effect and the declaration becomes a definition. So far, no problem.
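A short sketch of the difference, using the invented names twice and four_times: the first two lines are declarations only, and the function becomes defined at the point where its body appears.

```c
int twice(int n);            /* declaration: no body */
extern int twice(int n);     /* the same declaration with extern spelled out */

/* Hypothetical caller: may use twice() because it has been declared. */
int four_times(int n)
{
    return twice(twice(n));
}

int twice(int n)             /* definition: the body is supplied */
{
    return 2 * n;
}
```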

Now, what is going on here?

void some_function(void){
      int i_var;
      extern float e_f_var;
}

void another_func(void){
      int i;
      i = e_f_var;    /* scope problem */
}

What happened was that although the declaration of e_f_var declares that something called e_f_var is of type float and is accessible throughout the entire program, the scope of the name disappears at the end of the function that contains it. That's why it is meaningless inside another_func—the name of e_f_var is out of scope, just as much as i_var is.
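One way to repair code like that, sketched here under the assumption that e_f_var is defined exactly once somewhere in the program, is to repeat the extern declaration in every function that uses the name. The function names are invented for the illustration, and the definition is placed in the same file only to keep the sketch self-contained.

```c
/* Each function that uses e_f_var brings the name into scope itself. */

float read_in_first(void)
{
    extern float e_f_var;    /* in scope only until the closing brace */
    return e_f_var;
}

float read_in_second(void)
{
    extern float e_f_var;    /* must be declared again in this block */
    return e_f_var * 2;
}

float e_f_var = 1.5f;        /* the definition; it could equally live in another file */
```

The alternative is a single extern declaration at the head of the file, which puts the name in scope for every function that follows it.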

So what use is that? It's sometimes handy if you only want to make use of an external object from within a single function. If you followed the usual practice and declared it at the head of the particular source file, then there is no easy way for the reader of that file to see which functions actually use it. By restricting the access and the scope of the name to the place where it is needed, you do communicate to a later reader of the program that this is a very restricted use of the name and that there is no intention to make widespread use of it throughout the file. Of course, any half-way decent cross-reference listing would communicate that anyway, so the argument is a bit hard to maintain.

Chapter 8 is the place to find out more. There's a set of guidelines for how to get the results that are most often wanted from multi-file construction, and a good deal more detail on what happens when you mix extern, static and internal and external declarations. It isn't the sort of reading that you're likely to do for pleasure, but it does answer the ‘what if’ questions.

4.4.2. Internal static

You are also allowed to declare internal objects as static. Internal variables with this attribute have some interesting properties: they are initialized to zero when the program starts, they retain their value between exit from and re-entry to the statement containing their declaration and there is only one copy of each one, which is shared between all recursive calls of the function containing it.
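The last of those properties is perhaps the least obvious, so here is a minimal sketch of it. The function count_calls is hypothetical: it calls itself n times and relies on every level of the recursion incrementing the same copy of calls.

```c
/* Hypothetical function: recurses n levels and counts every activation. */
int count_calls(int n)
{
    static int calls;    /* one copy, shared by every level of the recursion */

    calls++;
    if (n > 0) {
        count_calls(n - 1);
    }
    return calls;        /* total number of activations so far */
}
```

A first call of count_calls(3) activates the function four times and so returns 4; because the value is retained, a later call of count_calls(0) returns 5.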

Internal statics can be used for a number of things. One is to count the number of times that a function has been called; unlike ordinary internal variables whose value is lost after leaving their function, statics are convenient for this. Here's a function that always returns a number between 0 and 15, but remembers how often it was called.

int
small_val (void) {
      static unsigned count;
      count ++;
      return (count % 16);
}
Example 4.11

They can help detect excessive recursion:

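#include <stdio.h>
#include <stdlib.h>

void x_func (void);     /* defined elsewhere; may call r_func() again */

void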
r_func (void){
      static int depth;
      depth++;
      if (depth > 200) {
              printf ("excessive recursion\n");
              exit (1);
      }
      else {
              /* do usual thing,
               * not shown here.
               * This last action
               * occasionally results in another
               * call on r_func()
               */
              x_func();
      }
      depth--;
}
Example 4.12
