7.2. How the preprocessor works
Although the preprocessor (Figure 7.1) is probably going to be implemented as an integral part of a Standard C compiler, it can equally well be thought of as a separate program which transforms C source code containing preprocessor directives into source code with the directives removed.
It's important to remember that the preprocessor is not working to the same rules as the rest of C. It works on a line-by-line basis, so the end of a line means something special to it. The rest of C thinks that end-of-line is little different from a space or tab character.
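As a small illustration of the difference, the sketch below puts a directive and an ordinary declaration side by side. The names `SUM_OF_SQUARES` and `sum_of_squares` are invented for this example. It also relies on backslash-newline line splicing, which is not described in this section: a backslash immediately before the end of a line joins that line to the next one before the preprocessor looks at it.

```c
/* A directive ends at the end of its line, so a macro body that is
   written over several lines needs a backslash immediately before
   each line break to splice the lines together. */
#define SUM_OF_SQUARES(a, b) \
    ((a) * (a) +             \
     (b) * (b))

/* The rest of C treats end-of-line like any other white space, so
   this function header can be broken across lines with no special
   marking at all. */
int
sum_of_squares(int a,
               int b)
{
    return SUM_OF_SQUARES(a, b);
}
```

Calling `sum_of_squares(3, 4)` gives 25. Removing either backslash would end the directive early, and the rest of the macro body would be left behind as stray C source.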
The preprocessor doesn't know about the scope rules of C. Preprocessor directives like #define take effect as soon as they are seen and remain in effect until the end of the source file being compiled (or until they are explicitly removed with #undef); the program's block structure is irrelevant. This is one of the reasons why it's a good idea to make sparing use of these directives. The less you have in your program that doesn't obey the ‘normal’ scope rules, the less likely you are to make mistakes. This is mainly what gives rise to our comments about the poor level of integration between the preprocessor and the rest of C.
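The following sketch shows how a #define ignores block structure. The names `LIMIT`, `first` and `second` are made up for the illustration.

```c
int first(void)
{
    {
        /* This directive sits inside a nested block, but the
           preprocessor pays no attention to the braces. */
#define LIMIT 10
        int local = LIMIT;
        (void)local;    /* silence 'unused variable' warnings */
    }
    /* 'local' is out of scope here, yet LIMIT is still replaced. */
    return LIMIT;
}

int second(void)
{
    /* LIMIT is even visible in a completely different function,
       because it appears later in the same source file. */
    return LIMIT + 1;
}
```

Here `first()` returns 10 and `second()` returns 11. If `LIMIT` had been an ordinary variable declared in the inner block, neither return statement would compile.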
The Standard gives some complicated rules for the syntax of the preprocessor, especially with respect to tokens. To understand the operation of the preprocessor you need to know a little about them. The text that is being processed is not considered to be a uniform stream of characters, but is separated into tokens then processed piecemeal.
For a full definition of the process, it is best to refer to the Standard, but an informal description follows. Each of the terms used to head the list below is used later in descriptions of the rules.
- header-name
  - ‘<’ almost any character ‘>’
- preprocessing-token
  - a header-name as above but only when the subject of #include,
  - or an identifier which is any C identifier or keyword,
  - or a constant which is any integral or floating constant,
  - or a string-literal which is a normal C string,
  - or an operator which is one of the C operators,
  - or one of [ ] ( ) { } * , : = ; ... # (punctuators)
  - or any non-white-space character not covered by the list above.
The ‘almost any character’ above means any character except ‘>’ or newline.
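One practical consequence of working on tokens rather than on raw characters can be seen with a simple macro. The sketch below uses invented names (`N`, `NN` and the three functions) to show that replacement applies only where the macro name forms a complete identifier token.

```c
#define N 5

/* Here N is an identifier token on its own, so it is replaced by 5. */
int whole_token(void)
{
    return N;
}

/* NN is a single, different identifier token. The preprocessor does
   not see two separate N's inside it, so nothing is replaced. */
int longer_identifier(void)
{
    int NN = 7;
    return NN;
}

/* Inside a string-literal token the letter N is just part of the
   string, so it is left alone as well. */
const char *inside_string(void)
{
    return "N";
}
```

Here `whole_token()` returns 5, `longer_identifier()` returns 7, and `inside_string()` returns the one-character string "N". A purely character-based substitution would have turned `NN` into `55` and corrupted the string.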