9.10. Input and output
9.10.1. Introduction
One of the reasons that has prevented many programming languages from becoming widely used for ‘real programming’ is their poor support for I/O, a subject which has never seemed to excite language designers. C has avoided this problem, oddly enough, by having no I/O at all! The C language approach has always been to do I/O using library functions, which ensures that system designers can provide tailored I/O instead of being forced to change the language itself.
As C has evolved, a library package known as the ‘Standard I/O Library’, or stdio, has evolved with it and has proved to be both flexible and portable. This package has now become part of the Standard.
The old stdio package relied heavily on the UNIX model of file access, in particular the assumption that there is no distinction between unstructured binary files and files containing readable text. Many operating systems do maintain a distinction between the two, and to ensure that C programs can be written portably to run on both types of file model, the stdio package has been modified. There are changes in this area which affect many existing programs, although strenuous efforts were made to limit the amount of damage.
Old C programs should still be able to work unmodified in a UNIX environment.
9.10.2. The I/O model
The I/O model does not distinguish between the types of physical devices supporting the I/O. Each source or sink of data (file) is treated in the same way, and is viewed as a stream of bytes. Since the smallest object that can be represented in C is the character, access to a file is permitted at any character boundary. Any number of characters can be read or written from a movable point, known as the file position indicator. The characters will be read, or written, in sequence from this point, and the position indicator moved accordingly. The position indicator is initially set to the beginning of a file when it is opened, but can also be moved by means of positioning requests. (Where random access is not possible, the file position indicator is ignored.) Opening a file in append mode has an implementation defined effect on the stream's file position indicator.
The overall effect is to provide sequential reads or writes unless the stream was opened in append mode, or the file position indicator is explicitly moved.
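To make the file position indicator concrete, here is a small sketch that watches it move. It assumes that tmpfile succeeds in the host environment; demo_position is a hypothetical helper name, not a library function. Because tmpfile opens a binary stream, ftell should report a plain character count.

```c
#include <stdio.h>

/* Hypothetical helper: records where the file position indicator stands
   after three sequential reads, then repositions it explicitly.
   Returns the character found at offset 1, or EOF on any failure. */
int demo_position(long *after_three)
{
    FILE *f = tmpfile();              /* binary stream, opened for update */
    int c;

    if (f == NULL)
        return EOF;
    fputs("abcdef", f);               /* indicator is now at offset 6 */
    rewind(f);                        /* back to offset 0 */

    getc(f);                          /* each read moves the indicator on */
    getc(f);
    getc(f);
    *after_three = ftell(f);          /* 3 on a binary stream */

    if (fseek(f, 1L, SEEK_SET) != 0)  /* explicit positioning request */
        c = EOF;
    else
        c = getc(f);                  /* the second character, 'b' */
    fclose(f);
    return c;
}
```

Reads and writes simply continue from wherever the indicator was left, which is why the sequential behaviour described above falls out naturally.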
There are two types of file, text files and binary files, which, within a program, are manipulated as text streams and binary streams once they have been opened for I/O. The stdio package does not permit operations on the contents of files ‘directly’, but only by viewing them as streams.
9.10.2.1. Text streams
The Standard specifies what is meant by the term text stream, which essentially considers a file to contain lines of text. A line is a sequence of zero or more characters terminated by a newline character. It is quite possible that the actual representation of lines in the external environment is different from this and there may be transformations of the data stream on the way in and out of the program; a common requirement is to translate the ‘\n’ line-terminator into the sequence ‘\r\n’ on output, and do the reverse on input. Other translations may also be necessary.
Data read in from a text stream is guaranteed to compare equal to the data that was earlier written out to the file if: the data consists only of complete lines of printable characters and the control characters horizontal-tab and newline; no newline character is immediately preceded by space characters; and the last character is a newline.
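The conditions in the previous paragraph can be written down as a check. This is a minimal sketch, assuming the "C" locale (isprint is locale dependent); text_round_trip_safe is a hypothetical name. Data that passes the check is guaranteed to come back unchanged; data that fails it may well survive on many systems, but not portably.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s meets the conditions for a
   guaranteed round trip through a text stream, 0 otherwise. */
int text_round_trip_safe(const char *s)
{
    size_t len = strlen(s);
    size_t i;

    if (len == 0 || s[len - 1] != '\n')
        return 0;                     /* last character must be a newline */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (c == '\n') {
            if (i > 0 && s[i - 1] == ' ')
                return 0;             /* space immediately before a newline */
        } else if (c != '\t' && !isprint(c)) {
            return 0;                 /* only printables, tab and newline */
        }
    }
    return 1;
}
```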
It is guaranteed that, if the last character written to a text file is a newline, it will read back as the same.
It is implementation defined whether the last line written to a text file must terminate with a newline character; this is because on some implementations text files and binary files are the same.
Some implementations may strip the leading space from lines consisting only of a space followed by a newline, or strip trailing spaces at the end of a line!
An implementation must support text files with lines containing at least 254 characters, including the terminating newline.
Opening a text stream in update mode may result in a binary stream in some implementations.
Writing on a text stream may cause some implementations to truncate the file at that point, discarding any data beyond the last byte of the current write.
9.10.2.2. Binary streams
A binary stream is a sequence of characters that can be used to record a program's internal data, such as the contents of structures or arrays in binary form. Data read in from a binary stream will always compare equal to data written out earlier to the same stream, under the same implementation. In some circumstances, an implementation-defined number of NUL characters may be appended to a binary stream.
The contents of binary files are exceedingly machine specific, and not, in general, portable.
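As an illustration of the round-trip guarantee, the sketch below writes an array of structures to a binary stream with fwrite and reads it back with fread. The struct sample type and the binary_round_trip function are hypothetical, the limit of 16 records is arbitrary, and tmpfile is assumed to succeed. The bytes written include any padding inside the structures, which is one reason such files do not travel well between machines.

```c
#include <stdio.h>
#include <string.h>

struct sample {                 /* hypothetical record type */
    int id;
    double value;
};

/* Writes n records to a scratch binary stream, reads them back and
   returns 1 if the data compares equal, 0 otherwise. */
int binary_round_trip(const struct sample *out, size_t n)
{
    struct sample in[16];
    FILE *f;
    int ok = 0;

    if (n > sizeof in / sizeof in[0])
        return 0;
    f = tmpfile();                      /* binary stream, update mode */
    if (f == NULL)
        return 0;
    if (fwrite(out, sizeof *out, n, f) == n) {
        rewind(f);                      /* reposition before reading */
        if (fread(in, sizeof in[0], n, f) == n)
            ok = memcmp(out, in, n * sizeof *out) == 0;
    }
    fclose(f);
    return ok;
}
```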
9.10.2.3. Other streams
Other stream types may exist, but are implementation defined.
9.10.3. The stdio.h header file
To provide support for streams of the various kinds, a number of functions and macros exist. The <stdio.h> header file contains the various declarations necessary for the functions, together with the following macro and type declarations:
FILE
- The type of an object used to contain stream control information. Users of stdio never need to know the contents of these objects, but simply manipulate pointers to them. It is not safe to copy these objects within the program; sometimes their addresses may be ‘magic’.
fpos_t
- A type of object that can be used to record unique values of a stream's file position indicator.
_IOFBF _IOLBF _IONBF
- Values used to control the buffering of a stream in conjunction with the setvbuf function.
BUFSIZ
- The size of the buffer used by the setbuf function. An integral constant expression whose value is at least 256.
EOF
- A negative integral constant expression, indicating the end-of-file condition on a stream, i.e. that there is no more input.
FILENAME_MAX
- The maximum length which a filename can have, if there is a limit, or otherwise the recommended size of an array intended to hold a file name.
FOPEN_MAX
- The minimum number of files that the implementation guarantees may be held open concurrently; at least eight are guaranteed. Note that three predefined streams exist and may need to be closed if a program needs to open more than five files explicitly.
L_tmpnam
- The size of an array large enough to hold a filename string generated by tmpnam; an integral constant expression.
SEEK_CUR SEEK_END SEEK_SET
- Integral constant expressions used to control the actions of fseek.
TMP_MAX
- The minimum number of unique filenames generated by tmpnam; an integral constant expression with a value of at least 25.
stdin stdout stderr
- Predefined objects of type (FILE *) referring to the standard input, output and error streams respectively. These streams are automatically opened when a program starts execution.
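The guaranteed minimums above can be checked, and the actual values printed, with a sketch like this one. Both function names are hypothetical, and the printed values differ from one implementation to another.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the <stdio.h> constants meet the
   minimums the Standard guarantees. */
int stdio_minimums_met(void)
{
    return BUFSIZ >= 256 && FOPEN_MAX >= 8 && TMP_MAX >= 25 && EOF < 0;
}

/* Hypothetical helper: prints this implementation's values to out. */
void print_stdio_limits(FILE *out)
{
    fprintf(out, "BUFSIZ       = %ld\n", (long)BUFSIZ);
    fprintf(out, "EOF          = %d\n", EOF);
    fprintf(out, "FILENAME_MAX = %ld\n", (long)FILENAME_MAX);
    fprintf(out, "FOPEN_MAX    = %ld\n", (long)FOPEN_MAX);
    fprintf(out, "L_tmpnam     = %ld\n", (long)L_tmpnam);
    fprintf(out, "TMP_MAX      = %ld\n", (long)TMP_MAX);
}
```

Calling print_stdio_limits(stdout) shows the figures for whichever implementation the program was compiled with.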
9.10.4. Opening, closing and buffering of streams
9.10.4.1. Opening
A stream is connected to a file by means of the fopen, freopen or tmpfile functions. These functions will, if successful, return a pointer to a FILE object.
Three streams are available without any special action; they are normally all connected to the physical device associated with the executing program: usually your terminal. They are referred to by the names stdin, the standard input, stdout, the standard output, and stderr, the standard error stream. Normal keyboard input is from stdin, normal terminal output is to stdout, and error messages are directed to stderr. The separation of error messages from normal output messages allows the stdout stream to be connected to something other than the terminal device, and still to have error messages appear on the screen in front of you, rather than being redirected to that file. The standard input and output streams are fully buffered only if they do not refer to interactive devices; the standard error stream is not fully buffered when the program starts.
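The division of labour between the three streams shows up in a routine that prompts for a number. In this sketch, prompt_for_long is a hypothetical helper that takes its input and output streams as parameters (normally stdin and stdout) so that it can also be exercised with files; its diagnostics always go to stderr. The explicit fflush is there because a prompt without a trailing newline may otherwise remain in the output buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes prompt to out, reads one line from in and
   converts it to a long. Returns 1 on success, 0 otherwise. */
int prompt_for_long(FILE *in, FILE *out, const char *prompt, long *result)
{
    char line[64];
    char *end;

    fputs(prompt, out);
    fflush(out);                  /* no newline, so force the prompt out */
    if (fgets(line, sizeof line, in) == NULL) {
        fputs("prompt_for_long: no input\n", stderr);
        return 0;
    }
    *result = strtol(line, &end, 10);
    if (end == line) {            /* no digits were converted */
        fputs("prompt_for_long: not a number\n", stderr);
        return 0;
    }
    return 1;
}
```

A typical call would be prompt_for_long(stdin, stdout, "How many? ", &n); if stdout is redirected to a file, the error messages still reach the terminal.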
As mentioned earlier, the file position indicator may or may not be movable, depending on the underlying device. It is not possible, for example, to move the file position indicator on stdin if that is connected to a terminal, as it usually is.
All non-temporary files must have a filename, which is a string. The rules for what constitutes valid filenames are implementation defined. Whether a file can be simultaneously open multiple times is also implementation defined. Opening a new file may involve creating the file. Creating an existing file causes its previous contents to be discarded.
9.10.4.2. Closing
Files are closed by calling fclose explicitly, by calling exit, or by returning from main. Any buffered data is flushed. If a program stops for some other reason, the status of files which it had open is undefined.
9.10.4.3. Buffering
There are three types of buffering:
Unbuffered
- Minimum internal storage is used by stdio in an attempt to send or receive data as soon as possible.
Line buffered
- Characters are processed on a line-by-line basis. This is commonly used in interactive environments, and internal buffers are flushed only when full or when a newline is processed.
Fully buffered
- Internal buffers are only flushed when full.
The buffering associated with a stream can always be flushed by using fflush explicitly. Support for the various types of buffering is implementation defined, and can be controlled within these limits using setbuf and setvbuf.
9.10.5. Direct file manipulation
A number of functions exist to operate on files directly.

    #include <stdio.h>

    int remove(const char *filename);
    int rename(const char *old, const char *new);
    char *tmpnam(char *s);
    FILE *tmpfile(void);

remove
- Causes a file to be removed. Subsequent attempts to open the file will fail, unless it is first created again. If the file is already open, the operation of remove is implementation defined. The return value is zero for success, any other value for failure.
rename
- Changes the name of the file identified by old to new. Subsequent attempts to open the original name will fail, unless another file is created with the old name. As with remove, rename returns zero for a successful operation, any other value indicating a failure. If a file with the new name exists prior to calling rename, the behaviour is implementation defined. If rename fails for any reason, the original file is unaffected.
tmpnam
- Generates a string that may be used as a filename and is guaranteed to be different from any existing filename. It may be called repeatedly, each time generating a new name. The constant TMP_MAX specifies how many times tmpnam is guaranteed to generate a unique name; TMP_MAX will be at least 25. If tmpnam is called more than this number of times, its behaviour is implementation defined, but many implementations offer no practical limit. If the argument s is a null pointer, then tmpnam uses an internal buffer to build the name, and returns a pointer to that. Subsequent calls may alter the same internal buffer. The argument may instead point to an array of at least L_tmpnam characters, in which case the name is filled into the supplied buffer and s is returned. Such a filename may then be created, and used as a temporary file. Since the name is generated by the function, it is unlikely to be very useful in any other context. Temporary files of this nature are not removed, except by direct calls to the remove function. They are most often used to pass temporary data between two separate programs.
tmpfile
- Creates a temporary binary file, opened for update, and returns a pointer to the stream of that file. The file will be removed when the stream is closed or when the program terminates normally. If no file could be opened, tmpfile returns a null pointer.
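Because the effect of rename is implementation defined when the new name already exists, a portable program that wants to replace one file with another can remove the target first. The replace_file function below is a hypothetical sketch of that approach; it checks that the source can be opened before touching the target, so that a missing source does not lead to the target being deleted. There is a brief window in which neither name refers to the new contents, which matters if another program might be reading the file.

```c
#include <stdio.h>

/* Hypothetical helper: makes the file called source take the name
   target, discarding any previous file called target.
   Returns 0 on success, -1 on failure. */
int replace_file(const char *source, const char *target)
{
    FILE *probe = fopen(source, "rb");

    if (probe == NULL)
        return -1;            /* don't destroy target if source is missing */
    fclose(probe);
    (void)remove(target);     /* may fail harmlessly if target is absent */
    return rename(source, target) == 0 ? 0 : -1;
}
```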
9.10.6. Opening named files
Named files are opened by a call to the fopen function, whose declaration is this:

    #include <stdio.h>

    FILE *fopen(const char *pathname, const char *mode);

The pathname argument is the name of the file to open, such as that returned from tmpnam, or some program-specific filename.
Files can be opened in a variety of modes, such as read mode for reading data, write mode for writing data, and so on.
Note that if you open a file for writing with a mode beginning with ‘w’, fopen will create the file if it does not already exist, or truncate it to zero length (losing its previous contents) if it did exist.
The Standard list of modes is shown in Table 9.3, although implementations may permit extra modes by appending extra characters at the end of the modes.
Mode | Type of file | Read | Write | Create | Truncate |
---|---|---|---|---|---|
"r" | text | yes | no | no | no |
"rb" | binary | yes | no | no | no |
"r+" | text | yes | yes | no | no |
"r+b" | binary | yes | yes | no | no |
"rb+" | binary | yes | yes | no | no |
"w" | text | no | yes | yes | yes |
"wb" | binary | no | yes | yes | yes |
"w+" | text | yes | yes | yes | yes |
"w+b" | binary | yes | yes | yes | yes |
"wb+" | binary | yes | yes | yes | yes |
"a" | text | no | yes | yes | no |
"ab" | binary | no | yes | yes | no |
"a+" | text | yes | yes | yes | no |
"a+b" | binary | yes | yes | yes | no |
"ab+" | binary | yes | yes | yes | no |
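The difference between the write and append modes in the table can be demonstrated with two small helpers. put_line and count_lines are hypothetical names, and the sketch assumes the program may create files in the current directory.

```c
#include <stdio.h>

/* Hypothetical helper: counts newline characters in the named file,
   or returns -1 if it cannot be opened for reading. */
long count_lines(const char *name)
{
    FILE *f = fopen(name, "r");
    long lines = 0;
    int c;

    if (f == NULL)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/* Hypothetical helper: writes text plus a newline using mode, which is
   expected to be "w" (truncate) or "a" (append). Returns 0 on success. */
int put_line(const char *name, const char *mode, const char *text)
{
    FILE *f = fopen(name, mode);

    if (f == NULL)
        return -1;
    fputs(text, f);
    putc('\n', f);
    return fclose(f) == 0 ? 0 : -1;
}
```

Writing a line with "w" and then another with "a" leaves two lines in the file; writing again with "w" throws both away and leaves one.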
Beware that some implementations of binary files may pad the last record with NUL characters, so opening them with modes ab, ab+ or a+b could position the file position indicator beyond the last data written.
If a file is opened in append mode, all writes will occur at the end of the file, regardless of attempts to move the file position indicator with fseek. The initial position of the file position indicator is implementation defined.
Attempts to open a file in read mode, indicated by an ‘r’ as the first character in the mode string, will fail if the file does not already exist or can't be read.
Files opened for update (‘+’ as the second or third character of mode) may be both read and written, but a read may not immediately follow a write without an intervening call to fflush or to one of the positioning functions fseek, fsetpos or rewind, and a write may not immediately follow a read without an intervening call to fseek, fsetpos or rewind. The only exception is that a write may immediately follow a read if EOF was read.
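The following sketch shows the positioning calls that update mode requires. It uses tmpfile, whose stream is opened with mode "wb+"; write_then_read is a hypothetical helper. The rewind between the write and the read, and the fseek between that read and the next write, are the intervening calls described above.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to a scratch update stream, then reads
   the first line back into buf (of size n). Returns 1 on success. */
int write_then_read(const char *text, char *buf, size_t n)
{
    FILE *f = tmpfile();          /* opened for update with mode "wb+" */
    int ok;

    if (f == NULL)
        return 0;
    fputs(text, f);
    rewind(f);                    /* needed: a read is about to follow a write */
    ok = fgets(buf, (int)n, f) != NULL;

    /* Unless that read reached end-of-file, a positioning call is also
       needed before writing again. */
    fseek(f, 0L, SEEK_END);
    fputs("more", f);

    fclose(f);
    return ok;
}
```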
It may also be possible in some implementations to omit the b in the binary modes, using the same modes for text and binary files.
Streams opened by fopen are fully buffered only if they are not connected to an interactive device; this ensures that prompts and responses are handled properly.
If fopen fails to open a file, it returns a null pointer; otherwise, it returns a pointer to the object controlling the stream.
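Since fopen signals failure only by returning a null pointer, every call should be checked. In this sketch, open_for_reading is a hypothetical helper. It deliberately does not use perror, because the C Standard does not require fopen to set errno, although many systems do.

```c
#include <stdio.h>

/* Hypothetical helper: opens name for reading and reports failure on
   stderr. Returns the stream, or a null pointer on failure. */
FILE *open_for_reading(const char *name)
{
    FILE *f = fopen(name, "r");

    if (f == NULL)
        fprintf(stderr, "cannot open %s for reading\n", name);
    return f;
}
```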
The stdin, stdout and stderr objects are not necessarily modifiable and it may not be possible to use the value returned from fopen for assignment to one of them. For this reason, freopen is provided.
9.10.7. Freopen
The freopen function is used to take an existing stream pointer and associate it with another named file:

    #include <stdio.h>

    FILE *freopen(const char *pathname, const char *mode, FILE *stream);

The mode argument is the same as for fopen.
The stream is closed first, and any errors from the close are ignored. On failure, a null pointer is returned; otherwise the value of stream is returned.
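A common use of freopen is to send a program's standard output to a file, as in freopen("log.txt", "w", stdout), where log.txt is just an example name. The sketch below wraps the call in a hypothetical helper, redirect_stream. Because the old file is closed first, a failed call leaves the stream unusable rather than still attached to its original file.

```c
#include <stdio.h>

/* Hypothetical helper: reattaches an existing stream to the named file,
   for example redirect_stream(stdout, "log.txt", "w").
   Returns 0 on success, -1 on failure. */
int redirect_stream(FILE *stream, const char *name, const char *mode)
{
    if (freopen(name, mode, stream) == NULL)
        return -1;        /* the original file has already been closed */
    return 0;
}
```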
9.10.8. Closing files
An open file is closed using fclose.

    #include <stdio.h>

    int fclose(FILE *stream);

Any unwritten data buffered for stream is flushed out and any unread data is thrown away. If a buffer had been automatically allocated for the stream, it is freed. The file is then closed.
Zero is returned on success, EOF if any error occurs.
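Because output is buffered, an error such as a full disk may not be detected until the buffer is flushed, which can happen inside fclose. A careful program therefore checks the value returned by fclose as well as those of the output functions. save_text is a hypothetical helper illustrating this.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to the named file.
   Returns 0 on success, EOF on failure. */
int save_text(const char *name, const char *text)
{
    FILE *f = fopen(name, "w");
    int status = 0;

    if (f == NULL)
        return EOF;
    if (fputs(text, f) == EOF)
        status = EOF;
    if (fclose(f) == EOF)        /* flushes buffered data, then closes */
        status = EOF;
    return status;
}
```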
9.10.9. Setbuf, setvbuf
These two functions are used to change the buffering strategy for an open stream:

    #include <stdio.h>

    int setvbuf(FILE *stream, char *buf, int type, size_t size);
    void setbuf(FILE *stream, char *buf);

They must be used before the file is either read from or written to. The type argument defines how the stream will be buffered (see Table 9.4).
Value | Effect |
---|---|
_IONBF | Do not buffer I/O |
_IOFBF | Fully buffer I/O |
_IOLBF | Line buffer: flush buffer when full, when newline is written or when a read is requested. |
The buf argument can be a null pointer, in which case an array is automatically allocated to hold the buffered data. Otherwise, the user can provide a buffer, but should ensure that its lifetime is at least as long as that of the stream: a common mistake is to use automatic storage allocated inside a compound statement; in correct usage it is usual to obtain the storage from malloc instead.
The size of the buffer is specified by the size argument.
A call of setbuf is exactly the same as a call of setvbuf with _IOFBF for the type argument, and BUFSIZ for the size argument. If buf is a null pointer, the value _IONBF is used for type instead.
No value is returned by setbuf; setvbuf returns zero on success, or non-zero if invalid values are provided for type or size, or if the request cannot be complied with.
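The advice about buffer lifetime can be followed by allocating the buffer with malloc and freeing it only after the stream has been closed. This sketch uses a hypothetical helper, attach_buffer, which must be called before any other operation on the stream; the buffer size is left to the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: gives f a fully buffered, heap-allocated buffer.
   The buffer is returned through *bufp; the caller must free it only
   after f has been closed. Returns 0 on success, -1 on failure. */
int attach_buffer(FILE *f, size_t size, char **bufp)
{
    char *buf = malloc(size);

    if (buf == NULL)
        return -1;
    if (setvbuf(f, buf, _IOFBF, size) != 0) {
        free(buf);                /* not in use, so safe to release now */
        return -1;
    }
    *bufp = buf;
    return 0;
}
```

The order at the end matters: fclose(f) first, so that the stream stops using the buffer, and only then free(buf).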
9.10.10. Fflush

    #include <stdio.h>

    int fflush(FILE *stream);

If stream refers to a file opened for output or update, any unwritten data is ‘written’ out. Exactly what that means is a function of the host environment, and C cannot guarantee, for example, that data immediately reaches the surface of a disk which might be supporting the file. If the stream is associated with a file opened for input or update, any preceding ungetc operation is forgotten.
The most recent operation on the stream must have been an output operation; if not, the behaviour is undefined.
A call of fflush with a null pointer argument flushes every output or update stream. Care is taken to avoid those streams that have not had an output as their last operation, thus avoiding the undefined behaviour mentioned above.
EOF is returned if an error occurs, otherwise zero.
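One visible effect of fflush is that data written through one stream becomes available to a second stream reading the same file. The sketch below assumes that the implementation allows a file to be open twice at once (which, as noted earlier, is implementation defined) and that it may create a file in the current directory; first_char_after_flush is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical demonstration: writes through one stream, flushes it, then
   reads the same named file through a second stream.
   Returns the first character the reader sees, or EOF on failure. */
int first_char_after_flush(const char *name)
{
    FILE *w = fopen(name, "w");
    FILE *r;
    int c = EOF;

    if (w == NULL)
        return EOF;
    fputs("flushed\n", w);   /* may still be sitting in w's buffer */
    fflush(w);               /* hand the data on to the host environment */

    r = fopen(name, "r");
    if (r != NULL) {
        c = getc(r);
        fclose(r);
    }
    fclose(w);
    return c;
}
```

Without the fflush call the reader could find the file still empty, since the data might not yet have left the writer's buffer.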