2.6. Real types

It's easier to deal with the real types first because there's less to say about them and they don't get as complicated as the integer types. The Standard breaks new ground by laying down some basic guarantees on the precision and range of the real numbers; these are found in the header file float.h which is discussed in detail in Chapter 9. For some users this is extremely important information, but it is of a highly technical nature and is likely only to be fully understood by numerical analysts.

The varieties of real numbers are these:

float
double
long double

Each of the types gives access to a particular way of representing real numbers in the target computer. If it only has one way of doing things, they might all turn out to be the same; if it has more than three, then C has no way of specifying the extra ones. The type float is intended to be the small, fast representation corresponding to what FORTRAN would call REAL. You would use double for extra precision, and long double for even more.

The main point of interest is that, in the sequence of increasing ‘lengths’ float, double and long double, each type must give at least the range and precision of the one before it. For example, taking the value in a double and putting it into a long double must result in the same value.

There is no requirement for the three types of ‘real’ variables to differ in their properties, so if a machine only has one type of real arithmetic, all of C's three types could be implemented in the same way. None the less, the three types would be considered to be different from the point of view of type checking; it would be ‘as if’ they really were different. That helps when you move the program to a system where the three types really are different—there won't suddenly be a set of warnings coming out of your compiler about type mismatches that you didn't get on the first system.

In contrast to more ‘strongly typed’ languages, C permits expressions to mix all of the scalar types: the various flavours of integers, the real numbers and also the pointer types. When an expression contains a mixture of arithmetic (integer and real) types, implicit conversions are applied, and the rules that govern them determine the overall type of the result. These rules are quite important and are known as the usual arithmetic conversions; it will be worth committing them to memory later. The full set of rules is described in Section 2.8; for the moment, we will investigate only the ones that involve mixing float, double and long double to see if they make sense.

The only time that the conversions are needed is when two different types are mixed in an expression, as in the example below:

int f(void){
        float f_var;
        double d_var;
        long double l_d_var;

        f_var = 1; d_var = 1; l_d_var = 1;
        d_var = d_var + f_var;
        l_d_var = d_var + f_var;
        return(l_d_var);
}

Example 2.1

There are a lot of forced conversions in that example. Getting the easiest of them out of the way first, let's look at the assignments of the constant value 1 to each of the variables. As the section on constants will point out, that 1 has type int, i.e. it is an integer, not a real constant. The assignment converts the integer value to the appropriate real type, which is easy to cope with.

The interesting conversions come next. The first of them is on the line

d_var = d_var + f_var;

What is the type of the expression involving the + operator? The answer is easy when you know the rules. Whenever two different real types are involved in an expression, the lower precision type is first implicitly converted to the higher precision type and then the arithmetic is performed at that precision. The example involves both a double and a float, so the value of f_var is converted to type double and is then added to the value of the double d_var. The result of the expression is naturally of type double too, so it is clearly of the correct type to assign to d_var.

The second of the additions is a little bit more complicated, but still perfectly O.K. Again, the value of f_var is converted and the arithmetic performed with the precision of double, forming the sum of the two variables. Now there's a problem. The result (the sum) is double, but the assignment is to a long double. Once again the obvious procedure is to convert the lower precision value to the higher one, which is done, and then make the assignment.

So we've taken the easy ones. The difficult case is what to do when forced to assign a higher precision result to a lower precision destination. Precision may then have to be lost, and the implementation must specify whether, and in what way, it rounds or truncates. Even worse, the destination may be unable to hold the value at all. The Standard says that in these cases loss of precision may occur; if the destination is unable to hold the necessary value (for example, when a double holding a value far beyond the range of float is assigned to a float, or when the result of adding the largest representable number to itself has to be stored), then the behaviour is undefined, your program is faulty and you can make no predictions whatsoever about any subsequent behaviour.

It is no mistake to re-emphasize that last statement. What the Standard means by undefined behaviour is exactly what it says. Once a program's behaviour has entered the undefined region, absolutely anything can happen. The program might be stopped by the operating system with an appropriate message or, just as likely, nothing observable would happen and the program would be allowed to continue with an erroneous value stored in the variable in question. It is your responsibility to prevent your program from exhibiting undefined behaviour. Beware!

Summary of real arithmetic

Arithmetic with any two real types is done at the precision of the more precise of the two.
Assignment may involve loss of precision if the receiving type has a lower precision than the value being assigned to it.
Further conversions are often implied when expressions mix other types, but they have not been described yet.

2.6.1. Printing real numbers

The usual output function, printf, can be used to format real numbers and print them. There are several ways to format these numbers, but we'll stick to just one for now. Table 2.4 below shows the appropriate format description for each of the real types. float and double can share %f because a float passed to printf is automatically converted to double before the call.

Type	Format
float	%f
double	%f
long double	%Lf

Table 2.4. Format codes for real numbers

Here's an example to try:

#include <stdio.h>
#include <stdlib.h>

#define BOILING 212     /* degrees Fahrenheit */

int main(void){
      float f_var; double d_var; long double l_d_var;
      int i;

      i = 0;
      printf("Fahrenheit to Centigrade\n");
      while(i <= BOILING){
              /* int arithmetic; the result is converted to long double */
              l_d_var = 5*(i-32);
              /* 9 is converted to long double before dividing */
              l_d_var = l_d_var/9;
              /* narrowing assignments: precision may be lost */
              d_var = l_d_var;
              f_var = l_d_var;
              printf("%d %f %f %Lf\n", i,
                      f_var, d_var, l_d_var);
              i = i+1;
      }
      exit(EXIT_SUCCESS);
}

Example 2.2

Try that example on your own computer to see what results you get.

Exercise 2.10. Which type of variable can hold the largest range of values?

Exercise 2.11. Which type of variable can store values to the greatest precision?

Exercise 2.12. Can any problems arise when assigning a float or double to a double or long double?

Exercise 2.13. What could go wrong when assigning, say, a long double to a double?

Exercise 2.14. What predictions can you make about a program showing ‘undefined behaviour’?
