2.8. Expressions and arithmetic

Expressions in C can get rather complicated because of the number of different types and operators that can be mixed together. This section explains what happens, but can get deep at times. You may need to re-read it once or twice to make sure that you have understood all of the points.

First, a bit of terminology. Expressions in C are built from combinations of operators and operands, so for example in this expression

x = a+b*(-c)

we have the operators =, +, * and -. The operands are the variables x, a, b and c. You will also have noticed that parentheses can be used for grouping sub-expressions such as the -c. Most of C's unusually rich set of operators are either binary operators, which take two operands, or unary operators, which take only one. In the example, the - was being used as a unary operator, and is performing a different task from the binary subtraction operator which uses the same - symbol. It may seem like hair-splitting to argue that they are different operators when the job that they do seems conceptually the same, or at least similar. It's worth doing though, because, as you will find later, some of the operators have both a binary and a unary form where the two meanings bear no relation to each other; a good example would be the binary multiplication operator *, which in its unary form means indirection via a pointer variable!

A peculiarity of C is that operators may appear consecutively in expressions without the need for parentheses to separate them. The previous example could have been written as

x = a+b*-c;

and still have been a valid expression. Because of the number of operators that C has, and because of the strange way that assignment works, the precedence of the operators (and their associativity) is of much greater importance to the C programmer than in most other languages. It will be discussed fully after the introduction of the important arithmetic operators.

Before that, we must investigate the type conversions that may occur.

2.8.1. Conversions

C allows types to be mixed in expressions, and permits operations that result in type conversions happening implicitly. This section describes the way that the conversions must occur. Old C programmers should read this carefully, because the rules have changed — in particular, the promotion of float to double, the promotions of short integral types and the introduction of value preserving rules are genuinely different in Standard C.

Although it isn't directly relevant at the moment, we must note that the integral and the floating types are jointly known as arithmetic types and that C also supports other types (notably pointer types). The rules that we discuss here are appropriate only in expressions that have arithmetic types throughout - additional rules come into play when expressions mix pointer types with arithmetic types and these are discussed much later.

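There are various types of conversion in arithmetic expressions: integral promotions, conversions between integral types, conversions between floating types, and conversions between floating and integral types.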

Conversions between floating (real) types were discussed in Section 2.6; what we do next is to specify how the other conversions are to be performed, then look at when they are required. You will need to learn them by heart if you ever intend to program seriously in C.

The Standard has, among some controversy, introduced what are known as value preserving rules, where a knowledge of the target computer is required to work out what the type of an expression will be. Previously, whenever an unsigned type occurred in an expression, you knew that the result had to be unsigned too. Now, the result will only be unsigned if the conversions demand it; in many cases the result will be an ordinary signed type.

The reason for the change was to reduce some of the surprises possible when you mix signed and unsigned quantities together; it isn't always obvious when this has happened and the intention is to produce the ‘more commonly required’ result.

2.8.1.1. Integral promotions

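No arithmetic is done by C at a precision shorter than int, so these conversions are implied almost whenever you use one of the objects listed below in an expression. The conversion is defined as follows: whenever a char or a short (or a bitfield or enumeration type) has the integral promotions applied, the value is converted to int if an int can hold all of the values of the original type; otherwise it is converted to unsigned int.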

This preserves both the value and the sign of the original type. Note that whether a plain char is treated as signed or unsigned is implementation dependent.

These promotions are applied very often—they are applied as part of the usual arithmetic conversions, and to the operands of the shift, unary +, -, and ~ operators. They are also applied when the expression in question is an argument to a function but no type information has been provided as part of a function prototype, as explained in Chapter 4.

2.8.1.2. Signed and unsigned integers

A lot of conversions between different types of integers are caused by mixing the various flavours of integers in expressions. Whenever these happen, the integral promotions will already have been done. For all of them, if the new type can hold all of the values of the old type, then the value remains unchanged.

When converting from a signed integer to an unsigned integer whose length is equal to or longer than the original type, then if the signed value was nonnegative, its value is unchanged. If the value was negative, then it is converted to the signed form of the longer type and then made unsigned by conceptually adding it to one greater than the maximum that can be held in the unsigned type. In a two's complement system, this preserves the original bit-pattern for positive numbers and guarantees ‘sign-extension’ of negative numbers.

Whenever an integer is converted into a shorter unsigned type, there can be no ‘overflow’, so the result is defined to be ‘the non-negative remainder on division by the number one greater than the largest unsigned number that can be represented in the shorter type’. That simply means that in a two's complement environment the low-order bits are copied into the destination and the high-order ones discarded.

Converting an integer to a shorter signed type runs into trouble if there is not enough room to hold the value. In that case, the result is implementation defined (although most old-timers would expect that simply the low-order bit pattern is copied).

That last item could be a bit worrying if you remember the integral promotions, because you might interpret it as follows—if I assign a char to another char, then the one on the right is first promoted to one of the kinds of int; could doing the assignment result in converting (say) an int to a char and provoking the ‘implementation defined’ clause? The answer is no, because assignment is specified not to involve the integral promotions, so you are safe.

2.8.1.3. Floating and integral

Converting a floating to an integral type simply throws away any fractional part. If the integral type can't hold the value that is left, then the behaviour is undefined—this is a sort of overflow.

As has already been said, going up the scale from float to double to long double, there is no problem with conversions—each higher one in the list can hold all the values of the lower ones, so the conversion occurs with no loss of information.

Converting in the opposite direction, if the value is outside the range that can be held, the behaviour is undefined. If the value is in range, but can't be held exactly, then the result is one of the two nearest values that can be held, chosen in a way that the implementation defines. This means that there will be a loss of precision.

2.8.1.4. The usual arithmetic conversions

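A lot of expressions involve the use of subexpressions of mixed types together with operators such as +, * and so on. If the operands in an expression have different types, then there will have to be a conversion applied so that a common resulting type can be established; these are the conversions:

  1. If either operand is a long double, then the other one is converted to long double and that is the type of the result.
  2. Otherwise, if either operand is a double, then the other one is converted to double, and that is the type of the result.
  3. Otherwise, if either operand is a float, then the other one is converted to float, and that is the type of the result.
  4. Otherwise the integral promotions are applied to both operands and the following conversions are applied:
     a. If either operand is an unsigned long int, then the other one is converted to unsigned long int, and that is the type of the result.
     b. Otherwise, if either operand is a long int and the other is an unsigned int, then the unsigned int is converted to long int if a long int can hold all of its values; if not, both operands are converted to unsigned long int. In either case that is the type of the result.
     c. Otherwise, if either operand is a long int, then the other one is converted to long int, and that is the type of the result.
     d. Otherwise, if either operand is an unsigned int, then the other one is converted to unsigned int, and that is the type of the result.
     e. Otherwise, both operands must be of type int, so that is the type of the result.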

The Standard contains a strange sentence: ‘The values of floating operands and of the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby’. This is in fact to allow the Old C treatment of floats. In Old C, float variables were automatically promoted to double, the way that the integral promotions promote char to int. So, an expression involving purely float variables may be done as if they were double, but the type of the result must appear to be float. The only effect is likely to be on performance and is not particularly important to most users.

Whether or not conversions need to be applied, and if so which ones, is discussed at the point where each operator is introduced.

In general, the type conversions and type mixing rules don't cause a lot of trouble, but there is one pitfall to watch out for. Mixing signed and unsigned quantities is fine until the signed number is negative; then its value can't be represented in an unsigned variable and something has to happen. The standard says that to convert a negative number to unsigned, the largest possible number that can be held in the unsigned plus one is added to the negative number; that is the result. Because there can be no overflow in an unsigned type, the result always has a defined value. Taking a 16-bit int for an example, the unsigned version has a range of 0–65535. Converting a signed value of -7 to this type involves adding 65536, resulting in 65529. What is happening is that the Standard is enshrining previous practice, where the bit pattern in the signed number is simply assigned to the unsigned number; the description in the standard is exactly what would happen if you did perform the bit pattern assignment on a two's complement computer. The one's complement implementations are going to have to do some real work to get the same result.

Putting it plainly, a small magnitude negative number will result in a large positive number when converted to unsigned. If you don't like it, suggest a better solution—it is plainly a mistake to try to assign a negative number to an unsigned variable, so it's your own fault.

Well, it's easy to say ‘don't do it’, but it can happen by accident and the results can be very surprising. Look at this example.

#include <stdio.h>
#include <stdlib.h>
int main(void){
      int i;
      unsigned int stop_val;

      stop_val = 0;
      i = -10;

      while(i <= stop_val){
              printf("%d\n", i);
              i = i + 1;
      }
      exit(EXIT_SUCCESS);
}
Example 2.7

You might expect that to print out the list of values from -10 to 0, but it won't. The problem is in the comparison. The variable i, with a value of -10, is being compared against an unsigned 0. By the rules of arithmetic (check them) we must convert both types to unsigned int first, then make the comparison. The -10 becomes at least 65526 (see <limits.h>) when it's converted, and is plainly somewhat larger than 0, so the loop is never executed. The moral is to steer clear of unsigned numbers unless you really have to use them, and to be perpetually on guard when they are mixed with signed numbers.

2.8.1.5. Wide characters

The Standard, as we've already said, now makes allowances for extended character sets. You can either use the shift-in shift-out encoding method which allows the multibyte characters to be stored in ordinary C strings (which are really arrays of chars, as we explore later), or you can use a representation that uses more than one byte of storage per character for every character. The use of shift sequences only works if you process the characters in strict order; it is next to useless if you want to create an array of characters and access them in non-sequential order, since the actual index of each char in the array and the logical index of each of the encoded characters are not easily determined. Here's the illustration we used before, annotated with the actual and the logical array indexes:

0 1 2  3   4 5 6  7   8 9 (actual array index)
a b c <SI> a b g <SO> x y
0 1 2      3 4 5      6 7 (logical index)

We're still in trouble even if we do manage to use the index of 5 to access the ‘correct’ array entry, since the value retrieved is indistinguishable from the value that encodes the letter ‘g’ anyhow. Clearly, a better approach for this sort of thing is to come up with a distinct value for all of the characters in the character set we are using, which may involve more bits than will fit into a char, and to be able to store each one as a separate item without the use of shifts or other position-dependent techniques. That is what the wchar_t type is for.

Although it is always a synonym for one of the other integral types, wchar_t (whose definition is found in <stddef.h>) is defined to be the implementation-dependent type that should be used to hold extended characters when you need an array of them. The Standard makes the following guarantees about the values in a wide character:

There is further support for this method of encoding characters. Strings, which we have already seen, are implemented as arrays of char, even though they look like this:

"a string"

To get strings whose type is wchar_t, simply prefix a string with the letter L. For example:

L"a string"

In the two examples, it is very important to understand the differences. Strings are implemented as arrays and although it might look odd, it is entirely permissible to use array indexing on them:

"a string"[4]
L"a string"[4]

are both valid expressions. The first results in an expression whose type is char and whose value is the internal representation of the letter ‘r’ (remember arrays index from zero, not one). The second has the type wchar_t and also has the value of the internal representation of the letter ‘r’.

It gets more interesting if we are using extended characters. If we use the notation <a>, <b>, and so on to indicate ‘additional’ characters beyond the normal character set which are encoded using some form of shift technique, then these examples show the problems.

"abc<a><b>"[3]
L"abc<a><b>"[3]

The second one is easiest: it has a type of wchar_t and the appropriate internal encoding for whatever <a> is supposed to be—say the Greek letter alpha. The first one is unpredictable. Its type is unquestionably char, but its value is probably the value of the ‘shift-in’ marker.

As with strings, there are also wide character constants.

'a'

has type int (perhaps surprisingly, character constants in C are ints, not chars) and the value of the encoding for the letter ‘a’.

L'a'

is a constant of type wchar_t. If you use a multibyte character in the first one, then you have the same sort of thing as if you had written

'xy'

—multiple characters in a character constant (actually, this is valid but means something funny). A single multibyte character in the second example will simply be converted into the appropriate wchar_t value.

If you don't understand all the wide character stuff, then all we can say is that we've done our best to explain it. Come back and read it again later, when it might suddenly click. In practice it does manage to address the support of extended character sets in C and once you're used to it, it makes a lot of sense.

Exercise 2.15. Assuming that chars, ints and longs are respectively 8, 16 and 32 bits long, and that char defaults to unsigned char on a given system, what is the resulting type of expressions involving the following combinations of variables, after the usual arithmetic conversions have been applied?

  1. Simply signed char.
  2. Simply unsigned char.
  3. int, unsigned int.
  4. unsigned int, long.
  5. int, unsigned long.
  6. char, long.
  7. char, float.
  8. float, float.
  9. float, long double.

2.8.1.6. Casts

From time to time you will find that an expression turns out not to have the type that you wanted it to have and you would like to force it to have a different type. That is what casts are for. By putting a type name in parentheses, for example

(int)

you create a unary operator known as a cast. A cast turns the value of the expression on its right into the indicated type. If, for example, you were dividing two integers a/b then the expression would use integer division and discard any remainder. To force the fractional part to be retained, you could either use some intermediate float variables, or a cast. This example does it both ways.

#include <stdio.h>
#include <stdlib.h>

/*
* Illustrates casts.
* For each of the numbers between 2 and 20,
* print the percentage difference between it and the one
* before
*/
int main(void){
      int curr_val;
      float temp, pcnt_diff;

      curr_val = 2;
      while(curr_val <= 20){
              /*
               * % difference is
               * 1/(curr_val)*100
               */
              temp = curr_val;
              pcnt_diff = 100/temp;
              printf("Percent difference at %d is %f\n",
                      curr_val, pcnt_diff);
              /*
               * Or, using a cast:
               */
              pcnt_diff = 100/(float)curr_val;
              printf("Percent difference at %d is %f\n",
                      curr_val, pcnt_diff);
              curr_val = curr_val + 1;
      }
      exit(EXIT_SUCCESS);
}
Example 2.8

The easiest way to remember how to write a cast is to write down exactly what you would use to declare a variable of the type that you want. Put parentheses around the entire declaration, then delete the variable name; that gives you the cast. Table 2.6 shows a few simple examples—some of the types shown will be new to you, but it's the complicated ones that illustrate best how casts are written. Ignore the ones that you don't understand yet, because you will be able to use the table as a reference later.

Declaration     Cast            Type
int x;          (int)           int
float f;        (float)         float
char x[30];     (char [30])     array of char
int *ip;        (int *)         pointer to int
int (*f)();     (int (*)())     pointer to function returning int
Table 2.6. Casts

2.8.2. Operators

2.8.2.1. The multiplicative operators

Or, put another way, multiplication *, division / and the remainder operator %. Multiplication and division do what is expected of them for both real and integral types, with integral division producing a truncated result. The truncation is towards zero. The remainder operator is only defined to work with integral types, because the division of real numbers supposedly doesn't produce a remainder.

If the division is not exact and neither operand is negative, the result of / is positive and rounded toward zero—to get the remainder, use %. For example,

9/2 == 4
9%2 == 1

If either operand is negative, the result of / may be the nearest integer to the true result on either side, and the sign of the result of % may be positive or negative. Both of these features are implementation defined.

It is always true that the following expression is equal to zero:

(a/b)*b + a%b - a

unless b is zero.

The usual arithmetic conversions are applied to both of the operands.

2.8.2.2. Additive operators

Addition + and subtraction - also follow the rules that you expect. The binary operators and the unary operators both have the same symbols, but rather different meanings. For example, the expressions a+b and a-b both use a binary operator (the + or - operators), and result in addition or subtraction. The unary operators with the same symbols would be written +b or -b.

The unary minus has an obvious function—it takes the negative value of its operand; what does the unary plus do? In fact the answer is almost nothing. The unary plus is a new addition to the language, which balances the presence of the unary minus, but doesn't have any effect on the value of the expression. Very few Old C users even noticed that it was missing.

The usual arithmetic conversions are applied to both of the operands of the binary forms of the operators. Only the integral promotions are performed on the operands of the unary forms of the operators.

2.8.2.3. The bitwise operators

One of the great strengths of C is the way that it allows systems programmers to do what had, before the advent of C, always been regarded as the province of the assembly code programmer. That sort of code was by definition highly non-portable. As C demonstrates, there isn't any magic about that sort of thing, and into the bargain it turns out to be surprisingly portable. What is it? It's what is often referred to as ‘bit-twiddling’—the manipulation of individual bits in integer variables. None of the bitwise operators may be used on real operands because they aren't considered to have individual or accessible bits.

There are six bitwise operators, listed in Table 2.7, which also shows the arithmetic conversions that are applied.

Operator    Effect              Conversions
&           bitwise AND         usual arithmetic conversions
|           bitwise OR          usual arithmetic conversions
^           bitwise XOR         usual arithmetic conversions
<<          left shift          integral promotions
>>          right shift         integral promotions
~           one's complement    integral promotions
Table 2.7. Bitwise operators

Only the last, the one's complement, is a unary operator. It inverts the state of every bit in its operand and has the same effect as the unary minus on a one's complement computer. Most modern computers work with two's complement, so it isn't a waste of time having it there.

Illustrating the use of these operators is easier if we can use hexadecimal notation rather than decimal, so now is the time to see hexadecimal constants. Any number written with 0x at its beginning is interpreted as hexadecimal; both 15 and 0xf (or 0XF) mean the same thing. Try running this or, better still, try to predict what it does first and then try running it.

#include <stdio.h>
#include <stdlib.h>

int main(void){
      unsigned int x,y;
      x = 0; y = ~0;

      while(x != y){
              printf("%x & %x = %x\n", x, 0xff, x&0xff);
              printf("%x | %x = %x\n", x, 0x10f, x|0x10f);
              printf("%x ^ %x = %x\n", x, 0xf00f, x^0xf00f);
              printf("%x >> 2 = %x\n", x, x >> 2);
              printf("%x << 2 = %x\n", x, x << 2);
              x = (x << 1) | 1;
      }
      exit(EXIT_SUCCESS);
}
Example 2.9

The way that the loop works in that example is the first thing to study. The controlling variable is x, which is initialized to zero. Every time round the loop it is compared against y, which has been set to a word-length independent pattern of all 1s by taking the one's complement of zero. At the bottom of the loop, x is shifted left once and has 1 ORed into it, giving rise to a sequence that starts 0, 1, 11, 111, … in binary.

For each of the AND, OR, and XOR (exclusive OR) operators, x is operated on by the operator and some other interesting operand, then the result printed.

The left and right shift operators are in there too, giving a result which has the type and value of their left-hand operand shifted in the required direction a number of places specified by their right-hand operand; the type of both of the operands must be integral. Bits shifted off either end of the left operand simply disappear. Shifting by a negative amount, or by at least as many bits as there are in the (promoted) left operand, gives undefined behaviour.

Shifting left guarantees to shift zeros into the low-order bits.

Right shift is fussier. Your implementation is allowed to choose whether, when shifting signed operands, it performs a logical or arithmetic right shift. This means that a logical shift shifts zeros into the most significant bit positions; an arithmetic shift copies the current contents of the most significant bit back into itself. The position is clearer if an unsigned operand is right shifted, because there is no choice: it must be a logical shift. For that reason, whenever right shift is being used, you would expect to find that the thing being shifted had been declared to be unsigned, or cast to unsigned for the shift, as in the example:

int i,j;
i = (unsigned)j >> 4;

The second (right-hand) operand of a shift operator does not have to be a constant; any integral expression is legal. Importantly, the rules involving mixed types of operands do not apply to the shift operators. The result of the shift has the same type as the thing that got shifted (after the integral promotions), and depends on nothing else.

Now something different; one of those little tricks that C programmers find helps to write better programs. If for any reason you want to form a value that has 1s in all but its least significant so-many bits, which are to have some other pattern in them, you don't have to know the word length of the machine. For example, to set the low order bits of an int to 0x0f0 and all the other bits to 1, this is the way to do it:

int some_variable;
some_variable = ~0xf0f;

The one's complement of the desired low-order bit pattern has been one's complemented. That gives exactly the required result and is completely independent of word length; it is a very common sight in C code.

There isn't a lot more to say about the bit-twiddling operators, and our experience of teaching C has been that most people find them easy to learn. Let's move on.

2.8.2.4. The assignment operators

No, that isn't a mistake, ‘operators’ was meant to be plural. C has several assignment operators, even though we have only seen the plain = so far. An interesting thing about them is that they are all like the other binary operators; they take two operands and produce a result, the result being usable as part of an expression. In this statement

x = 4;

the value 4 is assigned to x. The result has the type of x and the value that was assigned. It can be used like this

a = (x = 4);

where a will now have the value 4 assigned to it, after x has been assigned to. All of the simpler assignments that we have seen until now (except for one example) have simply discarded the resulting value of the assignment, even though it is produced.

It's because assignment has a result that an expression like

a = b = c = d;

works. The value of d is assigned to c, the result of that is assigned to b and so on. It makes use of the fact that expressions involving only assignment operators are evaluated from right to left, but is otherwise like any other expression. (The rules explaining what groups right to left and vice versa are given in Table 2.9.)

If you look back to the section describing ‘conversions’, there is a description of what happens if you convert longer types to shorter types: that is what happens when the left-hand operand of an assignment is shorter than the right-hand one. No conversions are applied to the right-hand operand of the simple assignment operator.

The remaining assignment operators are the compound assignment operators. They allow a useful shorthand, where an assignment containing the same left- and right-hand sides can be compressed; for example

x = x + 1;

can be written as

x += 1;

using one of the compound assignment operators. The result is the same in each case. It is a useful thing to do when the left-hand side of the operator is a complicated expression, not just a variable; such things occur when you start to use arrays and pointers. Most experienced C programmers tend to use the form given in the second example because somehow it ‘feels better’, a sentiment that no beginner has ever been known to agree with. Table 2.8 lists the compound assignment operators; you will see them used a lot from now on.

*= /= %=
+= -=
&= |= ^=
>>= <<=
Table 2.8. Compound assignment operators

In each case, arithmetic conversions are applied as if the expression had been written out in full, for example as if a+=b had been written a=a+b.

Reiterating: the result of an assignment operator has both the value and the type of the object that was assigned to.

2.8.2.5. Increment and decrement operators

It is so common to simply add or subtract 1 in an expression that C has two special unary operators to do the job. The increment operator ++ adds 1, the decrement -- subtracts 1. They are used like this:

x++;
++x;
x--;
--x;

where the operator can come either before or after its operand. In the cases shown it doesn't matter where the operator comes, but in more complicated cases the difference has a definite meaning and must be used properly.

Here is the difference being used.

#include <stdio.h>
#include <stdlib.h>
int main(void){
      int a,b;
      a = b = 5;
      printf("%d\n", ++a+5);
      printf("%d\n", a);
      printf("%d\n", b++ +5);
      printf("%d\n", b);
      exit(EXIT_SUCCESS);
}
Example 2.10

The results printed were

11
6
10
6

The difference is caused by the different positions of the operators. If the inc/decrement operator appears in front of the variable, then its value is changed by one and the new value is used in the expression. If the operator comes after the variable, then the old value is used in the expression and the variable's value is changed afterwards.

C programmers never add or subtract one with statements like this

x += 1;

they invariably use one of

x++; /* or */ ++x;

as a matter of course. A warning is in order though: it is not safe to use a variable more than once in an expression if it has one of these operators attached to it. There is no guarantee of when, within an expression, the affected variable will actually change value. The compiler might choose to ‘save up’ all of the changes and apply them at once, so an expression like this

y = x++ + --x;

does not guarantee to assign twice the original value of x to y. It might be evaluated as if it expanded to this instead:

y = x + (x-1);

because the compiler notices that the overall effect on the value of x is zero.

The arithmetic is done exactly as if the full addition expression had been used, for example x=x+1, and the usual arithmetic conversions apply.

Exercise 2.16. Given the following variable definitions

int i1, i2;
float f1, f2;
  1. How would you find the remainder when i1 is divided by i2?
  2. How would you find the remainder when i1 is divided by the value of f1, treating f1 as an integer?
  3. What can you predict about the sign of the remainders calculated in the previous two questions?
  4. What meanings can the - operator have?
  5. How would you turn off all but the low-order four bits in i1?
  6. How would you turn on all the low-order four bits in i1?
  7. How would you turn off only the low-order four bits in i1?
  8. How would you put into i1 the low-order 8 bits in i2, but swapping the significance of the lowest four with the next?
  9. What is wrong with the following expression?
    f2 = ++f1 + ++f1;

2.8.3. Precedence and grouping

After looking at the operators we have to consider the way that they work together. For things like addition it may not seem important; it hardly matters whether

a + b + c

is done as

(a + b) + c

or

a + (b + c)

does it? Well, yes in fact it does. If a+b would overflow and c held a value very close to -b, then the second grouping might give the correct answer where the first would cause undefined behaviour. The problem is much more obvious with integer division:

a/b/c

gives very different results when grouped as

a/(b/c)

or

(a/b)/c

If you don't believe that, try it with a=10, b=2, c=3. The first gives 10/(2/3); 2/3 in integer division gives 0, so we get 10/0 which immediately overflows. The second grouping gives (10/2), obviously 5, which divided by 3 gives 1.

The grouping of operators like that is known as associativity. The other question is one of precedence, where some operators have a higher priority than others and force evaluation of sub-expressions involving them to be performed before those with lower precedence operators. This is almost universal practice in high-level languages, so we ‘know’ that

a + b * c + d

groups as

a + (b * c) + d

indicating that multiplication has higher precedence than addition.

The large set of operators in C gives rise to 15 levels of precedence! Only very boring people bother to remember them all. The complete list is given in Table 2.9, which indicates both precedence and associativity. Not all of the operators have been mentioned yet. Beware of the use of the same symbol for both unary and binary operators: the table indicates which are which.

Operator                            Direction        Notes
() [] -> .                          left to right    1
! ~ ++ -- - + (cast) * & sizeof     right to left    all unary
* / %                               left to right    binary
+ -                                 left to right    binary
<< >>                               left to right    binary
< <= > >=                           left to right    binary
== !=                               left to right    binary
&                                   left to right    binary
^                                   left to right    binary
|                                   left to right    binary
&&                                  left to right    binary
||                                  left to right    binary
?:                                  right to left    2
= += and all combined assignment    right to left    binary
,                                   left to right    binary

1. Parentheses are for expression grouping, not function call.
2. This is unusual. See Section 3.4.1.

Table 2.9. Operator precedence and associativity

The question is, what can you do with that information, now that it's there? Obviously it's important to be able to work out both how to write expressions that evaluate in the proper order, and also how to read other people's. The technique is this: first, identify the unary operators and the operands that they refer to. This isn't such a difficult task but it takes some practice, especially when you discover that operators such as unary * can be applied an arbitrary number of times to their operands; this expression

a*****b

means a multiplied by something, where the something is an expression involving b and several unary * operators.

It's not too difficult to work out which are the unary operators; here are the rules.

  1. ++ and -- are always unary operators.
  2. The operator immediately to the right of an operand is a binary operator unless (1) applies, when the operator to its right is binary.
  3. All operators to the left of an operand are unary unless (2) applies.

Because the unary operators have very high precedence, you can work out what they do before worrying about the other operators. One thing to watch out for is the way that ++ and -- can be before or after their operands; the expression

a + -b++ + c

has two unary operators applied to b. The unary operators all associate right to left, so although the - comes first when you read the expression, it really parenthesizes (for clarity) like this:

a + -(b++) + c

The case is a little clearer if the prefix, rather than the postfix, form of the increment/decrement operators is being used. Again the order is right to left, but at least the operators come all in a row.

After sorting out what to do with the unary operators, it's easy to read the expression from left to right. Every time you see a binary operator, remember it. Look to the right: if the next binary operator is of a lower precedence, then the operator you just remembered is part of a subexpression to evaluate before anything else is seen. If the next operator is of the same precedence, keep repeating the procedure as long as equal precedence operators are seen. When you eventually find a lower precedence operator, evaluate the subexpression on the left according to the associativity rules. If a higher precedence operator is found on the right, forget the previous stuff: the operand to the left of the higher precedence operator is part of a subexpression separate from anything on the left so far. It belongs to the new operator instead.

If that lot isn't clear don't worry. A lot of C programmers have trouble with this area and eventually learn to parenthesize these expressions ‘by eye’, without ever using formal rules.

What does matter is what happens when you have fully parenthesized these expressions. Remember the ‘usual arithmetic conversions’? They explained how you could predict the type of an expression from the operands involved. Now, even if you mix all sorts of types in a complicated expression, the types of the subexpressions are determined only from the types of the operands in the subexpression. Look at this.

#include <stdio.h>
#include <stdlib.h>

int main(void){
      int i,j;
      float f;

      i = 5; j = 2;
      f = 3.0;

      /* j / i has int operands on both sides, so it is integer division */
      f = f + j / i;
      printf("value of f is %f\n", f);
      exit(EXIT_SUCCESS);
}
Example 2.11

The value printed is 3.000000, not 3.400000, which might surprise some, who thought that because a float was involved the whole statement involving the division would be done in that real type.

Of course, the division operator had only int types on either side, so the arithmetic was done as integer division and resulted in zero. The addition had a float and an int on either side, so the conversions meant that the int was converted to float for the arithmetic, and that was the correct type for the assignment, so there were no further conversions.

The previous section on casts showed one way of changing the type of an expression from its natural one to the one that you want. Be careful though:

(float)(j/i)

would still use integer division, then convert the result to float. To keep the fractional part of the quotient, you should use

(float)j/i

which would force real division to be used.

2.8.4. Parentheses

C allows you to override the normal effects of precedence and associativity by the use of parentheses as the examples have illustrated. In Old C, the parentheses had no further meaning, and in particular did not guarantee anything about the order of evaluation in expressions like these:

int a, b, c;
a+b+c;
(a+b)+c;
a+(b+c);

You used to need explicit temporary variables to get a particular order of evaluation. That matters when you know that an expression risks overflow but that evaluating it in a certain order avoids the problem.

Standard C says that evaluation must be done in the order indicated by the precedence and grouping of the expression, unless the compiler can tell that the result will not be affected by any regrouping it might do for optimization reasons.

So, the expression a = 10+a+b+5; cannot be rewritten by the compiler as a = 15+a+b; unless it can be guaranteed that the resulting value of a will be the same for all combinations of initial values of a and b. That would be true if the variables were both unsigned integral types, or if they were signed integral types but in that particular implementation overflow did not cause a run-time exception and overflow was reversible.

2.8.5. Side Effects

To repeat and expand the warning given for the increment operators: it is unsafe to use the same variable more than once in an expression if evaluating the expression changes the variable and the new value could affect the result of the expression. This is because the change(s) may be ‘saved up’ and only applied at the end of the statement. So f = f+1; is safe even though f appears twice in a value-changing expression, f++; is also safe, but f = f++; is unsafe.

The problem can be caused by an assignment, by the increment or decrement operators, or by calling a function that changes the value of an external variable that is also used in the expression. These are generally known as ‘side effects’. C makes almost no promise that side effects will occur in a predictable order within a single expression. (The discussion of ‘sequence points’ in Chapter 8 will be of interest if you care about this.)