C arrays are not pointers

Edit: edited again on June 2nd to use WordPress's code layout.

On Feb 23, the following question about arrays and pointers was posted on c.l.c:

why doesn’t calling f1 (see below) cause a warning while calling f2 does?

void f1(int a[4]){}
void f2(int (*b)[4]){}

int c[4+1];
int (*d)[4+1];

f1(c);
f2(d);

This is one of the most frequently asked questions in the newsgroups, so much so that it has its own entry in the c.l.c FAQ, under section 6.2:

6.2: But I heard that char a[] was identical to char *a.

A: Not at all. (What you heard has to do with formal parameters to functions; see question 6.4.) Arrays are not pointers. The array declaration char a[6] requests that space for six characters be set aside, to be known by the name “a”. That is, there is a location named “a” at which six characters can sit. The pointer declaration char *p, on the other hand, requests a place which holds a pointer, to be known by the name “p”. This pointer can point almost anywhere: to any char, or to any contiguous array of chars, or nowhere (see also questions 5.1 and 1.30).

As usual, a picture is worth a thousand words. The declarations

char a[] = "hello";
char *p = "world";

would initialize data structures which could be represented like this:

      +---+---+---+---+---+----+
a:    | h | e | l | l | o |    |
      +---+---+---+---+---+----+
      +-----+        +---+---+---+---+---+----+
p:    |  *======>    | w | o | r | l | d |    |
      +-----+        +---+---+---+---+---+----+

As far as the compiler is concerned, a is an address in memory where the chars ‘h’, ‘e’, ‘l’, ‘l’, ‘o’ are stored, followed by the char with numerical value zero.

For the compiler, p is a pointer, i.e. an object that can contain an address (e.g. p can point to a): a variable used to store addresses. In the previous example, p contains the address of the string “world” stored in memory.

p can change from, say value 0x2b765aad to value 0x0 or 0xdeadbeef etc.

a will never change. It is a raw address in memory, not a placeholder.
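A minimal sketch of that difference, assuming a hosted C compiler. The function name array_vs_pointer_sizes is made up for this illustration, and p is declared const char * here (rather than char * as in the FAQ) only because string literals must not be modified:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the FAQ: returns 1 when p ends up pointing at a. */
int array_vs_pointer_sizes(void)
{
    char a[] = "hello";        /* array object: 5 letters plus the '\0' */
    const char *p = "world";   /* pointer object: holds an address */

    /* sizeof sees the whole array for a, but only the pointer itself for p. */
    assert(sizeof a == 6);
    assert(sizeof p == sizeof(const char *));

    p = a;         /* fine: p is a variable and can be re-pointed */
    /* a = p; */   /* would not compile: an array cannot be assigned to */

    return p[0] == 'h';
}
```

Uncommenting the a = p line should make the compiler reject the program, which is the "a will never change" part in practice.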

It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer.
Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location “a”, move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location “p”, fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to. In other words, a[3] is three places past (the start of) the object *named* a, while p[3] is three places past the object *pointed to* by p. In the example above, both a[3] and p[3] happen to be the character ‘l’, but the compiler gets there differently. (The essential difference is that the values of an array like a and a pointer like p are computed differently *whenever* they appear in expressions, whether or not they are being subscripted, as explained further in the next question.)

References: K&R2 Sec. 5.5 p. 104; CT&P Sec. 4.5 pp. 64-5.
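The two access paths described in the FAQ answer can be observed from C itself, at least indirectly. The sketch below is only an illustration: the function name third_of_both is hypothetical, and p is declared const char * as a precaution since it points at a string literal.

```c
#include <assert.h>

/* Hypothetical helper: both subscripts fetch 'l', reached by different routes. */
char third_of_both(void)
{
    char a[] = "hello";
    const char *p = "world";

    /* The array object starts at the same address as its first element... */
    assert((void *)&a == (void *)&a[0]);
    /* ...whereas the pointer object p is stored somewhere other than
       the characters it points to. */
    assert((const void *)&p != (const void *)p);

    assert(a[3] == 'l' && p[3] == 'l');
    return a[3];
}
```

For a there is nothing to load before indexing, since the characters sit at the location named a. For p the stored address has to be fetched first, which is the extra step the FAQ describes.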

So this is pretty straightforward: an array is an address in memory and a pointer contains a memory address. When an array is passed to a function it “decays” to a pointer, i.e. a pointer holding the address of its first element is passed instead.

When array a decays, the compiler produces a temporary pointer value, like ‘p’, holding the address of a’s first element. Hence the confusion between arrays and pointers.
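Decay can be sketched as follows. The names set_second and decay_demo are invented for this example; the int ** line is there only to show that the parameter really has pointer type.

```c
#include <assert.h>

/* Hypothetical helper: the parameter is written like an array but is an int *. */
void set_second(int a[4])
{
    int **pa = &a;    /* compiles only because a really has type int * */
    (*pa)[1] = 42;    /* same as a[1] = 42: writes into the caller's array */
}

int decay_demo(void)
{
    int c[4] = {0, 1, 2, 3};
    int *q = c;                            /* c is converted to &c[0] */

    assert(q == &c[0]);
    assert(sizeof c == 4 * sizeof(int));   /* no decay as the operand of sizeof */

    set_second(c);    /* passes &c[0]; the array itself is not copied */
    return c[1];      /* the callee's write is visible here */
}
```

Because only the address of the first element travels to the callee, the write inside set_second lands in the caller's c.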

But this post is slightly different: the poster knows about decay, and he thinks that in his example, because of array decay, f1 and f2 should be equivalent.

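f1 declares its parameter as int[4], which the compiler treats as a plain int*, so the 4 is ignored and c is accepted. f2 takes a pointer to an array of 4 ints, and there the size is part of the type. So are the two equivalent? Well, no, they are not.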
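The types involved can be checked with a C11 _Generic expression. This is a sketch, assuming a C11 compiler; the KIND macro and check_types function are hypothetical names introduced here, while f1, f2, c and d come from the original post.

```c
#include <assert.h>

void f1(int a[4]) { (void)a; }       /* parameter type is adjusted to int * */
void f2(int (*b)[4]) { (void)b; }    /* pointer to an array of exactly 4 ints */

/* Hypothetical helper: classify a pointer expression by its type. */
#define KIND(x) _Generic((x),   \
        int *:      1,          \
        int (*)[4]: 4,          \
        int (*)[5]: 5,          \
        default:    0)

int check_types(void)
{
    int c[4 + 1] = {0};
    int (*d)[4 + 1] = &c;
    void (*pf)(int *) = f1;   /* accepted: f1 really takes an int * */

    pf(c);           /* fine: c decays to int *, the "4" in f1 plays no role */
    /* f2(d); */     /* draws a diagnostic: int (*)[5] is not int (*)[4] */

    return KIND(&c[0]) * 100 + KIND(d);
}
```

check_types returns 105: the decayed array is an int * (1), while d keeps its size in its type (5), which is why only the call to f2 is flagged.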

The following explanation was given by Ben Bacarisse:

One could speculate on a C+ where the size was part of the type in a 1D array, but historically, that is not what happened. When you write:

void f(int p[4])
{
    p[1] = 42;
}

you get this kind of picture:

   +----+     +----+----+----+----+
p: | ---+---> |    | 42 |    |    |
   +----+     +----+----+----+----+

because the p is just a pointer variable. To find where to put the 42, the compiler only needs to know the pointer and the size of the array elements. The “4” would be useful information, but you just have to accept that that is not the way C chose to go. OK, so far, I think you know all this.

In the case of

void f(int p[][4])
{
    p[1][1] = 42;
}

This is the same as f(int (*p)[4]) — the “first” array part becomes a pointer and we could draw it like this:

   +----+     +----+----+----+----+----+----+----+----+----+----+----+--
p: | ---+---> |    |    |    |    |    | 42 |    |    |    |    |    |
   +----+     +----+----+----+----+----+----+----+----+----+----+----+--
               \______ p[0] _____/ \______ p[1] _____/ \______ p[2] ____

In order to know where to put the 42, the compiler must, again, know the size of the array elements. The fact that they are arrays of 4 ints is now crucial. If you passed in a pointer to arrays of 5 ints (int (*p)[5]) then the whole thing would go wrong. (Of course, this being C, if you want it to go wrong for some reason, pass that pointer with a cast and take your chances!)
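The arithmetic in that picture can be verified with a short sketch. The function f is the one from the explanation; where_is_42 and the 3-row grid are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

void f(int p[][4])       /* same as int (*p)[4] */
{
    p[1][1] = 42;        /* skip one whole row of 4 ints, then one more int */
}

/* Hypothetical helper: returns the flat int index at which f() stored 42. */
size_t where_is_42(void)
{
    int grid[3][4] = {{0}};

    f(grid);             /* grid decays to int (*)[4] */
    assert(grid[1][1] == 42);

    /* Byte distance from the start of grid, converted to a count of ints. */
    return (size_t)((char *)&grid[1][1] - (char *)grid) / sizeof(int);
}
```

The result is 1 * 4 + 1 = 5, the sixth cell in the diagram. With rows of 5 ints the same p[1][1] would land at index 6 instead, which is why the row size has to be part of the pointer's type.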

So far this is the best explanation I have found of what is going on from the compiler’s perspective.

Enjoy, and share.
