Pointers in C

by Carl Burch, Hendrix College, August 2012

Pointers in C by Carl Burch is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Based on a work at www.cburch.com/books/cptr/.

1. Pointers
1.1. Pointer basics
1.2. The scanf() function
1.3. Arrays, revisited
2. Strings
2.1. String basics
2.2. Example: Tokenizing a string
3. Data structures
3.1. Dynamic memory
3.2. Example: Linked list

1. Pointers

Now we'll turn to a concept that is quite important to C programming, but which is unknown in Python, Java, and many other languages: the pointer.

1.1. Pointer basics

The concept of pointer is relatively unique to C: It allows you to have a variable that represents the memory address of some data. The type name for such a variable is represented by the type name for the data to which it points followed by an asterisk ('*'); for example, an int* variable will hold the memory address of an integer.

To get the memory address of a variable, you can use the ampersand ('&') operator: For example, the value of the expression “&i” is the memory address of i. Conversely, to access the memory referenced by a pointer, you can use the asterisk ('*') operator — this is called dereferencing the pointer. Consider the following example.

int i; int *p; i = 4; p = &i; *p = 5; printf("%d\n", i);

In this fragment, we have declared two variables: i, which holds an integer, and p, which holds the memory address of an integer. The computer first initializes i with the value 4 and p with the value &i. Then it executes “*p = 5;”, which says to alter the memory referenced by p (that is, i) to hold 5. Finally, we print the value of i, which is now 5.

This is a contrived example. A less contrived usage of pointers is when you want a function to change the value of a parameter. For example, say we want to write a function to swap two values. You might be tempted to write the following.

void swap(int i, int j) { /* This will not work!! */ int t; t = i; i = j; j = t; }

It won't work, though, because C passes all parameters by value: If I call swap(x, y), the values contained by x and y are copied into the i and j variables. The swap() function will swap the values contained by i and j, but this will have no effect on x and y. We can get around this by passing pointers instead.

void swap(int *ip, int *jp) { int t; t = *ip; *ip = *jp; *jp = t; }

Now we would have to call swap(&x, &y) (not swap(x, y)). The following figure illustrates how it works; an explanation is below.

Figure 4: The swap() function in action.

(a) (b) (c)

The value copied into ip will be the address of x, and the value copied into jp will be the address of y (Figure 4(a)). The line “t = *ip;” will copy the value referenced by ip (that is, x) into t. The next line will copy the value referenced by jp (that is, y) into the memory referenced by ip (that is, x) (Figure 4(b)). And the final line will copy the value of t (the original value of x) into the memory referenced by jp (that is, y) (Figure 4(c)). So the values contained by x and y will be swapped. [This is the only way to write such a function in C, where all parameters are passed by value. Some languages have a feature where you can designate a parameter to be an implicit pointer — it's called call by reference as opposed to the call by value used by C. Such a feature was added into C++; it was not retained by Java.]

Suppose the function said the following instead.

t = *ip; ip = jp; /* Was: *ip = *jp; */ *jp = t;

This would still compile, but the second line would in fact change the pointer only, so that both ip and jp point to the same place. After this line, memory would look like the following.

Thus, the actual value of x would not change with this attempt at implementation.

In C, the null pointer is called NULL. Its use is similar to None in Python or null in Java: It indicates a pointer that points to nothing.

1.2. The `scanf()` function

We've already seen the printf() function that allows you to output information to the screen.

printf("The value of i is %d.", i);

There is also a scanf function that allows you to read information from the user. Suppose, for example, that you wanted to read a number from the user. You can write the following.

printf("Type a number. "); scanf("%d", &i);

The scanf() function, like the printf() function, takes a format string indicating what sort of data the function will read from the user. The parameters following should be the memory addresses where the data read from the user should be placed. In this example, the format string “%d” indicates that the program should read an integer, written in decimal, from the user. The second parameter, &i, indicates that the value read should be placed into the i variable.

The important thing to remember about the scanf() function is that it wants memory addresses of variables, not the value of variables: Those ampersands are important. Of course, the reason it wants memory addresses is so that scanf() can save the user's typed data where the calling function wants them.

1.3. Arrays, revisited

In C, an array is basically a pointer whose value cannot be changed. In fact, when you pass an array as a parameter, the only thing that really gets passed is the memory address of the first element of the array. So you can write something like the following.

void setToZero(int *arr, int n) { int i; for (i = 0; i < n; i++) { arr[i] = 0; } } int main() { int grades[50]; setToZero(grades, 50); return 0; }

In this program, the setToZero function takes a pointer to an integer as its first parameter. When we call it with “setToZero(grades, 50)”, the address of the first number in grades is copied into the arr parameter variable. The bracket operator can also be applied to pointers as if they referenced the first item in an array, so the line “arr[i] = 0;” is legal. (Alternatively, you could write “*(arr + i) = 0;”. Adding the integer i to the pointer arr would compute the address where index i would be located if arr were an array, and the asterisk would dereference this address.)

2. Strings

2.1. String basics

C's support for strings is very limited: A string, as far as C is concerned, is simply an array of characters. Each string includes a hidden character that marks the end of the string. This marker is NUL, the ASCII character whose value is 0. (Later, we'll learn about NULL pointers in C; the words are spelled similarly, but they are different concepts: NUL is a character value, while NULL is a pointer value.) The NUL character can be referenced in a C program as “'\0'”.

If you wanted to copy all the letters from the string src to another string dst, you could use the following for loop.

for (i = 0; src[i] != '\0'; i++) { dst[i] = src[i]; } dst[i] = '\0';

The for loop copies all of the characters in src up to, but not including, the NUL character. Then it places a NUL character at the end of dst so that the copied string has the terminator also.

In practice, I'd never write such a for loop in a program. Instead, I'd use the built-in strcpy() function. There are several library functions built into C for working with strings. (Their prototypes are in the string.h header file; we'll talk about header files in Section 3.2.) Following are three.

void strcpy(char *dst, char *src): Copies all the characters of src into dst, including the terminating NUL character.
int strlen(char *src): Returns the number of characters in src (not including the terminating NUL character).
int strcmp(char *a, char *b): Returns zero if a and b are identical, a negative number if a comes before b in lexicographic order, and a positive number if a comes after b. Lexicographic order refers to the ordering based on ASCII codes. This generally matches alphabetic order, but Ada actually follows AR lexicographically since the first characters match but the second characters do not, and the ASCII value for a capital R (82) is less than the ASCII value for a lower-case d (99).

C has no support for strings of indefinite length. You can move the NUL character forward in the array to make it shorter, but you can't move it past the end of an array.

2.2. Example: Tokenizing a string

Figure 5 below shows a useful function that we'll explore as an illustration of many of the concepts we've covered so far. It defines a function that takes three parameters: a string referenced by buf, an array of pointers to strings referenced by argv, and an integer max_args. The function is to split the string buf into separate words, placing pointers to successive words into argv and returning the number of words found. The max_args parameter indicates how long the array is.

Figure 5: Splitting a string.
#include <ctype.h> /* splitLine * Breaks a string into a sequence * of words. The pointer to each * successive word is placed into * an array. The function returns * the number of words found. * * Parameters: * buf - string to be broken up * argv - array where pointers to * the separate words should go. * max_args - maximum number of * pointers the array can hold. * * Returns: * number of words found in string. */ int splitLine(char *buf, char **argv, int max_args) { int arg; /* skip over initial spaces */ while (isspace(*buf)) buf++; for (arg = 0; arg < max_args && *buf != '\0'; arg++) { argv[arg] = buf; /* skip past letters in word */ while (*buf != '\0' && !isspace(*buf)) { buf++; } /* if not at line's end, mark * word's end and continue */ if (*buf != '\0') { *buf = '\0'; buf++; } /* skip over extra spaces */ while (isspace(*buf)) buf++; } return arg; }

For example, suppose we wanted to use this function to split the sentence “The dog is agog.” into words. We'd place this into an array of characters and pass this string as buf into the function. We'd also create an array of string pointers to pass as argv, with max_args being the length of this array.

The function's job is to place pointers into argv to the individual words.

In this case, the function should return 4, since there are four words in the sentence.

The function accomplishes this by replacing spaces in the sentence with NUL characters and pointing the array entries referenced by argv into the sentence's array.

It uses the isspace() function for identifying space characters; this function's prototype is in the ctype.h header file included on line 1.

3. Data structures

Although C doesn't have the notion of class as in object-oriented languages, it does have the structure, which allows you to define a new data type representing a conglomeration of data. The primary distinction between a class and a structure is that a class can have both variables and methods, while a structure can have only variables.

A structure type definition in C looks like the following.

struct Point { int x; int y; }; /* This semicolon is required! */

Here, we've defined a type called struct Point. In C, struct is part of the type name for every structure defined. (C++ makes this keyword optional, and so the type might just be called Point. But we're talking about C now.) Each struct Point variable has two “sub-variables” (called fields), x and y. So you can write the following (which does nothing interesting except illustrate a C structure's use).

int main() { struct Point p; p.x = 50; p.y = 100; return 0; }

Notice how a structure is automatically created at the beginning of the function, with the variable declaration, and it automatically goes away at the function's end.

You can easily make an array of structures, talk about pointers to structures, or place a structure within another structure.

When you pass a structure as a parameter, you should pass a pointer to the structure, not the structure itself. The reason is that structures tend to contain lots of data, and copying all of the fields into the parameter variable is inefficient.

#include <math.h> double distToOrigin(struct Point *p) { return sqrt((*p).x * (*p).x + (*p).y * (*p).y); }

In fact, the (*ptr).field construct is so common that C includes a shortcut for it: The arrow operator allows you to write ptr->field in place of (*ptr).field. Thus, the following definition is equivalent.

#include <math.h> double distToOrigin(struct Point *p) { return sqrt(p->x * p->x + p->y * p->y); }

3.1. Dynamic memory

Sometimes you want to allocate memory in the course of running a program. To do this, you can use the malloc() function included in the C library. The malloc() function takes a single integer, which represents how many bytes the function should allocate, and it returns the memory address of the first byte of memory allocated.

In computing how many bytes to allocate, the sizeof operator is useful: Given a type, the sizeof operator computes how many bytes a value of that type requires. So sizeof(int) would be 4 on most computers (but on some computers it may be 2 or 8), and sizeof(struct Point) would have a value of 8 (or 4 or 16 depending on the computer).

So we could write the following, which allocates an array based on a length given by the user.

int main() { int n; int *arr; printf("How long an array? "); scanf("%d", &n); arr = (int*) malloc( n * sizeof(int)); return 0; }

In the malloc line, we've opted to allocate enough memory to hold n integers. The casting operator in this line is important — if we did not cast to an int*, the C compiler would report that the type returned by malloc() can't be assigned to an int* variable.

When you allocate memory in C, you should deallocate it later to free the space. To accomplish this, you can use the free() function, which takes a pointer as a parameter and marks the memory to which it points as available.

free(arr);

After you deallocate memory, you should not use it again. Neither should you deallocate the memory a second time. Avoiding both of these issues can be pretty painful.

But it should be deallocated. If the program ends, all memory used by the program will be deallocated automatically. But, if the program continues for a long period and unused memory is never deallocated, then the program's memory requirements will grow steadily, leading to strange problems later once memory runs out. This is called a memory leak.

(Python avoids the issue of memory deallocation entirely by letting the computer automatically figure out when memory becomes available. This automatic process, called garbage collection, is complex, and efficient techniques for it were unknown at the time of C's development, so C opts to let the programmer control memory deallocation. For Python's designers, faster computers and more efficient collection techniques, coupled with a different target audience, tilted the scale toward automatic garbage collection instead. Techniques for garbage collection are interesting but unfortunately beyond the scope of what we're studying here.)

3.2. Example: Linked list

Figure 6 defines a List structure for representing a linked list of numbers. Moreover, Figure 6 contains two functions, listCreate() to create a new, empty linked list and listRemove() to remove a number from the list.

Figure 6: C implementation of a linked list
#include <stdlib.h> struct Node { int data; struct Node *next; }; struct List { struct Node *first; }; /* listCreate * Creates an empty linked list. * * Returns: * allocated list, holding nothing. */ struct List* listCreate() { struct List *ret; ret = (struct List*) malloc( sizeof(struct List)); ret->first = NULL; return ret; } /* listRemove * Removes first occurrence of * number from a linked list, * returning 1 if successful * * Parameters: * list - list from which item is * to be removed. * to_remove - number to be removed * from the list. * * Returns: * 1 if item was found and removed, * 0 otherwise. */ int listRemove(struct List *list, int to_remove) { struct Node **cur_p; struct Node *out; int cur_data; cur_p = &(list->first); while (*cur_p != NULL) { cur_data = (*cur_p)->data; if (cur_data == to_remove) { out = *cur_p; *cur_p = out->next; free(out); return 1; } cur_p = &((*cur_p)->next); } return 0; }

The listRemove() function works with pointers in a peculiar way: It uses cur_p to point to the pointer to the current node; this way, once the desired number is found, we can alter the pointer within the list referencing that node. The more intuitive way of writing this function is to have a cur variable stepping through the list, but this necessitates tracking the previous node, since that node's pointer to its successor will change once cur points to the node to remove.

prev = NULL; cur = list->first while (cur != NULL) { if (cur->data == to_remove) { if (prev == NULL) { list->first = cur->next; } else { prev->next = cur->next; } free(cur); return 1; } prev = cur; cur = cur->next; } return 0;

By instead remembering the address of the pointer to the current node, the implementation of Figure 6 avoids this additional complexity. It's not really a better implementation, but it does illustrate an interesting use of pointers. Incidentally, the version used in Figure 6 would be impossible in a language without pointers, such as Python.

Let's go back to Figure 6. It would be tempting to write the listRemove() function's if statement as follows instead. (This change avoids the out variable altogether, and so we could also delete out's declaration.)

if ((*cur_p)->data == to_remove) { free(*cur_p); /* Bad !! */ *cur_p = (*cur_p)->next; }

This alteration is wrong, however, because it uses memory after the memory has already been freed: The second line in the body accesses the next field of *cur_p after *cur_p was freed by the previous line. so this memory is technically no longer available. (It looks like this would be safe, since there are no intervening memory allocations between lines 39 and 40, and indeed it would work on many systems; but it's quite possible that the free() function will write over the next field.) Thus, this change results in a wrong program.

This concludes our introduction to C programming. While we have of course omitted discussion of several features of C, you should now be well on your way to understanding most C code that you might encounter.