Tuesday 8 November 2011

What is the difference between a string and an array? (C programming)


An array is an array of anything. A string is a specific kind of array with a well-known convention to
determine its length.
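That length convention can be sketched in a few lines of C. This is a minimal illustration, not library code. `string_length` is a hypothetical hand-written stand-in for the standard `strlen`: it counts characters until it reaches the terminating NUL. `sum_ints` is a hypothetical helper showing that an ordinary int array has no such convention, so its length must be passed separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for strlen: count characters up to, but not
   including, the terminating NUL. No array size is needed, because the
   NUL marks the end. */
size_t string_length(const char *s)
{
    const char *p = s;
    assert(s != NULL);          /* caller must pass a real string */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical helper: an int array has no end marker, so the caller
   must say how many elements there are. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

The length stops at the first NUL, so a literal such as `"abc\0def"` has a string length of 3 even though the array holds more characters.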

There are two kinds of programming languages: those in which a string is just an array of characters, and those in which it’s a special type. In C, a string is just an array of characters (type char), with one wrinkle: a C string always ends with a NUL character. The “value” of an array is the same as the address of (or a pointer to) its first element; so, frequently, a C string and a pointer to char are used to mean the same thing. An array can be any length. If it’s passed to a function, there’s no way the function can tell how long the array is supposed to be, unless some convention is used. The convention for strings is NUL termination: the last character is an ASCII NUL ('\0') character.

In C, you can have a literal for an integer, such as 42; for a character, such as '*'; or for a floating-point number, such as 4.2e1 (which has type double; 4.2e1f is the float version).
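The three kinds of literals can be checked directly. In this minimal sketch, all three function names are hypothetical and chosen only for illustration. Two points are assumptions worth stating: `star_equals_42` relies on an ASCII-based character set, and the `sizeof` result reflects C's rule that a character constant has type int (in C++, `sizeof('*')` is 1).

```c
#include <stddef.h>

/* In C, a character constant such as '*' has type int, so sizeof
   reports the size of an int rather than 1. */
size_t char_constant_size(void)
{
    return sizeof '*';
}

/* Assumes an ASCII-based character set, where '*' is the int value 42. */
int star_equals_42(void)
{
    return '*' == 42;
}

/* 4.2e1 means 4.2 x 10^1. Without a suffix it is a double;
   4.2e1f would be a float. */
double floating_literal(void)
{
    return 4.2e1;
}
```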

NOTE
Actually, what looks like a type char literal is just a type int literal with a funny syntax. On an ASCII system, 42 and '*' are exactly the same value. This isn’t the case for C++, which has true char literals, can overload functions on char versus int parameters, and generally distinguishes more carefully between a char and an int.

There’s no such thing as a literal for an array of integers, or an arbitrary array of characters (C99 later added compound literals, such as (int[]){ 1, 2, 3 }, which come close). It would be very hard to write a program without string literals, though, so C provides them. Remember, C strings conventionally end with a NUL character, so C string literals do as well. "six times nine" is 15 characters long (including the NUL terminator), not just the 14 characters you can see.

There’s a little-known, but very useful, rule about string literals. If you have two or more string literals, one after the other, the compiler treats them as if they were one big string literal, with only one terminating NUL character. That means that "Hello, " "world" is the same as "Hello, world", and that

char message[] =
"This is an extremely long prompt\n"
"How long is it?\n"
"It's so long,\n"
"It wouldn't fit on one line\n";

is exactly the same as some code that wouldn’t fit on this page of the book.
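The concatenation rule and the hidden NUL can both be observed with `sizeof`, which counts the terminator, and `strlen`, which does not. The array names and helper functions in this sketch are hypothetical, chosen only for illustration.

```c
#include <string.h>

/* Adjacent literals are joined into one literal with a single
   terminating NUL, so these two arrays have identical contents. */
static const char joined[] = "Hello, " "world";
static const char whole[]  = "Hello, world";

/* sizeof counts the terminating NUL: 14 visible characters + 1. */
enum { SIX_TIMES_NINE_SIZE = sizeof "six times nine" };

int literals_match(void)
{
    return sizeof joined == sizeof whole && strcmp(joined, whole) == 0;
}

/* strlen stops before the NUL, so it sees only the visible characters. */
size_t six_times_nine_length(void)
{
    return strlen("six times nine");
}
```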

When defining a string variable, you need to have either an array that’s long enough or a pointer to some
area that’s long enough. Make sure that you leave room for the NUL terminator. The following example code
has a problem:

char greeting[ 12 ];
strcpy( greeting, "Hello, world" ); /* trouble */

There’s a problem because greeting has room for only 12 characters, and "Hello, world" is 13 characters
long (including the terminating NUL character). The NUL character will be copied to someplace beyond the end of
the greeting array, probably trashing something else nearby in memory (formally, the behavior is undefined). On the other hand,

char greeting[ 12 ] = "Hello, world"; /* not a string */

is OK if you treat greeting as a char array, not a string. Because there wasn’t room for the NUL terminator,
the NUL is not part of greeting. (C allows this initializer; C++ rejects it.) A better way to do this is to write

char greeting[] = "Hello, world";

to make the compiler figure out how much room is needed for everything, including the terminating NUL
character.
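Here is one way to avoid the overflow: check the room before copying. `copy_if_fits` is a hypothetical helper, not a standard function. It assumes the caller passes the true size of the destination buffer, such as `sizeof` of an array. The two arrays mirror the declarations in the text.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst only if dst (dst_size bytes)
   has room for every character plus the terminating NUL.
   Returns 0 on success, or -1 (leaving dst untouched) if it would overflow. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* +1 for the NUL terminator */
    if (needed > dst_size)
        return -1;
    memcpy(dst, src, needed);          /* copies the NUL as well */
    return 0;
}

/* The compiler sizes this array: 12 visible characters + NUL = 13 bytes. */
char greeting[] = "Hello, world";

/* Exactly 12 characters and no NUL: a char array, not a string.
   Compare it with memcmp and an explicit length, never strcmp or strlen. */
char greeting_chars[12] = "Hello, world";
```

With `char greeting[ 12 ]`, `copy_if_fits(greeting, sizeof greeting, "Hello, world")` would return -1 instead of writing past the end.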

String literals in C are arrays of characters (type char), not arrays of constant characters (type const char). The
ANSI C committee could have redefined them to be arrays of const char, but millions of lines of code would
have screamed in terror and suddenly not compiled. The compiler won’t stop you from trying to modify the
contents of a string literal. You shouldn’t do it, though; the result is undefined behavior. A compiler can choose to put string literals in some
part of memory that can’t be modified, such as ROM or a region where the memory-mapping registers forbid
writes. Even if string literals are someplace where they could be modified, the compiler can make them shared.
For example, if you write

char *p = "message";
char *q = "message";
p[ 4 ] = '\0'; /* p now points to "mess" */

(and the literals are modifiable), the compiler can take one of two actions. It can create two separate string
constants, or it can create just one (that both p and q point to). Depending on what the compiler did, q might
still be a message, or it might just be a mess.
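The usual way to avoid the message/mess ambiguity has two parts. Point at literals only through const char *, and initialize an array when you need a string you can change. `copy_then_truncate` is a hypothetical function that demonstrates both. The array `p` receives its own copy of the characters, so truncating it cannot affect the literal that `q` points to.

```c
#include <string.h>

/* Returns 1 if truncating a private copy leaves the literal intact. */
int copy_then_truncate(void)
{
    const char *q = "message";   /* const: a write such as q[4] = '\0' won't compile */
    char p[] = "message";        /* an 8-byte array holding a private copy */

    p[4] = '\0';                 /* modifies only the copy */
    return strcmp(p, "mess") == 0 && strcmp(q, "message") == 0;
}
```

Compiling with GCC's or Clang's -Wwrite-strings option gives string literals the type const char[] in C. A declaration like char *p = "message"; then draws a warning, which catches the risky pattern early.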

NOTE

This is “C humor.” Now you know why so few programmers quit their day jobs for
stand-up comedy.
Cross Reference:
IX.1: Do array subscripts always start with zero?
