Saturday 12 November 2011

What standard functions are available to manipulate strings in C programming?

Short answer: the functions in <string.h>.
C doesn’t have a built-in string type. Instead, C programs use char arrays, terminated by the NUL (‘\0’)
character.

C programs (and C programmers) are responsible for ensuring that the arrays are big enough to hold all that
will be put in them. There are three approaches:
- Set aside a lot of room, assume that it will be big enough, and don’t worry what happens if it’s not
  big enough (efficient, but this method can cause big problems if there’s not enough room).
- Always allocate and reallocate the necessary amount of room (not too inefficient if done with realloc;
  this method can take lots of code and lots of runtime).
- Set aside what should be enough room, and stop before going beyond it (efficient and safe, but you
  might lose data).

NOTE
C++ is moving toward a fourth approach: leave it all behind and define a string type. For various reasons, that’s a lot easier to do in C++ than in C. Even in C++, it’s turning out to be rather involved. Luckily, after a standard C++ string type has been defined, even if it turns out to be hard to implement, it should be very easy for C++ programmers to use.

There are two sets of functions for C string programming. One set (strcpy, strcat, and so on) works with the first and second approaches. This set copies or uses as much as it’s asked to—and there had better be room for it all, or the program might be buggy. Those are the functions most C programmers use. The other set (strncpy, strncat, and so on) takes the third approach. This set needs to know how much room there is, and it never goes beyond that, ignoring everything that doesn’t fit.

The “n” (third) argument means different things to these two functions:

To strncpy, it means there is room for only “n” characters, including any NUL character at the end. strncpy writes exactly “n” characters. If the second argument doesn’t have that many, strncpy pads the rest with extra NUL characters. If the second argument has more characters than that, strncpy stops before it copies any NUL character. That means, when using strncpy, you should always put a NUL character at the end of the string yourself; don’t count on strncpy to do it for you.

To strncat, it means to append up to “n” characters from the second argument, followed by a NUL character, which strncat always adds. Because what you really know is how many characters the destination can store, you usually need to use strlen to calculate how many characters you can copy, leaving one position for that NUL.

The difference between strncpy and strncat is “historical.” (That’s a technical term meaning “It made sense to somebody, once, and it might be the right way to do things, but it’s not obvious why right now.”) Listing XII.5a shows a short program that uses strncpy and strncat.
 
NOTE
Get to know the “string-n” functions. Using them is harder but leads to more robust, less buggy software. If you’re feeling brave, try rewriting the program in Listing XII.5a with strcpy and strcat, and run it with big enough arguments that the buffer overflows. What happens? Does your computer hang? Do you get a General Protection Exception or a core dump? See FAQ VII.24 for a discussion.

Listing XII.5a. An example of the “string-n” functions.

#include <stdio.h>
#include <string.h>

/*
   Normally, a constant like MAXBUF would be very large, to
   help ensure that the buffer doesn't overflow. Here, it's very
   small, to show how the "string-n" functions prevent it from
   ever overflowing.
*/
#define MAXBUF 16

int
main(int argc, char** argv)
{
    char buf[MAXBUF];
    int i;

    /* strncpy won't add a NUL if argv[0] fills the buffer,
       so put one in the last position first. */
    buf[MAXBUF - 1] = '\0';
    strncpy(buf, argv[0], MAXBUF - 1);
    for (i = 1; i < argc; ++i) {
        /* Append only as much as still fits before the final NUL. */
        strncat(buf, " ",
                MAXBUF - 1 - strlen(buf));
        strncat(buf, argv[i],
                MAXBUF - 1 - strlen(buf));
    }
    puts(buf);
    return 0;
}

NOTE
Many of the string functions take at least two string arguments. It’s convenient to refer to them as “the left argument” and “the right argument,” rather than “the first argument” and “the second argument,” for describing which one is which.

strcpy and strncpy copy a string from one array to another. The value on the right is copied to the value
on the left; think of the order as being the same as that for assignment.

strcat and strncat “concatenate” one string onto the end of another. For example, if a1 is an array that holds
“dog” and a2 is an array that holds “wood”, after calling strcat(a1, a2), a1 would hold “dogwood”.
strcmp and strncmp compare two strings. The return value is negative if the left argument is less than the
right, zero if they’re the same, and positive if the left argument is greater than the right. There are two
common idioms for equality and inequality:
    if (strcmp(s1, s2)) {
        /* s1 != s2 */
    }
and
    if (! strcmp(s1, s2)) {
        /* s1 == s2 */
    }

This code is not incredibly readable, perhaps, but it’s perfectly valid C code and quite common; learn to
recognize it. If you need to take into account the current locale when comparing strings, use strcoll.

A number of functions search in a string. (In all cases, it’s the “left” or first argument being searched in.) strchr and strrchr look for (respectively) the first and last occurrence of a character in a string. (memchr is the closest thing to an “n” equivalent of strchr; memrchr, the counterpart for strrchr, is an extension found in some libraries, such as glibc, rather than standard C.) strspn, strcspn (the “c” stands for “complement”), and strpbrk look for substrings consisting of certain characters or separated by certain
characters:
    n = strspn("Iowa", "AEIOUaeiou");
    /* n = 2; "Iowa" starts with 2 vowels */
    n = strcspn("Hello world", " \t");
    /* n = 5; white space after 5 characters */
    p = strpbrk("Hello world", " \t");
    /* p points to blank */
strstr looks for one string in another:
    p = strstr("Hello world", "or");
    /* p points to the second "o" */

strtok breaks a string into tokens, which are separated by characters given in the second argument. strtok is “destructive”; it sticks NUL characters in the original string. (If the original string shouldn’t be changed, it should be copied, and the copy should be passed to strtok.) Also, strtok is not “reentrant”; it can’t be called from a signal-handling function, because it “remembers” some of its arguments between calls. strtok is an odd function, but very useful for pulling apart data separated by commas or white space. Listing XII.5b shows a simple program that uses strtok to break up the words in a sentence.

Listing XII.5b. An example of using strtok.
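#include <stdio.h>
#include <string.h>

static char buf[] = "Now is the time for all good men ...";

int
main(void)
{
    char* p;

    /* The first call names the string; later calls pass NULL
       to continue where the previous call left off. */
    p = strtok(buf, " ");
    while (p) {
        printf("%s\n", p);
        p = strtok(NULL, " ");
    }
    return 0;
}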

Cross Reference:

IV.18: How can I read and write comma-delimited text?
Chapter VI: Working with Strings
VII.23: What is the difference between NULL and NUL?
IX.9: What is the difference between a string and an array?
XII.8: What is a “locale”?
XII.10: What’s a signal? What do I use signals for?
