Sunday 13 November 2011

How can I manipulate strings of multibyte characters in C programming?

How can I manipulate strings of multibyte characters?

Better than you might think.

Say your program sometimes deals with English text (which fits comfortably into 8-bit chars with a bit to spare) and sometimes Japanese text (which needs at least 16 bits to cover all the possibilities). If you use the same code to manipulate either language’s text, will you need to set aside 16 bits for every character, even in your English text? Maybe not. Some (but not all) multibyte encodings record, in the bytes themselves, whether a character needs more than one byte.
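To make the idea concrete, here is a minimal sketch that assumes the text is encoded as UTF-8. That is an assumption for illustration only; the encoding is not named above, and your platform may use a different one. In UTF-8 the first byte of each character says how many bytes the character occupies. The helper names utf8_seq_len and utf8_count are hypothetical, and the check is deliberately simple (it does not reject overlong forms).

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with `lead`,
   or 0 if `lead` cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII   */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx                */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx                */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx                */
    return 0;                             /* continuation or invalid */
}

/* Count characters (not bytes) in a UTF-8 string.
   Returns (size_t)-1 if the string is malformed or truncated. */
size_t utf8_count(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        size_t len = utf8_seq_len((unsigned char)*s);
        if (len == 0)
            return (size_t)-1;

        /* Every following byte of the sequence must look like 10xxxxxx.
           A terminating '\0' fails this test, so we never read past it. */
        for (size_t i = 1; i < len; i++) {
            if (((unsigned char)s[i] & 0xC0) != 0x80)
                return (size_t)-1;
        }

        s += len;
        count++;
    }
    return count;
}
```

With this encoding, English text still costs one byte per character, while each Japanese character typically costs three.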

mbstowcs (“multibyte string to wide character string”) and wcstombs (“wide character string to multibyte string”) convert between arrays of wchar_t (in which every character takes the same fixed amount of space, commonly 16 or 32 bits depending on the implementation) and multibyte strings (in which individual characters are stored in one byte if possible). The multibyte encoding these functions use depends on your implementation and the current locale, so there’s no guarantee that multibyte strings will be stored compactly. (There’s no single agreed-upon way of doing this.) If your compiler can help you with multibyte strings, mbstowcs and wcstombs are the functions it provides for that.
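A rough sketch of a round trip through these two functions might look like the following. The wrapper names mb_to_wide and wide_to_mb are hypothetical, chosen only for this example. The buffer sizes rely on two facts: a multibyte string never yields more wide characters than it has bytes, and a wide character never needs more than MB_CUR_MAX bytes.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a multibyte string to a newly allocated wide string.
   Returns NULL if the input is not valid in the current locale
   or if memory runs out. The caller frees the result. */
wchar_t *mb_to_wide(const char *mb)
{
    /* Wide characters never outnumber the bytes they came from. */
    size_t cap = strlen(mb) + 1;
    wchar_t *wide = malloc(cap * sizeof *wide);
    if (wide == NULL)
        return NULL;

    if (mbstowcs(wide, mb, cap) == (size_t)-1) {
        free(wide);                 /* invalid multibyte sequence */
        return NULL;
    }
    return wide;
}

/* Convert a wide string to a newly allocated multibyte string.
   Returns NULL if some character has no multibyte form in the
   current locale or if memory runs out. The caller frees the result. */
char *wide_to_mb(const wchar_t *wide)
{
    /* Each wide character needs at most MB_CUR_MAX bytes. */
    size_t cap = wcslen(wide) * MB_CUR_MAX + 1;
    char *mb = malloc(cap);
    if (mb == NULL)
        return NULL;

    if (wcstombs(mb, wide, cap) == (size_t)-1) {
        free(mb);                   /* unrepresentable character */
        return NULL;
    }
    return mb;
}
```

In the default "C" locale only plain ASCII text is sure to convert. To handle other text, a program would normally call setlocale(LC_ALL, "") from <locale.h> before converting, and what happens after that depends on the locale the user has configured.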

Cross Reference:

XII.14: What are multibyte characters?
