Wednesday, 16 November 2011

What’s the difference between big-endian and little-endian machines? in C programming

The difference between big-endian and little-endian is in which end of a word has the most significant byte. Looked at another way, it’s a difference of whether you like to count from left to right, or right to left. Neither method is better than the other. A portable C program needs to be able to handle both kinds of machines.

Say that your program is running on a machine on which a short is two bytes long, and you’re storing the value 258 (decimal) in a short value at address 0x3000. Because the value is two bytes long, one byte will be stored at 0x3000, and one will be stored at 0x3001. The value 258 (decimal) is 0x0102, so one byte will be 1, and one will be 2. Which byte is which?

That answer varies from machine to machine. On a big-endian machine, the most significant byte is the one with the lower address. (The “most significant byte” or “high-order byte” is the one that will make the biggest change if you add something to it. For example, in the value 0x0102, 0x01 is the most significant byte, and 0x02 is the least significant byte.) On a big-endian machine, the bytes are stored as shown here:

address   0x2FFE   0x2FFF   0x3000   0x3001   0x3002   0x3003
value                       0x01     0x02

That makes sense; addresses are like numbers on a ruler, with the smaller addresses on the left and the larger addresses on the right.

On a little-endian machine, however, the bytes are stored as shown here:

address   0x3003   0x3002   0x3001   0x3000   0x2FFF   0x2FFE
value                       0x01     0x02

That makes sense, too. The smaller (in the sense of less significant) part is at the lower address. Bad news: some machines store the bytes one way; some, the other. For example, an IBM compatible handles the bytes differently than a Macintosh.
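
One way to see which layout a particular machine uses is to copy the bytes of a two-byte value out in address order and look at them. The following is only a minimal sketch: it assumes a C99 compiler that provides uint16_t in <stdint.h>, and the helper name bytes_in_memory is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the two bytes of v into out in address
   order, so out[0] is the byte stored at the lower address. */
void bytes_in_memory(uint16_t v, unsigned char out[2])
{
    assert(out != NULL);
    memcpy(out, &v, sizeof v);   /* copies the bytes exactly as they sit in memory */
}
```

After calling bytes_in_memory(0x0102, b), a big-endian machine leaves 0x01 in b[0] and 0x02 in b[1]; a little-endian machine leaves them the other way around.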

Why does that difference matter? What happens if you use fwrite to store a short directly, as two bytes, into a file or over a network, not formatted and readable but compact and binary? If a big-endian machine stores it and a little-endian machine reads it (or vice versa), what goes in as 0x0102 (258) comes out as 0x0201 (513). Oops.
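
The mix-up can be reproduced on a single machine by interpreting the same two bytes both ways. This is a sketch under assumptions: bytes are 8 bits wide, uint16_t is available from <stdint.h>, and both function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interprets b the way a big-endian machine would:
   the byte at the lower address is the most significant. */
uint16_t read_as_big_endian(const unsigned char b[2])
{
    assert(b != NULL);
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Interprets b the way a little-endian machine would:
   the byte at the lower address is the least significant. */
uint16_t read_as_little_endian(const unsigned char b[2])
{
    assert(b != NULL);
    return (uint16_t)((b[1] << 8) | b[0]);
}
```

Given the bytes 0x01, 0x02 in that order, read_as_big_endian yields 258 and read_as_little_endian yields 513, which is exactly the discrepancy described above.
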
The solution is, instead of storing shorts and ints the way they’re stored in memory, to pick one method of storing (and loading) them and stick to it. Several standards specify “network byte order,” which is big-endian (most significant byte at the lower address). For example, if s is a short and a is an array of two chars, then the code
a[0] = (s >> 8) & 0xff;   /* most significant byte first */
a[1] = s & 0xff;          /* least significant byte second */
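stores the value of s in the two bytes of a, in network byte order. This happens whether the program is running on a little-endian machine or on a big-endian machine, because the shift and mask operate on the value of s, not on the way its bytes are laid out in memory.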
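
Loading is the mirror image of storing. A possible pair of helpers is sketched here, assuming 8-bit bytes and uint16_t from <stdint.h>; the names store_u16_be and load_u16_be are hypothetical and do not come from any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores s into a[0..1] in network byte order (most significant byte first). */
void store_u16_be(uint16_t s, unsigned char a[2])
{
    assert(a != NULL);
    a[0] = (unsigned char)((s >> 8) & 0xff);
    a[1] = (unsigned char)(s & 0xff);
}

/* Rebuilds the value from two bytes stored in network byte order. */
uint16_t load_u16_be(const unsigned char a[2])
{
    assert(a != NULL);
    return (uint16_t)(((unsigned)a[0] << 8) | a[1]);
}
```

Storing 258 with store_u16_be puts 0x01 in a[0] and 0x02 in a[1] on any machine, and load_u16_be on those bytes gives 258 back. Using unsigned types sidesteps questions about shifting negative values.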

You’ll notice I haven’t mentioned which machines are big-endian and which are little-endian. That’s deliberate. If portability is important, you should write code that works either way. If efficiency is important, you usually should still write code that works either way. For example, there’s a faster way to implement the preceding code fragment on big-endian machines, where the bytes are already in the right order. However, a good compiler will often generate machine code that takes advantage of that implementation, even for the portable C code it’s given.

NOTE
The names “big-endian” and “little-endian” come from Gulliver’s Travels by Jonathan Swift. On his first voyage, to Lilliput, Gulliver meets people who can’t agree on which end of an egg to break open: the big end or the little end.

“Network byte order” applies only to int, short, and long values. char values are, by definition, only one byte long, so there’s no issue with them. There’s no standard way to store float or double values.

Cross Reference:

X.5: What is meant by high-order and low-order bytes?
X.6: How are 16- and 32-bit numbers stored?
