Information About Byte Order
EXPLANATION

When a sequence of small units is used to form a larger ordinal value, a convention must establish the order in which those smaller units are placed. The situation resembles that of written languages: some (such as English and French) are written left to right, while others (such as Arabic and Hebrew) are written right to left. Decimal numbers written in digits are big-endian: they start at the left with the highest order of magnitude and progress to smaller orders of magnitude on the right. For example, the number 1234 starts with the thousands (in this case, one thousand) and continues through the hundreds (2) and tens (3) to the units (4).

ENDIANNESS IN COMPUTERS

There seem to be no significant advantages in using one method of endianness over the other, and both have remained common in terms of the number of different architectures that use them. However, because little-endian Intel x86-based processors (and their clones) are used in most personal computers and laptops, the vast majority of desktop computers in the world today are little-endian.

Generally the byte (octet) is considered an atomic unit from the point of view of storage at all but the lowest levels of network protocols and storage formats. Therefore sequences based on single bytes (e.g. text in ASCII or one of the ISO-8859-n encodings) are not generally affected by endian issues. Variable-width text encodings that use the byte as their base unit could be considered to have an inbuilt endianness, but in all commonly used text encodings it is fixed by the encoding's design. However, Unicode strings encoded with UTF-16 or UTF-32 are affected by endianness, because each code unit must be further represented as two or four bytes.
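The effect on UTF-16 can be sketched in C as follows. This is a minimal illustration, not a full encoder: the function names put_u16_be and put_u16_le are hypothetical, and only a single 16-bit code unit is handled (no surrogate pairs).

```c
#include <stdint.h>

/* Hypothetical helpers: write one 16-bit code unit as two bytes. */

/* Big-endian: most significant byte first. */
void put_u16_be(uint8_t out[2], uint16_t unit)
{
    out[0] = (uint8_t)(unit >> 8);
    out[1] = (uint8_t)(unit & 0xFF);
}

/* Little-endian: least significant byte first. */
void put_u16_le(uint8_t out[2], uint16_t unit)
{
    out[0] = (uint8_t)(unit & 0xFF);
    out[1] = (uint8_t)(unit >> 8);
}
```

For the code unit 0x20AC (the euro sign), the first function produces the bytes 20 AC and the second produces AC 20. An ASCII character, by contrast, is a single byte and has only one possible layout.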
Logical and arithmetical description

When some computers store a 32-bit integer value in memory, for example the hexadecimal value 4A3B2C1D at address 100, they store the bytes within the address range 100 through 103 in the following order:

Big-endian

    Address:  100  101  102  103
    Value:     4A   3B   2C   1D

That is, the most significant byte (also known as the MSB, which is 4A in our example) is stored at the memory location with the lowest address, the next byte in significance, 3B, is stored at the next memory location, and so on. Architectures that follow this rule are called big-endian and include SPARC and System/370.

Other computers store the value 4A3B2C1D in the following order:

Little-endian

    Address:  100  101  102  103
    Value:     1D   2C   3B   4A

That is, the least significant byte (1D in our example) is stored at the memory location with the lowest address. Architectures that follow this rule are called little-endian and include the DEC VAX and, most notably, the Intel x86-based series of processors, including Intel Pentium-based personal computers and laptops.

In other words, endianness does not denote what the value ends with when stored in memory, but rather which end it begins with. Note that the stated mnemonics are not the origin of the terms (see below).

Some architectures can be configured either way; these include ARM, PowerPC (but not the PPC970/G5), DEC Alpha, MIPS, PA-RISC and IA64. The word bytesexual or bi-endian, said of hardware, denotes willingness to compute or pass data in either big-endian or little-endian format (depending, presumably, on a mode bit somewhere). Many of these architectures can be switched via software to default to a specific endian format (usually done when the computer starts up); however, on some architectures the default endianness is selected by hardware on the motherboard and cannot be changed by software (e.g., the DEC Alpha, which runs only in big-endian mode on the Cray T3E).

Middle-endian

Still other architectures, called middle-endian (or sometimes mixed-endian), may have a more complicated ordering such that bytes within a 16-bit unit are ordered differently from the 16-bit units within a 32-bit word.
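The big-endian and little-endian layouts described above can be sketched in C using shifts, which are defined in terms of numeric significance and so behave the same on any host. The function names store_u32_be and store_u32_le are hypothetical, and mem[0] stands in for the lowest address (100 in the running example).

```c
#include <stdint.h>

/* Hypothetical helpers: lay out a 32-bit value in a 4-byte buffer. */

/* Big-endian: most significant byte at the lowest address. */
void store_u32_be(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
}

/* Little-endian: least significant byte at the lowest address. */
void store_u32_le(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)value;
    mem[1] = (uint8_t)(value >> 8);
    mem[2] = (uint8_t)(value >> 16);
    mem[3] = (uint8_t)(value >> 24);
}
```

Calling store_u32_be with 0x4A3B2C1D fills the buffer with 4A 3B 2C 1D, while store_u32_le fills it with 1D 2C 3B 4A.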
For instance, a middle-endian architecture may store 4A3B2C1D as:

    Address:  100  101  102  103
    Value:     3B   4A   1D   2C

or alternatively:

    Address:  100  101  102  103
    Value:     2C   1D   4A   3B

Middle-endian architectures include the PDP-11 family of processors. (The term pdp-endian is still sometimes used to refer specifically to the PDP-11's endianness.) The formats for double-precision floating-point numbers on the VAX and ARM are also middle-endian. In general, these complex orderings are more confusing to work with than consistent big or little endianness.

The concept of endianness is less important in the numbering of bits within a byte, as computer architectures in general do not support the addressing of individual bits within bytes. Sub-byte addressing is instead accomplished with arithmetic and logical instructions, which are well-defined in terms of the significance of the bits rather than an arbitrary numbering in an address space, and are therefore architecture-neutral. Issues similar to those of byte-endianness can still apply when bit position is interpreted as something other than binary significance, for example in bitmapped graphics formats, but these issues are not intrinsically related to the architecture being used. A C program cannot be written to test the bit order of an architecture.

That said, the byte-endianness of an architecture may lead a programmer to make assumptions about bit order which are inconsistent with the data formats they are operating on. Furthermore, if a byte is identified incorrectly within a word, then any bits within that byte will also be incorrectly identified, regardless of the system used to address them.

The following C function checks whether a system is big-endian or little-endian (it assumes that int is larger than char and will not determine whether a system is middle-endian):
    enum { LITTLE_ENDIAN, BIG_ENDIAN };  // result codes

    int machine_endianness (void)  // function name is illustrative
    {
        int i = 1;
        char *p = (char *) &i;

        if (p[0] == 1) // Lowest address contains the least significant byte
            return LITTLE_ENDIAN;
        else
            return BIG_ENDIAN;
    }

Portability issues

Endianness has serious implications for software portability. For example, when data stored in a binary format is interpreted with a bitmask, the endianness matters because a different byte order leads to a different result from the mask.

Writing binary data from software to a common format also raises the question of the proper endianness. For example, the BMP bitmap format requires little-endian integers; if the data are stored as big-endian integers, they will be corrupted because they do not match the format.

Software that needs to share information between hosts of different endianness typically uses one of two strategies. Either it can choose a single endianness for sharing data, or it can allow hosts to share data in any endianness they choose, so long as they mark which one they are using. Both approaches have advantages. Choosing a single endianness makes decoding easier, since software only needs to decode one format. Allowing multiple endiannesses makes encoding easier, since software does not need to convert data out of its native order, and it also enables more efficient communication when the encoder and decoder share a single endianness, since neither needs to change the byte order.

Most Internet standards take the first approach and specify big-endian byte order. Many vendor-originated formats simply use the byte order of the platform they originated on. Some other applications, notably X11, take the second approach.

UTF-16 can be written in big-endian or little-endian order. It permits a 2-byte byte order mark (BOM) at the beginning of a string to denote its endianness. A similar 4-byte byte order mark can be used with the rare encoding UTF-32.
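The second strategy, in which data carries a marker of its own byte order, can be sketched for UTF-16 as follows. This is an illustrative sketch under stated assumptions: the enum and function names are hypothetical, the BOM is the code point U+FEFF, and no attempt is made to distinguish UTF-16 from UTF-32 (whose little-endian BOM begins with the same two bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum utf16_order { UTF16_UNKNOWN, UTF16_BE, UTF16_LE };

/* Inspect the first two bytes of a buffer for a byte order mark (U+FEFF). */
enum utf16_order utf16_detect_bom(const uint8_t *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return UTF16_BE;   /* FE FF: big-endian */
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return UTF16_LE;   /* FF FE: little-endian */
    }
    return UTF16_UNKNOWN;      /* no BOM present */
}

/* Read one 16-bit code unit from two bytes; anything other than
   UTF16_LE is treated as big-endian. */
uint16_t utf16_read_unit(const uint8_t *p, enum utf16_order order)
{
    if (order == UTF16_LE)
        return (uint16_t)(p[0] | (p[1] << 8));
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

A decoder would call utf16_detect_bom once on the start of the string and then pass the result to utf16_read_unit for each following pair of bytes. What to do when no BOM is found is a policy decision that this sketch leaves to the caller.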
Example programming caveat

Below is an example application, written in C, which demonstrates the dangers of programming without regard to endianness:

    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
        FILE *fp;

        struct {
            char one[4];    // room for "foo" plus the terminating NUL
            int two;
            char three[4];  // room for "bar" plus the terminating NUL
        } data;

        strcpy (data.one, "foo");
        data.two = 0x01234567;
        strcpy (data.three, "bar");

        // The struct is written exactly as it lies in memory,
        // so the bytes of data.two appear in the host's byte order.
        fp = fopen ("output", "wb");
        if (fp) {
            fwrite (&data, sizeof (data), 1, fp);
            fclose (fp);
        }
        return 0;
    }

This code compiles properly on an i386 machine running FreeBSD and on a SPARC64 machine running Solaris, but the output is different when examined with the hexdump utility.

i386

    $ hexdump -C output