ASCII (American Standard Code for Information Interchange) is a character encoding based on the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that work with text. Most modern character encodings have a historical basis in ASCII.
ASCII was first published as a standard in 1963, revised substantially in 1967, and last updated in 1986. It currently defines codes for 33 non-printing, mostly obsolete control characters that affect how text is processed, plus the following 95 printable characters (starting with the space character):
 !"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~
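The split between printable and non-printing codes can be checked with a few lines of Python. This is a minimal sketch; the names `printable` and `control` are illustrative, and it assumes "printable" means codes 0x20 (space) through 0x7E (tilde).

```python
# Partition the 128 ASCII codes into printable and non-printing sets.
# Assumption: "printable" means codes 0x20 (space) through 0x7E (tilde).
printable = [chr(code) for code in range(0x20, 0x7F)]
control = [code for code in range(128) if not 0x20 <= code <= 0x7E]

print(len(printable))      # 95
print(len(control))        # 33: codes 0-31 plus 127 (DEL)
print("".join(printable))  # the printable characters in code order
```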
Structural features
- The digits 0-9 are represented by their binary values prefixed with 0011, so converting BCD to ASCII is simply a matter of taking each BCD nibble separately and prefixing it with 0011.
- Lowercase and uppercase letters differ in bit pattern by only a single bit, which reduces case conversion to a range test (to avoid converting characters that are not letters) and a single bitwise operation.
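Both structural features can be sketched in Python. The function names `bcd_to_ascii` and `to_upper` and the constant `CASE_BIT` are illustrative, and the BCD input is assumed to be packed two digits per byte.

```python
CASE_BIT = 0x20  # the single bit that differs between 'A' (0x41) and 'a' (0x61)


def bcd_to_ascii(packed: bytes) -> str:
    """Convert packed BCD (assumed: two decimal digits per byte) to ASCII digits."""
    digits = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble > 9:
                raise ValueError(f"invalid BCD nibble: {nibble}")
            digits.append(chr(0x30 | nibble))  # prefix the nibble with 0011
    return "".join(digits)


def to_upper(ch: str) -> str:
    """Upper-case one ASCII letter; leave every other character untouched."""
    if "a" <= ch <= "z":                 # range test: only letters are converted
        return chr(ord(ch) & ~CASE_BIT)  # single bitwise operation clears the case bit
    return ch


print(bcd_to_ascii(bytes([0x19, 0x67])))       # 1967
print("".join(to_upper(c) for c in "ascii{"))  # ASCII{
```

Without the range test, clearing the same bit in `{` (0x7B) would turn it into `[` (0x5B), which is why the test is needed.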
Aliases for ASCII
RFC 1345 (published in June 1992) and the IANA registry of character sets (ongoing) both recognize the following case-insensitive aliases for ASCII as suitable for use on the Internet:
- ANSI_X3.4-1968 (canonical name)
- ANSI_X3.4-1986
- ASCII
- US-ASCII (preferred MIME name)
- us
- ISO646-US
- ISO_646.irv:1991
- iso-ir-6
- IBM367
- cp367
- csASCII
Of these, only the aliases "US-ASCII" and "ASCII" have achieved widespread use. One often finds them in the optional "charset" parameter in the Content-Type header of some MIME messages, in the equivalent "meta" element of some HTML documents, and in the encoding declaration part of the prolog of some XML documents.
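As an illustration of case-insensitive alias handling, the sketch below asks Python's standard `codecs` registry to resolve each name listed above. Python is used here only as a convenient lookup; which aliases it recognizes depends on the Python version and is not defined by RFC 1345 or IANA. The helper name `resolve` is illustrative.

```python
import codecs

ALIASES = [
    "ANSI_X3.4-1968", "ANSI_X3.4-1986", "ASCII", "US-ASCII", "us",
    "ISO646-US", "ISO_646.irv:1991", "iso-ir-6", "IBM367", "cp367", "csASCII",
]


def resolve(alias):
    """Return the codec name Python maps the alias to, or None if it is unknown."""
    try:
        return codecs.lookup(alias).name
    except LookupError:
        return None


for alias in ALIASES:
    print(f"{alias!r:22} -> {resolve(alias)}")
```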
Variants of ASCII
As computer technology spread throughout the world, different standards bodies and corporations developed many variations of ASCII in order to facilitate the expression of non-English languages that used Roman-based alphabets. One could class some of these variations as "ASCII extensions", although some misapply that term to cover all variants, including those that do not preserve ASCII's character map in the 7-bit range.
ISO 646 (1972), the first attempt to remedy the pro-English-language bias, created compatibility problems, since it remained a 7-bit character-set. It made no additional codes available, so it reassigned some in language-specific variants. It thus became impossible to know what character a code represented without knowing which variant to work with, and text-processing systems could generally cope with only one variant anyway.
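The ambiguity can be made concrete with a small sketch. The tables below are partial and purely illustrative: they list only a few of the positions at which the British and German national variants differ from the US version, and the names `VARIANT_OVERRIDES` and `decode_iso646` are hypothetical.

```python
# A few reassigned positions per variant; every other code is taken from US-ASCII.
VARIANT_OVERRIDES = {
    "US": {},
    "GB": {0x23: "£"},
    "DE": {0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
           0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß"},
}


def decode_iso646(data: bytes, variant: str) -> str:
    """Decode 7-bit data under the named variant (illustrative tables only)."""
    overrides = VARIANT_OVERRIDES[variant]
    chars = []
    for code in data:
        if code > 0x7F:
            raise ValueError(f"not a 7-bit code: {code:#x}")
        chars.append(overrides.get(code, chr(code)))
    return "".join(chars)


data = b"[#]"
for variant in VARIANT_OVERRIDES:
    print(variant, decode_iso646(data, variant))
```

The same three bytes come out as three different strings, and nothing in the data itself says which reading was intended.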
Eventually, improved technology brought out-of-band means to represent the information formerly encoded in the eighth bit of each byte, freeing this bit to provide 128 additional character codes for new assignments. For example, IBM developed 8-bit code pages, such as Code Page 437, which replaced the control characters with graphic symbols such as smiley faces and mapped additional graphic characters to the upper 128 positions. Operating systems such as DOS supported these code pages, and manufacturers of IBM PCs supported them in hardware.
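The upper 128 positions of Code Page 437 can be inspected with Python's built-in `cp437` codec, as in the sketch below. One caveat: that codec decodes codes below 0x20 as ordinary control characters, so it does not reproduce the smiley-face glyphs that the display hardware showed for those codes.

```python
# Bytes above 0x7F select box-drawing graphics in Code Page 437.
box_top = bytes([0xC9, 0xCD, 0xCD, 0xBB])
print(box_top.decode("cp437"))  # ╔══╗

# The printable 7-bit range is unchanged from ASCII.
print(b"DOS".decode("cp437"))   # DOS

# Python's codec keeps code 0x01 as a control character rather than a smiley glyph.
print(repr(bytes([0x01]).decode("cp437")))  # '\x01'
```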
Eight-bit standards such as ISO/IEC 8859 and Mac OS Roman developed as true extensions of ASCII, leaving the original character mapping intact and just adding additional values above the 7-bit range. This enabled the representation of a broader range of languages, but these standards continued to suffer from incompatibilities and limitations. Still, ISO-8859-1, its variant Windows-1252 (often mislabeled as ISO-8859-1 even by Microsoft software) and original 7-bit ASCII remain the most common character encodings in use today.
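A short sketch of why the mislabeling matters, using Python's built-in `latin-1` and `cp1252` codecs: the two encodings agree on the 7-bit range but disagree on some bytes above it. The variable names are illustrative.

```python
# 0x93 and 0x94 are curly quotation marks in Windows-1252,
# but C1 control characters in ISO-8859-1.
raw = bytes([0x93]) + b"ASCII" + bytes([0x94])

as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")

print(as_cp1252)        # “ASCII”
print(repr(as_latin1))  # '\x93ASCII\x94'

# Both are true extensions of ASCII: the lower 128 codes decode identically.
seven_bit = bytes(range(128))
print(seven_bit.decode("cp1252") == seven_bit.decode("latin-1") == seven_bit.decode("ascii"))  # True
```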
The UTF-8 encoding form prescribes the use of one to four 8-bit code values for each code point, and equates exactly to ASCII for the code values below 128. Other encoding forms such as UTF-16 resemble ASCII in how they represent the first 128 characters of Unicode, but tend to use 16 or 32 bits per character, so they require conversion for compatibility.
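The difference can be seen directly with Python's built-in codecs. This is a minimal sketch; the sample characters are arbitrary choices.

```python
text = "ASCII"

# For code values below 128, UTF-8 output is byte-for-byte identical to ASCII.
print(text.encode("utf-8") == text.encode("ascii"))  # True

# UTF-16 keeps the same code values but spends 16 bits on each character.
print(text.encode("utf-16-le"))  # b'A\x00S\x00C\x00I\x00I\x00'

# UTF-8 uses one to four 8-bit code values per code point.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)  # [1, 2, 3, 4]
```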
The abbreviation ASCIIZ or ASCIZ refers to a null-terminated ASCII string.
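A null-terminated string stores its characters followed by a single zero byte instead of a separate length. A minimal sketch of writing and reading that layout follows; the function names `to_asciiz` and `read_asciiz` are illustrative.

```python
def to_asciiz(text: str) -> bytes:
    """Encode text as ASCII and append the terminating NUL byte."""
    return text.encode("ascii") + b"\x00"


def read_asciiz(buffer: bytes, offset: int = 0) -> str:
    """Read an ASCII string starting at offset and ending before the first NUL."""
    end = buffer.index(0, offset)  # raises ValueError if no terminator is found
    return buffer[offset:end].decode("ascii")


buffer = to_asciiz("HELLO") + b"leftover bytes"
print(buffer[:6])           # b'HELLO\x00'
print(read_asciiz(buffer))  # HELLO
```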
Trivia
Asteroid 3568 ASCII is named after the character encoding.
See also
(where all ASCII printable characters are identical to ASCII)
(where some ASCII printable characters have been replaced)