Basic Multilingual Plane




Unicode's Universal Character Set potentially supports over 1 million code points (1,114,112 = 2^20 + 2^16, ''or'' 17 × 2^16; hexadecimal 110000).
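The arithmetic behind this figure can be checked directly. The following is a small sketch in Python; the constant names are chosen for illustration and are not taken from the standard.

```python
# Size of one plane and the number of planes in the Unicode code space.
PLANE_SIZE = 2 ** 16      # 65,536 code points per plane
PLANE_COUNT = 17          # planes 0 through 16

total = PLANE_COUNT * PLANE_SIZE

# The same total, expressed in the ways given in the text.
assert total == 2 ** 20 + 2 ** 16
assert total == 0x110000 == 1_114_112

# The highest code point is one less than the total.
MAX_CODE_POINT = total - 1
print(hex(MAX_CODE_POINT))  # 0x10ffff
```

Code points are numbered from zero, so the last valid code point is 10FFFF rather than 110000.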

As of Unicode 5.0.0, 102,012 (9.2%) of these code points are assigned, with another 137,468 (12.3%) reserved for private use, 2,048 for surrogates, and 66 designated noncharacters, leaving 872,582 (78.3%) unassigned. The number of assigned code points is made up as follows:

(See the Summary Table for a more detailed breakdown).

Unicode characters can be categorized in many ways. Every character is assigned a ''script'' (though many are assigned the common or inherited scripts, in which case they take on the script of the adjacent character). In Unicode, a script is a coherent writing system that includes letters but may also include script-specific punctuation, diacritics and other marks, numerals, and symbols. A single script supports one or more languages.

Characters are assigned in ''blocks''. These blocks are usually groups of code points in some multiple of eight: many, for example, are grouped in blocks of 128 or 256 code points. Every character is also assigned a ''general category'' and subcategory. The general categories are: letter, mark, number, punctuation, symbol, or control (in other words, a formatting or non-graphical character).
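As an illustration of general categories, Python's standard `unicodedata` module reports a two-letter code for each character, where the first letter is the major category and the second the subcategory. The sketch below is only an example: the `MAJOR_CLASS` labels and the `describe` helper are names invented here, and the module also reports a separator class ("Z") and groups control and format characters under "C".

```python
import unicodedata

# First letter of the two-letter category code -> major class label.
# The labels are chosen for this sketch, not defined by the module.
MAJOR_CLASS = {
    "L": "letter",
    "M": "mark",
    "N": "number",
    "P": "punctuation",
    "S": "symbol",
    "Z": "separator",
    "C": "other (control, format, unassigned, ...)",
}

def describe(ch):
    """Return (two-letter category code, major class label) for one character."""
    code = unicodedata.category(ch)
    return code, MAJOR_CLASS[code[0]]

# A letter, a combining mark, a digit, punctuation, a symbol, a control.
for ch in ["A", "\u0301", "5", "!", "+", "\n"]:
    code, label = describe(ch)
    print(f"U+{ord(ch):04X} {code} {label}")
```

For example, `describe("A")` returns `("Lu", "letter")`: an uppercase letter.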

The blocks of characters are assigned according to various ''planes''. Most characters are currently assigned to the first plane: the ''Basic Multilingual Plane''. This is to help ease the transition for legacy software, since the Basic Multilingual Plane is addressable with just two octets (bytes). The characters outside the first plane are usually very specialized or seldom used.
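The two-octet property can be observed with UTF-16, where a code point in the first plane occupies one 16-bit unit and a code point in any other plane needs a pair of surrogates. The following is a minimal sketch using Python's built-in `utf-16-be` codec; the helper name `utf16_length` is hypothetical.

```python
def utf16_length(ch):
    """Number of bytes needed to encode one character in UTF-16 (no BOM)."""
    return len(ch.encode("utf-16-be"))

bmp_char = "\u4e2d"        # U+4E2D, a CJK ideograph in the first plane
smp_char = "\U00010000"    # U+10000, the first code point of plane 1

print(utf16_length(bmp_char))             # 2
print(utf16_length(smp_char))             # 4
print(smp_char.encode("utf-16-be").hex()) # d800dc00 (a surrogate pair)
```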

The first 256 code points correspond with those of ISO 8859-1, the most popular 8-bit character encoding in the Western world. As a result, the first 128 characters are also identical to ASCII. Though Unicode refers to these as a Latin script block, these two blocks contain many characters that are commonly useful outside of the Latin script.
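This correspondence can be checked with a short sketch, assuming Python's built-in `latin-1` and `ascii` codecs as stand-ins for ISO 8859-1 and ASCII. The variable names are illustrative.

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding as Latin-1 maps byte value n to code point n.
latin1_text = raw.decode("latin-1")
assert [ord(c) for c in latin1_text] == list(range(256))

# The lower half decodes identically as ASCII.
ascii_text = raw[:128].decode("ascii")
assert ascii_text == latin1_text[:128]

# U+00E9 (e with acute accent) is the single byte 0xE9 in Latin-1.
print("\u00e9".encode("latin-1"))  # b'\xe9'
```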


PLANES


Unicode code points are logically divided into 17 ''planes'', each with 65,536 (= 2^16) code points, although currently only a few planes are used:
  • Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP). This is the plane containing most of the character assignments so far. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing systems in current use.

  • Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP).

  • Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP)

  • Planes 3 to 13 (30000–DFFFF) are unassigned

  • Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP)

  • Plane 15 (F0000–FFFFF): reserved for the Private Use Area (PUA)

  • Plane 16 (100000–10FFFF): reserved for the Private Use Area (PUA)
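Because each plane holds exactly 2^16 code points, the plane number of a code point is simply its value shifted right by 16 bits. A minimal sketch follows; the function `plane_of` and the `PLANE_NAMES` table are hypothetical names that use the abbreviations from the list above.

```python
# Abbreviations from the list above; planes 3 to 13 are unassigned.
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP", 14: "SSP", 15: "PUA", 16: "PUA"}

def plane_of(code_point):
    """Return the plane number (0 to 16) that contains the given code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Dropping the low 16 bits leaves the plane number.
    return code_point >> 16

for cp in (0x0041, 0x10000, 0x2A6D6, 0xE0001, 0x10FFFF):
    plane = plane_of(cp)
    print(f"U+{cp:04X} -> plane {plane} ({PLANE_NAMES.get(plane, 'unassigned')})")
```

Written in hexadecimal, the plane number is whatever precedes the last four digits, which is why the ranges in the list all end in 0000 and FFFF.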


Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively blocked out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need to use another of the 11 spare planes for ideographic characters, other planes would remain available if previously unknown scripts with tens of thousands of characters were discovered. The limit of just over 20 bits is therefore unlikely to be reached in the near future.


Basic Multilingual Plane


The first plane (plane 0), the ''Basic Multilingual Plane'' (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The graphic on the right is a visual roadmap to the Basic Multilingual Plane. The colours in use are:
  •  Black  = Latin scripts and symbols

  •  Light Blue  = Linguistic scripts

  •  Blue  = Other European scripts

  •  Orange  = Middle Eastern and SW Asian scripts

  •  Light Orange  = African scripts

  •  Green  = South Asian scripts

  •  Purple  = Southeast Asian scripts

  •  Red  = East Asian scripts

  •  Light Red  = Unified CJK Han

  •  Yellow  = Canadian Aboriginal Scripts

  •  Magenta  = Symbols

  •  Dark Grey  = Diacritics

  •  Light Grey  = UTF-16 surrogates and private use

  •  Cyan  = Miscellaneous characters

  •  White  = Unused


As of Unicode 5.0, the BMP includes the following scripts:

Future additions
Several scripts are expected to be included in the BMP in the next revision of Unicode. These scripts, and their proposed code point ranges, are the following:

Several other scripts are proposed for inclusion in the BMP, including:


Supplementary Multilingual Plane


Plane 1, the ''Supplementary Multilingual Plane'' (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.


Supplementary Ideographic Plane

Plane 2, the ''Supplementary Ideographic Plane'' (SIP), is used for about 40,000 Unified Han ideographs that have been seldom used in daily written communication.


Unused planes

Unicode has not yet assigned any characters to Planes 3 through 13. The current study of written language has not yet identified any need for these planes. However, symbols that arise outside of script-based writing systems offer potentially limitless possibilities for new characters. The UCS and Unicode take requests for symbols on a case-by-case basis.


Supplementary Special-purpose Plane

Plane 14 (''E'' in hexadecimal), the ''Supplementary Special-purpose Plane'' (SSP), currently contains non-graphical characters in two blocks of 128 and 240 characters. The first block is for language tag characters for use when language cannot be indicated through other protocols (such as the