X86




The generic term x86 refers to the "CISC"-type instruction set architecture of the most commercially successful CPUs in the history of personal computing (as opposed to "microarchitecture", which refers to a CPU's internal layout), used in processors from Intel, AMD, VIA, and others. The name derives from the model numbers of the first few generations of CPUs that were backward binary compatible with Intel's original 16-bit 8086 of 1978, most of which ended in "86". With the introduction of the Pentium brand in 1993, Intel ended its "80x86" naming scheme because numbers could not be trademarked. However, the term x86 was already firmly established among technicians, compiler writers, and others.

After the introduction of the 80386 in 1985, the term x86 also implied, in practice, binary compatibility with the 80386's extended 32-bit instruction set, sometimes emphasized as x86-32 to distinguish it from the original 16-bit x86-16 or from the newer 64-bit x86-64 instruction set. The term x86 usually implies the 32-bit x86-32 instruction set, while the term x86-64 (used especially in reference to 64-bit processors) is generally replaced by the name x64 (used exclusively for 64-bit software), at least in personal computing and servers. Intel's equivalents of x86 and x86-64 have been IA-32 and Intel 64 (EM64T or IA-32e) respectively. Likewise, AMD prefers the name AMD64 over the x86-64 name it introduced itself.

The only significant competitors to x86 in PCs were the Motorola 68k (CISC type) and PowerPC (RISC type) instruction sets. However, by August 7, 2006, Apple Inc. had switched to x86 CPUs, granting the x86 instruction set an effective monopoly among desktop and notebook processors. x86 also held a growing majority among servers and workstations. Markets without a significant x86 presence include low-cost embedded processors found in appliances and toys, among others. The embedded processor market is populated by more than 20 different architectures, which, owing to price sensitivity and to low-power and hardware-simplicity requirements, outnumber x86.

A vast amount of computer software has been written for the x86 platform, including nearly all modern commercial operating systems from MS-DOS and Microsoft Windows to Linux, BSD, Solaris, and Mac OS X, making the x86 instruction set architecture indispensable on a global scale and practically irreplaceable.


CHRONOLOGY

The table below lists brands of famous x86 designs.


HISTORY

The x86 architecture first appeared in the Intel 8086 CPU released in 1978, as a fully 16-bit design based on the earlier instruction set of the Intel 8085. Although not binary compatible with the 8085, the 8086 was designed to allow assembly language programs written for the 8085 to be mechanically translated into equivalent 8086 assembly. This made the 8086 a tempting migration path for 8085 hardware and software vendors, but, because of the 16-bit data bus, not without significant redesign of the 8085 system hardware. To reduce the need for such a redesign, Intel introduced the 8088, whose external 8-bit data bus interfaced more easily with already established, and therefore low-cost, 8-bit system and peripheral chips. This, along with other non-technical factors, encouraged IBM to build its IBM PC around the 8088, despite the presence (at the time) of technically superior competitors such as the Motorola 68000. Subsequently, the IBM PC became the dominant personal computer platform, and the 8088 (8086) and its successors became the dominant CPUs for desktop and laptop computers, making their instruction set architecture (later named x86) dominant as well.

At various times, companies such as IBM, NEC, AMD, TI, STM, Fujitsu, OKI, Siemens, Cyrix, Intersil, C&T, NexGen, and UMC started to design and/or manufacture processors that implemented the x86 instruction set architecture (but in varying CPU hardware designs, called "microarchitectures", and so-called "compatible" with the original) and were intended for personal computers as well as embedded systems. For the personal computer market, real quantities started to appear around 1990 with 386- and 486-compatible processors, often named similarly to Intel's original chips. Other companies that designed or manufactured x86 or x87 processors include ITT Corporation, National Semiconductor, ULSI Systems, and Weitek.

Following the fully pipelined i486, Intel introduced the Pentium brand name (which could be trademarked, unlike numbers) for its new line of superscalar x86 designs. With the 80x86 naming scheme now legally cleared, IBM partnered with Cyrix to produce the 5x86 and then the very efficient 6x86 (M1) and 6x86MX (MII) lines of Cyrix designs, which were the first x86 chips to implement register renaming to enable speculative execution. AMD meanwhile designed and manufactured the advanced but delayed 5k86 (K5), heavily based on the microarchitecture of its earlier 29K RISC design. Like NexGen's Nx586, it used a strategy in which dedicated pipeline stages decode x86 instructions into uniform and easily handled micro-operations, a method that has remained standard to this day.

Some early versions of these competitors' chips had heat dissipation problems. The 6x86 was also affected by a few minor compatibility issues, and the Nx586 lacked an FPU as well as (the then crucial) pin compatibility, while the K5 had somewhat disappointing performance when it was (eventually) launched. Low customer awareness of alternatives to the Pentium line further contributed to these designs being comparatively unsuccessful, despite the fact that the K5 had very good Pentium compatibility and the 6x86 was significantly faster than the Pentium on integer code. The 6x86 had a slower floating point unit, however, which is slightly ironic as Cyrix started out as a designer of fast floating point units for x86 processors. On the other hand, AMD later established itself as a serious contender with the K6 line of processors, which gave way to the highly successful Athlon and Opteron. There were also other contenders, such as Centaur Technology (IDT), Rise Technology, and Transmeta. VIA Technologies' energy-efficient C3 and C7 processors were designed by Centaur and are in full production today.

The architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 386 to gradually replace the earlier 16-bit chips (which were sold for many more years). This extension to the architecture is sometimes called x86-32 to differentiate it from the original "x86-16" or the newer x86-64 extension. However, it was originally referred to as i386 by Intel (and others) and later renamed IA-32 (for Intel Architecture, 32-bit) when Intel unveiled its unrelated 64-bit Itanium architecture, referred to as IA-64. Between 1999 and 2003, AMD further extended the architecture to 64 bits, originally calling it x86-64 in AMD documents, but now AMD64. Intel soon adopted AMD's architectural extensions under the name IA-32e, which was later renamed EM64T and finally Intel 64 (not to be confused with the unrelated IA-64 architecture). Microsoft and Sun Microsystems have used their own vendor-neutral name, x64, for this same x86-64 architecture.


DESIGN



Technical overview

The x86 architecture is a variable-instruction-length, primarily two-address, "CISC" design with an emphasis on backward compatibility. The instruction set is not typical CISC, however, but basically an extended and orthogonalized version of the simple eight-bit 8085 architecture. Words are stored in little-endian order, and 16-bit and 32-bit accesses are allowed to unaligned memory addresses. To conserve opcode space, most register addresses are three bits, and at most one operand can be in memory (in contrast with some highly orthogonal CISC designs such as the PDP-11, where both operands can be in memory). This memory operand may also be the destination, while the other operand, the source, can be either a register or an immediate. This contributes, among other factors, to a code footprint that rivals 8-bit machines and enables efficient use of instruction cache memory. During execution, current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces, micro-ops, which are readily executed by a microarchitecture that could be (simplistically) described as a RISC machine without the usual load/store limitations. The small number of general registers (also inherited from the 8085) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. Much work has therefore been invested in making such accesses as fast as register accesses, i.e. a one-cycle instruction throughput in most circumstances.


Segmentation

Minicomputers during the late 1970s were running up against the 16-bit, 64 KB address limit, as memory had become cheaper. Most such companies therefore redesigned their processors to directly handle 32-bit addressing and data. The original 8086, developed from the simple 8085 microprocessor and primarily aiming at another market, instead adopted a much-criticized concept of segment registers, which raised the memory address limit by only 4 bits, to 20 bits (1 megabyte).

Data and/or code could be managed within "near" 16-bit segments within this 1 MB address space, and the scheme eased the translation of software from older processors such as the 8085 and Z80 to the newer processor. Seven years later, in 1985, this cumbersome addressing model was effectively factored out by the introduction of 32-bit offset registers in the 386 design.


The original 8086 and 8088

The original Intel 8086 and 8088 have fourteen 16-bit registers. Four of them (AX, BX, CX, DX) are general registers (although each has an additional purpose; for example, only CX can be used as a counter with the loop instruction). Each can be accessed as two separate bytes (thus BX's high byte can be accessed as BH and its low byte as BL). Four segment registers (CS, DS, SS and ES) are used to form a memory address. There are two pointer registers: SP points to the top of the stack, and BP is used to point at some other place in the stack or in memory (an offset). Two registers (SI and DI) are for array indexing. The FLAGS register contains flags such as the carry flag, overflow flag and zero flag. Finally, the instruction pointer (IP) points to the current instruction.
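The byte-addressable halves of the general registers can be illustrated with a short Python sketch. It simply models a 16-bit register as an integer; the helper names (high_byte, low_byte, set_low_byte) are invented for this example and are not part of any real tool or API.

```python
def high_byte(reg16):
    """Return the upper 8 bits of a 16-bit register value (e.g. BH of BX)."""
    return (reg16 >> 8) & 0xFF


def low_byte(reg16):
    """Return the lower 8 bits of a 16-bit register value (e.g. BL of BX)."""
    return reg16 & 0xFF


def set_low_byte(reg16, value):
    """Write the low byte only; the high byte is left untouched."""
    return (reg16 & 0xFF00) | (value & 0xFF)


bx = 0x12A5                      # hypothetical contents of BX
bh, bl = high_byte(bx), low_byte(bx)
bx_after = set_low_byte(bx, 0xFF)  # like writing FFh to BL
```

Writing to BL leaves BH unchanged, which is why the two halves can be used as independent 8-bit registers.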

The 8086 has 64 KB of 8-bit (or alternatively 32 K-word of 16-bit) I/O space. There are 256 interrupts, which can be invoked by both hardware and software. The interrupts can cascade, using the stack to store the return address.


Real mode

See Also: Real mode



Real mode is an operating mode of 80286 and later x86-compatible CPUs. Real mode is characterized by a 20-bit segmented memory address space (meaning that only 1 MB of memory can be addressed), direct software access to BIOS routines and peripheral hardware, and no concept of memory protection or multitasking at the hardware level. All x86 CPUs in the 80286 series and later start up in real mode at power-on; 80186 CPUs and earlier had only one operational mode, which is equivalent to real mode in later chips.

In real mode, memory access is segmented. This is done by shifting the segment address left by 4 bits and adding an offset to obtain a final 20-bit address. For example, if DS is A000h and SI is 5677h, DS:SI will point at the absolute address DS × 16 + SI = A5677h. Thus the total address space in real mode is 2^20 bytes, or 1 MB, quite an impressive figure for 1978. All memory addresses consist of both a segment and an offset; every type of access (code, data, or stack) has a default segment register associated with it (for data the register is usually DS, for code it is CS, and for stack it is SS). For data accesses, the segment register can be explicitly specified (using a segment override prefix) to use any of the four segment registers.
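The address calculation above can be sketched in a few lines of Python. The function name real_mode_address is made up for this illustration, and the final mask is an assumption that models the 20 address lines of the original 8086, where sums past 1 MB wrap around to zero.

```python
def real_mode_address(segment, offset):
    """Combine a 16-bit segment and 16-bit offset into a 20-bit address.

    Assumption: the result wraps at 1 MB, as on the original 8086.
    """
    return ((segment << 4) + offset) & 0xFFFFF


ds, si = 0xA000, 0x5677
address = real_mode_address(ds, si)      # DS x 16 + SI
wrapped = real_mode_address(0xFFFF, 0x0010)  # one byte past the 1 MB limit
```

With the values from the text, address is A5677h; the second call shows the wraparound at the top of the address space under the stated assumption.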

In this scheme, two different segment/offset pairs can point at a single absolute location. Thus, if DS is A111h and SI is 4567h, DS:SI will point at the same A5677h as above. The scheme also makes it impossible to use more than four segments at once. CS and SS are vital for the correct functioning of the program, so only DS and ES can be used to point to data segments outside the program (or, more precisely, outside the currently executing segment of the program) or the stack. This scheme was intended as a compatibility measure with the Intel 8085.
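To make the aliasing concrete, the following Python sketch enumerates every segment/offset pair that reaches a given absolute address. The function name aliases is hypothetical, and the sketch deliberately ignores wraparound at the 1 MB boundary.

```python
def aliases(physical):
    """List all (segment, offset) pairs with segment * 16 + offset == physical.

    Both values must fit in 16 bits; address wraparound is not modeled.
    """
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


pairs = aliases(0xA5677)
count = len(pairs)
```

Both pairs mentioned in the text, A000h:5677h and A111h:4567h, appear in the list, along with several thousand others, since moving the segment up by one and the offset down by 16 leaves the sum unchanged.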

The segmented nature can make programming and compiler design difficult, because the use of near and far pointers affects performance. The introduction of bank switching schemes such as EEMS made programming even more complicated before the adoption of 32-bit addressing methods with later processors.


16-bit protected mode

See Also: Protected mode



In addition to real mode, the Intel 80286 supports protected mode, expanding addressable physical memory to 16 MB and addressable virtual memory to 1 GB. This is done by using the segment registers only for storing an index into a segment table. There were two such tables, the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT), each holding up to 8192 segment descriptors, each segment giving access to 64 KB of memory. The segment table provided a 24-bit base address, which can be added to the desired offset to create an absolute address. Each segment can be assigned one of four ring levels used for hardware-based computer security.
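The figures in this paragraph follow from simple arithmetic, sketched below in Python. The selector layout used here (a 13-bit table index, one table-indicator bit, and a 2-bit requested privilege level) is the standard protected-mode layout but is not spelled out in the text above, so treat it as added background; the function names are invented for the example.

```python
def decode_selector(selector):
    """Split a 16-bit protected-mode selector into its three fields."""
    return {
        "index": selector >> 3,                       # descriptor number, 0..8191
        "table": "LDT" if selector & 0x4 else "GDT",  # table-indicator bit
        "rpl": selector & 0x3,                        # requested privilege level (ring)
    }


def absolute_address(base, offset):
    """Add a 16-bit offset to a 24-bit segment base (80286 physical address)."""
    return (base + offset) & 0xFFFFFF


# Two tables x 8192 descriptors x 64 KB per segment = 1 GB of virtual memory.
VIRTUAL_BYTES = 2 * 8192 * 64 * 1024
# A 24-bit base address reaches 16 MB of physical memory.
PHYSICAL_BYTES = 2 ** 24

fields = decode_selector(0x001B)
```

The 13-bit index is what limits each table to 8192 descriptors, and the two privilege bits account for the four ring levels.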

Because real mode DOS programs may access hardware directly or perform segment arithmetic, both of which are incompatible with protected mode, an operating system (OS) is limited in its ability to run these applications as processes. To overcome these difficulties, Intel introduced the 80386 with virtual 8086 mode. While still subject to paging, this mode forms linear addresses as real mode does and allows the OS to trap both I/O and memory accesses. By design, protected mode programs do not assume a relation between selector values and physical addresses.

Operating systems like OS/2 1.x try to switch the processor between protected and real modes. This is both slow and unsafe, because a real mode program can easily crash a computer. OS/2 1.x defines restrictive programming rules allowing a Family API or bound program to run in either real or protected mode.

Windows 3.0 could run real mode programs in 16-bit protected mode. When transitioning to protected mode, Windows 3.0 preserved the single privilege level model that was used in real mode, which is why Windows applications and DLLs can hook interrupts and access hardware directly. That lasted through the Windows 9x series. If a Windows 1.x or 2.x program is written properly and avoids segment arithmetic, it will run the same way in both real and protected modes. Windows programs generally avoid segment arithmetic because Windows implements a software virtual memory scheme, moving program code and data in memory when programs are not running, so manipulating absolute addresses is dangerous; programs should only keep handles to memory blocks when not running. Starting an old program while Windows 3.0 is running in protected mode triggers a warning dialog, suggesting either running Windows in real mode or obtaining an updated version of the application. Updating well-behaved programs using the MARK utility with the MEMORY parameter avoids this dialog. It is not possible to have some GUI programs running in 16-bit protected mode and other GUI programs running in real mode. In Windows 3.1, real mode disappeared.


32-bit protected mode


The Intel 80386 introduced a 32-bit design supporting paging. All of the registers, instructions, I/O space and memory are 32-bit. Memory is accessed through a 32-bit extension of protected mode. As in the 286, segment registers are used to index a segment table describing the division of memory. With a 32-bit offset, every application may access up to 4 GB (or more with memory segments). In addition, 32-bit protected mode supports paging, a mechanism making it possible to use virtual memory. An exception to this design is the Intel 80386SX, which is 32-bit with 24-bit addressing and a 16-bit data bus.
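As a rough illustration of how paging treats a 32-bit address, the Python sketch below splits a linear address into its lookup fields. The 10/10/12-bit split (page directory index, page table index, offset within a 4 KB page) is the classic 386 layout and is added background rather than something stated in the text above; the function name split_linear is invented for the example.

```python
PAGE_SIZE = 4096  # 4 KB pages, assumed here as the classic 386 page size


def split_linear(addr):
    """Split a 32-bit linear address into (directory, table, offset)."""
    directory = (addr >> 22) & 0x3FF  # top 10 bits select a page directory entry
    table = (addr >> 12) & 0x3FF      # next 10 bits select a page table entry
    offset = addr & 0xFFF             # low 12 bits locate the byte within the page
    return directory, table, offset


directory, table, offset = split_linear(0x12345678)
ADDRESSABLE_BYTES = 2 ** 32  # 4 GB reachable with a 32-bit offset
```

The three fields together cover 10 + 10 + 12 = 32 bits, matching the 4 GB reach of a 32-bit offset.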

No new general-purpose registers were added. All 16-bit registers except the segment registers were expanded to 32 bits. This is represented by prefixing an "E" (for Extended) to the register names (thus the expanded AX became EAX, SI became ESI, and so on). With a greater number of registers, instructions and operands, the machine code format was expanded. To provide backward compatibility, segments with executable code can be marked as containing either 16-bit or 32-bit instructions. Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment or vice versa.

Paging and segmented memory access are required for modern multitasking operating systems. Linux, 386BSD and Windows NT were developed for the 386 because it was the first Intel architecture CPU to support paging and 32-bit segment offsets. The 386 architecture became the basis of all further development in the x86 series. The success of Windows 3.1, the first widely accepted version of Microsoft Windows, was largely due to its ability to take advantage of 386 features, even though it was used mainly to run multiple sessions rather than to take advantage of the native 32-bit instruction set.

The Intel 80387 math co-processor was integrated into the next CPU in the series, the Intel 80486 (the 486SX, sold as a budget processor, had its co-processor disabled or removed). The new floating point unit (FPU) performed floating point calculations, important for scientific applications and graphic design.


MMX and beyond

See Also: MMX



MMX is a SIMD instruction set designed by Intel, introduced in 1997 for Pentium MMX microprocessors. It developed out of a similar unit first used on the Intel i860. It is supported on most subsequent IA-32 processors by Intel and other vendors. MMX is typically used for video applications.

MMX added eight new 64-bit registers to the architecture, known as MM0 through MM7 (generically MMn). In reality, these new registers are aliases for the existing x87 FPU stack registers. Hence, anything done to the floating point stack also affects the MMX registers. Unlike the floating point stack, the MMn registers are randomly accessible.
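The idea of SIMD on a 64-bit register can be sketched in Python by treating one integer as eight independent byte lanes. The function below imitates the behavior of a packed byte addition with wraparound (the kind of operation performed by the MMX PADDB instruction); it is a software model for illustration only, not how the hardware is implemented.

```python
def packed_add_bytes(a, b):
    """Add eight unsigned byte lanes of two 64-bit values, wrapping per lane."""
    result = 0
    for lane in range(8):
        shift = lane * 8
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift  # drop the carry out of each lane
    return result


mm0 = 0x01020304050607FF  # hypothetical register contents
mm1 = 0x0101010101010101
packed = packed_add_bytes(mm0, mm1)
scalar = (mm0 + mm1) & 0xFFFFFFFFFFFFFFFF  # ordinary 64-bit addition, for contrast
```

In the packed result the lowest lane wraps from FFh to 00h without disturbing its neighbor, whereas the ordinary 64-bit addition lets the carry spill into the next byte.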


3DNow!

See Also: 3DNow!



In 1998 AMD introduced 3DNow!, which consisted of SIMD floating point instruction enhancements to MMX. The introduction of this technology coincided with the rise of 3D entertainment applications, and it was designed to improve the CPU's vector processing performance in graphics-intensive applications. 3D video game developers and 3D graphics hardware vendors use 3DNow! to enhance their performance on AMD's K6 and Athlon series of processors.


SSE

See Also: Streaming SIMD Extensions, SSE2, SSE3



In 1999, Intel introduced the Streaming SIMD Extensions (SSE) instruction set, which added eight new 128-bit registers (not overlaid with other registers) and 70 floating point instructions.

In 2000, Intel introduced the SSE2 instruction set, adding to the original SSE registers both a complete complement of integer instructions (analogous to MMX) and 64-bit SIMD floating point instructions. The first addition made MMX almost obsolete, and the second allowed the instructions to be realistically targeted by conventional compilers.

Introduced in 2004 along with the Prescott revision of the Pentium 4 processor, SSE3 added specific memory and thread-handling instructions to boost the performance of Intel's HyperThreading technology. AMD licensed the SSE3 instruction set and implemented most of the SSE3 instructions for its revision E and later Athlon 64 processors. The Athlon 64 does not support HyperThreading and lacks those SSE3 instructions used only for HyperThreading.


64-bit Long mode

See Also: x86-64


By 2002, it was obvious that the 32-bit address space of the x86 architecture was limiting its performance in applications requiring large data sets. A 32-bit address space allows the processor to directly address only 4 GB of data, a size surpassed by applications such as video processing and database engines. With a 64-bit address, one can directly address 16,777,216 TiB (more than 16 billion GB) of data, although most 64-bit architectures do not support access to the full 64-bit address space (AMD64, for example, supports only 48 bits of a 64-bit address, split across 4 paging levels).
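The address-space figures can be checked with a short Python sketch. The split of a 48-bit address into four 9-bit table indices plus a 12-bit page offset assumes 4 KB pages and is added background rather than something stated above; the function name split_amd64 is invented for the example.

```python
TIB = 2 ** 40

FULL_64_BIT_TIB = 2 ** 64 // TIB  # size of a full 64-bit address space, in TiB
AMD64_48_BIT_TIB = 2 ** 48 // TIB  # size of the 48-bit space AMD64 implements


def split_amd64(addr):
    """Split the low 48 bits of an address into four table indices and an offset.

    Assumption: 4 KB pages, so 4 levels x 9 bits + 12 offset bits = 48 bits.
    """
    offset = addr & 0xFFF
    indices = [(addr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indices, offset


indices, offset = split_amd64(0x00007FFFFFFFF123)
```

The four indices are listed from the top paging level down; each one selects one of 512 entries in its table.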

AMD, which had traditionally followed Intel's lead, took the initiative of extending the 32-bit x86 architecture to 64 bits, initially calling it x86-64 and later renaming it AMD64. The Opteron, Athlon 64, Turion 64, and later Sempron families of processors use this architecture. The success of the AMD64 line of processors, coupled with the lukewarm reception of the IA-64 architecture, prompted Intel to reverse-engineer and adopt the instruction set, adding new extensions of its own and branding it the EM64T architecture, later re-branded Intel 64.

In their literature and product version names, Microsoft and Sun refer to AMD64/Intel 64 collectively as x64 in the Windows and Solaris operating systems respectively. Linux distributions refer to it as "x86-64", its variant "x86_64", or "amd64". BSD systems use "amd64", while Mac OS X uses "x86_64".

This was the first time that a major upgrade of the x86 architecture was initiated and originated by a manufacturer other than Intel. It was also the first time that Intel accepted technology of this nature from an outside source.


Virtualization

x86 virtualization is difficult because the architecture did not meet the Popek and Goldberg requirements until recently. Nevertheless, there are several commercial x86 virtualization products, such as VMware, Parallels and Microsoft Virtual PC, as well as open source virtualization projects such as Bochs and QEMU. Other solutions, such as the Kernel-based Virtual Machine ("KVM"), require newer processors that provide better hardware support for virtualization.

Intel and AMD have introduced x86 processors with hardware-based virtualization extensions that overcome the classical virtualization limitations of the x86 architecture. These extensions are known as Intel VT (IVT or simply VT), code-named "Vanderpool", and AMD-V, code-named "Pacifica". Although most modern x86 server processors and many modern x86 desktop processors include these extensions, the technology is generally considered immature at this point, with most software-based virtualization outperforming these extensions (see "A Comparison of Software and Hardware Techniques for x86 Virtualization"). This is expected to change as the technology matures.

