Instructions and Instruction Sets

The following topics will be covered here: machine instructions, running the program, and the instruction set and x86 CPUs.

Machine Instructions

Recall our simple example:

int a = 10;
int b = 5;
int c = a + b;

As already stated, the processor doesn't understand 'near-English' syntax such as the three lines above. Instead, it can only comprehend machine instructions. These are simply streams of binary digits (bits), such as 010010110011110101011... You get the idea.

A given processor can perform many different instructions. Some instructions might be devoted to simple logical operations, while others may involve mathematical operations (such as addition and multiplication). Some may be very complicated indeed. Each machine instruction that the processor can perform is defined by a unique binary string, i.e. a specific sequence of binary digits. The binary representation of an instruction is termed an op-code.

For example, one of the simplest instructions is "Add 1 to register AX." (Take note that registers are named with a two-letter system... AX, BX, CX, SP, BP and SI, for example, are all registers.) The simplest way to do this is to perform the instruction whose mnemonic (in assembly) is INC AX. This instruction assembles to give the binary op-code: 0100 0000B (40H).

How does a number like this cause the CPU to do something so... magical? In simplified terms, eight transistors each correspond to one of the bits in the eight-bit sequence. One of these bits is set to 1, while the other seven are set to 0. This causes a cascade of downstream effects in the CPU, whereby thousands (or potentially millions) of transistors undergo electrical changes. The final result is that the value stored in register AX gets incremented by 1.

Running the Program

Recall that a program is really just a long series of these op-codes. The processor runs a program by starting with the first instruction, and then performing each instruction, one by one, until it reaches the end. Not too complicated, is it? (This is a slight simplification, since true programs perform conditional branching. In other words, they branch from one point in the program to another based on a condition.)

But how does the processor know which instruction comes next when running a program? To understand this, you must first understand that a program is really just data, and when you run a program, this data is living somewhere in main memory. So when you run a program, the memory address (the location of a particular region of memory) of the first instruction is held in a special register called the instruction pointer (IP), also known as the program counter. Thus the CPU knows where this instruction 'lives'. The instruction itself is fetched from memory into a register. At this point, the instruction can actually be performed. As this happens, the value held in the IP is incremented to reflect the location (memory address) of the next instruction. Then the process repeats. Simple as that!

The Instruction Set and x86 CPU

The instruction set determines the instructions that a given CPU can perform. It therefore follows that when we write a program and compile / assemble it to machine code, the resulting op-codes must be compatible with the processor we want to run the program on. We often refer to the compilation step as being platform-specific. While the actual definition of a platform varies depending on context, here it means that we must make sure that the compiler is creating machine code for a particular type or class of CPU. For example, if we take our original program's source code and compile it on an Amiga 500, it will run fine on a Motorola MC68000 (the CPU that is the core of the Amiga), but it won't run on a Pentium chip inside a PC. That's because they are two different platforms.

You may well be thinking, "If the code is compiled for a particular type of CPU, how come a program designed for a 486 will run on my Pentium III?" Good question. The answer is that while the instruction set of the Pentium III is significantly larger and more complex than that of the 486, the Pentium III's instruction set is also backwards compatible with the earlier instruction set. In other words, instructions that existed on the 486 are also available on the newer CPU.

In fact, all of Intel's CPUs (and the CPUs of other manufacturers such as AMD and Cyrix) have an instruction set that is either totally or almost totally compatible with the instruction set of the original 8086 CPU that was designed some 25 years ago! This is referred to generically as the x86 instruction set. Strange to think that 25-year-old code will work on your 2 GHz Pentium 4. That is, if it doesn't generate a divide-by-0 error, which often results from running code quicker than it was designed for (but that's an issue with the software design, not the instruction set itself).

Now move on to look at the steps involved in performing an instruction.