15. The Minicomputer Revolution
Part of
the 22C:122/55:132 Lecture Notes for Spring 2004
In 1965, Digital Equipment Corporation announced the PDP-8; like the IBM System 360, this was advertised as an integrated circuit machine, but it was based on hybrid integrated circuits (DEC's trademarked FLIP-CHIP technology), and in fact, the machine only used token numbers of these devices. As with the 360, this machine began life with support for 8-bit ASCII (they used 7 bit ASCII padded to 8 bits with a 1 in the high order bit), but in almost every other way, the PDP-8 was out on the opposite side of the spectrum from the System 360.
The machine is covered in chapter 5 of Bell and Newell,
http://research.microsoft.com/~gbell/Computer_Structures__Readings_and_Examples/
It is worth noting that Bell was the chief architect of this machine! The PDP-8 was upward compatible with the PDP-5, and generally, today, the -5 is considered part of the -8 family, as something of a dry run for the architecture that was perfected as the -8. As with the System 360, the PDP-8 family grew by the addition of low-performance low-cost machines at one extreme, and high performance higher cost machines at the other extreme.
The big difference was one of scale. The PDP-8 was designed to be inexpensive; the original model sold for $18,000; a follow-up model, the 8/S, sold for $10,000, and the 8/E sold for $7,000 in 1970. By that year, DEC claimed that the PDP-8 was the best selling computer in the world, but in part, they were able to make this claim because IBM was not selling most of its computers; it was leasing them. The price decline for the PDP-8 continued until the end of the family's lifetime around 1990, by which time it was a monolithic microprocessor made by Intersil and Harris. In 2004, used Harris 6120 chips could still be found on eBay for $49 or less, with the value propped up largely by the historical importance of the architecture.
The minimal PDP-8 had a 12-bit word, a 12-bit memory address, and a single accumulator with a carry bit, so the entire processor could be described as:
      0 1 2 3 4 5 6 7 8 9 A B
      _______________________
     |_________|_____________|
        page        word          PC  program counter
 _    _______________________
|_|  |_______________________|
 L  link                          AC  accumulator
      _______________________
     |_____|_|_|_____________|
       op   i z     disp          IR  instruction register

Note that the link bit L was effectively the carry bit out of the accumulator, although a carry into the link caused it to toggle; as a result, the machine was sometimes described as having a 13-bit accumulator.
The instruction register IR had a 3-bit opcode field, 2 address mode bits called i and z, and a 7-bit displacement field. Addressing relied, in part, on the division of the program counter into a page field and a word-in-page field. The following addressing modes were used:
     i z
     0 0   page zero mode               ea = 0 [] disp
     0 1   current page mode            ea = page [] disp
     1 0   indirect page zero mode      ea = Memory[ 0 [] disp ]
     1 1   indirect current page mode   ea = Memory[ page [] disp ]

In the above, the [] operator is used to concatenate the fields of the address. In practice, page zero was used for linkage pointers and for global variables, while current page addressing was used for local variables and pointers.
None of the above were particularly innovative, but the PDP-8 family introduced one important memory addressing innovation, autoincrement addressing. On the PDP-8, memory locations 8 to 15 were described as the autoincrement registers, even though they were main memory locations and not technically registers. These memory locations behaved normally when they were the operands of direct addressing operations, but when used for indirect addressing, each use of the location had the side effect of incrementing that location.
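The four addressing modes and the autoincrement locations can be sketched in a few lines of Python. This is only an illustration: the function name, its arguments, and the list-of-words memory model are assumptions made for the example, and the sketch increments the location before using it as a pointer, which is how the PDP-8 is usually documented.

```python
def pdp8_effective_address(memory, pc, i, z, disp):
    """Return the 12-bit effective address of one memory reference.

    memory is a list of 4096 12-bit words and pc is the address of the
    instruction being executed (both hypothetical modeling choices).
    """
    page = pc & 0o7600 if z else 0        # z = 1 selects the current page
    addr = page | (disp & 0o177)          # page [] disp
    if not i:
        return addr                       # direct addressing
    if 0o10 <= addr <= 0o17:              # locations 8 to 15 autoincrement
        memory[addr] = (memory[addr] + 1) & 0o7777
    return memory[addr]                   # indirect: the word is a pointer
```

A direct reference to location 8 leaves it unchanged, while each indirect reference through it steps the pointer, which is what makes these locations convenient for walking through arrays.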
With only a 3-bit opcode field, the instruction set of the PDP-8 was extremely simple!
     op
     000   AND   AC = AC & Memory[ ea ];
     001   TAD   AC = AC + Memory[ ea ];
     010   ISZ   if (++Memory[ ea ] == 0) PC++;
     011   DCA   Memory[ ea ] = AC; AC = 0;
     100   JMS   Memory[ ea ] = PC; PC = ea + 1;
     101   JMP   PC = ea;
     110   IOT   -- input/output transfer
     111         -- microcoded instructions

The microcoded instructions were interesting because they used the 9 bits below the opcode field to directly control the data flow through the ALU using something closely resembling horizontal microcode:
Microcoded instructions group 1
      _______________________
     |1_1_1_0|_|_|_|_|_____|_|
              4 5 6 7  rot  B

     bit   name   function
      4    CLA    AC = 0
      5    CLL    L = 0
      6    CMA    AC = ~AC
      7    CML    L = ~L
     rot
     000          no rotation
     010   RAL    rotate L[]AC one place left
     100   RAR    rotate L[]AC one place right
     011   RTL    rotate L[]AC two places left
     101   RTR    rotate L[]AC two places right
      B    IAC    L[]AC++

If these bits were combined in one instruction, they were evaluated in left-to-right order; for example, the combination CLA IAC had the effect of loading the constant 1, while CLA CLL CML RTL loaded the constant 2. (There were incompatibilities between different models when IAC was combined with shifting or when an attempt was made to shift both left and right in the same instruction!)
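The left-to-right evaluation of the group 1 bits can be illustrated with a small Python sketch. The function works on mnemonic names rather than on bit patterns, and the names group1, rotate_left and rotate_right are inventions for this example. Since real models disagreed about IAC combined with rotation, such combinations should be read as one possible behavior, not as a statement about any particular machine.

```python
def rotate_left(value):
    """Rotate the 13-bit quantity L[]AC one place left."""
    return ((value << 1) | (value >> 12)) & 0o17777


def rotate_right(value):
    """Rotate the 13-bit quantity L[]AC one place right."""
    return (value >> 1) | ((value & 1) << 12)


def group1(ops, link=0, ac=0):
    """Apply group 1 micro-operations named in ops; return (link, ac).

    The operations run in the fixed bit order of the table, no matter
    how the mnemonics are ordered in ops.
    """
    ops = set(ops)
    if "CLA" in ops:
        ac = 0
    if "CLL" in ops:
        link = 0
    if "CMA" in ops:
        ac ^= 0o7777
    if "CML" in ops:
        link ^= 1
    value = (link << 12) | ac                 # 13-bit L[]AC
    rotations = (("RAL", rotate_left, 1), ("RAR", rotate_right, 1),
                 ("RTL", rotate_left, 2), ("RTR", rotate_right, 2))
    for name, step, places in rotations:
        if name in ops:
            for _ in range(places):
                value = step(value)
    if "IAC" in ops:
        value = (value + 1) & 0o17777
    return value >> 12, value & 0o7777
```

With this sketch, group1(["CLA", "IAC"]) yields an accumulator of 1 and group1(["CLA", "CLL", "CML", "RTL"]) yields 2, matching the two examples in the text; CMA IAC gives the two's complement of the accumulator.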
The second group of microcoded instructions was used for conditional branches:
Microcoded instructions group 2
      _______________________
     |1_1_1_1|_|_|_|_|_|0_0_0|
              4 5 6 7 8

     bit   name   function
      4    CLA    clear accumulator
      5    SMA    if AC < 0, skip
      6    SZA    if AC = 0, skip
      7    SNL    if L = 1, skip
      8           reverse sense of skip

These allowed the next instruction to be skipped if some condition or combination of conditions was met. If these were combined with CLA, the accumulator was cleared after the skip condition was tested. The combination SMA SZA skips if the accumulator is negative or zero, while inverting this (by setting bit 8) skips if the accumulator is strictly positive. A conditional branch is done by skipping a branch instruction.
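The skip decision made by a group 2 instruction can be sketched as follows. The function name and its keyword arguments are assumptions made for illustration; the selected conditions are ORed together and bit 8 inverts the combined result, as the text describes.

```python
def group2_skip(link, ac, sma=False, sza=False, snl=False, reverse=False):
    """Return True if the next instruction should be skipped."""
    negative = bool(ac & 0o4000)          # sign bit of the 12-bit AC
    skip = ((sma and negative)
            or (sza and ac == 0)
            or (snl and link == 1))
    return skip != reverse                # bit 8 reverses the sense
```

For example, group2_skip(0, 5, sma=True, sza=True, reverse=True) is True, since an accumulator of 5 is strictly positive.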
The basic PDP-8 address space of only 4K words of 12 bits each is pathetically small, but it was sufficient to support a FORTRAN compiler, an assembler, and an interpreter for a simple language called FOCAL, similar in spirit to BASIC. Most of the smallest PDP-8 configurations were sold as embedded controllers for machine tools and other heavy equipment; nuclear reactor control, telecommunications line multiplexing and similar jobs were natural for this little machine.
However, there was also a demand for a very small general purpose computer, and to meet it, from the start, DEC offered a rudimentary but effective memory management unit for the PDP-8 and an extended arithmetic element that expanded the CPU by one register (the multiplier-quotient register) and added hardware multiply and divide instructions along with double-word (24-bit) shift operations.
The memory management unit supported 15-bit physical addresses by organizing physical memory as 8 segments of 4K words each. At any moment, a program could address one of two segments, the current code segment and the current data segment. It is fair to think of the code and data segment registers as 3-bit extensions to the 12-bit program counter and effective address, respectively.
            0 1 2 3 4 5 6 7 8 9 A B
      _____ _______________________
     |_____|_________|_____________|
      code    page        word
     segment                            PC  program counter
      _____
     |_____|_ _ _ _ _ _ _ _ _ _ _ _|
     data segment

All direct addressing and instruction fetches used the current code segment, while all indirect addressing used the current data segment. Normally, these were equal, so a program would run in one segment just as it had on a simple machine, but if the data segment was changed, the program could use indirect addressing to move data to or from any other segment.
The PDP-8 memory management unit was a genuinely third-generation feature in that it supported a protection mechanism that would force a trap if a program tried to perform any input-output operations. The result was a machine resilient enough that secure timesharing systems were developed for it, allowing up to 7 concurrent users even without auxiliary memory, and more if a swapping disk was available.
The term minicomputer was coined in the late 1960's by someone on DEC's British sales staff, in the era of the miniskirt and the Austin Mini. By 1970, with the advent of MSI integrated circuits, it was clear that a far more powerful computer could be built for a very similar price. While DEC moved forward with the PDP-8/E, implemented using MSI technology, they began a carefully planned exploration of alternative designs for a 16-bit minicomputer.
Like the IBM System 360 family, the PDP-11 was intended, from the start, to be a family of compatible computers, with various models planned for different points on the price-performance tradeoff curve. Low-end machines were intended to be very inexpensive and high-end machines were intended to be very high performance, within the limits of the 16-bit world. The original was implemented with MSI TTL, but by 1975, a 4-chip LSI processor, the LSI-11, was in production, and the Harris J11 from 1983 was (for its time) a high end monolithic microprocessor.
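The way a segment register extends a 12-bit address can be written out as a short Python sketch. Both function names are hypothetical, and the rule used to pick the segment is the simplified one stated above (direct references use the code segment, indirect references use the data segment).

```python
def pdp8_physical_address(segment, address):
    """Concatenate a 3-bit segment number with a 12-bit address."""
    return ((segment & 0o7) << 12) | (address & 0o7777)


def pdp8_operand_address(code_segment, data_segment, indirect, ea):
    """Form the 15-bit physical address of an operand."""
    segment = data_segment if indirect else code_segment
    return pdp8_physical_address(segment, ea)
```

With the code segment set to 1 and the data segment set to 3, a direct reference to address 0200 (octal) reaches physical address 10200, while an indirect reference with the same effective address reaches 30200.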
Chapters 38 and 39 of Siewiorek, Bell and Newell cover the PDP-11 quite well; the former is a reprint of the original 1970 paper describing this architecture, while the latter is a retrospective from the late 1970's.
http://research.microsoft.com/~gbell/Computer_Structures_Principles_and_Examples/
What would you do if you had years of experience with the PDP-8 and were given the option of adding another 4 bits to the instruction word, with addressing expanded to a 64K memory address space?
Ed DeCastro, an engineer who had overseen the development of the first SSI (small scale integrated circuit) PDP-8, the PDP-8/I, took one approach: Add a bit to the opcode field, add two bits to select between four accumulators, and add one bit to the addressing mode to allow for indexed addressing using one of two index registers. This design was rejected by DEC, and DeCastro quit to found Data General Corporation; the new machine became the Nova, and later, there was a SuperNOVA (a big version). The MicroNOVA is probably the legitimate claimant to status as the first monolithic VLSI 16-bit processor; it was introduced in 1975, and the other major claimant, the Intel 8080, was mostly an 8-bit micro with limited 16-bit support.
Harold McFarland, a student of Gordon Bell, produced two other competing 16-bit designs. Gordon Bell and Bill Wulf, both on the faculty of Carnegie Mellon University, were outside consultants brought in to help determine what would become the PDP-11; they modified and fleshed out one of McFarland's designs to become the machine that would launch DEC out of the minicomputer arena and into the stratosphere of high-end architectures.
The PDP-11 architecture assumes a 16-bit word, byte-addressable main memory, and 8 general purpose registers. The following basic instruction format is used:
      _______ _____ _____ _____ _____
     |_______|_____|_____|_____|_____|
     | op(4) | m(3) r(3) | m(3) r(3) |
                  src         dst

The opcode is 4 bits, and the two operands are specified by a 3-bit addressing mode m and a 3-bit register specifier r. The following addressing modes are supported:
      _____ _____
     |_____|_____|
     |  m     r  |                        effective operand
      0 0 0          register             Reg[r]
      0 0 1          register deferred    Memory[Reg[r]]
      0 1 0          auto increment       Memory[Reg[r]++]
      0 1 1          auto inc deferred    Memory[Memory[Reg[r]++]]
      1 0 0          auto decrement       Memory[--Reg[r]]
      1 0 1          auto dec deferred    Memory[Memory[--Reg[r]]]
      1 1 0          indexed              Memory[Reg[r]+Memory[pc++]]
      1 1 1          indexed deferred     Memory[Memory[Reg[r]+Memory[pc++]]]

Looking at these addressing modes, it is clear that auto increment and auto decrement addressing can be used to form push and pop operations on a stack, so single PDP-11 instructions can frequently do the equivalent of a single Burroughs instruction syllable. This involves memory-to-memory arithmetic on the -11, so it's not a good idea, but low-end compilers frequently used this approach because compiling for a pure stack machine is easy.
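The eight modes in the table reduce to four basic address calculations, each with an optional extra level of indirection selected by the low bit of m. The following Python sketch shows this under some assumptions: the function name and the register-list and dictionary memory model are invented for the example, only word operands are handled (so registers step by 2), and the index word is fetched before the register is read, so that indexing on the pc uses the updated pc.

```python
def pdp11_operand(m, r, reg, memory):
    """Locate an operand: returns ("reg", r) or ("mem", address).

    reg is a list of 8 register values (reg[7] is the pc); memory maps
    even byte addresses to 16-bit words. Side effects update reg.
    """
    deferred = m & 1
    kind = m >> 1
    if kind == 0:                          # register, register deferred
        return ("mem", reg[r]) if deferred else ("reg", r)
    if kind == 1:                          # auto increment
        addr = reg[r]
        reg[r] = (reg[r] + 2) & 0xFFFF
    elif kind == 2:                        # auto decrement
        reg[r] = (reg[r] - 2) & 0xFFFF
        addr = reg[r]
    else:                                  # indexed: Memory[pc++] is the index
        index = memory[reg[7]]
        reg[7] = (reg[7] + 2) & 0xFFFF
        addr = (reg[r] + index) & 0xFFFF
    if deferred:                           # one more level of indirection
        addr = memory[addr]
    return ("mem", addr)
```

Using mode 4 on a register for a destination and mode 2 on the same register for a source gives push and pop, which is the stack usage described above.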
Another important thing to note is that some of these addressing modes cause additional fetches from the instruction stream, so the actual instruction format will only be 16 bits if the instruction has only register operands, while the instruction may be 32 or 48 bits long if there are index fields:
      _______ _____ _____ _____ _____
     |_______|_____|_____|_____|_____|    basic instruction

      _______ _____ _____ _____ _____
     |_______|_____|_____|_____|_____|    basic instruction
     |_______________________________|    index word for src or dst

      _______ _____ _____ _____ _____
     |_______|_____|_____|_____|_____|    basic instruction
     |_______________________________|    src index word
     |_______________________________|    dst index word

Finally, it is worth noting that the PDP-11 used general register 7 as the program counter, so several addressing modes involving the program counter had interesting consequences:
      _____ _____
     |_____|_____|
     |  m     r  |                        effective operand
      0 0 0 1 1 1    program counter      pc
      0 1 0 1 1 1    immediate            Memory[pc++]
      0 1 1 1 1 1    direct               Memory[Memory[pc++]]
      1 1 0 1 1 1    relative             Memory[pc + Memory[pc++]]
      1 1 1 1 1 1    relative deferred    Memory[Memory[pc + Memory[pc++]]]

The modes listed above are useful, while the others, such as autodecrement program counter, are more curiosities than anything useful (consider the effect of a move instruction with autodecrement program counter addressing for both source and destination addresses!)
The PDP-11's set of 2-operand instructions was fairly uniform:
      _______ _____ _____ _____ _____
     |_______|_____|_____|_____|_____|
     |  op   |    src    |    dst    |

      0 0 0 1   MOV    16-bit, dst = src, also sets CC
      1 0 0 1   MOVB   8-bit
      0 0 1 0   CMP    16-bit, src - dst, results to CC only
      1 0 1 0   CMPB   8-bit
      0 0 1 1   BIT    16-bit, dst & src, results to CC only
      1 0 1 1   BITB   8-bit
      0 1 0 0   BIC    16-bit, dst = dst & ~src, also sets CC
      1 1 0 0   BICB   8-bit
      0 1 0 1   BIS    16-bit, dst = dst | src, also sets CC
      1 1 0 1   BISB   8-bit
      0 1 1 0   ADD    16-bit, dst = dst + src, also sets CC
      1 1 1 0   SUB    16-bit, dst = dst - src, also sets CC
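Putting the index modes and the program counter modes together, the length of a 2-operand instruction follows directly from its two operand specifiers. A minimal sketch, with invented function names and operands given as (m, r) pairs:

```python
def extra_words(m, r):
    """Words fetched from the instruction stream by one operand."""
    if m in (6, 7):                  # indexed, indexed deferred
        return 1
    if r == 7 and m in (2, 3):       # immediate and direct, via the pc
        return 1
    return 0


def instruction_bits(src, dst):
    """Length in bits of a 2-operand instruction."""
    return 16 * (1 + extra_words(*src) + extra_words(*dst))
```

A register-to-register move is 16 bits, a move of an immediate constant to a register is 32 bits, and a move between two indexed operands is 48 bits.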
Note that this instruction set is far more orthogonal than that of any prior machine; all addressing modes apply to all of these instructions, and almost all of the instructions are available in byte and word forms. It is not fully orthogonal, since ADD and SUB were only applicable to 16-bit words.
Opcode 0000 was used for single-operand instructions, stealing the src field for 6 additional opcode bits, and many of these were also fully orthogonal. Opcode 1111 was used for various extended arithmetic instructions, such as multiply, divide, and floating point. Many of these were optional, and different models of the PDP-11 were not fully compatible with some of these instruction variants. Few of these instructions had the general orthogonality of the core instruction set; for example, all two-operand arithmetic extensions required that one operand be in a register, with at most one general 6-bit operand address with all the addressing modes, and some were purely register-to-register.
The PDP-11 introduced the 4-bit condition code register that is still present in most modern microprocessors:
     N -- the result of the last ALU operation was negative
     Z -- the result of the last ALU operation was zero
     V -- the last ALU operation involved a 2's complement overflow
     C -- the last ALU operation produced a carry out of the high bit

Notice that overflow (the V bit) indicates that the sign of the result is wrong, for example, when two positive numbers were added to produce a negative result in the 2's complement number system. The abbreviation V for overflow was chosen to avoid the confusion of capital-O with zero.
Another innovation of the PDP-11 that was copied by most of the microprocessors that came in the late 1970's was the relative branch instruction for testing the condition codes. The PDP-11 supported the following branch instructions:
      _ _____________ _______________
     |_|0_0_0_0|_____|_______________|
     |b|  op   | bb  |     disp      |

      b  bb
      0 0 0 1   BR     Branch always
      0 0 1 0   BNE    Branch if not equal (Z = 0)
      0 0 1 1   BEQ    Branch if equal (Z = 1)
      0 1 0 0   BGE    Branch if greater or equal (N xor V = 0)
      0 1 0 1   BLT    Branch if less than (N xor V = 1)
      0 1 1 0   BGT    Branch if greater than (Z or (N xor V) = 0)
      0 1 1 1   BLE    Branch if less or equal (Z or (N xor V) = 1)
      1 0 0 0   BPL    Branch if positive (N = 0)
      1 0 0 1   BMI    Branch if minus (N = 1)
      1 0 1 0   BHI    Branch if higher (unsigned) (C or Z = 0)
      1 0 1 1   BLOS   Branch if lower or same (C or Z = 1)
      1 1 0 0   BVC    Branch if overflow clear (V = 0)
      1 1 0 1   BVS    Branch if overflow set (V = 1)
      1 1 1 0   BCC    Branch if carry clear (C = 0)
      1 1 1 1   BCS    Branch if carry set (C = 1)
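The branch conditions are simple Boolean functions of the four condition codes, as this Python sketch shows. The table name and function name are inventions for the example, the mnemonics are the standard PDP-11 ones, and each flag is passed as 0 or 1.

```python
# Each test takes the condition codes n, z, v, c as 0 or 1.
BRANCH_TESTS = {
    "BR":   lambda n, z, v, c: True,
    "BNE":  lambda n, z, v, c: not z,
    "BEQ":  lambda n, z, v, c: z,
    "BGE":  lambda n, z, v, c: not (n ^ v),
    "BLT":  lambda n, z, v, c: n ^ v,
    "BGT":  lambda n, z, v, c: not (z | (n ^ v)),
    "BLE":  lambda n, z, v, c: z | (n ^ v),
    "BPL":  lambda n, z, v, c: not n,
    "BMI":  lambda n, z, v, c: n,
    "BHI":  lambda n, z, v, c: not (c | z),
    "BLOS": lambda n, z, v, c: c | z,
    "BVC":  lambda n, z, v, c: not v,
    "BVS":  lambda n, z, v, c: v,
    "BCC":  lambda n, z, v, c: not c,
    "BCS":  lambda n, z, v, c: c,
}


def branch_taken(name, n, z, v, c):
    """Return True if the named branch is taken for these condition codes."""
    return bool(BRANCH_TESTS[name](n, z, v, c))
```

The signed tests use N xor V rather than N alone so that they give the right answer even when the comparison overflowed and the sign of the result is wrong.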
This set of conditional branches is essentially identical to that of the Motorola 68000, the Intel 80x86 and many other later machines; of course, they use different symbolic names, they use different numerical opcodes, and the sense of some of the condition code bits is inverted, but the basic idea is preserved with little change in these newer architectures.
These branch instructions added the sign-extended displacement to the program counter, so you could branch minus 127 or plus 128 from the location of the branch (or plus 127 and minus 128 from the location after the branch, which was how the destination address was actually computed). Because all PDP-11 instructions were a multiple of 16-bit words, the branch displacement was multiplied by 2, so we are counting words in this paragraph, not bytes!
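The destination calculation can be written out explicitly. In this sketch the function name is invented, branch_address is the byte address of the branch instruction itself, and disp is the raw 8-bit displacement field.

```python
def branch_target(branch_address, disp):
    """Byte address reached by a taken branch."""
    offset = disp - 256 if disp & 0x80 else disp    # sign extend 8 bits
    pc_after = (branch_address + 2) & 0xFFFF        # pc after the fetch
    return (pc_after + 2 * offset) & 0xFFFF         # offset counts words
```

A displacement of all ones (minus 1) therefore branches back to the branch instruction itself, and the largest forward displacement, 127, lands 128 words beyond the branch.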
The PDP-11 had a simple one-operand jump instruction that set the program counter to the effective address of the operand -- obviously, it was illegal to use a register-mode operand because you can't jump to a register. The function call and return instructions were more interesting, but ultimately, they were one of the design mistakes in the PDP-11. No other instructions assumed the use of a particular register as the stack pointer, but the function calling instructions demanded that R6 be used as a stack pointer:
      _______________________________
     |0_0_0_0_1_0_0|_____|_____|_____|
     |             | src |  m     r  |
                              dst

     JSR src,dst      Memory[--sp] = Reg[src]
                      Reg[src] = pc
                      pc = ea(dst)

      _______________________________
     |0_0_0_0_0_0_0_0_1_0_0_0_0|_____|
     |                         | src |

     RTS src          pc = Reg[src]
                      Reg[src] = Memory[sp++]
These function calls not only assume that the programmer needs the previous program counter in a register, but that the programmer insists on using the stack for linkage -- and there are interesting linkage conventions that are not stack based! Fortunately, if the src fields of these instructions are set to 7, implying the program counter itself is the source, one of the register transfers in the call and return becomes a no-op, so you aren't forced to waste a register.
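The register transfers for call and return can be simulated to show why a src field of 7 wastes no register. This is a sketch only: the function names and the register-list and dictionary memory model are assumptions, reg[6] is the stack pointer, reg[7] is the pc, and target stands for the already computed effective address of dst.

```python
def jsr(reg, memory, src, target):
    """JSR src,dst with the destination address already computed."""
    reg[6] = (reg[6] - 2) & 0xFFFF      # Memory[--sp] = Reg[src]
    memory[reg[6]] = reg[src]
    reg[src] = reg[7]                   # Reg[src] = pc (no-op if src is 7)
    reg[7] = target                     # pc = ea(dst)


def rts(reg, memory, src):
    """RTS src."""
    reg[7] = reg[src]                   # pc = Reg[src] (no-op if src is 7)
    reg[src] = memory[reg[6]]           # Reg[src] = Memory[sp++]
    reg[6] = (reg[6] + 2) & 0xFFFF
```

With src equal to 7, the call simply pushes the return address and jumps, and the return simply pops it back into the pc. With any other register, that register holds the return address during the call while its old value waits on the stack.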
An important innovation in the PDP-11 was at the processor-memory-switch level! Previous machines had tended to have special opcodes reserved for input-output, but in the PDP-11, these were eliminated; instead, a single bus, the UNIBUS, was used for access to all memory and devices; addresses from 0 to 56K were used for memory, while addresses in the top 8K bytes of the address space were used for input/output devices.
The 16-bit address space of the PDP-11 seemed modestly large in 1970; most of the market for PDP-8 systems was for machines with 8K 12-bit words of memory, and a PDP-8 with the full 32K 12-bit words had a total memory equal to a PDP-11 with 48K bytes. In 1970, core memory modules were typically made with a capacity of 8K bytes, so a fully loaded 56K machine was made with 7 memory modules.
On the other hand, by 1973, it was obvious that the PDP-11 was a viable competitor with much larger machines, particularly when a floating point coprocessor was included, but to compete effectively, a memory management unit was also required. The UNIBUS actually had 18-bit address and data paths, in part, so that DEC's older 18-bit machines could be re-implemented using PDP-11 technology; the result of that effort was the PDP-15, but the same 18-bit data paths allowed a memory management unit to be built for the PDP-11 that took 16-bit virtual addresses and produced 18-bit physical addresses.
This problem of working with a small virtual address space on a machine with a larger physical address space had already been faced in the PDP-8, where the virtual address was 12 bits and the physical address was 15 bits. With the PDP-11, though, DEC took a new approach to solving this problem:
          _____ _____________ ___________
         |_____|_____________|___________|   virtual address
         seg(3)|  block(7)   |  byte(6)  |
      _______________________ ___________
     |_______________________|___________|   physical address
     |       block(12)       |  byte(6)  |

(The above figure is from memory, I may be wrong.)
The memory management unit divided the virtual address space into 8 segments, where each segment consisted of up to 128 blocks of 64 bytes (32 words). Segment descriptors were 32 bits each, including the base block number of the segment, the block count for the segment, and an assortment of access-rights and bookkeeping bits. Later PDP-11 systems were built using the UNIBUS only for I/O transfers, with a separate memory bus with a larger address space, so the physical address was expanded from 18 to 22 bits with the 11/70 in 1975.
Unfortunately, two incompatible memory management units were introduced, the 11/40 MMU in 1973, with a single address space per process, and the 11/45 MMU in 1972, allowing each process to have separate code and data spaces; the latter idea was an obvious successor of the scheme used on the PDP-8, but it was difficult to use because the division between code and data on the PDP-11 is not straightforward (PDP-11 code frequently contains operands that are addressed as part of the program but are addressed using addressing modes that don't hint at this in an obvious way).
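The translation from virtual to physical address can be sketched as follows. This is a simplified illustration, not a model of the actual hardware: the function name and the two descriptor lists are invented, the width of the byte-in-block field is left as the parameter byte_bits so that the sketch does not depend on the exact field widths, and access rights are ignored.

```python
def translate(vaddr, base_blocks, block_counts, byte_bits):
    """Map a 16-bit virtual address to a physical address.

    base_blocks[s] and block_counts[s] stand in for the descriptor of
    segment s; byte_bits is the width of the byte-in-block field.
    """
    segment = (vaddr >> 13) & 0o7
    block = (vaddr & 0x1FFF) >> byte_bits
    byte = vaddr & ((1 << byte_bits) - 1)
    if block >= block_counts[segment]:
        raise ValueError("reference beyond the end of the segment")
    return ((base_blocks[segment] + block) << byte_bits) | byte
```

Because the base is a block number and not a byte address, the physical address can be wider than the 16-bit virtual address, which is the point of the scheme.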
The Motorola 680x0 family is clearly derived from the PDP-11, but the influence of the PDP-11 goes far beyond this! The C programming language was developed with the PDP-11 as its first target. In fact, the autoincrement and autodecrement addressing modes of the PDP-11 directly inspired the inclusion of these features in C; even the notation (prefix and suffix ++ and --) appears to take its inspiration from the PDP-11 assembly language notation for autoincrement and autodecrement addressing. Only after Unix was ported to the PDP-11 did that system mature, and it was the availability of virtual memory mechanisms on the PDP-11 that led to the growth of Unix into a secure operating system that could support the breadth of applications Unix supports today.
DEC's lead operating system for the PDP-11, RT-11, was not a great system, but the RT-11 command language had a huge influence on CP/M, the first widely used operating system on Intel 8080 based microcomputers; this, in turn, had a significant effect on DOS, which, after purchase by Microsoft, became the foundation from which Windows grew.
The presence of incompatible memory management models and incompatible floating point coprocessors clearly harmed the market for the PDP-11, but PDP-11 sales continue to this day -- DEC dropped the product line in the early 1990's, before Compaq purchased the company, but clones are still in production -- today, you can even buy new PDP-11 compatible coprocessor boards for the IBM PC.
By the mid 1970's, it was obvious that there was a market for high-end PDP-11 systems, and DEC began development of the virtual address extension to the PDP-11 model 78. Eventually, this virtual addressing extension project took over the entire 11/78 effort, leading to a new 32-bit architecture, the VAX architecture, with its first example being the VAX 11/780 (the name and numbering scheme really do tell the story of the origin of this architecture).
Motorola, observing the clean elegance of the PDP-11, used it as the basis for their 680x0 family of microprocessors. While these are 32-bit machines, their instruction set is so similar to that of the PDP-11 that it appears that at some point in the design of the 68000, Motorola may even have intended to make a compatible machine. The production 680x0 family, however, differs in many details, while retaining a basic instruction format that is obviously patterned after the PDP-11. This family gained widespread exposure as the CPU in many early Unix workstations from Sun and HP; it was used in all of the early models of Apple Macintosh computers, and it was used as the basis for the Palm-Pilot and its clones running PalmOS until they switched to using the ARM processor.
Where the Motorola 32-bit architecture looks like it grew from an attempt at compatibility, the VAX architecture is a natural extension of the PDP-11 to a 32 bit word in much the same way that the Data General Nova is a natural extension of the PDP-8 to 16 bits. The 3-bit fields of the PDP-11 instruction set were widened to 4 bits in the VAX, with an 8-bit opcode, 16 addressing modes and 16 registers of 32 bits each. With the added opcodes, it was decided that each instruction could be followed by from 0 to 3 operand fields, allowing fully orthogonal register to register, register to memory and memory to memory instructions, so for example, one-operand add (increment), two-operand add (source added to destination), and 3-operand add (source 1 plus source 2 to destination) were all provided.
Chapter 42 of Siewiorek, Bell and Newell covers the VAX well; this is a reprint of W. D. Strecker's 1978 paper describing this as a new architecture.
http://research.microsoft.com/~gbell/Computer_Structures_Principles_and_Examples/
The VAX instruction set is based on strings of 8-bit instruction syllables, where each instruction begins with a syllable holding just the opcode. The opcode determines two things, first, the operation to be done, and second, the number of operand descriptions that follow. Each operand description begins with a syllable that gives the addressing mode, and the mode, in turn, determines how many syllables follow to complete the operand specifier.
The VAX was a determinedly low-byte first machine, so instructions coded in binary are best read from right to left (as is text in Hebrew or Arabic). This may make the following material look a bit eccentric:
VAX instruction
       ___ _______________
0 to 3 ___|_______________|
operands  |     op(8)     |

VAX short literal operand
           ___ ___________
          |___|___________|
          |   |   x(6)    |
           0 0               short literal       x

VAX simple operand modes
           _______ _______
          |_______|_______|
          |       |  r(4) |
           0 1 0 1           register mode       Reg[r]
           0 1 1 0           register deferred   Memory[Reg[r]]
           0 1 1 1           autodecrement       Memory[--Reg[r]]
           1 0 0 0           autoincrement       Memory[Reg[r]++]
           1 0 0 1           autoinc deferred    Memory[Memory[Reg[r]++]]

VAX operand specifier with 8-bit extension
    ______ _______ _______
    ______|_______|_______|
     x(8) |       |  r(4) |
           1 0 1 0           displacement mode   Memory[Reg[r] + x]
           1 0 1 1           disp deferred       Memory[Memory[Reg[r] + x]]
           1 0 0 0 1 1 1 1   immediate (byte)    x

VAX operand specifier with 16-bit extension
   _______ _______ _______
   _______|_______|_______|
    x(16) |       |  r(4) |
           1 1 0 0           displacement mode   Memory[Reg[r] + x]
           1 1 0 1           disp deferred       Memory[Memory[Reg[r] + x]]
           1 0 0 0 1 1 1 1   immediate (word)    x

VAX operand specifier with 32-bit extension
  ________ _______ _______
  ________|_______|_______|
   x(32)  |       |  r(4) |
           1 1 1 0           displacement mode   Memory[Reg[r] + x]
           1 1 1 1           disp deferred       Memory[Memory[Reg[r] + x]]
           1 0 0 0 1 1 1 1   immediate           x
           1 0 0 1 1 1 1 1   absolute            Memory[x]

VAX operand specifier with 64-bit extension
 _________ _______ _______
 _________|_______|_______|
   x(64)  |       |  r(4) |
           1 0 0 0 1 1 1 1   immediate           x

The immediate and absolute addressing modes shown above are actually special cases of autoincrement and autoincrement deferred addressing using R15, the program counter, as the register field. These follow from the more general modes in the same way that immediate and absolute addressing modes on the PDP-11 follow from the corresponding general modes. Note that the operand size for the immediate mode is determined by the opcode, so the identical address specification is used for immediate byte, word (16-bit), double-word (32-bit) and 64-bit operands.
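Since the first syllable of each operand specifier determines how many syllables follow, a decoder can work out the specifier length from that byte and the operand size implied by the opcode. A minimal Python sketch, with an invented function name, that leaves out the indexed prefix (mode 0100):

```python
def specifier_length(first_byte, operand_size):
    """Total bytes in one operand specifier, given its first byte.

    operand_size is the operand size in bytes implied by the opcode.
    """
    mode, reg = first_byte >> 4, first_byte & 0xF
    if mode < 4:                     # short literal: top two bits are 00
        return 1
    if mode in (5, 6, 7):            # register, deferred, autodecrement
        return 1
    if mode == 8:                    # autoincrement; immediate on R15
        return 1 + (operand_size if reg == 15 else 0)
    if mode == 9:                    # autoinc deferred; absolute on R15
        return 1 + (4 if reg == 15 else 0)
    if mode in (10, 11):             # 8-bit displacement
        return 2
    if mode in (12, 13):             # 16-bit displacement
        return 3
    if mode in (14, 15):             # 32-bit displacement
        return 5
    raise ValueError("mode 4 is the indexed prefix, not handled here")
```

The same first byte, 8F hexadecimal, thus introduces a 2-byte specifier for a byte operand and a 5-byte specifier for a 32-bit operand.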
Note that there is one missing addressing mode that was present on the PDP-11, autodecrement deferred. This was omitted because empirical studies of addressing on the PDP-11 showed that it was never used, and therefore, it was omitted to allow other more important modes.
Where the PDP-11 had only one size of displacement used for indexing, the VAX offered 8, 16 and 32 bit displacements. As the designers of the B 5000 had pointed out over a decade previously, most displacements are small displacements into records, and efficient encoding of these can lead to markedly smaller programs. On the other hand, the designers of the VAX were very anxious not to prevent use of long displacements!
A second important observation the designers of the VAX attended to was the fact that most immediate constants in real programs are very small, values from 1 through 10, and occasionally constants such as 16 and 20 make up the vast majority. Therefore, there will always be a need for efficient encoding of small constants.
Finally, as the designers of the IBM System 360 observed, double indexing has real value. The designers of the VAX went one step beyond what the 360 had done by incorporating scale factors into their indexing scheme:
VAX indexed addressing
      ____ _______ _______
      ____|_______|_______|
      addr|       |  x(4) |
      mode 0 1 0 0           indexed mode        Memory[Reg[x]*Size + ea]
Indexed mode operated as a prefix on any other addressing mode that specified a memory address (as opposed to an immediate or register operand). This used that memory address as a base and added to it the product of the index register and the operand size (taken from the opcode, 1, 2, 4 or 8 bytes).
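The scaled index calculation is simple enough to state as a one-line formula; the Python sketch below just makes the scaling explicit. The function name and arguments are invented for illustration, with base_address standing for the address produced by the specifier that follows the index prefix.

```python
def indexed_address(base_address, index_value, operand_size):
    """Address of element index_value of an array at base_address."""
    if operand_size not in (1, 2, 4, 8):
        raise ValueError("operand size must be 1, 2, 4 or 8 bytes")
    return (base_address + index_value * operand_size) & 0xFFFFFFFF
```

The index register therefore holds an array subscript, not a byte offset, so the same register can index arrays of bytes, words and double words without any explicit shifting.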
The VAX was a huge commercial success, bringing DEC into position to compete head-on with mainframe manufacturers such as IBM; when Unix was ported to the VAX by the University of California at Berkeley (BSD Unix), the VAX quite suddenly became the preferred foundation for Unix, and with the MicroVAX, DEC became a major vendor of Unix-based workstations. DEC's own operating system, VAX VMS, was extremely well developed, and as DEC crumbled in the 1990's, much of the VMS development team moved to Microsoft where they re-implemented most of the good ideas from VMS as Windows NT.
One of the great (and difficult to attribute) quotes of the early 1980's is that the VAX was everyone's second favorite computer architecture. Why wasn't it everyone's favorite?
First, the VAX offered too many choices. A compiler writer working on a VAX based compiler could perform any particular arithmetic operation as a memory to memory operation, a memory to register operation or a register to register operation. An optimizing compiler faced a combinatorial explosion when working out optimal code for an expression of any complexity, and this was both computationally difficult and daunting to develop. In contrast, when there are only a few ways of performing an operation, compilers are much easier to write.
Second, the set of addressing modes was either too large or too small. If you look at the addressing modes offered by programming languages, you find that the set is infinite! An operand in C is addressed as follows:
     constant          -- 6
     simple variable   -- i
     array             -- opr[3]
     field of struct   -- opr.f
     pointer           -- *opr
     field pointer     -- opr->f -- equivalent to (*opr).f
In the above, opr may be another operand (except a constant), so there are an infinite number of possible compositions of these. In contrast, the VAX provides a large finite set of modes, and the problem of composing these modes to construct a general address is difficult, again because there are so many choices.
Third, the extreme orthogonality of the instruction set and addressing modes led to huge numbers of operation combinations that were never used. Why offer efficient encodings of things people never need to do?
Fourth, the VAX went overboard with complex instructions, for example, offering instructions to insert an element in a doubly linked list, and offering a function call mechanism that compared the set of registers used by the function with the set of registers in use by the caller (each described by a 16-bit mask) in order to decide which registers to push on the stack (along with a 16-bit mask describing what had been pushed). In many cases, good optimizing compilers had an extremely hard time recognizing contexts where these instructions were useful, and in the case of function calls, good optimizing compilers (notably GNU C++) could frequently create more efficient function calling sequences by ignoring these complex instructions and using simple primitives.
Finally, the VAX instruction set was extremely difficult to pipeline because the set of operations performed by each instruction was so potentially irregular. Throughout the 1970's, one of the developments that was creeping down from supercomputers into everyday computers was pipelined execution. By 1974, one 32-bit minicomputer, the Modcomp IV, was being made with a shallow pipeline, but DEC had largely ignored this possibility in designing the VAX and PDP-11 families, focusing instead on microprogrammed processor design. By the late 1970's, we were beginning to understand how to apply pipelining ideas to machines with irregular instruction sets like the VAX, but we also knew that it would be much easier to pipeline machines with more regular instruction sets.
It is worth noting that the last two objections to the VAX architecture listed above also apply to all of the descendants of the PDP-11 family, including the Motorola 68000 and (more distantly) the Intel 80x86/Pentium family.