The Ultimate RISC paper presents the following bus for the Ultimate RISC:
    Data, 16 bits, bidirectional tristate  ===================
    Address, 16 bits, from IEU             ===================
    Write, 1 bit, unidirectional           -------------------
    Read, 1 bit, unidirectional            -------------------

A pulse on the write line transfers data to the attached memory or register if the address matches the address for that memory.
When the read line is high, the contents of the addressed memory or register are transferred to the data line.
A single register R at address A on the Ultimate RISC bus would look like the following:
    Data, 16 bits    =========================o======o===
    Address, 16 bits ===o=====================|======|===
    Write, 1 bit,    ---|----o----------------|------|---
    Read, 1 bit,     ---|----|-o--------------|------|---
                      __|__  | |   ___        |      |
                     | = A | |  --|AND|-------|-----/_\
            address  |_____| |  --|___|       |      |
            decoder     |    | |              |      |
                         ----|-o              |      |
                             | |   ___     ___|___   |
                             |  --|AND|---|>  R   |  |
                              ----|___|   |_______|  |
                                              |      |
                                               ------
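The behavior of this register can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: the class name `BusRegister` and its methods are hypothetical, and returning `None` stands in for a tri-stated (undriven) output.

```python
class BusRegister:
    """Behavioral sketch of one 16-bit register R at bus address A."""

    def __init__(self, address):
        self.address = address & 0xFFFF
        self.value = 0

    def write(self, address, data):
        # A write pulse latches the data lines only if the address matches.
        if address == self.address:
            self.value = data & 0xFFFF

    def read(self, address):
        # While read is high, a matching register drives the data lines.
        # None models the tri-state driver being disabled (an assumption
        # of this sketch, not something the bus itself defines).
        if address == self.address:
            return self.value
        return None


r = BusRegister(0x1234)
r.write(0x1234, 0xBEEF)  # address matches: R is loaded
r.write(0x1235, 0x0000)  # address differs: R is unchanged
```

Reading address 0x1234 then returns the stored value, while reading any other address leaves the data lines undriven.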
An output-only Centronics printer interface (an IBM PC compatible parallel port) would look like the following:
[Diagram: the same bus, address decoder, AND gates, and register R as in the single-register circuit above, with the register output and the read driver input routed to a printer connector. Legible labels: two 16-bit paths (/16), connector line groups of 5, 8, 4, and 8 lines, and a tri-state driver on one 8-line group.]

The printer or other device attached to the I/O interface is responsible for interpreting the signals from the host computer, but the following interface documentation gives typical interpretations:
            __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
    output |\/|\/|OE|\/|SL|IN|LF|ST|  |  |  |  |  |  |  |  |
           |__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|
           unused|  |  |  control  |       data out        |

    OE -- enable tri-state driver for data to connector
    SL -- select output to device
    IN -- initialize device
    LF -- control a device option (linefeed on printer)
    ST -- strobe to device

            __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
    input  |BS|AK|PE|OL|ER|\/|\/|\/|  |  |  |  |  |  |  |  |
           |__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|
           |    status    | unused |        data in        |

    BS -- device busy
    AK -- device acknowledges strobe pulse
    PE -- out of paper
    OL -- device on line
    ER -- device error

The designer of the interface is completely uninterested in all of the inputs and outputs other than the OE line. The designer of the printer itself and the designer of the software to control the printer must, however, agree on the use of the other lines. The definitions given above are compatible with a large number of IBM PC parallel port interfaces, except that the PC interface is based on 8-bit registers, with distinct addresses defined for data in and data out to each.
This design allows something that Centronics didn't anticipate, but that is a common feature on PC compatibles: the use of tri-state data lines in the parallel port, so that they may be used for both input and output.
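As an illustration of how software might use these two register layouts, here is a minimal Python sketch. It assumes that the leftmost cell of each layout is bit 15 and the rightmost is bit 0, which the text does not state, and the function names are hypothetical. Signal polarities (for example, active-low strobes on real printers) are not modeled.

```python
# Output register fields (assumed numbering: leftmost cell = bit 15).
OE = 1 << 13  # enable tri-state driver for data to connector
SL = 1 << 11  # select output to device
IN = 1 << 10  # initialize device
LF = 1 << 9   # device option (linefeed on printer)
ST = 1 << 8   # strobe to device

# Input register fields, same assumed numbering.
BS = 1 << 15  # device busy
AK = 1 << 14  # device acknowledges strobe pulse
PE = 1 << 13  # out of paper
OL = 1 << 12  # device on line
ER = 1 << 11  # device error


def output_word(data, control=0, drive=True):
    """Build the 16-bit word to write: 8 data bits plus control bits."""
    word = (data & 0xFF) | control
    if drive:
        word |= OE  # drive the data lines toward the connector
    return word


def decode_status(word):
    """Split the 16-bit word that was read into status flags and data."""
    return {
        "busy": bool(word & BS),
        "ack": bool(word & AK),
        "paper_out": bool(word & PE),
        "on_line": bool(word & OL),
        "error": bool(word & ER),
        "data_in": word & 0xFF,
    }


word = output_word(ord("A"), control=ST)
status = decode_status(BS | OL | 0x07)
```

Clearing `drive` leaves OE off, which is what lets the same data lines be used for input.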
If we have a 16 bit address bus, we have, at most, 64K of address space. Consider the following design of a 16K memory module for this bus:
[Diagram: a 16Kx16 memory on the bus, with a 14-bit address input (Addr), data in, and data out through a tri-state driver. Its address decoder consists of two resistors R connected to High (+5), two switches S0 and S1 connected to Low (0), two equivalence gates (EQU), and AND gates that combine the comparison result with the Read and Write lines.]

This diagram shows the details (at an electrical level) of the address decoder for a typical memory such as this. The pair of equivalence gates (EQU) and the AND gate make the comparison between the 2 most significant bits of the address on the bus and the 2-bit value set on the switches (or jumpers) S0 and S1. If the switch is closed (or the jumper is present), the value input to the comparator is 0, while if the switch is open (or the jumper has been cut) the value is 1. An open switch allows the resistor R (usually a few thousand ohms) to pull the input to the comparator up to logic 1, typically 5 volts, while a closed switch short circuits the input to logic zero, typically ground or 0 volts.
Many devices have option select or address select jumpers or switches that operate as outlined above.
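The comparison made by the memory module's address decoder can be sketched as follows. This is an illustrative model, not the circuit itself: the function names are hypothetical, and the pairing of S1 with address bit 15 and S0 with address bit 14 is an assumption, since the text does not say which switch goes with which bit.

```python
def switch_value(closed):
    """Closed switch shorts the input to ground (0); open is pulled up (1)."""
    return 0 if closed else 1


def equ(a, b):
    """Equivalence gate: 1 when both inputs are equal."""
    return 1 if a == b else 0


def module_selected(address, s0_closed, s1_closed):
    """True when the top 2 address bits match the switch settings."""
    a15 = (address >> 15) & 1
    a14 = (address >> 14) & 1
    # Assumed pairing: S1 with bit 15, S0 with bit 14.
    return bool(equ(a15, switch_value(s1_closed)) &
                equ(a14, switch_value(s0_closed)))


def module_offset(address):
    """The low 14 bits select one of the 16K words inside the module."""
    return address & 0x3FFF


# With both switches closed, the module answers to 0x0000 through 0x3FFF.
first_quarter = module_selected(0x3FFF, True, True)
# Opening S0 moves it (under the assumed pairing) to 0x4000 through 0x7FFF.
second_quarter = module_selected(0x4000, False, True)
```

Four such modules, each with a different switch setting, would fill the whole 64K address space.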
Adding multiple registers to the ALU allows a far more interesting programming environment than is possible with the single register of the original Ultimate RISC. Of course, merely adding extra ALU modules to the bus, each with a different address, will make the system useful, but this means duplicating the entire ALU instead of just adding registers. A good design for a multi-register ALU for the Ultimate RISC might be:
[Diagram: a multi-register ALU on the bus. Legible elements: an address decoder marked "=X"; two 4-bit fields taken from the address lines; an ALU with inputs a and b, a function select input f, and output f(a,b); a 16x16 register file with Addr and out connections; a block g(x) with input x; a tri-state driver onto the data lines; and AND gates combining the decoder output with the Read and Write lines.]

The minimum necessary set of ALU functions is:
The decision to use 16 registers in this ALU was arbitrary, but it is based on the fact that there are many successful 16 register machines. The decision to use 16 operations for this ALU was far more arbitrary. It is easy to find about 8 operations that are useful on reading and writing the ALU, but it is a challenge to find 16.
Multi-function ALUs can be built many ways, but at heart, all are based on a full adder:
    A  B  Cin |  S  Cout
    ----------+---------
    0  0   0  |  0   0
    0  0   1  |  1   0
    0  1   0  |  1   0
    0  1   1  |  0   1
    1  0   0  |  1   0
    1  0   1  |  0   1
    1  1   0  |  0   1
    1  1   1  |  1   1

To add two 2-bit numbers A and B, we string these together as follows:
               A1 B1          A0 B0
                | |    ____    | |    ____Cin
               _|_|___|_   |  _|_|___|_
              | A B Cin |  | | A B Cin |
              |         |  | |         |
              | Cout  S |  | | Cout  S |
              |_________|  | |_________|
             ____|    |    |____|    |
         Cout         |              |
                      S1             S0

Of course, the "ripple carry" logic used here makes the adder speed rather poor if the word size is large, but, as it turns out, increasing the adder speed is an independent problem from increasing its functionality.
To increase its functionality, we must look at the implementation of the adder. Typically, the function of each full adder is explained as:
    Cout = (A and B) or (A and Cin) or (B and Cin)
    S    = A xor B xor Cin

High speed adder design rests on improving the speed of the carry chain, that is, computing Cout quickly. The logic for the sum S, however, remains largely unchanged in high speed adders, and this is where we will concentrate our effort in improving functionality.
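The two equations for the full adder, and the way full adders are strung together into a ripple-carry adder, can be checked with a short Python sketch. The function names and the `width` parameter are illustrative choices, not part of the original design.

```python
def full_adder(a, b, cin):
    """One-bit full adder, written directly from the two equations."""
    cout = (a & b) | (a & cin) | (b & cin)
    s = a ^ b ^ cin
    return s, cout


def ripple_add(a, b, cin=0, width=16):
    """Chain `width` full adders; each carry out feeds the next carry in."""
    result = 0
    carry = cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry  # (sum bits, final carry out)


# The 2-bit case from the text: 3 + 1 overflows into the carry out.
two_bit = ripple_add(3, 1, width=2)
```

The loop makes the speed problem visible: bit i cannot be computed until the carry from bit i-1 is known.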
The underlying logic function used in sum computation is the exclusive or. One of the most common implementations of the exclusive or function in digital logic is the following:
                               ____
        A ---o----------------|    |
             |                |NAND|---
             |    ____     ---|____|   |    ____
              ---|    |   |             ---|    |
                 |NAND|---o                |NAND|--- A xor B
              ---|____|   |    ____     ---|____|
             |             ---|    |   |
             |                |NAND|---
        B ---o----------------|____|

We can augment this basic circuit with the following auxiliary inputs:
                               ____
                     Aenab ---|    |
        A ---o----------------|NAND|---
             |    ____     ---|____|   |    ____
              ---|    |   |             ---|    |
        Xenab ---|NAND|---o                |NAND|--- F
              ---|____|   |    ____     ---|____|
             |             ---|    |   |
        B ---o----------------|NAND|---
                     Benab ---|____|

Having done this, we can view the result as computing 8 different functions of A and B. These are:
    Xenab  Aenab  Benab | Function
    --------------------+----------------
      0      0      0   | F = 0
      0      0      1   | F = B
      0      1      0   | F = A
      0      1      1   | F = A or B
      1      0      0   | F = 0
      1      0      1   | F = B and not A
      1      1      0   | F = A and not B
      1      1      1   | F = A xor B

If we now build the sum component for one bit of the adder as follows, we get a general purpose ALU:
               A        B   Cin
               |        |    |
    Bnot     --|----    |    |
               |   _|___|_   |
               |  |       |  |
               |  |  XOR  |  |
               |  |___ ___|  |
               |__    |      |
                 _|___|_     |
                | A   B |    |
    Xenab    ---|Xenab  |    |
    Aenab    ---|Aenab  |    |
    Benab    ---|Benab  |    |
                |___F___|    |
                    |    ____|
                   _|___|_
                  | A   B |
               ---|Xenab  |
              | 1-|Aenab  |
    Cenab    -o---|Benab  |
                  |___F___|
                      |
                      F

An ALU based on a complete full adder using this logic now performs the following set of useful functions, along with many useless functions:
    Xenab  Aenab  Benab  Cenab  Bnot | Function
    ---------------------------------+------------------------
      0      0      0      0     0   | F = 0
      0      1      1      0     0   | F = A or B
      1      0      1      0     0   | F = B and not A
      1      1      0      0     0   | F = A and not B
      1      1      1      0     0   | F = A xor B
      1      1      1      1     0   | F = A + B + Cin
      0      1      1      0     1   | F = A or not B
      1      1      0      0     1   | F = A and B
      1      1      1      1     1   | F = (A - B - 1) + Cin

Of course, we don't usually want to use 5 control signals to select among 9 useful alternatives, particularly when some of those alternatives are of only marginal utility, so in a practical system, we will typically select eight of these, with a bit of logic (3 inputs and 5 outputs) to select the useful functions.
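The table of functions can be checked by simulating the logic at the gate level. The Python sketch below rests on several assumptions that should be read as such: the second stage is taken to have Aenab tied to 1 with Cenab driving both Xenab and Benab, which is one reading of the diagram; the carry chain is assumed to see the same (possibly inverted) B as the sum logic, since the text does not show the carry logic; and all names are illustrative.

```python
def nand(*inputs):
    """N-input NAND on single bits."""
    return 0 if all(inputs) else 1


def function_unit(a, b, xenab, aenab, benab):
    """The augmented exclusive-or circuit, built from four NAND gates."""
    mid = nand(a, b, xenab)
    return nand(nand(aenab, a, mid), nand(mid, b, benab))


def sum_bit(a, b, cin, xenab, aenab, benab, cenab, bnot):
    """Sum logic for one bit: optional inversion of B, then two units."""
    first = function_unit(a, b ^ bnot, xenab, aenab, benab)
    # Assumed second-stage wiring: Aenab = 1, Xenab = Benab = Cenab.
    return function_unit(first, cin, cenab, 1, cenab)


def carry_bit(a, b, cin, bnot):
    """Ordinary full-adder carry, assumed to use the inverted B."""
    b ^= bnot
    return (a & b) | (a & cin) | (b & cin)


def alu(a, b, cin, controls, width=16):
    """Ripple the bit slice across a word for one control setting."""
    xenab, aenab, benab, cenab, bnot = controls
    result, carry = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= sum_bit(ai, bi, carry, *controls) << i
        carry = carry_bit(ai, bi, carry, bnot)
    return result


# Control settings (Xenab, Aenab, Benab, Cenab, Bnot) from the table.
OPS = {
    "or": (0, 1, 1, 0, 0),
    "xor": (1, 1, 1, 0, 0),
    "add": (1, 1, 1, 1, 0),
    "and": (1, 1, 0, 0, 1),
    "sub": (1, 1, 1, 1, 1),
}

total = alu(5, 3, 0, OPS["add"])       # A + B + Cin
difference = alu(5, 3, 1, OPS["sub"])  # (A - B - 1) + Cin with Cin = 1
```

Setting Cin to 1 in the subtract row gives the usual two's complement A - B.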