7. Reduction to Microcode, An Example
Part of
the 22C:122/55:132 Lecture Notes for Spring 2004
In general, given a set of register-transfer components that correspond to the operators of an imperative programming language, we can reduce any program in that language to a register-transfer system and a finite-state control unit. We do this in two parts: first, we extract the assignment statements of the program and realize them with registers, multiplexors and functional units; second, we extract the control structure of the program and reduce it to a finite-state machine whose states correspond to groups of assignments that can be carried out in parallel.
The net result of applying the second part of this algorithm is the state table for a (Moore) finite-state machine that drives the register transfer logic developed in the first part.
There is one little detail needed to reduce this to actual hardware: We need to make sure that the register transfers occur at the right times. Typically, this means something like the following:
[Figure: the clock drives the finite state control unit and also fans out to a bank of and gates. The control unit's outputs supply the select inputs of the register transfer logic directly, while each and gate combines the clock with a control unit output to produce one of the clock inputs of the register transfer logic. Status outputs of the register transfer logic feed back as inputs to the control unit.]
This bit of logic guarantees that the clock inputs to the register transfer logic all go to zero between successive clock pulses, even if the same register is assigned to by successive states of the control unit. This is extremely safe if the control unit changes state on the positive-going clock edge and the registers in the register-transfer half of the system are negative-edge triggered. Faster performance is possible if all register transfers happen on the same clock edge as the state changes; because we used and gates above, this is best done by having everything happen on the falling edge of the clock, so that the and gates cut off any changes in the clock inputs to the register transfer logic before the output of the control unit begins to wobble through its state changes.
For the sake of example, consider Euclid's algorithm to compute the greatest common divisor of two numbers. This can be expressed algorithmically as follows:
unsigned int gcd( unsigned int x, unsigned int y )
{
    if (x == y) {
        return x;
    } else if (x > y) {
        return gcd( x-y, y );
    } else /* x < y */ {
        return gcd( x, y-x );
    }
}
The above algorithm is expressed recursively, so our first job is to remove the recursion:
unsigned int gcd( unsigned int x, unsigned int y )
{
    while (x != y) {
        if (x > y) {
            x = x - y;
        } else /* x < y */ {
            y = y - x;
        }
    }
    return x;
}
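As a quick check that the iterative version behaves as expected, here is a small test driver (not part of the original notes) that traces the loop on one example:

#include <stdio.h>

/* trace the iterative gcd computation for the example gcd(12,18) */
int main()
{
    unsigned int x = 12, y = 18;

    while (x != y) {
        printf( "x = %u, y = %u\n", x, y );
        if (x > y) {
            x = x - y;
        } else {
            y = y - x;
        }
    }
    printf( "gcd = %u\n", x );   /* (12,18) -> (12,6) -> (6,6), so this prints 6 */
    return 0;
}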
The next step is to remove the function calls from the program. In this case, we have no main program calling for the greatest common divisor; instead, we have the specification, invented arbitrarily for this assignment, that our machine should compute the greatest common divisor of its two inputs whenever the input ready signal is asserted, and that it should assert result valid whenever the output of the machine holds the result.
/* these variables stand in for external inputs */
unsigned int input_a, input_b;
BOOLEAN input_ready;

/* these are the variables of our machine */
unsigned int x, y;
BOOLEAN result_valid;
    /* the contents of x and y will be equal when result_valid is true */
    /* either x or y can be sampled as the output when this holds */

main()
{
    for (;;) { /* an infinite loop */
        do {
            /* nothing */ ;
        } while (input_ready == FALSE);

        result_valid = FALSE;
        x = input_a;
        y = input_b;

        while (x != y) {
            if (x > y) {
                x = x - y;
            } else /* x < y */ {
                y = y - x;
            }
        }

        result_valid = TRUE;
    }
}
We have written the above with some extra variables added to the declarations up front, standing in for inputs arriving from the outside world, and with mere comments stating which internal variables are visible to the outside world to hold outputs. We had to do this because the programming language view of how input and output are done differs in serious ways from the way mechanisms actually accomplish this.
Having reduced our algorithm to the final form shown above, we can now begin reducing this to register-transfer logic. We begin by extracting all the assignment statements from the code and grouping them by the name of the variable assigned to:
    result_valid = FALSE;
    result_valid = TRUE;

    x = input_a;
    x = x - y;

    y = input_b;
    y = y - x;
Next, for each variable, we use a register, and we use a multiplexor to select which assignment is performed, with appropriate register-transfer functional units substituted for the operators in the expressions. For the result-valid Boolean variable, this gives the following:
[Figure: the constants FALSE and TRUE feed inputs 0 and 1 of a two-input multiplexor; the select input F (result-valid) chooses between them, and the multiplexor output is the data input of the result valid flipflop, which is clocked by C (result-valid). The flipflop output is the result-valid output.]
This example is constructed, deliberately, without any thoughtful optimization! Given the encoding of FALSE as zero and TRUE as one, it is trivial to optimize this as:
[Figure: as above, but with F (result-valid) wired directly to the data input of the result valid flipflop, clocked by C (result-valid); the multiplexor and the constants are gone.]
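To make the behavior of this fragment concrete, here is a small C model of one clock cycle of the result-valid register; the names F_rv and C_rv for the multiplexor select and the clock enable are our own shorthand, anticipating the abbreviations used later, and the sketch is only an illustration of the diagrams above.

#define FALSE 0
#define TRUE  1

int result_valid = FALSE;

/* one clock cycle of the result-valid register, modeled in C */
void clock_result_valid( int F_rv, int C_rv )
{
    /* the unoptimized multiplexor selects between the two constants */
    int next = F_rv ? TRUE : FALSE;

    /* the optimized version wires F_rv straight to the data input;
       that is the same thing, because FALSE is 0 and TRUE is 1:
           int next = F_rv;                                          */

    if (C_rv) {               /* the register loads only when clocked */
        result_valid = next;
    }
}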
We can apply similar reductions to each of the other registers, so for the X register, we get:
[Figure: the x register. A subtractor computes x-y from the register's own output (fed back around) and the value of y; a two-input multiplexor controlled by F (x) selects input a (select = 0) or x-y (select = 1) as the data input of the x register, which is clocked by C (x). The register output is fed back and also serves as the gcd output.]
The diagram for the Y register is very similar, but this raises a question if we try to combine all of these: How do we make an organized drawing of the entire system? Here is one suggestion for the overall structure of the drawing of such a digital system:
 ______________________________________ ____________
|    inputs from the outside world     |            |
|______________________________________|            |
|    distribution network for feedback |            |
|    from internal registers           |  feedback  |
|______________________________________|  data      |
|    combinational register-transfer   |  paths     |
|    elements                          |            |
|______________________________________|            |
|    multiplexors                      |            |
|______________________________________|            |
|    registers                         |            |
|______________________________________|____________|
Following this convention gives us the following diagram for the entire register-transfer half of the system:
[Figure: the complete register-transfer half of the system, drawn with this convention. The outputs of the x and y registers are distributed across the top of the drawing; two subtractors compute x-y and y-x; a multiplexor controlled by F (x) selects between input a and x-y for the x register, a multiplexor controlled by F (y) selects between input b and y-x for the y register, and F (rv) drives the rv flipflop directly; the three registers are clocked by C (x), C (y) and C (rv). The x register output is the gcd output, the y register output is an alternate gcd output, and the rv flipflop output is the result valid output.]
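Before naming the control signals formally, it may help to see the same data paths as a C sketch; one call models one clock cycle, and the argument names F_x, C_x, F_y, C_y, F_rv and C_rv anticipate the abbreviations introduced in the next paragraph. This is only an illustration of the diagram, not part of the design itself.

unsigned int x, y;          /* the two working registers */
int result_valid;           /* the rv flipflop           */

/* one clock cycle of the register-transfer logic, modeled in C */
void clock_cycle( unsigned int input_a, unsigned int input_b,
                  int F_x, int C_x, int F_y, int C_y,
                  int F_rv, int C_rv )
{
    /* the combinational elements: both differences are always computed */
    unsigned int x_minus_y = x - y;
    unsigned int y_minus_x = y - x;

    /* the multiplexors select the data input of each register */
    unsigned int next_x  = F_x ? x_minus_y : input_a;
    unsigned int next_y  = F_y ? y_minus_x : input_b;
    int          next_rv = F_rv;

    /* only the registers that are actually clocked change */
    if (C_x)  x = next_x;
    if (C_y)  y = next_y;
    if (C_rv) result_valid = next_rv;
}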
At this point, we are more than halfway there! We now know that the outputs from the finite state control unit will be exactly Fx, Cx, Fy, Cy, Frv, and Crv, no more and no less. For compactness, we have renamed the result-valid flipflop (and its control signals) with the abbreviated name rv.
What we are missing are the inputs to the control unit! To locate these, we extract, from the final textual version of our program, all of the expressions evaluated in the control structures of the program. These were:
    (input_ready == FALSE)
    (x != y)
    (x > y)
Input-ready is trivial, since this Boolean input to the system is simply passed directly to the control unit, unchanged. The other two are more interesting. We could add special combinational logic to compute these directly from the distribution network for the values of internal registers, as follows:
[Figure: the distributed values of x and y feed four combinational units side by side: a comparator computing x>y, a comparator computing x=y, and the two subtractors computing x-y and y-x; the x>y and x=y outputs head off toward the control unit.]
This, however, is completely unnecessary, because we are already computing both the difference x-y and the difference y-x; between these two differences, we can completely determine equality and the relative magnitudes of the numbers.
If we were dealing with signed numbers, we might be tempted to use the signs of the differences for this comparison. This would fail in the event of an overflow. Instead, we will take advantage of the fact that the subtraction x-y is generally done using two's complement addition, x+~y+1; when this is done, the carry out of the adder means "no borrow out of the most significant bit", which, in turn, means x>=y. Therefore, taking both the differences x-y and y-x gives us carry signals indicating x>=y and y>=x, and we can get all the possible comparisons of x and y from simple functions of these two carry signals.
[Figure: the final register-transfer half of the system. The input ready line passes straight through to the control unit as IR. The carry outputs of the two subtractors are tapped: the carry out of y-x, inverted, gives x>y, and the and of the two carries gives x=y. The multiplexors, the x, y and rv registers, and the gcd, alternate gcd and result valid outputs are exactly as in the previous drawing.]
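The carry trick is easy to verify in C. The sketch below does each 16-bit subtraction in 32 bits so that the carry out is visible as bit 16; the helper function names are ours, not part of the notes.

#include <stdint.h>

/* the carry out of x + ~y + 1, made visible by doing the 16-bit
   subtraction in 32 bits; the carry is 1 exactly when there is no
   borrow, that is, when x >= y                                      */
int carry_of_difference( uint16_t x, uint16_t y )
{
    uint32_t sum = (uint32_t)x + (uint16_t)~y + 1;
    return (sum >> 16) & 1;
}

int x_greater_than_y( uint16_t x, uint16_t y )
{
    return !carry_of_difference( y, x );    /* x>y  iff  not (y >= x) */
}

int x_equal_to_y( uint16_t x, uint16_t y )
{
    return carry_of_difference( x, y )      /* x >= y ...             */
         & carry_of_difference( y, x );     /* ... and y >= x         */
}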
The control unit can now be described! We know that the outputs of the control unit are Fx, Cx, Fy, Cy, Frv, and Crv, and we know that the inputs are IR, x>y and x=y. The truth tables for the control unit will therefore have the following structure:
  this               | next         this  | F C  F C  F  C
 state  IR  x>y  x=y | state       state  | x x  y y  rv rv
 --------------------|-------      -------|----------------------
All we need to do is fill in these tables. To do so, we examine the algorithm again, collecting the assignment statements into groups that can be carried out in parallel using the data paths outlined at the end of the previous section. These groups are identified by the labels in the following restatement of the algorithm:
main()
{
    for (;;) { /* an infinite loop */
        do {
A:          /* nothing */ ;
        } while (input_ready == FALSE);

B1:     result_valid = FALSE;
B2:     x = input_a;
B3:     y = input_b;

        while (x != y) {
            if (x > y) {
C:              x = x - y;
            } else /* x < y */ {
D:              y = y - x;
            }
        }

E:      result_valid = TRUE;
    }
}
Each letter used above corresponds to one state of the finite state control unit; the labels B1, B2 and B3 all belong to the single state B. In state A no assignments take place; in state B, we have three parallel assignments; and in states C, D and E, there is one assignment each. We can formalize this in the table that relates the outputs of our finite state control unit to the current state:
  this  | F C  F C  F  C
 state  | x x  y y  rv rv
 -------|----------------------
   A    | - 0  - 0  -  0
   B    | 0 1  0 1  0  1
   C    | 1 1  - 0  -  0
   D    | - 0  1 1  -  0
   E    | - 0  - 0  1  1
In the above truth table, we have not provided binary encodings for the state names, and we have used dashes in the output columns to indicate don't care values. As a general rule, if a register is not clocked during some clock cycle of the finite state control unit, the multiplexor controls and function select controls that govern the data inputs to that register do not matter. Recording don't care values is important when an effort is made to optimize the implementation of a Boolean function, so it is worth marking them whenever the door to optimization is to remain open.
The second state table we must fill out follows directly from the control structure for our algorithm; we can express this as a picture, as follows:
[State diagram: state A loops to itself while ir=0 and moves to B when ir=1; from each of B, C and D, the machine moves to C when x>y, to D when y>x, and to E when x=y; E returns unconditionally to A.]
There is nothing particularly pretty about this state diagram! The inner loop of the original code has been thoroughly obscured by the fact that we only have states for each block of assignment statements, and not for control structures such as the while loop and the if statement. Without these extra states, states B, C and D all test for loop termination, and states C and D are connected in what might be described as a figure-eight loop. This state diagram reduces directly to the following truth table for the next-state function:
  this               | next
 state  IR  x>y  x=y | state
 --------------------|-------
   A     0   -    -  |   A
   A     1   -    -  |   B
   B     -   1    -  |   C
   B     -   0    0  |   D
   B     -   0    1  |   E
   C     -   1    -  |   C
   C     -   0    0  |   D
   C     -   0    1  |   E
   D     -   1    -  |   C
   D     -   0    0  |   D
   D     -   0    1  |   E
   E     -   -    -  |   A
Again, we have used dashes to indicate don't care values on the inputs. When mechanically reducing a truth table that has don't care inputs, neither that input nor its inverse needs a connection to the row decoder for that row of the table. Digital logic optimization is largely concerned with algorithms for combining rows of the truth table by identifying inputs that can be treated as don't care values.
We can combine the above two truth tables into one table without changing any of the information expressed. If we do it as follows, we make clear that this table still describes a Moore machine:
  this               | next    F C  F C  F  C
 state  IR  x>y  x=y | state   x x  y y  rv rv
 --------------------|------------------------------
   A     0   -    -  |   A     - 0  - 0  -  0
   A     1   -    -  |   B
                     |
   B     -   1    -  |   C     0 1  0 1  0  1
   B     -   0    0  |   D
   B     -   0    1  |   E
                     |
   C     -   1    -  |   C     1 1  - 0  -  0
   C     -   0    0  |   D
   C     -   0    1  |   E
                     |
   D     -   1    -  |   C     - 0  1 1  -  0
   D     -   0    0  |   D
   D     -   0    1  |   E
                     |
   E     -   -    -  |   A     - 0  - 0  1  1
In the above truth table, all rows describing transitions out of the same state have been grouped into a block, and for each block, only one set of outputs is given.
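Filling in these tables really does feel like programming, a point we return to below. Purely as an illustration, here is the same combined table written as C case analysis; the don't care outputs are arbitrarily set to 0 here.

enum state { A, B, C, D, E };

/* the next-state half of the table, as a case analysis on the state */
enum state next_state( enum state s, int ir, int x_gt_y, int x_eq_y )
{
    switch (s) {
    case A:  return ir ? B : A;
    case B:
    case C:
    case D:  return x_gt_y ? C : (x_eq_y ? E : D);
    case E:  return A;
    }
    return A;  /* not reached */
}

/* the output half of the table; a Moore machine, so outputs depend
   only on the current state                                          */
void control_outputs( enum state s,
                      int *F_x, int *C_x, int *F_y, int *C_y,
                      int *F_rv, int *C_rv )
{
    *F_x = *C_x = *F_y = *C_y = *F_rv = *C_rv = 0;
    switch (s) {
    case A:                                     break;
    case B:  *C_x = 1; *C_y = 1; *C_rv = 1;     break;
    case C:  *F_x = 1; *C_x = 1;                break;
    case D:  *F_y = 1; *C_y = 1;                break;
    case E:  *F_rv = 1; *C_rv = 1;              break;
    }
}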
We can assign arbitrary Boolean encodings for the state names used here! Sometimes, the optimization of the logic for the finite-state control unit is simplified by careful encoding of the state names, but since we are largely ignoring such issues in this class, we will not bother with this.
The important thing to note is that, aside from questions of optimization, the reduction of an algorithm to a digital system that executes that algorithm is a mechanical process. Most of the information you need to evaluate the performance of that digital system is actually apparent in the final version of the source code for the algorithm, particularly if it is annotated to show the statements that can be executed in parallel.
Filling in the contents of this state table has a strong resemblance to programming! It is, in fact, quite reasonable to view the part of the truth table dealing with each state as a kind of machine instruction, where next-state fields of the table give the control structure part of the behavior of that instruction, while the control unit outputs to the data part of the system give the computational part of the instruction.
If we think in these terms, the register-transfer logic of the digital system can be thought of as defining the instruction set of the control unit, and the actual job of working out the state table for that control unit can be thought of as programming in that instruction set. When the register transfer logic is designed with the intent of allowing general purpose computation, as opposed to being designed to meet the needs of one particular algorithm, we refer to the entire system as a microprogrammable system, and we refer to the contents of the state table of the control unit as the microprogram (µprogram, for short). This terminology is particularly applicable when the state table for the control unit is stored in a dedicated fast RAM or in some kind of programmable ROM (a PROM, generally; for example, an EEPROM or Flash EEPROM), so that the particular algorithm executed by the digital system can be changed after the system has been manufactured.
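To make the microprogramming view concrete, here is one possible C rendering of the state table as a little control store; the word format below, with six control bits, a wait-for-input-ready flag and three successor fields, is invented for this example and is not a standard microinstruction format.

/* a hypothetical control store: one word per state, holding the six
   data-path control bits, a "wait for input ready" flag, and the
   successor state for each outcome of the comparison of x and y      */
enum state { A, B, C, D, E };

struct microinstruction {
    unsigned F_x:1, C_x:1, F_y:1, C_y:1, F_rv:1, C_rv:1;
    unsigned wait_for_ir:1;          /* hold this state until IR = 1  */
    unsigned char next_gt;           /* successor when x > y          */
    unsigned char next_eq;           /* successor when x = y          */
    unsigned char next_lt;           /* successor when y > x          */
};

static const struct microinstruction uprogram[] = {
    /*        Fx Cx Fy Cy Frv Crv  wait  x>y  x=y  y>x */
    /* A */ {  0, 0, 0, 0, 0,  0,   1,    B,   B,   B  },
    /* B */ {  0, 1, 0, 1, 0,  1,   0,    C,   E,   D  },
    /* C */ {  1, 1, 0, 0, 0,  0,   0,    C,   E,   D  },
    /* D */ {  0, 0, 1, 1, 0,  0,   0,    C,   E,   D  },
    /* E */ {  0, 0, 0, 0, 1,  1,   0,    A,   A,   A  },
};

/* the microsequencer: pick the address of the next microinstruction */
enum state step( enum state s, int ir, int x_gt_y, int x_eq_y )
{
    const struct microinstruction *u = &uprogram[s];
    if (u->wait_for_ir && !ir) return s;          /* spin in state A   */
    return x_gt_y ? u->next_gt : (x_eq_y ? u->next_eq : u->next_lt);
}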