Homework 5

22C:122, Fall 1998

Due Monday Sept 28, 1998, in class

Douglas W. Jones

Figure 3.4 in the text gives the pipeline diagram for Hennessy and Patterson's DLX architecture. This diagram does not clearly document the fields of the interstage registers; it merely shows them as big grey rectangles. Give a reasonable breakdown of the fields in each interstage register, noting how big you would expect each field to be.
Background: Result forwarding logic can be used to eliminate delay slots. For example, in the case of the Pipelined Ultimate RISC, we can add logic to compare the memory addresses being used by the operand load and operand store stages of the pipeline; if these addresses are equal, we can ignore the value from memory and instead use the value being stored by the store stage, and thus eliminate the operand delay slot!
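As a rough illustration of the comparison described in the background, the operand forwarding can be sketched in software. This is only a behavioral model under stated assumptions, not the hardware design: the function name `load_operand`, its parameters, and the `store_valid` flag are hypothetical names introduced here.

```python
def load_operand(memory, load_addr, store_addr, store_value, store_valid=True):
    """Return the operand seen by the operand load stage.

    memory      -- list or dict standing in for main memory
    load_addr   -- address issued by the operand load stage
    store_addr  -- address held by the operand store stage
    store_value -- value the store stage is about to write
    store_valid -- hypothetical flag: the store stage holds a real store
    """
    # Comparator: do the two stages refer to the same memory location?
    if store_valid and load_addr == store_addr:
        # Multiplexor selects the in-flight value; the memory read is ignored.
        return store_value
    # No match: use the value that came back from memory.
    return memory[load_addr]


memory = {0x10: 1, 0x20: 2}
forwarded = load_operand(memory, 0x10, 0x10, 99)  # addresses match
from_memory = load_operand(memory, 0x10, 0x20, 99)  # addresses differ
```

In this model the comparator and multiplexor together stand in for the stale memory read that would otherwise force a delay slot.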
Recall the MINIMAL Ultimate RISC code for A[I] = 5:
```
	     XA  \ move the address of A to the accumulator
	     acc /
	     I   \ add the value of I to the accumulator
	     add /
	     acc \ store accumulator to location labeled xxx
	     xxx /
	     X5  \ copy the constant 5 (labeled X5) to memory
	xxx: --- /
```
Recall that, when this was rewritten for the Pipelined Ultimate RISC, we had to add 3 delay slots (1 for operand delay, and 2 for self-modifying delay).
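For reference, the sequential (unpipelined) behavior of the four moves above can be sketched as a small interpreter. This is a simplified model that ignores delay slots entirely; the addresses chosen for `ACC`, `ADD`, the data words, and the label `xxx` are hypothetical, as is the value of I.

```python
# Sequential sketch of the move machine; pipelining and delay slots are not modeled.
ACC = 60  # hypothetical address: a write loads the accumulator, a read returns it
ADD = 61  # hypothetical address: a write adds the value to the accumulator


def run(memory, pc, count):
    """Execute `count` two-word move instructions (source, destination)."""
    acc = 0
    for _ in range(count):
        src, dst = memory[pc], memory[pc + 1]
        value = acc if src == ACC else memory[src]
        if dst == ACC:
            acc = value
        elif dst == ADD:
            acc += value
        else:
            memory[dst] = value
        pc += 2
    return acc


memory = [0] * 64
XA, I, X5, A, XXX = 20, 21, 22, 30, 7  # hypothetical memory layout
memory[XA] = A  # XA holds the address of A
memory[I] = 2   # assumed index value
memory[X5] = 5  # the constant 5
# The program occupies words 0..7; word 7 is the destination field labeled xxx.
memory[0:8] = [XA, ACC, I, ADD, ACC, XXX, X5, 0]
run(memory, 0, 4)
```

The third move overwrites the destination word of the fourth move before the fourth is fetched, which is the self-modification that the pipelined version has to wait for.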
Problem: Consider adding similar forwarding logic to compare the memory address used by the operand store stage of the pipeline to the two addresses issued by the instruction fetch stage of the pipeline.
Part A: Give a detailed diagram of the Pipelined Ultimate RISC instruction execution unit showing all comparators, multiplexors and other parts required to implement both the operand forwarding logic mentioned in the background and the forwarding logic discussed in the problem statement.
Part B: Which of the delay slots required for the A[I] = 5 example are eliminated by the addition of the above forwarding logic? Do any delay slots remain?
Refer to section 3.3 in the text. Which of the delay slots we have dealt with in the previous problem are the result of structural hazards, which are the result of data hazards, and which are the result of control hazards?
Our Pipelined Ultimate RISC design applies a considerable amount of hardware to speeding the Instruction Execution Unit, but we have done nothing to speed the operation of the ALU! Is this realistic? To answer this question, use what you know about the relative speeds of ALUs and memory chips, based on your elementary computer architecture course and on what you find in Hennessy and Patterson.