Recall the MINIMAL Ultimate RISC code for A[I] = 5:
                XA      \ move the address of A to the accumulator
                acc     /
                I       \ add the value of I to the accumulator
                add     /
                acc     \ store accumulator to location labeled xxx
                xxx     /
                X5      \ copy the constant 5 (labeled X5) to memory
        xxx:    ---     /

Recall that, when this was rewritten for the Pipelined Ultimate RISC, we had to add 3 delay slots (1 for the operand delay and 2 for the self-modifying delay).
Problem: Consider adding similar forwarding logic to compare the memory address used by the operand store stage of the pipeline to the two addresses issued by the instruction fetch stage of the pipeline.
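To make the comparison concrete, here is a minimal behavioral sketch in Python of one comparator-plus-multiplexer pair applied to both instruction fetch addresses. It is only an illustration of the idea of address-matched forwarding, not the diagram asked for below. The names forward, fetch_instruction, store_valid and pc are hypothetical, and the sketch assumes that an instruction occupies two consecutive words (source address, then destination address).

```python
def forward(read_addr, mem_value, store_addr, store_value, store_valid):
    """Model one comparator and one 2-way multiplexer.

    If the operand store stage is writing (store_valid) to the same
    address that is being read, select the value being stored instead
    of the possibly stale value read from memory.
    """
    hit = store_valid and read_addr == store_addr   # comparator
    return store_value if hit else mem_value        # multiplexer


def fetch_instruction(memory, pc, store_addr, store_value, store_valid):
    """Fetch both words of one move instruction with forwarding.

    Assumption: the source address is at pc and the destination
    address is at pc + 1. Each fetch gets its own comparator.
    """
    src = forward(pc, memory[pc], store_addr, store_value, store_valid)
    dst = forward(pc + 1, memory[pc + 1], store_addr, store_value, store_valid)
    return src, dst
```

In this model, a store that is still in flight to the word labeled xxx would be seen by an instruction fetch of that same word, which is the situation the self-modifying delay slots were guarding against.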
Part A: Give a detailed diagram of the Pipelined Ultimate RISC instruction execution unit showing all comparators, multiplexors and other parts required to implement both the operand forwarding logic mentioned in the background and the forwarding logic discussed in the problem statement.
Part B: Which of the delay slots required for the A[I] = 5 example are eliminated by the addition of the above forwarding logic? Do any delay slots remain?