Homework 4 Solutions

22C:122, Fall 1998

Background: Consider a pipelined implementation of the Minimal Ultimate RISC where each instruction execution cycle has the following semantics:

repeat
   M[dst] = tmp;
   tmp = M[src];
   src = M[pc];
   dst = M[pc + 1];
   pc = pc + 2
forever

Part A: This machine's instruction set differs from that of the Minimal Ultimate RISC discussed in class because there is no dst' register. As a result, if we look at an instruction sequence using a pipeline diagram, we find that the src and dst addresses for a single move are not fetched from the same instruction:

               ----|-------------|-------------|-------------|-
         src1/dst1 | src=src1    | tmp=M[src1] | M[dst2]=tmp |
                   | dst=dst1    |             |             |
               ----|-------------|-------------|-------------|-
         src2/dst2 |             | src=src2    | tmp=M[src2] |
                   |             | dst=dst2    |             |
               ----|-------------|-------------|-------------|-
         src3/dst3 |             |             | src=src3    |
                   |             |             | dst=dst3    |
               ----|-------------|-------------|-------------|-
                       cycle 1       cycle 2       cycle 3

Examining the above, it is clear that, to move A to B, one must write the following code:

         1: word A
         2: word ... store previous operand
         3: word ... fetch next operand
         4: word B

So, we can write code to compute A = B - C as follows, using XXXX, YYYY and ZZZZ as filler memory addresses for the strangeness introduced by this instruction set.

	    word  B
             word   XXXX
            word  C
	     word acc
            word    YYYY
             word sub
            word  acc
             word   YYYY
            word    ZZZZ
             word A

Part B: Here is a register-transfer implementation of this instruction execution unit, ignoring the problem of branch instructions.

              _____________
             |  _________  |
          __\|_|/__    __|_|__
       --|>___PC___|  |__+2___| 
      |      | |        /| |\                  
      |      | |_________| |__________________\ read
      |      |______________________   _______  address
      |                   __________| |_______/
CLK --o                  |  ________| |_______  data
      |                  | |      _\| |/_     \
      |                  | |     |__+1___|
      |                  | |        | |_______\ read
      |                  | |        |_________  address
      |       ___________| |__________________/
      |      |  _________| |__________________  data
      |      | |         | |                  \
      |   __\|_|/__   __\|_|/__
      o--|>__DST___|-|>__SRC___| 
      |      | |         | |__________________\ read
      |      | |         |____________________  address
      |      | |          ____________________/
      |      | |         |  __________________  data
      |      | |      __\| |/__               \
      o------| |-----|>__TMP___|
      |      | |_________| |__________________\ write
      |      |___________| |__________________  address
      |                  | |__________________<
      |                  |____________________  data
      |                                       /
       ---------------------------------------> write strobe

Part C: Here is the above machine, with branch logic added:

            _______________________________ 
           |  ___________________________  |
           | |  ___________              | |
           | | |  _______  |             | |
          _|_|_|_|_      | |             | |
         _\ 1   0 /      | |             | |
        |  \_MUX_/       | |             | |
        |    | |         | |             | |
        | __\|_|/__    __|_|__           | |
      ---|>___PC___|  |__+2___|          | | 
     |  |    | |        /| |\            | | 
     |  |    | |_________| |_____________| |__\ read
     |  |    |___________________   _____| |__  address
     |  |                 _______| |_____| |__/
CLK -o  |                |  _____| |_____| |__  data
     |  |                | |   _\| |/_   | |  \
     |  |                | |  |__+1___|  | |
     |  |                | |     | |_____| |__\ read
     |  |                | |     |_______| |__  address
     |  |     ___________| |_____________| |__/
     |  |    |  _________| |_____________| |__  data
     |  |    | |         | |             | |  \
     |  | __\|_|/__   __\|_|/__          | |
     o---|>__DST___|-|>__SRC___|         | |
     |  |    | |         | |_____________| |__\ read
     |  |    | |         |_______________| |__  address
     |  |    | |          _______________| |__/
     |  |    | |         |  _____________| |__  data
     |  |    | |      __\| |/__          | |  \
     o-------| |-----|>__TMP___|         | |
     |  |    | |_________| |_____________| |__\ write
     |  |    |___________| |______   ____| |__  address
     |  |                | |______| |____| |__<
     |  |                |________| |_________  data
     |  |                       __|_|__       /
     |  |                      |_=FFFF_|
     |  |                          |    ___
     |   --------------------------o--o|and|
      ---------------------------------|___|--> write strobe

Here is the pseudocode for the above machine:

repeat
   if dst <> FFFF then M[dst] = tmp else pc = tmp;
   tmp = M[src];
   src = M[pc];
   dst = M[pc + 1];
   pc = pc + 2
forever

Note that this logic does nothing to eliminate visible delay slots! Programming this machine will definitely be strange!

Part D: This pipelined IEU has no operand delay slots, other than the strange relationship between source and destination addresses being encoded in different instructions.

Part E: This pipelined IEU has one branch delay slots in addition to the strange semantics of the move instruction. Therefore, a branch would look like this:

	    word  DST    -- start the branch
             word   XXXX
            word    YYYY -- finish the branch
	     word pc
            word    ZZZZ -- delay slot filler
             word   YYYY