Homework 6 Solutions

22C:122, Fall 1998

```
     a: MOVE X,sub
        MOVE ccN,Y
```
Our result forwarding logic fails in the above because the addresses ccN (FFF1₁₆ as a source) and sub (FFF1₁₆ as a destination) are equal, so the value of X is forwarded to Y in place of the condition code; this is the incorrect result!
```
     b: MOVE X,sub
        MOVE acc,Y
```
Our forwarding logic fails in the above because the forwarding logic does not know about the ALU; as a result, it cannot forward the result from the subtract operation to Y.
```
     c: MOVE X,FFF8₁₆
        MOVE FFF8₁₆,Y
```
Our forwarding logic fails in the above because the address FFF8₁₆ is unimplemented memory, so the correct semantics are that Y should get an undefined value. Instead, it gets the forwarded value of X. This may not be formally incorrect, but it is unexpected behavior.
```
     d: MOVE X,(d+2)
        MOVE .-.,Y
```
Our forwarding logic fails in the above because the code is self-modifying and our forwarding logic does not forward to the instruction fetch path.
```
     e: MOVE X,Y
        MOVE Y,pc
```
Result forwarding fails in the above because, while our forwarding tries to forward to the PC in order to eliminate branch delay slots, it does not account for an operand move immediately prior to a branch.
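As a rough illustration of parts a through c, the sketch below applies the plain `src = dst'` test to the address pairs involved. The addresses for ccN, sub and the unimplemented location come from the text above; treating FFF0₁₆ as the address of acc is an assumption, and the names `naive_forwards` and `cases` are hypothetical.

```python
CC_N = 0xFFF1           # ccN when used as a source
SUB = 0xFFF1            # sub when used as a destination
ACC = 0xFFF0            # assumed address of acc as a source
UNIMPLEMENTED = 0xFFF8  # unimplemented memory location from part c

def naive_forwards(src, dst_prime):
    """The unguarded forwarding test: forward tmp whenever src = dst'."""
    return src == dst_prime

# (src of the second move, dst' left by the first move)
cases = {
    "a": (CC_N, SUB),                     # forwards X where ccN was wanted
    "b": (ACC, SUB),                      # no forwarding, so the ALU result is missed
    "c": (UNIMPLEMENTED, UNIMPLEMENTED),  # forwards X from unimplemented memory
}

decisions = {part: naive_forwards(src, dstp) for part, (src, dstp) in cases.items()}
```

In this sketch the test fires for a and c, where it should not, and stays silent for b, where a forwarded ALU result is what the program needs.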

```
           ______   __________
          |  __  | |  ______  |
          | |  |_|_|_|      | |
  clk     |_|  \0mux1/------| |-----
   |     |+2_|  _|_|_       | |     |
   o------| |--|>pc__|      | |     |
   |      | |____| |________| |_____|_____
   |      |___________   ___| |___________ read address
   |         _________| |___| |_____|_____
   |        |  _______| |___| |___________ read data
   |       _|_|_      |_|   | |     |
   o------|>src_|    |+1_|  | |     |
   |        | |       | |___| |_____|_____
   |        | |       |_____| |___________ read address
   |        | |        _____| |_____|_____
   |        | |       |  ___| |___________ read data
   |       _|_|_     _|_|_  | |     |
   o------|>src_|---|>dst_| | |     |
   |        | |_______| |___| |_____|_____
   |        |_________| |___| |_______   _ read address
   |   _____   _______| |___| |_____|_| |_
   |  |  _  | |  _____| |_____________| |_ read data
   |  | | | | | |     | |_ _______  | | |
   |  | | |_|_|_|     |  _|_=FFFF?|-  | |
   |  | | \1mux0/-----| |------------|_=_|
   |  | |   | |       | |             | |
   |  | |  _| |_     _|_|_            | |
   o--| |-|>tmp_|---|>dst'|           | |
   |  | |___| |       | |_____________| |_
   |  |_____  |       |  _________________ write address
   |        | |_______| |_________________
   |        |_________| |_________________ write data
   |                __|_|__
   |               |_=FFFF?|
   |                   |           ___
   |                    ------not-|and|___ write memory
    ------------------------------|___|
```

To eliminate the worst of the difficulty with problem 1 parts a, b and c, we must turn off result forwarding for operands that are outside the normal part of memory. Thus, we could replace the test
```
     src = dst'
```
with the more complex test
```
     (src = dst') and (src < F000₁₆)
```
This merely prevents the forwarding logic from producing anomalous behavior; it does not forward the correct results!
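The guarded test can be written as a small predicate. This is only a sketch: the F000₁₆ bound is taken from the test above, while the names `DEVICE_BASE` and `should_forward` are hypothetical.

```python
DEVICE_BASE = 0xF000  # addresses at or above this are treated as outside normal memory

def should_forward(src: int, dst_prime: int) -> bool:
    """Forward tmp only if the operand was just stored AND lies in normal memory."""
    return src == dst_prime and src < DEVICE_BASE

# Ordinary memory: forwarding still applies.
ordinary = should_forward(0x0100, 0x0100)
# Part a (ccN/sub at FFF1) and part c (FFF8): forwarding is now suppressed.
part_a = should_forward(0xFFF1, 0xFFF1)
part_c = should_forward(0xFFF8, 0xFFF8)
```

With the guard in place, the operand for parts a and c is read from m[src] rather than taken from tmp, which removes the anomaly without yet delivering a forwarded ALU result.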
To solve the problem with parts a and b, we must provide forwarding paths from the output of the ALU (prior to feeding into the accumulator or condition code register) to tmp. This is fairly complex, but it can be added to the register transfer notation with a one-line change, from
```
           tmp = (if src = dst' then tmp else m[src])
```
to
```
           tmp = (if (src = dst') and (dst' < F000₁₆)
                     then tmp
                  elseif src = FFF0₁₆ and (FFF0₁₆ < dst' < FFF8₁₆)
                     then alu-data-output
                  elseif src = FFF1₁₆ and (FFF0₁₆ < dst' < FFF8₁₆)
                     then alu-sign-bit-output
                     else m[src])
```
This is ugly, but it works!
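For readers who prefer executable notation, the revised selection of tmp might be sketched as follows. The function name `next_tmp`, the constant names, and the use of a dictionary for memory are assumptions made for illustration; the conditions mirror the register transfer notation above.

```python
ACC_SRC = 0xFFF0   # source address whose value comes from the ALU data output
CC_N_SRC = 0xFFF1  # source address whose value comes from the ALU sign bit

def next_tmp(src, dst_prime, tmp, m, alu_data_output, alu_sign_bit_output):
    """Choose the next value of tmp, following the revised forwarding rule."""
    alu_op_pending = 0xFFF0 < dst_prime < 0xFFF8  # previous move triggered an ALU operation
    if src == dst_prime and dst_prime < 0xF000:
        return tmp                   # ordinary memory-to-memory forwarding
    elif src == ACC_SRC and alu_op_pending:
        return alu_data_output       # forward the ALU result
    elif src == CC_N_SRC and alu_op_pending:
        return alu_sign_bit_output   # forward the new sign bit
    else:
        return m[src]                # no hazard: read the operand normally

m = {0x0100: 7, 0xFFF0: 111, 0xFFF1: 0}  # hypothetical memory contents
part_a = next_tmp(0xFFF1, 0xFFF1, 42, m, 5, 1)  # MOVE X,sub then MOVE ccN,Y
part_b = next_tmp(0xFFF0, 0xFFF1, 42, m, 5, 1)  # MOVE X,sub then MOVE acc,Y
```

Here `part_a` picks up the sign bit and `part_b` picks up the ALU data output, rather than the value of X or the old register contents.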
The logic given in the assignment eliminates one branch delay slot because assignments to the PC are handled by checking dst and not dst', and by assigning from m[src] instead of assigning from tmp.
To introduce a bubble in the pipe, we can convert the instruction following a branch into a no-op. Consider the following version of the architecture:
```
        repeat the following assignments in parallel

      --   if (dst' < FFFE₁₆)
              then m[dst'] = tmp

      *    tmp = (if src = dst' then tmp else m[src])
           dst' = dst

           src = m[pc]
      --   dst = (if dst = FFFF₁₆ then FFFE₁₆ else m[pc + 1])
           pc = (if dst = FFFF₁₆ then m[src] else pc + 2)

        forever
```
The changed lines have been marked with dashes. The first change causes stores to location FFFE₁₆ to be interpreted as no-ops; the second change is to convert dst to FFFE₁₆ in the event that the previous instruction was a branch. This converts the instruction following a branch into a no-op, although it still wastes effort fetching its operand. This wasted fetch could be avoided by changing the line marked with a star.
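The effect of the bubble can be checked with a small simulation of the parallel assignments. This is a sketch under stated assumptions: the dashed assignment to dst is taken as the one in effect, memory is a flat array of 2¹⁶ words, the reset state (an empty pipe starting at pc = 0) is invented for the demonstration, and the test program and the names `step` and `run` are hypothetical.

```python
NOP_DST = 0xFFFE  # stores to this address are discarded
PC_DST = 0xFFFF   # a move to this destination is a branch

def step(state, m):
    """Advance the pipeline one clock; state is (pc, src, dst, dst', tmp)."""
    pc, src, dst, dst_p, tmp = state
    # Evaluate every right-hand side from the old register and memory values.
    new_tmp = tmp if src == dst_p else m[src]
    new_src = m[pc]
    new_dst = NOP_DST if dst == PC_DST else m[pc + 1]
    new_pc = m[src] if dst == PC_DST else pc + 2
    # Commit the store last so the reads above saw memory as it was.
    if dst_p < NOP_DST:
        m[dst_p] = tmp
    return (new_pc, new_src, new_dst, dst, new_tmp)

def run(contents, cycles):
    """Load memory from a dict of address: value and clock the pipe."""
    m = [0] * 0x10000
    for addr, value in contents.items():
        m[addr] = value
    state = (0, 0, NOP_DST, NOP_DST, 0)  # assumed reset state: pipe full of no-ops
    for _ in range(cycles):
        state = step(state, m)
    return m

contents = {
    0: 0x100, 1: 0x101,   # MOVE 100,101   (executes)
    2: 0x102, 3: PC_DST,  # MOVE 102,pc    (branch to address 8)
    4: 0x100, 5: 0x103,   # MOVE 100,103   (follows the branch: squashed)
    8: 0x100, 9: 0x104,   # MOVE 100,104   (branch target: executes)
    0x100: 7, 0x102: 8,   # data: operand value and branch target address
}
m = run(contents, 8)
```

After the run, locations 101₁₆ and 104₁₆ hold 7 while 103₁₆ is still 0, because the move that followed the branch had its destination replaced by FFFE₁₆. Its operand was still fetched, which is the wasted effort the starred line could avoid.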