a: MOVE X,sub
MOVE ccN,Y
Our result forwarding logic fails in the above because the addresses of
ccN (FFF1 as a source) and sub (FFF1 as a destination) are equal, so the
value of X is forwarded to Y in place of the condition code. This is the
incorrect result!
b: MOVE X,sub
MOVE acc,Y
Our forwarding logic fails in the above because it does not know about
the ALU; as a result, it cannot forward the result of the subtract
operation to Y.
c: MOVE X,FFF8₁₆
MOVE FFF8₁₆,Y
Our forwarding logic fails in the above because the memory address
FFF8₁₆ is unimplemented memory, so the correct semantics is
that Y should get an undefined value. Instead, it gets the forwarded
value of X. This may not be formally incorrect, but is unexpected behavior.
d: MOVE X,(d+2)
MOVE .-.,Y
Our forwarding logic fails in the above because the code is self-modifying
and our forwarding logic does not forward to the instruction fetch path.
e: MOVE X,Y
MOVE Y,pc
Result forwarding fails in the above because, while our forwarding tries
to forward to the PC in order to eliminate branch delay slots, it does not
account for an operand move immediately prior to a branch.
______ __________
| __ | | ______ |
| | |_|_|_| | |
clk |_| \0mux1/------| |-----
| |+2_| _|_|_ | | |
o------| |--|>pc__| | | |
| | |____| |________| |_____|_____
| |___________ ___| |___________ read address
| _________| |___| |_____|_____
| | _______| |___| |___________ read data
| _|_|_ |_| | | |
o------|>src_| |+1_| | | |
| | | | |___| |_____|_____
| | | |_____| |___________ read address
| | | _____| |_____|_____
| | | | ___| |___________ read data
| _|_|_ _|_|_ | | |
o------|>src_|---|>dst_| | | |
| | |_______| |___| |_____|_____
| |_________| |___| |_______ _ read address
| _____ _______| |___| |_____|_| |_
| | _ | | _____| |_____________| |_ read data
| | | | | | | | |_ _______ | | |
| | | |_|_|_| | _|_=FFFF?|- | |
| | | \1mux0/-----| |------------|_=_|
| | | | | | | | |
| | | _| |_ _|_|_ | |
o--| |-|>tmp_|---|>dst'| | |
| | |___| | | |_____________| |_
| |_____ | | _________________ write address
| | |_______| |_________________
| |_________| |_________________ write data
| __|_|__
| |_=FFFF?|
| | ___
| ------not-|and|___ write memory
------------------------------|___|
Replace the forwarding test
src = dst'
with the more complex test
(src = dst') and (src < F000₁₆)
This merely prevents the forwarding logic from producing anomalous behavior;
it does not forward the correct results!
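The difference between the two tests can be sketched in a few lines of Python. This is only an illustration: the function names and the constant MEM_LIMIT are hypothetical, and the sketch assumes, from the test itself, that only addresses below F000 (hex) should ever be forwarded.

```python
MEM_LIMIT = 0xF000  # assumed boundary, taken from the test (src < F000 hex)

def naive_forward(src, dst_prime):
    """Original test: forward whenever the two addresses match."""
    return src == dst_prime

def guarded_forward(src, dst_prime):
    """More complex test: forward only for matching addresses below MEM_LIMIT."""
    return src == dst_prime and src < MEM_LIMIT

# Compare the two tests on an ordinary address and on the addresses
# from parts a and c (0x0100 is an arbitrary example address).
for name, addr in [("ordinary", 0x0100), ("ccN/sub", 0xFFF1), ("part c", 0xFFF8)]:
    print(name, naive_forward(addr, addr), guarded_forward(addr, addr))
```

For the addresses from parts a and c the guarded test declines to forward, so the read goes to m[src] rather than picking up the stale value of tmp.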
To solve the problem with parts a and b, we must provide forwarding paths from the output of the ALU (prior to feeding into the accumulator or condition code register) to tmp. This is fairly complex, but it can be added to the register transfer notation with a one-line change, from
tmp = (if src = dst' then tmp else m[src])
to
tmp = (if (src = dst') and (dst' < F000₁₆)
          then tmp
       elseif src = FFF0₁₆ and (FFF0₁₆ < dst' < FFF8₁₆)
          then alu-data-output
       elseif src = FFF1₁₆ and (FFF0₁₆ < dst' < FFF8₁₆)
          then alu-sign-bit-output
       else m[src])
This is ugly, but it works!
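As a rough check, the extended selection of tmp can be transcribed into Python. This is a sketch under stated assumptions: select_tmp and its parameter names are hypothetical, memory is modeled as a dictionary that reads 0 for absent addresses, and the comparisons are copied exactly as written in the register transfer notation (including the strict inequalities).

```python
MEM_LIMIT = 0xF000               # ordinary forwarding applies only below this
ALU_LOW, ALU_HIGH = 0xFFF0, 0xFFF8  # bounds used in the ALU forwarding cases

def select_tmp(src, dst_prime, tmp, alu_data_output, alu_sign_bit_output, m):
    """Pick the value latched into tmp, following the extended expression."""
    if src == dst_prime and dst_prime < MEM_LIMIT:
        return tmp                      # ordinary store-to-load forwarding
    elif src == 0xFFF0 and ALU_LOW < dst_prime < ALU_HIGH:
        return alu_data_output          # part b: read of FFF0 after an ALU operation
    elif src == 0xFFF1 and ALU_LOW < dst_prime < ALU_HIGH:
        return alu_sign_bit_output      # part a: read of FFF1 (ccN) after an ALU operation
    else:
        return m.get(src, 0)            # no hazard: read memory (0 if absent, an assumption)

# Part a: MOVE X,sub followed by MOVE ccN,Y, so src = dst' = FFF1.
print(select_tmp(0xFFF1, 0xFFF1, tmp=7, alu_data_output=-3,
                 alu_sign_bit_output=1, m={}))
```

In the part a case the first test fails because FFF1 is not below F000, so the value of X held in tmp is no longer forwarded and the sign bit from the ALU is used instead.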
To introduce a bubble in the pipe, we can convert the instruction following a branch into a no-op. Consider the following version of the architecture:
repeat the following assignments in parallel
-- if (dst' < FFFE₁₆)
      then m[dst'] = tmp
*  tmp = (if src = dst' then tmp else m[src])
   dst' = dst
   src = m[pc]
-- dst = (if dst = FFFF₁₆ then FFFE₁₆ else m[pc + 1])
   pc = (if dst = FFFF₁₆ then m[src] else pc + 2)
forever
The changed lines have been marked with dashes. The first change causes
stores to location FFFE₁₆ to be interpreted as no-ops; the second
change is to convert dst to FFFE₁₆ in the event that the previous
instruction was a branch. This converts the instruction following a branch
into a no-op, although it still wastes effort fetching its operand. This
wasted fetch could be avoided by changing the line marked with a star.
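The effect of the two marked changes can be tried out with a small Python simulation of the loop. This is a sketch, not a definitive model: the reset state, the dictionary memory that reads 0 for absent addresses, and the tiny test program are all assumptions made for illustration. Every right-hand side is evaluated from the old state before anything is updated, to mimic the parallel assignments.

```python
NOP_ADDR = 0xFFFE  # stores to this address are discarded (first marked change)
PC_ADDR = 0xFFFF   # a move to this address is a branch

def step(state, m):
    """Advance one clock; all right-hand sides read the old state."""
    pc, src, dst, dst_p, tmp = state
    operand = m.get(src, 0)                # m[src], read before this cycle's store
    new_tmp = tmp if src == dst_p else operand
    new_dst_p = dst
    new_src = m.get(pc, 0)
    # Second marked change: squash the instruction fetched behind a branch.
    new_dst = NOP_ADDR if dst == PC_ADDR else m.get(pc + 1, 0)
    new_pc = operand if dst == PC_ADDR else pc + 2
    if dst_p < NOP_ADDR:                   # first marked change
        m[dst_p] = tmp
    return (new_pc, new_src, new_dst, new_dst_p, new_tmp)

def run(m, cycles):
    # Assumed reset state: pc = 0 and a pipeline full of no-ops.
    state = (0, 0, NOP_ADDR, NOP_ADDR, 0)
    for _ in range(cycles):
        state = step(state, m)
    return state

memory = {
    0: 100, 1: PC_ADDR,    # MOVE 100,pc   branch to m[100] = 8
    2: 101, 3: 200,        # MOVE 101,200  follows the branch, should be squashed
    8: 101, 9: 201,        # MOVE 101,201  branch target
    10: 0, 11: NOP_ADDR,   # padding no-ops
    12: 0, 13: NOP_ADDR,
    100: 8, 101: 42,
}
run(memory, 5)
print(memory.get(200), memory.get(201))
```

After five cycles the move at the branch target has stored to location 201, while the move that follows the branch never stores to location 200, because its destination was converted to FFFE.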