repeat M[dst] = tmp; tmp = M[src]; src = M[pc]; dst = M[pc + 1]; pc = pc + 2 forever
Part A: This machine's instruction set differs from that of the Minimal Ultimate RISC discussed in class because there is no dst' register. As a result, if we look at an instruction sequence using a pipeline diagram, we find that the src and dst addresses for a single move are not fetched from the same instruction:
----|-------------|-------------|-------------|-
src1/dst1 | src=src1 | tmp=M[src1] | M[dst2]=tmp |
| dst=dst1 | | |
----|-------------|-------------|-------------|-
src2/dst2 | | src=src2 | tmp=M[src2] |
| | dst=dst2 | |
----|-------------|-------------|-------------|-
src3/dst3 | | | src=src3 |
| | | dst=dst3 |
----|-------------|-------------|-------------|-
cycle 1 cycle 2 cycle 3
Examining the above, it is clear that, to move A to B, one must write
the following code:
1: word A
2: word ... store previous operand
3: word ... fetch next operand
4: word B
So, we can write code to compute A = B - C as follows, using XXXX, YYYY and
ZZZZ as filler memory addresses for the strangeness
introduced by this instruction set.
word B
word XXXX
word C
word acc
word YYYY
word sub
word acc
word YYYY
word ZZZZ
word A
Part B: Here is a register-transfer implementation of this instruction execution unit, ignoring the problem of branch instructions.
_____________
| _________ |
__\|_|/__ __|_|__
--|>___PC___| |__+2___|
| | | /| |\
| | |_________| |__________________\ read
| |______________________ _______ address
| __________| |_______/
CLK --o | ________| |_______ data
| | | _\| |/_ \
| | | |__+1___|
| | | | |_______\ read
| | | |_________ address
| ___________| |__________________/
| | _________| |__________________ data
| | | | | \
| __\|_|/__ __\|_|/__
o--|>__DST___|-|>__SRC___|
| | | | |__________________\ read
| | | |____________________ address
| | | ____________________/
| | | | __________________ data
| | | __\| |/__ \
o------| |-----|>__TMP___|
| | |_________| |__________________\ write
| |___________| |__________________ address
| | |__________________<
| |____________________ data
| /
---------------------------------------> write strobe
Part C: Here is the above machine, with branch logic added:
_______________________________
| ___________________________ |
| | ___________ | |
| | | _______ | | |
_|_|_|_|_ | | | |
_\ 1 0 / | | | |
| \_MUX_/ | | | |
| | | | | | |
| __\|_|/__ __|_|__ | |
---|>___PC___| |__+2___| | |
| | | | /| |\ | |
| | | |_________| |_____________| |__\ read
| | |___________________ _____| |__ address
| | _______| |_____| |__/
CLK -o | | _____| |_____| |__ data
| | | | _\| |/_ | | \
| | | | |__+1___| | |
| | | | | |_____| |__\ read
| | | | |_______| |__ address
| | ___________| |_____________| |__/
| | | _________| |_____________| |__ data
| | | | | | | | \
| | __\|_|/__ __\|_|/__ | |
o---|>__DST___|-|>__SRC___| | |
| | | | | |_____________| |__\ read
| | | | |_______________| |__ address
| | | | _______________| |__/
| | | | | _____________| |__ data
| | | | __\| |/__ | | \
o-------| |-----|>__TMP___| | |
| | | |_________| |_____________| |__\ write
| | |___________| |______ ____| |__ address
| | | |______| |____| |__<
| | |________| |_________ data
| | __|_|__ /
| | |_=FFFF_|
| | | ___
| --------------------------o--o|and|
---------------------------------|___|--> write strobe
Here is the pseudocode for the above machine:
repeat if dst <> FFFF then M[dst] = tmp else pc = tmp; tmp = M[src]; src = M[pc]; dst = M[pc + 1]; pc = pc + 2 foreverNote that this logic does nothing to eliminate visible delay slots! Programming this machine will definitely be strange!
Part D: This pipelined IEU has no operand delay slots, other than the strange relationship between source and destination addresses being encoded in different instructions.
Part E: This pipelined IEU has one branch delay slots in addition to the strange semantics of the move instruction. Therefore, a branch would look like this:
word DST -- start the branch
word XXXX
word YYYY -- finish the branch
word pc
word ZZZZ -- delay slot filler
word YYYY