    repeat
        M[dst] = tmp;
        tmp = M[src];
        src = M[pc];
        dst = M[pc + 1];
        pc = pc + 2
    forever
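As a rough illustration, the loop can be transcribed into Python and run for a few cycles. This is only a sketch: the list-based memory, the addresses SCRATCH, A and B, the initial register contents and the fixed step count are all assumptions made for the example, not part of the machine definition.

```python
def run(memory, pc, src, dst, tmp, steps):
    """Run `steps` iterations of the fetch-execute loop on a list of words."""
    for _ in range(steps):
        memory[dst] = tmp        # M[dst] = tmp
        tmp = memory[src]        # tmp = M[src]
        src = memory[pc]         # src = M[pc]
        dst = memory[pc + 1]     # dst = M[pc + 1]
        pc += 2                  # pc = pc + 2
    return memory

SCRATCH, A, B = 6, 7, 8          # hypothetical data addresses
memory = [
    A, SCRATCH,                  # instruction 1: its source names A
    SCRATCH, B,                  # instruction 2: its destination names B
    SCRATCH, SCRATCH,            # filler: fetched in the last step, never acted on
    0,                           # SCRATCH
    42,                          # A
    0,                           # B
]
run(memory, pc=0, src=SCRATCH, dst=SCRATCH, tmp=0, steps=3)
print(memory[B])                 # prints 42
```

In this run the value read through the source address of the first instruction is stored through the destination address of the second one, one cycle later.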
Part A: This machine's instruction set differs from that of the Minimal Ultimate RISC discussed in class because there is no dst' register. As a result, if we look at an instruction sequence using a pipeline diagram, we find that the src and dst addresses for a single move are not fetched from the same instruction:
        ----|-------------|-------------|-------------|-
  src1/dst1 | src=src1    | tmp=M[src1] | M[dst2]=tmp |
            | dst=dst1    |             |             |
        ----|-------------|-------------|-------------|-
  src2/dst2 |             | src=src2    | tmp=M[src2] |
            |             | dst=dst2    |             |
        ----|-------------|-------------|-------------|-
  src3/dst3 |             |             | src=src3    |
            |             |             | dst=dst3    |
        ----|-------------|-------------|-------------|-
                cycle 1       cycle 2       cycle 3

Examining the above, it is clear that, to move A to B, one must write the following code:
    1: word A
    2: word ...     store previous operand
    3: word ...     fetch next operand
    4: word B

So, we can write code to compute A = B - C as follows, using XXXX, YYYY and ZZZZ as filler memory addresses for the strangeness introduced by this instruction set.
    word B
    word XXXX
    word C
    word acc
    word YYYY
    word sub
    word acc
    word YYYY
    word ZZZZ
    word A
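One way to check the sequence above is to run it on a small simulator. The sketch below is illustrative only: it assumes that acc and sub are memory-mapped locations (a write to acc loads the accumulator, a write to sub subtracts from it, a read of acc returns it), and the numeric addresses, the Machine class and the initial register contents are all hypothetical.

```python
ACC = 0xF0   # hypothetical: a write loads the accumulator, a read returns it
SUB = 0xF1   # hypothetical: a write subtracts the written value from the accumulator
A, B, C = 0x20, 0x21, 0x22            # hypothetical data addresses
XXXX, YYYY, ZZZZ = 0x30, 0x31, 0x32   # filler addresses

class Machine:
    def __init__(self, program, data):
        self.mem = dict(enumerate(program))   # program loaded at address 0
        self.mem.update(data)
        self.acc = 0
        self.pc = 0
        self.src = self.dst = XXXX            # assumed harmless initial state
        self.tmp = 0

    def read(self, addr):
        return self.acc if addr == ACC else self.mem.get(addr, 0)

    def write(self, addr, value):
        if addr == ACC:
            self.acc = value
        elif addr == SUB:
            self.acc -= value
        else:
            self.mem[addr] = value

    def step(self):
        self.write(self.dst, self.tmp)        # M[dst] = tmp
        self.tmp = self.read(self.src)        # tmp = M[src]
        self.src = self.read(self.pc)         # src = M[pc]
        self.dst = self.read(self.pc + 1)     # dst = M[pc + 1]
        self.pc += 2

program = [B, XXXX, C, ACC, YYYY, SUB, ACC, YYYY, ZZZZ, A]
m = Machine(program, {B: 10, C: 3})
for _ in range(6):   # five instructions plus one cycle for the final store
    m.step()
print(m.mem[A])      # prints 7
```

The sixth step is needed because the store into A is carried out one cycle after the instruction naming A as its destination has been fetched.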
Part B: Here is a register-transfer implementation of this instruction execution unit, ignoring the problem of branch instructions.
[Figure: register-transfer diagram. The PC register feeds a +2 incrementer whose output returns to PC. One memory read port is addressed by PC and a second by PC+1 (through a +1 incrementer); their data outputs load SRC and DST. A third read port is addressed by SRC and loads TMP. The write port takes its address from DST and its data from TMP. CLK drives all four registers and also serves as the write strobe.]
Part C: Here is the above machine, with branch logic added:
[Figure: the register-transfer diagram from Part B with branch logic added. A two-input multiplexer (inputs labelled 1 and 0) now drives the PC input, selecting between the +2 incrementer output and the branch value. An =FFFF comparator is attached to the DST output; its result is combined with CLK in an AND gate with one inverted input, so that the write strobe is suppressed when DST is FFFF.]

Here is the pseudocode for the above machine:
    repeat
        if dst <> FFFF
            then M[dst] = tmp
            else pc = tmp;
        tmp = M[src];
        src = M[pc];
        dst = M[pc + 1];
        pc = pc + 2
    forever

Note that this logic does nothing to eliminate visible delay slots! Programming this machine will definitely be strange!
Part D: This pipelined IEU has no operand delay slots, other than the strange relationship between source and destination addresses being encoded in different instructions.
Part E: This pipelined IEU has one branch delay slot, in addition to the strange semantics of the move instruction. Therefore, a branch would look like this:
    word DST     -- start the branch
    word XXXX
    word YYYY    -- finish the branch
    word pc
    word ZZZZ    -- delay slot filler
    word YYYY
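The delay slot can be made visible with a small trace. The sketch below assumes the register-transfer reading of the machine, in which all registers are clocked together: a store to FFFF replaces the next pc value, while the fetch in that same cycle still uses the old pc. Read strictly statement by statement, the pseudocode would instead fetch from the new pc at once and show no delay slot, so this reading is an assumption chosen to match Part E. The addresses, the dict-based memory and the initial register contents are hypothetical.

```python
PC_ADDR = 0xFFFF                      # a store to this address loads the pc
DST, TARGET = 0x20, 0x40              # hypothetical: M[DST] holds the branch target
XXXX, YYYY, ZZZZ = 0x30, 0x31, 0x32   # filler addresses

def run(mem, steps):
    """Return the addresses from which instructions were fetched."""
    pc, src, dst, tmp = 0, XXXX, XXXX, 0   # assumed harmless initial state
    fetched = []
    for _ in range(steps):
        if dst != PC_ADDR:
            mem[dst] = tmp            # ordinary store
            next_pc = pc + 2
        else:
            next_pc = tmp             # branch: takes effect at the end of the cycle
        tmp = mem.get(src, 0)
        fetched.append(pc)            # this cycle's fetch still uses the old pc
        src, dst = mem.get(pc, 0), mem.get(pc + 1, 0)
        pc = next_pc
    return fetched

mem = {
    0: DST,  1: XXXX,                 # start the branch
    2: YYYY, 3: PC_ADDR,              # finish the branch
    4: ZZZZ, 5: YYYY,                 # delay slot filler
    DST: TARGET,
    TARGET: XXXX, TARGET + 1: XXXX,   # first instruction at the branch target
}
trace = run(mem, 4)
print([hex(a) for a in trace])        # prints ['0x0', '0x2', '0x4', '0x40']
```

Under these assumptions the instruction at address 4 is still fetched after the instruction that names pc as its destination, and only then does fetching continue at the target.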