Please write your answers on the paper provided. Make sure your name is on the front page, in the same form as it appears in the official class list! Illegible and overly long answers are difficult to grade. Please leave margins around your answers! This exam is worth 30% of your final grade; allocate about 4 minutes per point!
Background Information: Consider the computer with the following characteristics, reprinted with minor corrections from the study questions:
Word size: 32 bits General Registers: 64 (includes PC) Instructions: _______________________________________________________________ |___|___________|___________|___________|___________|___________| OP R1 R2 op sh R3 OP = 00 operate R1 = shf(R2 op R3) ___________ op = |_____|_____| alu sr2 alu = 3 bit alu function select sr2 = 3 bit shift select for R2 operand sh = 6 bit signed shift count for result _______________________________________________________________ |___|___________|___________|___________________________________| OP R1 R2 const (18 bits) OP = 01 immediate constant to register R1 = R2 + sign-extend(const) OP = 10 load memory to register R1 = M[R2 + sign-extend(const)] OP = 11 store register to memory M[R2 - sign-extend(const)] = R1
Hint: A 2 instruction sequence is necessary and sufficient.
Hint: A 3 instruction sequence will suffice, including the 2 from question 1.
* straightforward: No superscalar execution, no branch anticipation, nothing else bizarre or exciting, just a basic pipeline for this machine.
Hint: You might want to at least briefly consider both the "canonical 5-stage pipe" of the class notes and the pipeline described for the DLX machine in Figure 3.1 on page 130 of the text before you start composing an answer to this problem.
______ ______ | IF |---o----+ | IF |---o----+ |______| __|__ | |______| __|__ | |____| |cache| | |____| |cache| | | ||_____| | | ||_____| | |______| | |______| | |____| | |____| | | MA |---o----+ ______ | MA |--------+ ______ |______| __|__ | | | |______| | | | |____| |cache| --|Memory| |____| -o---|Memory| | ||_____| |______| | | __|__ |______| |______| |______| |cache| |_____|Here, the MA stage represents a memory access stage in the pipe; the pipe may have multiple memory access states that perform read or write operations.
Part A: Explain why the left alternative above could not be easily incorporated into the architecture used for the running example in class, while the alternative presented on the right was considered appropriate. (2 points)
Part B: Why might you prefer the alternative on the left for the example architecture presented on this exam? (2 points)
We could steal 3 or 4 bits from the op and sh fields to make every operate instruction into a conditional skip instruction. 000 would mean never skip, other patterns would mean skip if result zero, skip if result positive, skip if negative, etc.
We could add condition codes to the system and steal 3 or 4 bits from the const field to allow each non-operate instruction to be conditional on the result of the most recent operate instruction.
Part A: Evaluated solely from the perspective of ease of implementation in a pipelined processor, which of these would be most desirable, and why. Your answer must both state the upside the desirable option and the downside of the undesirable option. (2 points)
Part B: Evaluated solely from the perspective of efficient instruction coding, which is likely to make more efficient use of program stream bandwidth. Your answer should be in the form of a persuasive qualitative argument. (2 points)
From CPU To Memory word address byte address |===============> ===============>| |--- byte number | mode __|__ ---------------->| f | 0=byte/1=word |_____| | | | <----------------- | | trouble ---- | __|_ | | 1|<===|==0 msb|<==|MUX | | To CPU | |___0|<===|=o==|msb data | | | | From memory <=======| ------ | | data | __|_ | |<====== | | 1|<===== | lsb|<==|MUX | | |___0|<========|lsbThis kind of alignment network can be generalized for byte, halfword and word access on a 32 bit machine, and reverse alignment networks can be constructed for store operations.
Part A: What memory access condition must this alignment network report as a trouble condition? (1 points)
Part B: Give a truth table for the combinational function f. Note that, for some combinations of inputs, some outputs may not be fully defined. (3 points)
The instruction set allows access to a non-aligned word operand in memory. On most machines where this is allowed, the pipeline stage that attempts a non-aligned access must stall in order to allow multiple memory cycles. Here, we propose an alternative design for a memory access pipeline stage:
|____| _____ | | MA1 |----|align|----+ |______| |_____| | |____| _____ | | MA2 |----|align|----+ |______| |_____| |memory ... |arbitration |____| _____ |bus | MAn |----|align|----+ |______| |_____| |To access memory, we have pipeline stages MA1 through MAn, each of which has, if needed, a memory port, and each of which has, as required, an alignment network. Simple memory accesses, for example, read byte from memory or write aligned word to memory, are handled by MA1. More complex accesses require the additional pipeline stages to perform memory cycles.
Part A: For a non-aligned word read from memory, how many pipeline stages are needed, what do they do, and how must the alignment net for each stage be set. (2 points)
Part B: For a single byte write to memory, how many pipeline stages are needed, what do they do, and how must the alignment net for each stage be set. (2 points)
Part C: For a non-aligned word write to memory, how many pipeline stages are needed, what do they do, and how must the alignment net for each stage be set. (2 points)
Part D: Given that the naive pipeline, with no consideration for non-aligned operands, had a single memory access stage that could read or write aligned or non-aligned bytes or words, how many stages do we really need? (1 points)