Question 1 Give an efficient macro to load a 32 bit constant.
The following shows some cute optimizations for short constants. The final case suffices!
MACRO LI rd, const
  IF (const <= 127) & (const >= -128)
    LIS rd, const
  ELSEIF (const <= #7FFFFF) & (const >= -#800000)
    LIL rd, const
  ELSE
    LIL rd, const >> 8
    ORIS rd, const & #FF
  ENDIF
ENDMAC
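As a sanity check on the case analysis, here is a small Python model of the macro. It is only a sketch: it assumes that LIS loads a sign-extended 8 bit constant, that LIL loads a sign-extended 24 bit constant, and that ORIS shifts the register left 8 places and ORs in an 8 bit constant. These semantics are inferred from the macro itself; the helper names sign_extend and li are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def li(const):
    """Return (instructions emitted, resulting 32 bit register value) for LI."""
    if -128 <= const <= 127:
        # Assumed: LIS loads a sign-extended 8 bit constant.
        return ["LIS"], sign_extend(const, 8) & MASK32
    if -0x800000 <= const <= 0x7FFFFF:
        # Assumed: LIL loads a sign-extended 24 bit constant.
        return ["LIL"], sign_extend(const, 24) & MASK32
    # General case: LIL supplies the top 24 bits, ORIS shifts in the low 8.
    reg = sign_extend(const >> 8, 24) & MASK32     # LIL  rd, const >> 8
    reg = ((reg << 8) | (const & 0xFF)) & MASK32   # ORIS rd, const & #FF
    return ["LIL", "ORIS"], reg
```

Under these assumptions the final case alone reproduces every 32 bit constant, which is why the two short cases are only optimizations.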
Question 2 Give a macro to branch to an arbitrary absolute address.
Here, we assume that rt is a designated temporary register. Again, a cute optimization is included, but the final case suffices!
MACRO BR dst
  IF (dst <= #7FFF) & (dst >= -#8000)
    JSR R0, R0, R0, dst
  ELSE
    LI rt, dst
    JSRR R0, rt
  ENDIF
ENDMAC
Question 3 Present macros for load and store operations on 16 bit word and 8 bit byte operands.
The most important use of byte and halfword addressing is in arrays of bytes and halfwords, and to handle these, we need to follow byte pointers in registers. This is easy for load:
MACRO LOADRB dr, rx
  LOADR dr, rx
  EXTB dr, dr, rx
ENDMAC
MACRO LOADRW dr, rx
  LOADR dr, rx
  EXTW dr, dr, rx
ENDMAC
This is harder for store, requiring 2 temporary registers, rt and rtt, and an awkward computation to clear the destination field of the destination word. A NOT instruction would help, and 3-operand stuff instructions would be even better:
MACRO STORERB sr, rx
  LOADR rt, rx
  LI rtt, #FF
  STUFFB rtt, rx     ; make mask
  SUB rtt, R0, rtt   ; 2's comp (we want 1's comp!)
  SUBI rtt, 1        ; finish 1's comp of mask
  AND rt, rtt        ; turn off target byte
  STUFFB rtt, sr     ; align byte to be stored
  OR rt, rtt         ; push it into cleared place
  STORER rt, rx
ENDMAC
MACRO STORERW sr, rx
  LOADR rt, rx
  LI rtt, #FFFF
  STUFFW rtt, rx     ; make mask
  SUB rtt, R0, rtt   ; 2's comp (we want 1's comp!)
  SUBI rtt, 1        ; finish 1's comp of mask
  AND rt, rtt        ; turn off target halfword
  STUFFW rtt, sr     ; align halfword to be stored
  OR rt, rtt         ; push it into cleared place
  STORER rt, rx
ENDMAC
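The read-modify-write idea behind these macros can be modelled in a few lines of Python. This is a sketch under stated assumptions, not a description of the real instructions: memory is a list of 32 bit words, the low 2 bits of the byte pointer select the byte, byte 0 is the least significant byte, and the extract zero-extends. The function names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def byte_shift(ptr):
    """Bit offset of the byte selected by the low 2 bits of a byte pointer
    (assumed little-endian byte numbering within the word)."""
    return (ptr & 3) * 8

def ones_complement(mask):
    """Build NOT mask the way the macro does: negate, then subtract 1."""
    neg = (0 - mask) & MASK32        # SUB  rtt, R0, rtt
    return (neg - 1) & MASK32        # SUBI rtt, 1

def load_byte(mem, ptr):
    word = mem[ptr >> 2]                       # LOADR: fetch containing word
    return (word >> byte_shift(ptr)) & 0xFF    # EXTB: assumed zero-extending

def store_byte(mem, ptr, value):
    word = mem[ptr >> 2]                                  # LOADR rt, rx
    mask = (0xFF << byte_shift(ptr)) & MASK32             # make mask
    word &= ones_complement(mask)                         # turn off target byte
    word |= ((value & 0xFF) << byte_shift(ptr)) & MASK32  # align and insert
    mem[ptr >> 2] = word                                  # STORER rt, rx
```

The two-step complement works because, in two's complement arithmetic, -x - 1 equals NOT x; a real NOT instruction would save one instruction per store.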
Question 4 Find all the pipeline interlocks required in a natural pipelined implementation of this architecture.
Assume, for example, the following pipeline, with Rn being the interstage register after stage n:
Question 5 What problems would this architecture pose for a superscalar pipelined implementation?
First, it is obvious that it should be implemented using a superscalar pipeline, since instructions will likely be fetched 32 bits at a time but many instructions are only 16 bits.
Both pipes are identical in structure, so there is no pipe selection problem. Interlocks become more complex, though, and the logic to get the extended constants for 32 bit format instructions gets more complex!
Branches to an odd address must fetch the full 32 bit word but replace the even half with a NOOP.
Condition code dependencies raise problems that were solved for the simple pipeline by stating that all checking and setting of condition codes is done in the ALU stage. These problems are somewhat reduced by the small fraction of the instruction set that uses condition codes!
Question 6 This instruction set is missing instructions for housekeeping and interrupt response. There is some room in the instruction set for expansion, but it may not be obvious. List the opcodes that can be used for such applications.
Any instruction with rd=0000 that does not set condition codes or have other side effects is a no-op in this architecture. A single one of these should be reserved as the official NOOP, but the remainder are available for other functions. These are, at the very least: STORE LEA LIL LIS ORIS.
In addition, when rs2 is 0000 on ADD SUB OR XOR the instruction becomes equivalent to MOVER, and when rs2 is 0000 on AND, it becomes equivalent to many other ways of zeroing a register. Therefore, all of these combinations can be stolen for extensions to the instruction set!
Note that rs1=0000 is very useful for subtract (it becomes negate) but that it may also be a useless code for the other arithmetic operations, including the EXT instructions. EXT with rs2=0 is not useless! It is the way to truncate a word to half or byte size!
A shift count of zero is another example of a no-op or MOVER equivalent; this occurs when the rs field is zero on any of the shift instructions! These can all be used for other purposes.
ADDI and SUBI with rs=0000 are also either NOOP or MOVER equivalents, depending on whether or not they set the condition codes. In either case, they are available for other uses.
The set of conditional branches offered by Bcc probably uses the rd field to select the condition, allowing 16 conditions. Given NZVC, the traditional condition codes, 8 are needed to test individual codes, and 6 more for common relational tests. One combination would typically mean branch always, and a final one would commonly mean branch never. This latter offers an 8 bit constant field for some other purpose!
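To make the count of 16 concrete, here is one possible set of conditions written as Python predicates over N, Z, V and C. This is a sketch only: the particular names are hypothetical, nothing here says how the rd field would actually be encoded, and the unsigned tests assume that C after a subtract means "no borrow" (the opposite convention would swap them).

```python
MASK32 = 0xFFFFFFFF

def compare(a, b):
    """Flags after computing a - b on 32 bit operands (C = no borrow, assumed)."""
    a &= MASK32
    b &= MASK32
    r = (a - b) & MASK32
    n = bool(r >> 31)
    z = r == 0
    v = bool(((a ^ b) & (a ^ r)) >> 31)   # signed overflow on subtract
    c = a >= b                            # carry out, i.e. no borrow
    return n, z, v, c

CONDITIONS = {
    # 8 tests of individual condition codes
    "N":  lambda n, z, v, c: n,      "NN": lambda n, z, v, c: not n,
    "Z":  lambda n, z, v, c: z,      "NZ": lambda n, z, v, c: not z,
    "V":  lambda n, z, v, c: v,      "NV": lambda n, z, v, c: not v,
    "C":  lambda n, z, v, c: c,      "NC": lambda n, z, v, c: not c,
    # 6 common relational tests: 4 signed, 2 unsigned
    "LT": lambda n, z, v, c: n != v,
    "GE": lambda n, z, v, c: n == v,
    "GT": lambda n, z, v, c: (not z) and n == v,
    "LE": lambda n, z, v, c: z or n != v,
    "HI": lambda n, z, v, c: c and not z,
    "LS": lambda n, z, v, c: (not c) or z,
    # branch always, and branch never (the one whose constant field is free)
    "AL": lambda n, z, v, c: True,
    "NV_BR": lambda n, z, v, c: False,
}
```

The table has exactly 16 entries, so a 4 bit rd field is fully used, and only the branch-never entry is redundant.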
Question 7 Given your solutions to all of the above, plus everything you have learned in this course, present a summary evaluation of this architecture.
This summary is too short: Not bad, but seriously flawed!
In more detail, this is a commonplace RISC style instruction set, with a load-store approach to arithmetic and word addressing. It bends the common rules of RISC design by having a variable instruction length, but we know how to handle this. It also may offend some because it still rests on condition codes, but in a simple pipelined implementation, these pose no problem, and in a superscalar implementation, it uses them sparingly enough that many interlocks can be avoided. There is room for expansion, and it is likely that expansion and small modifications to this instruction set could lead to a viable design.
Here are some details of what's wrong with it, as given:
LIS rm,#F; AND rt,rm,rs; LSL rt,2; LOAD rt,rt,x,table; JSRR r0,rt.
This may be shortened a bit by loading register rm once and using it 8 times during the software multiply routine, but a bit field extract instruction that both masks and shifts would help, as would an indirect indexed jump. Of course, bitfield extract should be complemented by a bitfield stuff.
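For reference, the mask-and-shift work that such instructions would absorb looks like this in Python. It is a sketch with invented names: extract_field and stuff_field are hypothetical models of bitfield extract and stuff, and table_offset mirrors the first three instructions of the sequence above, assuming 4 byte table entries.

```python
MASK32 = 0xFFFFFFFF

def extract_field(word, pos, width):
    """Return the width-bit field of word starting at bit pos, right-justified."""
    return (word >> pos) & ((1 << width) - 1)

def stuff_field(word, pos, width, value):
    """Return word with the width-bit field at bit pos replaced by value."""
    mask = ((1 << width) - 1) << pos
    return (word & ~mask & MASK32) | ((value << pos) & mask)

def table_offset(rs):
    """Low 4 bits of rs scaled by 4: LIS rm,#F ; AND rt,rm,rs ; LSL rt,2."""
    return (rs & 0xF) << 2
```

A single extract instruction with a built-in scale would replace all three instructions modelled by table_offset.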