Warning: Some of the exam questions will be related to these study questions, but none of the exam questions will be the same as these study questions!!
unsigned char a[1024]; unsigned short int i; unsigned int x; ... x = 0; for (i = 0; i < 1024; i++) x = x + a[i];Assume that type char is 8 bits, short int is 16 bits and int is 32 bits. Assume you are running this code on a good modern 32-bit pipelined machine with a load-store RISC architecture, I-cache, O-cache, MMU and most other features we have discussed. The above sentence gives enough information that you ought to be able to estimate many details of programming this machine even though you cannot give specific instruction encodings.
Reduce this code to the approximate machine code you would expect on such a machine, making effective use of registers but otherwise preserving the original program structure as much as possible.
Reorder the instructions in the code you just wrote to minimize the number of pipeline interlocks you'd expect, and then estimate the efficiency with which this program will utilize the machine's computational resources.
Rewrite the C source code to maximize the performance of the program. Your rewrite should be based on what you learned in the first parts of this exercise, focusing on algorithmic changes that will reduce the frequency of operand interlocks, improve the efficiency of cache usage, etc.
What problems must you face in adding logic to detect operand dependencies in the pipelined Ultimate RISC? Do the interlock issues depend on whether the operand is in RAM or the ALU address space?
How would you go about adding stalls to the pipelined Ultimate RISC in order to delay instructions until their operands are available?
How would you go about adding result forwarding to the pipelined Ultimate RISC in order to eliminate stalls? You could write answers to these questions in the form of modifications to the pseudocode given in the homework solution.
Does the pipelined Ultimate RISC pose any special problems for use with an instruction cache? With a data cache? Could you solve these problems using snooping cache technology? Can all memory addresses be cached, or must the cache be disabled for some addresses?
How would you add bus arbitration to this machine so that a conventional single memory bus could be used for access to main memory? Consider alternative placements of the bus arbitration logic relative to any caches and their impact on the performance of the system.
Could you apply branch prediction logic to this machine? What special problems does it pose?
Warning: Short answers to the above problems are of no value! Use these problems to help you think through issues, to find alternate interpretations of the material, and to find holes in your knowledge.
Warning: Do not send me your answers! I will not read or comment on them. I will not answer questions equivalent to "what is the answer to study question X?" Questions must be formulated in some other way!