Final Exam Study Questions

22C:122/55:132, Spring 2001

Warning: Excessive focus on these study questions will distract you from a broad review of the material on which the exam will be based! Many of the exam questions will not be related to these study questions!

Warning: Some of the exam questions will be related to these study questions, but none of the exam questions will be the same as these study questions!!

Background: Consider the following small fragment of C code:
```
	unsigned char a[1024];
	unsigned short int i;
	unsigned int x;

	...

	x = 0;
	for (i = 0; i < 1024; i++) x = x + a[i];
```
Assume that type char is 8 bits, short int is 16 bits and int is 32 bits. Assume you are running this code on a good modern 32-bit pipelined machine with a load-store RISC architecture, I-cache, O-cache, MMU and most other features we have discussed. The above sentence gives enough information that you ought to be able to estimate many details of programming this machine even though you cannot give specific instruction encodings.
Some questions:
Reduce this code to the approximate machine code you would expect on such a machine, making effective use of registers but otherwise preserving the original program structure as much as possible.
Reorder the instructions in the code you just wrote to minimize the number of pipeline interlocks you'd expect, and then estimate the efficiency with which this program will utilize the machine's computational resources.
Rewrite the C source code to maximize the performance of the program. Your rewrite should be based on what you learned in the first parts of this exercise, focusing on algorithmic changes that will reduce the frequency of operand interlocks, improve the efficiency of cache usage, etc.
Background: Consider the pipelined version of the Ultimate RISC given in the solution to Homework 9. As given, this has 2 branch delay slots and 1 operand delay slot. We have discussed a number of methods for dealing with delay slots.
Some questions:
What problems must you face in adding logic to detect operand dependencies in the pipelined Ultimate RISC? Do the interlock issues depend on whether the operand is in RAM or the ALU address space?
How would you go about adding stalls to the pipelined Ultimate RISC in order to delay instructions until their operands are available?
How would you go about adding result forwarding to the pipelined Ultimate RISC in order to eliminate stalls? You could write answers to these questions in the form of modifications to the pseudocode given in the homework solution.
Does the pipelined Ultimate RISC pose any special problems for use with an instruction cache? With a data cache? Could you solve these problems using snooping cache technology? Can all memory addresses be cached, or must the cache be disabled for some addresses?
How would you add bus arbitration to this machine so that a conventional single memory bus could be used for access to main memory? Consider alternative placements of the bus arbitration logic relative to any caches and their impact on the performance of the system.
Could you apply branch prediction logic to this machine? What special problems does it pose?

Warning: Short answers to the above problems are of no value! Use these problems to help you think through issues, to find alternate interpretations of the material, and to find holes in your knowledge.

Warning: Do not send me your answers! I will not read or comment on them. I will not answer questions equivalent to "what is the answer to study question X?" Questions must be formulated in some other way!