Homework 12 Solved

22C:122/55:132, Spring 2001

Douglas W. Jones

Problem A: What is the difference in switching delay between CPU and M for these two approaches?
With a simple crossbar switch, each memory reference is routed through one switch on the way to memory. With alternative arrangement, each memory reference is routed through 2 switches.

Problem B: How do these two approaches scale? Address both switching delay and number of components in your answer, as a function of N, the number of CPUs (equal to the number of Memory modules).
If we use 2x2 switches as our fundamental unit, we require (N/2)² of these to build a simple crossbar switch to connect N processors, and the switching delay will be a constant independent of N (until the wire length comes into play). If we use the multilevel scheme, we will need (N/2)log₂(N) such switching units, and the switching delay will be log₂(N) times the delay of a simple crossbar switch.

Problem C: Both systems require that there be some specialization in each crossbar swtich to customize it for its setting in the interconnection system. Clearly identify the aspects of this specialization that differ in the two interconnection schemes!
Each crossbar intersection (independent of the dimensions of the switching module) must make a connection when the address from the CPU references its outgoing bus. In a 2ⁿx2ⁿ crossbar, this depends on the least significant n bits of the address. In a hierarchical switching network of 2x2 crossbars, each intersection checks only one bit of the address.
Therefore, in general, we must be able to program each crossbar switch intersection with a mask that indicates which address bits it is to check, and with a pattern it checks for equality to, under this mask. All the switches in a 2x2 crossbar switch module will use the same mask, and all the switches in each column of the 2x2 module will compare with the same pattern, so we must be able to customize 3 "words" in each module.
A Problem: The logic diagram in section 2 of Lecture 36 for a simple set associative associative memory includes no provisions to determine whether an entry in the memory is valid or invalid. The final paragraph of the notes for lecture 35 assert that we must have some way to invalidate all entries in a cache. Describe how this can be done (it is up to you to determine how best to describe this!)
We need a valid bit attached to each entry in the cache. If we use static ram, where each bit is stored in a flipflop, logically formed of two nand (or two nor) gates. To invalidate all cache entries, we add an extra input to this flipflop connected to the invalidate line. Here is a reasonable design for one line of this RAM chip:
```
	          _          _
	         DD          R
	Word     ||          |                  |
	Select --||o---------|------------------|--
	(from    |||         |                  |
	address  |||   ___   |    ___           |
	decoder) ||o--|   |  o---|   |          |
	         |||  |AND|0-|---|AND|O-        |
	         |o|--|___|  | --|___|  |       |
	         |||   ___   ||   ___   |       |
	         || --|   |  ||  |   |  |   |\  |
	         ||   |AND|0-||--|AND|O-|-o-| >-o
	         o|---|___|  || -|___|  | | |/  |
	         ||          |||        | |     |
	         ||          || --------  |     |
	         ||          | -----------      |
	         ||          |                  |
	                                        Q 
```
The above one-bit cell of a static RAM has only one feature added, the Reset input to the upper right nand gate.
A Problem: Given a system with 4 CPUs, 4 memories and a crossbar switch, there are two natural places to add cache memories to the system in order to improve its performance. These are indicated in the following figure:
```
      ?   |_|   |_|  
CPU --c--|   |-|   |-
CPU --a--|___|-|___|-
      c   |_|   |_|  
CPU --h--|   |-|   |-
CPU --e--|___|-|___|-
      ?   | |   | |
         ???cache???
          | |   | |  
          M M   M M
```
In either case, we add 4 caches; in one case, one per CPU, and in the other case, one per memory. Contrast these! What problems does each pose for the cache designer, and what are the potential benefits of each approach.
If the cache is put between CPU and the crossbar switch, it reduces contention in the switch, but it makes it extremely difficult to solve the cache coherency problem. If the cache is put between the crossbar switch and memory, it does nothing to help with contention, but it poses no cache coherency problem.