Homework 6

22C:122/55:132, Spring 2001

Due Monday Feb 26, 2001, in class

Background: Consider the O(log n) carry lookahead adder given in the notes for Lecture 14. This design used a binary tree of lookahead units, and the notes commented that quatrenary tree was commonly used in production computers today. The 74LS381A and 74S182 documented (among other places) in Chapter 4 of the Iowa Logic Simulator manual are good examples of modern production devices of this sort.
Startlingly enough, my crystal ball indicates that word sizes of 3, 9 and 27 bits will be the next great craze, suggesting that instead of quatrenary trees, we should be using trinary trees for carry propagation. (OK, you don't believe me? The truth is, you can find plenty of worked examples with quatrenary trees, so I can't use that for an exercise!)
Part A: Design the carry-lookahead unit for a fast adder based on trinary trees. This will have inputs P_i, G_i, P_i+1, G_i+1, P_i+2, and G_i+2 from the three subtrees above it, and input C_i from the right; it will have outputs C_i+1 and C_i+2 to the left and center subtree above it, and it will have outputs P and G to its parent in the carry lookahead tree.
This circuit has 7 inputs; as a result, a truth table specification is unlikely to be useful. Specify this with either a logic diagram and a set of boolean equations.
Part B: Give an Iowa Logic Specification Language specification for this part.
Part C: Consider a 27 bit adder using this circuit. A[0..26] are the individual one-bit adders at the leaves of the tree. Draw those parts of the tree connecting A[15..20] to the root, including all relevant carry connections and the logic to compute carry out from the entire adder. Simply use elipsis to indicate connections to the leaves and subtrees that this assignment allowed you to omit.

Background: Consider the following Iowa Logic Specificaiton Language design for an incrementor:

circuit inc( range word );
   -- parts I: inc( 1..4 ) makes I into a 4 bit incrementor
   -- if I.cin is one, I.in is incremented by one and passed to I.out
   -- if I.cin is zero, I.in is output unchanged to I.out
   -- I.cout gives the carry out

   integer diff = 1 - first(word);

   circuit build( integer index, integer pin );
      -- recursively builds most of the circuit!
      integer diff = index - pin;

      -- all inputs and outputs are to points defined globally in inc
      parts
         g: and( index );
         inv: xor;
         if pin < last(word) then
            c: build( index + 1; pin + 1 );
         endif;

      wires
         in( pin ) to inv.in(1);
         g.out   to inv.in(2);
         inv.out to out( pin );
         for i in first(word) .. (pin - 1) do
            in(i) to g.in( i + diff );
         endfor;
         cin     to g.in(index);
   end; -- build

   inputs
      in(word),
      cin;

   outputs
      out(word);
      cout;

   parts
      inv: xor;
      g: and( size(word) + 1 );
      if size(word) > 1 then
         body: build( 2, first(word) + 1 );
      endif;

   wires
      -- first bit data paths
      in( first(word) ) to inv.in(1);
      cin to inv.in(2);
      inv.out to out( first(word) );

      -- other bits are connected in build

      -- carry out logic
      cin to g.in( size(word) + 1 );
      for i in word do
         in(i) to g.in( i + diff );
      endfor;
      g.out to cout;
end;

(The above circuit works quite well under the logic simulator!)

A Question: If we assume a constant delay per gate and no cost for interconnecting wires, this circuit would take a constant time to increment a number regardless of the number of bits.

Explain why no physical realization of this circuit will break O(log n). To do this, you will have to identify the component or components that pose special problems, and you will have to explain what those problems are that lead to the indicated performance.

Background: Consider adding a microcoded multiply operation to the Minimal CISC example architecture. Ignore completely the problem of wedging this instruction into the instruction set, and focus entirely on the problem of using the registers and functional units given in Figure 1 of the paper to do the job.
Part A: Assuming that, prior to multiplying, the multiplicand is in M[sp], assume the multiplier is in t, and assume the accumulator is zero. What new data paths and functional operations must be added to the machine to allow multiplication?
Part B: Give the microinstruction sequence you would need for a 16 bit multipy. Note that no loop counter is available, so your code will include 16 repeats of the same microinstructions. You need only show one repeat of your code if you document the repeating pattern properly.