22C:122/55:132 Notes, Lecture 4, Spring 2001

Douglas W. Jones
University of Iowa Department of Computer Science

Review of the Register Transfer Level
To continue our brief review, we will quickly go over a set of fundamental register transfer components and then review the process of compiling code from a high level programming language to register transfer form.
Bundles of Wires
A sigle bit signal in a digital system is generally represented as a line, the conventional schematic symbol for a wire:
```
	one bit ----------------------
	
```
When multiple bits are involved, there are a number of equivalent notations:
```
	     D  ----------------------
              1
	     D  ----------------------
              2
	     D  ----------------------
              3
		   is the same as

	     D  =========/============  (thickness indicates
	      1-3         3              multiple wires)

		   or the same as

             D  ---------/------------  (careful use of a
	      1-3         3              count indicates
                                         multiple wires)
	
```
Making a distinction between "fat lines" and "thin lines" is helpful, but it takes more work to draw the pictures this way. Where there is only one line-thickness, slashes and wire counts are needed on each line representing a bundle, and this must be repeated as needed to make it clear that some particular line in a schematic diagram represents more than one wire. When "fat line" notation is used for bundles, particularly when there is a standard word size, no "wire count" or slash is needed for those bundles where the count is obvious from context.
Where it is necessary to show a bundle being broken into its individual wires, the following notation is common:
```
	           _
	            |          _
	     D  ----|         |                _
              1     |         |---- D         |
	     D  ----|===/=====|      1        |---- D
              2     |    3    |==========/====|      2
	     D  ----|         |_          2   |---- D
              3    _|                         |_     3
	
```
In using this notation, particularly when showing a bundle being broken into sub-bundles, it is important ot make it clear using subscripts or wire names which wires go in which bundle!
Registers
A register is a collection of identical flipflops all sharing a common clock. For example, we migh construct a 3 bit positive pulse-triggered latch from 3 positive pulse-triggered D latches as:
```
          D                D                D
           2                1                0
          |   _______      |   _______      |   _______
          |  |       |     |  |       |     |  |       |
           --|D     Q|--    --|D     Q|--    --|D     Q|--
             |   _   |  |     |   _   |  |     |   _   |  |
             | _| |_ |  |     | _| |_ |  |     | _| |_ |  |
             |       |  |     |       |  |     |       |  |
           --|C      |  |   --|C      |  |   --|C      |  |
          |  |_______|  |  |  |_______|  |  |  |_______|  |
          |             |  |             |  |             |
        C-o-------------|--o-------------|--              |
                        |                |                |
                        Q                Q                Q
                         2                1                0
	
```
Generally, we will avoid drawing out the gate-level components, and instead, we will simply abbreviate this register as:
```
	             D
	              2-0
	             |
                     /3
	     ________|________
	    |   _             |
	C---|>_| |_    R      |
	    |_________________|
                     |
                     /3
                     |
                     Q
                      2-0
	
```
The commonly used notation clearly marks, on the bundled wires, the number of conductors, while leaving it to the reader to infer that the register must have that number of bits. External connections are generally given names, with subscripts indicating the number of conductors and their order. Thus, if the input is labeled D_2-0, the most significant bit is D₂ and the least significant bit is D₀.
It is also convenient to name the register itself. In the above figure, the register is named R, and the label could have been extended with subscripts, for example, R_2-0 to indicate that R has 3 bits.
There is no universally accepted ordering for the numbering of the bits in a byte or word. It makes good sense to number them consecutively, but numbering may start at 0 or 1, and numbering may start at the most significant or least significant end. I prefer numbering starting at 0 from the least significant end, so bit i has the binary weight 2ⁱ if the word contains an integer. The numbering from the most significant end makes excellent sense for fixed point fractions, where bit i has the binary weight 2^-i; this is how the designers of most computers of the 1940's and early 1950's thought, because they were motivated by problems representing the real numbers used in scientific computing and not by integers.
Designers, of course, are free to label things any way they find convenient, but generally, once a labeling system is found, it is best to stick to it! Mixing labeling schemes leads to chaos, so, for example, the label R_0-2 could have been used in to name the register in the above figure, leaving the labels on the wires unchanged, but in the context of the labels on the inputs and outputs, this would be in very poor taste because it implies that R₀ is the most significant bit, connected to the input D₂ and output Q₂. This kind of confusion is unavoidable where components from different designers or manufacturers must be interconnected, but it should be avoided whenever possible!
In theory, no digital system designer needs more than one type of edge triggered register, but optimal designs sometimes require more types. So, it is handy to keep at least the following 4 types in mind:
```
                     |
             ________|_______
            |   _            |    positive pulse
         ---|>_| |_          |    triggered
            |________________|    register
                     |                             |
                     |                     ________|_______
                        positive edge     |   _            |
                        triggered      ---|>_|             |
                        register          |________________|
                     |                             |
             ________|_______                      |
            | _   _          |    negative pulse
         ---|> |_|           |    triggered
            |________________|    register
                     |                             |
                     |                     ________|_______
                        negative edge     | _              |
                        triggered      ---|> |_            |
                        register          |________________|
                                                   |
                                                   |
        
```
The pulse-triggered registers are made with type D latches, while the edge-triggered registers are made with edge-triggered or master-slave flipflops. A simple inverter on the clock line into a positive pulse or edge triggered device will convert it to a negative pulse or edge triggered device.
It is worth remembering that register transfer level designers rarely make any explicit use of the Qbar outputs of registers, but that these outputs are available at almost no cost. If a register is located, in the physical realization of the circuit, adjacent to some function that requires the inverted outputs, it is quite reasonable to extract them directly from the register without adding additional inverters. On the other hand, if the inverse is needed at some distance from the register, the cost of the added inverters is frequently less than the cost of the extra interconnections needed to move both the Q outputs and the Qbar outputs from the register.

Multiplexors

When some logic element must process data from multiple sources, a multiplexor can be used to select the sourde. A 2-input 1-output multiplexor is composed of elements with the following truth table:

	  A  B  C | D
	 ---------|---
	  0  0  0 | 0
	  0  0  1 | 0
	  0  1  0 | 0
	  0  1  1 | 1
	  1  0  0 | 1
	  1  0  1 | 0
	  1  1  0 | 1
	  1  1  1 | 1

Here, when the control input C is 0, the output D is equal to the input A, while when the control input is 1, the output is equal to input B. The gate-level realization of a multiplexor is:

                    ____
         A --------|    \
                   |     )O--     
              -----|____/    |    _____
             |                ---O\    \
             |                     )    )------- D
             |      ____      ---O/____/
         B --|-----|    \    |
             |     |     )O-- 
             |   --|____/
             |  |
             |  o
             | /_\
             |  |
         C --o--

In register transfer diagrams, this function is frequently drawn using one or the other of the following notations:

                    A     B             A     B
	            |     |             |     |
	           _|_____|_           _|_____|_
	          | 0     1 |         \ 0     1 /
	    C ----|   MUX   |    C ----\       /
	          |_________|           \_____/
                       |                   |
                       |                   |
                       C                   C

This notation may be used for either single bit or multiple bit multiplexors, with the presence of multiple bits indicated by "bundle" notation on the interconnections and not by anything within the box. The slope-sided notation is commonly used without a label by designers who limit the use of this symbol to multiplexors and use it for nothing else. Other designers use the slope-sided notation for any function that combines multiple words to make one word. When that notation is used, a function label is, of course, required.

Multiplexors for more than 2 inputs may be made by cascading 2-input multiplexors, as:

                    A     A             A     A
                     0     1             2     3
	            |     |             |     |
	           _|_____|_           _|_____|_
	          \ 0     1 /         \ 0     1 /
	    C ---- \       /-----------\       /
	     0      \_____/             \_____/
                       |                   |
                       |                   |
                        ------       ------
	                      |     |
	                     _|_____|_
	                    \ 0     1 /
	    C -------------- \       /
	     1                \_____/
                                 |
                                 |
                                 D

But, this adds delays. A multiplexor is a simple combinarional circuit, so it is possible to create a multiplexor with any number of inputs that has a maximum delay of 3 gate delays. An optimal design for a 4-input multiplexor can be given as follows:

                C       C
                 1       0
                |       |
                o--     o--
                | _|_   | _|_
                | \ /   | \ /
                |  O    |  O
                |  |    |  |   ___
               -|--o----|--|--|   \ 
               -|--|----|--o--|    )O----
          A ----|--|----|--|--|___/      |
           0    |  |    |  |   ___       |   |
               -|--o----|--|--|   \       ---|
               -|--|----o--|--|    )O--      |___
          A ----|--|----|--|--|___/    ------|   \
           1    |  |    |  |   ___           |    )O---- D
               -o--|----|--|--|   \    ------|___/
               -|--|----|--o--|    )O--      |
          A ----|--|----|--|--|___/       ---|
           2    |  |    |  |   ___       |   |
               -o--|----|--|--|   \      |
               -|--|----o--|--|    )O----
          A ----|--|----|--|--|___/
           3    |  |    |  |

Both the multiplexor tree and the optimal design are likely to be drawn as follows in a register-transfer-level logic diagram:

                    A     A     A     A
                     0     1     2     3
	            |     |     |     |
	           _|_____|_____|_____|_
	          \ 00    01    10    11/
	C ----/----\                   /
	 1-0   2    \_________________/
                             |
                             |
                             D

The dimension on the control input to this multiplexor can be inferred from the number of inputs to the multiplexor, but the order of the subscripts cannot. The latter is essential to the determination of which combination of control inputs will select what data input. The labels on the inputs can be given in any system that is obvious, in context. Binary and hexadecimal labels are common, and symbolic labels are sometimes used when there is a binding of the bit patterns on the control input to some set of symbols.

General Combinational Elements

If some combinational logic element is to be replicated for each bit in a word, this can be easily represented as follows:

	        A         B
	        |         |
	        /         /
	     ___|_________|___
	    |            _    |
	    |   F = A or B    |
	    |_________________|
                     |
                     / 
                     |
                     F

The above example indicates that the identical function is computed for each pair of corresponding input bits in order to compute the corresponding output. For a single bit i, the example computation is as follows:

                   _____
           A ------\    \
            i       ) OR )------- F
           B -----O/____/          i
            i

Combinational Elements with Control Inputs

It is quite common to discover that, along some data path within a digital system, some selection must be made between combinational functions. Typically, this can always be done as follows, using a brute-force implementation:

	                      A     B
	                      |     |
	             ---------o-----|---
	            |               |   |     
	            |      ---------o---|-----
	            |     |             |     |
	           _|_____|_           _|_____|_
	          | x     y |         | x     y |
	          |    _    |         |     _   |
	          |  z=x|y  |         |  z=xy   |
	          |         |         |         |
	          |____z____|         |____z____|
                       |                   |
                        ------       ------
	                     _|_____|_
	                    \ 0     1 /
	    C -------------- \       /
	                      \_____/
                                 |
                                 F

We will frequently abbreviate such circuits using the following notation:

	        A         B
	        |         |
	     ___|_________|___
	    |   x         y   |
	    |         _       |
	    | 0   z = x|y     |
	C --|          _      |
	    | 1   z = xy      |
	    |                 |
	    |________z________|
                     |
                     |
                     F

Here, we use the same convention as was used with multiplexors, control inputs are shown from the left when data inputs are from the top, or from the bottom when data inputs were from the side.

Incrementing a Binary Number
To increment a binary number, we must compute a value for each bit that depends both on that bit and on the less significant bits. Consider the following worked example, adding 1 to 1011₂:
```
                    0 0 1 1    - the carry bits
		N     1 0 1 1
		  +         1
                 -------------
               N+1  0 1 1 0 0
	
```
Each column of this sum produces a 2-bit result, with values ranging from 0 to 2. The least significant bit of each single-column addition is the result bit for that column. The most significant bit is the carry into the next most significant column. The result of the increment operation always has the potential to require one more bit than the input.
A simple truth table can be constructed to describe what is done for each column:
```
	      in    +  out
	     n  c   | c     s
	      i  i  |  i+1   i
	    --------|----------
	     0  0   | 0     0
	     0  1   | 0     1
	     1  0   | 0     1
	     1  1   | 1     0
	
```
Here, we have labeled the carry inputs to bit i of the number as c_i and the carry output from bit i as c_i+1. It is easy to see the following relations from the truth table:
- c_i+1 = c_i and n_i
- s_i = c_i xor n_i
We call this logic circuit a half-adder because it does almost half of the work required to add two binary numbers.
The full increment function on a 3 bit number can be implemented by a series of these 1-bit operations:
```
                N         N         N
                 2         1         0
                |         |         |     0
                |  __     |  __     |  ___|
              __|_|  |  __|_|  |  __|_|   
             |  n c| | |  n c| | |  n c|   
             |  +  | | |  +  | | |  +  |
             |c_s__| | |c_s__| | |c_s__|
          C __| |    |__| |    |__| |
           3    |         |         |
                |         |         |
                S         S         S
                 2         1         0
	
```
At the register transfer level, we will typically abbrevaite this as one of the following:
```
	      N                N                N
              |                |                |
	      / 3              / 3              / 3 C
	 _____|_____      _____|_____      _____|___|_
	|           |    |           |    |           |
	|    +1     |    |    +1     |    |    +1     |
	|___________|    |___________|    |___________|
              |                |            |   |
	      / 3              / 4          C   / 3
              |                |                |
              S                S                S
	
```
In the leftmost example, the carry out is ignored! In the center example, the carry out is incorporated into the output as a 4th bit. In the rightmost example, the carry out is extracted as a distinct output. The form used depends on what is needed in context!
In the 2 left examples above, the C input to the least significant bit of the adder chain has been ignored. We assume that it is one if it is not mentioned. In the rightmost example, the C input is brought out explicitly so that it may have a value other than 1.
We will also occasionally use the following form:
```
	           N
	           |
	     ______|______
	    |      x      |
	    | 0: f = x    |
	C --|             |
	    | 1: f = x+1  |
	    |______f______|
                   |
                   |
                   S
	
```
Here, we have treated the C input as a control input for selecting between two functions, increment and do nothing.
Adders
A full adder for two binary numbers is described by the following truth table
```
	      in      +  out
	     a  b  c  | c     s
	      i  i  i |  i+1   i
	    ----------|----------
	     0  0  0  | 0     0
	     0  0  1  | 0     1
	     0  1  0  | 0     1
	     0  1  1  | 1     0
	     1  0  0  | 0     1
	     1  0  1  | 1     0
	     1  1  0  | 1     0
	     1  1  1  | 1     1
	
```
Here, aside from adding a new input, we have labeled things compatably with the increment example. Inspection gives the following implementation:
- s_i = a_i xor b_i xor c_i
- c_i+1 = (a_i and b_i) or (a_i and c_i) or (b_i and c_i)
Alternatively, we note that we can construct this function from two of the half-adder circuits used for incrementing:
```
                        a         b         c
                         i         i         i
                        |         |         |
                        |  __     |  _______|
                      __|_|  |  __|_| 
                     |  n c| | |  n c|
                     |  +  | | |  +  |
                     |c_s__| | |c_s__|
               _____  | |    |  | |
              /    /--  |     --|-
          ---(    (     |       |  
         |    \____\----|-------   
         |              |          
         c              s         
          i+1            i       
	
```
Here, we have violated a standard rule of logic schematic diagrams: Data should always flow from left to right and top to bottom. The reason is, of course, that the convention from arithmetic, that carry is propagated from right to left, has been allowed to dominate!
The one-bit increment components are referred to as half adders precisely because it takes two of them to make a full adder as shown above.
We will usually abbreviate adders as follows:
```
	        A         B
	        |         |
	     ___|_________|___
	    |   x         y   |
	    |                 |
	    |    f = x + y    |
	    |                 |
	    |________f________|
                     |
                     |
                     S
	
```
The options for treating carry out from such an adder and the options for treating carry into the least significant bit can be dealt with in the same way that they were for the incrementor.
If speed is of the essence, as it is in most high performance computer architectures, the "ripple carry" approach to computing the sum outlined above is not very good. The adder outlined above has a speed proportional to O(n) on an n bit word. There are practical ways of building an adder with O(log n) speed, and of course, optimal design methods can always be applied to give something with 3 gate delays, for an apparent speed of O(1). The latter is only illusion, though, because it requires gates with O(n) inputs and O(n) fanout, and the speed of such gates tends to fall of as O(log n)! As a result, O(log n) speed is the realistic goal for designers of fast arithmetic hardware.

Subtractors

Subtraction of binary numbers can be handled by adding the two's complement, and the two's complement is one plus the one's complement. This leads to the following naive design for a subtractor:

                          B
                          |   
                       ___|___
                      |       |
                      |  not  |
                      |_______|
                          |   
                       ___|___
                      |       |
                      |  +1   |
                A     |_______|
                |         |
             ___|_________|___
            |   x         y   |
            |                 |
            |    f = x + y    |
            |                 |
            |________f________|
                     |
                     |
                     S

Noting that the C_in input to the adder can also be used to add one to the sum, we can collapse this to:

                        B
                        |   
                     ___|___
                    |       |
                    |  not  |
                A   |_______|    1
                |       |    ____|
             ___|_______|___|_ 
            |   x       y  C  |
            |               in|
            |                 |
            |  f = x + y + C  |
            |               in|
            |  C              |
            |___out__f________|
               |     |
               |     |
               C     S

This is a common way to realize a subtractor, and it has the consequence that C_out=1 indicates that there was no borrow and C_out=0 indicates that there was a borrow.

Of course, we can also simply design the truth table for the subtractor from first principles and implement it directly.

Arithmetic Logic Units

While adders and subtractors are useful and sufficient, it is common to design single functional units that can perform both jobs. Such a unit is usually referred to as an Arithmetic Logic Unit. Here is one typical design:

              A            A            A
               2   B        1    B      0     B
              |     2      |      1     |      0
              |     |      |     |      |     | 
          F --|---o-|------|---o-|------|---  | 
           O  |   |_|      |   |_|      |   |_| 
              |  |XOR|     |  |XOR|     |  |XOR|
              |  |___|     |  |___|     |  |___|
              |  __|       |  __|       |  __|  
              | |          | |          | |
          F --|-|-o--------|-|-o--------|-|- 
           1  | | |  __    | | |  __    | | |  ----F
              | | |_|  |   | | |_|  |   | | |_|     2
              | ||AND| |   | ||AND| |   | ||AND|
              | ||___| |   | ||___| |   | ||___|
             _|_|__|   |  _|_|__|   |  _|_|_|_ 
            | a b c |  | | a b c |  | | a b c |   
            |   +   |  | |   +   |  | |   +   |
            |c__s___|  | |c__s___|  | |c__s___|
             |  |      |__|  |      |__|  |
         C --   |            |            |
          out   |            |            |
                S            S            S
                 2            1            0

This example ALU computes 8 possible functions of A and B, although only 6 are distinct from each other. These are enumerated below:

	 F   F   F  |            operation
	  2   1   0 |        
	------------|------------------------------------
	 0   0   0  |  S = A xor B
	 0   0   1  |  S = A equ B       = A xor (not B)
	 0   1   0  |  S = A + B
	 0   1   1  |  S = (A - B) - 1   = A + (not B)
	 1   0   0  |  S = A xor B
	 1   0   1  |  S = A xor (not B) = A equ B
	 1   1   0  |  S = A + B + 1
	 1   1   1  |  S = A - B         = A + (not B) + 1

This ALU design is typical of many commercial ALU designs! One typical feature is that many of the control input combinations produce operations of doubtful utility. If some carefully selected and gates in an optimal version of the adder are given additional enable inputs, the result is a few more useful operations and many more useless ones, with the addition of only one or two additional control lines. Because of this, it useful systems almost always hide the ALU control lines from the user, and instead, run the user's operation code through a small ROM (table lookup) to select from among the useful ALU operations without exposing the user to the useless or redundant operations.