Power dissipation is recognized as a critical parameter in modern VLSI design field. To satisfy MOORE’S law and to produce consumer electronics goods with more backup and less weight, low power VLSI design is necessary.
Fast multipliers are essential parts of digital signal processing systems. The speed of multiply operation is of great importance in digital signal processing as well as in the general purpose processors today, especially since the media processing took off. In the past multiplication was generally implemented via a sequence of addition, subtraction, and shift operations. Multiplication can be considered as a series of repeated additions. The number to be added is the multiplicand, the number of times that it is added is the multiplier, and the result is the product. Each step of addition generates a partial product. In most computers, the operand usually contains the same number of bits. When the operands are interpreted as integers, the product is generally twice the length of operands in order to preserve the information content. This repeated addition method that is suggested by the arithmetic definition is slow that it is almost always replaced by an algorithm that makes use of positional representation. It is possible to decompose multipliers into two parts. The first part is dedicated to the generation of partial products, and the second one collects and adds them.
The basic multiplication principle is two fold i.e. evaluation of partial products and accumulation of the shifted partial products. It is performed by the successive additions of the columns of the shifted partial product matrix. The ‘multiplier’ is successfully shifted and gates the appropriate bit of the ‘multiplicand’. The delayed, gated instance of the multiplicand must all be in the same column of the shifted partial product matrix. They are then added to form the product bit for the particular form. Multiplication is therefore a multi operand operation. To extend the multiplication to both signed and unsigned numbers, a convenient number system would be the representation of numbers in two’s complement format. The MAC (Multiplier and Accumulator Unit) is used for image processing and digital signal processing (DSP) in a DSP processor. Algorithm of MAC is modified radix-2 Booth's algorithm.
1.1.1 Speed and Size
When performance of circuits is compared, it is always done in terms of circuit speed, size and power. A good estimation of the circuit’s size is to count the total number of gates used. The actual chip size of a circuit also depends on how the gates are placed on the chip ‘ the circuit’s layout. Since we do not deal with layout in this report, the only thing we can say about this is that regular circuits are usually smaller than non-regular ones (for the same number of gates), because regularity allows more compact layout. The physical delay of circuits originates from the small delays in single gates, and from the wiring between them. The delay of a wire depends on how long it is. Therefore, it is difficult to model the wiring delay it requires knowledge about the circuit’s layout on the chip. The gate delay, however, can easily be modeled by saying that the output is delayed a constant amount of time from the latest input. What we can say about the wiring delay is that larger circuits have longer wires, and hence more wiring delay. It follows that a circuit with a regular layout usually has shorter wires and hence less wiring delay than a non-regular circuit.
Therefore, if circuit delay is estimated as the total gate delay, one should also have in minded the circuit’s size and amount of regularity, when comparing it to other circuits. ‘Delay’ usually refers to the ‘worst-case delay’. That is, if the delay of the output is dependent on the inputs given, it is always the largest possible output delay that sets the speed. Furthermore, if different bits in the output have different worst-case delays, it is always the slowest bit that sets the delay for the whole output. The slowest path between any input bit and any output bit is called the ‘critical path’. If a circuit is to be speed up, it is always the critical path that should be attacked in the first place.
1.1.2 Basics of Multiplier
Multiplication is a mathematical operation that at its simplest is an abbreviated process of adding an integer to itself a specified number of times. A number (multiplicand) is added to itself a number of times as specified by another number (multiplier) to form a result (product). In elementary school, students learn to multiply by placing the multiplicand on top of the multiplier. The multiplicand is then multiplied by each digit of the multiplier beginning with the rightmost, Least Significant Digit (LSD). Intermediate results (partial products) are placed one atop the other, offset by one digit to align digits of the same weight. The final product is determined by summation of all the partial-products. Although most people think of multiplication only in base 10, this technique applies equally to any base, including binary. Fig 1.1 shows the data flow for the basic multiplication technique just described. Each black dot represents a single digit.
Fig. 1.1 Basic Multiplication
Here, its assumed that MSB represent the sign of the digit. The operation of multiplication is rather simple in digital electronics. It has its origin from the classical algorithm for the product of two binary numbers. This algorithm uses addition and shift left operations to calculate the product of two numbers. Based upon the above procedure, we can deduce an algorithm for any kind of multiplication which is shown in Fig 1.2. Its possible to check at the initial stage that whether the product will be positive or negative or after getting the whole result, MSB of the results tells the sign of the product.
Fig. 1.2 Signed Multiplication Algorithm
1.1.3 Binary Multiplication
In the binary number system the digits, called bits, are limited to the set [0, 1]. The result of multiplying any binary number by a single binary bit is either 0, or the original number. This makes forming the intermediate partial-products simple and efficient. Summing these partial-products is the time consuming task for binary multipliers. One logical approach is to form the partial-products one at a time and sum them as they are generated. Often implemented by software on processors that do not have a hardware multiplier, this technique works fine, but is slow because at least one machine cycle is required to sum each additional partial-product. For applications where this approach does not provide enough performance, multipliers can be implemented directly in hardware. The two main categories of binary multiplication include signed and unsigned numbers. Digit multiplication is a series of bit shifts and series of bit additions, where the two numbers, the multiplicand and the multiplier are combined into the result. Considering the bit representation of the multiplicand x = xn-1′..x1, x0 and the multiplier y = yn-1′..y1,y0 in order to form the product up to n shifted copies of the multiplicand are to be added for unsigned multiplication. The entire process consists of three steps, partial product generation, partial product reduction and final addition.
Example: The simplest multiplication operation is to directly calculate the product of two numbers by hand. This procedure can be divided into three steps: partial product generation, partial product reduction and the final addition. To further specify the operation process, let us calculate the product of 2 two’s complement numbers, for example, 11012 (‘310) and 01012 (510), when computing the product by hand, which can be described according to fig 1.3.
Fig. 1.3 Multiplication calculations by hand
The first operand is called the multiplicand and the second the multiplier. The intermediate products are called partial products and the final result is called the product. However, the multiplication process, when this method is directly mapped to hardware, is shown in fig 1.3. As can been seen in the figures, the multiplication operation in hardware consists of PP generation, PP reduction and final addition steps. The two rows before the product are called sum and carry bits. The operation of this method is to take one of the multiplier bits at a time from right to left, multiplying the multiplicand by the single bit of the multiplier and shifting the intermediate product one position to the left of the earlier intermediate products.
Fig. 1.4 Multiplication operation in hardware
All the bits of the partial products in each column are added to obtain two bits: sum and carry. Finally, the sum and carry bits in each column have to be summed. Similarly, for the multiplication of an n-bit multiplicand and an m-bit multiplier, a product with n + m bits long and m partial products can be generated. The method shown in fig 1.4 is also called a non-Booth encoding scheme.