
# Essay: Dissertation: Design of truncated multiple constant Multiplication/Accumulation unit

• Subject area(s): Computer science essays
• Published: 24 July 2014*
• File format: Text
• Words: 6,849 (approx)
• Number of pages: 28 (approx)

## Text preview of this essay:

CHAPTER-1
Introduction
FIR filters are digital filters with a finite impulse response. They are also known as non-recursive digital filters because they have no feedback (the recursive part of a filter), even though recursive algorithms can be used to realize them. FIR filters can be designed using different methods, most of which are based on approximating an ideal filter. The objective is not to achieve ideal characteristics, which is impossible, but to achieve sufficiently good ones. The transfer function of an FIR filter approaches the ideal as the filter order increases, which also increases the complexity of the filter and the time needed to process the input samples of the signal being filtered. The design process starts with the specifications and requirements of the desired FIR filter, and the choice of design method depends on those specifications and on the implementation. This chapter discusses the FIR filter design method that uses window functions. With this method, the characteristics of the transfer function, as well as its deviation from the ideal frequency response, depend on the filter order and the window function in use. Each design method has its advantages and disadvantages, so it is important to choose the right one carefully. Because of its simplicity and efficiency, the window method is the most commonly used method for designing filters.
The finite impulse response (FIR) digital filter is one of the fundamental components in many digital signal processing (DSP) and communication systems. It is also widely used in many portable applications with limited area and power budgets. A general FIR filter of order M can be expressed as a weighted sum of the current and past input samples.
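A direct-form FIR filter computes each output sample as a weighted sum of the current and past input samples. The following is a minimal Python sketch, assuming the form $y[n] = \sum_i a_i\,x[n-i]$ with zero initial state. The symbols and the function name `fir_direct` are illustrative assumptions, not taken from the source.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y[n] = sum_i coeffs[i] * x[n - i], with x[n] = 0 for n < 0.

    `coeffs` and `samples` are hypothetical names chosen for this sketch.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for i, a in enumerate(coeffs):
            if n - i >= 0:          # samples before the start are treated as zero
                acc += a * samples[n - i]
        out.append(acc)
    return out


# Feeding a unit impulse returns the coefficients themselves (the impulse response).
impulse_response = fir_direct([1, 2, 3], [1, 0, 0, 0])
```

The multiply-accumulate in the inner loop is the operation that multiplierless and memory-based designs try to make cheaper in hardware.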
To avoid costly multipliers, most prior hardware implementations of digital FIR filters fall into two categories: multiplierless-based and memory-based. Multiplierless-based designs realize multiple constant multiplication (MCM) with shift-and-add operations and share the common suboperations, using canonical signed digit (CSD) recoding and common subexpression elimination (CSE) to minimize the adder cost of MCM.
In the transposed form, however, the area of the delay elements is larger than in the direct form because of the range expansion of the constant multiplications and the subsequent additions in the structural adders (SAs). Blad and Gustafsson presented high-throughput (TP) FIR filter designs that pipeline the carry-save adder trees in the constant multiplications, using integer linear programming to minimize the area cost of full adders (FAs), half adders (HAs), and registers (both algorithmic and pipelined registers).
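To make the shift-and-add idea concrete, the sketch below recodes a constant into canonical signed digit form (digits in {-1, 0, 1} with no two adjacent nonzero digits) and multiplies by it using only shifts, additions, and subtractions. The function names `to_csd` and `csd_multiply` are illustrative assumptions; the sketch handles one non-negative integer constant and does not perform common subexpression sharing across several constants.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # Choose +1 when value % 4 == 1 and -1 when value % 4 == 3,
            # which guarantees that the next digit is zero.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, constant):
    """Multiply x by a non-negative constant using shifts and adds/subtracts only."""
    acc = 0
    for shift, digit in enumerate(to_csd(constant)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


# 7 = 8 - 1, so one subtraction replaces the two additions of plain binary (4 + 2 + 1).
digits_of_7 = to_csd(7)   # [-1, 0, 0, 1]
```

Each nonzero digit costs one adder or subtractor in hardware, which is why reducing the number of nonzero digits lowers the adder cost.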
1.1 Motivation
Reducing area and delay has always been among the most challenging issues in FIR filter design. The core of many filtering algorithms is the multiplication of a variable by a set of constants. Optimizing these multiplications can lead to important improvements in design parameters: the area is reduced, which also reduces cost, and the delay is shortened.
1.2 Problem Statement
This work addresses low-cost FIR filter designs by jointly considering the optimization of coefficient bit width and hardware resources in implementations. As the filter order increases, the number of partial products in the multiplication increases. Although most prior designs are based on the transposed form, the direct FIR structure with faithfully rounded MCMAT (multiple constant multiplication/accumulation with truncation) leads to the smallest area cost and less delay time.
1.3 Objective
• Design of a truncated multiple constant multiplication/accumulation unit
• Implementation of the multiplier using Dadda and Wallace tree algorithms
• Achieve high speed and better resource utilization
CHAPTER-2
Literature Survey
2.1 Truncated multipliers
In the discussion to follow, it is assumed that an unsigned n-bit multiplicand A is multiplied by an unsigned n-bit multiplier B to produce an unsigned 2n-bit product P. For fractional numbers, the values for A, B, and P are
$$A = \sum_{i=0}^{n-1} a_i\,2^{-n+i}, \qquad B = \sum_{i=0}^{n-1} b_i\,2^{-n+i}, \qquad P = \sum_{i=0}^{2n-1} p_i\,2^{-2n+i} \qquad (1)$$
The multiplication matrix for $P = A \cdot B$ is shown in Figure 1a. For most high-speed applications, parallel multipliers are used to produce the product.
In many computer systems, the 2n-bit products produced by the parallel multipliers are rounded to n bits to avoid growth in word size. As presented in [23] – [26], truncated multiplication provides an efficient method for reducing the hardware requirements of rounded parallel multipliers. With truncated multiplication, only the $n + k$ most significant columns of the multiplication matrix are used to compute the product. The error produced by omitting the $n - k$ least significant columns and rounding the final result to $n$ bits is estimated, and this estimate is added to the $n + k$ most significant columns to produce the rounded product. Although this leads to additional error in the rounded product, various techniques have been developed to help limit this error.
With the Constant Correction Truncated Multiplier presented in [24], a constant is added to columns $n - 1$ to $n - k$ of the multiplication matrix. The constant helps compensate for the error introduced by omitting the $n - k$ least significant columns (called reduction error) and the error due to rounding the product to $n$ bits (called rounding error). The expected value of the sum of these errors, $E_{total}$, is computed by assuming that each bit in A, B, and P has an equal probability of being one or zero. As described in [24], this gives
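$$E_{total} = -\frac{1}{4}\sum_{q=0}^{n-k-1}(q+1)\,2^{-2n+q} \;-\; 2^{-n-1}\left(1 - 2^{-k}\right) \qquad (2)$$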
The constant $C_{total}$ is obtained by rounding $-E_{total}$ to $n + k$ fractional bits, such that
$$C_{total} = \frac{\operatorname{round}\left(-2^{\,n+k}\,E_{total}\right)}{2^{\,n+k}} \qquad (3)$$
where round(x) indicates that x is rounded to the nearest integer. The multiplication matrix for a truncated multiplier that uses this method is shown in Figure 1b.
In [26], the Variable Correction Truncated Multiplier is introduced. With this type of multiplier, the values of the partial product bits in column $n - k - 1$ are used to estimate the error due to leaving off the $n - k$ least significant columns. This is accomplished by adding the partial product bits in column $n - k - 1$ to column $n - k$. To compensate for the rounding error that occurs when truncating the product bits in columns $n - 1$ to $n - k$, a rounding constant, $C_{round}$, is added to the multiplication matrix. Since each product bit has an equal probability of being one or zero and the rounding constant cannot go beyond column $n - k$, the value used for $C_{round}$ is
$$C_{round} = 2^{-n-1}\left(1 - 2^{-k+1}\right) \qquad (4)$$
which corresponds to the additive inverse of the expected value of the rounding error, truncated after column $n - k$. The correction constant is added by putting ones in columns $n - 2$ to $n - k$ (Figure 1c, Variable Correction Truncated Multiplication Matrix).
Variable Correction Truncated Multipliers have less average, mean square, and maximum error than Constant Correction Truncated Multipliers for given values of n and k, but require more hardware. Array multipliers can be implemented more efficiently as Variable Correction Truncated Multipliers, and tree multipliers can be implemented more efficiently as Constant Correction Truncated Multipliers.
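The two correction constants can be evaluated exactly with rational arithmetic. The sketch below assumes the commonly cited expressions for this scheme: an expected reduction error of $-\tfrac{1}{4}\sum_{q=0}^{n-k-1}(q+1)2^{-2n+q}$, an expected rounding error of $-2^{-n-1}(1-2^{-k})$, and a variable-correction rounding constant of $2^{-n-1}(1-2^{-k+1})$. These formulas and all function names are assumptions of the sketch rather than a transcription of the source equations.

```python
import math
from fractions import Fraction


def expected_total_error(n, k):
    """Expected reduction error plus expected rounding error (both negative)."""
    reduction = -Fraction(1, 4) * sum(
        (q + 1) * Fraction(1, 2 ** (2 * n - q)) for q in range(n - k)
    )
    rounding = -Fraction(1, 2 ** (n + 1)) * (1 - Fraction(1, 2 ** k))
    return reduction + rounding


def constant_correction(n, k):
    """Round -E_total to n + k fractional bits (ties rounded up)."""
    scale = 2 ** (n + k)
    scaled = -expected_total_error(n, k) * scale
    return Fraction(math.floor(scaled + Fraction(1, 2)), scale)


def rounding_constant(n, k):
    """Ones in columns n - 2 down to n - k of the multiplication matrix."""
    return Fraction(1, 2 ** (n + 1)) * (1 - Fraction(2, 2 ** k))


ulp = Fraction(1, 2 ** 8)
c_total_8_3 = constant_correction(8, 3) / ulp   # Fraction(5, 8), i.e. 0.625 * 2^-8
c_round_8_2 = rounding_constant(8, 2) / ulp     # Fraction(1, 4), i.e. 0.25 * 2^-8
```

Using `Fraction` avoids floating-point rounding inside the calculation, so the constants come out as exact multiples of $2^{-(n+k)}$.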
2.2 Truncated multiplier implementations
The Variable Correction Truncated Multiplication method provides an efficient way to reduce the power dissipation and hardware requirements of rounded array multipliers. With this method, the diagonals that produce the $t = n - k$ least significant product bits are eliminated. To compensate for this, the AND gates that generate the partial products for column $t - 1$ are used as inputs to the modified adders in column $t$. Since the $k$ remaining modified full adders (MFAs) on the right-hand side of the array do not need to produce product bits, they are replaced by modified reduced full adders (RFAs), which produce a carry but do not produce a sum. To add the constant that corrects for rounding error, $k - 1$ of the modified half adders (MHAs) in the second row of the array are changed to modified specialized half adders (SHAs). SHAs are equivalent to MFAs that have an input set to one [7]. Array multipliers that use this method require $t(t-1)/2$ fewer AND gates, $(t-1)(t-2)/2$ fewer full adders, and $t - 1$ fewer half adders than standard array multipliers [26].
Figure 2b shows the block diagram of an 8 by 8 array multiplier that uses the Variable Correction Truncated Multiplication method. For this multiplier, $n = 8$, $k = 2$, and $t = 6$, which results in a hardware savings of 15 AND gates, 10 full adders, and 5 half adders. The two MFAs on the right-hand side of the array are replaced by RFAs. The rounding correction constant, $C_{round} = 0.25 \cdot 2^{-8}$, is added by changing one of the MHAs in the second row to an SHA. For this example, only one MHA is modified since $C_{round} = 0.25 \cdot 2^{-8}$ has a single '1'. This multiplier has a maximum absolute error of approximately $0.723 \cdot 2^{-8}$. In comparison, an 8 by 8 rounded multiplier has a maximum absolute error of $0.5 \cdot 2^{-8}$.
Figure 2. 8 by 8 Array Multipliers.
Figure 3. 8 by 8 Dadda Tree Multipliers
With tree multipliers, the bits of the multiplicand and multiplier are ANDed to generate an n-word by n-bit partial product matrix. After this, half adders and full adders are used to reduce the partial product matrix to two rows, which are summed using a carry-propagate adder. Figure 3a shows the dot diagram of an 8 by 8 tree multiplier that uses Dadda's method of partial product reduction [6]. In this figure, each partial product is represented by a dot, the outputs of each full adder are represented by two dots connected by a plain diagonal line, and the outputs of a half adder are represented by two dots connected by a crossed diagonal line. An n by n multiplier that uses Dadda's method of partial product reduction requires $n^2$ AND gates to generate the partial products, $n^2 - 4n + 3$ full adders and $n - 1$ half adders to reduce the partial products, and a $(2n - 2)$-bit carry-propagate adder to produce the product [7].
Tree multipliers can be efficiently implemented using the Constant Correction Truncated Multiplier method. The hardware saved with truncated Dadda tree multipliers is $t(t+1)/2$ AND gates and $(t-1)(t-2)/2$ full adders. The number of half adders saved is between 1 and $t$, and depends on the values of $n$ and $k$. The size of the carry-propagate adder is reduced by $t - 1$ bits, and the $k$ least significant adders in the carry-propagate adder do not need to produce sum bits. To add the correction constant, $m$ of the half adders are changed to specialized half adders, where $m$ corresponds to the number of ones in $C_{total}$. Similar hardware savings can be achieved by multiplier trees that use other methods for reducing the partial products, such as Wallace tree multipliers [5] or multipliers that use compressors or higher order counters.
Figure 3b shows the dot diagram of an 8 by 8 truncated Dadda multiplier, which uses the Constant Correction Truncated Multiplication method [24]. For this multiplier, $n = 8$ and $k = 3$, so the $t = 5$ least significant columns of the dot diagram are eliminated. The correction constant $C_{total} = 0.625 \cdot 2^{-8}$ is added by changing the two circled half adders to specialized half adders. This multiplier has a maximum absolute error of approximately $0.754 \cdot 2^{-8}$. Compared to a standard 8 by 8 Dadda multiplier, this multiplier requires 15 fewer AND gates, 6 fewer full adders, 2 fewer half adders, and 4 fewer bits in the carry-propagate adder.
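The savings formulas quoted above for truncated array and Dadda multipliers are easy to tabulate. The sketch below simply evaluates them for given $n$ and $k$, with $t = n - k$. The function and key names are illustrative assumptions, and the half-adder saving for the Dadda case is left out because the text gives only a range for it.

```python
def array_savings(n, k):
    """Hardware saved by a variable-correction truncated array multiplier."""
    t = n - k
    return {
        "and_gates": t * (t - 1) // 2,
        "full_adders": (t - 1) * (t - 2) // 2,
        "half_adders": t - 1,
    }


def dadda_savings(n, k):
    """Hardware saved by a constant-correction truncated Dadda multiplier."""
    t = n - k
    return {
        "and_gates": t * (t + 1) // 2,
        "full_adders": (t - 1) * (t - 2) // 2,
        "cpa_bits": t - 1,
    }


array_8_2 = array_savings(8, 2)   # 15 AND gates, 10 full adders, 5 half adders
dadda_8_3 = dadda_savings(8, 3)   # 15 AND gates, 6 full adders, 4 adder bits
```

Both results agree with the figures stated for the 8 by 8 examples.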
2.3 DADDA MULTIPLIER
The Dadda multiplier is a hardware multiplier design invented by computer scientist Luigi Dadda in 1965. It is similar to the Wallace multiplier, but it is slightly faster (for all operand sizes) and requires fewer gates (for all but the smallest operand sizes).
In fact, Dadda and Wallace multipliers have the same three steps:
1. Multiply (logical AND) each bit of one of the arguments by each bit of the other, yielding $n^2$ results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of $a_2 b_3$ has weight 32.
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires into two numbers, and add them with a conventional adder.
However, unlike Wallace multipliers that reduce as much as possible on each layer, Dadda multipliers do as few reductions as possible. Because of this, Dadda multipliers have a less expensive reduction phase, but the numbers may be a few bits longer, thus requiring slightly bigger adders.
To achieve this, the structure of the second step is governed by slightly more complex rules than in the Wallace tree. As in the Wallace tree, a new layer is added if any weight is carried by three or more wires. The reduction rules for the Dadda tree, however, are as follows:
• Take any three wires with the same weight and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires.
• If there are two wires of the same weight left, and the current number of output wires with that weight is equal to 2 (modulo 3), input them into a half adder. Otherwise, pass them through to the next layer.
• If there is just one wire left, connect it to the next layer.
This step does only as many adds as necessary, so that the number of output weights stays close to a multiple of 3, which is the ideal number of weights when using full adders as 3:2 compressors.
However, when a layer carries at most three input wires for any weight, that layer will be the last one. In this case, the Dadda tree will use half adders more aggressively (but still not as much as in a Wallace multiplier), to ensure that there are only two outputs for any weight. Then, the second rule above changes as follows:
• If there are two wires of the same weight left, and the current number of output wires with that weight is equal to 1 or 2 (modulo 3), input them into a half adder. Otherwise, pass them through to the next layer.
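The "as few reductions as possible" idea can be checked by counting adders. The sketch below does not follow the wire-by-wire rules as phrased above. Instead it uses the classic formulation of Dadda's method, in which each stage reduces every column only as far as a target height taken from the sequence 2, 3, 4, 6, 9, ... That substitution, and the name `dadda_adder_counts`, are assumptions of this sketch. It tracks column heights only, not bit values.

```python
def dadda_adder_counts(n):
    """Return (full_adders, half_adders) used to reduce an n x n partial-product
    matrix to two rows with Dadda's height-target method."""
    # Column heights of the partial-product matrix, least significant column first.
    heights = [min(i + 1, 2 * n - 1 - i) for i in range(2 * n - 1)]

    # Target heights 2, 3, 4, 6, 9, ...; keep those below the tallest column.
    targets = [2]
    while targets[-1] * 3 // 2 < n:
        targets.append(targets[-1] * 3 // 2)

    full = half = 0
    for target in reversed(targets):
        carry_in = 0
        reduced = []
        for height in heights:
            total = height + carry_in      # bits in this column incl. new carries
            carry_out = 0
            while total > target:
                if total == target + 1:    # one bit too many: a half adder suffices
                    half += 1
                    total -= 1
                else:                      # otherwise a full adder removes two bits
                    full += 1
                    total -= 2
                carry_out += 1
            reduced.append(total)
            carry_in = carry_out
        if carry_in:
            reduced.append(carry_in)
        heights = reduced
    return full, half


counts_8 = dadda_adder_counts(8)   # (35, 7)
```

For $n = 8$ this gives 35 full adders and 7 half adders, matching the $n^2 - 4n + 3$ and $n - 1$ counts for a Dadda multiplier.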
2.4 WALLACE TREE MULTIPLIER
A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers.
The Wallace tree has three steps:
• Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding $n^2$ results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of $a_2 b_3$ has weight 32.
• Reduce the number of partial products to two by layers of full and half adders.
• Group the wires into two numbers, and add them with a conventional adder.
The second step works as follows. As long as there are three or more wires with the same weight, add a further layer:
• Take any three wires with the same weight and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires.
• If there are two wires of the same weight left, input them into a half adder.
• If there is just one wire left, connect it to the next layer.
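The three steps and the layer rules can be simulated bit by bit. In the sketch below, each column holds the wires of one weight; a layer applies full adders to groups of three, a half adder to a leftover pair, and passes a single leftover wire through. The function name `wallace_multiply` is an illustrative assumption, and the final conventional adder is modeled simply as an integer sum of the remaining weighted wires.

```python
def wallace_multiply(a, b, n):
    """Multiply two unsigned n-bit integers; return (product, reduction_layers)."""
    # Step 1: AND every bit pair; bit a_i * b_j lands in the column of weight 2^(i+j).
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Step 2: add layers until no column holds more than two wires.
    layers = 0
    while any(len(wires) > 2 for wires in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, wires in enumerate(cols):
            k = 0
            while len(wires) - k >= 3:                 # full adder on three wires
                x, y, z = wires[k:k + 3]
                nxt[w].append(x ^ y ^ z)
                nxt[w + 1].append((x & y) | (x & z) | (y & z))
                k += 3
            if len(wires) - k == 2:                    # half adder on a leftover pair
                x, y = wires[k:k + 2]
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
            elif len(wires) - k == 1:                  # single wire passes through
                nxt[w].append(wires[k])
        cols = nxt
        layers += 1

    # Step 3: the two remaining rows are summed (modeled as a weighted integer sum).
    product = sum(bit << w for w, wires in enumerate(cols) for bit in wires)
    return product, layers


product, layers = wallace_multiply(13, 11, 4)   # product == 143
```

Every adder preserves the weighted sum of its inputs, so the result equals the ordinary product regardless of how the wires are grouped.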
Figure: Multiplication of two 4-bit numbers using a Wallace tree multiplier, with the partial products arranged in a tree structure.
• Each layer of the tree reduces the number of vectors by a factor of 3:2.
• The propagation delay is minimal.
• The benefit of the Wallace tree is that there are only $O(\log n)$ reduction layers, whereas adding the partial products with regular adders would require $O(\log^2 n)$ time.
• Wallace trees do not provide any advantage over ripple adder trees in many FPGAs.
• Due to the irregular routing, they may actually be slower and are certainly more difficult to route.
• The adder structure grows as the operand bit width increases.
2.5 ROUNDING TECHNIQUES
2.5.1 Rounding in decimal
The most fundamental fact associated with rounding is that it involves transforming some quantity from a greater precision to a lesser precision; for example, rounding a reasonably precise value like \$3.21 to the nearest dollar would result in \$3.00, which is a less precise entity.
Given a choice, we would generally prefer to use a rounding algorithm that minimizes the effects of this loss of precision, especially in the case where multiple processing iterations, each involving rounding, can result in “creeping errors” (by this we mean that errors increase over time due to performing rounding operations on data that has previously been rounded). However, in the case of hardware implementations targeted toward tasks such as digital signal processing (DSP) algorithms, for example, we also have to be cognizant of the overheads associated with the various rounding techniques so as to make appropriate design trade-offs.
For the purposes of the following discussions, we will assume that the goal is to round to an integer value. In real-world applications we might wish to round to any particular digit (usually a fractional digit), but the principles are exactly the same.
A summary of the actions of the main rounding modes as applied to standard (sign-magnitude) decimal values.
2.5.2 Round-toward-nearest: This is perhaps the most intuitive of the various rounding algorithms. In this case, values such as 3.1, 3.2, 3.3, and 3.4 would round down to 3, while values of 3.6, 3.7, 3.8, and 3.9 would round up to 4. The trick, of course, is to decide what to do in the case of the half-way value 3.5. In fact, round-toward-nearest may be considered to be a superset of two complementary options known as round-half-up and round-half-down, each of which treats the 3.5 value in a different manner as discussed below.
2.5.3 Round-half-up: This algorithm, which may also be referred to as arithmetic rounding, is the one that we typically associate with the rounding we learned in grade school. In this case, a half-way value such as 3.5 will round up to 4. One way to view this is that, at this level of precision and for this particular example, we can consider there to be ten values that commence with a 3 in the most-significant place (3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, and 3.9). On this basis, it intuitively makes sense for five of the values to round down and for the other five to round up; that is, for the five values 3.0 through 3.4 to round down to 3, and for the remaining five values 3.5 through 3.9 to round up to 4.
The tricky point with the round-half-up algorithm arrives when we come to consider negative numbers. In the case of the values -3.1, -3.2, -3.3, and -3.4, these will all round to the nearest integer, which is -3; similarly, in the case of values like -3.6, -3.7, -3.8, and -3.9, these will all round to -4. The problem arises in the case of -3.5 and our definition as to what “up” means in the context of round-half-up. Based on the fact that a value of +3.5 rounds up to +4, most of us would intuitively expect a value of -3.5 to round to -4. In this case, we would say that our algorithm was symmetric for positive and negative values.
However, some applications (and mathematicians) regard “up” as referring to positive infinity. Based on this, -3.5 will actually round to -3, in which case we would class this as being an asymmetric implementation of the round-half-up algorithm. For example, the round method of the Java Math Library provides an asymmetric implementation of the round-half-up algorithm, while the round function in MATLAB provides a symmetric implementation. (Just to keep us on our toes, the round function in Visual Basic for Applications 6.0 actually implements the round-half-even [Banker’s rounding] algorithm discussed below.)
2.5.4 Round-half-down: This acts in the opposite manner to its round-half-up counterpart. In this case, a half-way value such as 3.5 will round down to 3. Once again, we run into a problem when we come to consider negative numbers, depending on what we assume “down” to mean. In the case of a symmetric implementation of the algorithm, a value of -3.5 will round to -3. By comparison, in the case of an asymmetric implementation of the algorithm, in which “down” is understood to refer to negative infinity, a value of -3.5 will actually round to -4.
As a point of interest, the symmetric versions of rounding algorithms are sometimes referred to as “Gaussian implementations.” This is because the theoretical frequency distribution known as a Gaussian distribution (which is named for the German mathematician and astronomer Karl Friedrich Gauss, 1777-1855) is symmetrical about its mean value.
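The difference between the symmetric and asymmetric variants shows up only at negative half-way values. The following sketch uses Python's `decimal` module, whose `ROUND_HALF_UP` and `ROUND_HALF_DOWN` modes break ties away from and toward zero respectively (the symmetric behavior), and builds the asymmetric variants from floor and ceiling. The function names are illustrative assumptions, and values are passed as strings so that they are represented exactly.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_HALF_DOWN, ROUND_HALF_UP)

HALF = Decimal("0.5")


def half_up_symmetric(value):
    """Ties go away from zero: 3.5 -> 4, -3.5 -> -4."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def half_up_asymmetric(value):
    """Ties go toward positive infinity: 3.5 -> 4, -3.5 -> -3."""
    return int((Decimal(value) + HALF).to_integral_value(rounding=ROUND_FLOOR))


def half_down_symmetric(value):
    """Ties go toward zero: 3.5 -> 3, -3.5 -> -3."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_DOWN))


def half_down_asymmetric(value):
    """Ties go toward negative infinity: 3.5 -> 3, -3.5 -> -4."""
    return int((Decimal(value) - HALF).to_integral_value(rounding=ROUND_CEILING))


results_for_minus_3_5 = (
    half_up_symmetric("-3.5"),     # -4
    half_up_asymmetric("-3.5"),    # -3
    half_down_symmetric("-3.5"),   # -3
    half_down_asymmetric("-3.5"),  # -4
)
```

All four functions agree on values that are not exactly half-way, such as 3.4 or -3.6.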
2.5.5 Round-half-even: If half-way values are always rounded in the same direction (for example 3.5 rounds to 4 and 4.5 rounds to 5), the result can be a bias that grows as more rounding operations are performed. One solution toward minimizing this bias is to sometimes round up and sometimes round down.
In the case of the round-half-even algorithm (which is often referred to as Banker’s Rounding because it is commonly used in financial calculations), half-way values are rounded toward the nearest even number. Thus, 3.5 will round up to 4 and 4.5 will round down to 4. This algorithm is, by definition, symmetric for positive and negative values, so both -3.5 and -4.5 will round to -4.
In the case of data sets that feature a relatively large number of “half-way” values (financial records provide a good example of this), the round-half-even algorithm performs significantly better than the round-half-up scheme in terms of total bias. However, in the case of data sets containing a relatively small number of “half-way” values (such as real-world values being applied to DSP algorithms), the overhead involved in performing the round-half-even algorithm in hardware does not justify its use (see also the filter examples shown later in this paper).
2.5.6 Round-half-odd: This is the theoretical counterpart to the round-half-even algorithm, in which half-way values are rounded toward the nearest odd number. In this case, 3.5 will round to 3 and 4.5 will round to 5 (similarly, -3.5 will round to -3, and -4.5 will round to -5). The reason we say “theoretical” is that, in practice, the round-half-odd algorithm is rarely (if ever) used because it will never round to zero (rounding to zero is often a desirable attribute for rounding algorithms).
2.5.7 Round-alternate: Also known as alternate rounding, this is similar in concept to the round-half-even and round-half-odd schemes discussed above, in that the purpose of the round-alternate algorithm is to minimize the bias that can be caused by always rounding half-way values in the same direction.
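The bias difference between round-half-up and round-half-even is easy to see on a data set made entirely of half-way values. The sketch below sums ten such values before and after rounding; the helper name `rounded_sum` and the choice of data are illustrative assumptions.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def rounded_sum(values, mode):
    """Round each value to an integer with the given mode and add the results."""
    return sum(int(Decimal(v).quantize(Decimal("1"), rounding=mode)) for v in values)


halfway = [f"{i}.5" for i in range(10)]            # 0.5, 1.5, ..., 9.5
exact_sum = sum(Decimal(v) for v in halfway)       # 50.0

bias_half_up = rounded_sum(halfway, ROUND_HALF_UP) - exact_sum      # 5.0
bias_half_even = rounded_sum(halfway, ROUND_HALF_EVEN) - exact_sum  # 0.0
```

Round-half-up adds 0.5 to every value, so the bias grows with the number of half-way values, while round-half-even alternates direction and the errors cancel.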
In the case of the round-half-even approach, for example, it would be possible for a bias to occur if the data being processed contained a disproportionate number of odd and even half-way values. One solution is to use the round-alternate algorithm, in which the first half-way value is rounded up (for example); the next is rounded down, the next up, the next down, and so on.
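Alternate rounding needs one bit of state to remember which direction the last half-way value went. The sketch below is a minimal illustration, assuming that "up" means toward positive infinity and that the first half-way value is rounded up. The class name `AlternateRounder` is hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN


class AlternateRounder:
    """Rounds to the nearest integer, alternating direction on half-way values."""

    def __init__(self):
        self._up_next = True   # the first half-way value is rounded up

    def round(self, value):
        d = Decimal(value)
        low = d.to_integral_value(rounding=ROUND_FLOOR)
        if d - low != Decimal("0.5"):
            # Not a tie, so every round-to-nearest mode gives the same answer.
            return int(d.to_integral_value(rounding=ROUND_HALF_EVEN))
        result = int(low) + (1 if self._up_next else 0)
        self._up_next = not self._up_next
        return result


rounder = AlternateRounder()
sequence = [rounder.round(v) for v in ["0.5", "1.5", "2.2", "2.5", "3.5"]]
# [1, 1, 2, 3, 3]: ties go up, down, up, down; 2.2 does not affect the toggle.
```

In hardware the state corresponds to a single flip-flop that toggles whenever a half-way value is encountered.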
2.5.8 Round-random: This may also be referred to as random rounding or stochastic rounding, where the term “stochastic” comes from the Greek stokhazesthai, meaning “to guess at.” With this technique, in the case of “half-way” values, we effectively toss a metaphorical coin in the air and randomly (or pseudo-randomly) round the value up or down.
Although this technique typically gives the best overall result over a large number of calculations, it is only employed in very specialized applications, because the nature of this algorithm makes it difficult to implement and tricky to verify the results.
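The coin-toss behavior can be sketched with a pseudo-random generator. The version below randomizes only exact half-way values, as described above, and rounds every other value to the nearest integer. The function name `round_random` and the explicit `rng` parameter are illustrative assumptions; passing a seeded generator makes a run repeatable, which eases the verification difficulty mentioned above.

```python
import random
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN


def round_random(value, rng):
    """Round to the nearest integer; break exact ties with a coin toss from rng."""
    d = Decimal(value)
    low = d.to_integral_value(rounding=ROUND_FLOOR)
    if d - low == Decimal("0.5"):
        return int(low) + rng.choice((0, 1))   # tie: floor or floor + 1
    return int(d.to_integral_value(rounding=ROUND_HALF_EVEN))


rng = random.Random(2014)                      # seeded so the run is repeatable
tosses = [round_random("3.5", rng) for _ in range(1000)]
```

Over many half-way values the up and down decisions tend to balance, so the accumulated bias stays small without depending on whether the data happens to be odd or even.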
3. Comparative Analysis
Sl.No Ttle of the paper Authors Name of journal/ conference with page numbers Principle/ procedure Advantages Disadvantages
1 High Speed FIR Filter based on Truncated Multiplier and Parallel Adder Deepshikha Bharti
K.Anusudha International journal of Engineering Trends and Technology(IJETT) Multiplication
and addition is frequently required in Digital Signal Processing Reduction in delay Hardware implementation is complex
2 Design and Implementation of Truncated Multipliers for Precision Improvement and Its Application to a Filter Structure R. Devarani Mr.C.S. Manikanda Babu
International Journal of Modern Engineering Research(IJMER) In order to obtain
The truncation error is not more than 1ulp (unit of least position), so there is no need of error compensation circuits, and the final output will be pr??cised. Power and area is reduced.
3 A New Design and an Architecture for FIR filter with Flexibility and Power Reduction Sruthy K Mr.S. Maria Antony Ms. S.Kavitha International Journal of Engineering Research & Technology (IJERT) capable of operating for different word length filter coefficients without any overhead in the
hardware circuitry Good Area and speed Improvement. More power dissipation
4 A New Low-Power Recoding Algorithm for
Multiplier less Single/Multiple Constant Multiplication A.K. Oudjida, M.L. Berrandjia
N. Chaillet IEEE The most commonly used
Heuristic (CSD) has been implemented. Easy to implement and less power consumption. High speed
5 Digital Filter Synthesis Considering
Multiple Adder Graphs for a Coefficient Jeong-Ho Han, and In-Cheol Park IEEE multiple adder graphs are efficiently
generated from a seed adder graph obtained by using previous
dependence-graph algorithms Flexibility and better performance. Hardware complexity
6 Floor plan-aware Low-complexity Digital Filter Synthesis
for Low-power & High-speed Dongku Kang, Hunsoo Choo and Kaushik Roy Semiconductor Research Corporation. integrates high-level synthesis and Floor plan to
obtain improvement in both computational complexity
And interconnect delay. Reduce the interconnect delay of the critical path. More complexity design
7 A DESIGN FLOW FOR LINEAR-PHASE FIXED-POINT FIR FILTERS: FROM THE NPRM
SPECIFICATIONS TO A VHDL CODE Chia-Yu Yao, Chin-Chih Yeh, Tsuan-Fan Lin, Hsin-Horng Chen, and Chiang-Ju Chien IEEE PMILP algorithm that minimizes
the number of SPT terms for some least significant digits (LSDs)
given the NPRM specifications We can use less number of adders. Occupies more area.
8 FPGA Realization of FIR Filters by Efficient and
Flexible Systolization Using Distributed Arithmetic Pramod Kumar Meher, shrutisagar Chandrasekaran IEEE performance metrics of the proposed implementation is
broadly in line with theoretical expectations.
Less area delay. Complexity.
9 Digital Filter Synthesis Based on an Algorithm to Generate
All Minimal Signed Digit Representations In-Cheol Park and Hyeong-Ju Kang IEEE The minimal signed digit (MSD) representations of a constant and present
an algorithm to synthesize digital filters based on the MSD representation Flexibility It can view only one coefficient at a time.
10 Memory-Based Realization of FIR Digital Filter by Look-Up-Table Optimization Batchu Jeevanarani and Thota Sreenivas International Journal of Engineering Research and Applications (IJERA) memory elements store all the possible values of products of the filter coefficients could be an area-efficient alternative to DA-based design of FIR filter with the same throughput of implementation. Reduced Latency
11 Design and Application of Faithfully Rounded and Truncated Multipliers With Combined Deletion, Reduction, Truncation, and Rounding Hou-Jen Ko and Shen-Fu Hsiao IEEE the deletion, reduction, truncation, and rounding
of partial product bits in order to minimize the number of
Smaller Delay Truncation error
12 Information Theoretic Approach to Complexity
Reduction of FIR Filter Design Chip-Hong Chang Jiajia Chen A. P. Vinod IEEE exploiting the prowess of information theory on
directed-acyclic graph representation of the transposed direct
form structure of FIR filters Reduction in implementation cost
Reduced critical path delay. Conflicts in common sub expressions
13 A Novel Common-Sub expression-Elimination
Method for Synthesizing Fixed-Point FIR Filters Chia-Yu Yao, Member, IEEE, Hsin-Horng Chen, Tsuan-Fan Lin, Chiang-Ju Chien, and Chun-Te Hsu IEEE CSE method can perform tradeoffs designs between
complexity and the throughput rate Sharing one pattern with other pattern.
Hardware complexity is reduced. Affect the routing area.
14 EFFICIENT ALGORITHMS FOR COMMON SUBEXPRESSION ELIMINATION
IN DIGITAL FILTER DESIGN Fei Xu, Chip-Hong Chang and Ching-Chuen Jong IEEE contention resolution algorithm (CRA) is proposed for the common sub expression elimination of the
multiplier block of digital filter structure lowering
the power dissipation
low cost circuits
15 Exact and Approximate Algorithms for the
Optimization of Area and Delay in Multiple
Constant Multiplications Levent Aksoy, Student Member, IEEE, Eduardo da Costa, Paulo Flores, Member, IEEE,
and Jos?? Monteiro, Member, IEEE IEEE approximate algorithm
based on the exact approach with extremely competitive results Computer arithmetic alternative representations Over path delay is more.
16 Multiple Real-Constant Multiplication with Improved
Cost Model and Greedy and Optimal Searches M. B. Gately, M. B. Yeary, and C. Y. Tang IEEE multiplier less shift-add network that implements the multiplication of a signal by real constants, in a way which
minimizes hardware cost subject to error constraints More accuracy predictors Longer execution time
17 Global Optimization of Common Subexpressions for Multiplierless Synthesis of
Multiple Constant Multiplications Yuen-Hong Alvin Ho, Chi-Un Lei, Hing-Kit Kwan and Ngai Wong IEEE novel common
sub expression elimination (CSE) algorithm that
models the optimal synthesis of coefficients into a 0-1
mixed-integer linear programming (MILP) problem.
coefficient decompositions that combine
all minimal signed digit (MSD) representations and
the shifted sum and difference of coefficients . Low power
High speed More cost effective.
18 A Comparison of Multiplierless Multiple Constant
Multiplication using Common Subexpression
Elimination Method Yasuhiro Takahashi, Toshikazu Sekine Michio Yokoyama IEEE comparison of hardware
reductions achieved using the horizontal, vertical, oblique and combining horizontal and vertical CSEs in realizing constant
Multipliers..
Good Performance. Implementation cost cannot be reduced.
19 New Approach to Look-Up-Table Design and
Memory-Based Realization of FIR Digital Filter Pramod Kumar Meher, Senior Member, IEEE IEEE LUT-based
multiplication, which could be used to reduce the memory size
to half of the conventional LUT-based multiplication Less area and low latency Higher adder-widths.
20 A New Integrated Approach to the Design of
Low-Complexity FIR Filters Fei Xu, Chip-Hong Chang, and Ching-Chuen Jong IEEE new algorithm for the design of low-complexity FIR
filters with resource sharing to reduce the adder cost directly
during the coefficient synthesis process Reduced cost
21 Design of High-Speed Multiplierless Filters Using
a Nonrecursive Signed Common Subexpression
Algorithm Marcos Mart??nez-Peir??, Eduardo I. Boemo, and Lars Wanhammar IEEE complete description
of the algorithm, and a comparison with two other well-known
options: the graph synthesis, and the classical common subexpression
elimination technique. High speed Hardware implementation is complex
22 A Novel Low-Complexity Method for Parallel
Multiplier less Implementation of Digital FIR Filters Yongtao Wang and Kaushik Roy IEEE computation reduction method which
can be used to obtain low-complexity parallel multiplierless
implementation of digital FIR filters, exploring the use of shift
inclusive differential (SID) coefficients and common subexpression
elimination (CSE) CSE method applied to the design
space represented by the graph, which recursively eliminates 2-
bit subexpressions with a steepest descent approach for subexpression
selectio Power and area is reduced.
23 Contention Resolution - A New Approach to Versatile Subexpressions Sharing in Multiple Constant Multiplications. Fei Xu, Chip-Hong Chang, Ching-Chuen Jong. IEEE. An efficient generalized contention resolution algorithm (CRA) is proposed to eliminate three broad categories of reusable common subexpressions in MCM. Lower power dissipation. Low-cost circuits.
24 A Novel Common-Subexpression-Elimination Method for Synthesizing Fixed-Point FIR Filters. Chia-Yu Yao, Member, IEEE, Hsin-Horng Chen, Tsuan-Fan Lin, Chiang-Ju Chien, and Chun-Te Hsu. IEEE. CSE algorithm considers both the redundancy among the canonic-signed-digit (CSD) filter coefficients and the length of the critical path in the multiplier block of a transposed-form FIR filter. Reduction in delay.

25 Lower Bounds for Constant Multiplication Problems. Oscar Gustafsson, Member, IEEE. IEEE. Lower bounds are straightforwardly calculated and have applications in proving the optimality of solutions obtained by heuristics. Does not give a specific exact value.
26 Design of Low Complexity Multiplierless Digital Filters With Optimized Free Structure Using a Population-Based Metaheuristic. Marc Joliveau, Pascal Giard, Michel Gendreau, François Gagnon, Claude Thibeault. IEEE. Innovative process that simultaneously designs multiplierless low-complexity digital filters and optimizes their global structure. Performance of the proposed algorithm is validated by comparing its accuracy to current methods when designing 1000 IIR filters. Low complexity. More delay.
27 Design of Linear Phase FIR Filters With High Probability of Achieving Minimum Number of Adders. Dong Shi, Student Member, IEEE, and Ya Jun Yu, Senior Member, IEEE. IEEE. Low-complexity linear phase finite impulse response (FIR) filters with optimum discrete coefficients. The proposed algorithm, based on mixed integer linear programming (MILP), efficiently traverses the discrete coefficient solutions and searches for the optimum one that results in an implementation using the minimum number of adders.
28 Sign-Extension Avoidance and Word-Length Optimization by Positive-Offset Representation for FIR Filter Design. Ruimin Huang, Chip-Hong Chang, Senior Member, IEEE, Mathias Faust, Student Member, IEEE, Niklas Lotze, and Yiannos Manoli, Senior Member, IEEE. IEEE. Redundant adders in the multiplier block of the filter have been minimized. Simulation results show an average power reduction of about 19% over and above the savings achieved by sharing of adders in multiple constant multiplication. Less complexity.
29 Design of Low-Complexity FIR Filters Based on Signed-Powers-of-Two Coefficients With Reusable Common Subexpressions. Fei Xu, Chip Hong Chang, and Ching Chuen Jong. IEEE. The optimization of the reusability of adders for two major types of common subexpressions, together with the minimization of adders that are needed for the spare SPT terms. The coefficient set is synthesized in two stages. In the first stage, CSPT terms in the vicinity of the scaled and rounded canonical signed digit (CSD) coefficients are allocated to obtain a CSD coefficient set, with the total number of CSPT terms not exceeding the initial coefficient set. Less area and low latency. Higher adder widths.
30 Reduced Area Multipliers. K'Andrea C. Bickerstaff, Michael Schulte, and Earl E. Swartzlander, Jr. IEEE. The Reduced Area multiplier, with a novel reduction scheme which results in fewer multipliers. Reduced cost.
31 On "A New Common Subexpression Elimination Algorithm for Realizing Low-Complexity Higher Order Digital Filters". Chip-Hong Chang, Senior Member, IEEE, and Mathias Faust, Student Member, IEEE. IEEE. When a CSD number possesses the maximum number of nonzero bits, its binary equivalent may have fewer than n nonzero bits. Therefore, it should not be construed that the number of nonzero bits in the CSD representation is reduced by 50% compared to the two's complement form. On average, binary has n/2 nonzero bits, while the expected number of nonzero bits of CSD tends asymptotically to n/3 + 1/9. As n → ∞, the number of nonzero bits of CSD is reduced by 33% over that of binary. For finite n, the reduction is smaller. Slight statistical advantage over binary. Inconsistency in canonical signed digit (CSD), in which it is inaccurate.
32 Time-Multiplexed Multiple-Constant Multiplication. Peter Tummeltshammer, Student Member, IEEE, James C. Hoe, Member, IEEE, and Markus Püschel, Senior Member, IEEE. IEEE. standard-cell application-specific integrated circuits than prior works on reconfigurable multiplier blocks. Significant area with absolute time. Increased latency.
33 High-Speed, Low-Complexity FIR Filter Using Multiplier Block Reduction and Polyphase Decomposition. Marcos Martinez-Peiro and Lars Wanhammar. IEEE. FIR filter structures are compared and various schemes for simplifying the implementation of the multiplications are evaluated. Carry-save adders with carry overflow correction are used in the implementation. Reduced power consumption by 0.79 W. Increase in area by 0.7%.
34 Low-Error Carry-Free Fixed-Width Multipliers With Low-Cost Compensation Circuits. Tso-Bing Juang, Student Member, IEEE, and Shen-Fu Hsiao, Member, IEEE. IEEE. Design is based on the statistical analysis of the error compensation value of the truncated partial products in binary signed-digit representation with modified Booth encoding. The overall truncation error is significantly reduced. The relation between the compensation value and the truncated digits is so simple that the area cost of the corresponding compensation circuit is almost negligible. Reduced truncation error and cost. Penalty with large truncation errors.

35 Efficient Genetic Algorithm Design for Power-of-Two FIR Filters. Paolo Gentili, Francesco Piazza, and Aurelio Uncini. IEEE. For an efficient reduction of computational costs and an improvement in performance, a specific filter coefficient coding scheme has been implemented. Easy to implement in hardware. Reduction in computation costs.
36 Generalized Low-Error Area-Efficient Fixed-Width Multipliers. Lan-Da Van, Member, IEEE, and Chih-Chyau Yang. IEEE. Error-compensation biases can be easily mapped to low-error area-efficient fixed-width multipliers suitable for very large-scale integration implementation and digital signal processing applications. Low complexity. More delay.
37 Variations on Truncated Multiplication. James E. Stine and Oliver M. Duverne. IEEE. Truncated multiplication method called hybrid correction truncation that utilizes the advantages of two previous methods to obtain lower average and maximum absolute error. Comparisons are presented contrasting power, area, and delay for all three methods compared to standard parallel multipliers. Lower average and absolute error. Lower latencies. Increase in portability.
38 An Improved Search Algorithm for the Design of Multiplierless FIR Filters with Powers-of-Two Coefficients. Henry Samueli. IEEE. Algorithm allocates an extra nonzero digit in the CSD code to the larger coefficients to compensate for the very nonuniform nature of the CSD coefficient distribution. There is a small increase in the filter complexity; however, the improvement in the frequency response is substantial. Improvement in the frequency response is substantial. Increase in the complexity.
39 An Optimal Lower Bound on the Number of Variables for Graph Identification. Jin-Yi Cai, Martin Fürer. IEEE. Ω(n) variables are needed for first-order logic with counting to distinguish a sequence of pairs of graphs G_n and H_n. These graphs have n vertices each, have color class size 4, and admit a linear time canonical labeling algorithm. Easy to identify the graph on n vertices.
40 Data-Dependent Truncation Scheme for Parallel Multipliers. Eric J. King, Earl E. Swartzlander, Jr. IEEE. Method for minimizing the error of a truncated multiplier. The error is reduced by using information from the partial product bits of the column adjacent to the truncated LSB. Reduces complexity and hardware.
4. Proposed Design/Algorithm
Multiple constant multiplication/accumulation (MCMA) in a direct-form FIR structure is implemented using improved truncated multipliers, such as Dadda and Wallace tree multipliers, with extended bit widths. Comparisons with other truncated-multiplier FIR design approaches show that the proposed designs achieve the best area and timing results.
4.1 PP TRUNCATION AND COMPRESSION
The FIR filter design in this brief adopts the direct form in Fig. 1(a), where the MCMA module sums up all the products a_i × x[n − i]. Instead of accumulating the individual multiplications for each product, it is more efficient to collect all the partial products (PPs) into a single PP bit (PPB) matrix and use carry-save addition to reduce the height of the matrix to two, followed by a final carry-propagate adder.
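The idea of collecting all PP bits into one matrix and compressing it to two rows can be illustrated with a small software model. The following Python sketch is only an illustration under simplifying assumptions: coefficients and samples are unsigned, only full adders (3:2 counters) are used, and no truncation is applied. The function names `partial_product_bits`, `compress_to_two_rows`, and `mcma` are hypothetical and do not come from the source.

```python
def partial_product_bits(coeffs, samples, width):
    """Collect the PP bits of sum(a_i * x_i) into columns keyed by bit weight.

    Assumption: coefficients and samples are unsigned integers and each
    sample fits in `width` bits.
    """
    columns = {}
    for a, x in zip(coeffs, samples):
        for j in range(a.bit_length()):
            if (a >> j) & 1:  # each nonzero coefficient bit adds a shifted copy of x
                for k in range(width):
                    columns.setdefault(j + k, []).append((x >> k) & 1)
    return columns


def compress_to_two_rows(columns):
    """Apply full adders (3:2 counters) until every column holds at most two bits."""
    cols = {w: list(bits) for w, bits in columns.items()}
    while any(len(bits) > 2 for bits in cols.values()):
        nxt = {}
        for w, bits in cols.items():
            i = 0
            while len(bits) - i >= 3:
                a, b, c = bits[i:i + 3]
                # sum bit stays in column w, carry bit moves to column w + 1
                nxt.setdefault(w, []).append(a ^ b ^ c)
                nxt.setdefault(w + 1, []).append((a & b) | (a & c) | (b & c))
                i += 3
            nxt.setdefault(w, []).extend(bits[i:])  # leftover bits pass through
        cols = nxt
    return cols


def mcma(coeffs, samples, width):
    """Model of the MCMA sum: one PPB matrix, carry-save compression, final add."""
    cols = compress_to_two_rows(partial_product_bits(coeffs, samples, width))
    row0 = sum(bits[0] << w for w, bits in cols.items() if len(bits) >= 1)
    row1 = sum(bits[1] << w for w, bits in cols.items() if len(bits) >= 2)
    return row0 + row1  # stands in for the final carry-propagate adder


print(mcma([5, 3, 7], [9, 12, 4], width=4))  # 5*9 + 3*12 + 7*4 = 109
```

Each full adder replaces three bits with two while preserving the weighted sum, so the loop always terminates and the two remaining rows add up to the exact accumulated product. A hardware design would additionally handle signed coefficients and choose between Wallace-style and Dadda-style compression schedules.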

5. Summary & Conclusion
A new truncated multiplier design is presented that jointly considers the deletion, reduction, truncation, and rounding of the PP bits. The faithfully truncated multiplier has a total error of no more than 1 ulp and can be used in applications that require accurate results. Low-cost FIR filter designs are obtained by jointly considering the optimization of coefficient bit width and hardware resources in implementation. Although most prior designs are based on the transposed form, the direct form with MCMAT (MCMA with truncation) leads to the smallest area cost and power consumption. The present work can be implemented using the Dadda algorithm, the Wallace tree algorithm, and bit extension.
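The 1 ulp error bound can be checked on a small software model. The Python sketch below is a simplified stand-in, not the design described in the source: it only deletes the PP bits in the lowest columns, adds a constant bias and a rounding constant, and truncates an unsigned n-bit by n-bit product to its upper n bits. The names `truncated_multiply` and `worst_case_error` and the parameter `guard` (the number of columns kept below the output LSB) are assumptions made for illustration.

```python
def truncated_multiply(x, y, n=6, guard=3):
    """Approximate the upper n bits of the 2n-bit product of unsigned x and y.

    PP bits in columns below `cut` are deleted and never accumulated.
    """
    cut = n - guard
    acc = 0
    for i in range(n):
        for j in range(n):
            if i + j >= cut and (y >> i) & 1 and (x >> j) & 1:
                acc += 1 << (i + j)
    # Column c (for c < n) holds c + 1 PP bits, so this is the largest
    # value the deleted bits can sum to.
    max_deleted = sum((c + 1) << c for c in range(cut))
    acc += max_deleted // 2   # constant bias that centers the deletion error
    acc += 1 << (n - 1)       # rounding constant added before truncation
    return acc >> n           # drop the lower n bits


def worst_case_error(n=6, guard=3):
    """Exhaustively find the largest |result * 2**n - x * y| over all inputs."""
    ulp = 1 << n
    return max(abs(truncated_multiply(x, y, n, guard) * ulp - x * y)
               for x in range(ulp) for y in range(ulp))


n = 6
print(worst_case_error(n, guard=3) < (1 << n))  # error stays below 1 ulp
```

With n = 6 and guard = 3, the deleted bits sum to at most 17, so the combined deletion and truncation error is bounded by 40, which is below 1 ulp = 64. Fewer guard columns reduce the hardware further but widen this bound, which is the trade-off a faithfully rounded design has to balance.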

REFERENCES
[1] M. M. Peiro, E. I. Boemo, and L. Wanhammar, "Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 3, pp. 196–203, Mar. 2002.
[2] C.-H. Chang, J. Chen, and A. P. Vinod, "Information theoretic approach to complexity reduction of FIR filter design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2310–2321, Sep. 2008.
[3] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution - A new approach to versatile subexpressions sharing in multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp. 559–571, Mar. 2008.
[4] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution algorithms for common subexpression elimination in digital filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695–700, Oct. 2005.
[5] I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed digit representations," IEEE Trans.
1529, Dec. 2002.
[6] C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. J. Chien, and X.-T. Hsu, "A novel common-subexpression-elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2215–2221, Sep. 2004.
[7] O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 11, pp. 974–978, Nov. 2007.
[8] Y. Voronenko and M. Püschel, "Multiplierless multiple constant multiplication," ACM Trans. Algorithms, vol. 3, no. 2, pp. 1–38, May 2007.
[9] D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126–136, Jan. 2011.
[10] R. Huang, C.-H. H. Chang, M. Faust, N. Lotze, and Y. Manoli, "Sign-extension avoidance and word-length optimization by positive-offset representation for FIR filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 12, pp. 916–920, Oct. 2011.
[11] P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.
[12] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, Jul. 2008.
[13] S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463–466, May 2004.
[14] H.-J. Ko and S.-F. Hsiao, "Design and application of faithfully rounded and truncated multipliers with combined deletion, reduction, truncation, and rounding," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 5, pp. 304–308, May 2011.