

International Journal of Microsystems and IoT



ISSN :(Online) Journal homepage: https://www.ijmit.org

# Design and Implementation of Multiplier Accumulator Unit Using Rounding Based Approximation

Sreenivasarao Ijjada, Ajaykumar Dharmireddy, M. Sushanth Babu and K. Lavanya

**Cite as:** Ijjada, S. R., Dharmireddy, A., Babu, M. S., & Kotha, L. (2024). Design and Implementation of Multiplier Accumulator Unit Using Rounding Based Approximation. International Journal of Microsystems and IoT, 2(1), 529-537. <u>https://doi.org/10.5281/zenodo.10715039</u>

© 2024 The Author(s). Published by Indian Society for VLSI Education, Ranchi, India.

Published online: 22 January 2024.



Vol. 2, Issue 1, pp. 529-537; DOI: https://doi.org/10.5281/zenodo.10715039


Sreenivasa Rao Ijjada<sup>1</sup>, Ajaykumar Dharmireddy<sup>2</sup>, M. Sushanth Babu<sup>3</sup> and K. Lavanya<sup>2</sup>

<sup>1</sup>Department of Electronics and Communication Engineering, GITAM Deemed to be University, Visakhapatnam, India

<sup>2</sup>Department of Electronics and Communication Engineering, Sir C. R. Reddy College of Engineering, Eluru, Andhra Pradesh, India

<sup>3</sup>Department of Electronics and Communication Engineering, Chaitanya Bharathi Institute of Technology, Hyderabad, India

#### ABSTRACT

This paper presents an analysis of the design of a programmable gain amplifier (PGA) based on an instrumentation amplifier. The instrumentation amplifier can be implemented in different ways, including the Single Op amp IA, 2 Op-amp INA, 3 Op-amp INA, Switched Capacitor Instrumentation amplifier (SCIA), Current Feedback Instrumentation amplifier (CFIA), Current Mirror Instrumentation amplifier (CMIA), and others. By adding switches or a multiplexer (Mux) to the amplifier, a precision programmable gain instrumentation amplifier (PG-IA) can be created. The literature suggests various approaches for enhancing the performance parameters of a PGINA, and this study aims to bring together and evaluate these approaches on a unified platform. In this research, an extensive examination of multiple instrumentation amplifier topologies has been carried out, and these topologies have been categorized based on their distinctive characteristics.

## 1. INTRODUCTION

The batteries in the latest portable devices (e.g., laptops, tablets, and mobile phones) are limited in size, while the number of multimedia applications, and the power they demand, is growing exponentially; hence, highly energy-efficient DSP architectures are required. Many DSP applications need low power consumption and outstanding performance for real-time signal processing, as well as fast, high-throughput multiplier-accumulator (MAC) units. Since a MAC essentially consists of repeated addition and multiplication, the speed of these arithmetic operations determines the execution speed and performance of the whole computation. The DSP cores in smart devices perform a huge number of multimedia tasks with very limited human intervention; hence, the designs must be highly energy efficient with minimal loss of quality. This can be achieved at several design and modeling levels, such as the device, logic, circuit, algorithmic, and architectural levels. Research should therefore be directed toward adders and multipliers that include approximate computation logic and trade off among various VLSI parameters. Multiplication is a key operation in many components of digital systems and computing devices, most notably in signal processing, graphics, and scientific computation. Booth's algorithm can significantly improve signed binary multiplication. Energy reduction is one of the critical requirements of any electronic device, especially portable devices such as smartphones and tablets, and achieving it with a minimal performance (speed) penalty is highly desirable.

DSP cores are the primary components that handle the various multimedia demands of these devices. In these DSP systems, the ALU is the central processing element, and multiplication accounts for a large share of all ALU operations. Therefore, increasing the multiplier's speed and power/energy efficiency is crucial for enhancing processor performance, and approximation can be employed to boost efficiency and speed. First, humans have a limited ability to perceive fine detail in information presented as images or video. Moreover, in many contexts, not only video and image processing, exact arithmetic is not fundamental to the usefulness of the system. Approximation gives the designer more flexibility in trading precision for speed, power efficiency, and so on. Approximate computing can deliver power efficiency and high performance, and it can also reduce design complexity. In the most common DSP architectures, the multiplier is a crucial piece of hardware with a major role in commonplace DSP applications. Despite their small size, today's electronic devices use a lot of energy, and given the complexity of multipliers and their high clock rates, approximation also helps reduce the time needed to perform a multiplication.

A MAC consists of a multiplier, an adder, and an accumulator. In each operation, two numbers are multiplied and the product is added to the accumulator, which holds the sum of the previous successive products. A high-performance 64-bit multiplier-accumulator (MAC) is designed in Verilog HDL. Once the partial products are generated, they must be arranged and added systematically with minimal delay.
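As a minimal behavioural sketch of the MAC operation described above (not the authors' Verilog design), the Python function below multiplies pairs of signed operands and adds each product to an accumulator. The function name `mac`, the parameter `n_bits`, and the choice of a 2·n_bits-wide two's-complement accumulator are assumptions made for illustration.

```python
def mac(xs, ys, n_bits=32):
    """Multiply-accumulate over paired signed n-bit operands.

    The accumulator is modelled (by assumption) as a 2*n_bits-wide
    two's-complement register, so results wrap around as a fixed-width
    hardware register would.
    """
    if len(xs) != len(ys):
        raise ValueError("operand sequences must have equal length")
    width = 2 * n_bits
    mask = (1 << width) - 1
    acc = 0
    for x, y in zip(xs, ys):
        acc = (acc + x * y) & mask   # multiply, then add the product to the accumulator
    # Reinterpret the register contents as a signed two's-complement value
    return acc - (1 << width) if acc >> (width - 1) else acc


print(mac([1, 2, 3], [4, 5, 6]))   # 32  (4 + 10 + 18)
```

With a narrow register the wrap-around becomes visible, e.g. `mac([7, 7, 7], [7, 7, 7], n_bits=4)` overflows the 8-bit accumulator.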

**KEYWORDS** ROBA multiplier; MAC unit; Baugh-Wooley multiplier; Vedic multiplier



ALUs can be approximated at the circuit, logic, and architectural levels of design, and the approximation can be performed using various strategies.

Many multipliers reported in the literature reduce the design complexity and the number of components [1-4], and using approximate compressors to accumulate the sets of partial products yields reduced latency and energy consumption [5-9]. Approximate adders have been used to compute the final addition of the two rows that follow a compressor by splitting the process into two stages [10]. As described in [11], the inaccuracy of an approximate multiplier was reduced by using bitwise AND-OR gates to compute the insignificant parts. In [12], a multiplier is implemented from approximate 2×2-bit building blocks. The rest of this article is organized as follows. Section 2 discusses the literature survey. Section 3 discusses the basic MAC unit and basic RoBA multiplier, the modified RoBA multiplier and MAC processes, and the design approach of all these units. Section 4 presents the Verilog HDL implementation of all the blocks, the simulation results, and the synthesis reports of the designs, and compares the proposed multiplier with different multipliers. Section 5 concludes the paper.

## 2. LITERATURE SURVEY

The broken-array multiplier (BAM) technique was employed in the design of the approximate adder and multiplier explained in [13]. Using the BAM approximation of [13], a conventional modified Booth multiplier is turned into the signed binary approximate multiplier presented in [14]. This approximate multiplier has lower power and smaller area than a conventional Booth multiplier. Approximate multipliers built from approximate building blocks reduce power consumption by around 31.8%-45.4% compared with normal multiplier designs. A pipelined processor implementation using a 32-bit approximate signed multiplier is presented in [15]; this design has an error of nearly 14% and is faster than the full-adder-based Wallace tree multiplier. An error-tolerant multiplier, which computes an approximate result by separating the multiplication into one accurate part and one inaccurate part, is discussed in [18]; it achieves a 50% reduction in power consumption for a 12-bit multiplier. In [16], two approximate compressors were developed and evaluated using a conventional Dadda multiplier. Many earlier approximate multipliers rely on either modifying the design or reducing the complexity of a particular exact multiplier. An approximate Wallace tree multiplier (WTM) using an inaccurate 4:2 counter is presented in [19], together with an automatic error-correction unit that corrects the results; to build larger multipliers, this inaccurate WTM can be arranged in an array structure. As in [17], approximate multiplication can be performed through simple operations. The extensive literature on this topic can be found in [20-26].

Compared with other work in the literature, the novelty of this paper is that the same approach applies to signed and unsigned numbers while the design is faster than other designs and techniques. The suggested architecture has a unified multiplier that handles both signed and unsigned operations, and the technique has a low error rate. This work aims to design and implement a low-power MAC block that uses a block-enabling approach to achieve substantial power savings. The work starts with the design of a 1-bit MAC unit with power, area, and delay optimization and is later extended to an N-bit MAC. Control logic enables the pipelined stages at the appropriate time, which reduces power usage. High-speed adder cells with a low gate count and low power are used in the design.

#### **CNN for Security in Computer Vision and Multimedia**

LeCun et al. pioneered the use of CNNs trained with the back-propagation algorithm to recognize handwritten postal codes, achieving a high level of accuracy. Owing to advances in faster hardware, particularly graphics processing units, the authors of [4] developed a sophisticated deep CNN called AlexNet, which won ILSVRC 2012. After minor modifications to AlexNet's training resources and network layout, a pre-trained CNN called CaffeNet was developed on the Caffe framework and serves as a benchmark model. The power of deep models has been further enhanced by the widespread use of fine-tuning, a popular technique rooted in transfer learning: layers of a base network that has already been trained can be refined to learn a new task more efficiently. In multimedia security, several methods employ CNNs for steganalysis [6]-[8] and image forensics. Positive results were reported in [6] for a deep CNN model for steganalysis. Subsequently, Pibre et al. [7] studied the structure of CNNs and determined the optimal CNN model through extensive experiments. Their CNN performed much better than older methods when the same secret key was reused for embedding. Nevertheless, when a random secret key was used for each embedding, as in a practical scenario, their model was inferior to conventional methods relying on hand-crafted features. A deep CNN model was also presented for detecting hidden information in JPEG-domain steganography, in contrast to the traditional approach of analyzing the spatial domain of the image [8]; its architecture was enhanced by incorporating the concept of JPEG-phase awareness. Tuama et al. [4] combined a high-pass filter with a CNN for source camera identification, which allowed automatic feature extraction and simultaneous learning for classification.

Bondi et al. [5] recently used a convolutional neural network (CNN) to automatically extract distinctive camera-model features and then trained a support vector machine (SVM) for classification. This approach surpasses prior approaches in analyzing small color image patches, and its features show solid potential to generalize. Bondi et al. [6] used these deep features to create an iterative clustering approach that effectively detects and localizes image manipulation. Previous CNN-based approaches for multimedia security, such as [2], [3], [6], and [8], as well as current approaches [4], often use a first layer in the deep model that is somewhat arbitrary, either fixed or constrained. This layer typically consists of one or more high-pass filters (HPFs) and remains unchanged, that is, non-trainable, during training. For example, [6] and [8] used an HPF at the bottom of their network to extract the noise residual of the input image.

In [2], the CNN uses an initial fixed filter layer that takes an image as input and outputs the residual of its median filtering, which effectively performs nonlinear high-pass filtering. In [3], a constrained filter layer was developed for general-purpose image manipulation detection. In this layer, the center of the filter kernel is set to -1, while the remaining elements are constrained to sum to +1, so the first layer must learn a set of high-pass prediction-error filters within this constraint. The underlying idea is to retain information presumed to be high frequency while suppressing the semantic content of the input image, which is mostly believed to be low frequency. Because RGB images are the input, the first layer in our study uses a set of 3-D filters instead of 2-D linear convolutional kernels. These 3-D convolutional filters can be trained without constraints, which makes them adaptable. Our experiments and visualizations show that such filters readily extract valuable and distinctive features from the data for our classification task. While revising this publication, we discovered a recent study, Stats Net, that similarly uses a CNN to differentiate between computer-generated (CG) images and natural images (NIs) [5]. In Section V-H, we examine our method in detail, including both qualitative and quantitative comparisons. Our method outperforms Stats Net in several respects, including network layout, design, and test performance, especially on the challenging Columbia dataset. Pan et al. [11] proposed a method that exploits fractal dimensions to capture the difference in color perception between photographs and computer-generated images. Nguyen et al. [12] suggested a technique to differentiate between computer-generated and genuinely captured human faces. The technique developed in [14] is based on pattern noise, and [15] devised a method using demosaicking and chromatic aberration traces. Gallagher et al. [16] proposed analyzing demosaicking traces with threshold-based classification.
Peng and Zhou [17] suggested a method that uses features of CFA interpolation and photo-response non-uniformity noise; with this method, the SVM classifier achieved a classification accuracy of up to 99.5%, and the method shows promising results even under additive noise and JPEG compression. Yao et al. [18] combined a CNN classifier with sensor pattern noise; tested on a dataset of 1800 computer-generated (CG) and photographic (PG) images, their method achieved 100% accuracy. This research compares existing methods that distinguish CG images from PG images using characteristics associated with the image acquisition process. Sankar et al. [19] suggested a combined method using both texture interpolation and local patch statistical features. Conotter and Cordin [20] proposed using both wavelet-based features and complex pattern noise data. A method combining the ResNet-50 model with CNNs was suggested in [21] and assessed on 4850 computer-generated and photographic images, reaching a classification accuracy of 97%.

Its feature length of 2048 is considerable. Holmes et al. [22] conducted two investigations based on the human visual system, engaging 250 workers from Amazon's Mechanical Turk online workforce to classify images as computer-generated (CG) or photographically generated (PG). They showed that trained experts, rather than amateurs, should examine the images to improve classification accuracy. Farid et al. [23] presented a statistical image model using the first four order statistics of wavelet coefficients in different subbands. In this method, 216 features are computed and a support vector machine classifier determines whether the image is CG or PG; experiments on a dataset of 40,000 PG and 6,000 CG images confirmed a classification accuracy of 71%. Cui et al. [24] proposed a method based on comparable properties in the discrete Fourier transform domain, which reportedly attained an accuracy of 94% on the Columbia photo dataset. A total of 780 features are obtained by computing the statistical moments of the 1-D and 2-D characteristic functions, and these features are then fed to the SVM classifier [25-29]. Incremental feature selection further reduces the number of features to at most 390. Applied to the Columbia photo dataset, this method yields a lower classification accuracy of 88%. Bo et al. [30] introduced a technique based on SVM classifiers and Benford's law. It uses a 54-length feature vector, which is computationally efficient, and achieves a classification accuracy of 91.6% on a dataset of 2400 CG and 2400 PG images.

#### **3. PROPOSED DESIGN**

This section discusses the basic and modified MAC approaches, the basic and modified RoBA multiplication techniques, and their design processes.

#### 3.1 Basic MAC approach

Step 1: Booth encoding of the n-bit multiplicand X and the m-bit multiplier Y into n-bit partial products P0, P1, …, Pn-1. Step 2: partial product summation into sum and carry. Step 3: final addition, giving the (n+m)-bit multiplication result X × Y. Step 4: accumulation, giving the (n+m)-bit accumulation result.

Fig. 1 Basic arithmetic steps of multiplication and accumulation.

Fig. 1 shows the fundamental steps of multiplication and accumulation. The operation of a multiplier is split into three distinct phases. First, radix-2 Booth encoding uses the multiplicand X and the multiplier Y to generate the partial products. Second, either an adder array or a partial-product compression technique merges all the partial products into a sum-and-carry form. Third, the multiplication result is obtained by adding the sum and the carry. When the accumulation of the multiplication results is included, a MAC has four stages, as shown in Fig. 1.
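The four steps of Fig. 1 can be mimicked in a few lines of Python. For brevity, this sketch assumes unsigned operands and generates plain AND-array partial products instead of Booth-encoded ones; the 3:2 carry-save reduction used for step 2 is one common compression technique, and all function names are hypothetical.

```python
def partial_products(x, y, n):
    """Step 1 (simplified): one shifted copy of x for every set bit of the n-bit multiplier y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(n)]


def carry_save_reduce(rows):
    """Step 2: compress rows with 3:2 counters until only a sum row and a carry row remain."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        s = a ^ b ^ c                               # per-bit sum, no carry propagation
        cy = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority = carry into next column
        rows += [s, cy]
    rows += [0] * (2 - len(rows))                   # pad if fewer than two rows were given
    return rows[0], rows[1]


def mac_step(x, y, acc, n=8):
    s, c = carry_save_reduce(partial_products(x, y, n))
    product = s + c          # Step 3: final carry-propagate addition
    return acc + product     # Step 4: accumulation


acc = 0
for x, y in [(13, 11), (7, 9)]:
    acc = mac_step(x, y, acc)
print(acc)   # 206  (143 + 63)
```

The 3:2 step relies on the full-adder identity a + b + c = (a XOR b XOR c) + 2·maj(a, b, c), so no carry has to ripple until step 3.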



Fig. 2. Standard MAC Hardware Architecture

Fig. 2 illustrates the overall hardware architecture of this MAC. The input multiplier X is multiplied by the multiplicand Y, and the product is added to the previously accumulated value Z in the accumulation step [27].

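An N-bit binary number X in two's-complement form is expressed as

$$X = -2^{N-1}x_{N-1} + \sum_{i=0}^{N-2} x_i 2^i, \quad x_i \in \{0, 1\}$$
(1)

Using modified Booth encoding, X can be rewritten in terms of radix-4 digits $d_i$:

$$X = \sum_{i=0}^{N/2-1} d_i 4^i$$
(2)

$$d_i = -2x_{2i+1} + x_{2i} + x_{2i-1}, \quad x_{-1} = 0$$
(3)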

Applying (2), the multiplication is represented as

$$X \times Y = \sum_{i=0}^{N/2-1} d_i Y 2^{2i}$$
(4)

With these equations, the multiplication-accumulation result described above is expressed as

$$P = X \times Y + Z = \sum_{i=0}^{N/2-1} d_i Y 2^{2i} + \sum_{i=0}^{2N-1} z_i 2^{i}$$
(5)

The two terms on the right-hand side of (5) are computed separately, and the result is their sum. The architecture given by (5) is the standard design in the MAC domain.
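The recoding in (1)-(5) can be checked numerically. The Python sketch below, with hypothetical function names, computes the radix-4 digits of (3) for an N-bit two's-complement X, rebuilds the product as in (4), and adds an accumulator value Z as in (5). It models the arithmetic only, not the hardware.

```python
def to_bits(x, n):
    """Two's-complement bits x_0 ... x_{n-1} of a signed integer, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def value_eq1(bits):
    """Eq. (1): X = -2^(N-1) x_(N-1) + sum_{i<N-1} x_i 2^i."""
    n = len(bits)
    return -(bits[-1] << (n - 1)) + sum(b << i for i, b in enumerate(bits[:-1]))


def booth_digits(x, n):
    """Eq. (3): d_i = -2 x_(2i+1) + x_(2i) + x_(2i-1), with x_(-1) = 0 (n must be even)."""
    bits = to_bits(x, n)
    return [-2 * bits[2 * i + 1] + bits[2 * i] + (bits[2 * i - 1] if i > 0 else 0)
            for i in range(n // 2)]


def mac_eq5(x, y, z, n):
    """Eq. (4) plus accumulation, Eq. (5): P = sum_i d_i * Y * 2^(2i) + Z."""
    product = sum(d * y * (1 << (2 * i)) for i, d in enumerate(booth_digits(x, n)))
    return product + z


print(booth_digits(7, 4))      # [-1, 2], i.e. 7 = 2*4 - 1
print(mac_eq5(-5, 6, 100, 4))  # 70  (-30 + 100)
```

Every digit lies in {-2, -1, 0, 1, 2}, which is why each partial product in (4) needs only a shift and an optional negation of Y.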

#### 3.2 Proposed MAC approach

Fig. 3 presents a modified three-stage MAC approach derived from the four-stage process of Fig. 1. It makes two modifications: first, the accumulation is integrated into the partial-product addition process; second, although not explicitly indicated, the final addition of the third step is performed only when needed. Because the result of step 2, rather than that of step 3, is used for accumulation, step 3 can be skipped until the final accumulation result is required. The layout of the proposed MAC is shown in Fig. 4. The Booth encoder generates (n+1)-bit partial products from the n-bit MAC inputs X and Y. The carry-save adder (CSA) and accumulator perform not only the partial-product additions but also the accumulation. The CSA produces the n-bit sum S and carry C, together with the lower n bits of the result, Z. These three values are fed back for the subsequent accumulation. To obtain the final result, S and C are added to form P[2n-1:n], which is combined with the previously generated lower part P[n-1:0].
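The key idea of the proposed MAC, keeping the running total in redundant sum/carry form and deferring the carry-propagate addition, can be sketched in Python as below. This is a simplified model under stated assumptions: a single W-bit two's-complement datapath (W = 32 here, a hypothetical value), radix-4 Booth partial products, and no separate handling of the S, C, and Z fields of Fig. 4. The class and method names are illustrative.

```python
W = 32                    # hypothetical datapath width in bits
MASK = (1 << W) - 1


def booth_digits(y, n):
    """Radix-4 Booth digits of an n-bit two's-complement multiplier, LSB group first."""
    bits = [(y >> i) & 1 for i in range(n)]
    return [-2 * bits[2 * i + 1] + bits[2 * i] + (bits[2 * i - 1] if i else 0)
            for i in range(n // 2)]


def csa(a, b, c):
    """3:2 carry-save adder on W-bit words; returns (sum, carry)."""
    s = a ^ b ^ c
    cy = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return s, cy


class CarrySaveMAC:
    """Accumulation folded into the partial-product CSA tree; final addition deferred."""

    def __init__(self, n=16):
        self.n = n
        self.s = 0        # running total kept as a sum word ...
        self.c = 0        # ... plus a carry word (redundant form)

    def step(self, x, y):
        # Booth partial products d_i * X * 4^i, reduced modulo 2^W (two's complement)
        rows = [((d * x) << (2 * i)) & MASK
                for i, d in enumerate(booth_digits(y, self.n))]
        rows += [self.s, self.c]          # previous sum/carry enter the same CSA tree
        while len(rows) > 2:
            rows += csa(rows.pop(), rows.pop(), rows.pop())
        self.s, self.c = rows             # no carry-propagate addition in this cycle

    def result(self):
        # The final addition (step 3) is performed only when the result is needed
        total = (self.s + self.c) & MASK
        return total - (1 << W) if total >> (W - 1) else total


m = CarrySaveMAC(n=16)
for x, y in [(3, 4), (-5, 7), (100, -20)]:
    m.step(x, y)
print(m.result())   # -2023  (12 - 35 - 2000)
```

Each `step` uses only 3:2 compressors, so its latency does not depend on carry propagation; the single full-width addition appears only in `result`.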



Fig.3. Proposed multiplication-accumulation operation



Fig.4. Proposed MAC architecture

#### 3.3 Basic Round Based Approximate Multiplier:

As shown in Fig. 5, multiplication can be performed using shifters, an adder, and a subtractor, with the output determined by the inputs X and Y. This multiplier gives correct results only for positive numbers, because the rounded values of negative numbers are not of the form $2^n$. Therefore, the sign must be removed before the multiplication is performed, and afterwards the sign is restored to the output based on the signs of the inputs. Using the rounding-based approximate concept, the multiplication of two inputs X and Y is expressed as follows [28]:

$$X*Y = (Xr-X)*(Yr-Y)+(Xr*Y)+(Yr*X)-(Xr*Yr)$$
(6)

where Xr and Yr are X and Y rounded to the nearest power of two. The terms (Xr\*Y), (Yr\*X), and (Xr\*Yr) are obtained with shift logic. The terms (Xr-X) and (Yr-Y) are the rounding errors of X and Y; when X and Y are rounded efficiently, their product (Xr-X)\*(Yr-Y) is comparatively small and can be ignored. Hence, the above equation is approximated as

$$X*Y \approx (Xr*Y) + (X*Yr) - (Xr*Yr)$$
(7)
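Equations (6) and (7) can be compared directly for positive inputs. The sketch below assumes that a value exactly halfway between two powers of two rounds up (the tie rule is our assumption for illustration); the function names are hypothetical.

```python
def round_pow2(a):
    """Round a positive integer to its nearest power of two; ties round up (assumed)."""
    if a <= 0:
        raise ValueError("this sketch handles positive magnitudes only")
    p = a.bit_length() - 1            # a lies in [2^p, 2^(p+1))
    lo = 1 << p
    return lo << 1 if 2 * a >= 3 * lo else lo   # midpoint of the interval is 1.5 * 2^p


def roba_exact(x, y):
    """Eq. (6): algebraically identical to x * y."""
    xr, yr = round_pow2(x), round_pow2(y)
    return (xr - x) * (yr - y) + xr * y + yr * x - xr * yr


def roba_approx(x, y):
    """Eq. (7): Eq. (6) with the rounding-error product (Xr-X)*(Yr-Y) dropped."""
    xr, yr = round_pow2(x), round_pow2(y)
    return xr * y + x * yr - xr * yr


print(roba_approx(5, 7), 5 * 7)   # 36 35  (Xr = 4, Yr = 8)
```

For X = 5 and Y = 7 the dropped term is (4 - 5)(8 - 7) = -1, which is exactly the gap between 36 and 35; when either input is already a power of two, the dropped term is zero and (7) is exact.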

*Sign Detector:* This first block identifies the signs of the inputs so that the sign of the output can be set, as mentioned above. If the MSB of an input is '0', the input is treated as a positive number; if it is '1', the input is treated as a negative number.



Fig. 5 Basic ROBA multiplier

*Rounding block:* Rounds each input to the nearest power of two.

*Shifter:* Computes the multiplication terms Xr\*Y, X\*Yr, and Xr\*Yr through logical shift operations.

*Adder:* Generates the sum (Xr\*Y) + (X\*Yr).

*Subtractor:* Subtracts the shifter term Xr\*Yr from the adder output, giving (Xr\*Y) + (X\*Yr) - (Xr\*Yr).

*Sign set:* Applies the sign to the output result.
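The block sequence of Fig. 5 (sign detector, rounding, shifter, adder, subtractor, sign set) can be sketched in Python as below. It assumes n-bit two's-complement input patterns (n = 16 by default, a hypothetical choice), uses the same assumed tie-rounds-up rule as before, and returns 0 when either input is zero because a zero magnitude has no power-of-two rounding; the function names are illustrative. Because Xr and Yr are powers of two, the rounding block only has to produce their exponents, and every product becomes a left shift.

```python
def round_exp(a):
    """Exponent k such that 2^k is the nearest power of two to a > 0 (ties round up, assumed)."""
    p = a.bit_length() - 1
    return p + 1 if 2 * a >= 3 * (1 << p) else p


def roba_signed(a_bits, b_bits, n=16):
    """Block-level sketch of Fig. 5 on n-bit two's-complement input patterns."""
    mask = (1 << n) - 1
    # Sign detector: the MSB of each input
    sa, sb = (a_bits >> (n - 1)) & 1, (b_bits >> (n - 1)) & 1
    # Remove the sign: take the magnitude of negative inputs
    x = (-a_bits) & mask if sa else a_bits
    y = (-b_bits) & mask if sb else b_bits
    if x == 0 or y == 0:
        return 0
    # Rounding block: only the exponents of Xr and Yr are needed
    kx, ky = round_exp(x), round_exp(y)
    # Shifter: Xr*Y, X*Yr and Xr*Yr as left shifts
    t1, t2, t3 = y << kx, x << ky, 1 << (kx + ky)
    # Adder, then subtractor
    mag = t1 + t2 - t3
    # Sign set: negative result if exactly one input was negative
    return -mag if sa ^ sb else mag


print(roba_signed((-5) & 0xFFFF, 7))   # -36
```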

#### 3.4 Modified ROBA

In the existing method above, the term Xr\*Yr suffers from a large error because both X and Y are rounded to powers of two, whereas X\*Y ≈ (Xr\*Y) or X\*Y ≈ (X\*Yr) provides higher accuracy than that term. Considering all the terms in the multiplication increases both computational accuracy and design complexity. Hence, to trade off these two parameters, four multiplication terms are considered and implemented separately, so that each can be used as required.

$$X*Y \approx (Xr*Yr); \quad X*Y \approx (Xr*Y); \quad X*Y \approx [(Xr*Y) + (Yr*X)]/2; \quad X*Y \approx (Xr*Y) + (Yr*X) - (Xr*Yr)$$

The technique shown in Fig. 5 must be modified so that it can generate these terms.
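To see how the four terms trade accuracy for hardware, the sketch below evaluates each of them for positive inputs. The numbering 1-4 follows the order in which the terms are listed above and is only a labeling convention for this example; the division by two is realized as a right shift, which truncates odd sums (an assumption about how the halving would be implemented). The tie-rounds-up rule is again assumed.

```python
def round_pow2(a):
    """Nearest power of two to a positive integer a; ties round up (assumed)."""
    p = a.bit_length() - 1
    lo = 1 << p
    return lo << 1 if 2 * a >= 3 * lo else lo


def modified_roba(x, y, mode):
    """The four approximation terms listed above; mode numbers are hypothetical labels."""
    xr, yr = round_pow2(x), round_pow2(y)
    if mode == 1:
        return xr * yr                         # a single power of two: cheapest
    if mode == 2:
        return xr * y                          # one shifted copy of Y
    if mode == 3:
        return (xr * y + yr * x) >> 1          # average of the two shifted terms
    if mode == 4:
        return xr * y + yr * x - xr * yr       # basic RoBA term, Eq. (7)
    raise ValueError("mode must be 1, 2, 3 or 4")


print([modified_roba(5, 7, m) for m in (1, 2, 3, 4)], 5 * 7)   # [32, 28, 34, 36] 35
```

For this input pair the error shrinks as more terms are included, which mirrors the accuracy-versus-complexity trade-off described above; other input pairs can order the modes differently.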

#### 3.4.1 Implementation of the Proposed MAC with the Proposed ROBA Technique

High-speed multiplication using parallel counters is accomplished with a modified version of the Booth method, which the partial product generator uses to produce partial products. Instead of shifting and adding for every column of the multiplier and multiplying by 1 or 0, every other column is used and the multiplicand is multiplied by ±1, ±2, or 0. For example, to multiply by 7, the partial product aligned with the third column is multiplied by 2 and the partial product aligned with the least significant bit is multiplied by -1 (7 = 2 × 4 - 1). As a result, the propagation latency, complexity, and power consumption of the circuit are all reduced. Booth multiplication also makes it possible to skip zero rows. The modified Booth encoder circuit in Fig. 6 conserves transient signal power using the spurious power suppression technique (SPST). A detector regulates this approach: the detection unit uses one of the two operands to determine whether the Booth encoder computation is redundant.



Fig.6 SPST equipped modified Booth encoder

Fig. 6 shows an SPST-enabled version of the Booth encoder. The partial product (PP) generator produces five possible partial products, 0, X, -X, 2X, and -2X, and selects among them based on the Booth-encoded groups of operand Y. Each group of multiplier bits is sent to the partial product generator, and one partial product is generated for each group.
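The selection performed by the partial product generator can be sketched in Python as follows, assuming radix-4 (modified) Booth digits d ∈ {-2, -1, 0, 1, 2} of the multiplier Y and a 2n-bit two's-complement datapath. The function names are hypothetical and the SPST detector is not modeled. Each digit picks one of the five candidates 0, ±X, ±2X using only a shift and a negation.

```python
def booth_digits(y, n):
    """Radix-4 Booth digits of an n-bit two's-complement multiplier, LSB group first."""
    bits = [(y >> i) & 1 for i in range(n)]
    return [-2 * bits[2 * i + 1] + bits[2 * i] + (bits[2 * i - 1] if i else 0)
            for i in range(n // 2)]


def select_pp(x, d, width):
    """Pick 0, X, -X, 2X or -2X for Booth digit d, as a width-bit two's-complement word."""
    pp = (0, x, x << 1)[abs(d)]       # 2X is a one-bit left shift of X
    if d < 0:
        pp = ~pp + 1                  # negation: invert all bits and add one
    return pp & ((1 << width) - 1)


def booth_product(x, y, n):
    """Sum the n/2 selected partial products, each weighted by 4^i."""
    width = 2 * n
    mask = (1 << width) - 1
    total = 0
    for i, d in enumerate(booth_digits(y, n)):
        total = (total + (select_pp(x, d, width) << (2 * i))) & mask
    return total - (1 << width) if total >> (width - 1) else total


print(booth_digits(7, 8))        # [-1, 2, 0, 0]: 7 = 2*4 - 1, plus two zero rows
print(booth_product(13, 7, 8))   # 91
```

The two zero digits in the multiply-by-7 example correspond to the zero rows that Booth multiplication allows the hardware to skip.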



Fig.7 Proposed high performance low power equipped adder.



Fig.8 Low power adder/subtractor adopting SPST

Fig. 8 shows a 16-bit SPST-based adder/subtractor. In this implementation, the 16-bit adder/subtractor is split between the 8th and 9th bits into a least significant part (LSP, the lower 8 bits) and a most significant part (MSP, the upper 8 bits). Latches implemented with AND gates control the MSP's input data: when the MSP is required, its input data passes unchanged; when it is not, its input data is forced to zeros to prevent unnecessary switching power.
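A behavioural sketch of the LSP/MSP split follows, under a simplifying assumption: the detector below only checks whether the upper bytes of both operands are zero, in which case the MSP adder is treated as gated and the upper byte of the sum is just the LSP carry-out. The function name and this detection rule are illustrative; the sketch models unsigned addition only, is not a full SPST detector, and does not reproduce the latch or subtract circuitry of Fig. 8.

```python
def spst_add16(a, b):
    """16-bit addition split into an 8-bit LSP and an 8-bit MSP with a simplified detector.

    Returns (sum, msp_active). When both MSP inputs are zero, the MSP adder is
    treated as idle and the upper byte of the sum is just the LSP carry-out.
    """
    lsp = (a & 0xFF) + (b & 0xFF)
    carry = lsp >> 8                       # carry-out of the LSP adder
    a_msp, b_msp = (a >> 8) & 0xFF, (b >> 8) & 0xFF
    msp_active = (a_msp | b_msp) != 0      # detector: is the MSP adder needed?
    if msp_active:
        msp = (a_msp + b_msp + carry) & 0xFF
    else:
        msp = carry                        # MSP held idle; only the carry passes through
    return (msp << 8) | (lsp & 0xFF), msp_active


print(spst_add16(0x0012, 0x0034))   # (70, False): small operands leave the MSP idle
print(spst_add16(0x1234, 0x0F0F))   # (8515, True)
```

For small operands, which are common in many DSP workloads, the MSP inputs stay at zero and do not toggle, which is where the switching-power saving comes from.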







Fig. 10 Case (ii): MAC with CSA output with reset = 0.



Fig.11 Modified RoBA Multiplier result.



Fig.12 Multiplier accumulator unit using ROBA.



Fig.13 Modified RoBA Multiplier RTL schematic



Fig.14 MAC with Modified RoBA technique RTL schematic.

Figures 9 to 14 show, respectively, the MAC CSA output with reset = 1, the MAC CSA output with reset = 0, the modified RoBA multiplier result, the multiplier-accumulator unit using RoBA, the RTL schematic of the modified RoBA multiplier, and the RTL schematic of the MAC with the modified RoBA technique. Table 1 shows the synthesis report, and Table 2 compares the design parameters with those of existing designs.

#### Table 1: Synthesis report

| mac1 Project Status |                                 |                     |             |
|---------------------|---------------------------------|---------------------|-------------|
| Project File:       | mac1.ise                        | Current State:      | Synthesized |
| Module Name:        | top_tst                         | Errors:             | No Errors   |
| Target Device:      | xc3s500e-4fg320                 | Warnings:           | 22 Warnings |
| Product Version:    | ISE 10.1 - Foundation Simulator | Routing Results:    |             |
| Design Goal:        | Balanced                        | Timing Constraints: |             |
| Design Strategy:    | Xilinx Default (unlocked)       | Final Timing Score: |             |

mac1 Partition Summary

No partition information was found.

Device Utilization Summary (estimated values)

| Logic Utilization          | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 1182 | 4656      | 25%         |
| Number of Slice Flip Flops | 678  | 9312      | 7%          |
| Number of 4 input LUTs     | 1863 | 9312      | 20%         |
| Number of bonded IOBs      | 1    | 232       | 0%          |
| Number of BRAMs            | 4    | 20        | 20%         |
| Number of GCLKs            | 2    | 24        | 8%          |

#### Table 2: Comparison with existing design parameters

| Types of Multipliers              | Max Output Time (ns) | Delay (ns) | Power (mW) | Area (IOBs) | Error Rate |
|-----------------------------------|----------------------|------------|------------|-------------|------------|
| Vedic Multiplier [5]              | 21.470               | 2.63       | 17.58      | 50          | 1          |
| Baugh-Wooley [7]                  | 15.181               | 1.59       | 17.42      | 40          | 1          |
| RoBA Multiplier [28]              | 10.436               | 1.14       | 9.21       | 33          | 1          |
| Modified RoBA multiplier          | 9.21                 | 0.84       | 1.1        | 26          | 1          |
| MAC [27]                          | 5.8                  | 1.21       | 2.7        | 16          | 1          |
| MAC with RoBA multiplier          | 15.22                | 1.97       | 10.4       | 42          | 1          |
| MAC with modified RoBA multiplier | 7.252                | 1.05       | 5.37       | 36          | 0.29       |
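The "nearly 50%" reduction stated in the conclusion can be checked against Table 2. Assuming the comparison is between the "MAC with RoBA multiplier" and "MAC with modified RoBA multiplier" rows, using the Delay (ns) and Power (mW) columns, the percentage reductions are computed below; the dictionary keys and the helper name are illustrative.

```python
# Delay (ns) and power (mW) taken from Table 2
mac_roba = {"delay_ns": 1.97, "power_mw": 10.4}
mac_modified_roba = {"delay_ns": 1.05, "power_mw": 5.37}


def reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


for key in ("delay_ns", "power_mw"):
    print(key, round(reduction(mac_roba[key], mac_modified_roba[key]), 1))
# delay_ns 46.7
# power_mw 48.4
```

Both reductions are just under 50%, consistent with the figure quoted in the conclusion.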

### 5. CONCLUSION

In this paper, the basic ROBA multiplier is modified and used to implement a modified MAC. Because the SPST technique, the modified Booth algorithm, and the simplified generation of the approximate rounding terms reduce the number of complex terms and signal transitions, the MAC with modified ROBA achieves better accuracy and lower power consumption than the MAC with ROBA. The modified MAC reduces execution from four stages to three. Compared with the MAC with ROBA, the proposed design reduces delay and power by nearly 50% (Table 2).

#### REFERENCES

- 1. Narayanamoorthy, S., Moghaddam, H. A., Liu, Z., Park, T., & Kim, N. S. (2015). Energy-efficient approximate multiplication for digital signal processing and classification applications. IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, 23(6), 1180-1184. https://doi.org/10.1109/TVLSI.2014.2333366
- 2. Hashemi, S., Bahar, R., & Reda, S. (2015). DRUM: A dynamic range unbiased multiplier for approximate applications. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (pp. 418-425). IEEE Press. https://doi.org/10.1109/ICCAD.2015.7372600
- 3. Zendegani, R., Kamal, M., Bahadori, M., Afzali-Kusha, A., & Pedram, M. (2017). RoBA multiplier: A rounding-based approximate multiplier for high-speed yet energy-efficient digital signal processing. IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, 25(2), 393-401. https://doi.org/10.1109/TVLSI.2016.2587696
- 4. Leon, V., Zervakis, G., Soudris, D., & Pekmestzi, K. (2018). Approximate hybrid high radix encoding for energy-efficient inexact multipliers. IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, 26(3), 430. https://doi.org/10.1109/TVLSI.2017.2767858
- 5. Akbari, O., Kamal, M., Afzali-Kusha, A., & Pedram, M. (2017). Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers. IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, 25(4), 1352-1361. https://doi.org/10.1109/TVLSI.2016.2643003
- 6. Prithvi, J. Mohana, & D. Ajaykumar (2013). Multitrack Simulator Implementation in FPGA for ESM System. International Journal of Electronics Signals and Systems, 81-84. https://doi.org/10.47893/IJESS.2014.1208
- 7. Marimuthu, R., Rezinold, Y. E., & Mallick, P. (2017). Design and analysis of multiplier using approximate 15-4 compressor. IEEE Access, 5, 1027-1036. https://doi.org/10.1109/ACCESS.2016.2636128
- 8. Esposito, D., Strollo, A. G. M., Napoli, E., De Caro, D., & Petra, N. (2018). Approximate multipliers based on new approximate compressors. IEEE Transactions on Circuits and Systems I: Regular Papers, 99, 1-14. https://doi.org/10.1109/TCSI.2018.2839266
- 9. Esposito, D., De Caro, D., Napoli, E., Petra, N., & Strollo, A. G. (2017). On the use of approximate adders in carry-save multiplier-accumulators. In 2017 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1-4). IEEE. https://doi.org/10.1109/ISCAS.2017.8050437
- 10. Kyaw, K. Y., Goh, W.-L., & Yeo, K.-S. (2010). Low-power high-speed multiplier for error-tolerant application. In 2010 IEEE International Conference of Electron Devices and Solid-State Circuits (EDSSC) (pp. 1-4). https://doi.org/10.1109/EDSSC.2010.5713751
- 11. Garg, B., & Sharma, G. (2016). Low power signal processing via approximate multiplier for error-resilient applications. In 2016 11th International Conference on Industrial and Information Systems (ICIIS) (pp. 546-551). IEEE. https://doi.org/10.1109/ICIINFS.2016.8263000
- 12. Kulkarni, P., Gupta, P., & Ercegovac, M. (2011). Trading accuracy for power with an underdesigned multiplier architecture. In 24th International Conference on VLSI Design (VLSI Design), 2011 (pp. 346-351). https://doi.org/10.1109/VLSID.2011.51
- 13. B. J. P. D. R. Kelly & S. Al-Sarawi (2009). Approximate signed binary integer multipliers for arithmetic data value speculation. Proc. Conf. Design Archit. Signal Image Process., 97-103. https://doi.org/10.1109/TCSI.2018.2839266
- 14. P. M. Momeni, J. Han & F. Lombardi (2022). Design and analysis of approximate compressors for multiplication. IEEE Trans. Comput., 64(4), 984-994. https://doi.org/10.1109/NANO54668.2022.9928768
- 15. Aswathy Sudhakar & D. Gokila (2010). High-Speed Power-Efficient Modified Baugh-Wooley Multipliers. VLSI Design Group, Department of ECE, 97(1), 44-57. https://doi.org/10.1109/ICECS.2008.4674784
- 16. J. N. Mitchell (1962). Computer multiplication and division using binary logarithms. IRE Trans. Electron. Comput., 11(4), 512-517. https://doi.org/10.1109/TEC.1962.5219391
- 17. A. R. V. Gupta, D. Mohapatra & K. Roy (2012). Low-power digital signal processing using approximate adders. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., 32(1), 15-19. https://doi.org/10.1109/TCAD.2012.2217962
- 18. Ajay Kumar Dharmireddy, Sreenivasa Rao Ijjada & I. Hema Latha (2022). Performance Analysis of Various Fin Patterns of Hybrid Tunnel FET. IJEER, 10(4), 806-810. https://doi.org/10.37391/IJEER.100407
- 19. Ajaykumar Dharmireddy, ISR, & P. H. S. Tejomurthy (2019). Performance analysis of Tri-Gate SOI FinFET structure with various fin heights using TCAD simulations. Journal of Advanced Research in Dynamical and Control Systems, 11(2), 1291-1298. https://doi.org/10.34391/JRDCS.100319
- 20. Ajaykumar Dharmireddy & Sreenivasarao Ijjada (2023). Performance Analysis of Variable Threshold Voltage ($\Delta V_{th}$) Model of Junctionless FinTFET. IJEER, 11(2), 323-327. https://doi.org/10.37391/IJEER.110211
- 21. Ajaykumar Dharmireddy & S. R. Ijjada (2022). Design of Low Voltage-Power: Negative Capacitance Charge Plasma FinTFET for AIoT Data Acquisition Blocks. In 2022 International Conference on Breakthrough in Heuristics And Reciprocation of Advanced Technologies (BHARAT), Visakhapatnam, India, 144-149. https://doi.org/10.1109/BHARAT53139.2022.00039
- 22. K. Shasidhar, B. Naresh & Sreenivasa Rao Ijjada (2019). A 75 µW Two-Stage Op-Amp using 0.18 µm CMOS Technology for High-Speed Operations. Acta Physica Polonica A, 135(5), 1075-1077. https://doi.org/10.12693/APhysPolA.135.1075
- 23. Chaithanya Mannepalli, Rajesh Kumar Srivastava & Sreenivasa Rao Ijjada (2019). Design of a Two Stage Operational Amplifier with Zero Compensation for Accurate Bandgap Reference Circuit. Acta Physica Polonica A, 135(5), 977-979. https://doi.org/10.12693/APhysPolA.135.977
- 24. A. Kumar D, S. Rao I, K. V. Gayathri, K. Srilatha, K. Sahithi & M. Sushma (2021). Rad-Hard Model SOI FinTFET for Spacecraft Application. Advances in Micro-Electronics, Embedded Systems and IoT, 83(8), 113-119. https://doi.org/10.1007/978-981-16-8550-7_12
- 25. K. Shasidhar & Sreenivasa Rao Ijjada (2019). 1.5 mW, 14.68 V/µs Low Power and High Speed Comparator Design for ADC Applications. International Journal of Innovative Technology and Exploring Engineering, 8(6S4), 1322-1326. https://doi.org/10.35940/ijitee.F1268.0486S419
- 26. Ajaykumar Dharmireddy & Sreenivasarao Ijjada (2022). High Switching Speed and Low Power Applications of Hetero Junction Double Gate (HJDG) TFET. IJEER, 11(2), 596-600. https://doi.org/10.37391/ijeer.110248
- 27. A. Dharmireddy, A. S. Manohar, G. T. S. Hari, G. Gayatri, A. Venkateswarlu & C. T. Sai (2022). Detection of COVID-19 from X-Ray Images using Artificial Intelligence (AI). In 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India, 1-5. https://doi.org/10.1109/CONIT55038.2022.9847741
- 28. A. Dharmireddy, M. Greeshma, S. Chalasani, S. T. Sriya, S. B. Ratnam & S. Sara (2023). Azolla Crop Growing Through IoT by Using ARM Cortex-M0. In 2023 3rd International Conference on Artificial Intelligence and Signal Processing (AISP), Vijayawada, India, 1-5. https://doi.org/10.1109/AISP57993.2023.10135032
- 29. Ajaykumar Dharmireddy, Sreenivasa Rao Ijjada, I. Hemalatha & Ch. Madhava Rao (2022). Surface Potential Model of Double Metal Fin Gate Tunnel FET. Mathematical Statistician and Engineering Applications, 71(3), 1044-1050. https://doi.org/10.17762/msea.v71i3.381

#### **AUTHORS**



Sreenivasarao Ijjada received his AMIE degree from The Institution of Engineers (India) in 2001 and his MTech degree from JNTU Kakinada in 2006. He completed his PhD at GITAM University, Visakhapatnam, and is currently working as an Assistant Professor at GITAM Institute of Technology, GITAM Deemed to be University, Visakhapatnam, Andhra Pradesh. He is a senior member of IEEE, a member of ResearchGate and a life member of AMIE. His research activities relate to low-power FinFET VLSI design, TFET technology, microwaves and bio-signal processing. Email: <u>sijjada@gitam.edu</u>



Ajaykumar Dharmireddy received his BTech degree from BVCIT&S, Amalapuram, affiliated to JNTUH, in 2006 and his ME degree in 2009 from Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, affiliated to Anna University, Coimbatore. He is pursuing a PhD at GITAM Institute of Technology, GITAM Deemed to be University, Visakhapatnam, and is presently working as an Assistant Professor at Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh. His research interests include low-power VLSI design and FinFET and TFET technology. He is a life member of the IEI, ISTE and IAENG.

Corresponding author Email: ajaykumardharmireddy@sircrrengg.ac.in



M. Sushanth Babu is currently the Director of Academics at Chaitanya Bharathi Institute of Technology. He received his PhD degree in electronics from Jawaharlal Nehru Technological University, Hyderabad. He is a senior member of IEEE, a member of ResearchGate and a life member of IETE and ISTE. He has published in several reputed research journals. His research activities relate to low-power VLSI design, wireless communications, cellular communications (4G and 5G) and cooperative networking. Email: <u>sushanthbabu@gmail.com</u>



Kotha Lavanya received her BTech and MTech degrees from Sir C R Reddy College of Engineering, Eluru, India, and is presently working as an Assistant Professor at Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh. Her areas of interest are low-power VLSI design and IoT devices.

Email: lavanyasudeep@gmail.com