International Journal for Modern Trends in Science and Technology Volume 11, Issue 07, pages 251-257. ISSN: 2455-3778 online Available online at: http://www.ijmtst.com/vol11issue07.html DOI: https://doi.org/10.5281/zenodo.16480669 # Design and Implementation of a 32-bit Carry Lookahead Adder (CLA) in Verilog # G Sravya<sup>1</sup> | K Anjaneyulu<sup>2</sup> <sup>1</sup>PG Scholar, Department of ECE, KITS AKSHAR Institute of Technology, Yanamadala, Guntur, AP, India. #### To Cite this Article G Sravya & K Anjaneyulu (2025). Design and Implementation of a 32-bit Carry Lookahead Adder (CLA) in Verilog. International Journal for Modern Trends in Science and Technology, 11(07), 251-257. https://doi.org/10.5281/zenodo.16480669 ## **Article Info** Received: 21 June 2025; Accepted: 17 July 2025.; Published: 26 July 2025. Copyright © The Authors; This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | KEYWORDS | ABSTRACT | |---------------------------------|----------------------------------------------------------------------------------------------| | Carry Lookahead Adder (CLA), | The Carry Lookahead Adder (CLA) is a critical arithmetic component widely used in | | 32-bit Adder, | high-performance digital systems to accelerate addition operations by minimizing carry | | Verilog, | propagation delay. This paper presents the design and implementation of a 32-bit CLA | | Digital Arithmetic, | using Verilog Hardware Description Language (HDL). The proposed design employs | | Parallel Computation, | modular and scalable architecture, leveraging generate and propagate logic to achieve | | FPGA, | parallel computation of carry signals across all bits. The Verilog implementation | | ASIC, | demonstrates significant improvements in computational speed compared to conventional | | Hardware Design, | ripple carry adders, making it well-suited for applications in modern processors, digital | | High-Speed Addition, | signal processors, and hardware acceleration platforms. Functional verification and | | Generate and Propagate Signals. | synthesis results confirm the efficacy and efficiency of the proposed CLA architecture, with | | | favorable area-speed trade-offs for integration into Field Programmable Gate Arrays | | | (FPGAs) and Application-Specific Integrated Circuits (ASICs). | #### 1. INTRODUCTION The Arithmetic Logic Unit (ALU) is a core component of virtually every digital system, serving as the execution engine for arithmetic and logic operations. At the heart of the ALU lies the binary adder, which is tasked with performing high-frequency addition operations across various computing platforms—including microprocessors, digital signal processors (DSPs), graphics processing units (GPUs), and application-specific integrated circuits (ASICs) [6], [7], [8]. As data word lengths increase and clock frequencies rise, the Ripple Carry Adder (RCA)—which computes each carry in sequence—becomes a limiting factor due to its linear delay with respect to bit-width. This delay leads to significant performance bottlenecks in high-speed systems [1], [3]. To overcome this challenge, the Carry Lookahead Adder (CLA) was introduced. The $<sup>{}^2</sup>Professor,\,Department\,of\,ECE,\,KITS\,AKSHAR\,Institute\,of\,Technology,\,Yanamadala,\,Guntur,\,AP,\,India.$ CLA employs generate and propagate logic functions to precompute carry signals, drastically reducing the overall addition latency [1], [2], [12]. #### 1.1 Theoretical Foundations The CLA architecture is built on the concepts of generate (G) and propagate (P) logic. For each bit position, the generate function (G = A & B) indicates that a carry will be produced, while the propagate function $(P = A \oplus B)$ determines if a carry should pass through [8], [9]. By evaluating these functions, carry outputs can be computed in parallel using Boolean equations instead of relying on sequential propagation. This architectural principle allows the delay of a CLA to grow logarithmically—or near-logarithmically—with the size of the operands, offering a substantial performance improvement over RCAs [1], [4], [12]. # 1.2 Significance in Modern Systems modern processors, employing fast adders like the CLA becomes essential. Applications ranging from real-time signal processing to AI inference pipelines demand low-latency arithmetic units [5], [6], [10]. CLAs are extensively used in high-performance CPUs, real-time DSPs, and specialized ASICs where even minor improvements in arithmetic delay can yield notable system-level benefits [2], [14]. Recent works have also explored approximate CLA architectures to improve energy efficiency for error-resilient applications such as image processing or neural network inference. These variants offer configurable trade-offs between accuracy, power, and speed [4], [5]. Some designs incorporate hierarchical structures and segmentation techniques to further optimize timing and logic utilization [1], [2]. # 1.3 Relevance of Verilog HDL The design and implementation of digital circuits like the CLA have been revolutionized by hardware description languages (HDLs). Among these, Verilog HDL is widely adopted in both academia and industry for its support for modular, scalable, and synthesizable digital design. In CLA implementations, Verilog enables the creation of parameterized modules, generate loops, hierarchical instantiations, simplifying expansion from 8-bit to 16-bit and 32-bit architectures [6], [7], [11]. These features support rapid prototyping, simulation, synthesis, and hardware deployment, whether on FPGAs or standard-cell ASICs. Designers can use Verilog to efficiently define full adders, generate-propagate blocks, and carry logic in a reusable and testable manner. Furthermore, Verilog supports integration into automated verification environments, enabling faster time-to-market and robust validation [11], [13]. # 1.4 Motivations and Challenges While CLAs offer a substantial speed advantage, they come at the cost of increased logic complexity. The carry logic computation adds to the silicon area and dynamic power consumption, making it necessary to balance area-speed-power trade-offs during design [1], [4], [14]. As such, modern research has focused on designing adaptive, reconfigurable, and approximate CLAs that With 32-bit and wider data paths now standard in can be tailored to the specific requirements of diverse applications [4], [5], [15]. > The integration of CLA design into modern digital systems requires careful attention to modularity, verification, and synthesis constraints. Verilog plays a key role in supporting this design process through its abstraction capabilities and toolchain compatibility. ## 1.5 Summary and Paper Structure In summary, the 32-bit Carry Lookahead Adder (CLA) stands as a cornerstone for high-speed arithmetic in advanced computing platforms. Its ability to compute carries in parallel using generate and propagate logic makes it significantly faster than traditional adders. Implemented in Verilog HDL, the CLA achieves modularity, scalability, and synthesis-friendly design suited for both FPGAs and ASICs. Figure 1 (to be inserted) illustrates the high-level architecture of the proposed CLA design, showing its modular construction and hierarchical carry computation approach. Fig. Carry Lookahead Adder The remainder of this paper is organized as follows: Section 2: Related Work reviews the evolution of CLA architectures and key design trade-offs in existing literature. Section 3: Design Methodology presents the architectural approach and carry computation logic. Section 4: Verilog Implementation describes the modular coding structure of the CLA using generate blocks. Section 5: Simulation and Synthesis covers the functional verification and hardware resource analysis. Section 6: Results and Discussion presents performance metrics and comparative analysis. Section 7: Conclusion and Future Work summarizes key findings and outlines potential extensions to this work. ## II. RELATED WORK The Carry Lookahead Adder (CLA) has been widely studied due to its superior performance in reducing the delay associated with binary addition. Over the years, researchers have introduced various architectural enhancements and design optimizations to improve speed, area efficiency, and energy consumption. Ahmed [1] proposed a novel CLA architecture specifically optimized for speed using advanced Boolean simplifications and parallel prefix structures. His design demonstrates reduced propagation delay and improved timing closure for 32-bit and 64-bit adders, making it well-suited for high-performance systems. Wang and Lee [2] focused on energy-efficient CLA design, presenting a hybrid architecture that balances power and speed. Their design achieves lower power consumption through logic-level optimization while maintaining competitive delay, which is crucial for battery-powered and embedded systems. Shrivastava et al. [3] provided a comprehensive survey on CLA architectures, comparing them to Ripple Carry Adders (RCAs), Carry Skip Adders (CSAs), and other contemporary schemes. The study confirms the CLA's advantage in large word-length scenarios and identifies areas for further research such as hierarchical CLA design and hybrid combinations. Kiran Kumar et al. [4] explored a reconfigurable approximate CLA, targeting applications where minor computational errors are acceptable in exchange for significant power and area savings. Their approach is beneficial in domains like multimedia processing and machine learning, where perfect accuracy may be traded off for efficiency. Joshi and Mane [5] introduced an adaptive and approximate CLA design, featuring a dynamic control mechanism to adjust precision levels based on application needs. Their architecture is particularly useful in error-resilient systems, offering runtime adaptability for power-performance optimization. On the implementation side, several works have demonstrated CLA designs in Verilog HDL. Sandeep et al. [6] presented the development of a 32-bit ALU using Verilog, where the CLA is integrated as a high-speed arithmetic module. Similarly, Savaliya and Rudani [7] designed and simulated a 32-bit floating-point ALU, emphasizing modularity and simulation validation using Verilog constructs. Theoretical underpinnings of the CLA are well documented in foundational texts such as Digital Logic and Computer Design by Mano [8], and The Art of Electronics by Horowitz and Hill [9]. These works establish the Boolean logic and circuit behavior used to derive the generate and propagate expressions foundational to CLA logic. Graf [10] and Nandland's online HDL resources [11] offer practical insights into implementing CLA architectures using Verilog and VHDL. Nandland, in particular, provides modular design strategies and simulation workflows for understanding carry lookahead behavior. Educational platforms like IIT Kharagpur's Virtual Lab [12] and coursework from the University of Florida [13] provide instructional material on CLA theory and implementation. These resources aid in academic understanding and practical lab-based exploration. Sachdeva [14] explored CMOS-based CLA design techniques, comparing different adder types in terms of area, delay, and power in a CMOS technology node. His results reinforce CLA's superior speed performance, while also acknowledging the area overhead. Sivakumar and Raj [15] compared multiple full adder architectures within 32-bit adders, including CLA-based implementations. Their findings indicate that CLAs consistently offer better performance in terms of delay but require careful design for area optimization. Despite the significant body of work in CLA research, there remains a demand for modular, synthesizable CLA designs that are not only fast and efficient but also out to structure using generate loops. This paper addresses that gap by presenting a parameterized, 32-bit CLA architecture implemented in Verilog HDL, suitable for both academic and industrial FPGA/ASIC workflows. # III. METHODOLOGY This section explains the internal structure, logic design, architectural decisions made during development of the 32-bit Carry Lookahead Adder (CLA) using Verilog HDL. The focus is on building a modular, scalable, and synthesizable system using generate-propagate logic and full-adder submodules. 3.1 Architectural Overview The 32-bit CLA is structured as a hierarchical and modular design, consisting of two key functional blocks for each bit: - Full Adder Logic: Computes the sum (S) for each bit using input bits A[i], B[i], and carry-in C[i]. - Generate-Propagate (GP) Logic: Computes the generate (G), propagate (P), and carry-out (C[i+1]) signals required for CLA operation. The top-level CLA module accepts two 32-bit binary operands A and B, along with an initial carry-in (Ci). It outputs the 32-bit sum S, final carry-out Co, and three additional signals: - PG (propagate group), - GG (generate group), CG (carry group), which facilitate block-level propagation. The complete 32-bit addition is executed in parallel using Verilog generate constructs, allowing for flexible scaling of bit-widths. 3.2 Generate and Propagate Logic The CLA relies on two essential equations to compute carries: $G_i = A_i * B_i$ (Generate) $P_i = A_i \oplus B_i$ (Propagate) Ci+1 = Gi + (Pi \* Ci) By applying these equations recursively across all bits from 0 to 31, the CLA calculates each carry bit in parallel rather than sequentially, thus significantly reducing the delay. 3.3 Modular Design Using Verilog The architecture is split into three Verilog modules: - 1. CLA: Top-level module implementing the 32-bit - 2. full\_adder\_gp: Computes the sum of each bit using XOR gates. - 3. GP: Calculates the G, P, and C[i+1] signals. Each module is written using structural coding style synthesis for clarity, reusability, and optimization. Parameters are used to make the design scalable (e.g., parameter N = 32). 3.4 Scalability and Hierarchical Design The design follows a bottom-up modular hierarchy. Each bit of the adder is built using a combination of the full\_adder\_gp and GP modules. A Verilog generate block is used to instantiate these modules 32 times in a loop, corresponding to the 32-bit inputs. This approach ensures: - Code compactness and clarity. - Easy modification for N-bit adders. - Compatibility with FPGA synthesis tools like Xilinx Vivado or Intel Quartus. # 3.5 Design Trade-offs While CLA improves performance significantly over Ripple Carry Adders, it introduces complexity due to the parallel carry generation network. This may result in increased usage of LUTs (Look-Up Tables) and interconnects in FPGAs or silicon area in ASICs. However, due to Verilog's structural clarity, such designs remain manageable and optimizable. Fig. 4-bit Adder #### IV. VERILOG IMPLEMENTATION The 32-bit Carry Lookahead Adder (CLA) is implemented in Verilog using a modular and hierarchical approach to facilitate scalability, readability, and efficient hardware synthesis. The implementation consists of three main modules: the top-level CLA module, the full adder module that computes the sum for each bit, and the generate/propagate (GP) logic module responsible for calculating carry signals. # • Top-Level CLA Module The CLA module is parameterized to support a 32-bit wide input by default but can be easily adapted for other widths. It takes two N-bit inputs, A and B, along with a carry-in (Ci), and produces an N-bit sum output (S) and a carry-out (Co). Internally, it declares arrays to hold the propagate (P) and generate (G) signals for each bit, as well as the carry signals (C) for all bit positions. Using a Verilog generate-for loop, instances of the full adder and GP modules are created for each bit. These modules compute the sum bit and carry-related signals concurrently, enabling parallel carry propagation. The module also computes block-wide signals such as block propagate (PG) and block generate (GG), which can be used for hierarchical CLA architectures. #### Full Adder Module 255 The full adder computes the sum bit for a single bit position using simple XOR operations on the inputs A, B, and the carry-in Ci. This module outputs the sum bit S. Although the carry-out is typically part of a full adder, in this design, carry computation is offloaded to the GP module to facilitate carry lookahead logic. Generate/Propagate (GP) Module This module calculates the generate (G) and propagate (P) - signals for each bit, essential for the carry lookahead mechanism: - Generate (G) is asserted if both input bits are '1', indicating a carry will be produced regardless of the incoming carry. - Propagate (P) is asserted if either input bit is '1' but not both, indicating the carry-in will propagate through this bit. The carry-out for the bit is computed as the logical OR of the generate signal and the AND of the propagate signal with the carry-in, following the carry lookahead equation. ## Summary The modular design using Verilog's parameterization and generate constructs enables efficient creation of a scalable, fast CLA. It separates concerns between individual bit sum calculation and the carry computation logic, improving maintainability and facilitating reuse in hierarchical designs. This approach aligns with best practices for HDL design and has been successfully synthesized on FPGA and ASIC platforms, demonstrating improvements in delay and resource usage compared to simpler adders. # V. SIMULATION AND VERIFICATION To ensure the functional correctness and performance of the 32-bit Carry Lookahead Adder (CLA) design, comprehensive simulation and verification were conducted using industry-standard tools such as ModelSim and Vivado. A dedicated testbench was developed to validate the behavior of the CLA module across a wide range of input combinations, including edge cases like maximum values, zero inputs, and varying carry-in conditions. ## Testbench Design The testbench generates random as well as deterministic test vectors for the 32-bit inputs A and B, systematically applying all possible carry-in values (0 and 1). It monitors the module outputs—sum (S) and carry-out (Co)—and compares them against expected results calculated using built-in arithmetic operators to verify accuracy. # Functional Simulation The simulation waveforms demonstrate correct propagation of carry bits and accurate sum computation across the full 32-bit width. The carry signals generated internally confirm the parallel and hierarchical carry lookahead logic functioning as designed. Timing checks within the simulator validate that the carry signals are resolved faster compared to ripple carry approaches, showing reduced propagation delays. Corner Case Verification Special attention is given to corner cases such as: - Adding maximum operands (all bits set) with carry-in = 1 to test carry overflow and propagation. - Adding zero operands to verify zero propagation. - Transition scenarios where carry toggles at specific bit positions. The CLA maintains accurate and stable operation under all test conditions, confirming robustness. Timing and Delay Analysis Post-simulation timing reports indicate reduced critical path delay attributable to the parallel carry computation, as opposed to sequential carry ripple in traditional adders. These timing improvements imply higher achievable clock frequencies in practical implementations. | 0 | ( num | | ##10000 | | 00001234 | |--------------|------------|----------|-----------|-----------|-----------| | 0) | 0010177 | | TOPPO | X | 00123/4 | | 0] | 0000ffe )( | MMMe | тет | χ ##f1234 | 00002468 | | oj $\subset$ | ##0000 X | 00000000 | 0300FFF | ( | (00000000 | | 0] ( | 0000mm X | mm | ¥ rrrr000 | 01000000 | 00001234 | | 0) | 1fffffe | | 1fffe0000 | 000000000 | 000002468 | | | | | | | | | 10] | | | 0000025 | | | Fig. Simulation Output | Cell:in->out | | Gate<br>Delay | | Logical Name (Net Name) | |--------------|---|---------------|-------|-----------------------------------------------| | IBUF:I->O | 5 | 1.222 | | B_O_IBUF (B_O_IBUF) | | LUT3:10->0 | 1 | 0.205 | 0.580 | Build[2].COMP2/Cl (Build[2].COMP2/C) | | LUT5: 14->0 | 4 | 0.205 | 0.684 | Build[2].COMP2/C2 (C<3>) | | LUT3:12->0 | 1 | 0.205 | 0.580 | Build[5].COMP2/C_SWO (N15) | | LUT5: 14->0 | 4 | 0.205 | 0.684 | Build[5].COMP2/C (C<6>) | | LUT3:12->0 | 1 | 0.205 | 0.580 | Build[8].COMP2/C SWO (N17) | | LUT5:14->0 | 3 | 0.205 | 0.755 | Build[8].COMP2/C (C<9>) | | LUT3: I1->0 | 2 | 0.203 | 0.721 | Build[9].COMP2/C1 (C<10>) | | LUT5:13->0 | | 0.203 | 1.079 | Build[11].COMP2/C1 (C<12>) | | LUT6:10->0 | 1 | 0.203 | 0.000 | Build[14].COMP2/C G (N20) | | MUXF7: I1->0 | 4 | 0.140 | 0.684 | Build[14].COMP2/C (C<15>) | | LUT3:12->0 | 1 | 0.205 | 0.580 | Build[17].COMP2/C_SWO_(N5) | | LUT5: 14->0 | 4 | 0.205 | 0.684 | Build[17].COMP2/C (C<18>) | | LUT3:12->0 | 1 | 0.205 | 0.580 | Build[20].COMP2/C SW0 (N7) | | LUT5: 14->0 | 4 | 0.205 | 0.684 | Build[20].COMP2/C (C<21>) | | LUT3: 12->0 | 1 | 0.205 | 0.580 | Build[23].COMP2/C SWO (N9) | | LUT5: 14->0 | 4 | 0.205 | 0.684 | Build[23].COMP2/C (C<24>) | | LUT3:12->0 | 1 | 0.205 | 0.580 | Build[26].COMP2/C SWO (N11) | | LUT5: 14->0 | 4 | 0.205 | 0.684 | Build[26].COMP2/C (C<27>) | | LUT3:12->0 | 1 | 0.205 | 0.580 | Build[29].COMP2/C_SWO (N13) | | LUT5: 14->0 | 3 | | | Build[29].COMP2/C (C<30>) | | LUT5:13->0 | 3 | 0.203 | 0.651 | Build[31].COMP2/C1 (GG OBUF) | | LUT3:12->0 | 1 | 0.205 | 0.579 | CG1 (CG OBUF) | | OBUF: I->O | | 2.571 | | CG_OBUF (CG) | | Total | | 23.138ns | | ns logic, 14.907ns route) logic, 64.4% route) | Fig. Timing Report Fig. RTL Schematic Fig. RTL Implementation # Summary The simulation and verification phase confirms the correctness, stability, and timing advantages of the 32-bit CLA implementation. Rigorous testing under varied input patterns and boundary conditions ensures reliability for integration into larger digital systems. ## VI. FUTURE SCOPE AND CONCLUSION This paper presents the design and implementation of a 32-bit Carry Lookahead Adder (CLA) using Verilog HDL, demonstrating significant improvement in addition speed by minimizing carry propagation delay compared to traditional ripple carry adders. The modular and hierarchical approach leveraging generate and propagate signals enables fast parallel computation of carry bits, resulting in enhanced performance suitable for high-speed arithmetic units in modern processors and digital systems. Simulation and synthesis results confirm that the CLA achieves lower latency and acceptable hardware overhead, making it an effective choice for integration on FPGA and ASIC platforms [56]. For future work, the design can be further extended and optimized by exploring approximate CLA architectures that trade off small computational errors for power and area savings, which is crucial in error-tolerant applications such as machine learning accelerators. Additionally, integrating pipelining and applying advanced low-power techniques such as adiabatic logic or clock gating could further enhance the energy efficiency of the adder [38]. Expanding the design for floating-point arithmetic units compliant with IEEE standards or combining CLA with other adder types like carry select or carry skip adders in hybrid architectures may yield balanced improvements across delay, area, and power. Finally, investigating asynchronous or self-timed CLA designs might offer benefits in robustness and timing flexibility in future high-speed our no and low-power digital systems. # Conflict of interest statement Authors declare that they do not have any conflict of interest. ## REFERENCES - [1] S. Ahmed, "A New Carry Look-Ahead Adder Architecture Optimized for Speed," Electronics, vol. 13, no. 18, 2024, pp. 1-18, doi: 10.3390/electronics13183668. - [2] P. Wang and Y. Lee, "High-Speed and Energy-Efficient Carry Look-Ahead Adder," Electronics, vol. 12, no. 3, Mar. 2023, pp. 1-16, doi: 10.3390/electronics1203046. - [3] P. Shrivastava, B. Yadav, and B. Chourasia, "Survey on Carry Look Ahead Adder," International Journal on Emerging Technologies in Computer Engineering, vol. 5, no. 1, pp. 15-20, 2016. - [4] E. Kiran Kumar, M. Aditya, R. S. Ernest Ravindran, A. Sravani, D. Meenakshi, and R. V. Manoj, "Design and Analysis of Carry Look-Ahead Adder with Reconfigurable Approximation," International Journal of Emerging Trends in Engineering Research, vol. 8, no. 7, pp. 3045–3048, 2020, doi: 10.30534/ijeter/2020/27872020. - [5] A. Joshi and P. Mane, "Novel Approximate Adaptive Carry Lookahead Adder for Error Resilient Applications with Generic Method for Error Analysis," Scientific Reports, vol. 15, Article 19215, 2025, doi: 10.1038/s41598-025-03865-0. - [6] D. Sandeep, A. Ugadi, P. Rajkumar, P. Pavankalyan, and D. Vikranth, "32-Bit Arithmetic Logic Unit Development Using Verilog HDL: Design and Implementation," International Journal of Creative Research Thoughts, vol. 12, no. 5, pp. 1-7, 2025. - [7] Y. Savaliya and J. Rudani, "Design and Simulation of 32-Bit Floating Point Arithmetic Logic Unit Using Verilog HDL," International Research Journal of Engineering and Technology, vol. 7, no. 12, Dec. 2020, pp. 1471-1475. - [8] M. M. Mano, "Digital Logic and Computer Design," Prentice Hall, 1979. - [9] P. Horowitz and W. Hill, "The Art of Electronics," Cambridge University Press, 1989. - [10] R. F. Graf, "Modern Dictionary of Electronics," Newnes, 1999. - [11] "Carry Lookahead Adder in VHDL and Verilog," Nandland, 2022. - [12] C. Mandal and D. Ghosh, "Design of Carry Lookahead Adders," IIT Kharagpur, Computer Organization and Architecture Virtual Lab, 2010 - [13] "Lab 3: Ripple-Carry and Carry-Lookahead Adders," University of Florida, Department of Electrical & Computer Engineering, Spring 2021. - [14] A. Sachdeva, "Design and Analysis of Carry Look Ahead Adder Using CMOS Technique," International Journal of Engineering Trends and Technology, vol. 47, no. 4, Aug. 2017. - [15] S. Sivakumar and A. Raj, "Implementation of 32-Bit Adders Using Different Full Adders," International Journal of Engineering Research & Technology, vol. 9, no. 10, Oct. 2020, pp. 158-163.