

American Journal of Advanced Research, 2024, 8–2

December 2024, pages 11-14
doi: 10.5281/zenodo.13862647

http://www.ibii-us.org/Journals/AJAR/

ISBN 2572-8849 (Online), 2572-8830 (Print)

# Development and Evaluation of a Platform for Comparison of Processor Performance

Muhammad Z Hasan\*, Jacob Flores

School of Engineering, Texas A&M International University, 5201 University Boulevard, Laredo, TX 78041.

\*Email: muhammad.hasan@tamiu.edu

Received on 05/24/2024; revised on 09/27/2024; published on 09/30/2024

#### **Abstract**

Embedded microprocessor systems are used every day by millions of people. They are buried inside the products (or equipment) that they control, such as cars, fridges, ovens, traffic lights, and hand-held devices. The use of embedded processors is expected to grow worldwide. Current embedded processors consume more power than earlier generations. It is therefore important for embedded systems (especially battery-operated ones) to choose a processor that consumes less power. The focus of this project is to develop a platform to investigate the dynamic power consumption of various processors while they are running. Such a platform is proposed, and the power consumption of two simple processors is measured on the platform using a floating-point benchmark application. Power consumption is reported at various operating frequencies.

# 1 Introduction, Literature Review, and Problem Definition

There are two kinds of power consumption in a processor: static and dynamic. Static power, as the name implies, is the power consumed when the processor is idle. In contrast, dynamic power is the power consumed when the processor is executing code. Authors in [1] have developed a methodology to support various concepts for reducing static power. They simulated the methodology in an appropriate framework and observed more than a 50% reduction in static power consumption (when idle) for two example microprocessors. The authors of this paper investigate the dynamic power consumed when the processor is running.
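For context, the distinction above is often summarized with the standard first-order CMOS power model. This is a textbook relation added here for illustration, not one taken from [1]; the symbols $\alpha$ (switching activity factor), $C$ (switched capacitance), $V_{DD}$ (supply voltage), and $f$ (clock frequency) are not used elsewhere in this paper.

$$
P_{\text{total}} = P_{\text{static}} + P_{\text{dynamic}}, \qquad P_{\text{dynamic}} \approx \alpha \, C \, V_{DD}^{2} \, f
$$

Under this model, and assuming a fixed supply voltage and activity factor, dynamic power is expected to scale roughly in proportion to the clock frequency.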

The Very Simple Central Processing Unit (VSCPU) has only four instructions, whereas the Relatively Simple Central Processing Unit (RSCPU) has sixteen instructions [2]. However, for a particular task, the code written for the VSCPU is expected to be larger than that for the RSCPU. As such, the execution time for the VSCPU is expected to be longer as well. In this project, the RSCPU is designed and simulated with floating-point operations to estimate its power consumption. These consumption figures will be compared with those of the VSCPU. When completed, this work will demonstrate the effect of instruction-set size on the dynamic power consumption of processors. The first objective is to develop a platform to measure dynamic power. The next objective is to compare the dynamic power of the RSCPU with that of the VSCPU.

Authors in [3] have discussed factors affecting power consumption in multicore and multithreaded processors, including the architectural features that affect power consumption. However, they make no mention of instruction-set size or benchmark applications. Authors in [4] have considered the power consumption of a complete system with cache, DMA, memory, and disk subsystems. They also considered benchmark applications for servers, without any reference to the processor instruction set. As such, their work does not isolate the power consumption of the processor. Proposers in [5] have described a hardware platform for measuring the power of a multi-core processor. However, it also measures the total power of the whole system, including the processor, rather than that of the processor alone. Researchers in [6] have proposed a hardware platform to measure the runtime power of a processor. That platform can do so only for hardwired processors, whereas our platform can do the same for custom processors in an FPGA. Authors of [7] have surveyed many types of energy measurement techniques. They concluded that instrument-based measurement is expensive and needs difficult hardware modifications. Simulators, in contrast, estimate the energy consumption of embedded software rapidly and enable researchers and developers of embedded systems to obtain the required energy consumption data without setting up a hardware environment. These capabilities reduce development cost and time and lay the foundation for power consumption analysis and optimization for embedded software. Researchers have presented guidelines for measuring the energy consumption of software applications in [8]. That work focuses on two versions of software running on fixed hardware and not on multiple hardware platforms.

Based on the above, we define the problem as follows: given a task and two different processors (with varying computing capabilities), the objective is to find out which processor consumes less dynamic power. The two processors considered are the VSCPU, with four instructions, and the RSCPU, with sixteen instructions. Using design and simulation tools, a platform must be developed to measure dynamic power. Using the platform, the power consumption of the VSCPU and the RSCPU is then determined for a particular benchmark task with floating-point operations, and their performance is compared.

# 2 Design of the Platform

The CPU hardware was designed using an iterative process of building each subcomponent and performing multiple rounds of testing to ensure proper operation [9]. The components, each performing a different operation, were hierarchically integrated to form the overall processor. A memory that holds the executable code was also attached to the designed processor. The experimental design flow is depicted in Figure 1.



Figure 1: The Experimental Design Flow

The design is carried out according to the specifications of [2] within the Quartus II tool using VHDL. After compilation of the VHDL file, pin assignment is completed for inputs and outputs. Then the generated netlist is downloaded to an Intel / Altera FPGA board for functional verification [10]. Once verified, the design is ported into the ModelSim tool and simulated with the floating-point execution code. The resulting signal activity is captured in a Value Change Dump (VCD) file. The VCD file is imported into Quartus II for power analysis. The power consumption values are then generated for comparison.
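To illustrate what "signal activity" in a VCD file means, the following is a minimal Python sketch that counts value changes per signal in a simplified VCD text. It is not part of the authors' tool flow (Quartus II performs the actual power analysis); the function name `count_toggles`, the sample dump `SAMPLE_VCD`, and the signal names `clk` and `acc` are hypothetical, and the parser assumes one `$var` declaration per line.

```python
from collections import Counter

# Hypothetical, hand-written VCD fragment (not output from the authors' design).
SAMPLE_VCD = '''$timescale 1ns $end
$scope module cpu $end
$var wire 1 ! clk $end
$var wire 8 " acc $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b00000000 "
$end
#10
1!
#20
0!
b00000001 "
#30
1!
#40
0!
b00000010 "
'''


def count_toggles(vcd_text):
    """Count value changes per signal in a simplified VCD dump."""
    names = {}           # identifier code -> signal name
    last = {}            # identifier code -> last seen value
    toggles = Counter()  # signal name -> number of value changes
    in_header = True
    for line in vcd_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            if tokens[0] == "$var":
                # Assumed layout: $var <type> <width> <id> <name> $end
                names[tokens[3]] = tokens[4]
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        if tokens[0].startswith(("#", "$")):
            continue  # skip timestamps and $dumpvars / $end markers
        if tokens[0][0] in "bBrR":
            # Vector or real change: "b<value> <id>"
            value, code = tokens[0][1:], tokens[1]
        else:
            # Scalar change: "<value><id>"
            value, code = tokens[0][0], tokens[0][1:]
        if code in last and last[code] != value:
            toggles[names.get(code, code)] += 1
        last[code] = value
    return dict(toggles)


print(count_toggles(SAMPLE_VCD))
```

For vector signals this sketch counts changes of the whole value rather than individual bit flips, so it is only a rough indicator of switching activity.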

# 3 Design of Experiment

The experimental design consists of measuring power consumption for two simple processors under a benchmark application. It involves all the steps mentioned above for the two experimental processors. The design of the VSCPU with ADD, AND, JMP, and INC instructions, as in Table 1, was completed first. Then the benchmark code for floating-point operations was written. Execution of the code was verified on the FPGA hardware. A simulation tool was used for power calculation.

Table 1: VSCPU Instructions

| Instruction | Code     |
|-------------|----------|
| ADD         | 00AAAAAA |
| AND         | 01AAAAAA |
| JMP         | 10AAAAAA |
| INC         | 11XXXXXX |

AAAAAA in the table represents a 6-bit address. XXXXXX represents any possible value of no significance.
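The encoding in Table 1 can be sketched in Python as a 2-bit opcode followed by a 6-bit address. The helper names `encode` and `decode` are hypothetical, and filling the don't-care bits of INC with zeros is an assumption made only for this illustration.

```python
# 2-bit opcodes taken from Table 1.
OPCODES = {"ADD": 0b00, "AND": 0b01, "JMP": 0b10, "INC": 0b11}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic, address=0):
    """Pack a mnemonic and a 6-bit address into one 8-bit instruction."""
    if not 0 <= address < 64:
        raise ValueError("address must fit in 6 bits")
    # For INC the low six bits are don't-care; zeros are used here by assumption.
    return (OPCODES[mnemonic] << 6) | address


def decode(byte):
    """Split an 8-bit instruction into (mnemonic, 6-bit address field)."""
    return MNEMONICS[(byte >> 6) & 0b11], byte & 0x3F


print(format(encode("ADD", 56), "08b"))  # ADD with the address of location 56
print(decode(0b10000101))                # JMP to address 5
```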

The design of the RSCPU, which has sixteen instructions (as in Table 2), was completed next. The benchmark code for floating-point operations was written with this larger choice of instructions. Execution of this code was verified on the FPGA hardware. As before, simulation tools were used for power calculation.

Table 2: RSCPU Instructions

| Instruction | Code     |
|-------------|----------|
| NOP         | 00000000 |
| LDAC        | 00000001 |
| STAC        | 00000010 |
| MVAC        | 00000011 |
| MOVR        | 00000100 |
| JUMP        | 00000101 |
| JMPZ        | 00000110 |
| JPNZ        | 00000111 |
| ADD         | 00001000 |
| SUB         | 00001001 |
| INAC        | 00001010 |
| CLAC        | 00001011 |
| AND         | 00001100 |
| OR          | 00001101 |
| XOR         | 00001110 |
| NOT         | 00001111 |
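As a small illustration of Table 2, the sketch below builds the opcode map in Python, assuming the sixteen codes are the consecutive 8-bit values 00000000 through 00001111 in the order listed. The names `RSCPU_MNEMONICS`, `RSCPU_OPCODES`, and `opcode_bits` are hypothetical.

```python
# Mnemonics in the order given in Table 2; opcodes are assumed to be consecutive.
RSCPU_MNEMONICS = [
    "NOP", "LDAC", "STAC", "MVAC", "MOVR", "JUMP", "JMPZ", "JPNZ",
    "ADD", "SUB", "INAC", "CLAC", "AND", "OR", "XOR", "NOT",
]
RSCPU_OPCODES = {name: index for index, name in enumerate(RSCPU_MNEMONICS)}


def opcode_bits(mnemonic):
    """Return the 8-bit opcode of an RSCPU instruction as a binary string."""
    return format(RSCPU_OPCODES[mnemonic], "08b")


for name in RSCPU_MNEMONICS:
    print(f"{name:<5} {opcode_bits(name)}")
```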

Following these steps, power consumption figures for the two processors were compared to find the effect of instruction-set size on dynamic power consumption of processors.

# 4 Performance Benchmark Application

To evaluate the power performance of the processors, the authors used floating-point multiplication and floating-point addition as benchmarks. These operations are widely used in scientific computing. Both operations are coded as closely as possible using the instructions available on each processor.

The test programs use the IEEE 754 standard [11] for floating point numbers as shown in Figure 2.

| Sign          | Exponent | Mantissa |
|---------------|----------|----------|
| 1-bit         | 8-bits   | 23-bits  |
| Total 32-bits |          |          |

Figure 2. Representation of Floating-Point Numbers
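The field layout in Figure 2 can be checked with a short Python sketch that splits a single-precision value into its sign, exponent, and mantissa fields using the standard `struct` module. The function name `float_fields` is hypothetical and the example values are chosen only for illustration.

```python
import struct


def float_fields(x):
    """Return (sign, exponent, mantissa) of x as an IEEE 754 single-precision value."""
    # Reinterpret the 4 bytes of the float as one unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits, hidden leading 1 not stored
    return sign, exponent, mantissa


print(float_fields(1.0))
print(float_fields(-2.5))
```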

In the memory, the numbers (operands) are stored as shown in Table 3. EEEEEEEE is the exponent, S is the sign, and MMMMMMMM is the mantissa part of the number.

Table 3. Memory Map of the two Operands

| Memory Location | Operand   | Contents                   |
|-----------------|-----------|----------------------------|
| 56              | Operand 1 | EEEE EEEE (Exponent)       |
| 57              | Operand 1 | SMMM MMMM (Sign, Mantissa) |
| 58              | Operand 1 | MMMM MMMM (Mantissa)       |
| 59              | Operand 1 | MMMM MMMM (Mantissa)       |
| 60              | Operand 2 | EEEE EEEE (Exponent)       |
| 61              | Operand 2 | SMMM MMMM (Sign, Mantissa) |
| 62              | Operand 2 | MMMM MMMM (Mantissa)       |
| 63              | Operand 2 | MMMM MMMM (Mantissa)       |
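The memory map in Table 3 can be sketched in Python as follows. The sketch assumes that the mantissa bytes are stored most-significant first, which the table does not state explicitly; the names `to_memory_bytes` and `load_operands` and the example operands are hypothetical.

```python
import struct


def to_memory_bytes(x):
    """Split a single-precision value into the four bytes of one Table 3 operand."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return [
        exponent,                        # EEEE EEEE
        (sign << 7) | (mantissa >> 16),  # SMMM MMMM (top 7 mantissa bits, assumed)
        (mantissa >> 8) & 0xFF,          # MMMM MMMM (middle 8 mantissa bits)
        mantissa & 0xFF,                 # MMMM MMMM (low 8 mantissa bits)
    ]


def load_operands(a, b, base=56):
    """Place operand 1 at locations 56-59 and operand 2 at locations 60-63."""
    data = to_memory_bytes(a) + to_memory_bytes(b)
    return {base + offset: byte for offset, byte in enumerate(data)}


for location, byte in load_operands(1.0, -2.5).items():
    print(location, format(byte, "08b"))
```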

**Floating Point Multiplication**: The general algorithmic steps are as follows.

Multiply the mantissas.

Add the exponents.

Multiplication can be considered as repeated addition. As such, the above two steps imply a series of ADD operations for the processors.
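The two multiplication steps can be sketched in Python using only additions. This is a simplified illustration, not the authors' benchmark code: it assumes a toy representation in which a number is an integer pair `(mantissa, exponent)` meaning mantissa × 2^exponent, and it ignores the sign bit, exponent bias, hidden bit, and normalization of IEEE 754. All function names are hypothetical.

```python
def multiply_by_repeated_addition(a, b):
    """Return a * b for non-negative integers using only additions."""
    total = 0
    for _ in range(b):
        total += a
    return total


def fp_multiply(x, y):
    """Multiply two toy numbers given as (mantissa, exponent) pairs."""
    mantissa = multiply_by_repeated_addition(x[0], y[0])  # multiply the mantissas
    exponent = x[1] + y[1]                                 # add the exponents
    return mantissa, exponent


def value(number):
    """Convert a toy (mantissa, exponent) pair to a Python float for checking."""
    mantissa, exponent = number
    return mantissa * 2.0 ** exponent


# 1.5 is (3, -1) and 20 is (5, 2) in the toy representation.
product = fp_multiply((3, -1), (5, 2))
print(product, value(product))
```

The loop count equals the second mantissa, which is why a processor without a multiply instruction spends many ADD operations on this step.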

**Floating Point Addition**: The general algorithmic steps are as follows.

Adjust the mantissa of one operand by changing its exponent until the exponents are equal.

Add the mantissas.

Comparison of the exponents can be achieved by subtraction (2's complement addition) and checking for a non-zero result. Adjustment of the mantissa involves multiplication or division. As such, the above two steps imply a series of AND, JMP (JPNZ), and ADD (SUB) operations.
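The addition steps can be sketched in the same spirit. The Python sketch below again assumes a toy representation in which a number is an integer pair `(mantissa, exponent)` meaning mantissa × 2^exponent, with sign, bias, and normalization ignored; it is not the authors' benchmark code, and the name `fp_add` is hypothetical. Alignment doubles the mantissa by adding it to itself, mirroring an instruction set that offers addition but no shift.

```python
def fp_add(x, y):
    """Add two toy numbers given as (mantissa, exponent) pairs."""
    (mx, ex), (my, ey) = x, y
    # Make x the operand with the larger exponent so only x needs adjusting.
    if ex < ey:
        mx, ex, my, ey = my, ey, mx, ex
    # Compare exponents by subtraction and loop while the result is non-zero.
    while ex - ey != 0:
        mx = mx + mx   # doubling the mantissa ...
        ex = ex - 1    # ... compensates for lowering the exponent by one
    return mx + my, ey  # exponents are equal, so the mantissas can be added


# 6 is (3, 1) and 2.5 is (5, -1) in the toy representation.
total = fp_add((3, 1), (5, -1))
print(total, total[0] * 2.0 ** total[1])
```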

# 5 Results and Analysis

Three programs were executed by the processors. The first one simply incremented the accumulator repeatedly in an infinite loop. This was used to verify the functionality of the processor on the FPGA board. The second one was floating-point addition, and the third one was floating-point multiplication. Simulating and analyzing the power consumption of the VSCPU with each program (increment, addition, multiplication) on the above-mentioned platform resulted in the power values shown in Tables 4, 5, and 6. In each table, the total power consumed is the power consumed by the full FPGA device, whereas the other column gives the power consumed by the processor hardware alone.

Table 4. VSCPU Increment Procedure Clock Frequency and Power Consumption

| Clock Frequency | Power Consumed by Processor Hardware | Total Power Consumed |
|-----------------|--------------------------------------|----------------------|
| 50 MHz          | 3.17 mW                              | (67.79 mW)           |
| 40 MHz          | 2.52 mW                              | (67.14 mW)           |
| 20 MHz          | 1.21 mW                              | (65.83 mW)           |
| 10 MHz          | 0.56 mW                              | (65.18 mW)           |
| 5 MHz           | 0.15 mW                              | (64.77 mW)           |

Table 5. VSCPU Addition Procedure Clock Frequency and Power Consumption

| Clock Frequency | Power Consumed by Processor Hardware | Total Power Consumed |
|-----------------|--------------------------------------|----------------------|
| 50 MHz          | 3.17 mW                              | (67.79 mW)           |
| 40 MHz          | 2.52 mW                              | (67.14 mW)           |
| 20 MHz          | 1.21 mW                              | (65.83 mW)           |
| 10 MHz          | 0.56 mW                              | (65.18 mW)           |
| 5 MHz           | 0.15 mW                              | (64.77 mW)           |

Table 6. VSCPU Multiplication Procedure Clock Frequency and Power Consumption

| Clock Frequency | Power Consumed by Processor Hardware | Total Power Consumed |
|-----------------|--------------------------------------|----------------------|
| 50 MHz          | 3.17 mW                              | (67.79 mW)           |
| 40 MHz          | 2.52 mW                              | (67.14 mW)           |
| 20 MHz          | 1.21 mW                              | (65.83 mW)           |
| 10 MHz          | 0.56 mW                              | (65.18 mW)           |
| 5 MHz           | 0.15 mW                              | (64.77 mW)           |

Based on the results in the above tables, the power consumption of the VSCPU is the same for each of the programs because each program executes in an infinite loop. Furthermore, for a given program, the power increases approximately linearly with the clock frequency.
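The linear trend can be checked with an ordinary least-squares fit of the VSCPU processor power against clock frequency. The Python sketch below is an illustrative check added here, not part of the authors' analysis; the data are copied from Table 4, and the function name `linear_fit` is hypothetical.

```python
# Data copied from Table 4 (VSCPU processor power at each clock frequency).
FREQ_MHZ = [50, 40, 20, 10, 5]
VSCPU_POWER_MW = [3.17, 2.52, 1.21, 0.56, 0.15]


def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) of a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


slope, intercept, r_squared = linear_fit(FREQ_MHZ, VSCPU_POWER_MW)
print(f"slope = {slope:.4f} mW/MHz, intercept = {intercept:.3f} mW, R^2 = {r_squared:.3f}")
```

For these five points the fitted slope is about 0.066 mW per MHz with a coefficient of determination above 0.99, which is consistent with a close-to-linear dependence on clock frequency.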

Finally, the same analysis method was used to quantify the amount of power consumed by the RSCPU operation. The results are furnished in Tables 7, 8, and 9.

Table 7. RSCPU Increment Procedure Clock Frequency and Power Consumption

| Clock Frequency | Power Consumed by Processor Hardware | Total Power Consumed |
|-----------------|--------------------------------------|----------------------|
| 50 MHz          | 0.42 mW                              | (65.04 mW)           |
| 40 MHz          | 0.39 mW                              | (65.01 mW)           |
| 20 MHz          | 0.34 mW                              | (64.96 mW)           |
| 10 MHz          | 0.31 mW                              | (64.93 mW)           |
| 5 MHz           | 0.3 mW                               | (64.92 mW)           |

Table 8. RSCPU Addition Procedure Clock Frequency and Power Consumption

| Clock Frequency | Power Consumed by Processor Hardware | Total Power Consumed |
|-----------------|--------------------------------------|----------------------|
| 50 MHz          | 0.42 mW                              | (65.04 mW)           |
| 40 MHz          | 0.39 mW                              | (65.01 mW)           |
| 20 MHz          | 0.34 mW                              | (64.96 mW)           |
| 10 MHz          | 0.31 mW                              | (64.93 mW)           |
| 5 MHz           | 0.3 mW                               | (64.92 mW)           |

Table 9. RSCPU Multiplication Procedure Clock Frequency and Power Consumption

| Clock Frequency | Power Consumed by Processor Hardware | Total Power Consumed |
|-----------------|--------------------------------------|----------------------|
| 50 MHz          | 0.42 mW                              | (65.04 mW)           |
| 40 MHz          | 0.39 mW                              | (65.01 mW)           |
| 20 MHz          | 0.34 mW                              | (64.96 mW)           |
| 10 MHz          | 0.31 mW                              | (64.93 mW)           |
| 5 MHz           | 0.3 mW                               | (64.92 mW)           |

The above tables show that, at clock frequencies of 10 MHz and above, the RSCPU consumes much less power than the VSCPU despite its more complex design (for example, 0.42 mW versus 3.17 mW at 50 MHz). At 5 MHz, the reported figures are 0.3 mW for the RSCPU and 0.15 mW for the VSCPU. The power is the same across the different test programs since each program executes in an infinite loop. The power also increases consistently with the clock speed.
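The comparison can be tabulated with a short Python sketch that uses the figures reported in Tables 4 through 9. It is an illustrative calculation added here, not part of the authors' tool flow; the names `remainder` and `power_ratios` are hypothetical.

```python
# Figures copied from Tables 4-9 (identical for all three programs per processor).
FREQ_MHZ = [50, 40, 20, 10, 5]
VSCPU = {"processor": [3.17, 2.52, 1.21, 0.56, 0.15],
         "total": [67.79, 67.14, 65.83, 65.18, 64.77]}
RSCPU = {"processor": [0.42, 0.39, 0.34, 0.31, 0.30],
         "total": [65.04, 65.01, 64.96, 64.93, 64.92]}


def remainder(cpu):
    """Total device power minus processor power at each frequency, in mW."""
    return [round(t - p, 2) for t, p in zip(cpu["total"], cpu["processor"])]


def power_ratios():
    """VSCPU processor power divided by RSCPU processor power at each frequency."""
    return [round(v / r, 2) for v, r in zip(VSCPU["processor"], RSCPU["processor"])]


print("Non-processor power, VSCPU runs:", remainder(VSCPU))
print("Non-processor power, RSCPU runs:", remainder(RSCPU))
for freq, ratio in zip(FREQ_MHZ, power_ratios()):
    print(f"{freq:>2} MHz: VSCPU / RSCPU = {ratio}")
```

With the reported figures, the difference between total and processor power is the same 64.62 mW in every row, and the VSCPU-to-RSCPU ratio falls from above 7 at 50 MHz to below 1 at 5 MHz.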

#### Conclusion

A platform was developed for measuring the dynamic power of processors. Two simple processors with different instruction-set sizes were used as test subjects. Floating-point benchmark programs were executed on the processors. From the results gathered, the platform is shown to be consistent with the expectations set at the onset of this project. The platform can perform all the actions required during the process of processor design, alteration, and analysis. Furthermore, it was shown to give the desired results for processor power analysis, which supports the reliability of the established platform. The results show that the RSCPU consumes less power than the VSCPU for the given floating-point benchmark programs at clock frequencies of 10 MHz and above.

### **Funding**

This work was supported by a University Research Grant from Texas A&M International University.

Conflict of Interest: none declared.

#### References

- [1] M. Z. Hasan and Mathew Bird, "Energy Reduction for Processors in Reconfigurable Logic," IEEE International Conference on Electro/Information Technology (EIT), Minnesota State University, Mankato, MN, USA, May 15-17, 2011.
- [2] Carpinelli, John D. "CPU DESIGN." Computer Systems Organization & Architecture, Addison-Wesley, Boston, 2001.
- [3] Vijayalakshmi Saravanan, Senthil Kumar Chandran, Sasikumar Punekar and D P Kothari, "A Study on Factors Influencing Power Consumption in Multithreaded and Multicore CPUs", WSEAS TRANSACTIONS on COMPUTERS, Issue 3, Volume 10, March 2011.
- [4] W. Lloyd Bircher and Lizy K. John, "Complete System Power Estimation using Processor Performance Events", https://lca.ece.utexas.edu/pubs/bircher-TC2012.pdf (May 22, 2024).
- [5] Kuo-Yi Chen, Fuh-Gwo Chen, Jr-Shian Chen, "A Cost-effective Hardware Approach for Measuring Power Consumption of Modern Multi-core Processors", https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=606ba4353460df484e617bb24e311eb-dbd0ab2b8 (May 23, 2024).
- [6] Canturk Isci and Margaret Martonosi, "Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data", https://mrmgroup.cs.princeton.edu/papers/canturkmicro.pdf (May 23, 2024).
- [7] Chen Guo, Song Ci, Yanglin Zhou, and Yang Yang, "A Survey of Energy Consumption Measurement in Embedded Systems", IEEE Access, April 19, 2021.
- [8] Luca Ardito, Riccardo Coppola, Maurizio Morisio, Marco Torchiano, "Methodological Guidelines for Measuring Energy Consumption of Software Applications", Scientific Programming, vol. 2019, Article ID 5284645, 16 pages, 2019.
- [9] "Central Processing Unit (CPU)." GeeksforGeeks, GeeksforGeeks, 12 July 2023, www.geeksforgeeks.org/central-processing-unit-cpu/.
- [10] P0493 TERASIC Inc. | Development Boards, Kits, Programmers | DigiKey, www.digikey.com/en/products/detail/terasic-inc/P0493/7034078. Accessed 10 Dec. 2023.
- [11] "IEEE Standard 754 Floating Point Numbers." Desforges's, GeeksforGeeks, 16 Mar. 2020, www.geeksforgeeks.org/ieee-standard-754-floating-point-numbers/.
- [12] Muhammad Z. Hasan and Jacob Flores, "Design and Development of a Platform for Comparison of Processor Performance", 2024 International Conference of Advanced Research in Applied Science, Engineering and Technology (ICARASET'24), Houston, TX, March 28, 2024.