Articles

Performance evaluation of M-ary algorithm using reprogrammable hardware

Sergio Andrés Arenas-Hoyos
Universidad del Valle, Colombia
Álvaro Bernal-Noreña
Universidad del Valle, Colombia

DYNA, vol. 84, no. 203, pp. 75-79, 2017

Universidad Nacional de Colombia

Received: 05 June 2017

Revised document received: 23 August 2017

Accepted: 18 September 2017

Abstract: Several ways to perform data encryption have been found, and one of the functions involved in standard algorithms such as RSA is the modular exponentiation. Basically, the RSA algorithm uses some properties of modular arithmetic to cipher and decipher plain text, with a certain performance dependence on text lengths. The growth in computing capacity has created the need to use robust systems that can perform calculations with significantly large numbers and the formulation of procedures focused on improving the speed to achieve it. One of these is the M-ary algorithm for the execution of the modular exponential function. This paper describes an implementation of this algorithm in reprogrammable hardware (FPGA) to evaluate its performance.

The first section of this work introduces the M-ary algorithm. The second section uses a block description to explain the implementation. The third section shows the results as timing diagrams, and the last section presents the conclusions.

Keywords: cryptosystems, modular exponentiation, modular arithmetic, RSA algorithm, FPGA, M-ary algorithm.

1. Introduction

Modern systems are designed to perform better in terms of figures of merit such as power, area, and/or speed. In hardware terms, the natural language of a computer is a set of integers bounded by a number, called the modulus, which establishes a ring of integers. This introduces the concept of modular arithmetic and its associated number representation, which is used in many areas, including digital signal processing and cryptographic systems [5]. Modular representations, specifically the Residue Number System (RNS) [12], have been tested in place of two's complement or classical weighted binary to increase the speed of FIR filters [8,9].

This paper focuses on a specific implementation of a special modular function, modular exponentiation. Taking advantage of the inherent flexible characteristics of FPGA architecture, it is possible to develop a system that uses the M-ary algorithm [10] to test various characteristics, which are verified with a software equivalent and compared with previous implementations of the method.

Previous systems followed different schemes: a pure hardware approach, a co-design approach (a hardware subsystem plus a software subsystem), or algorithms such as addition chains that reduce the number of multiplication steps [7,11]. These systems were based on combining the best characteristics of each scheme.

Modular exponentiation consists of finding a solution to eq. (1):

$$Z = X^{Y} \bmod M \qquad (1)$$

where X is the base, Y is the exponent, M is the modulus, and n is the length of M in bits.

The M-ary algorithm can be subdivided into the following subprocesses [5]: segmenting the exponent into w windows of d bits; precomputing and storing every power of the base with exponents from 0 to 2^d - 1; and squaring and multiplying to obtain the result. These steps are illustrated in the pseudo code shown in Fig. 1.

Figure 1. Pseudo code for M-ary algorithm.
Source: [4]
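The three subprocesses can be sketched in software as follows. This is a minimal Python model of a left-to-right fixed-window (M-ary) exponentiation, not the authors' hardware description; the function name `mary_modexp` and the default window width d = 3 are assumptions made for illustration.

```python
def mary_modexp(x: int, y: int, m: int, d: int = 3) -> int:
    """Compute x**y mod m with a fixed window of d bits (illustrative sketch)."""
    if m == 1:
        return 0
    size = 1 << d
    # Preprocessing: table[k] = x^k mod m for k = 0 .. 2^d - 1.
    table = [1] * size
    for k in range(1, size):
        table[k] = (table[k - 1] * x) % m
    # Segmentation: split y into w windows of d bits, most significant first.
    w = max(1, -(-y.bit_length() // d))  # ceiling division
    windows = [(y >> (d * i)) & (size - 1) for i in reversed(range(w))]
    # Square and multiply: d squarings per window, then one table lookup.
    z = table[windows[0]]
    for field in windows[1:]:
        for _ in range(d):
            z = (z * z) % m
        if field:
            z = (z * table[field]) % m
    return z
```

With the first data set reported in Section 3, `mary_modexp(1697, 25, 2773)` returns 1197, which agrees with the value of Z given there.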

Modular exponentiation relies on another operation, modular multiplication, which is implemented here with the Montgomery method [6,12]. The process takes two cycles. The first computes a result in the so-called M-residue space, which carries an undesired factor R^-1, the multiplicative inverse of R = 2^n modulo M. The second cycle recovers the desired product without the R^-1 factor. This method avoids integer division, which is an expensive operation in hardware.

The Montgomery Method is described in the pseudo code shown in Fig. 2.

Figure 2. Pseudo code for Modular Multiplication using the Montgomery algorithm.
Source: [2]
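A single Montgomery cycle can be illustrated with the classic bit-serial radix-2 formulation, which uses only additions and right shifts. The sketch below is a generic textbook version, assumed here for illustration; the name `mont_product` and its argument order are not taken from the paper.

```python
def mont_product(a: int, b: int, m: int, n: int) -> int:
    """Return a * b * 2^(-n) mod m for an odd modulus m < 2^n and 0 <= a, b < m."""
    if m % 2 == 0:
        raise ValueError("the Montgomery method requires an odd modulus")
    s = 0
    for i in range(n):
        # Add b when bit i of a is set.
        if (a >> i) & 1:
            s += b
        # Make the partial sum even by adding m, then halve it exactly.
        if s & 1:
            s += m
        s >>= 1
    # The loop keeps s below 2m, so one conditional subtraction suffices.
    return s - m if s >= m else s
```

The result still carries the factor 2^(-n); multiplying it back by 2^n modulo m gives the ordinary product a * b mod m.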

Modular multiplication can be described as shown in Fig. 3.

Figure 3. Pseudo code for Modular Multiplication.
Source: The authors.
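The two-cycle scheme can be modeled behaviourally as shown next. The sketch assumes that the correction factor supplied to the second cycle is R^2 mod M with R = 2^n; this is an assumption, although it agrees with the values labeled Inv in Section 3 (for example, 2^24 mod 2773 = 566). The function names are hypothetical, and `mont` is a mathematical model of one cycle rather than a hardware-style implementation.

```python
def mont(a: int, b: int, m: int, n: int) -> int:
    """Behavioural model of one Montgomery cycle: a * b * R^(-1) mod m, R = 2^n."""
    r_inv = pow(1 << n, -1, m)  # modular inverse (Python 3.8+); m must be odd
    return (a * b * r_inv) % m


def mod_multiply(a: int, b: int, m: int, n: int, corr: int) -> int:
    """Plain modular product a * b mod m obtained from two Montgomery cycles.

    corr is assumed to be R^2 mod m, so the second cycle cancels the
    R^(-1) factor left by the first one.
    """
    p = mont(a, b, m, n)        # first cycle:  a * b * R^(-1)
    return mont(p, corr, m, n)  # second cycle: p * R^2 * R^(-1) = a * b


corr = pow(2, 2 * 12, 2773)     # R^2 mod M for n = 12 and M = 2773
product = mod_multiply(1697, 1697, 2773, 12, corr)
```

Here `corr` evaluates to 566 and `product` to 1435, which is 1697^2 mod 2773.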

2. Implementation

The complete system was modeled in VHDL using only basic libraries, such as the IEEE standard packages, which makes the design more generic and portable across architectures. Reprogrammable hardware is used because it allows digital architectures to be designed with a degree of flexibility.

At block level, the modular exponentiation system comprises four subsystems that work together: the Montgomery Multiplier, the Storage Unit, the Exponent Segmentation Unit, and the Control Unit. These blocks are shown in Fig. 4.

Figure 4. Modular exponentiation general system.
Source: The authors.

The sequential logic is a finite state machine of the Moore type, in which the outputs depend only on the current state. This scheme was chosen to avoid asynchronous behavior, given the large number of states involved (approximately 40 in the main control module).

Internally, the Montgomery Multiplier has its own control subunit, which synchronizes the data path and notifies the central unit when a multiplication is finished. This controller also switches the multiplier sources when a Montgomery cycle is completed, as illustrated in Fig. 5.

Figure 5. Modular multiplication subsystem.
Source: The authors.

Operands are 12 bits long, which implies the use of 12-bit units with a 1-bit carry.

The segmentation unit uses a 4-to-3 multiplexer to split the exponent into fields, which serve as pointers to the precomputed powers. A counter selects which of those fields is used at each step. There is an offset of 2 addresses because X^0 and X^1 are not stored in the register bank. The block diagram (Fig. 6) shows the internal structure of this unit.

Figure 6. Segmentation Unit.
Source: The authors.
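The behaviour of the segmentation unit can be approximated in software as below. The sketch assumes a 12-bit exponent split into four 3-bit fields, read from the most significant field down, and it assumes that the offset maps a field value k to register address k - 2; neither detail is stated explicitly in the text, and both function names are hypothetical.

```python
from typing import List, Optional


def exponent_windows(y: int, n_bits: int = 12, d: int = 3) -> List[int]:
    """Split an n_bits-wide exponent into n_bits // d fields of d bits, MSB first."""
    mask = (1 << d) - 1
    count = n_bits // d
    return [(y >> (d * i)) & mask for i in reversed(range(count))]


def register_address(field: int, offset: int = 2) -> Optional[int]:
    """Map a field value to a register-bank address.

    Returns None for field values 0 and 1, because X^0 and X^1 are not
    stored in the bank (assumed mapping: address = field - offset).
    """
    return field - offset if field >= offset else None


fields = exponent_windows(25)                      # [0, 0, 3, 1]
addresses = [register_address(f) for f in fields]  # [None, None, 1, None]
```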

Address lines control the storage unit, which is composed of a set of 16 registers, each 12 bits wide. An input decoder takes the address and enables writing to one register. When a read is required, a 16-to-12 multiplexer connected to the address lines selects the register data and places it in the output register, whose lines send the Vj signal to the Montgomery Multiplier. The unit writes or reads one register at a time; its internal structure is shown in Fig. 7.

Figure 7. Storage Unit.
Source: The authors.

3. Experimental results

For testing purposes, a test block is added. It holds the input patterns for the M-ary block. Each pattern is a 48-bit word subdivided into four 12-bit numbers that represent the required inputs: a base (X), an exponent (Y), a modulus (M), and a correction factor (R). There are 4 samples in total, stored in a ROM of 4 words of 48 bits each. The test block starts each exponentiation and monitors a finish flag provided by the exponentiation block. When an exponentiation finishes, a counter advances the address pointer that drives the ROM and loads the corresponding data subsets into the output registers. Fig. 8 illustrates the testing structure.

Figure 8. Testing system.
Source: The authors.
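The layout of one test pattern can be sketched as follows. The field order (X in the most significant 12 bits, then Y, M, and the correction factor) is an assumption, since the text does not specify it, and the helper names are hypothetical.

```python
from typing import Tuple

FIELD_BITS = 12  # width of each of the four fields in a 48-bit word


def pack_word(x: int, y: int, m: int, r: int) -> int:
    """Pack base, exponent, modulus and correction factor into one 48-bit word."""
    word = 0
    for value in (x, y, m, r):
        if not 0 <= value < (1 << FIELD_BITS):
            raise ValueError("each field must fit in 12 bits")
        word = (word << FIELD_BITS) | value
    return word


def unpack_word(word: int) -> Tuple[int, int, int, int]:
    """Recover (x, y, m, r) from a 48-bit word, assuming the order used above."""
    mask = (1 << FIELD_BITS) - 1
    x, y, m, r = ((word >> (FIELD_BITS * i)) & mask for i in (3, 2, 1, 0))
    return x, y, m, r


# Two of the ROM entries, built from the data sets reported in this section.
rom = [pack_word(1697, 25, 2773, 566), pack_word(2000, 140, 2901, 733)]
```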

A detailed scheme of the internal structure in the testing block can be seen in Fig. 9.

Figure 9. Test block, detailed overview.
Source: The authors.

Because this system is intended for testing, its hardware requirements are modest, which allows the use of a low-cost platform such as the DE0 evaluation board.

The physical implementation of the previous block was carried out with the Quartus II software, the SignalTap logic analyzer, and a DE0 development board carrying an EP3C16F484C6N FPGA from Altera Corp. Figs. 10(a) and 10(b) show the timing diagrams for some data sets.

Figure 10(a). Data set X = 1697, Y = 25, M = 2773, Inv = 566, Z = 1197.
Source: The authors.

Figure 10(b). Data set X = 2000, Y = 140, M = 2901, Inv = 733, Z = 982.
Source: The authors.
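Both data sets can be cross-checked in software. The short Python script below uses the built-in `pow` as a reference; reading Inv as 2^(2n) mod M with n = 12 is an interpretation, but it matches both reported values.

```python
N_BITS = 12  # operand width used by the hardware

data_sets = [
    {"X": 1697, "Y": 25, "M": 2773, "Inv": 566, "Z": 1197},
    {"X": 2000, "Y": 140, "M": 2901, "Inv": 733, "Z": 982},
]


def check(ds: dict) -> tuple:
    """Return (Z matches X^Y mod M, Inv matches 2^(2n) mod M)."""
    z_ok = pow(ds["X"], ds["Y"], ds["M"]) == ds["Z"]
    inv_ok = pow(2, 2 * N_BITS, ds["M"]) == ds["Inv"]
    return z_ok, inv_ok


results = [check(ds) for ds in data_sets]
print(results)  # [(True, True), (True, True)]
```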

An estimate of the figures of merit was obtained with the TimeQuest and Quartus II tools as follows:

With these examples in mind, one more is worth showing. The following example addresses the cryptographic application of modular exponentiation: its aim is to cipher and decipher a character written in an ASCII-like style.

The next steps are used to obtain the public and private keys of the RSA algorithm [3]:

1. Take two prime numbers p and q and compute their product n:

$$n = p \cdot q = 11 \times 227 = 2497$$

2. Calculate Euler's phi function of n using:

$$\varphi(n) = (p - 1)(q - 1) = 10 \times 226 = 2260$$

3. Choose a prime number e that is less than φ(n); here e = 137.

4. Find the multiplicative inverse d of e modulo φ(n), defined by:

$$e \cdot d \equiv 1 \pmod{\varphi(n)}$$

5. Use an iterative algorithm in Matlab to find the multiplicative inverse, based on the following fact:

$$d = \frac{1 + i \cdot \varphi(n)}{e}$$

Iterating over i until d is an integer gives:

$$i = 2, \qquad d = \frac{1 + 2 \times 2260}{137} = 33$$

Multiplying d and e:

$$d \cdot e = 33 \times 137 = 4521 = 2 \times 2260 + 1 \equiv 1 \pmod{2260}$$

This proves that d is the multiplicative inverse of e modulo φ(n).

6. Create a private key with the set (p, q, d) = (11, 227, 33) and a public key (n, e) = (2497, 137). Using the character code, “s” is represented by 83, which is the message used in the cipher and decipher steps.

7. The cipher step uses modular exponentiation to solve the following equation:

$$C = 83^{e} \bmod n = 83^{137} \bmod 2497$$

Fig. 11 shows cipher results.

Figure 11. Ciphering “s” character (plain text).
Source: The authors.

8. Finally, the decipher step uses modular exponentiation to solve the following equation, where C is the ciphered value:

$$C^{d} \bmod n = C^{33} \bmod 2497 = 83$$

Fig. 12 shows the decipher results.

Figure 12. Deciphering the cipher equivalent of “s” character.
Source: The authors.
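The whole example can be reproduced with a short Python sketch. The iterative search mirrors the idea of step 5 (increase i until 1 + i·φ(n) is divisible by e); it is not the authors' Matlab script, and the function name `find_private_exponent` is hypothetical. The built-in `pow` stands in for the hardware exponentiation block.

```python
from math import gcd


def find_private_exponent(e: int, phi: int) -> tuple:
    """Search for the smallest i >= 1 such that d = (1 + i * phi) / e is an integer."""
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with phi(n)")
    i = 1
    while (1 + i * phi) % e != 0:
        i += 1
    return (1 + i * phi) // e, i


p, q = 11, 227
n = p * q                  # 2497
phi = (p - 1) * (q - 1)    # 2260
e = 137
d, i = find_private_exponent(e, phi)   # d = 33, found at i = 2

message = 83               # code used in the text for the character
cipher = pow(message, e, n)            # cipher step
recovered = pow(cipher, d, n)          # decipher step, gives back 83
```

The round trip returns the original message because d · e ≡ 1 (mod φ(n)).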

4. Conclusions

References

[1] De Macedo-Mourelle, L. and Nedjah, N., Fast reconfigurable hardware for the M-ary modular exponentiation. In Digital System Design, DSD 2004. Euromicro Symposium on System Design, pp. 516-523, IEEE. August, 2004.

[2] Harris, D., Krishnamurthy, R., Anders, M., Mathew, S. and Hsu, S., An improved unified scalable radix-2 Montgomery multiplier. In Computer Arithmetic, 2005. ARITH-17 2005. 17th IEEE Symposium on IEEE. June, 2005, pp. 172-178.

[3] Rivest, R.L., Shamir, A. and Adleman, L., A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2), pp. 120-126, 1978.

[4] Bernal, A., Conception et étude d'une architecture numérique de haute performance pour le calcul de la fonction exponentielle modulaire. Doctoral dissertation, Institut National Polytechnique de Grenoble-INPG, France, 1999.

[5] Chang, C.H., Molahosseini, A.S., Zarandi, A.A.E. and Tay, T.F., Residue number systems: A new paradigm to datapath optimization for low-power and high-performance digital signal processing applications. IEEE circuits and systems magazine, 15(4), pp. 26-44, 2015.

[6] Montgomery, P.L., Modular multiplication without trial division. Mathematics of Computation, 44(170), pp. 519-521, 1985.

[7] Nedjah, N. and de Macedo-Mourelle, L., Four hardware implementations for the m-ary modular exponentiation. In Information Technology: New Generations, 2006. ITNG 2006. Third International Conference on 2006 IEEE. April, 2006, pp. 210-215.

[8] Di Claudio, E.D., Orlandi, G. and Piazza, F., Fast RNS DSP algorithms implemented with binary arithmetic. In Acoustics, Speech, and Signal Processing, 1990. ICASSP-90., 1990 International Conference on IEEE. April, 1990, pp. 1531-1534.

[9] Ramírez, J. and Meyer-Baese, U., High performance, reduced complexity programmable RNS-FPL merged FIR filters. Electronics Letters, 38(4), pp. 199-200, 2002.

[10] Egecioglu, O. and Koç, C.K., Fast modular exponentiation. Communication, Control and Signal Processing, 1, pp. 188-194, 1990.

[11] Nedjah, N., Mourelle, L.M., Santana, M. and Raposo, S., Massively parallel modular exponentiation method and its implementation in software and hardware for high-performance cryptographic systems. IET Computers and Digital Techniques, 6(5), pp. 290-301, 2012.

[12] Menezes, A.J., Van Oorschot, P.C. and Vanstone, S.A., Handbook of applied cryptography, Chap. 14, Efficient Implementation. CRC Press, 1996. Retrieved from: http://cacr.uwaterloo.ca/hac/

Notes

How to cite: Arenas-Hoyos, S.A. and Bernal-Noreña, A., Performance evaluation of M-ary algorithm using reprogrammable hardware. DYNA, 84(203), pp. 75-79, December, 2017.