Artículos

SYSTEMATIC MAPPING STUDY ON FAST FACTORIZATION USING PARALLEL OR DISTRIBUTED PROCESSING APPLIED TO CRYPTANALYSIS

Estudio de mapeo sistemático sobre factorización rápida utilizando procesamiento paralelo o distribuido aplicado al criptoanálisis

Jhon-Alejandro Melo
Universidad del Cauca, Colombia
Siler Amador-Donado
Universidad del Cauca, Colombia
César-Jesús Pardo-Calvache
Universidad del Cauca, Colombia


Revista Facultad de Ingeniería, vol. 33, no. 69, e17935, 2024

Universidad Pedagógica y Tecnológica de Colombia

Received: 20 July 2024

Accepted: 25 September 2024

ABSTRACT: Cryptography is one of the branches of research within computer security and cybersecurity; it protects information while it is stored and while it travels between devices. Cryptanalysis, in turn, studies the weaknesses of cryptography, which allows cryptographic algorithms to be improved constantly. Several algorithms currently keep information secure; one of them is RSA (Rivest, Shamir, and Adleman), which is used in the digital certificates of some communication protocols. However, no algorithm is yet capable of breaking this type of algorithm; therefore, the objective of this study is to support other researchers in the area of cryptanalysis. This rapid factorization study using parallel or distributed processing contains 6 research questions that allow us to examine in depth how this type of processing is used to speed up the execution times of the algorithms. The results show that this type of processing can reduce factoring time.

Keywords: Cryptanalysis, distributed processing, factoring, parallel processing.

RESUMEN: La criptografía es una de las ramas de investigación dentro de la seguridad informática y la ciberseguridad, proporciona seguridad a la información almacenada y viaja entre dispositivos. El criptoanálisis, a su vez, estudia las debilidades dentro de la criptografía, permitiendo así mejorar las constantes en los algoritmos criptográficos. Actualmente existen varios algoritmos que permiten mantener la información segura, uno de ellos es RSA (Rivest, Shamir y Adleman), que se utiliza en certificados digitales implementados en algunos protocolos de comunicación. Sin embargo, aún no existe un algoritmo capaz de descifrar este otro tipo de algoritmos; por lo tanto, el objetivo de este estudio es apoyar a otros investigadores en el área del criptoanálisis. Este estudio de factorización rápida mediante procesamiento paralelo o distribuido contiene 6 preguntas de investigación que nos permiten profundizar en el uso de este tipo de procesamiento para acelerar los tiempos de ejecución de los algoritmos. Los resultados permitieron demostrar que, mediante el uso de este tipo de procesamiento, se puede reducir el tiempo de factorización.

Palabras clave: Criptoanálisis, factorización, procesamiento distribuido, procesamiento paralelo.

1. INTRODUCTION

Cryptography is crucial for information security, from the Caesar cipher (named after Julius Caesar) to modern cryptographic algorithms like RSA. These systems have evolved to protect data storage, processing, and transmission over networks. Today, privacy and confidentiality in online operations depend on encryption algorithms; for example, the HyperText Transfer Protocol Secure (HTTPS) can rely on RSA to secure the transmission of personal or financial data. A 2022 study by Entrust showed that 62% of companies use encryption strategies, reflecting a 19% increase since 2018, and highlighting the importance of continued research on these algorithms [1].

Cryptanalysis studies methods for deciphering encrypted messages without having the key. The factorization of semi-prime numbers is a central approach to compromising public-key algorithms like RSA, whose security relies on the difficulty of factoring very large composite numbers into their prime factors. For example, the number 15 is easily factored into its prime factors 3 and 5. However, for a number as large as an RSA-2048 modulus (for example, https://acortar.link/c48jLQ), which has 617 decimal digits, finding the prime factors is extremely difficult, even for a non-quantum supercomputer. As factorization techniques such as the Multiple Polynomial Quadratic Sieve (MPQS) and the General Number Field Sieve (GNFS) advance, longer keys are required to maintain security.
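To make the contrast between 15 and a 617-digit modulus concrete, the following minimal sketch factors a small semi-prime by trial division. The function name `trial_division` and the sample values are illustrative assumptions, not part of the reviewed studies; the point is only that the loop runs up to about the square root of n, which is hopeless for RSA-sized numbers.

```python
from math import isqrt


def trial_division(n: int) -> tuple[int, int]:
    """Return (p, q) with p * q == n and p the smallest prime factor of n.

    Returns (1, n) when no nontrivial divisor exists (n is prime or n < 4).
    """
    if n > 2 and n % 2 == 0:
        return 2, n // 2
    # Only odd candidates up to floor(sqrt(n)) need to be tested.
    for p in range(3, isqrt(n) + 1, 2):
        if n % p == 0:
            return p, n // p
    return 1, n


print(trial_division(15))      # (3, 5)
print(trial_division(10403))   # (101, 103)
```

For a 617-digit modulus the same loop would need on the order of 10^308 candidate divisors, which is why sieve-based methods such as MPQS and GNFS are used instead.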

This study analyzes cryptanalysis in the factorization of semi-prime numbers through a systematic mapping of the literature. Factorization methods for numbers with more than 10 digits were identified, including the use of artificial intelligence. Although parallel processing has improved execution times, the complexity remains high. Therefore, it is important to explore how emerging technologies can assist. The document is organized as follows: methodology, results and discussions, and conclusions.

2. METHODOLOGY

2.1. Definition of Search Objectives and Research Questions

A systematic mapping is a process that allows the collection, categorization, and structuring of existing information on a topic of research interest [2]. For the design of this systematic mapping, the protocol proposed by Petersen et al. in [3] was used as a reference, along with the guidelines presented by Kitchenham and Charters in [4] and Budgen et al. in [5]. The following activities were carried out: (i) apply a Goal-Question-Metric (GQM) approach; (ii) define a search and selection strategy; (iii) conduct a review; (iv) generate a review report. Figure 1 presents a more detailed diagram of the process.

Figure 1. Stages of the process for systematic mapping. Suggested by [6].

To effectively direct the present systematic mapping, the following search objectives (OB) have been defined:

OB1. Identify the types of solutions proposed in the selected primary studies and group them to determine the most relevant ones.

OB2. Analyze the main contributions to rapid factorization, considering the types of solutions proposed to identify those with the greatest impact.

OB3. Support academics, cryptanalysts, and other interested parties in researching rapid factorization by presenting the challenges encountered in the selected primary studies.

Based on the search objectives, three (3) research questions (P) have been defined, as presented in Table 1. Each question is mapped to the proposed objectives, along with its respective motivation.

For research question P1, the types of solutions described in Table 2 were defined.

Table 1. Research questions.

Table 2. Types of solutions.

2.2. Search strategy

The following activities were carried out to search for articles: (i) identification of key terms through document review and consultation with cryptography experts; (ii) combination of terms using "AND" and "OR"; (iii) refinement of the search string; (iv) adaptation of the string to different search engines; and (v) definition of the time frame and inclusion/exclusion criteria. The refined search string was ("factoring large numbers" OR "factoring large integers") AND ("fast" OR "speed up") AND ("cryptography" OR "cryptanalysis") AND ("sieve" OR "sieving") AND ("parallel" OR "distributed" OR "optimized"). The time frame was set from January 2010 to December 2022. Searches were conducted in Google Scholar, ACM Digital Library, IEEE Digital Library, ScienceDirect, Scopus, and Springer Link, adapting the string to each database. Details of the adaptations are presented in Table 3, and findings are shown in Table 8, including discarded articles and those with no available access.
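The way the refined string combines synonyms with OR and concepts with AND can be sketched as follows. The helper `build_query` and the variable `term_groups` are hypothetical names introduced for illustration; the terms themselves are those of the refined search string, and the per-database adaptations of Table 3 are not modeled.

```python
# Each inner list holds synonyms (joined with OR); the groups are joined with AND.
term_groups = [
    ["factoring large numbers", "factoring large integers"],
    ["fast", "speed up"],
    ["cryptography", "cryptanalysis"],
    ["sieve", "sieving"],
    ["parallel", "distributed", "optimized"],
]


def build_query(groups: list[list[str]]) -> str:
    """Join quoted terms with OR inside a group and the groups with AND."""
    clauses = []
    for group in groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(term_groups)
print(query)
```

Keeping the terms in a list of groups makes it straightforward to add or drop a synonym and regenerate the string before adapting it to each search engine.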

2.3. Inclusion and exclusion criteria

For the selection of relevant articles, a two-level review was conducted: (Level 1) review of the title; (Level 2) review of the abstract, introduction, and conclusions. To obtain relevant articles, only those studies that met at least one of the inclusion criteria described in Table 4 and did not meet any of the exclusion criteria listed in Table 5 were selected. Subsequently, to achieve the objectives and identify the primary studies, a (Level 3) review of the full text was made.

2.4. Quality assessment criteria

To measure the quality of the selected primary studies and determine their relevance to rapid factorization using parallel or distributed processing, a questionnaire containing twelve (12) questions was created with a scoring system of three values (-1, 0, +1), as described in Table 6. Each article can receive a quality score ranging from -12 to +12. It is important to clarify that a low-quality score does not imply exclusion but rather helps in ranking articles by relevance for future research.
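The scoring scheme can be illustrated with a short sketch. The function names and the sample answers below are hypothetical and only demonstrate the arithmetic: twelve answers, each -1, 0, or +1, summed into a score between -12 and +12 and then used for ranking rather than exclusion.

```python
ALLOWED_VALUES = (-1, 0, 1)
NUM_QUESTIONS = 12


def quality_score(answers: list[int]) -> int:
    """Sum the twelve per-question values into a score in [-12, +12]."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    if any(a not in ALLOWED_VALUES for a in answers):
        raise ValueError("each answer must be -1, 0, or +1")
    return sum(answers)


def rank_by_quality(assessments: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Order articles from highest to lowest score; nothing is excluded."""
    scored = [(article, quality_score(a)) for article, a in assessments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up answers for two placeholder articles, for illustration only.
example = {
    "X1": [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1],      # score 9
    "X2": [1, 0, -1, 0, 1, 0, 1, 0, -1, 1, 0, 0],    # score 2
}
print(rank_by_quality(example))
```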

2.5. Data extraction

The following link https://acortar.link/UPrRAU presents the summary sheet that ensured a uniform data extraction strategy for all articles, making it easier to classify the information. This sheet summarizes the most important aspects to be considered for each article.

Table 3. Adaptation of the basic search string in the databases.

Table 4. Inclusion criteria.

Table 5. Exclusion criteria.

Table 6. Criteria for evaluating the quality of primary studies.

3. RESULTS AND DISCUSSIONS

The search and selection of articles began by entering the search string in Google Scholar, which was also the data source that yielded the most results. For the other sources, most of the articles found were not selected because of exclusion criterion EC2. As shown in Table 7, 15 articles met at least one inclusion criterion (IC) and no exclusion criterion (EC); 2 further articles were selected through backward snowballing, giving 17 primary studies. A total of 309 articles were excluded. The compendium of primary articles resulting from the search is presented in Table 8 along with their references, so that readers can explore them further. The results of the quality assessment are presented in Table 9, highlighting that only 2 articles (A6, A13) achieved the highest rating with 9 points each. It is also noteworthy that most articles scored above 1 point, with 5 articles scoring 5 points.

Table 7. Search results.

Table 8. Compendium of primary articles resulting from the search.

Table 9. Quality rating for each article.

3.1. P1: What types of solutions have been proposed?

As shown in Table 10, 45% (10 articles) of the studies propose the use of parallel processing to accelerate the factorization process. Among these articles, A5, A9, and A10 also use code optimization, while A6 employs low-level instructions to enhance sieving performance. Articles A2, A3, A11, A13, A16, and A17 use parallel processing in one or more common steps of algorithms such as MPQS, GNFS, and Fermat. 18% (4 articles) of the studies use code optimization to improve the methods or functions of each factorization algorithm; these studies (A5, A9, A10, and A14) combine code optimization with parallel processing or low-level instructions. 14% (3 articles) of the studies use low-level instructions that provide faster response times between software and hardware, thus accelerating factorization processes: A6 and A14 combine them with parallel processing or code optimization, while A12 focuses solely on low-level instructions. Artificial intelligence (AI) represents 9% (2 articles): A4 and A15 use neuromorphic computing and genetic algorithms, respectively, a different type of solution from the algorithm improvements that are more commonly found. Similarly, performance evaluations account for 9% (2 articles): A1 and A7 evaluate the performance of algorithms or tools for factoring large numbers, although all articles include a section on evaluation or experiments for their solutions. Finally, distributed processing represents 5% (1 article): Shende et al. in A8 apply distributed processing using mobile agents across several servers to divide processing among multiple machines, thereby reducing factorization times. Because some articles fall into more than one category, the percentages are computed over the 22 article-category assignments rather than over the 17 articles.

Table 10. Classification of primary articles by type of solution.

Table 10 presents the analysis and summary of the different types of solutions found in the reviewed literature on large number factorization. These types of solutions include parallel processing, artificial intelligence, code optimization, low-level instructions, performance evaluation, and distributed processing. Each type of solution is listed along with the corresponding articles that use it and the percentage represented in the total number of reviewed articles.
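The reported percentages can be checked with a few lines of arithmetic. The article-to-category mapping below is an assumption reconstructed from the prose of Sections 3.1 and 3.2 and the reference list, since the body of Table 10 is not reproduced here; in particular, A7 is placed under performance evaluation and A8 under distributed processing, following Section 3.2.

```python
# Assumed mapping, reconstructed from the text (not copied from Table 10).
solutions = {
    "parallel processing": ["A2", "A3", "A5", "A6", "A9", "A10",
                            "A11", "A13", "A16", "A17"],
    "code optimization": ["A5", "A9", "A10", "A14"],
    "low-level instructions": ["A6", "A12", "A14"],
    "artificial intelligence": ["A4", "A15"],
    "performance evaluation": ["A1", "A7"],
    "distributed processing": ["A8"],
}

# An article may appear in several categories, so the denominator is the
# number of article-category assignments, not the number of articles.
total_assignments = sum(len(articles) for articles in solutions.values())
distinct_articles = {a for articles in solutions.values() for a in articles}

shares = {
    name: round(100 * len(articles) / total_assignments)
    for name, articles in solutions.items()
}

print(total_assignments, len(distinct_articles))
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under this mapping the shares reproduce the figures quoted in the text (45%, 18%, 14%, 9%, 9%, and 5%) only when the denominator is the 22 assignments, which suggests that is how the percentages were derived.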


3.2. P2: What results have been achieved with the proposed solutions?

This study identified 17 articles with various proposals focused on the factorization of large numbers. Primary studies are categorized into different types of solutions, with parallel processing showing the highest percentage (45%). This technique is commonly used in factorization algorithms. This section provides a more detailed exploration of the types of solutions found.

In the parallel processing solutions, A2, A5, and A13 focus on accelerating sparse matrix solutions, a key component in some factorization algorithms. These studies use parallel processing and code optimization to speed up cryptanalysis. A3 proposes a parallel implementation of the Fermat factorization method, showing that it is significantly faster than sequential approaches. A6 and A14, in addition to parallel processing, employ low-level instructions that execute more rapidly, enhancing sieving in algorithms such as the Quadratic Sieve (QS), Multiple Polynomial Quadratic Sieve (MPQS), and Number Field Sieve (NFS).
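As an illustration of how Fermat's method lends itself to parallel search, the sketch below splits the range of candidate values a into chunks and hands each chunk to a worker. It is not the implementation evaluated in A3; the function names, chunk size, and worker count are assumptions. It also uses threads for simplicity, which in CPython do not give a real speedup for CPU-bound work; process-based workers or a compiled language would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def fermat_chunk(n: int, a_start: int, a_end: int):
    """Search a in [a_start, a_end) such that a*a - n is a perfect square."""
    for a in range(a_start, a_end):
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b   # n = (a - b) * (a + b)
    return None


def fermat_parallel(n: int, workers: int = 4, chunk: int = 10_000):
    """Factor an odd n by Fermat's method, scanning chunks of a concurrently."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    a0 = isqrt(n)
    if a0 * a0 < n:
        a0 += 1                    # start at ceil(sqrt(n)) so a*a - n >= 0
    a_max = (n + 1) // 2           # a = (n + 1) / 2 always yields (1, n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = a0
        while start <= a_max:
            stop = min(start + workers * chunk, a_max + 1)
            bounds = [(s, min(s + chunk, stop)) for s in range(start, stop, chunk)]
            # map() preserves chunk order, so the first hit has the smallest a.
            for result in pool.map(lambda b: fermat_chunk(n, *b), bounds):
                if result is not None:
                    return result
            start = stop
    return None


print(fermat_parallel(5959))    # (59, 101)
print(fermat_parallel(10403))   # (101, 103)
```

The chunks are independent, so no synchronization is needed beyond collecting the first successful result, which is one reason the method parallelizes easily.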

A9 compares the sequential and parallel versions of the Self-Initializing Quadratic Sieve (SIQS) algorithm, showing superior performance for the parallel version. A10 introduces a new parallel module for collecting relations, which finds the B-smooth numbers needed by the sieving step of these algorithms. A11 implements parallel sieving in the NFS algorithm and evaluates its performance against the sequential version, highlighting the advantages of the parallel algorithm. A16 proposes parallelizing the MPQS algorithm using parallel symbolic computations, demonstrating reduced factorization times with more processors. A17 presents a parallel implementation of the NFS algorithm on a SUN cluster, achieving favorable execution times. Similarly, A8 uses mobile agents to distribute QS algorithm processing, showing that using three machines reduces factorization time.
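To show what collecting B-smooth relations means in practice, the following minimal sketch scans values x*x - n just above the square root of n and keeps those whose prime factors are all at most a bound B. It is a simplification under stated assumptions: it uses trial division over the primes up to B rather than a true sieve, it does not filter the factor base by quadratic residuosity, and the names `primes_up_to` and `smooth_relations` are hypothetical rather than taken from any reviewed study.

```python
from math import isqrt


def primes_up_to(bound: int) -> list[int]:
    """Sieve of Eratosthenes returning all primes <= bound."""
    flags = bytearray([1]) * (bound + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(bound) + 1):
        if flags[p]:
            flags[p * p::p] = bytearray(len(range(p * p, bound + 1, p)))
    return [p for p in range(2, bound + 1) if flags[p]]


def smooth_relations(n: int, bound: int, span: int) -> list[tuple[int, int]]:
    """Return (x, x*x - n) pairs whose value factors completely over primes <= bound."""
    primes = primes_up_to(bound)
    x0 = isqrt(n) + 1              # guarantees x*x - n > 0
    relations = []
    for x in range(x0, x0 + span):
        value = x * x - n
        rest = value
        for p in primes:
            while rest % p == 0:
                rest //= p
            if rest == 1:
                break
        if rest == 1:              # nothing left: value is bound-smooth
            relations.append((x, value))
    return relations


print(smooth_relations(15347, 30, 5))
# [(124, 29), (126, 529), (127, 782)]
```

Because each x is tested independently, the interval of x values can be split into disjoint sub-intervals and assigned to separate processors, which is the general idea behind parallel relation collection.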

In a different approach, A4 and A15 focus on artificial intelligence. A4 uses neuromorphic computing to propose a neuromorphic sieving method for identifying B-smooth numbers, while A15 introduces a genetic algorithm for integer factorization, successfully factoring numbers up to 8 digits. A12 employs low-level instructions to accelerate sieving in the MPQS and NFS algorithms, achieving speedups of 15% to 40%. Finally, A1 and A7 are dedicated to performance evaluation, with A1 focusing on the QS and Pollard's rho (PR) algorithms, and A7 on factorization tools like MSieve, GGNFS, and CADO-NFS. Although other studies also include evaluations and experiments, these two specifically concentrate on performance assessment.

3.3. P3: What is the main challenge of factoring large numbers?

There are various challenges in researching fast factorization, with the selected studies revealing interrelated issues concerning computational complexity and algorithm optimization. The increase in computational effort required as numbers grow becomes a key point, exacerbated by the demand for faster sieving. In the context of parallel implementation, obstacles include synchronization between processors, memory optimization, and the efficient selection of algorithms and parameters. Proper memory allocation, managing calculation complexity, effective synchronization, and process efficiency are critical factors to address. Table 11 details each of these challenges in the selected studies.

3.4. Main Observations

Systematic mapping allowed for the identification of related works on fast factorization using parallel or distributed processing. After analyzing the results, the following observations are made:

4. CONCLUSIONS

This study covers 17 research articles dedicated to the factorization of large numbers, which use various methods and techniques to address the computational cost of time-consuming operations. The most common approach is parallel processing, which represents 45% of the classified solutions. This approach uses multiple processors to accelerate steps such as sieving or solving sparse matrices, which are critical in factorization algorithms like QS, MPQS, and NFS, and it showed significant performance improvements over the corresponding sequential implementations.

The study provides a diverse perspective on the strategies used by researchers in the field of cryptanalysis, such as code optimization, low-level instructions, AI techniques, and distributed processing. By evaluating the performance of various algorithms and factorization tools, these approaches enrich the research field and should continue to be explored as alternatives to the majority of proposed solutions for accelerating factorization processes. Most of these approaches demonstrated performance improvements that help reduce the factorization times of some algorithms.

Table 11. Challenges in factorization.

There are several challenges surrounding research on fast factorization of semi-prime numbers. One of the most significant is the increase in computational effort required as the numbers to be factored grow in bit length. This complexity is further intensified by the volume of operations and results that need to be stored in memory. One of the most relevant challenges in the context of parallel processing is synchronization between processors and memory management, as inadequate control or optimization can impact the factorization process.

Finally, it is worth noting that despite recent advances in computing and mathematics, it is still not possible to factor numbers as large as a 2048-bit RSA modulus within a reasonable time frame. According to [24], the largest RSA number factored to date is RSA-250 (February 2020), an 829-bit number that would have taken about 2700 years to factor with a single core. However, by using parallel and distributed processing, it was factored in a few months with multiple machines around the world. Because the cost of the best known classical algorithms grows faster than any polynomial in the size of the number, the researchers in [24] estimate that a 1024-bit number would be 200 times harder to factor. Studies in quantum technology, such as [25], suggest that a quantum computer could factor in polynomial time, leading to significantly shorter times than current classical algorithms, whose running time is super-polynomial (sub-exponential in the case of GNFS). This poses a challenge for current information security, prompting the development of quantum cryptographic systems [26].
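The order of magnitude of the "200 times harder" figure can be reproduced with the standard heuristic running time of GNFS, L_n[1/3, (64/9)^(1/3)] = exp(c (ln n)^(1/3) (ln ln n)^(2/3)). The sketch below drops the o(1) term and compares an 829-bit with a 1024-bit number; whether [24] derived its estimate exactly this way is an assumption, and the function names are illustrative.

```python
from math import exp, log


def gnfs_log_cost(bits: int) -> float:
    """Natural log of the heuristic GNFS cost for a number of the given bit length.

    Uses L_n[1/3, (64/9)^(1/3)] with the o(1) term dropped, so only ratios
    between sizes are meaningful, not absolute running times.
    """
    ln_n = bits * log(2)
    c = (64 / 9) ** (1 / 3)
    return c * ln_n ** (1 / 3) * log(ln_n) ** (2 / 3)


def relative_difficulty(bits_small: int, bits_large: int) -> float:
    """Estimated cost ratio between factoring the larger and the smaller number."""
    return exp(gnfs_log_cost(bits_large) - gnfs_log_cost(bits_small))


ratio = relative_difficulty(829, 1024)
print(f"1024-bit vs 829-bit: about {ratio:.0f}x harder")
```

Under these simplifications the ratio comes out on the order of 200, consistent with the estimate cited above, while the same formula applied to 2048 bits gives a vastly larger factor.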

ACKNOWLEDGMENTS

To the University of Cauca, especially to the SEC (Security, Encryption & Cybersecurity) research group, the GTI research group, and the Altenua-Matdis research group for the support provided in the development of this work. Professors Siler Amador Donado and César Pardo are grateful for the contribution of the Universidad del Cauca, where they currently work as full professors.

REFERENCES

[1] Ponemon Institute, Global Encryption Trends Study: The data is in the cloud, but who's in control? 2022. Available: https://www.entrust.com/es/resources/reports/global-encryption-trends-study

[2] E. Suescún-Monsalve, J.-C. Sampaio-do-Prado-Leite, C.-J. Pardo-Calvache, "Semi-Automatic Mapping Technique Using Snowballing to Support Massive Literature Searches in Software Engineering," Revista Facultad de Ingeniería, vol. 31, no. 60, e14189, May 2022, https://doi.org/10.19053/01211129.v31.n60.2022.14189

[3] K. Petersen, H. Flensburg, R. Feldt, M. Mattsson, S. Mujtaba, Systematic Mapping Studies in Software Engineering, 2008.

[4] B. Kitchenham, S. M. Charters, Guidelines for performing Systematic Literature Reviews in Software Engineering, 2007.

[5] D. Budgen, M. Turner, P. Brereton, B. Kitchenham, "Using Mapping Studies in Software Engineering," 2008. https://www.ebse.org.uk

[6] E. Nicolás, P. Paredes, C. E. Orozco, C. Pardo, "Análisis del estado del arte acerca de la (in)felicidad en las comunidades de desarrollo de software ágil," RISTI - Revista Iberica de Sistemas e Tecnologias de Informacao, vol. 57, pp. 425-437, 2023.

[7] Z. Li and W. Gasarch, An Empirical Comparison of the Quadratic Sieve Factoring Algorithm and the Pollard Rho Factoring Algorithm, 2021. https://doi.org/10.48550/arXiv.2111.02967

[8] C. Bouillaguet, P. Zimmermann, Parallel Structured Gaussian Elimination for the Number Field Sieve, 2021.

[9] H. M. Bahig, H. M. Bahig, Y. Kotb, "Fermat factorization using a multi-core system," International Journal of Advanced Computer Science and Applications, vol. 11, no. 4, pp. 323-330, 2020. https://doi.org/10.14569/IJACSA.2020.0110444

[10] J. V. Monaco, M. M. Vindiola, "Factoring Integers with a Brain-Inspired Computer," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 3, pp. 1051-1062, 2018. https://doi.org/10.1109/TCSI.2017.2771533

[11] L. T. Yang, Y. Huang, J. Feng, Q. Pan, C. Zhu, "An improved parallel block Lanczos algorithm over GF(2) for integer factorization," Information Sciences, vol. 379, no. 2, pp. 257-273, 2017. https://doi.org/10.1016/j.ins.2016.09.052

[12] B. Sengupta, A. Das, "Use of SIMD-based data parallelism to speed up sieving in integer-factoring algorithms," Appl Math Comput, vol. 293, pp. 204-217, 2017. https://doi.org/10.1016/j.amc.2016.08.019

[13] E. J. Vuicik, D. Sesok, S. Ramanauskaitė, "Efficiency of RSA Key Factorization by Open-Source Libraries and Distributed System Architecture," Baltic Journal of Modern Computing, vol. 5, no. 3, pp. 269-274, 2017. https://doi.org/10.22364/bjmc.2017.5.3.02

[14] V. Shende, G. Sudi, M. Kulkarni, "Fast cryptanalysis of RSA encrypted data using a combination of mathematical and brute force attack in distributed computing environment," in IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), 2017, pp. 2446-2449. https://doi.org/10.1109/ICPCSI.2017.8392156

[15] D. Breitenbacher, I. Homoliak, J. Jaros, P. Hanacek, "Impact of optimization and parallelism on factorization speed of SIQS," in 20th World Multi-Conference on Systemics, Cybernetics and Informatics, 2016, pp. 55-62.

[16] H. Yu, G. Bai, "Strategy of Relations Collection in Factoring RSA Modulus," Lecture Notes in Computer Science, vol. 9543, pp. 199-211, 2016. https://doi.org/10.1007/978-3-319-29814-6_16

[17] S. Daoud, I. Gad, "A parallel line sieve for the GNFS Algorithm," International Journal of Advanced Computer Science and Applications, vol. 5, no. 7, pp. 178-185, 2014. https://doi.org/10.14569/ijacsa.2014.050727

[18] B. Sengupta, A. Das, "SIMD-based implementations of sieving in integer-factoring algorithms," Lecture Notes in Computer Science, vol. 8204, pp. 40-55, 2013. https://doi.org/10.1007/978-3-642-41224-0_4

[19] L. T. Yang, L. Xu, S. S. Yeo, S. Hussain, "An integrated parallel GNFS algorithm for integer factorization based on Linbox Montgomery block Lanczos method over GF(2)," Computers and Mathematics with Applications, vol. 60, no. 2, pp. 338-346, 2010. https://doi.org/10.1016/j.camwa.2010.01.020

[20] U. Meyer-Bäse, G. Botella, E. Castillo, A. García, "Nios II hardware acceleration of the epsilon quadratic sieve algorithm," Proceedings of SPIE - The International Society for Optical Engineering, vol. 7703, e77030M, 2010. https://doi.org/10.1117/12.849883

[21] R. V. Yampolskiy, "Application of bio-inspired algorithm to the problem of integer factorization," International Journal of Bio-Inspired Computation, vol. 2, no. 2, pp. 115-123, 2010. https://doi.org/10.1504/IJBIC.2010.032127

[22] G. Macariu, D. Petcu, "Parallel multiple polynomial quadratic sieves on multi-core architectures," in 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp. 59-65, 2007. https://doi.org/10.1109/SYNASC.2007.21

[23] L. T. Yang, L. Xu, M. Lin, "Integer factorization by a parallel GNFS algorithm for public key cryptosystems," Lecture Notes in Computer Science, vol. 3820, pp. 683-695, 2005. https://doi.org/10.1007/11599555_65

[24] F. Boudot et al., "State of the Art in Integer Factoring and Breaking Public-Key Cryptography," IEEE Security and Privacy Magazine, vol. 2022, no. 2, e314918. https://doi.org/10.1109/MSEC.2022.3141918

[25] R. Young, P. Birch, C. Chatwin, A simplification of the Shor quantum factorization algorithm employing a quantum Hadamard transform, 2018. https://doi.org/10.1117/12.2309468

[26] S. K. Sehgal, R. Gupta, "Quantum Cryptography and Quantum Key," in International Conference on Industrial Electronics Research and Applications, 2021, pp. 1-5. https://doi.org/10.1109/ICIERA53202.2021.9726722

Notes

How to cite: J-A. Melo, S. Amador-Donado, C-J. Pardo-Calvache, "Systematic Mapping Study on Fast Factorization Using Parallel or Distributed Processing Applied to Cryptanalysis", Revista Facultad de Ingeniería, vol. 33, no. 69, e17935, 2024. https://doi.org/10.19053/01211129.v33.n69.2024.17935
AUTHOR CONTRIBUTIONS Jhon-Alejandro Melo: Conceptualization, data curation, formal analysis, research, software, visualization and writing - original draft. Siler Amador-Donado: Conceptualization, project administration, methodology, supervision and writing - review and editing. César-Jesús Pardo-Calvache: Conceptualization, project administration and writing - review and editing.

Author notes

This edition was funded with resources from the Patrimonio Autónomo Fondo Nacional de Financiamiento para la Ciencia, la Tecnología y la Innovación, Francisco José de Caldas, Minciencias.