Abstract: The industrial internet of things (IIoT) has grown in recent years, making it possible to disseminate recent technological innovations and integrate them with one another in applications such as smart cities, health, education and transit. This growth brings a security problem: incidents related to IIoT have been registered against data networks, so intelligent cybersecurity solutions are needed. The objective of this work was to propose an intelligence technique adaptable to a cybersecurity framework and able to solve cybersecurity problems in networks of IIoT devices. The work followed the action-research methodology (I-A), which merges theory with practice so that the researcher can draw accurate conclusions about the practices carried out and provide solutions to specific problems in a given situation. On this basis, a systematic literature review of artificial intelligence techniques was carried out to determine the most appropriate ones, which were then validated until the most suitable one was selected. The review found a wide variety of intelligence techniques. Among them, deep learning obtained a very high score in the characterization because of its potential for integrating the algorithm into the field of cybersecurity, although these techniques were found to be poorly characterized; the initial research nevertheless showed how to work with this technology and how to adapt it to cybersecurity.
There are different ways to analyze and secure data on the network, and learning techniques are one of them. This research identified several techniques whose algorithms provided the basis for adaptability to a framework related to IIoT technologies.
Keywords: Cybersecurity, Industrial Internet of Things-IIoT, Artificial Intelligence, Intrusion Detection System, Security Models.
Resumen: El internet industrial de las cosas (IIoT) ha tenido un crecimiento en los últimos años, que permite dar a conocer las recientes innovaciones tecnológicas y poder integrarlas entre sí, como lo son las ciudades inteligentes, entre otras aplicaciones como la salud, educación, tránsito y otras más, pero a su vez se cuenta con una problemática que es la seguridad, debido a que se han registrado incidentes relacionados con IIoT frente a las redes de datos, por ello se hace necesario generar soluciones inteligentes en ciberseguridad, que permitan dar una solución satisfactoria. El objetivo de este trabajo fue proponer una técnica de inteligencia adaptable a un framework de ciberseguridad con la capacidad de solucionar los problemas de ciberseguridad en las redes de los dispositivos IIoT, para el desarrollo de éste se hace uso de la metodología investigación-acción (I-A), la cual consiste en fusionar la teoría con la práctica de tal forma que el investigador pueda generar conclusiones acertadas sobre las prácticas realizadas. En este mismo sentido, con dicha metodología se pretende dar soluciones a problemas concretos en una situación determinada. A partir de lo anterior, se realizó una revisión sistemática de literatura de las diferentes técnicas de inteligencia artificial, para finalmente determinar las más adecuadas y proceder hacer las respectivas validaciones hasta seleccionar la apropiada. Donde se encontró que existen una gran variedad de técnicas de inteligencia como Deep Learning (aprendizaje profundo), quien obtuvo un puntaje muy alto en la caracterización que se realizó por sus grandes posibilidades a la hora de integrar el algoritmo al ámbito de la ciberseguridad, se identificó que se encuentran muy poco caracterizados; sin embargo, en la investigación inicial que se hizo, se obtuvo como resultado el cómo trabajar con esta tecnología y cómo poderla adaptar a la ciberseguridad. 
Existen diferentes formas de analizar y dar seguridad a los datos en la red, una de esas son las técnicas de aprendizaje, en esta investigación se identificaron varias técnicas que con sus respectivos algoritmos proporcionaron las bases para la adaptabilidad con un framework relacionado con tecnologías IIoT.
Palabras clave: Ciberseguridad, Internet industrial de las cosas-IIoT, Inteligencia artificial, algoritmos de ML, Modelos de seguridad.
Artículo
An adaptable Intelligence Algorithm to a Cybersecurity Framework for IIOT
Un algoritmo de inteligencia adaptable a un marco de ciberseguridad para IIOT
Received: 22 November 2021
Accepted: 15 January 2022
It is well known that network traffic is an active subject of study, because threats to information systems and data networks are becoming more and more frequent, especially those exploited through protocols and ports such as HTTP or HTTPS (1). Administrators and engineers in organizations are therefore looking for efficient and secure solutions that guarantee the integrity of the data circulating in their networks (1).
Current cybersecurity frameworks are characterized by being static: they are not capable of making decisions or learning from an incident. This work therefore proposes an intelligence technique adaptable to a cybersecurity framework for IIoT that is able to prevent and react to any threat without needing to have it registered in its signature bank or intrusion detector rules, since threats vary daily and leave systems outdated and vulnerable (2). To this end, learning techniques were studied in conjunction with IIoT technologies, in order to integrate them into a system capable of learning intelligently through the unsupervised algorithms included in the selected technique and thus demonstrate optimal results.
The remainder of this article is organized in the following sections: the methodology section addresses some conceptual aspects, the background, and the methods used to solve the problem raised; the results and discussion section analyzes what was obtained; and the conclusions present the assessments and contributions of the work carried out.
Within the development of this work, certain concepts and references that are important were taken into account, and they are listed below:
- Cybersecurity Framework: A cybersecurity framework is a set of policies and procedures predefined by leading cybersecurity organizations (3). Cybersecurity frameworks are built and documented to improve the cybersecurity strategies of an organization or company.
- IDS: An Intrusion Detection System (IDS) is a device or application that monitors a network or systems for malicious activity or policy violations (4). The intrusion detection system constantly analyzes traffic and can identify anomalies or security violations based on patterns and heuristics. As soon as it detects an attack, it notifies an administrator; it also collects information on each anomaly or attack detected.
- IIoT: With the arrival of the Internet of Things, industry realized that this technology could be used in its operations, which gave rise to the IIoT (Industrial Internet of Things). It refers to the tight integration of computing, networks, and physical objects for industry, in which embedded devices are networked to detect, monitor, and control the physical world to promote business and manufacturing progress (5). IIoT is changing the world of industry in terms of the automation of manufacturing processes, since devices or machines can connect and transfer data among themselves.
Among the most significant references for carrying out the project, the following can be highlighted:
The first work (6) combines supervision management and artificial intelligence based on ML models, divided into the study of network patterns and anomaly-intrusion detection in IoT systems. It focuses especially on DoS (Denial of Service) attacks, using a data mining approach that is hugely popular for detecting attacks with high performance and low cost. The IoT-based anomaly-intrusion detection system was evaluated in an intelligent environment using SVM (Support Vector Machines).
The next article (7) describes the study "Intrusion Detection System using Artificial Intelligence", which incorporates an intelligent factor based on neural networks oriented to detecting the specific problem of port scanning, using the open source IDS (Intrusion Detection System) Snort. Another article (8) covers the design of a novel intrusion detection system and the use and evaluation of its study model. The new system consists of a collection module, a data management module, a study module, and a response module. The study module was implemented with a neural network for intrusion detection.
Each of the aforementioned works contributed to the selection and study of the appropriate intelligence and learning technique capable of analyzing network traffic.
Regarding the development of the activities, the following results were obtained, which are shown below:
Activity 1. Carry out a review of the deep learning techniques used in the articles found and classified as primary.
This process was carried out through a systematic literature review based on (9). Initially, 201 studies were identified in the bibliographic database engines; these are referred to from here on as found. Based on the inclusion and exclusion criteria that were determined, redundant studies were first eliminated, leaving 191 studies considered not repeated. The inclusion criteria were then applied through an analysis of the title, abstract and keywords of each article, checking that it integrates blockchain technologies, intrusion detection, frameworks or deep learning techniques focused on the field of the IoT (Internet of Things); that is, that the article mentions the use of a model, process, framework or methodology for the security of IoT devices through deep learning techniques or methods for the analysis of malicious traffic. This yielded 134 studies considered relevant. These 134 studies were read in full and the exclusion criteria were applied, which exclude an article when the study presents a process or methodology for the security of IoT devices through blockchain techniques, or contains deep learning methods for the identification of malicious traffic, but does not present enough information on its use or application. This produced 67 primary studies, as can be seen in Table 1.
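The screening arithmetic can be checked with a short sketch. The four counts are the ones reported in the text; the function name `funnel` and the row layout are illustrative choices, not part of the original review protocol.

```python
# Study counts at each screening stage, as reported in the review.
stages = [
    ("found", 201),
    ("not repeated", 191),
    ("relevant", 134),
    ("primary", 67),
]

def funnel(stages):
    """Return (stage, kept, dropped_from_previous_stage) for each stage."""
    rows = []
    previous = None
    for name, kept in stages:
        dropped = 0 if previous is None else previous - kept
        rows.append((name, kept, dropped))
        previous = kept
    return rows

rows = funnel(stages)
for name, kept, dropped in rows:
    print(f"{name:<13} kept={kept:>3} dropped={dropped:>3}")
```

Read this way, 10 studies were removed as repeated, 57 were set aside after the title, abstract and keyword screening, and 67 were excluded after full reading.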
Of the 67 articles classified as primary, 20 provided the basis to characterize and select the learning technique. They are listed in Table 2, which identifies the contribution for which each article was taken as a reference in this work.
From the learning techniques identified in the previous table, the most appropriate and most widely used for traffic analysis were selected. These are the ones that can provide a supervised or unsupervised component, which the capture, processing and transformation of data requires, and that fit the architecture of the proposed framework, which allows the use of these types of algorithms. The learning techniques found for a model of detection, treatment and execution of results, to be used in components identified in currently known tools, are listed next (29).
BIDIRECTIONAL (BiLSTM): The bidirectional long short-term memory algorithm is an extended version of RNNs (recurrent neural networks). Standard RNNs are unable to learn contextual information over long spans, mainly because of the vanishing gradient problem. LSTM introduces gated units that overcome this gradient problem and therefore allow information to be preserved for longer periods and kept for analysis. Bidirectional RNNs process the input sequence forward and backward using two different hidden layers (30).
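To make the two-direction idea concrete, the following is a minimal forward-pass sketch of a bidirectional LSTM in plain Python. It is not the implementation evaluated in this work: the weights are random and untrained, the layer sizes and the input sequence are placeholder assumptions, and all function names are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_params(n_in, n_hidden, rng):
    """Random (untrained) weights for the input, forget, output and candidate gates."""
    size = n_in + n_hidden
    return {
        gate: {
            "W": [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(n_hidden)],
            "b": [0.0] * n_hidden,
        }
        for gate in ("i", "f", "o", "g")
    }

def affine(layer, z):
    return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(layer["W"], layer["b"])]

def lstm_step(params, x, h, c):
    """One LSTM time step: the gates decide what to write, keep and expose."""
    z = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(v) for v in affine(params["i"], z)]
    f = [sigmoid(v) for v in affine(params["f"], z)]
    o = [sigmoid(v) for v in affine(params["o"], z)]
    g = [math.tanh(v) for v in affine(params["g"], z)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, sequence, n_hidden):
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    outputs = []
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def bilstm(fwd_params, bwd_params, sequence, n_hidden):
    """Concatenate a forward pass and a backward pass at every time step."""
    forward = run_lstm(fwd_params, sequence, n_hidden)
    backward = run_lstm(bwd_params, sequence[::-1], n_hidden)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
n_in, n_hidden = 3, 4
fwd = init_params(n_in, n_hidden, rng)
bwd = init_params(n_in, n_hidden, rng)
# Hypothetical sequence of 5 traffic feature vectors (placeholder values).
sequence = [[rng.random() for _ in range(n_in)] for _ in range(5)]
features = bilstm(fwd, bwd, sequence, n_hidden)
print(len(features), len(features[0]))  # 5 time steps, 8 features per step
```

Each time step ends up with a feature vector that summarizes both what came before it and what comes after it, which is the property that distinguishes the bidirectional variant from a plain LSTM.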
SWARM (PSO): Particle Swarm Optimization (PSO) is a population-based EC (evolutionary computation) algorithm that can be used to solve optimization problems without domain knowledge. The population is made up of a number of particles, each of which represents a candidate solution. The best solution is found by updating the velocity and position vector of each particle according to the update equations. One research paper, GetNet, proposed an encoding strategy that uses a fixed-length binary string to represent CNN architectures (31).
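The velocity and position updates can be illustrated with a minimal sketch of the standard global-best PSO formulation. The inertia and acceleration coefficients, the search bounds and the sphere test function are assumptions chosen for illustration; they are not taken from the works reviewed here.

```python
import random

def pso(objective, dim, n_particles=20, iterations=100,
        inertia=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # best position seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: best_val[k])
    g_pos, g_val = best_pos[g][:], best_val[g]   # best position seen by the swarm

    for _ in range(iterations):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward own best + pull toward swarm best.
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (best_pos[k][d] - pos[k][d])
                             + c2 * r2 * (g_pos[d] - pos[k][d]))
                # Position: move by the velocity and clamp to the search bounds.
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < best_val[k]:
                best_pos[k], best_val[k] = pos[k][:], val
                if val < g_val:
                    g_pos, g_val = pos[k][:], val
    return g_pos, g_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_f = pso(sphere, dim=2)
print(best_x, best_f)
```

No gradient or domain knowledge is used: the swarm only needs to be able to evaluate the objective at candidate positions.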
DEEP LEARNING OF MLP: An MLP (multilayer perceptron) can be interpreted as an extension of the regression algorithm in which the input is first transformed by a non-linear transformation, with the purpose of projecting the input data into a space where it is linearly separable. It uses many layers, in line with the philosophy of deep learning; the most frequently mentioned is the hidden or intermediate layer, which makes the network a universal approximator (32).
RESTRICTED BOLTZMANN MACHINES (RBM): An RBM is a neural network used in deep learning that learns a probability distribution over its set of inputs. RBMs have found applications in dimensionality reduction, classification, collaborative filtering, feature learning, and mechanical modeling, among others. RBMs are a variant of Boltzmann machines with the restriction that their neurons form a bipartite graph, whose two groups of nodes are commonly called visible and hidden. These restrictions are relevant to the algorithms used in the subject of interest here, deep learning (33).
MULTILAYER PERCEPTRON (MLP): It is a neural network formed by multiple layers, which makes it a good alternative for solving problems that are not linearly separable, the main limitation of the single-layer network known as the simple perceptron. An MLP can be fully or locally connected. In the first case, each output of a neuron in layer "i" is an input of all the neurons of layer "i + 1", while in the second, each neuron of layer "i" is an input of a series of neurons (a region) of layer "i + 1" (34).
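A small worked example shows why a hidden layer matters for problems that are not linearly separable. XOR cannot be computed by a single perceptron, but a fully connected network with one hidden layer handles it. The weights below are hand-picked for illustration rather than learned, and the step activation is an assumption made to keep the arithmetic easy to follow.

```python
def step(x):
    return 1 if x > 0 else 0

def perceptron(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_mlp(a, b):
    # Hidden layer, fully connected: every hidden neuron receives both inputs.
    h_or = perceptron([1, 1], -0.5, [a, b])    # fires if a OR b
    h_and = perceptron([1, 1], -1.5, [a, b])   # fires if a AND b
    # Output neuron: fires for "OR but not AND", which is XOR.
    return perceptron([1, -1], -0.5, [h_or, h_and])

table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
print(table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

The hidden layer re-expresses the two inputs as the pair (OR, AND), and in that new space the two classes can be separated by a single line.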
From the techniques listed above, a pre-selection was made based on a characterization of each technique, in order to select the one that behaves best with cybersecurity frameworks. The criterion was how the algorithm works on the data collected by the IDS, which provides the data after analyzing the network traffic, so that rules can be generated intelligently and as accurately as possible (35). For this, it was necessary to follow the guidelines of the tools to be used, such as Scikit-learn, a well-known machine learning library that offers a large number of algorithms and associated processes and is easy to use and execute once in operation. K-means clustering is the algorithm that will facilitate finding the anomalous information in the dataset. To achieve this, 4 initial values, called "k" values, must be instantiated; the dataset columns used for the "k" value are destination IP, port, time and type of alert. Columns of the dataset can be selected randomly, but it is advisable to adapt the algorithm to the project in question, as shown in Table 3.
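The clustering step can be sketched as follows. This is a plain-Python stand-in for Scikit-learn's `KMeans`, written so that it runs without extra dependencies. Several points are assumptions: reading the 4 "k" values as k = 4 clusters, the numeric encoding of the four columns, the min-max scaling, and the alert rows themselves, which are hypothetical placeholders rather than data from this study.

```python
import random

# Hypothetical IDS alerts (placeholder data):
# (destination IP as last octet, port, hour of day, alert type code)
alerts = [
    (10, 80, 9, 1), (10, 80, 10, 1), (11, 80, 9, 1),
    (20, 443, 14, 2), (20, 443, 15, 2), (21, 443, 14, 2),
    (30, 22, 2, 3), (30, 22, 3, 3), (31, 22, 2, 3),
    (40, 1883, 20, 4), (40, 1883, 21, 4), (41, 1883, 20, 4),
    (200, 6667, 4, 3),
]

def min_max_scale(rows):
    """Scale every column to [0, 1] so that no single column dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = [p[:] for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

points = min_max_scale(alerts)
centroids, labels = kmeans(points, k=4)
# Anomaly score: distance from each alert to the centroid of its own cluster.
scores = [dist2(p, centroids[lab]) ** 0.5 for p, lab in zip(points, labels)]
ranked = sorted(range(len(alerts)), key=lambda i: scores[i], reverse=True)
print("most distant alert:", alerts[ranked[0]], "cluster", labels[ranked[0]])
```

Alerts that sit far from the centroid of their cluster are candidates for closer inspection. How large a distance counts as anomalous is a threshold that would have to be tuned for the project in question.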
Table 3 shows the 4 characteristics that must be taken into account when selecting the intelligent technique for rule generation through the intrusion detector (36). Each characteristic was rated on the following scale: 5 for most important, 3 for moderately important and 1 for not important. Read this way, the comparative table is the starting point for capturing the characteristics required by the algorithm for automatic rule creation (37).
Next, the deep learning algorithms that could meet the requirements identified in Table 3 and the framework architecture were reviewed. The candidates were drawn from the learning techniques already defined from the primary articles, which show that the deep learning technique is the most appropriate for traffic analysis.
Once all the algorithms obtained from the systematic review had been studied, the functioning of each one was compared to verify which is the most suitable. The comparison took into account the basic operating criteria of deep learning (38) together with the essential characteristics used and determined in the primary articles (see Table 4). As a result, the bidirectional long short-term memory (BiLSTM) and Swarm (PSO) techniques obtained a result of 26 points, placing them as the most appropriate to be implemented in the proposed framework because of their significant fulfillment of the required characteristics.
Table 4 shows that the algorithms were evaluated according to 5 characteristics, which made it possible to determine the most suitable one for the project and justify its selection. Prediction is the way the algorithm behaves, based on how it groups the information, to generate predictions that yield detailed information. Precision evaluates the way in which the technique generates its results; not all algorithms include a learning preprocessing stage, and some only produce tables and candidate information, which this criterion helps to distinguish. Response evaluates the speed and precision with which the algorithm responds, although its main aspect is the information that it can capture and make available for use. Resources determines the dependency on hardware and its relationship with the execution time of the technique.
This classification made it possible to identify the software and hardware that the deep learning technique and its algorithm need to function optimally in the proposed system. Table 5 shows the 2 selected algorithms in detail, together with the software and hardware suggested for executing and processing each algorithm.
After both algorithms were tested with an IIoT traffic dataset, the deep learning technique with its bidirectional long short-term memory (BiLSTM) algorithm proved the most suitable because of its effectiveness and efficiency in identifying anomalies. The precision of its response, its performance and its low resource cost were the decisive aspects in selecting it as the most suitable deep learning algorithm for this project (39).
We live in an environment in which information must be kept secure at both the business and the personal level. This need grows every day: with the increase in connectivity and the large flow of information come greater risks from computer attacks and fraud attempts in administrative networks, so several aspects of cybersecurity must be taken into account, one of which is the implementation of controls (40). A latent vulnerability lies in the transport of data collected by IoT devices: 98% of the traffic generated is not encrypted, which exposes personal and confidential data on the network. Attackers who have overcome the first line of defense (most often through phishing attacks) and established command and control can eavesdrop on unencrypted network traffic, collect personal or confidential information, and then exploit that data for profit on sites such as the Dark Web (2020 Unit 42 IoT Threat Report). In this context, deep learning is the artificial intelligence technique that emulates human neural networks to provide feedback and automate predictive analysis, while a cybersecurity framework allows problem solving to be automated. In other words, deep learning is used for problems where traditional learning methods do not achieve appropriate performance; it makes it possible not only to solve the network security problem but also to integrate all the elements necessary for its operation in general (41).
The main difficulty during the selection of the intelligence technique and its algorithm arose when generating the characterizations, since it was hard to determine which characteristics were important for traffic detection. Documentation on the integration of techniques and devices is scarce because the technology is still emerging, but the scientific community helped considerably in making these comparisons, as did the solutions offered by different organizations, which made it easier to connect with the ideas raised in the articles consulted.
Thanks to the University of Cauca, especially to its GTI research group and to the Computer Science I+D research group of the Colegio Mayor of Cauca University Institution, for the support provided for the development of the project.