Abstract: The early detection of plant diseases through artificial intelligence techniques has been a very important technological advance for agriculture: machine learning and optimization algorithms have made it possible to increase the yield of various crops in several countries around the world. Many researchers have focused their efforts on developing models that support the task of detecting plant diseases as an alternative to the traditional techniques used by farmers. This systematic literature review analyzes the most relevant articles in which image processing techniques and machine learning were used to detect diseases from images of the leaves of different crops. It also analyzes the interpretability and precision of these methods, considering each phase of the models: image processing, segmentation, feature extraction, and learning. The review reveals a gap in interpretability, since authors have focused mainly on obtaining good results from their models rather than on providing the user with a clear explanation of the models' characteristics.
Keywords: classification, early detection of diseases, image processing, interpretability, machine learning.
Articles
Interpretability in the Field of Plant Disease Detection: A Review
Received: 16 October 2021
Accepted: 25 November 2021
Agriculture is one of the most important sectors of a country's domestic economy: it produces food and various other products, and it depends primarily on small farmers, who account for 80% of total production. Unfortunately, at least 50% of this production is lost to crop pests and diseases [1]. To address these issues, automated systems capable of identifying the type of disease and providing timely support for appropriate action have been developed for the benefit of agriculture [2].
In recent years, various fast and effective methods have been proposed to detect plant diseases and thus avoid losses in the agricultural industry [3]. This line of work has a long history: by the end of the 1970s, computer image processing technology was already being applied to agricultural studies, and researchers demonstrated that these techniques could be used as a disease detection system [4]. Later, between the 1980s and 1990s, remote sensing and technical diagnosis applications based on image processing were developed to create detection algorithms with machine learning [5]. This shows a growing interest in applying machine learning algorithms to identify plant diseases. Several machine learning techniques have been used and discussed; nevertheless, there does not seem to be a universal technique suitable for all situations [6].
The world population is expected to reach 9 billion by 2050; therefore, developing methods to detect and mitigate plant diseases serves two purposes that favor food security: increasing crop yield and reducing pesticide use [7]. While the accuracy of traditional detection methods such as polymerase chain reaction (PCR) is not disputed, a system based on artificial intelligence can offer an early detection method to be used alongside traditional laboratory tests, expediting responses and mitigating crop losses [8]. Furthermore, research on plant disease detection benefits farmers, since many of them worldwide lack access to technical advice in rural areas, which makes their crops especially vulnerable to yield losses and other problems caused by plant diseases [9].
On the other hand, image classification for disease detection from crop leaves is of great importance for both theoretical research and practical applications [10]; it is therefore the focus of this study. Many researchers have studied crop disease detection based on pattern recognition and found that farmers' knowledge can be supported by data engineering and by tools or techniques associated with several agricultural sectors to improve productivity [11].
This paper aims to provide an overview of the different techniques used for plant disease detection from an accuracy and interpretability perspective. It also summarizes the factors and processes commonly used when researching this topic. First, a literature review protocol is outlined to obtain a set of articles highly relevant to the main topic of this study. Then, a series of core questions is proposed to guide the study. Next, the literature is searched in different databases using specific selection criteria. Finally, the scope and impact of the results obtained are discussed. The rest of the document provides a detailed analysis of the different factors related to plant disease detection from an interpretability perspective and answers the initial research questions.
To determine the current state of the machine learning techniques used for detecting and classifying plant diseases, a series of steps must be defined. Following the guide for literature reviews given in Massaro's structured literature review (SLR) [12], the following steps are proposed:
- Literature review protocol
- Defining core questions to be answered by the literature review
- Searching for literature related to the research subject
- Defining the article's impact
- Establishing the analytical framework
These steps are illustrated in Figure 1, which details the process used to obtain an adequately filtered set of literature. The analytical framework is then established to obtain results that show the state of research in the area and answer the research questions.
This bibliographic review focuses on the articles most relevant to the main subjects of image analysis, plant diseases, detection, and machine learning. Articles were weighted according to the quartile of their journal, their citation index, the journal's h-index, their year of publication, and their relevance according to each database's search equation, which yielded a final set of 51 articles. These parameters make it possible to reduce the number of papers to analyze and thus focus the review on the most relevant articles in the field.
The following questions aim at guiding the literature review:
- Which machine learning methods are currently the most widely used for plant disease detection using image analysis techniques?
- What are the trends in machine learning methods used for plant disease detection using image analysis techniques?
- What is the role of interpretability in the machine learning methods developed by the authors?
Once the literature review protocol and the research questions are defined, a search equation is created to retrieve as many articles related to the research subject as possible, as shown in Table 1. A total of 226 articles were obtained from the academic databases Scopus, Web of Science, ScienceDirect, and Google Scholar.
1) Initial Cleansing. To remove duplicate records, the data were cleansed based on each article's title and authors. This removed 20 duplicates, leaving 206 articles.
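The cleansing step can be sketched as follows, assuming each exported record is a dictionary with hypothetical `title` and `authors` fields. Matching on normalized strings catches duplicates that differ only in case, accents, or punctuation across databases; author formats that differ more substantially (for example, full names versus initials) would need additional rules.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lower-case, drop accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each (normalized title, normalized authors) key."""
    seen: set[tuple[str, str]] = set()
    unique: list[dict] = []
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec["authors"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records exported from different databases
records = [
    {"title": "Plant Disease Detection.", "authors": "Pérez, A.", "source": "Scopus"},
    {"title": "plant disease detection", "authors": "Perez A", "source": "Web of Science"},
    {"title": "Leaf Segmentation", "authors": "Li, B.", "source": "Google Scholar"},
]
print(len(deduplicate(records)))  # 2
```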
2) Defining the Article's Impact. To select the most relevant and significant literature for this research, several factors of each article are weighted: year of publication, citation index, journal quartile, journal h-index, and the relevance score given by the database from which the article was extracted. Table 2 lists the weight assigned to each factor in this search.
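A weighting of this kind can be sketched as a weighted sum of min-max normalized factors. The weights below are hypothetical placeholders (the values in Table 2 are not reproduced), and the database relevance score is assumed to be already scaled to [0, 1]; with `top_k=51` the function would return a set of the size used in this review.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    year: int
    citations: int
    quartile: int        # journal quartile: 1 (Q1) to 4 (Q4)
    h_index: int         # journal h-index
    db_relevance: float  # database relevance score, assumed already scaled to [0, 1]


# Hypothetical weights; they do NOT reproduce the values in Table 2.
WEIGHTS = {"year": 0.15, "citations": 0.30, "quartile": 0.25, "h_index": 0.15, "db_relevance": 0.15}


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def rank_articles(articles: list[Article], top_k: int) -> list[Article]:
    """Score each article as a weighted sum of normalized factors and keep the top_k."""
    columns = {
        "year": min_max([a.year for a in articles]),
        "citations": min_max([a.citations for a in articles]),
        "quartile": min_max([5 - a.quartile for a in articles]),  # Q1 scores highest
        "h_index": min_max([a.h_index for a in articles]),
        "db_relevance": [a.db_relevance for a in articles],
    }
    scores = [
        sum(WEIGHTS[name] * columns[name][i] for name in WEIGHTS)
        for i in range(len(articles))
    ]
    order = sorted(range(len(articles)), key=lambda i: scores[i], reverse=True)
    return [articles[i] for i in order[:top_k]]
```

Min-max scaling keeps factors with very different ranges (citation counts versus quartiles) comparable before they are weighted.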
In this section, different aspects of the 51 articles selected in the bibliographic review are assessed.
This section examines disease detection systems that reduce the monitoring work required over large crop areas; such systems are most beneficial in early stages, when symptoms are detected as they first appear on the leaves [13]. Although several articles address detection and classification together, detection is clearly the priority for many researchers, since it is the main objective of the methods developed by computer systems, as the analysis of each stage implemented to detect diseases, described below, shows.
1) Image Acquisition. To detect diseases under any circumstances, authors acquired images in either controlled or uncontrolled environments to train and test a model. Table 3 summarizes the image sources used in the reviewed articles, together with a numerical range indicating how many images were obtained from each source.
These sources are usually open-access databases, which, combined with advances in computer vision, make it possible to decipher specific symptoms of plant diseases [2]. Most of the images acquired by the researchers focus on crops that are more susceptible to diseases that mainly affect their leaves.
2) Preprocessing. In this step, researchers use several techniques that highlight the affected regions by adjusting elements such as temperature, noise, and contrast, making the image clearer and easier to interpret than the original. Figure 2 shows the aspects considered to make the images suitable as input to the detection model and thus improve its accuracy in determining whether a plant is diseased.
Forty of the 51 papers analyzed improved image quality based on visual concepts. The initial images are usually in the RGB color model, and the target color model is chosen according to which one the author expects to yield better accuracy. Table 4 presents the preprocessing techniques reviewed, grouped by the visual concept they address.
One preprocessing example is presented in [26], where the YCbCr, HSI, and CIELAB color models were used to detect disease spots successfully without being affected by noise. Similarly, [16] and [22] use techniques that transform color to the CIELAB, HSV, and HSI models to better perceive and interpret the images, helping the model extract explicit information more efficiently by removing lighting and contrast issues.
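As a minimal sketch of why such color transformations help, the function below converts an RGB image to HSV using the standard formulas, written in plain NumPy so it runs without OpenCV (in practice, libraries such as OpenCV provide this via `cv2.cvtColor`). It is an illustration, not the pipeline of [16], [22], or [26]. Hue depends only on the ratios between channels, so a darker and a brighter version of the same leaf color share the same hue; this is one way a color transformation reduces lighting issues.

```python
import numpy as np


def rgb_to_hsv(img: np.ndarray) -> np.ndarray:
    """Convert an RGB uint8 image (H x W x 3) to HSV.

    Hue is returned in degrees [0, 360); saturation and value are in [0, 1].
    """
    rgb = img.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cmax = rgb.max(axis=-1)
    delta = cmax - rgb.min(axis=-1)
    hue = np.zeros_like(cmax)
    chromatic = delta > 0  # gray pixels keep hue 0
    # The hue formula depends on which channel holds the maximum
    rmax = chromatic & (cmax == r)
    gmax = chromatic & (cmax == g) & ~rmax
    bmax = chromatic & ~rmax & ~gmax
    hue[rmax] = (60.0 * (g - b)[rmax] / delta[rmax]) % 360.0
    hue[gmax] = 60.0 * (b - r)[gmax] / delta[gmax] + 120.0
    hue[bmax] = 60.0 * (r - g)[bmax] / delta[bmax] + 240.0
    sat = np.where(cmax > 0, delta / np.where(cmax > 0, cmax, 1.0), 0.0)
    return np.stack([hue, sat, cmax], axis=-1)


# A bright and a dark red pixel: different value, identical hue
pixels = np.array([[[255, 0, 0], [128, 0, 0]]], dtype=np.uint8)
hsv = rgb_to_hsv(pixels)
```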
3) Segmentation. Image segmentation is one of the most critical stages of the pattern recognition process because it affects the classification results [24]. In this stage, the image is divided into regions of particular relevance for training the model, using computer vision techniques that assign a specific meaning to each image fragment; that is, a category is assigned to each pixel as the model analyzes the image.
Forty-three of the 51 papers reviewed specified the technique used to segment the plant images and automatically detect lesions in them, which resulted in data labeling or in the identification of patterns or similarities, as appropriate. Figure 3 shows the techniques most commonly used by researchers in this stage of data preparation, considering the relevance of each one to the machine learning development within each computerized system. There are many ways to perform image segmentation, ranging from simple thresholding to advanced color image segmentation methods.
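The simple end of that range can be sketched with Otsu's method, one well-known automatic thresholding technique; it is shown here as an illustration, not as the method of any particular reviewed paper. It picks the threshold t that maximizes the between-class variance of the grey-level histogram. The sketch assumes a single uint8 channel in which the region of interest (for example, lesions) is brighter than the rest.

```python
import numpy as np


def otsu_threshold(channel: np.ndarray) -> int:
    """Return the threshold t in [0, 255] that maximizes between-class variance."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0      # empty classes contribute nothing
    return int(np.argmax(sigma_b))


def threshold_segment(channel: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels above the Otsu threshold (assumed region of interest)."""
    return channel > otsu_threshold(channel)
```

Advanced color segmentation methods replace this single global threshold with per-pixel decisions over several channels or with learned architectures such as U-Net.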
4) Feature Extraction. The main problem addressed in this stage is finding an accurate representation of the image of a diseased plant by means of feature extractors [24]. This stage is one of the most complex, since salient features such as texture, shape, size, and color must be recognized in all the images used to train a model. Furthermore, obtaining good validation and test results on different images depends on it. Therefore, feature extraction plays an essential role in computer vision and pattern recognition for representing an image [22].
To reduce the data to the features that are actually needed, the researchers consulted have adopted or designed diverse architectures based on the characteristics of plant lesions, seeking the best performance, as shown in Table 5.
Feature extraction techniques have been a significant development in areas such as agriculture, since they reduce information redundancy and execution time, making crop disease detection faster and more efficient. This review also found that researchers implement these techniques on specific programming platforms whose features support machine learning, most notably MATLAB, Python, OpenCV, C++, and TensorFlow.
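To make the idea concrete, here is a small hypothetical extractor that turns a segmented leaf image into a fixed-length vector covering the color, size, and texture features mentioned above. It computes color moments inside a lesion mask, the lesion area ratio, and the contrast of a grey-level co-occurrence matrix (GLCM) for horizontal neighbors. The feature choice and the 8-level quantization are assumptions for illustration, not the architectures listed in Table 5.

```python
import numpy as np


def color_moments(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mean and standard deviation of each RGB channel inside the mask (mask must be non-empty)."""
    pixels = img[mask].astype(np.float64)  # N x 3
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])


def glcm_contrast(gray: np.ndarray, levels: int = 8) -> float:
    """Contrast of the GLCM for horizontal neighbor pairs (offset (0, 1))."""
    q = (gray.astype(np.int64) * levels) // 256  # quantize 0-255 into `levels` bins
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)          # count co-occurring level pairs
    glcm /= glcm.sum()
    i, j = np.indices((levels, levels))
    return float(np.sum(glcm * (i - j) ** 2))


def extract_features(img: np.ndarray, lesion_mask: np.ndarray) -> np.ndarray:
    """8-value vector: 3 channel means, 3 channel stds, lesion area ratio, GLCM contrast."""
    gray = img.mean(axis=-1).astype(np.uint8)
    area_ratio = float(lesion_mask.mean())
    return np.concatenate([color_moments(img, lesion_mask), [area_ratio, glcm_contrast(gray)]])
```

A compact vector like this is what the next stage consumes. Reducing each image to a handful of numbers is also where the redundancy and execution-time savings mentioned above come from.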
5) Learning Technique. The final stage is the learning task itself. Its main objective is to identify patterns so that the input data can be classified into groups, with the algorithm repeatedly refining the process as it learns from the data.
Machine learning techniques are responsible for this task, as they have proven to be efficient classifiers in diverse applications. Table 6 summarizes the main techniques used by the researchers consulted in this review for data classification and disease detection, each trained for specific plants and diseases. It is also worth noting that some models, such as the one proposed in [1], combine two or more learning techniques to optimize the results, which influences both the accuracy achieved and the learning task carried out by these methods.
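As an illustration of a learning technique of the kind summarized in Table 6, here is a minimal k-nearest neighbors (KNN) classifier over feature vectors. KNN is one of the most used methods in the reviewed articles. The two features and the training data are hypothetical (for example, lesion area ratio and fraction of green pixels), with label 0 for healthy and 1 for diseased.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k: int = 3) -> np.ndarray:
    """Label each query vector by majority vote among its k nearest training vectors (Euclidean)."""
    train_x = np.asarray(train_x, dtype=np.float64)
    train_y = np.asarray(train_y)
    preds = []
    for q in np.atleast_2d(np.asarray(query_x, dtype=np.float64)):
        dist = np.linalg.norm(train_x - q, axis=1)
        nearest = train_y[np.argsort(dist)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)


# Hypothetical features: [lesion area ratio, green pixel fraction]
train_x = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],   # healthy
           [0.80, 0.20], [0.90, 0.10], [0.85, 0.30]]   # diseased
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, [[0.12, 0.88], [0.88, 0.20]]))  # [0 1]
```

Because KNN stores the training set and compares new samples to it directly, its decisions can be traced back to concrete neighboring examples, which is a modest form of interpretability.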
In general, early disease detection models based on machine learning solve the complex problem of recognizing patterns in images of infected or healthy leaves, using different techniques or models as required. Figure 4 presents the machine learning approaches used in the 51 articles reviewed. Some models were trained and tested with more than one approach to compare their behavior and optimize the results; these models were nonetheless included, to capture the authors' tendencies toward each approach when training their plant disease detection algorithms.
This section groups the optimization techniques used in the articles reviewed. The main objective of implementing these techniques is to minimize the loss and provide the most accurate results, which is why they were considered in this review. Beyond understanding how the proposed models work, it is worth noting that models are commonly combined with other learning functions that help adjust certain values, or are built on a specific algorithm to improve their accuracy. In this case, 33 of the models reviewed were optimized with the techniques described in Table 7, whereas the remaining papers did not optimize their models because they achieved acceptable performance without these techniques during the learning task defined by each author.
In machine learning, the overall performance of a proposed model must be assessed to know its pattern recognition ability and thus determine whether it effectively fulfills the assigned task. Although a variety of performance measures exist for this purpose, this review considered three specific measures, which have proven to be the most relevant and accurate for this task in terms of their sensitivity to outliers, their execution time, and their handling of true and false positives.
Figure 5 shows the percentage of articles that use the confusion (error) matrix, the F1 score, and the Kappa coefficient, which were the most prominent metrics and were used across the reviewed studies to verify the performance of each method.
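For reference, here is a plain-Python sketch of the three metrics. It assumes the Kappa coefficient is Cohen's kappa, which compares the observed agreement p_o between true and predicted labels with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The labels in the example are made up (1 = diseased, 0 = healthy).

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, both in `labels` order."""
    idx = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance; undefined (nan) when chance agreement is 1."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return float("nan") if p_e == 1 else (p_o - p_e) / (1 - p_e)


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(confusion_matrix(y_true, y_pred, labels=[0, 1]))  # [[3, 1], [1, 3]]
print(f1_score(y_true, y_pred, positive=1))             # 0.75
print(cohen_kappa(y_true, y_pred))                      # 0.5
```

Here accuracy is also 0.75, but kappa is only 0.5 because half of that agreement would be expected by chance with balanced classes.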
Some researchers assess their models with several evaluation metrics to verify and monitor the data, since their objective is to improve the overall prediction of each model before working with new data.
Finally, the literature review found that only one author [35] addressed interpretability in their proposed model. That author developed a hybrid system combining techniques known for their high accuracy, such as support vector machines and neural networks, with Mamdani-type fuzzy systems, which are known for their interpretability. The resulting hybrid model lets the end user understand part of the model's composition and analyze its inferences without sacrificing accuracy.
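The interpretability of Mamdani-type fuzzy systems comes from their IF-THEN rules, which a user can read directly. The following sketch is hypothetical: it is not the system from [35], and the input variable (lesion area ratio), output (severity on a 0 to 10 scale), rules, and triangular membership functions are invented for illustration. It follows the standard Mamdani steps: min implication, max aggregation, and centroid defuzzification.

```python
import numpy as np


def tri(x, a: float, b: float, c: float):
    """Triangular membership function with feet a, c and peak b (requires a < b < c)."""
    x = np.asarray(x, dtype=np.float64)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def mamdani_severity(lesion_ratio: float) -> float:
    """Map a lesion-area ratio in [0, 1] to a severity score in [0, 10] with three rules."""
    lesion_ratio = float(np.clip(lesion_ratio, 0.0, 1.0))
    y = np.linspace(0.0, 10.0, 1001)  # discretized output universe
    # Each rule: (firing strength of the antecedent, consequent fuzzy set over y)
    rules = [
        (tri(lesion_ratio, -0.5, 0.0, 0.5), tri(y, -5.0, 0.0, 5.0)),  # IF lesion LOW THEN severity MILD
        (tri(lesion_ratio, 0.0, 0.5, 1.0), tri(y, 0.0, 5.0, 10.0)),   # IF lesion MEDIUM THEN severity MODERATE
        (tri(lesion_ratio, 0.5, 1.0, 1.5), tri(y, 5.0, 10.0, 15.0)),  # IF lesion HIGH THEN severity SEVERE
    ]
    aggregated = np.zeros_like(y)
    for strength, consequent in rules:
        # Mamdani implication (min), then aggregation across rules (max)
        aggregated = np.maximum(aggregated, np.minimum(strength, consequent))
    # Centroid defuzzification: at least one rule fires for any clipped input
    return float(np.sum(y * aggregated) / np.sum(aggregated))
```

Unlike the weights of a neural network, every step here can be inspected: a user can see which rule fired, how strongly, and how that shaped the final score.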
All the articles reviewed for this research describe models designed for the early, non-destructive detection of plant diseases from images. These images were acquired from different crops and processed in various ways. However, there is a gap in interpretability: the balance between each model's accuracy and its interpretability has received little attention. Only 1 of the 51 articles is highly interpretable, as it explains its functionality in detail and makes it easier to understand how the proposed model makes decisions. This gap leaves users unaware of what affects the results or which features each model recognizes.
This review also revealed the gradual and significant progress of image processing techniques applied to images of different crops. These images are usually obtained with digital cameras in controlled environments, from web databases such as PlantVillage, or even as hyperspectral images captured by UAVs. They are then processed according to visual concepts, using color transformation models such as CIELAB and enhancement techniques such as enhanced RGB.
In developing machine learning models, image segmentation is vital for distinguishing diseased regions of plants. The techniques most commonly used by researchers for this purpose are based on pre-established architectures and threshold methods, which separate the relevant objects of an image from the rest when the image is clearly visible. After segmentation, the algorithm must reduce the data to the features that are actually needed; for this task, hybrid algorithms and deep learning architectures such as ResNet and U-Net are the most prominent.
In the feature classification stage, where the objectives are accuracy and efficiency, the first research question can be answered: Which machine learning methods are currently the most widely used for plant disease detection using image analysis techniques? Table 6 lists the methods from most to least used by the researchers. It shows that support vector machines, KNN, and convolutional neural networks stand out for their accuracy in detecting diseases and classifying plants as healthy or diseased.
Similarly, this review answered the question: What are the trends in machine learning methods used for plant disease detection using image analysis techniques? Figure 4 summarizes the machine learning approaches used for plant disease detection through image analysis techniques, covering the traditional approaches and an additional one, transfer learning. It shows a trend toward supervised learning.
Finally, the review answered the question: What is the role of interpretability in the machine learning methods developed by the authors? Most current models, and those in development, focus mainly on high accuracy and leave interpretability aside, possibly because, if interpretability is not addressed correctly, it can significantly reduce model accuracy.
Although some algorithms detect diseases well without additional functions, most researchers prefer to optimize their models with the techniques described in Table 7. Two stand out: gradient descent and stochastic gradient descent, both of which minimize the loss on the training data. Gradient descent computes the average loss over all the training data at each step, whereas stochastic gradient descent uses the loss of a single training sample per update.
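The difference between the two optimizers can be shown on a toy problem. The sketch below fits a hypothetical line y = w*x + b by minimizing mean squared error, so no plant disease model is involved. Full-batch gradient descent averages the gradient over all samples before each update, while stochastic gradient descent updates after every single sample. The learning rates and epoch counts are illustrative choices that happen to converge on this noiseless data.

```python
import numpy as np


def mse_grad(w: float, b: float, x: np.ndarray, y: np.ndarray):
    """Gradient of 0.5 * mean((w*x + b - y)**2) with respect to w and b."""
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)


def gradient_descent(x, y, lr=0.5, epochs=2000):
    """Each update uses the average loss gradient over ALL training samples."""
    w = b = 0.0
    for _ in range(epochs):
        gw, gb = mse_grad(w, b, x, y)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def sgd(x, y, lr=0.05, epochs=500, seed=0):
    """Each update uses the loss gradient of ONE training sample, in shuffled order."""
    rng = np.random.default_rng(seed)
    w = b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            gw, gb = mse_grad(w, b, x[i:i + 1], y[i:i + 1])
            w, b = w - lr * gw, b - lr * gb
    return w, b


x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 1.0  # hypothetical noiseless data; both optimizers should recover w=3, b=1
```

Stochastic gradient descent makes many cheap, noisy updates per pass over the data. This is why it scales to the large image datasets used in deep learning, where computing the full-batch gradient at every step is expensive.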
Based on the most relevant findings on the development of automated systems that detect plant diseases from images, we conclude that machine learning is appropriate assistance for farmers, since early disease detection allows them to make better decisions about their crops and improve their productivity. Additionally, this type of research significantly broadens the range of species studied, techniques developed, and diseases classified.
Similarly, we conclude that there is a wide range of alternatives for developing a detection model based on pattern recognition. There are various ways of capturing images, focusing on their relevant areas, and developing algorithms, supervised or not, that contribute to accuracy, in this case in agriculture, and thus continuously improve yield with each study.
This review also revealed a gap in interpretability: authors creating image-based disease detection models have focused mainly on obtaining good results, overlooking the possibility of giving the user a clear explanation of the features recognized or of the specific issues affecting the results. It is therefore important to consider interpretability when developing future models, since users need to know the structure of a model and be able to deduce from it how the results were obtained. This can help create more relevant models that improve agriculture.