Abstract: The use of Unmanned Aerial Vehicles (UAVs) equipped with spectral cameras has increased in recent years, especially in the agricultural sector, because it allows farmers and researchers to analyze the state of a crop, including its health, nutrients, growth, and epidemics, among other parameters. In Colombia, the coffee production sector faces several challenges, such as the need to increase the productivity, yield, and quality of coffee. This work estimated the health status of a Castilla variety crop located in San Joaquín, Tambo, Cauca to support the decision-making of coffee growers. For this, chlorophyll data were measured in the field with the CCM-200 plus device, multispectral images were captured with the MAPIR SURVEY 3 camera mounted on a SOLO 3DR UAV, and synthetic data were generated to enlarge the data set. Six vegetation indices were computed and, together with the chlorophyll values, modeled using simple and multiple linear regressions, decision trees, support vector machines, random forests, and k-nearest neighbors. The model with the best performance and the lowest mean squared error was the one implemented with the support vector machine. Likewise, the best-performing indices in the models were CVI, GNDVI, and GCI, which are widely used in agriculture to estimate the chlorophyll of plants.
Keywords: Agriculture, coffee, multispectral images, synthetic data, UAV, vegetation indices.
Coffee Crops Analysis Using UAVs Equipped with Multispectral Cameras
Received: 03 August 2022
Accepted: 18 November 2022
Published: 27 November 2022
In Colombia, coffee farmers face several challenges, including the lack of financial support from government entities, limited resources, and low access to technologies to optimize their activities [1]. These limitations can be addressed from different angles; from the technological perspective, however, an efficient, economical, and accessible solution that is easy to implement is needed.
UAVs are among the most widely used and least expensive technologies for optimizing agricultural management, since equipping them with multispectral cameras makes it possible to see the spectral difference between healthy and diseased vegetation [2], [3]. Moreover, the variety of soils, the environment, and the treatment applied must be considered, because small variations in these conditions can affect the characteristics of crops and the analysis of their reflectance [4], [5]. For instance, analyzing coffee crops under agroforestry systems differs from analyzing free exposure crops because they grow in dissimilar microclimates.
This research focuses on the health status of coffee crops through the analysis of images obtained by the Agrocam and Survey 3 multispectral cameras carried by Unmanned Aerial Vehicles (UAVs). The aim is to create tools that, in the future, help coffee growers monitor the general health of their crops quickly and across the whole field, thus saving resources, time, and money.
The bibliometric collection used the Scimath tool and a Systematic Literature Review (SLR) following Kitchenham’s methodology from software engineering [6]. A bibliographic search using the filter "Multispectral and Coffee", without year restrictions, in the Web of Science (WOS) and Scopus databases found 22 studies that analyzed coffee crops through multispectral images. None of these studies was carried out in Colombia, as shown in Figure 1. A subsequent search in Google Scholar using the filter "Multispectral and Coffee and Colombia" returned 536 results, of which only two focused on coffee crops [7], [8].
The most important works related to the use of UAVs and multispectral images in Colombia are highlighted below. Meneses et al. [9] used drones and RGBN (Red, Green, Blue, NIR) cameras to assess the health status of plants by studying the spectral response of a potato crop in Cundinamarca and calculating the NDVI. Rojas et al. [10] took a different approach to analyzing health status: they calculated seven vegetation indices, namely the Relative Vigor Index (RVI), Green Normalized Difference Vegetation Index (GNDVI), Difference Vegetation Index (DVI), Transformed Vegetation Index (TVI), Corrected Transformed Vegetation Index (CTVI), Modified Soil-Adjusted Vegetation Index (MSAVI), and Normalized Difference Vegetation Index (NDVI), to estimate rice biomass at different stages of cultivation using a UAV and a Tetracam ADC-lite multispectral camera.
Using the same equipment, Rojas et al. [10], [11] developed a system (hardware and software) to capture and process multispectral images of rice crops at different stages and for two types of crops: Santa Rosa for lowlands and Palmira for highlands. To do this, they calculated the vegetation indices RVI, NDVI, GNDVI, DVI, CTVI, TVI, and MSAVI. Debian et al. [2] monitored rice fields in Bogotá using UAVs and a NIR camera. By assembling a mosaic of multispectral images of the terrain, they provided farmers with an integrated tool to measure and assess living green vegetation.
In contrast to the works mentioned above, the present study aims to evaluate the use of images captured by the Survey 3 multispectral camera transported by a UAV to study the health status of freely exposed coffee crops in the department of Cauca, Colombia.
The CRISP-DM model was adapted for the experimental design [12], [13]. It consists of five phases: i) understanding the business and the goals of the project; ii) data collection; iii) data preparation and understanding; iv) techniques and processing applied to the data; v) data evaluation.
Figure 2 presents the stages of the adapted experimental design linked to the phases of the CRISP-DM model. The first phase, Understanding the business, analyzes the problem presented in the previous section; in this adaptation it was not included among the experimental design stages. The second and third phases are joined to form the Data capture stage, which involves the collection of data. The next stage, Data pre-processing, involves filtering the data captured initially and generating synthetic data to obtain the final data set with which the models are trained. Finally, the fourth and fifth phases correspond to Data modeling through learning techniques and the evaluation of the error of those models.
Figure 3 shows the internal stages of data collection and processing.
In Step 1, the flight of the SOLO 3DR UAV was programmed in Mission Planner to capture multispectral images of a coffee crop located in San Joaquín, Tambo, Cauca with the MAPIR SURVEY 3 camera. Step 2 is the collection of the data, which includes taking pictures with the UAV and chlorophyll samples with the CCM-200 plus device. For mapping the terrain, the number of samples to be taken was determined based on the literature [14], [15]. To measure chlorophyll, two samples per leaf, 30 leaves per plant, and 30 coffee trees in total were taken in an X shape from a field of approximately 460 coffee trees, as shown in Figure 4, thus completing the first phase of data capture.
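The sampling scheme above implies 2 × 30 × 30 = 1,800 CCM-200 plus readings. The following is a minimal sketch of how such readings could be reduced to one chlorophyll value per plant. The record layout (plant_id, leaf_id, cci) and the function names are assumptions made for illustration and do not come from the original study.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Sampling scheme stated in the text.
SAMPLES_PER_LEAF = 2
LEAVES_PER_PLANT = 30
PLANTS = 30


def expected_readings():
    """Total number of chlorophyll readings implied by the scheme."""
    return SAMPLES_PER_LEAF * LEAVES_PER_PLANT * PLANTS


def summarize_by_plant(readings):
    """Aggregate raw readings into per-plant statistics.

    readings: iterable of (plant_id, leaf_id, cci) tuples
              (hypothetical layout, not the authors' format).
    Returns {plant_id: (mean_cci, population_std_cci, n_readings)}.
    """
    by_plant = defaultdict(list)
    for plant_id, _leaf_id, cci in readings:
        by_plant[plant_id].append(cci)
    return {p: (mean(v), pstdev(v), len(v)) for p, v in by_plant.items()}


if __name__ == "__main__":
    # Made-up readings, only to show the call shape.
    demo = [(1, 1, 60.0), (1, 1, 62.0), (1, 2, 58.0), (2, 1, 70.0), (2, 1, 74.0)]
    print(expected_readings())
    print(summarize_by_plant(demo))
```

Keeping the standard deviation next to the mean makes it easy to see how much the readings spread within a single plant.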
It should be noted that chlorophyll serves as an indicator of the health status, nutrients, and productivity of plants [15]-[17]. It is possible to establish a classification scheme for health and nutritional status based on the chlorophyll values measured in situ [18]-[20]. The CCI values are close to those found in the literature; however, for this study the relationship between SPAD and CCI expressed by [21] is used because it covers a wider variety of crops (Table 1).
In Step 3, the orthophotos are created with the professional version of Agisoft Metashape to obtain a single, continuous image of the terrain to be studied. They are then processed in QGIS, where the values of the vegetation indices (NDVI, GNDVI, RVI, GCI, NRVI, and CVI) of the sampled plants are obtained and associated with the CCI measured in the field. In Step 4, because the initial data set was small and its correlation low, the data are filtered by date and time range to improve the correlation. As shown in Table 2, filtering the data improves the correlation. Synthetic data are then generated from the filtered data set with the Gretel-Synthetics library from Gretel [22], which completes the data pre-processing phase. Based on these results, in Step 5 the machine learning models are trained with the initial data structure and the synthetic data. The models are linear regressions, support vector machine, decision trees, random forest, and k-nearest neighbors.
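The text does not give the formulas for the six indices, so the sketch below uses definitions that are commonly cited in the remote sensing literature. These are an assumption rather than the authors' exact expressions, and published definitions of RVI and NRVI in particular vary. Inputs are taken to be red, green, and near-infrared reflectance for one pixel or one plant-level average; the function name is hypothetical.

```python
def vegetation_indices(nir, red, green):
    """Compute six vegetation indices from band reflectances.

    Formulas are commonly cited definitions, assumed here because the
    source does not state them:
      NDVI  = (NIR - R) / (NIR + R)
      GNDVI = (NIR - G) / (NIR + G)
      RVI   = NIR / R                 (ratio form; other sources invert it)
      GCI   = NIR / G - 1
      NRVI  = (RVI - 1) / (RVI + 1)
      CVI   = NIR * R / G^2
    """
    if min(nir, red, green) <= 0:
        raise ValueError("reflectances must be positive")
    rvi = nir / red
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "RVI": rvi,
        "GCI": nir / green - 1.0,
        "NRVI": (rvi - 1.0) / (rvi + 1.0),
        "CVI": nir * red / green ** 2,
    }


if __name__ == "__main__":
    # Made-up reflectances, only to show the call shape.
    for name, value in vegetation_indices(nir=0.5, red=0.1, green=0.2).items():
        print(f"{name}: {value:.4f}")
```

With the ratio form of RVI chosen here, NRVI is algebraically tied to NDVI; definitions that use other bands or the inverse ratio behave differently, so the formulas should be checked against the software actually used to compute the indices.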
After data collection and model training, we found that in every case adding synthetic data decreased the error, since more training data were available. Table 3 shows that the model with the lowest error when trained on all the data was the support vector machine, with an error of 7.85 and a correlation of 0.58.
Figure 5 compares the values estimated by each model with the real data (blue line); the X axis is the data identifier and the Y axis is the chlorophyll value. The support vector machine model stands out.
Finally, to complete the data modeling phase, Table 4 shows the error for each vegetation index and trained model. As in Table 3, the support vector machine model stands out. Moreover, the vegetation indices with the lowest error were those related to chlorophyll (GCI, GNDVI, and CVI), as would be expected, and they also presented a high correlation.
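A per-index comparison of this kind can be illustrated with the simplest of the models mentioned, a simple linear regression of chlorophyll on one index at a time, ranked by mean squared error. This is a sketch under stated assumptions: the data layout and function names are hypothetical, the error is computed on the training data rather than on a held-out set, and it does not reproduce the values in Table 4.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def rank_indices_by_mse(index_values, cci):
    """Fit one simple regression per index and sort by training MSE.

    index_values: {index_name: [value per plant]} (hypothetical layout)
    cci:          measured chlorophyll values, same order as the lists above
    Returns a list of (index_name, mse) pairs, lowest error first.
    """
    scores = {}
    for name, x in index_values.items():
        slope, intercept = fit_line(x, cci)
        squared_errors = [(slope * a + intercept - b) ** 2 for a, b in zip(x, cci)]
        scores[name] = sum(squared_errors) / len(cci)
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up values, only to show the call shape.
    demo = {"A": [1, 2, 3, 4], "B": [1, 3, 2, 4]}
    print(rank_indices_by_mse(demo, [10, 20, 30, 40]))
```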
The errors of the different models can be explained by several factors. First, the chlorophyll data present a significant standard deviation at both the leaf and plant levels, which increases the dispersion of the data obtained. Second, the physical sampling took far longer than the UAV flight: taking the physical samples took approximately 7-8 hours, whereas the UAV flight was carried out in 10 minutes at a specific time of the day (usually between 11:00 a.m. and 1:00 p.m.). Third, climatic variation affects the response of chlorophyll and the reflectance captured by the camera; in one of the flights the day was clear and in the other there was intermittent cloudiness. More flights were made, but they were discarded because rain interrupted the sampling and the data could not be collected completely. Additionally, the national strike on April 28, 2021, and the health emergency due to the COVID-19 pandemic restricted travel to carry out more flights.
For these reasons, it was necessary to categorize and segment the information to conduct the exploratory data analysis and model it more effectively. It is important to take the data at a fixed time, preferably at noon, when the sun is at its highest point and the reflectance level is most appropriate. Having a sampling scheme is also important, because defining the section and measurement points at the leaf and plant levels makes it easier to relate the data obtained.
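Segmenting the data by time of day, as described above, can be sketched as a simple filter on sample timestamps. The default window of 11:00 to 13:00 mirrors the usual flight time reported in the text, but using it as the filter bounds is an assumption, as are the (timestamp, value) record layout and the function name.

```python
from datetime import datetime, time


def within_window(samples, start=time(11, 0), end=time(13, 0)):
    """Keep samples taken between start and end (inclusive).

    samples: list of (datetime, value) pairs (hypothetical layout).
    The default window is an assumption based on the flight times
    mentioned in the text, not a documented setting of the study.
    """
    return [(ts, v) for ts, v in samples if start <= ts.time() <= end]


if __name__ == "__main__":
    # Made-up timestamps and readings, only to show the call shape.
    demo = [
        (datetime(2021, 1, 1, 10, 30), 55.2),
        (datetime(2021, 1, 1, 11, 0), 61.0),
        (datetime(2021, 1, 1, 12, 15), 63.4),
        (datetime(2021, 1, 1, 13, 1), 58.9),
    ]
    print(within_window(demo))
```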
Considering the aspects that affected the experiment, and looking at the errors obtained, the correlation, and the level of precision, the support vector machine model stands out. In this model, outliers have less impact because of the way it partitions the data with hyperplanes. When reviewing multicollinearity through the correlation among the vegetation indices, two groups emerge: the first contains the NDVI, RVI, and NRVI indices; the second contains the GNDVI, GCI, and CVI indices. For most of the models, except linear regression, the second group obtains the lowest errors, which is consistent with the literature, since these indices are used to estimate chlorophyll content. In the linear regression, the NRVI index stood out. It is similar to the NDVI but also reduces the effects of topography, lighting, and the atmosphere, which is helpful when clouds dim the capture of multispectral images.
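The multicollinearity review described above can be illustrated by computing pairwise Pearson correlations between indices and grouping those that are strongly correlated. The greedy grouping rule and the 0.9 threshold are illustrative assumptions, not the procedure reported by the authors, and the column names in the demo are placeholders.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_groups(columns, threshold=0.9):
    """Greedily group columns whose pairwise |r| is at least threshold.

    columns: {name: [values]} (hypothetical layout).
    A column joins the first existing group in which it is strongly
    correlated with every member; otherwise it starts a new group.
    """
    groups = []
    for name, values in columns.items():
        for group in groups:
            if all(abs(pearson_r(values, columns[m])) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    # Made-up columns, only to show the call shape.
    demo = {
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8.5],
        "c": [4, 1, 3, 2],
        "d": [8, 2, 6, 4],
    }
    print(correlated_groups(demo))
```

Because the grouping is greedy, the result can depend on the order of the columns; a full correlation matrix is the safer thing to inspect when the groups are not clear-cut.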
Finally, the overall health of the crop is estimated as adequate in every plant, based on both the reference chlorophyll values and the model estimates, since each plant exceeds 54 CCI. In general, the MAPIR Survey 3 camera carried by the UAV was useful for estimating the health of coffee crops; however, it is advisable to feed the prediction models with more indicators (carbon, nitrogen, fertilizer, among others) in addition to the chlorophyll measured with the CCM-200 plus.
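The adequacy check described above amounts to comparing each plant's chlorophyll value against the 54 CCI reference. A minimal sketch follows; the plant identifiers, the values, and the function names are hypothetical.

```python
ADEQUATE_CCI = 54.0  # reference value quoted in the text


def plants_at_or_below(cci_by_plant, threshold=ADEQUATE_CCI):
    """Return the sorted ids of plants that do not exceed the threshold."""
    return sorted(p for p, v in cci_by_plant.items() if v <= threshold)


def crop_is_adequate(cci_by_plant, threshold=ADEQUATE_CCI):
    """True when every plant's CCI exceeds the threshold."""
    return not plants_at_or_below(cci_by_plant, threshold)


if __name__ == "__main__":
    # Made-up estimates, only to show the call shape.
    estimates = {"plant_01": 61.3, "plant_02": 57.8, "plant_03": 66.0}
    print(crop_is_adequate(estimates))
    print(plants_at_or_below(estimates))
```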
Research carried out in recent years documents the potential of UAVs and multispectral images applied to agriculture. By using a 3DR Solo UAV with the MAPIR Survey 3 and Agrocam multispectral cameras in coffee crops in the department of Cauca, it is expected that health, nitrogen levels, and maturity can be estimated with acceptable precision, which would support coffee growers in making decisions.
By quickly knowing the state of maturity, health, and nitrogen levels of an affected area, control measures can be applied to treat it in the best possible way. Coffee growers will then be able to determine the amounts of nitrogen fertilizer needed, isolate areas affected by pests or damage, and estimate harvest times and yield.
As future work, we propose to analyze more thoroughly the correlation between the values of vegetation indices such as NDVI, SAVI, or TVI and pathogens or common ailments in coffee, and to estimate other components such as biomass and chlorophyll, which will yield more detailed results about what is affecting the crop.
Considering the research opportunities opened by this research project, the following works are proposed: