VIEW POINTS

Analysis of Main Components, an Effective Tool in Agricultural Technical Sciences

Análisis de componentes principales, una herramienta eficaz en las Ciencias Técnicas Agropecuarias

Lucía Fernández-Chuairey *
Universidad Agraria de La Habana (UNAH), Cuba
Lazara Rangel-Montes de Oca
Universidad Agraria de La Habana (UNAH), Cuba
Mario Varela-Nualles
Instituto Nacional de Ciencias Agrícolas (INCA), Cuba
José Antonio Pino-Roque
Universidad Agraria de La Habana (UNAH), Cuba
Jany del Pozo-Fernández
Universidad Agraria de La Habana (UNAH), Cuba
Nelson Ulises Lim-Chamg
Universidad Agraria de La Habana (UNAH), Cuba

Revista Ciencias Técnicas Agropecuarias, vol. 31, no. 1, e10, 2022

Universidad Agraria de La Habana

Received: 20 May 2021

Accepted: 12 November 2021

ABSTRACT: A wide range of multivariate techniques is currently used in different areas of research. The present work focuses on the Principal Components Method and aims to establish, on mathematical-statistical bases, a set of methodological criteria for processing data and interpreting results with this technique. An example associated with post-harvest studies of pineapple (variety Cayena Lisa) is developed. A sequence of steps is proposed that includes: prior analysis of the correlation between variables, determination of the number of components to be selected (a compromise between the different criteria), weight of the variables in each component, biological interpretation, and graphs that validate the results obtained with respect to components and individuals. The variables studied were weight loss in g (PP), firmness, color index (IC), soluble solids content (SSC) and pH. The variables were grouped into two components that explain 88.36% of the variation in the data. A positive relationship was observed among PP, SSC and pH, and a negative relationship between firmness and these variables. The highest PP and pH are reached from the sixth day onward and the highest firmness in the first two days, aspects to be taken into account when making timely decisions on storage, transportation and marketing. It is concluded that multivariate techniques, and principal component analysis in particular, constitute an efficient and non-destructive way of monitoring the quality of fruits in storage.

Keywords: Principal Components, Agricultural Engineering, Multivariate Methods.

RESUMEN: En la actualidad existe una amplia gama de técnicas multivariadas, que se utilizan en las diferentes áreas de investigación. El presente trabajo se centra en el Método de Componentes Principales y tiene como objetivo establecer sobre bases matemático-estadísticas un conjunto de criterios metodológicos para el procesamiento e interpretación de resultado con el empleo de dicha técnica. Se desarrolla un ejemplo asociado a estudios pos cosecha de la Piña (variedad Cayena Lisa). Se proponen una secuencia de pasos que incluye: análisis previo de correlación entre variables, determinación de números de componentes a seleccionar (compromiso entre los diferentes criterios), peso de variables en cada componente, interpretación biológica y gráficos que validan los resultados obtenidos en sentido de las componentes e individuos. El estudio contó con las variables: pérdida de peso en g (PP), firmeza, índice de color (IC), contenido de solidos solubles (SSC) y PH. Las variables se agruparon en dos componentes que explican el 88,36 % de la variación de los datos. Se observó una relación positiva entre PP, SSC y PH y la relación negativa de la firmeza con estas variables, se muestra que la mayor PP y PH se alcanza a partir del sexto día, y la mayor firmeza en los dos primeros días, aspectos a tener en cuenta en la toma de decisiones oportunas para el almacenaje, trasporte y comercialización. Se concluye que el empleo de técnicas multivariadas y en particular el análisis de componentes principales constituye una vía eficiente y no destructiva en el monitoreo de la calidad de frutos en almacenamiento.

Palabras clave: componentes principales, ingeniería agrícola, métodos multivariados.

INTRODUCTION

Historically, the agricultural sector has needed statistical-mathematical methodologies that respond to current problems in scientific research. Recently, Fernández et al. (2018; 2019) established criteria and evaluations, on mathematical-statistical bases, for the analysis and application of models that describe agrarian processes (based mainly on univariate and bivariate statistics).

Similarly, the literature reports the use of multivariate methods, which are used to study phenomena that involve the measurement of several variables and which are applied depending on the characteristics of the research. Among the most used multivariate statistical techniques are Multiple Regression, Principal Component Analysis (PCA), Factor Analysis (FA), Discriminant Analysis (DA), Numerical Taxonomy (cluster analysis) and Multidimensional Scaling, among others. These have been addressed by Lozares & López (1991); Robaina et al. (2001); Hair & Anderson (2004); Bouza & Sistachs (2006); González et al. (2008); Miranda (2011); Coronados et al. (2017); Quindemil & Rumbaut (2019); Gozá et al. (2020); Varela (2021), among other authors.

The objective of this work is to establish, on mathematical-statistical bases, a set of methodological criteria for the processing and interpretation of results with the Principal Components method. The analysis is focused on post-harvest studies of pineapple (variety Cayena Lisa).

DEVELOPMENT OF THE TOPIC

Theoretical Foundations

Various criteria have been given on the definition of multivariate statistical techniques. A general definition was proposed by Hair & Anderson (2004), who argue that “Multivariate analysis refers to all statistical methods that simultaneously analyze multiple measures of each individual or object under investigation and emphasize that any simultaneous analysis of more than two variables can be considered approximately as a multivariate analysis”.

These methods comprise a set of statistical techniques for analyzing data that correspond to measurements of p variables observed in n individuals, allowing the study of their interrelations. The literature collects various multivariate methods and classifies them fundamentally according to the purposes pursued in the research. In this sense, Varela (2021) groups them into descriptive and decisional methods and states that one of the most widespread multivariate techniques at present is Principal Component Analysis (PCA). In PCA the variables are quantitative, since it works with the Pearson correlation coefficient, which is designed to measure linear association between variables of this type. There is also a Principal Component Analysis option for categorical variables, which will be addressed in a future work.

Miranda (2011) states that the objective of PCA is to reduce the number of variables involved in the analysis of a process under study. The method consists of obtaining new variables (called components, Y1, ..., Yp) that are uncorrelated with each other and that keep a logical order, where the first component is the one with the greatest influence on the phenomenon under study, and so on; that is:

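$$\operatorname{Var}(Y_1) + \operatorname{Var}(Y_2) + \cdots + \operatorname{Var}(Y_p) = \text{Total Variance} = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \cdots + \operatorname{Var}(X_p)$$

such that:

$$\operatorname{Var}(Y_1) > \operatorname{Var}(Y_2) > \cdots > \operatorname{Var}(Y_p)$$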
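The variance identity and the ordering of the components can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a small synthetic data matrix (6 rows, 5 columns) that is purely illustrative and is not the data of this study. It uses the covariance matrix; the same identity holds for the correlation matrix when the variables are standardized.

```python
import numpy as np

# Hypothetical data: 6 individuals (rows) x 5 variables (columns).
# These are random numbers, NOT the measurements of the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
X[:, 1] += 0.8 * X[:, 0]  # induce some correlation between two columns

# Covariance matrix of the original variables X_1..X_p
S = np.cov(X, rowvar=False)

# The eigenvalues of S are the variances of the components Y_1..Y_p
eigenvalues = np.linalg.eigvalsh(S)[::-1]  # reversed to decreasing order

total_variance_x = np.trace(S)        # Var(X_1) + ... + Var(X_p)
total_variance_y = eigenvalues.sum()  # Var(Y_1) + ... + Var(Y_p)

print(total_variance_x, total_variance_y)  # the two totals coincide
print(eigenvalues)                         # variances in decreasing order
```

The two printed totals agree up to floating-point error, and the eigenvalues come out in non-increasing order, which is the numerical counterpart of the two expressions above.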

How to describe the information contained in a data set by a smaller set of new variables or components? When is it effective to apply the Principal Components Method?

Principal Component Analysis is more effective to the extent that initially there is a marked correlation structure between the variables. In this respect, Miranda (2011) corroborates that, when there is no association between the variables, it makes no sense to carry out these types of analysis.

This procedure is used above all in exploratory data analysis and for descriptive purposes. It simplifies subsequent studies by working with fewer variables than the original set, elucidates the relationships among the observed variables and their weights and, at the same time, allows groups of individuals with similar behavior to be observed in graphic representations.

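The application of this method starts from the data matrix of n individuals with p variables, in which n ≥ p, to which the following sequence of steps is applied: prior analysis of the correlation between variables; determination of the number of components to be selected (a compromise between the different criteria); weight of the variables in each component; biological interpretation; and graphs that validate the results with respect to components and individuals.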

At present, there are valuable results regarding the use of these techniques, as shown in the work of Mesa et al. (2018) on monoclonal antibody fermentation. They were likewise used in research on biopharmaceutical purification processes carried out by Gozá et al. (2020). Their use is also reported in problems associated with causality in Biomedical Sciences, including the determination of risk and prognostic factors (Sagaró & Zamora, 2020), as well as in studies of the functional dynamic mechanical systems of internal combustion engines (Aliaga et al., 2021), among other applications.

Example of Application of PCA in Post-Harvest Studies of Pineapple (Variety Cayena Lisa)

Pineapple is one of the most important commercial fruit crops in the world. It is known as the queen of fruits for its excellent taste and its role in nutrition and health (Hernández et al., 2021). Hence, research on its characterization, nutritional composition, growth, quality and post-harvest behavior, among other aspects, is currently intensifying, as shown in the works of Rangel et al. (2018) and Lorente et al. (2021), among others.

Luchsinger (2017) considers that one of the impacts of post-harvest studies lies in maintaining the quality of products until their consumption, hence the importance of investigating the different indicators. The study was carried out in areas of the company of various crops located on the Havana-Matanzas Plain, with average annual temperatures between 25 and 32 ºC and high environmental humidity. Weight loss (PP) was determined by weighing the fruits on an electronic scale on days 1, 2, 3, 5, 6, 8 and 10 after harvest. The indicators evaluated were PP, firmness, color index (IC), soluble solids content (SSC) and pH. The aim is to analyze the behavior of these variables (5 variables) on the different days (6 individuals).

The data were processed using statistical software (Statgraphics Centurion, 2012). A preliminary analysis showed a marked correlation structure among this group of variables: a positive, direct relationship between PP and pH (r = 0.84) and between pH and SSC (r = 0.62), and a negative, inverse relationship between PP and firmness (r = -0.80) and between IC and firmness (r = -0.65). This supports a study using principal component analysis.
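A preliminary correlation screening of this kind can be sketched as follows. The study itself used Statgraphics; the Python code below is only an illustrative equivalent. The function name `strong_pairs`, the threshold of 0.6 and the synthetic data matrix are assumptions made for the example; the variable labels are borrowed from the text, but the numbers are random and do not reproduce the reported coefficients.

```python
import numpy as np

def strong_pairs(X, names, threshold=0.6):
    """Return (name_i, name_j, r) for variable pairs with |r| >= threshold.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    R = np.corrcoef(X, rowvar=False)  # Pearson correlation matrix (p x p)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(R[i, j]), 2)))
    return pairs

# Synthetic matrix: 6 rows (days) x 5 columns. NOT the study's data.
names = ["PP", "firmness", "IC", "SSC", "pH"]
rng = np.random.default_rng(1)
trend = rng.normal(size=6)
X = rng.normal(scale=0.3, size=(6, 5))
X[:, 0] += trend  # first column follows a shared trend
X[:, 1] -= trend  # second column moves against it
X[:, 4] += trend  # fifth column follows it

for name_i, name_j, r in strong_pairs(X, names):
    print(f"{name_i} - {name_j}: r = {r}")
```

If few or no pairs pass the screening, the variables are close to uncorrelated and, as noted above, a principal component analysis has little to offer.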

Construction and Selection of the Number of Components

Table 1 shows the selection of two components (eigenvalues above one). Note that the first two components explain 88.36% of the total variability. This indicates that, from the 5 initial variables, two components can be extracted to explain the association between the variables and the observations.
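The two selection criteria mentioned here (eigenvalues above one and cumulative percentage of explained variance) can be computed from the correlation matrix. The sketch below is a hedged illustration in Python; the function name `select_components` and the random data are assumptions, so its output does not reproduce Table 1.

```python
import numpy as np

def select_components(X):
    """Eigenvalues of the correlation matrix, cumulative percentage of
    variance, and number of components with an eigenvalue above one."""
    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]  # decreasing order
    cumulative_pct = 100 * np.cumsum(eigenvalues) / eigenvalues.sum()
    n_selected = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, cumulative_pct, n_selected

# Synthetic matrix: 6 individuals x 5 variables. NOT the study's data.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 5))
X[:, 4] += X[:, 0]  # induce correlation so that a component stands out

eigenvalues, cumulative_pct, n_selected = select_components(X)
for k, (ev, pct) in enumerate(zip(eigenvalues, cumulative_pct), start=1):
    print(f"Component {k}: eigenvalue = {ev:.3f}, cumulative % = {pct:.2f}")
print("Components with eigenvalue above one:", n_selected)
```

Because the eigenvalues of a correlation matrix add up to the number of variables, an eigenvalue above one means that the component carries more variance than a single standardized variable. The cumulative percentage then serves as the second criterion in the compromise described in the text.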

TABLE 1
Number of Principal Components according to the eigenvalue and percentage criteria


Relationship or Weight of Variables in Each Component

Component 1 is characterized fundamentally by the variables weight loss, pH and firmness (Table 2), while component 2 is characterized by the soluble solids content and the color index.

TABLE 2
Component weights


For Component 1, with positive weights for weight loss and pH, as the value of the component increases, weight loss and pH increase and the firmness of the fruits decreases. For the second component, an increase in its value indicates that the soluble solids content increases and the color index decreases.
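The reading of the signs can be made concrete with a short sketch. It assumes that the component weights are the eigenvectors of the correlation matrix and that the component values are obtained by applying them to the standardized variables; the function name `component_weights` and the random data are hypothetical, and the result does not reproduce Table 2.

```python
import numpy as np

def component_weights(X, n_components=2):
    """Weights (eigenvectors of the correlation matrix) and component
    values (scores) for the first n_components components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
    R = np.corrcoef(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    W = eigenvectors[:, order]  # p x k matrix of weights
    scores = Z @ W              # n x k matrix of component values
    return W, scores

# Synthetic matrix: 6 individuals x 5 variables. NOT the study's data.
rng = np.random.default_rng(3)
X = rng.normal(size=(6, 5))
X[:, 1] -= X[:, 0]  # second column moves against the first

W, scores = component_weights(X)
print(np.round(W, 3))       # one column of weights per component
print(np.round(scores, 3))  # one row of component values per individual
```

A variable with a positive weight increases with the component and a variable with a negative weight decreases, which is the argument used above for firmness. The overall sign of each eigenvector is arbitrary, so only the relative signs within a component carry meaning.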

Formation of possible groups. Biological sense of the components from their relationship with the initial variables

FIGURE 1
Graphic analysis of individuals and group formation. Principal Component values for each row.


Considering the graphic representation (Figure 1), there are basically three groups in post-harvest. The first group is characterized by the greatest weight loss and pH, which occur from the sixth day onward. From the physical point of view, weight losses, associated with the water content of the fruit, indirectly decrease the concentration of hydrogen ions, so that the pH rises as the fruit approaches senescence or putrefaction. This makes it unsuitable for consumption as fresh fruit, hence the importance of timely decision-making for commercialization and industrialization.

In contrast, the third group is formed by the first day, when the greatest firmness is achieved, with the least weight loss and the lowest pH. This response is due to the nature of the product: once the exchange of ethylene with the surrounding environment begins, respiration increases and the ripening process accelerates, a recurring phenomenon in previous investigations with this and other agricultural products (Thompson, 1998). Likewise, a gradual response is reflected in the soluble solids content, which tends to influence acceptance by consumers and marketers. The same applies to the color index, which allows the state of maturity to be discerned with the naked eye and whose lowest value corresponds to the first day after harvest, as reflected in component 2.

The characterization of pineapple quality represented by these groups constitutes a valuable tool that avoids exhaustive control of these properties during commercialization, transport or storage, and can even make up for a lack of instrumentation for their determination. This largely makes it a non-destructive tool for monitoring the quality of the fruit in storage, which satisfies one of the main purposes of this research. It would support timely decision-making on storage, transport and commercialization within this time range, and reaffirms the criterion that quality is sought in the field and modulated post-harvest.

FIGURE 2
Biplot graph.

Finally, the biplot (Figure 2) allowed the joint analysis of variables and individuals. It shows the positive relationship among SSC, pH and weight loss and the negative relationship of firmness with these variables; days 8 and 10 have the highest values of SSC, pH and weight loss and the lowest values of firmness, in contrast to day 1. Similarly, perpendicular projections onto the firmness axis show that the greatest firmness is reached in the first two days.
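The perpendicular reading of a biplot can be illustrated numerically. The sketch below assumes one common biplot construction, based on the singular value decomposition of the standardized data, which may differ in scaling from the one produced by the software used in the study. The function names and the random data are hypothetical, and plotting is omitted so that only the coordinates are shown.

```python
import numpy as np

def biplot_coordinates(X):
    """Coordinates of individuals and variables in the plane of the
    first two components (one possible biplot scaling)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :2] * s[:2]  # one point per individual (row)
    directions = Vt[:2].T      # one vector per variable (column)
    return Z, scores, directions

def project_on_variable(scores, directions, j):
    """Signed length of the perpendicular projection of every
    individual onto the axis of variable j."""
    axis = directions[j] / np.linalg.norm(directions[j])
    return scores @ axis

# Synthetic matrix: 6 individuals x 5 variables. NOT the study's data.
rng = np.random.default_rng(4)
X = rng.normal(size=(6, 5))

Z, scores, directions = biplot_coordinates(X)
projection = project_on_variable(scores, directions, j=1)  # second variable
ranking = np.argsort(-projection)  # rows ordered from largest projection
print(np.round(projection, 3))
print(ranking)
```

The ordering of the projections approximates the ordering of the individuals in the chosen variable, and the approximation improves as the first two components explain more of the total variance. This is the sense in which the days with the greatest firmness can be read from the firmness axis.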

CONCLUSIONS

REFERENCES

ALIAGA, N.R.; DE LA TORRE, S.F.; RODRÍGUEZ, S.A.A.; GUILLÉN, G.J.: “Análisis de componentes principales en los motores de combustión interna Hyundai 1.7 MW”, Revista Ingeniería Energética, 42(1), 2021, ISSN: 1815-5901.

BOUZA, C.N.; SISTACHS, V.: Estadística, teoría básica y ejercicios, Ed. Editorial Félix Varela, La Habana, Cuba, 2006, ISBN: 959-258-373-0.

CORONADOS, Y.; VILTRES, V.; SISTACH, V.: “Aplicación de técnicas estadísticas multivariantes en el análisis de datos”, Revista Cubana de Medicina Física y Rehabilitación, 9(2): 1-12, INFOMED., 2017.

FERNÁNDEZ, C.L.; GUERRA, B.C.W.; DE CALZADILLA, P.J.; CHANG, L.N.U.: “Desarrollo de la modelación estadístico-matemática en las ciencias agrarias. Retos y perspectivas”, Investigación Operacional, 38(5): 462-467, 2018, ISSN: 2224-5405.

FERNÁNDEZ, C.L.; RANGEL, M. de O.L.; GUERRA, B.C.W.; DEL POZO, F.J.: “Modelación Estadístico-Matemática en Procesos Agrarios. Una aplicación en la Ingeniería Agrícola”, Revista Ciencias Técnicas Agropecuarias, 28(2): 72-79, 2019, ISSN: 1010-2760, e-ISSN: 2071-0054.

GONZÁLEZ, Á.L.; SOLANO, H.L.; TILANO, J.: “Análisis multivariado aplicando componentes principales al caso de los desplazados”, Ingeniería y desarrollo, (23): 119-142, 2008, ISSN: 0122-3461.

GOZÁ, L.O.; FERNÁNDEZ, A.M.; RODRÍGUEZ, G.R.H.; OJITO, M.E.: “Aplicación del Análisis de Componentes Principales en el proceso de purificación de un biofármaco”, Vaccimonitor, 29(1): 5-13, 2020, ISSN: 1025-028X.

HAIR, J.F.; ANDERSON, R.E.: Multivariate data analysis, Ed. Pearson Prentice Hall, 5a ed., Madrid, España, 2004, ISBN: 84-8322-035-0.

HERNÁNDEZ, R.G.; ORTEGA, I.E.; ORTEGA, I.I.H.: “Composición nutricional y compuestos fitoquímicos de la piña (Ananas comosus) y su potencial emergente para el desarrollo de alimentos funcionales”, Boletín de Ciencias Agropecuarias del ICAP, 9(14): 24-28, 2021, ISSN: 2448-5357.

LORENTE, G.Y.; RODRÍGUEZ, H.D.; CAMACHO, R.L.; CARVAJAL, O.C.C.; DE ÁVILA, G.R.; GONZÁLEZ, O.J.; RODRÍGUEZ, S.R.: “Efecto de la aplicación de Biobras-16 sobre el crecimiento y calidad de frutos de piña ‘MD-2’”, Revista de Cultivos Tropicales, 42(2), 2021, ISSN: 0258-5936.

LOZARES, C.C.; LÓPEZ, R.P.: “El análisis multivariado: definición, criterios y clasificación”, 1991.

LUCHSINGER, L.: Impacto de la postcosecha en la calidad de frutos de exportación, [en línea], Perú, Redagrícola, 2017, Disponible en: https://www.redagricola.com/pe/impacto-de-la-postcosecha-en-la-calidad-de-frutas-de-exportacion, [Consulta: 9 de julio de 2021].

MESA, R.L.; GOZÁ, L.O.; URANGA, M.M.; TOLEDO, R.A.; GÁLVEZ, T.Y.: “Aplicación del Análisis de Componentes Principales en el proceso de fermentación de un anticuerpo monoclonal”, Vaccimonitor, 27(1): 8-15, 2018, ISSN: 1025-028X, e-ISSN: 1025-0298.

MIRANDA, I.: Estadística Aplicada a la Sanidad Vegetal, Inst. Centro Nacional de Sanidad Agropecuaria (CENSA), folleto, San José de las Lajas, Mayabeque, Cuba, 173 p., 2011.

QUINDEMIL, T.E.M.; RUMBAUT, L.F.: “Análisis de componentes principales para obtener indicadores reducidos de medición en la búsqueda de información”, Revista Cubana de Información en Ciencias de la Salud, 30(3), 2019, ISSN: 2307-2113.

RANGEL, M. de O.L.; MONZÓN, M.L.L.; GARCIA, C.J.; GARCIA, P.A.: “Técnicas matemáticas para inferir cambios poscosecha en las propiedades de productos agrícolas”, Revista Ciencias Técnicas Agropecuarias, 27(4): 42-54, 2018, ISSN: 1010-2760, e-ISSN: 2071-0054.

ROBAINA, C.G.R.; MEDINA, P.; MANUEL, J.; MORALES, R.J.M.; ROBAINA, C.R.E.: “Análisis multivariado de factores de riesgo de prematuridad en Matanzas”, Revista Cubana de obstetricia y ginecología, 27(1): 62-69, 2001, ISSN: 0138-600X.

SAGARÓ, D.C.N.M.; ZAMORA, M.L.: “Técnicas estadísticas multivariadas para el estudio de la causalidad en Medicina”, Revista Ciencias Médicas, 24(2), 2020, ISSN: 1561-3194.

STATGRAPHICS CENTURION: Statgraphics Centurion, Version 16.1.17, Statpoint Technologies, Inc., 2012.

THOMPSON, K.A.: Tecnología post-cosecha de frutas y hortalizas, Ed. Kinesis Ltda., Colombia, 268 p., 1998.

VARELA, M.: Análisis multivariado, [en línea], Ediciones INCA, 2021, Disponible en: http://ediciones.inca.edu.cu/files/folletos/analisismultivariado.pdf, [Consulta: 30 de abril de 2021].

Author notes

Lucía Fernández-Chuairey, Profesor Titular, Universidad Agraria de La Habana (UNAH), Departamento de Matemática y Física, e-mail: lucia@unah.edu.cu
Lazara Rangel-Montes de Oca, Profesor Asistente, (UNAH), Departamento de Ingeniería Agrícola, e-mail: lazarar@unah.edu.cu
Mario Varela Nualles, Investigador Titular, Instituto Nacional de Ciencias Agrícolas (INCA), e-mail: varela@inca.edu.cu
José Antonio Pino Roque, Profesor Auxiliar (UNAH), Departamento de Matemática y Física, e-mail: pino@unah.edu.cu
Jany del Pozo-Fernández, Instructor, Universidad Agraria de La Habana (UNAH), Facultad de Medicina Veterinaria, e-mail: janydelpozo@gmail.com
Nelson Ulises Lim Chamg, Profesor Auxiliar (UNAH), Departamento de Matemática y Física e-mail: limc@unah.edu.cu
AUTHOR CONTRIBUTIONS: Conceptualization: L. Fernández. Data curation: L. Fernández, L.R. Montes de Oca. Formal analysis: L. Fernández, J. A. Pino, J. del Pozo, N. U. Lim. Investigation: L. Fernández, L.R. Montes de Oca, M. Varela, J. A. Pino, J. del Pozo, N. U. Lim. Methodology: Resources: L. Fernández, L.R. Montes de Oca. Roles/Writing, original draft: L. Fernández. Writing, review & editing: L. Fernández, L.R. Montes de Oca, M. Varela, J. A. Pino, J. del Pozo, N. U. Lim.

*Author for correspondence: Lucía Fernández-Chuairey: e-mail: lucia@unah.edu.cu

Conflict of interest declaration

The authors of this work declare no conflict of interests.