Abstract: Yield prediction is still a major challenge in pear production. Forecasting fruit growth after modeled curves allows predicting both potential yield and quality. This research aimed to fit multilevel no-linear mixed models (NLMM) based on logistic curves to describe pear growth in the Upper valley of Rio Negro and Neuquén, Argentina. The models incorporated several thermo-accumulative indices accounting for temperature effects on fruit-growth physiology. In this way, they captured normal fruit-growth patterns and environmental variations along the growing season. The study was conducted on “William´s” pear trees for 16 seasons. Many trees and fruits were randomly selected and identified. Equatorial diameters were weekly measured with an electronic digital caliper. Climatic data was recorded for all studied seasons from INTA Upper Valley agrochemical station and thermo accumulative indexes were calculated from daily temperature. The best models were selected according to the information criteria index. Multilevel NLMM discerned and quantified the sources of stochastic variability at different levels, allowing better index criteria in comparison to models only considering a single level of variability among random effects. The incorporation of thermo accumulative indexes also increased model performance.
Keywords: pear, growth curves, yield prediction, random effects.
Resumen: El pronóstico de cosecha es un gran desafío en la producción de peras. Estimar el tamaño de los frutos a partir de curvas de crecimiento permite predecir tanto la cantidad como la calidad de la fruta para cosecha. Este trabajo tuvo como objetivo ajustar modelos mixtos no lineales multiniveles (MMNL) basados en la curva logística para describir el crecimiento de peras “William´s” en el Alto Valle de Río Negro y Neuquén, Argentina. Los modelos ajustados incorporaron diferentes índices termoacumulativos que contemplan los efectos de la temperatura en la fisiología del crecimiento de los frutos. De esta manera, se logra no solo describir el crecimiento de los frutos, sino también se pueden observar las variaciones ambientales a lo largo de las temporadas de crecimiento. El estudio se realizó en perales “William´s” durante 16 temporadas. Se seleccionaron e identificaron al azar numerosos árboles y frutos. A cada fruto se le midió su diámetro ecuatorial semanalmente con un calibre digital electrónico. Los datos climáticos se obtuvieron de la estación meteorológica del INTA Alto Valle y los índices termoacumulativos se calcularon a partir de los datos de temperaturas. Los mejores modelos fueron seleccionados según los criterios de información. El MMNL multinivel permitió discernir y cuantificar las fuentes de variabilidad estocástica en diferentes niveles, lo que permitió obtener mejores criterios de información en comparación con los modelos que solo consideraron un único nivel de variabilidad entre los efectos aleatorios. La incorporación de índices termoacumulativos mejoró notablemente la performance de los modelos obtenidos.
Palabras clave: peras, curvas de crecimiento, pronóstico de producción, efectos aleatorios.
Artículos
Incorporation of environmental covariates to nonlinear mixed models describing fruit growth
Yield prediction is considered vital information for growers as it defines management and marketing strategies. In many commercial orchards yield prediction has become a frequent practice, including total production and fruit size distribution. This information enables farmers to improve packaging, storage and marketing decisions. In marketing, differences of a few millimeters in fruit size have great implications on money returns (Giménez and Tassile, 2015).Until now, there were two main methods for predicting fruit size at harvest: stochastic models and curves. Stochastic models are based on the assumption of a relation between tree fruit-load and final size. Growth curves attempt to describe fruit growth patterns through nonlinear, particularly sigmoid models. These curves provide an upper asymptote indicating maximum reachable sizes, an inflection point showing maximum variation rate and a lower asymptote providing initial fruit size (Alvarez and Boche, 1999).
Different functions have been proposed to describe apple growth. A double sigmoid curve considering three well-defined phases (Magein, 1989), an expolinear model (Lakso et al., 1995), a logistic model (Ortega Farias et al. 1998), and a negative exponential function (Zadravec et al., 2014). As regards pears, fruit growth has been well described by logistic curves in “Williams” cultivar (Bramardi et al., 1997; Giménez, 2012) in which the third parameterization has been proved the most appropriate (Ratkowsky, 1983).
Models fitting growth patterns of a specific cultivar in a certain region are not generalizable to other crop conditions. Final fruit size depends on many factors, such as genotype, crop management, weather, and fruit number (Bramardi et al., 2006; Jiménez and Royo Díaz, 2003). Climate directly and indirectly affects growth, yield and quality of crops (Oliveiora Aparecido et al., 2016).
Several studies have shown the relationship between weather and yield in different crops. For instance, maximum and minimum temperatures affect grain crop yields (Kumar et al., 2014); temperature, precipitation and pollen emission are used to estimate olive production (Galan et al., 2004); meteorological variables are used to predict fruit number per tree in oranges (Pasqua et al., 2007); and degree-days and potential evaporation influence apple diameter (Kaack and Pedersen, 2010).
Incorporating environmental variables into growth curves becomes possible through non-linear mixed models (NLMM), characterizing typical fruit-growth patterns, and contemplating the possibility of incorporating random effects according to clustering levels, modelling the correlation error, and integrating covariates that characterize the growing season. In addition, NLMM are easier to interpret, consider parsimony, and validate regardless of the range of observed data (Pinheiro and Bates, 2000).
Fixed effects of the structural components in NLMM are based on longitudinal data obtained from equatorial diameters of certain fruits throughout their growth cycle. However, in this way, random effects estimation is conditioned to restrictions imposed by the initial fruit selection. If the models have predictive purposes, it is necessary to generalize those to a larger population. This is achieved by considering transversal measurements defined as random samples of equatorial fruit diameters in a reduced number of growth cycle moments (Tassile et al., 2015).
Moreover, error correlation structure needs to be included since randomizing the time factor is not possible for repeated measurements over time on the same fruit.
The inclusion of random parameters on different clustering levels could significantly improve model fit. Furthermore, stochastic residual variability present in a significant magnitude indicates that some fixed effects like genetics, topography, or climate factors have been ignored by the model (Calama and Montero, 2005). Information on these factors is not usually available in growth data, therefore, random effects can approximate these fixed effects, either unknown or difficult to quantify.
Random effects are also useful to obtain Empirical Bayesian Estimators (EBEs). EBEs can examine groups of interest, subpopulations, or reveal the existence of quantifiable fixed effects not considered but detected through associations between these and possible sources of variability in any of the levels considered of the random effects. Environmental covariables arise here as good alternatives to characterize the effects of site and growth cycle.
The literature cites several studies based on NLMM explaining pear growth considering correlated data and random effects (Tassile et al., 2002), blueberry growth analyzing three different cultivars in low temperature environments (Godoy et al., 2008); prediction of crop yield of young eucalyptus plantations and description ofsite quality (Carrero et al., 2008), among others. This study aimed to include random effects and environmental covariates to NLMM for example, explaining pear growthin “Williams” pears in the Upper Valley of Río Negro and Neuquén.
Longitudinal and Transversal data
This study was conducted in seven “Williams” pear orchards located in the upper valley of Río Negro and Neuquén, during 16 seasons for longitudinal data (9 seasons for the modeling base and 7 seasons for the validation base). The orchards are managed under usual regional practices in terms of pruning, sanitary treatments, thinning, and fertilization. Therefore, the models were fit to trees with optimal fruit load preventing fruit load from being considered a factor affecting them.
Considering each orchard and season, some trees were randomly selected and fruits of three size strata (small, medium and large) were carefully chosen. To define size strata, maximum and minimum size of 250 fruits was recorded and size range was calculated and later divided into small, medium and large categories. In total, considering all seasons, 46 trees and 1,599 individualized fruits were measured, from which 17,667 records of equatorial diameter were obtained.
Each fruit was identified and its equatorial diameter was weekly measured with an electronic digital caliper (Essex Stainless Hardened). Measurement moments are referenced as days after full bloom (DAFB), starting 25-30 DAFB with average diameters of 10-15 mm and finishing at harvest time. Two different sides of the equatorial zone of the fruit were considered. Each fallen fruit was replaced by another of similar size and the same tree.
Transversal data were obtained from eight sites that arise from the possible combinations between four seasons and two orchards. In December, at approximately 70 DAFB, some pear trees were selected and equatorial diameter of all their fruits was measured. Then, at harvest, those fruits were collected recording equatorial diameter and weight.
Environmental data
In order to consider potential environmental covariates, daily average temperature was calculated from maximum and minimum temperature recorded from August 1st to January 31st for each of the 16 seasons considered. Data was obtained at the agroclimatic station of INTA Alto Valle.
From daily temperature, 976 thermal accumulation indices (ITAC) were generated for different moments in the season, and 594 ITAC for December in all seasons. These indixes depend on varying criteria and according to accumulation starting point, accumulation completion, and accumulation base temperature.
Data analysis
Model fittings were done using NLME and LMER library of R software (R CORE TEAM; 2018). Likelihood is obtained by Taylor expansion according to the Best Linear Unbiased Estimates (BLUPs) linearization method proposed by Lindstrom and Bates (1990) to obtain NLME estimates with random factors. Models obtained by NLME present nested effects, or a very simple cross-effects structure, allowing direct heteroscedasticity modelling and residual correlation.
Then, the best model was selected based on the Information Criteria Index (ICI). The significance of fixed effects was evaluated by t-tests, based on Wald statistics with degrees of freedom obtained from the Between-Within method.
The NLMER function of the R package lme4 version 1.1-7 was used to obtain estimations. This function calculates likelihood from LAPLACE approximation for multilevel NLME, both nested and crossed.
When necessary, dummy variables were generated to model the fixed effects of the parameters, when it was impossible to obtain them directly from the NLMER function.
Once the estimation process was finished, the variance component structure was evaluated by the ICI, while fixed effects inference was made from likelihood ratio tests (LRT).
Since the obtained models have predictive objectives, the final choice is conditioned by the performance of the predictive criteria (PC) obtained for each candidate model.
The general environmental covariate base was built with a function obtaining representative values for each season based on TCIs. Environmental covariate candidates were selected from the correlations between random effects EBEs at site level and the general environmental covariate. The final selection consists of visual insights into the distribution of candidate environmental covariates and their relation to the EBEs under consideration.
Selected environmental covariates are considered in the construction of new NLME as fixed effects and considering present and/or absent terms in the random effects variance structure.
Fitted model from longitudinal data
The obtained models forecast fruit size as equatorial diameter according to the DAFB, unlike other authors who suggest measuring the fruit weights or volumes (Lakso et al., 1995). In particular, growth is described by a logistic curve in third parameterization, agreeing with Bramardi et al. (1997) and Giménez (2012) who studied pears cv. ‘Williams’ and ‘Packham’s Triumph’ in the same region. Other authors have adjusted double sigmoid curves (Magein, 1989), expolinear models (Lakso et al., 1995), negative exponential functions (Zadravec et al., 2014), or quadratic functions (Atay et al., 2010).
The factors used to fit the logistic model were SITE (as the combination between seasons and orchards); TREES (which were randomly selected in each SITE); FRUIT (which were randomly selected for each TREE from 3 strata of SIZES). Based on this, the proposed model (equation 1) included SIZE as fixed effect for parameters of the structural components and random effect generated for FRUIT, TREE, SIZE*TREE and SITE. The least was incorporated to extend the inference space beyond the observed range.
With
Where
β1 parameter inversely related to the upper asymptote;
β2 parameter that relates the upper asymptote (UA) to the lower asymptote (LA), through the function: eUA /LA;
β3 parameter related to the growth rate from the initial values to the final values;
βi, SIZEkeffect of the jth size for the ith parameter of the fixed effect (i: β1, β2, β3; k: small, medium and big);
bi, SITEjeffect of the jth site for the ith parameter of the random effect with bi, SITEj ~ N ([0,0,0], DTREE);
bi, TREE (SITE)l(k)effect of the ith tree within the jth place of the ith parameter of the random effect with bi, TREE(SITE)l(j) ~ N ([0,0,0], DTREE);
bi, SIZE*TREE(SITE)kl(j) Interaction between the kth size and the lth tree of the jth site for the random effect parameter with bi, SIZE*TREE(SITE)kl(j) ~ N ([0,0,0], DSIZE*TREE );
bi, FRUTOm(jlk) Effect of the mth fruit of the jth site, the kth size and the lth tree of the random effect with bi, FRUTOm(jlk) ~ N ([0,0,0], DFRUIT);
Ɛjlk Random error with Ɛjlk ~ N (0, Ϭ2) y Ϭ2= Ϭ2 Inixni where Inixni is an identity matrix and ni is the number of repeated measurements per fruit
Alternative models are shown in table 1. The ICI values suggested model 2 as the best option. Then, in order to consider the possible correlation, different models were proposed based on Model 2 (table 2). New ICI values suggested model 2b as the best model.
Fixed effects, random effects and ICIs obtained for the fitted models.
Specifications of models 2a, 2b and 2c, and their ICIs.
Specifications of models 2_1, 2_2, 2_3, 2_4, 2_5 and 2_6 and their ICIs.
Transversal data entry
The new candidate models incorporated an additional category in SIZE and a variable called TYPE, discriminating between longitudinal and transversal data (table 3). Using both transversal and longitudinal measurements aimed to leverage strengths of each data source. On one hand, longitudinal measurements contribute to estimating shape parameters. On the other hand, transversal measurements improve variability estimation of random effects, generalizing inference to a broader population.
The best model was Model 2_3, even with poor predictive performance while Model 2_4 presented low significance in some terms. Therefore, Model 2_5 and Model 2_6 were the best candidates. However, inconsistencies between the PC and IC have already been found in other studies modeling error covariance-variance structure (Huang and Meng, 2009; Meng and Huang, 2012). These researches emphasize the purpose of judging the candidate models, i.e. if predictive models are sought, using CI is more convenient, while if predictions are sought, PC constitutes a better option.
In this study, the random effect of the fruit competed with modelling the residual structure of the error when considering serial correlation over time for the same fruit, generating the inconveniences obtained between ICIs and PCs. A similar situation occurred when estimating more than one random effect from fruits singly measured (transversal data). For this, we proposed a modelling strategy generating differential random effects (bi) considering data nature as in Models 2_4, 2_5 and 2_6 (table 4). The best model with the best parsimony was Model 2_10, followed by Model 2_11.
Specifications of models 2_7, 2_8, 2_9, 2_10 and 2_11 and their ICIs.
Environmental covariates
The incorporation of environmental covariates explaining bi at site level was evaluated to increase model predictive capacity. The TCIs were considered at two moments of fruit growth, after full bloom and during the month of December (65 to 95 DAFB). Previous studies have shown the differential effect generated by TCIs, depending on the phenological moment of fruit growth. High temperatures in the early stages of growth produce larger fruits, whereas high temperatures in December result in smaller fruits (Tassile et al., 2005).
Considering all TCIs, a variable selection process was used based on their relationship with the empirical Bayesian estimators of bLONG,3,SITEj and bTRANS,3,SITEj of model 2_10 and model 2_11. A total of three TCIs were selected: i) daily temperature accumulation exceeding 16 degrees from 46 DAFB to 100 DAFB (COV_1) in early stages of fruit growth; ii) daily temperature accumulation exceeding 22 degrees from 122 DAFB to 150 DAFB (COV_2) in December; iii) daily temperature accumulation exceeding 17 degrees from 60 DAFB to 100 DAFB (COV_3) in early stages of fruit growth.
Many models were proposed (table 5), all of them having β1TYPEr, β3TYPEr as fixed effects. The selected model parameters are shown on table 6 and figure 1. The models that include TCIs require information after the common moment of diameter measurement. Measurements are made from 65 DAFB to 79 DAFB, while some TCI contemplated in the proposed models can only be obtained from 95 DAFB onwards. For this reason, forecasting between 80 and 95 days after full bloom, the models with TCI cannot be used, while if the forecast is made after 95 DAFB, models including them can be used. However, we observed that model predictive capacity not including TCIs depends to a great extent on the prediction moment, while models including TCI showed high predictive performances in all situations considered.
Specifications of models 2_12, 2_13, 2_14, 2_15, 2_16, 2_17, 2_18, 2_19, 2_20 and 2_21.
The incorporation of environmental covariates is of the form: βi+COV_i*TCI.
Parameter estimation for fitted models.
ϬbX,1i, ϬbX,2,i and ϬbX,3,i are random effects at SITE level. ϬbX,1ij, ϬbX,2,ij and ϬbX,3, ij are random effects at FRUIT level.
(1) COV_1; (2)COV_2; (3)COV_3; (NS)Not significance; (**)Significance (p<0.01). Test based on Wald statistics with normal asymptotic distribution.
Fruit growth model 2_15 for seasons in the modeling base for longitudinal data. Shows the fruit sizes (mm) as a function of days after full bloom.
The good performance of models contemplating TCI can be explained by temperature effects on cell expansion throughout December and, therefore, in final fruit size obtained, in accordance with Tassile et al. (2005) and Giménez, (2012).
Generally, variables expressing temporal or local spatial fluctuations increase accuracy and reduce NLME prediction bias. The results obtained in this work agreed with those found in other study areas such as forester (Crecente Campo et al., 2010; Calama and Montero, 2005; Newton and Amposah, 2007); in pharmacodynamics (Mandema et al., 1992; Wakefield, 1996; Wahlby, 2002; Ribbing and Jonsson, 2004; Bonate, 2006; Bertrand et al., 2011; Hagenbunch, 2011), in epidemiology (Distiller et al., 2010); in animal production (Furtado Campos et al., 2014). Also, previous results on the differential effect of thermal accumulation on fruit phenological moments were corroborated. This is how in early growth stages, high temperatures stimulate cell growth and fruit potential size, while at the end of the cycle high temperatures negatively affect growth rates (Tassile et al., 2005; Tassile et al., 2006; Rodriguez, 2009).
Multilevel NLMM showed the advantage of discerning and quantifying stochastic variability sources at different levels, obtaining better ICI and CP in comparison to models considering a single level of variability in random effects.
The candidate models resulting from modelling both transversal and longitudinal data were multilevel models with random effects at site and fruit level. In addition, the capacity of some multilevel models without considering TCIs to achieve high ICI was very interesting. This occurred due to the ability of site-level random effects to account for stochastic variability without requiring measurement or identification of the variation source.
The incorporation of TCIs characterizing different moments of fruit growth to the NLMM was favorable.
Fixed effects, random effects and ICIs obtained for the fitted models.
Specifications of models 2a, 2b and 2c, and their ICIs.
Specifications of models 2_1, 2_2, 2_3, 2_4, 2_5 and 2_6 and their ICIs.
Specifications of models 2_7, 2_8, 2_9, 2_10 and 2_11 and their ICIs.
Specifications of models 2_12, 2_13, 2_14, 2_15, 2_16, 2_17, 2_18, 2_19, 2_20 and 2_21.
The incorporation of environmental covariates is of the form: βi+COV_i*TCI.
Parameter estimation for fitted models.
ϬbX,1i, ϬbX,2,i and ϬbX,3,i are random effects at SITE level. ϬbX,1ij, ϬbX,2,ij and ϬbX,3, ij are random effects at FRUIT level.
(1) COV_1; (2)COV_2; (3)COV_3; (NS)Not significance; (**)Significance (p<0.01). Test based on Wald statistics with normal asymptotic distribution.
Fruit growth model 2_15 for seasons in the modeling base for longitudinal data. Shows the fruit sizes (mm) as a function of days after full bloom.