Abstract: In this study, genetic programming (GP) was used as a systematic method to identify the structure and parameters of a mathematical model of ethanol fermentation. The model simulates the effect of temperature on the kinetics of batch ethanol fermentation and was used to find the optimum temperature for process performance. Saccharomyces cerevisiae CSI-1 grown in cane molasses-based media was the microorganism used in all experiments. Obtaining a model that describes the experimental observations precisely required estimating both its structure (mainly non-linear) and its constant parameters. The resulting model describes the fermentation kinetics and gave fair predictions of dry cell weight (DCW), colony forming units/mL (CFU/mL), residual sucrose (RS), residual glucose (RG), and ethanol concentration (E). The model was then used to optimize the operating conditions of the process. In terms of mean square error (MSE) and sum of squared errors (SSE), the model predictions fitted the experimental data well, with fitness values in a range of
Keywords: Ethanol fermentation, genetic programming, modeling, Saccharomyces cerevisiae, temperature effect, optimization.
Modeling the effect of temperature on ethanol fermentation using Saccharomyces cerevisiae CSI-1 by genetic programming
Received: 02 June 2022
Accepted: 07 November 2022
Published: 31 October 2023
To obtain maximum yields from ethanol fermentation, it is crucial to optimize the operating conditions of the process (Asaithambi et al., 2021; Ciesielski & Grzywacz, 2021; Chen et al., 2021; Estevão et al., 2021; Gao et al., 2022; Li et al., 2021; Rodman & Gerogiorgis, 2020; Shen et al., 2021; Urtubia et al., 2021). Among the key parameters, fermentation temperature plays a significant role in achieving high productivity. Because ethanol production is exothermic, the heat released raises the temperature of the broth and thus affects the metabolism of the microorganism employed. In tropical regions, where ambient temperatures are high, keeping the fermentation within the optimal range of 25-35 °C is challenging and can be economically unviable because of the substantial energy required to prevent heat-induced inactivation of the yeast cells (Banat et al., 1992); these cooling requirements increase the cost of controlling fermentation heat (Liu et al., 2019). Therefore, robust microorganisms are sought that can efficiently ferment substrates at high temperatures while tolerating high ethanol concentrations. Industrial ethanol fermentation relies on sugars derived from starchy materials, sugarcane juice, and molasses (Lopes et al., 2016; McAloon et al., 2000; Samaniego-Sánchez et al., 2020), as well as on specific substrates for distilled alcoholic beverages (Solís-García et al., 2017). In industrial zones, it is essential to determine which products are feasible given the available raw materials and established technologies (Zhao et al., 2020; González-Herrera et al., 2016). In ethanol production, sugarcane resolves the concern of raw material availability, but temperature remains a critical factor to be studied (Rivera et al., 2017). Modeling has been used to describe cell growth as a function of temperature (Abunde et al., 2019; Nor-Khaizura et al., 2019; Pereira et al., 2020). 
However, the outcome of simulation and optimization tools depends heavily on the quality of the mathematical model (Carrillo-Ahumada et al., 2020; Castillo-Santos et al., 2017; Darvishi et al., 2020; Díaz & Tost, 2018; Goelzer et al., 2009; Hebing et al., 2020; Jorayev et al., 2022; Meng et al., 2021; Müller et al., 2020; Rodríguez-Mariano et al., 2015; Salmi et al., 2021; Torralba-Morales et al., 2020; Vignesh & Chandraraj, 2021; Wu et al., 2015). Fermentation studies are typically carried out under ideal laboratory conditions (e.g., synthetic media, stirring devices, heating modes), whereas it is preferable to consider conditions representative of industrial processes (real fermentation media, steady state, etc.) (de Andres-Toro et al., 1997). Modifying a process introduces new situations that require extensive experimentation to generate the data needed to build new models. Developing a phenomenological process model is also challenging because of the limited understanding of the physicochemical phenomena and the associated kinetic and transport mechanisms (Cheema et al., 2002), and the nonlinear dynamics of the process complicate modeling further (Feil et al., 2004). Given these challenges, hybrid approaches have emerged, such as genetic programming (GP), an evolutionary artificial intelligence technique that develops mathematical models from input-output data, in contrast to conventional regression and neural network modeling techniques (Babu & Karthik, 2007). The study and modeling of ethanol fermentation processes have proven effective in improving product quality, enhancing process control, and reducing costs (Fan et al., 2015). GP has been used successfully to model the glucose to gluconic acid bioprocess, resulting in increased overall productivity and improved interaction between dissolved oxygen and fungal mycelia (Babu & Karthik, 2007). 
In this research, GP was employed to develop a mathematical model that describes the impact of temperature on microorganism growth and ethanol yield. The approach followed the work of Madár et al. (2005), which focused on identifying the structure and parameters of a mathematical model using experimental data. Abonyi (2005) developed a MATLAB toolbox for this purpose, which was utilized in this research without modification. The algorithm selected for parameter and structure identification does not require a pre-defined experimental design and produces satisfactory results with limited data for correlation, as demonstrated in the work of Ramírez-Hernández et al. (2017). This methodology has found applications in various domains such as algorithms, biotechnology, computing, process control, data mining, and modeling (Banzhaf et al., 1998; Dorgo et al., 2021; Kumar et al., 2014). Germec et al. (2020) and Esfahanian et al. (2016) conducted parameter identification based on the Gompertz equation. However, this research focuses on establishing correlations using available data specific to the alcoholic fermentation process, rather than relying on a pre-defined mathematical structure.
Mathematical models play a crucial role in many scientific disciplines because they describe and predict the behavior of a system under different conditions in terms of its variables. In this research, the symbolic optimization algorithm known as GP was employed, following the approach proposed by Madár et al. (2005), which identifies both the structure and the parameters of a model from experimental data. The key elements of this approach are the following:
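As a minimal sketch, assuming the linear-in-parameters formulation of Madár et al. (2005), the identification problem can be written as below, where $N$ is the number of samples, $y_k$ and $\hat{y}_k$ are the measured and calculated outputs, $\mathbf{x}_k$ is the vector of regression variables, $F_i$ are the $m$ nonlinear functions encoded by the tree branches, and $p_i$ are the model parameters. These forms are assumed and are not necessarily identical to the equations implemented in the toolbox:

$$\min_{\theta}\; \mathrm{MSE}(\theta) = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2, \qquad \hat{y}_k = \sum_{i=1}^{m} p_i\,F_i(\mathbf{x}_k)$$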
The objective function ("Eq. 1") is based on the mean square error (MSE) between the calculated and measured output values, where $\theta$ denotes the decision vector. In "Eq. 2," $N$ is the number of samples used for model identification, $y_k$ is the experimental output, $\hat{y}_k$ is the calculated output, $k$ is the sample index, and $\mathbf{x}_k$ is the vector of regression variables. In "Eq. 3," $F_i$ are the nonlinear functions and $p_i$ the model parameters. GP is a systematic method, inspired by natural evolution, that automatically generates programs and expressions to identify mathematical models for specific problems (Koza & Poli, 2005). These expressions are encoded as tree data structures, with functions as internal nodes and terminals as leaf nodes, which allows nonlinear input-output models to be generated. An orthogonal least squares (OLS) algorithm is applied to estimate the contribution of each tree branch and obtain a precise model (Brameier & Banzhaf, 2007). To develop a suitable model for ethanol fermentation and similar systems (Datta et al., 2019; Grosman & Lewin, 2004; Kummer et al., 2019), the GP MATLAB™ toolbox (available at http://www.mathworks.com/matlabcentral/fileexchange/47197-genetic-programming-matlab-toolbox) (Abonyi, 2005) was used. The GP parameters are presented in "Table 1."
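The orthogonal least squares step can be illustrated with the sketch below. It assumes that the outputs of the tree branches, $F_i(\mathbf{x}_k)$, have already been evaluated into the columns of a matrix; the function name `ols_with_err` is hypothetical and this is not the toolbox's implementation. Classical Gram-Schmidt orthogonalization gives each branch an error reduction ratio (ERR), the fraction of the output energy it explains, which is the kind of quantity used to judge whether a branch contributes to the model; the parameters $p_i$ are then recovered by back-substitution.

```python
import numpy as np


def ols_with_err(F, y):
    """Orthogonal least squares (classical Gram-Schmidt) for y ~ F @ p.

    F: (N, m) array; column i holds the branch output F_i(x_k) for each sample k.
    y: (N,) array of measured outputs y_k.
    Returns the parameters p, the error reduction ratio (ERR) of each branch,
    and the mean square error of the fitted model.
    Assumes the columns of F are linearly independent.
    """
    F = np.asarray(F, dtype=float)
    y = np.asarray(y, dtype=float)
    N, m = F.shape
    W = np.zeros((N, m))   # orthogonalized regressors w_i
    A = np.eye(m)          # unit upper-triangular matrix with F = W @ A
    for i in range(m):
        w = F[:, i].copy()
        for j in range(i):
            # projection coefficient of column i on the earlier orthogonal w_j
            A[j, i] = (W[:, j] @ F[:, i]) / (W[:, j] @ W[:, j])
            w -= A[j, i] * W[:, j]
        W[:, i] = w
    norms = np.sum(W**2, axis=0)          # <w_i, w_i>
    g = (W.T @ y) / norms                 # coefficients in the orthogonal basis
    err = g**2 * norms / (y @ y)          # share of output energy explained by each branch
    p = np.linalg.solve(A, g)             # A @ p = g recovers the original parameters
    mse = float(np.mean((y - F @ p) ** 2))
    return p, err, mse
```

Branches with a negligible ERR are natural candidates for pruning. A zero-norm orthogonal column (a branch that duplicates earlier ones) would cause a division by zero, hence the independence assumption.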
In this research, the variables are considered decision variables. In addition to the objective of "Eq. 1," the following fit metrics are used: the coefficient of determination ($R^2$, "Eq. 4"), which describes the agreement between experimental and calculated data; the root mean squared error (RMSE, "Eq. 5"); and the sum of squared errors (SSE, "Eq. 6"). The objective is to develop a model that explains the impact of temperature on ethanol fermentation productivity, particularly how temperature influences microbial growth and, consequently, the ethanol concentration of the system over time.
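The fit metrics can be computed as in the following sketch, which assumes the standard definitions $\mathrm{SSE}=\sum_k (y_k-\hat{y}_k)^2$, $\mathrm{MSE}=\mathrm{SSE}/N$, $\mathrm{RMSE}=\sqrt{\mathrm{MSE}}$ and $R^2 = 1-\mathrm{SSE}/\sum_k (y_k-\bar{y})^2$. The exact forms of Eqs. 4-6 may differ (for example, $R^2$ is sometimes taken as the squared Pearson correlation), and the function name `fit_metrics` is hypothetical.

```python
import numpy as np


def fit_metrics(y_exp, y_calc):
    """Standard goodness-of-fit metrics between measured and calculated values."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    residuals = y_exp - y_calc
    sse = float(np.sum(residuals**2))                  # sum of squared errors
    mse = sse / y_exp.size                             # mean square error
    rmse = mse**0.5                                    # root mean squared error
    sst = float(np.sum((y_exp - y_exp.mean()) ** 2))   # total sum of squares
    r2 = 1.0 - sse / sst                               # coefficient of determination
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "R2": r2}
```

For example, `fit_metrics([1, 2, 3, 4], [1, 2, 3, 5])` gives SSE = 1, MSE = 0.25, RMSE = 0.5, and R² = 0.8.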
The article is structured as follows: Section 2 describes the experimental and computational methodologies, Section 3 presents the results and discussion, and Section 4 provides some remarks and conclusions.
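Without loss of generality, a multi-objective optimization problem is stated as follows:
$$\min_{\theta \in \mathbb{R}^n} J(\theta) = \left[J_1(\theta), \ldots, J_m(\theta)\right]$$
subject to $g(\theta) \le 0$, $h(\theta) = 0$, and $\underline{\theta}_i \le \theta_i \le \overline{\theta}_i$ with $i = 1, \ldots, n$, where $\theta$ is the decision vector, $J(\theta)$ is the objective vector, $g(\theta)$ and $h(\theta)$ are the inequality and equality constraint vectors, respectively, and $\underline{\theta}_i$, $\overline{\theta}_i$ are the lower and upper bounds of the decision space.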
Since no single solution is optimal for all objectives simultaneously, a set of solutions called the Pareto set is defined. Each solution in the Pareto set maps to an objective vector on the Pareto front, and all of these solutions are Pareto-optimal, that is, non-dominated.
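Dominance can be checked directly. Under the minimization convention of the problem statement, an objective vector $J(\theta^a)$ dominates $J(\theta^b)$ if it is no worse in every objective and strictly better in at least one; the Pareto set keeps the solutions that no other solution dominates. A minimal, quadratic-time sketch (function names hypothetical):

```python
import numpy as np


def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimization."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return bool(np.all(a <= b) and np.any(a < b))


def non_dominated(J):
    """Indices of the rows of J (solutions x objectives) that no other row dominates."""
    J = np.asarray(J, dtype=float)
    return [
        i for i in range(len(J))
        if not any(dominates(J[j], J[i]) for j in range(len(J)) if j != i)
    ]
```

For instance, `non_dominated([[1, 5], [2, 3], [3, 4], [4, 1], [2, 2]])` returns `[0, 3, 4]`, because (2, 3) and (3, 4) are both dominated by (2, 2).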
A design procedure employing multi-objective optimization techniques typically consists of three fundamental steps: 1) stating the multi-objective problem (MOP), 2) conducting the multi-objective optimization (MOO) process, and 3) performing the multi-criteria decision making (MCDM) stage (Meza et al., 2017).
• MOP statement
At this stage, the designer must make decisions regarding the design concept to address the problem, how to evaluate the performance of design alternatives, and which solutions are relevant, practical, or feasible. In the case of ethanol fermentation, the design concept refers to the operating conditions, while a design alternative corresponds to specific time and temperature settings. Performance measurement requires a parametric model that correlates the decision variables (which lead to specific design alternatives) with their performance.
• MOO process
During this stage, the multi-objective optimization algorithm is applied to the MOP. The algorithm can be designed ad hoc or selected from the pool of available algorithms; it is considered suitable for the problem at hand if it has desirable characteristics such as convergence, diversity, and relevance.
• Decision-making stage
Finally, with the approximate Pareto front, the designer will evaluate the trade-offs between conflicting design objectives and consider the design alternatives. The goal is to select a solution that strikes a preferable balance in performance for the specific problem. Procedures and visualization tools play a crucial role in assisting designers, particularly when dealing with four or more design objectives.
Saccharomyces cerevisiae CSI-1 (abbreviated as CSI) yeast was used as the microorganism in this study. A vial of CSI stock culture, stored at -10 °C, was thawed and used to refresh the cells in tubes containing 10 mL of culture medium with 20 g/L glucose and 5 g/L yeast extract (YE). The medium was sterilized at 121 °C for 20 min prior to inoculation, and the tubes were then incubated at 37 °C for 24 h in a Shel Lab Sl incubator. Subcultures were performed monthly.
The refreshed culture served as the seed to prepare the inoculum in 200 mL of growth medium containing 30 g/L glucose and 5 g/L YE. The inoculum was cultivated in a Shel Lab Sl incubator at 37 °C for 24 h. The pre-culture was then centrifuged at room temperature in a high-speed centrifuge (Kubota model CR21G) at 6000 × g for 5 min to harvest the cells. The cell pellet was resuspended in 100 mL of sterilized water and used as the inoculum. The fermentation medium consisted of 5 g/L YE and 100 g/L molasses (70 g/L sucrose and 30 g/L glucose). The molasses contained a low concentration of fructose, which was not monitored. The medium was autoclaved at 121 °C for 20 min.
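As a point of reference for the measured ethanol concentrations, the theoretical maximum ethanol obtainable from this medium can be estimated from Gay-Lussac stoichiometry. The sketch assumes complete conversion of the stated 30 g/L glucose and 70 g/L sucrose, ignores the unmonitored fructose, and neglects sugar diverted to biomass and by-products, so real fermentations yield less. The function name is hypothetical.

```python
# Standard molar masses (g/mol)
M_GLUCOSE = 180.156   # C6H12O6
M_SUCROSE = 342.297   # C12H22O11
M_ETHANOL = 46.068    # C2H5OH


def theoretical_ethanol(glucose_g_per_l, sucrose_g_per_l):
    """Upper bound on ethanol (g/L) assuming complete stoichiometric conversion.

    Glucose:  C6H12O6 -> 2 C2H5OH + 2 CO2
    Sucrose:  C12H22O11 + H2O -> 2 C6H12O6 -> 4 C2H5OH + 4 CO2
    Biomass formation and by-products are ignored.
    """
    from_glucose = glucose_g_per_l * 2 * M_ETHANOL / M_GLUCOSE
    from_sucrose = sucrose_g_per_l * 4 * M_ETHANOL / M_SUCROSE
    return from_glucose + from_sucrose
```

With the medium described above, `theoretical_ethanol(30, 70)` gives about 53.0 g/L (about 15.3 g/L from glucose and 37.7 g/L from sucrose).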
The harvested cells from the inoculum were transferred to the fermenter to start the fermentation. Ethanol fermentations were conducted in a 1 L fermenter with a working volume of 800 mL at 30, 33, 35, and 37 °C, with an agitation speed of 100 rpm. Optical density, temperature, and pH were monitored and recorded throughout the fermentation. Samples were withdrawn every 3 h, except at 12 h and 15 h, because these times corresponded to the logarithmic phase, in which the monitored parameters were predictable, stable, and reproducible.
The growth of CSI yeast was monitored by measuring the optical density (OD) of the cells at 575 nm with a UV-Vis spectrophotometer, and OD was correlated with dry cell weight.
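The form of the OD-DCW correlation is not stated. A common choice, assumed here, is a linear calibration DCW = a·OD + b fitted by least squares on paired OD readings and gravimetric DCW measurements within the linear range of the spectrophotometer. The function names are hypothetical, and no calibration data from this study are implied.

```python
import numpy as np


def fit_od_to_dcw(od, dcw):
    """Least-squares line DCW = a * OD + b from paired calibration measurements."""
    a, b = np.polyfit(np.asarray(od, dtype=float), np.asarray(dcw, dtype=float), deg=1)
    return float(a), float(b)


def od_to_dcw(od, a, b):
    """Convert OD readings to DCW with a previously fitted calibration line."""
    return a * np.asarray(od, dtype=float) + b
```

Once `a` and `b` are fitted, every OD reading taken during the fermentation can be converted with `od_to_dcw`; readings outside the calibrated OD range would normally be diluted first rather than extrapolated.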
Colony forming units (CFU/mL) were determined by the decimal serial dilution method (100 μL of sample in 900 μL of diluent), from 10¹- to 10⁷- or 10⁸-fold, in 1.5 mL tubes. An aliquot of 100 μL was spread on a PDA plate.
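The CFU/mL values follow from standard plate-count arithmetic: the colonies counted on a plate are scaled by the dilution factor of the tube that was plated and by the plated volume (100 μL = 0.1 mL here). A minimal sketch, with hypothetical function names and an illustrative colony count:

```python
import math


def cfu_per_ml(colonies, dilution_exponent, plated_volume_ml=0.1):
    """CFU/mL of the original sample from one plate count.

    colonies: colonies counted on the plate
    dilution_exponent: n for the 10^n-fold dilution tube that was plated
    plated_volume_ml: volume spread on the plate (100 uL = 0.1 mL)
    """
    return colonies * 10**dilution_exponent / plated_volume_ml


def log10_cfu_per_ml(colonies, dilution_exponent, plated_volume_ml=0.1):
    """Same count on the log10(CFU/mL) scale."""
    return math.log10(cfu_per_ml(colonies, dilution_exponent, plated_volume_ml))
```

For instance, 150 colonies on the plate from the 10⁶-fold dilution give 150 × 10⁶ / 0.1 = 1.5 × 10⁹ CFU/mL, that is, about 9.18 log10 CFU/mL.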
Fermentable sugars (glucose and sucrose) and ethanol were analyzed by an enzymatic method with a BF-5D analyzer (Oji Scientific Instruments Co., Ltd., Japan).
This section presents the computational methodology used in this work. First, the experimental dynamics of the fermentation process were examined using the original experimental data. Then, based on these data and using GP (Banat et al., 1992), a nonlinear mathematical model was identified for the responses dry cell weight (DCW), residual glucose (RG), residual sucrose (RS), ethanol (E), and colony forming units/mL (CFU/mL) as functions of time and temperature. Next, the model was validated by computing correlation metrics between the model and the experimental data. Finally, multi-objective optimization was used to check the correspondence between the experimental data and the optimization results.
The experimental dynamics of the fermentation process are presented using the experimental characteristics DCW, RG, RS, E, and CFU/mL (see "Tables 2-6" and "Figures 1-2").
"Table 2" displays the dynamics of DCW as a function of time and temperature. DCW and CFU/mL vary within a limited range and are strongly correlated. Low values of DCW and CFU/mL are preferable for achieving high specific productivity; however, DCW, or its equivalent CFU/mL, correlates with the volumetric productivity, represented by the ethanol concentration (E) ("Figure 1"). Ethanol (E) can therefore be used as a reference for analyzing the coupling of the other parameters. This is a convenient way to track the fermentation kinetics, since E is the final product of substrate metabolism. Three intervals can be identified for E ("Figure 2"): a) a high interval (E > 40 g/L), b) an intermediate interval, and c) a low interval. For the first interval (a), high values of DCW, low values of RG, low values of RS, and high values of CFU are observed, as expected, since higher CFU leads to higher ethanol production.
For the second interval (b), high and medium values of DCW, low values of RG, high values of RS, and high values of CFU are observed. In contrast, for the third interval (c), low and medium values of DCW; high, medium, and low values of RG; high values of RS; and high, medium, and low values of CFU are observed. This trend is also reasonable, because lower CFU results in lower production of E. One solution would be to let the fermentation continue for longer; however, this is unfavorable from an industrial perspective. This behavior arises because the yeast acts as a biocatalyst that consumes the substrate according to the reaction rate. Owing to the diauxic phenomenon, glucose is consumed first, and sucrose is metabolized once the initial glucose in the molasses is depleted. At the end of fermentation, the amount of E produced is correlated with the consumed substrate, the cell concentration (DCW), and the operating temperature.
"Table 3" shows the dynamics of RG as a function of time and temperature. RG decreases with time, with the most drastic decrease at . Similar RG values occur at: (a) g/L, (b) RG = 28 g/L, (c) RG = 0.2 g/L, and (d) RG = 0 g/L.
For (a), the operating conditions are T = 33 °C, t = 0 h and T = 35 °C, t = 0 h. For (b) RG = 28 g/L, they are T = 30 °C, t = 0 h and T = 37 °C, t = 0 h. For (c) RG = 0.2 g/L, they are T = 30 °C, t = 24 h; T = 35 °C, t = 21 h; and T = 37 °C, t = 21 h. Finally, for (d) RG = 0 g/L, they are T = 33 °C, t = 24 h and T = 37 °C, t = 24 h. Glucose concentration decreases because the cells metabolize glucose for growth and reproduction and convert it into ethanol. Glucose is preferred over sucrose as a substrate because it is thermodynamically favorable.
"Table 4" displays the dynamics of RS as a function of time and temperature. RS decreases with time, with the most significant changes occurring at . RS shows similar values for all operating conditions; however, the conditions T = 33 °C, t = 12 h and T = 37 °C, t = 24 h do not follow the same trend as the others. As part of normal yeast metabolism of glucose and sucrose, the cells undergo a diauxic shift in which glucose is preferred over sucrose (Peng et al., 2015); only when glucose is depleted does the yeast start to consume sucrose. Saccharomyces cerevisiae can use hexoses and sucrose, but wild-type strains cannot ferment pentoses such as xylose. Therefore, this model does not apply when the substrate consists of fermentable sugars from lignocellulosic material (Nawaz et al., 2020).
"Table 5" presents the dynamics of ethanol (E) as a function of time and temperature. E increases with time, with the most notable changes occurring at . For the time interval at h, the production of E is twice that at h. Some operating conditions do not follow the same trend as the others: (a) E = 0 g/L at T = 35 °C, t = 0 h; (b) E = 0.8 g/L at T = 35 °C, t = 3 h; and (c) E = 6 g/L at T = 37 °C, t = 6 h. These small deviations are attributed to slight differences in the inoculum transferred to the main fermentation.
"Table 6" displays CFU/mL as a function of time and temperature. CFU/mL increases with time, but a stationary phase is observed at . This trend is similar across all operating conditions and is often observed in microorganisms when essential nutrients or cofactors are depleted. "Figures 1-2" illustrate the interrelationships among all the fermentation characteristics together.
DCW and CFU/mL vary within a limited range, and their influence on the other characteristics is reflected in ethanol production ("Figure 1"). Ethanol (E) can be taken as a reference to analyze the coupling of the other characteristics, for which three intervals are distinguished ("Figure 2"): a) E > 40 g/L, b) an intermediate interval, and c) a low interval. For the first interval (a), high values of DCW, low values of RG, low values of RS, and high values of CFU/mL are observed. For the second interval (b), high and medium DCW values, low RG values, high RS values, and high CFU/mL values are observed. Finally, for the third interval (c), low and medium values of DCW; high, medium, and low values of RG; high values of RS; and high, medium, and low values of CFU/mL are observed. This pattern arises because the yeast consumes the substrate according to the reaction rate: glucose is consumed first, and the diauxic phenomenon then shifts the reaction toward sucrose consumption. At the end of the fermentation, the ethanol produced is correlated with the substrate consumed and the temperature.
The experimental data were used in the GP MATLAB™ toolbox to obtain the mathematical model that describes the behavior of the fermentation process. The parameters of GP are described in "Table 1." The mathematical models are:
where DCW is dry cell weight, CFU is colony forming units, RG is residual glucose, RS is residual sucrose, E is ethanol concentration, t is time, and T is temperature, subject to the following restrictions: if DCW < 0 then DCW = 0; if if then ; if then and then . The mathematical models are compared with the experimental data in "Figure 3," which shows that the models fit the experimental data fairly well.
The fitness is higher than as reported in "Table 7." Therefore, "Eqs. 7-11" adequately represent the experimental data and are used to optimize the characteristics of the fermentation process.
In this research, the ethanol fermentation process is considered to present the following characteristics: (DCW) g/L, (RG) g/L, (RS) g/L, (E) g/L, and (Log10CFU/mL). These characteristics are the dependent variables of the process, and the independent variables are time (t) and temperature (T). The relationship between the dependent and independent variables is given in "Eqs 8-12". A similar methodology was followed by Ramírez-Hernández et al. (2017), who consider: a) a set of experimental data of the dependent variables as functions of the independent variables, b) identification of the structure and parameters of the model, and c) numerical optimization to find the best operating conditions. For step c), the multi-objective optimization problem in this work is posed as finding the values of X1 and X2 that solve:
where θ comprises time (X1) and temperature (X2), and the decision variables are bounded by and . The bounds on the decision variables define the search space in which the optimization algorithm looks for potential solutions; in this work, the search space was set according to the previous experiments ("Tables 2-6"). The potential solutions were obtained with the MATLAB optimtool/gamultiobj toolbox (multi-objective optimization using a genetic algorithm).
"Figures 4a-4b" present the operating conditions and the corresponding characteristics obtained from the model and the experimental data, showing a correspondence between them. "Figures 4c-4d" display the Pareto set of operating conditions obtained with the optimization algorithm and the corresponding Pareto front of characteristics obtained by simulation. The maximum values obtained from both experiments and simulation are also indicated.
The solution set is highly concentrated in the region corresponding to 10 to 15 h, but the maximum value is found at the operating conditions that satisfy °C and . One advantage of simulation is that it reveals operating conditions that could improve process performance, including regions that were not explored experimentally. The model was evaluated through experimentation and yielded satisfactory results.
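The study used gamultiobj, a genetic algorithm. For only two decision variables, a simpler alternative, swapped in here for illustration, is to evaluate the identified models on a dense grid over the bounds of t and T and keep the non-dominated points. In the sketch, the model callables are hypothetical stand-ins for the identified equations, all objectives are assumed to be maximized, and bounds such as t in [0, 24] h and T in [30, 37] °C are assumptions based on the experimental range.

```python
import numpy as np


def pareto_grid_search(models, t_bounds, T_bounds, n=50):
    """Evaluate objectives on an n x n (t, T) grid and keep non-dominated points.

    models: list of vectorized callables f(t, T); every objective is MAXIMIZED
            (hypothetical stand-ins for the identified models)
    t_bounds: (min, max) time in h; T_bounds: (min, max) temperature in deg C
    Returns (X, J): decision points [t, T] and their objective values.
    """
    t = np.linspace(t_bounds[0], t_bounds[1], n)
    T = np.linspace(T_bounds[0], T_bounds[1], n)
    tt, TT = np.meshgrid(t, T)
    X = np.column_stack([tt.ravel(), TT.ravel()])
    J = np.column_stack([np.asarray(f(X[:, 0], X[:, 1]), dtype=float) for f in models])
    keep = np.ones(len(J), dtype=bool)
    for i in range(len(J)):
        # point i is dominated if another point is >= in all objectives and > in one
        no_worse = np.all(J >= J[i], axis=1)
        better = np.any(J > J[i], axis=1)
        keep[i] = not np.any(no_worse & better)
    return X[keep], J[keep]
```

For example, with `models = [lambda t, T: t, lambda t, T: T]` the only non-dominated point is the corner (24 h, 37 °C). Objectives that should be minimized (for example, a residual substrate) can be passed negated. A grid is exhaustive but scales poorly with more decision variables, which is where a genetic algorithm such as gamultiobj becomes preferable.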
These results reveal favorable operating conditions for an exothermic process such as ethanol fermentation. Operating ethanol fermentation above 37 °C is important for tropical countries because of its significant economic advantages. Since the cooling water available in tropical countries is typically at 30-35 °C, a fermentation run above 37 °C keeps a usable temperature difference with the cooling water and is therefore easier to control with heat exchangers. Operating above 37 °C is also economically advantageous for processes lasting less than 24 h, as it reduces the energy consumed by cooling equipment. The ability to operate at 37 °C is a characteristic of the selected microorganism, which can produce ethanol at temperatures as high as 45 °C. However, operating at such elevated temperatures is not advisable because it reduces the viability of the microorganism, and maintaining viability is crucial for reusing it over multiple operating cycles.
The values calculated with the mathematical models agree well with the experimental data, which supports the accuracy of the proposed model. Thus, the models can be applied to predict the kinetics of the fermentation process, including cell growth and ethanol production from sugarcane molasses, across a range of temperatures. Finally, optimizing the ethanol fermentation characteristics made it possible to identify feasible operating conditions.
The authors thank Ayaaki Ishizaki, Emeritus Professor of Kyushu University, for allowing the use of the strain Saccharomyces cerevisiae CSI-1, property of his company Necfer Corporation, Japan.
∗ Corresponding author. E-mail address: jcarrillo@unpa.edu.mx (Jesús Carrillo-Ahumada).