A comparative study on combinations of forecasts and their individual forecasts by means of simulated series

Aline Castelo Branco Mancuso; Liane Werner

Statistics

Received: 24 January 2018

Accepted: 28 March 2018

DOI: https://doi.org/10.4025/actascitechnol.v41i1.41452

Abstract: Over the years, several studies comparing individual forecasts with combinations of forecasts have been published. There is, however, no consensus in their conclusions. Furthermore, combination by regression has been little explored. This paper presents a comparative study of three combination methods and their individual forecasts. Based on simulated data, the accuracy of Artificial Neural Networks, ARIMA, and exponential smoothing models is evaluated, and combined forecasts are calculated through simple average, minimum variance, and regression methods. Four accuracy measures, MAE, MAPE, RMSE, and Theil’s U, were used to choose the most accurate method. The main contribution is showing the accuracy of combination by regression.

Keywords: combinations of forecasts, forecasting, accuracy, simulation.

Introduction

Forecasting is a process that involves uncertainty. To reduce that uncertainty, forecasts produced by a range of techniques and models can be incorporated into a single forecast, an approach known as the combination of forecasts (Martins & Werner, 2012). Since Bates and Granger (1969), combination techniques have been extensively compared, and several authors have concluded that the combination of forecasts is more accurate than the best individual model (Bates & Granger, 1969; Clemen, 1989; Poncela, Rodríguez, Mangas, & Senra, 2011; Martins & Werner, 2012). However, combination alone is not enough; it is necessary to know which techniques to use and how to combine them (Werner & Ribeiro, 2006).

In short, the combination of forecasts consists of combining different forecasts qualitatively or quantitatively. The best forecasting techniques must be selected, and for this it is necessary to determine which techniques show the best accuracy. In the last decade, nonlinear forecasting techniques have gained prominence in the most varied areas of knowledge; in particular, forecasting by Artificial Neural Networks has been widely compared with other individual forecasts.

The goal of this paper is to contribute to the comparison of individual and combined forecasts by evaluating the accuracy of each method on simulated data. Using statistical software, this study analyzes artificial neural network (ANN) techniques, autoregressive integrated moving average (ARIMA) models, and exponential smoothing (ES); besides their popularity, these methods were chosen for their ability to recognize patterns and regularities in time series. Combined forecasts were calculated through simple average, minimum variance, and regression analysis. Since this paper proposes a comparative study of the accuracy of different forecasting techniques, the comparison is performed directly with accuracy measures: the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and Theil’s U coefficient.

This paper is structured in four sections, of which this introduction is the first. Section two presents a brief theoretical background on the techniques and models used, as well as the methodology. Section three presents the simulation process and the comparison results. Finally, section four presents the main conclusions of this study.

Material and methods

Forecasting plays an important role in all areas of activity. More accurate forecasts are therefore sought, given the damage that a poorly formulated forecast may cause: for example, lost opportunities when a company's production is below current demand, or higher storage costs and product depreciation when production is too high. In view of this, combined forecasting emerged, supported by the premise that it provides a better forecast than the best individual forecast available.

The method combines different techniques in order to take advantage of the information contained in several individual forecasts: since a forecast may be affected by several factors, each technique can contribute distinct information (Clemen, 1989). Hence, more information can be taken into account through combination.

Combination can be performed using qualitative methods, quantitative methods, or both. This paper analyzes quantitative forecasting methods based on time series. This section discusses three individual forecasting techniques, three methods of combined forecasting, and the accuracy measures used in this study.

An important feature of time series is that neighboring observations are generally dependent. Therefore, a simple way of studying the behavior of such series is to analyze and model these dependencies. Three individual forecasting techniques are considered in this study: ARIMA, Exponential Smoothing, and Artificial Neural Networks (ANN). ARIMA and Exponential Smoothing were chosen because they are simple, low-cost, and appropriate for forecasting (Makridakis, Wheelwright, & Hyndman, 1998; Morettin & Toloi, 2018). The neural network technique was chosen because it presents advantages over traditional techniques (Paliwal & Kumar, 2009; Raza & Khosravi, 2015).

The most popular model for time series forecasting is the Autoregressive Integrated Moving Average (ARIMA) model, proposed by Box and Jenkins (Xu, Qu, & Hua, 2010). ARIMA models are based on the theory of statistical modeling, in which an algorithm is used to search for an accurate model with the smallest possible number of parameters, that is, a parsimonious model (Lu & Abourizk, 2009). They are mathematical models that aim to capture the behavior of a series through the correlation between its past values, and they result from the combination of three components: autoregressive (AR), integrated (I), and moving average (MA). The Box-Jenkins algorithm assists in choosing the best model based on plots of the autocorrelation and partial autocorrelation functions; however, identifying the best model requires the analyst's experience and involves subjectivity (Lu & Abourizk, 2009), because the same data series may be described by different models. Even so, compared with other techniques, this methodology stands out for its ease of use and especially for the accuracy of its models. Its main disadvantage is its linear nature, whereas the relationships between variables are nonlinear in most real problems.
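The autoregressive part of an ARIMA model can be illustrated in isolation. The sketch below is a hypothetical, simplified example, not the automatic SPSS procedure used in this study: it fits an AR(1) model, $y_t - \mu = \phi(y_{t-1} - \mu) + \varepsilon_t$, by least squares on the mean-centered series and produces recursive forecasts, which move geometrically toward the mean $\mu$ when $|\phi| < 1$. The function names are illustrative assumptions.

```python
def fit_ar1(series):
    """Estimate the mean mu and coefficient phi of an AR(1) model by least squares."""
    n = len(series)
    mu = sum(series) / n
    centered = [y - mu for y in series]
    # Regress the centered y_t on the centered y_{t-1}, without intercept.
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(centered[t - 1] ** 2 for t in range(1, n))
    if den == 0:
        raise ValueError("series is constant; phi is not identifiable")
    return mu, num / den


def forecast_ar1(series, mu, phi, horizon):
    """Recursive h-step forecasts: each step multiplies the deviation from mu by phi."""
    forecasts = []
    last = series[-1]
    for _ in range(horizon):
        last = mu + phi * (last - mu)
        forecasts.append(last)
    return forecasts


if __name__ == "__main__":
    y = [10.0, 12.0, 11.0, 13.0, 12.5, 11.5, 12.0, 13.5]
    mu, phi = fit_ar1(y)
    print(mu, phi, forecast_ar1(y, mu, phi, 3))
```

A full ARIMA fit would add differencing (the I component) and moving-average terms, with the orders chosen from the ACF and PACF as described above.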

Exponential Smoothing, also known as the Exponentially Weighted Moving Average, is a technique that gives greater weight to the most recent observations. Its popularity is due to its robustness and practicality in applications in which a large number of series must be modeled. Among the available exponential smoothing techniques are Holt's linear model (for series with a trend component) and the Holt-Winters model (for series with both trend and seasonal components). Assuming that extreme values in the series are random fluctuations, the purpose of these techniques is to identify an underlying pattern (Morettin & Toloi, 2018).
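As an illustration of Holt's linear model, the following minimal sketch updates a level and a trend component with smoothing parameters alpha and beta and extrapolates the last trend. The initialization (level equal to the first observation, trend equal to the first difference) and the fixed parameter values are assumptions made for the example; software such as SPSS estimates the parameters automatically.

```python
def holt_linear(series, alpha, beta, horizon):
    """Holt's linear (trend) exponential smoothing.

    Returns forecasts for 1..horizon steps after the end of `series`.
    Initialization (level = first value, trend = first difference) is one
    common heuristic, assumed here for illustration.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Level: weighted average of the new observation and the one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: weighted average of the latest level change and the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


if __name__ == "__main__":
    print(holt_linear([3.0, 4.1, 4.9, 6.2, 7.0, 7.9], alpha=0.5, beta=0.3, horizon=3))
```

Larger values of alpha and beta make the level and trend react faster to recent observations, which is the sense in which the technique values the most recent occurrences.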

Artificial Neural Networks (ANN), proposed by McCulloch and Pitts in 1943, are computational techniques based on a mathematical model of the neural structure of the human brain, which acquires knowledge through experience. According to Khashei and Bijari (2010), ANN is one of the most accurate forecasting models. ANNs are defined as distributed, parallel information-processing structures, formed by processing units (nodes, neurons, or cells) interconnected by unidirectional arcs (links, connections, or synapses).

McCulloch and Pitts interpreted the functioning of a neuron as a binary circuit. Given p input signals (X₁, X₂, …, X_p), each signal is multiplied by a number, or weight (W₁, W₂, …, W_p), which indicates its influence on the output of the unit. The additive junction then sums the weighted signals (a linear combination of the inputs and the synaptic weights), and the activation function processes the signal produced by the additive junction to generate the neuron's output signal. There are three types of activation function: the threshold function, the piecewise-linear (part threshold) function, and the sigmoid function. The sigmoid, of which the logistic function is an example, is the activation function most used in artificial neural networks.
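The unit just described can be written in a few lines. In this sketch the weights are hypothetical, and the optional bias term is an assumption not mentioned in the text (it acts as an adjustable threshold); the logistic and threshold activations correspond to two of the three types listed above.

```python
import math


def logistic(v):
    """Logistic (sigmoid) activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def threshold(v):
    """Threshold (step) activation, as in the binary McCulloch-Pitts unit."""
    return 1.0 if v >= 0 else 0.0


def neuron_output(inputs, weights, bias=0.0, activation=logistic):
    """Artificial neuron: additive junction followed by an activation function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Additive junction: linear combination of inputs and synaptic weights.
    v = bias + sum(x * w for x, w in zip(inputs, weights))
    return activation(v)


if __name__ == "__main__":
    print(neuron_output([0.2, 0.9, 0.4], [1.5, -0.7, 0.3]))
    print(neuron_output([0.2, 0.9, 0.4], [1.5, -0.7, 0.3], activation=threshold))
```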

Most of these models have a training rule, by which the weights of their connections are adjusted according to the patterns presented. The way neurons are organized in a network is associated with the type of problem and is an important factor in defining the learning algorithm to be used. According to Othman, Khuan, Radzol, and Mansor (2018), existing neural network architectures can be classified as feedforward or feedback. Artificial neural networks may be seen as a nonlinear regression method that can be flexibly used as a parametric, semi-parametric, or non-parametric method. Their main advantage is the ability to model complex series without assuming prior knowledge about the data-generating process. In addition, they are valuable when the functional relationship between input and output is not known.

This study uses a feedforward network, because this type of network is widely known and tested. In such networks, neurons are arranged in layers, usually two active layers. Regarding the number of intermediate (hidden) layers, Stathakis (2009) argues that a single hidden layer is sufficient when regular transfer functions (e.g., sigmoidal) are used, and that the purpose of a second hidden layer is to drastically reduce the total number of hidden nodes required. The number of hidden nodes is then chosen by judgment, because with too many nodes the network will memorize the patterns and thus lose the ability to generalize. When an ANN memorizes a given data set, it presents small forecasting errors within the sample period but large errors for forecasts outside the sample.
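A forward pass through a feedforward network with one hidden layer, the configuration discussed above, is sketched below. It assumes logistic hidden units and a linear output unit, a common choice when forecasting continuous values; the weights are hypothetical placeholders that a training algorithm would normally estimate.

```python
import math


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def feedforward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a network with a single hidden layer.

    hidden_weights[j] holds the weights from every input to hidden neuron j.
    Hidden neurons use the logistic activation; the output neuron is linear
    (an assumption made for this sketch).
    """
    hidden = []
    for w_row, b in zip(hidden_weights, hidden_biases):
        v = b + sum(w * x for w, x in zip(w_row, inputs))
        hidden.append(logistic(v))
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


if __name__ == "__main__":
    # Two lagged inputs and four hidden neurons (hypothetical weights).
    hw = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8], [0.7, 0.05]]
    print(feedforward([0.6, 0.4], hw, [0.0, 0.1, -0.1, 0.2], [1.2, -0.5, 0.9, 0.3], 0.05))
```

Each extra hidden neuron adds one row of input weights plus one bias and one output weight, which is why the number of hidden nodes directly controls how easily the network can memorize the training data.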

The combination methods used in this study start from the method proposed by Bates and Granger (1969), who proposed combining forecasts through a linear combination of two objective, unbiased (or properly corrected) forecasts, according to Equation 1.

$$C = k F_{1} + (1 - k) F_{2} \qquad (1)$$

where $C$ is the combined value, $F_1$ is the value of forecast 1, and $F_2$ is the value of forecast 2, with weight $k$ for the first and $(1 - k)$ for the second. A logical choice for $k$ is to give greater weight to the forecast with smaller errors; with this in mind, the authors proposed five ways of calculating the weights (Werner & Ribeiro, 2006).

Later, other authors (Newbold & Granger, 1974; Clemen, 1989; Cai, Stander, & Davies, 2011; Wang et al., 2018) adopted this method, and studies in the area advanced. Forecast combination was extended from two to n techniques and came to be understood as a structured form of regression (Newbold & Granger, 1974). Table 1 lists the three forecast combination methods considered in this study and their respective equations.

In Table 1, $\hat{C}_i$ is the combined forecast for period $i$, with $i = 1, \dots, N$, where $N$ is the total number of periods (observations); $F_{pi}$ is the value of forecast $p$ for period $i$, with $p = 1, \dots, P$, where $P$ is the number of forecasts (techniques) to be combined. $k_p$ (the weight of forecast $p$) is obtained from Equation 2, developed from Bates and Granger's (1969) proposal for calculating weights. For combination by regression, $\beta_p$ and $\beta_0$ are the slope (angular) and intercept (linear) coefficients, respectively, estimated by ordinary least squares, with $C_i$ taken as the observed value of the variable and $F_{pi}$ as the forecasts of the individual techniques; $\varepsilon_i$ is the random error (noise) for period $i$, assumed uncorrelated and Gaussian with mean zero and constant variance.
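Since the equations of Table 1 are not reproduced in this text, the sketch below assumes the standard forms of two of the methods: the simple average of the $P$ forecasts, $\frac{1}{P}\sum_p F_{pi}$, and the regression combination $C_i = \beta_0 + \sum_p \beta_p F_{pi} + \varepsilon_i$ estimated by ordinary least squares (the study itself used R's ‘lm’). To stay self-contained, it solves the normal equations by Gaussian elimination; `forecasts[p][i]` holds $F_{pi}$, and all function names are illustrative.

```python
def simple_average(forecasts):
    """Simple-average combination; forecasts[p][i] is forecast p for period i."""
    n_forecasts = len(forecasts)
    n_periods = len(forecasts[0])
    return [sum(f[i] for f in forecasts) / n_forecasts for i in range(n_periods)]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the forecasts may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_combination(actual, forecasts):
    """OLS combination: actual_i = b0 + sum_p b_p * F_pi + error_i.

    Returns the coefficients [b0, b1, ..., bP] and the fitted combined forecasts.
    """
    n_periods = len(actual)
    # Design matrix: an intercept column followed by one column per forecast.
    X = [[1.0] + [f[i] for f in forecasts] for i in range(n_periods)]
    n_params = len(X[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(n_params)]
           for r in range(n_params)]
    xty = [sum(row[r] * y for row, y in zip(X, actual)) for r in range(n_params)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return beta, fitted
```

Unlike the simple average, the regression weights are not constrained to be positive or to sum to one, and the intercept can absorb a systematic bias shared by the individual forecasts.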

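$$k_p = \frac{\left(\sum_{i=1}^{N} e_{pi}^{2}\right)^{-1}}{\sum_{j=1}^{P}\left(\sum_{i=1}^{N} e_{ji}^{2}\right)^{-1}} \qquad (2)$$

where $e_{pi}$ is the error of forecast $p$ for period $i$.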

Among Bates and Granger's (1969) proposals for calculating weights, Equation 2 was selected because it was the most accurate, in line with the conclusions of Newbold and Granger (1974).
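Assuming that Equation 2 takes the inverse-squared-error form of Bates and Granger's proposal, $k_p = \left(\sum_i e_{pi}^2\right)^{-1} / \sum_{j}\left(\sum_i e_{ji}^2\right)^{-1}$, the minimum variance weights and the resulting combination can be computed as follows. Here `errors[p][i]` holds $e_{pi}$ and `forecasts[p][i]` holds the forecast of technique $p$ for period $i$; the function names are hypothetical.

```python
def minimum_variance_weights(errors):
    """k_p proportional to 1 / (sum of squared errors of forecast p); weights sum to one."""
    inverse_sse = []
    for e in errors:
        sse = sum(x * x for x in e)
        if sse == 0:
            raise ValueError("a forecast with zero error would receive all the weight")
        inverse_sse.append(1.0 / sse)
    total = sum(inverse_sse)
    return [v / total for v in inverse_sse]


def weighted_combination(forecasts, weights):
    """Combined forecast for each period i: sum over p of k_p * F_pi."""
    n_periods = len(forecasts[0])
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(n_periods)]


if __name__ == "__main__":
    k = minimum_variance_weights([[0.5, -1.0, 0.2], [1.5, 2.0, -1.0], [0.8, -0.6, 1.1]])
    print(k, weighted_combination([[10.0, 11.0, 12.0], [9.0, 12.5, 11.0], [10.5, 10.0, 13.0]], k))
```

Forecasts with smaller squared errors receive proportionally larger weights, and, because every weight is positive and they sum to one, the result is always a weighted average of the individual forecasts.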

The choice of forecasting model or technique depends, among other criteria, on the desired degree of accuracy. In time series forecasting problems, an important task is to quantify the quality of the forecasts obtained. To assess how accurate a forecasting technique is, it is necessary to estimate the magnitude of its errors, which allows several model structures to be compared. Table 2 shows the four accuracy measures used: MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), RMSE (Root Mean Square Error), and Theil’s U coefficient.

In Table 2, $y_i$ is the actual value for period $i$ and $\hat{y}_i$ is the forecast for period $i$, with $i = 1, \dots, N$. MAPE is the most widely used measure and serves to compare forecasters (Kim & Kim, 2016; Myttenaere et al., 2016). RMSE, which penalizes large errors more heavily, allows the quality of a forecaster to be evaluated relative to the data. For data in which the deterministic component is dominant, Theil’s U statistic is the most suitable. Theil’s U coefficient evaluates the quality of a forecast on a scale from zero to one: the closer to one, the greater the inequality, whereas values closer to zero indicate that the model's error is smaller than that of the naïve prediction, that is, that the forecast is better than the trivial forecast (Agostino et al., 2019).
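Because the formulas of Table 2 are not reproduced here, the following sketch assumes the usual textbook definitions of MAE, MAPE (in percent), and RMSE. For Theil's U it assumes the U2 form, which compares the forecast's relative one-step errors with those of the naïve no-change forecast; under this form U equals 1 for the naïve forecast itself and can exceed 1 for worse forecasts, so it may differ from the bounded variant described in the text.

```python
import math


def mae(actual, forecast):
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if an actual value is zero."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))


def theil_u2(actual, forecast):
    """Theil's U2 (assumed form): forecast errors relative to the naive forecast y_{t-1}.

    U2 < 1 means the forecast beats the naive forecast; U2 = 0 is a perfect forecast.
    """
    num = sum(((forecast[t] - actual[t]) / actual[t - 1]) ** 2 for t in range(1, len(actual)))
    den = sum(((actual[t] - actual[t - 1]) / actual[t - 1]) ** 2 for t in range(1, len(actual)))
    if den == 0:
        raise ValueError("U2 is undefined for a constant series")
    return math.sqrt(num / den)
```

Applied to the 20 test observations of each series, these four functions yield one row of accuracy measures per forecasting method.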

The methodological procedures employed in the data analysis were structured in five steps. The goal is to simulate stationary time series in order to compare the accuracy of the techniques and methods described above, identifying those with the best forecasting performance.

The first step consists of generating the data: 500 time series with 200 periods (observations) each. For this, in R version 2.15.0 (freely available at www.r-project.org), the command ‘ts’ is used to create time series from the sum of the ‘rpois’, ‘rbinom’, and ‘rnorm’ commands, which generate random variables with Poisson, Binomial, and Normal distributions, respectively.
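The simulation step can also be sketched in Python. The distribution parameters below (Poisson mean 5, Binomial with 10 trials and probability 0.5, standard Normal) and the seed are hypothetical, since the text does not report the values passed to ‘rpois’, ‘rbinom’, and ‘rnorm’; the Poisson and Binomial samplers are simple standard-library implementations. Each observation is an independent sum of the three draws, so the resulting series are stationary.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def binomial(rng, n, prob):
    """Binomial draw: number of successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < prob)


def simulate_series(rng, length=200, lam=5.0, n=10, prob=0.5, mu=0.0, sigma=1.0):
    """One series: each observation is Poisson + Binomial + Normal (hypothetical parameters)."""
    return [poisson(rng, lam) + binomial(rng, n, prob) + rng.gauss(mu, sigma)
            for _ in range(length)]


def simulate_many(num_series=500, length=200, seed=2018):
    """Generate num_series independent series with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_series(rng, length) for _ in range(num_series)]


if __name__ == "__main__":
    data = simulate_many()
    train, test = data[0][:180], data[0][180:]  # 180 training and 20 test periods
    print(len(data), len(train), len(test))
```

With these assumed parameters, every observation has expected value 5 + 10 × 0.5 + 0 = 10.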

The series are modeled in the second step. For each simulated series, the three individual models are fitted to the first 180 periods (training), and the last 20 observations are reserved for accuracy testing. The ARIMA and ES models are fitted with SPSS Statistics 18.0, which automatically estimates the best models. The ANN is fitted in R with the package ‘neuralnet’, which contains the functions required to train neural networks (Günther & Fritsch, 2010). The neural network functions require the ‘hidden’ parameter (number of hidden neurons) to be defined by the user. Ideally, this parameter should be chosen according to the behavior of each series through model selection, in which several values are tested and the network with the best performance on a validation set is chosen. However, with 500 simulated series this procedure would require substantial computational resources; thus, within the scope of this study, four hidden neurons were used. Sheela and Deepa (2013) support that this number of neurons yields networks of reduced dimension, which are therefore less prone to overfitting (loss of the ability to generalize). The other parameters fixed when running the ‘neuralnet’ package were one hidden layer, the resilient backpropagation (RPROP) optimization algorithm, and the logistic activation function.
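The resilient backpropagation idea can be illustrated on a small problem. The sketch below implements the iRprop− variant, which may differ in detail from the variant run by ‘neuralnet’: each weight has its own step size, which grows by a factor eta_plus while the sign of its partial derivative stays the same and shrinks by eta_minus when the sign flips. Only the sign of the gradient, not its magnitude, sets the direction of each update. The default constants are commonly cited values, assumed here.

```python
def sign(x):
    return (x > 0) - (x < 0)


def irprop_minus(grad_fn, weights, iterations=200, step0=0.1,
                 eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    """Minimize a function with the iRprop- variant of resilient backpropagation.

    grad_fn(weights) must return the gradient as a list of floats.
    """
    w = list(weights)
    steps = [step0] * len(w)
    prev_grad = [0.0] * len(w)
    for _ in range(iterations):
        grad = list(grad_fn(w))
        for j in range(len(w)):
            change = prev_grad[j] * grad[j]
            if change > 0:      # same sign as before: accelerate
                steps[j] = min(steps[j] * eta_plus, step_max)
            elif change < 0:    # sign flipped: a minimum was overshot, slow down
                steps[j] = max(steps[j] * eta_minus, step_min)
                grad[j] = 0.0   # iRprop-: skip this weight's update once
            w[j] -= sign(grad[j]) * steps[j]
            prev_grad[j] = grad[j]
    return w


if __name__ == "__main__":
    # Gradient of (w0 - 3)^2 + (w1 + 1)^2, which is minimized at (3, -1).
    print(irprop_minus(lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)], [0.0, 0.0]))
```

In a neural network, grad_fn would return the derivatives of the training error with respect to every connection weight, computed by backpropagation.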

After the series are modeled and their respective individual forecasts for the next 20 periods (test) are calculated, step three begins, in which the forecasts are combined using the three combination methods listed in Table 1, with N = 20 and P = 3. The combinations are computed in R; for combination by linear regression, the function ‘lm’ from the ‘stats’ package is used.

Using the last 20 reserved observations (simulated values) and the respective forecasts (values from the models), the forecast errors are calculated for the six methods. Based on these errors, in step four, the accuracy measures are calculated for each of the three individual techniques and the three combinations.

The direct comparison of accuracy uses the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and Theil’s U coefficient, as shown in Table 2. Finally, in step five, the most accurate method is identified based on the comparison criteria computed in the previous step.

Table 1.
Forecast combination methods.

Table 2.
Accuracy measurements

Results and discussion

For each of the 500 series, forecasts were obtained in six different ways: three individual forecasts and three forecast combinations, as previously specified. The accuracy of each was evaluated with four distinct measures: MAE, MAPE, RMSE, and Theil’s U. A descriptive analysis of these measures was carried out for each forecasting method, as presented in Table 3.

Focusing first on the individual forecasting techniques (ANN, ARIMA, and Exponential Smoothing) in Table 3, the three models perform similarly, and ANN is the most accurate on average. Exponential Smoothing (ES) had the highest error measures, ranking third in accuracy, although the differences appear only in the decimals.

Among the combination methods (simple average, minimum variance, and regression), regression presented the best averages for the measures, and the simple average was the least accurate combination. Nevertheless, combination through simple average was generally more accurate than the individual techniques; that is, it presented smaller error values, and its forecasts were better than the worst individual forecasts, on average, as described by Jeong and Kim (2009). Overall, there was no divergence between the measures: combination by regression presented the smallest mean for all four accuracy measures among the six methods.

On the issue of variability, the results are best visualized in Figures 1 to 4, which compare the boxplots of the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and Theil’s U coefficient, respectively, for the six methods.

Graphical analysis of Figures 1 to 4 confirms the results in Table 3: the individual forecasting techniques showed similar behavior in all four plots, and combination by regression stood out from the other methods. The potential of regression for combining forecasts is visible in all four figures. It is also noticeable that, despite being less accurate, ARIMA models showed less variability. In general, the graphical analysis suggests consistent behavior of the methods across the four accuracy measures, with no disagreement in the results.

However, the main interest remains the most accurate forecast for each data series. When searching for the method that produces the best forecast according to the accuracy measures, regression gave the best results. Table 4 lists the percentage frequencies with which each method was the best under each of the four measures. These frequencies were obtained by counting the number of times each method had the best accuracy measure, and they reinforce the previous results. In Table 4, combination by regression showed the highest frequencies, indicating that this method is the most accurate under all four measures, as in the previous results. However, in a case-by-case analysis, neural networks stand out in second place for MAE and MAPE, being the best method in approximately 8% and 11% of the series, respectively. These conclusions can also be observed in Table 5, which counts the number of series in which four, three, or only two of the accuracy measures agreed on each method.
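The frequencies in Table 4 count, for a given accuracy measure, how often each method has the smallest value across the series. A minimal sketch of that tally is shown below, applied to made-up toy values rather than the study's results; ties are broken in favor of the method listed first, an assumption the text does not address.

```python
from collections import Counter


def best_method_frequencies(scores):
    """Percentage of series in which each method has the smallest error measure.

    scores: one dict per series, mapping method name -> error value (e.g. MAE).
    """
    wins = Counter(min(s, key=s.get) for s in scores)
    total = len(scores)
    return {method: 100.0 * count / total for method, count in wins.items()}


if __name__ == "__main__":
    toy_mae = [  # hypothetical values for three series
        {"ANN": 1.2, "ARIMA": 1.4, "ES": 1.5, "Regression": 0.9},
        {"ANN": 0.8, "ARIMA": 1.1, "ES": 1.0, "Regression": 0.9},
        {"ANN": 1.3, "ARIMA": 1.2, "ES": 1.6, "Regression": 1.0},
    ]
    print(best_method_frequencies(toy_mae))
```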

In Table 5, out of the 500 series studied, 283 (56.6%) showed the regression method as the most accurate under all four accuracy measures. Another 130 series (26%) also showed regression as the most accurate, but under only three measures. When the criterion is being the most accurate under two of the measures, 141 series (28.2%) met this condition and, again, ANN was second on the list.
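Similarly, the agreement counts in Table 5 can be sketched by finding, for each series, the best method under each measure and then counting how many measures chose each method. The nested-dictionary layout, the function name, and the toy values below are hypothetical.

```python
from collections import Counter


def agreement_table(per_series_measures, levels=(4, 3, 2)):
    """Count series in which exactly k accuracy measures agree on a method.

    per_series_measures: one dict per series, mapping measure name ->
    {method name -> error value}. Returns a Counter keyed by (method, k).
    """
    table = Counter()
    for measures in per_series_measures:
        # Best (smallest-error) method under each accuracy measure for this series.
        winners = Counter(min(errors, key=errors.get) for errors in measures.values())
        for method, n_agree in winners.items():
            if n_agree in levels:
                table[(method, n_agree)] += 1
    return table


if __name__ == "__main__":
    toy = [{"MAE": {"ANN": 1.0, "Regression": 0.8}, "MAPE": {"ANN": 0.9, "Regression": 1.1},
            "RMSE": {"ANN": 1.3, "Regression": 1.0}, "U": {"ANN": 0.7, "Regression": 0.6}}]
    print(agreement_table(toy))
```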

For better interpretation, the forecasts were also presented graphically. Figure 5 shows the six methods over the test period (last 20 observations) of the first simulated series, together with the actual values (points).

In Figure 5, the quality of each method's forecast for the first series can be visualized; the methods that follow the fluctuations of the series stand out. Regarding individual forecasts, as reported in several studies, the ARIMA and exponential smoothing techniques are less able to capture the behavior of the series than ANN (Paliwal & Kumar, 2009). The simple average combination and, to a lesser extent, the minimum variance method are influenced by these less powerful techniques (their lines overlap in Figure 5). Here, combination by regression stands out because of the way it weights the individual forecasts (see Table 1).

Table 3.
Descriptive analysis of measurements.

Figure 1.
Boxplot for mean absolute error. Source: developed by the authors.

Figure 2.
Boxplot for mean absolute percentage error. Source: developed by the authors.

For some practitioners, the accuracy of ANN forecasts compared with the minimum variance and simple average combinations may reopen the discussion of which method to use. It must be considered, however, that although combinations are not always superior to the best individual forecast, the worst combination will always perform better than the worst individual forecasting technique.
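For combinations whose weights $w_p$ are non-negative and sum to one, such as the simple average ($w_p = 1/P$) and the minimum variance method when its weights are non-negative, this claim follows from the triangle inequality: the combined MAE can be no worse than the worst individual MAE. Writing the error of forecast $p$ as $e_{pi} = y_i - F_{pi}$, the combined error is a weighted average of the individual errors:

```latex
\begin{aligned}
y_i - \sum_{p=1}^{P} w_p F_{pi}
  &= \sum_{p=1}^{P} w_p \,(y_i - F_{pi}) = \sum_{p=1}^{P} w_p e_{pi}, \\
\mathrm{MAE}_C = \frac{1}{N}\sum_{i=1}^{N}\Bigl|\sum_{p=1}^{P} w_p e_{pi}\Bigr|
  &\le \sum_{p=1}^{P} w_p \,\frac{1}{N}\sum_{i=1}^{N} |e_{pi}|
   = \sum_{p=1}^{P} w_p \,\mathrm{MAE}_p \le \max_{p} \mathrm{MAE}_p .
\end{aligned}
```

The same argument applies to RMSE through Minkowski's inequality. Regression weights are not constrained in this way, so this bound does not cover combination by regression.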

Figure 3.
Boxplot for root mean square error. Source: developed by the authors.

Figure 4.
Boxplot for Theil’s U coefficient. Source: developed by the authors.

Table 4.
Percentage frequencies of the best method.

Table 5.
Frequency of series in which there was an agreement between four, three and two of the accuracy measures for each method of forecast.

Figure 5.
Forecast chart. Source: developed by the authors.

Conclusion

A total of 500 time series with 200 observations each were generated. They were then modeled with three individual forecasting techniques, and the combinations were obtained. The comparison to identify the best method in terms of accuracy was performed using four accuracy measures.

The results showed that the most accurate forecasts are achieved through combination by regression, with the four accuracy measures in agreement for 56.6% of the simulated series. Second, individual forecasts obtained via ANN stood out, surpassing the other combination methods.

Combination by regression, although not a popular method, stands out for the potential shown in this study.

References

Agostino, I., Araujo, R. S., Noronha, M. O., Fonseca, J. I. S., & Souza, A. M. (2019). Combinação de previsões aplicada à modelagem de operações: um estudo de caso em um terminal portuário. Exacta, 17(1), 99-110. doi: 10.5585/exactaep.v17n1.8327

Bates, J. M., & Granger, C. W. J. (1969). The combining of forecasts. Operational Research Quarterly, 20(4), 451-468. doi: 10.1057/jors.1969.103

Cai, Y., Stander, J., & Davies, N. (2011). A new Bayesian approach to quantile autoregressive time series model estimation and forecasting. Journal of Time Series Analysis, 33(4), 684-698. doi: 10.1111/j.1467-9892.2012.00800.x

Clemen, R. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5(4), 559-583. doi: 10.1016/0169-2070(89)90012-5

Günther, F., & Fritsch, S. (2010). Neuralnet: training of neural networks. The R Journal, 2(1), 30-38. doi: 10.32614/rj-2010-006

Jeong, D. I., & Kim, Y. (2009). Combining single-value stream flow forecasts: A review and guidelines for selecting techniques. Journal of Hydrology, 377(3-4), 284-299. doi: 10.1016/j.jhydrol.2009.08.028

Khashei, M., & Bijari, M. (2010). An artificial neural network (p, d, q) model for time series forecasting. Expert Systems with Applications, 37(1), 479-489. doi: 10.1016/j.eswa.2009.05.044

Kim, S., & Kim, H. (2016). A new metric of absolute percentage error for intermittent demand forecasts. International Journal of Forecasting, 32(3), 669-679. doi: 10.1016/j.ijforecast.2015.12.003

Lu, Y., & Abourizk, S. M. (2009). Automated Box-Jenkins forecasting modeling. Automation in Construction, 18(5), 547-558. doi: 10.1016/j.autcon.2008.11.007

Makridakis, S., Wheelwright, S., & Hyndman, R. (1998). Forecasting: methods and applications (3rd ed.). New York, NY: John Wiley & Sons.

Martins, V. L. M., & Werner, L. (2012). Forecast combination in industrial series: A comparison between individual forecasts and its combinations with and without correlated errors. Expert Systems with Applications, 39(13), 11479-11486. doi: 10.1016/j.eswa.2012.04.007

Morettin, P. A., & Toloi, C. M. C. (2018). Análise de séries temporais: modelos lineares univariados (3rd ed.). São Paulo, SP: Edgard Blucher.

Myttenaere, A., Golden, B., Le Grand, B., & Rossi, F. (2016). Mean absolute percentage error for regression models. Neurocomputing, 192(5), 38-48. doi: 10.1016/j.neucom.2015.12.114

Newbold, P., & Granger, C. W. J. (1974). Experience with forecasting univariate time series and the combination of forecasts. Journal of the Royal Statistical Society, Series A (General), 137(2), 131-165. doi: 10.2307/2344546

Othman, N. H., Khuan, L. Y., Radzol, A. R. M., & Mansor, W. (2018). Detection of NS1 from SERS spectra of adulterated saliva using ANN. Advanced Science Letters, 24(2), 1138-1142. doi: 10.1166/asl.2018.10703

Paliwal, M., & Kumar, U. A. (2009). Neural networks and statistical techniques: A review of applications. Expert Systems with Applications, 36(1), 2-17. doi: 10.1016/j.eswa.2007.10.005

Poncela, P., Rodríguez, J., Mangas, R. S., & Senra, E. (2011). Forecast combination through dimension reduction techniques. International Journal of Forecasting, 27(2), 224-237. doi: 10.1016/j.ijforecast.2010.01.012

Raza, M. Q., & Khosravi, A. (2015). A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renewable and Sustainable Energy Reviews, 50, 1352-1372. doi: 10.1016/j.rser.2015.04.065

Sheela, K. G., & Deepa, S. N. (2013). Review on methods to fix number of hidden neurons in neural networks. Mathematical Problems in Engineering, 2013, Article ID 425740. doi: 10.1155/2013/425740

Stathakis, D. (2009). How many hidden layers and nodes? International Journal of Remote Sensing, 30(8), 2133-2147. doi: 10.1080/01431160802549278

Wang, Y., Zhang, N., Tan, Y., Hong, T., Kirschen, D. S., & Kang, C. (2018). Combining probabilistic load forecasts. IEEE Transactions on Smart Grid. doi: 10.1109/TSG.2018.2833869

Werner, L., & Ribeiro, J. L. D. (2006). Modelo composto para prever demanda através da integração de previsões. Produção, 16(3), 493-509. doi: 10.1590/S0103-65132006000300011

Xu, X., Qu, Y., & Hua, Z. (2010). Forecasting demand of commodities after natural disasters. Expert Systems with Applications, 37(6), 4313-4317. doi: 10.1016/j.eswa.2009.11.069

Author notes

werner.liane@gmail.com