A hybrid of Bayesian-based global search with Hooke–Jeeves local refinement for multi-objective optimization problems
Nonlinear Analysis: Modelling and Control, vol. 27, no. 3, pp. 534-555, 2022
Vilniaus Universitetas

Received: 18 August 2021
Revised: 02 February 2022
Published: 28 March 2022
Abstract: The proposed multi-objective optimization algorithm hybridizes random global search with a local refinement algorithm. The global search algorithm mimics a Bayesian multi-objective optimization algorithm: the site of the current computation of the objective functions is selected by a randomized simulation of the bi-objective selection performed by the Bayesian-based algorithm. The advantage of the new algorithm is that it avoids the inner complexity of Bayesian algorithms. A version of the Hooke–Jeeves algorithm is adapted for the local refinement of the approximation of the Pareto front. The developed hybrid algorithm is tested under conditions previously applied to test other Bayesian algorithms so that performance can be compared. Further experiments assess the efficiency of the proposed algorithm under conditions where the previous versions of Bayesian algorithms were not appropriate because of the number of objectives and/or the dimensionality of the decision space.
Keywords: global optimization, Bayesian algorithm, Hooke–Jeeves, local refinement.
1 Introduction
The Bayesian approach is one of the most active in the development of methods for non-convex black-box optimization [1, 4, 27]. Despite active research, some challenges remain unresolved; see, e.g., the review in [27]. A serious challenge in widening the application area of Bayesian algorithms is their inner complexity. One way to reduce the complexity could be a partition-based implementation, which was successful in Lipschitz optimization [12,17,19–21,32]. Indeed, the inner complexity of partition-based Bayesian algorithms for single- and multi-objective optimization was proved to be lower than that of the standardly implemented algorithms [25,29,31]. Nevertheless, the complexity of these algorithms still limits the number of evaluations of the objective functions to fewer than would be appropriate for solving problems of higher dimensionality and problems with a larger number of objectives. In the present paper, a multi-objective optimization algorithm is presented that heuristically implements ideas related to the Bayesian approach without using a formal model of uncertainty about the objective functions. A version of the bi-objective selection of a site for the current computation of the objective functions is applied following the idea proposed in [24]. Two heuristic criteria are used for the selection of a computation point: an estimate of its distance to the Pareto front and the uncertainty of that estimate. These criteria mimic the stochastic model-based assessments used in [24], which quantify the exploitation and exploration character of the search strategy. The balance between exploration and exploitation can be varied rather broadly by accepting an appropriate compromise between the expectation and uncertainty criteria. However, the local refinement of approximate solutions is frequently more efficient using a local search method than a locally biased global search method [18, 22, 26].
The hybridization of single-objective Bayesian global search with local optimization methods has been shown to be efficient in [25,30]. In the present paper, a local search algorithm is applied for the refinement of the approximation of the Pareto front found by the global search algorithm. A version of the Hooke–Jeeves algorithm is adapted to multi-objective optimization. The proposed hybrid algorithm is tested under conditions previously applied to test other Bayesian algorithms [28, 31] so that performance can be compared. Further experiments were performed to assess the efficiency of the proposed algorithm under conditions where the previous versions of Bayesian algorithms were not appropriate. The results were compared with the results of the popular evolutionary optimization algorithms NSGA2 and NSGA3 [3,7,8] using test problems with up to 15 objective functions and up to 30 variables.
2 The proposed hybrid algorithm
Black-box multi-objective minimization problems

min_{x∈A} F(x),  F(x) = (f1(x), ..., fm(x))^T,

are considered assuming that the feasible region (decision space) is the unit hypercube A = [0,1]^d, to which any hyperrectangular region can be rescaled. The ranges of the objective functions are assumed equal; the rescaling is performed by the algorithm.
The algorithm consists of two alternating phases carried out multiple times: random global search and local refinement of the Pareto front approximation found by the global search. The Bayesian global optimization strategy would be preferable because of its rational balancing of exploration and exploitation. However, the number of iterations (computations of the objective functions) of Bayesian algorithms is considerably limited by their inner computational complexity: several hundred for the standardly implemented Bayesian algorithms, and several thousand for the Bayesian partition-based algorithms [27]. The idea is to mimic the search strategy of a Bayesian algorithm without using a stochastic model of the objective functions, thus avoiding the main computational burden. A randomized algorithm is proposed whose criteria for selecting a point for computing the objective functions are similar to, but much simpler than, the Bayesian approach-based criteria [24].
The reduced computational burden allows a considerably larger number of iterations; see the section on testing results. The global search phase alternates with the local refinement phase. The approximation of the Pareto front is refined by a version of the Hooke–Jeeves algorithm adapted to multi-objective optimization. The process of alternating global and local search phases is terminated by a user-defined termination condition.
The algorithm is initialized by a predefined number of computations of the objective functions at random points uniformly distributed in the feasible region A = [0,1]^d.
2.1 Global search
The proposed algorithm consists of two alternating phases carried out multiple times: global search and local refinement. Global search runs after the initialization or after the previous global/local stage and thereafter alternates with local search. Consider the current start of a global search; the set of points where the objective functions have already been computed is denoted by U_A ⊂ A, and the corresponding set of function values by {F(x) | x ∈ U_A}. The non-dominated points of {F(x) | x ∈ U_A} constitute the current Pareto optimal solution set in the objective space, denoted P, and the corresponding Pareto optimal solution set in the decision space is denoted X_P.
This kind of global search with a uniform distribution of function evaluation points is a worst-case optimal algorithm when the function evaluation budget is predefined and the optimized multi-objective functions are Lipschitz functions [23]. The main disadvantage of this method is that the required number of function evaluations grows exponentially with the dimensionality of the decision space. However, the worst case is very specific: the Pareto optimal set of the optimized multi-objective function consists of a single point, and the solutions are represented by a single point [23]. Most optimization problems in engineering and science are not worst-case, so strategies improving the method's performance by imitating Bayesian search can be applied [2,14,29]. Bayesian approach-based optimization algorithms select a new location in the decision space for the next function evaluation based on a trade-off between the statistically assessed mean value and variance of the function [16,28]. A location with a near-optimal mean value has a chance of yielding a new optimal point even in the case of a small improvement of the function value. Conversely, a location with a high variance has a chance of yielding a new optimal point in the case of a large improvement of the function value. However, the computational burden of selecting the next evaluation location limits the application area of Bayesian multi-objective optimization algorithms. To cope with this burden, an implementation based on a rectangular partition of the feasible region was proposed in [29, 31]. The number of vertices of a hyperrectangle doubles with every new dimension. The algorithm iteratively subdivides hyperrectangles by bisection: a hyperrectangle is bisected by a hyperplane orthogonal to its longest edges, the function is evaluated at the intersection points, and the number of evaluations is only half of the number of hyperrectangle vertices. Even so, the rectangular partition-based Bayesian algorithm is not competitive for optimization problems with a high-dimensional decision space.
Strategies to mimic the Bayesian search should combine simplicity with numerical substantiation. Combining both properties, the following exploration-exploitation strategy can be used. Let q new random points x̃_1, ..., x̃_q (q is a parameter) be uniformly generated in the feasible region A, and for every point x̃_i let two selection functions be calculated:

s1(x̃) = min_{x ∈ U_A} ||x̃ − x||,   s2(x̃) = min_{y ∈ P} ||F(x_c) − y||,   (1)

where s1(x̃) is the generated point's Euclidean distance to the closest known point x_c = argmin_{x ∈ U_A} ||x̃ − x||, and s2(x̃) is the Euclidean distance of the closest known point's function evaluation vector F(x_c) to the closest current Pareto optimal solution in P. In the case when the closest known point's function evaluation vector F(x_c) is itself a non-dominated Pareto optimal solution, the function s2 has zero value. Please note that the s2 value is calculated using min-max normalized function values. The ideal and nadir point values are taken as the minimum and maximum function values [3]. The normalized function values can be expressed by the following formula:

f̃_i(x) = (f_i(x) − f_i^ideal) / (f_i^nadir − f_i^ideal),  i = 1, ..., m.   (2)
The value of the function s1(x̃) can be viewed as the radius of a hypersphere with center point x̃ containing no function evaluations. When s1(x̃) has a large value, a large unexplored hypersphere has been found, and a new function evaluation at its center (point x̃) is reasonable, as this corresponds to the exploration strategy. On the other hand, the value of the second function s2(x̃) can be viewed as the minimal distance of the optimized function's nearest-neighbor interpolation value at point x̃ to the current Pareto optimal solutions. In the case of a small s2(x̃) value, the generated point's closest known point x_c has a near-optimal function value F(x_c), so a new function evaluation at point x̃ is reasonable, as this corresponds to the exploitation strategy. The exploration-exploitation strategy is achieved by making new function evaluations at points having minimal trade-off Pareto optimal values of the selection functions (−s1(x̃), s2(x̃)). The function s1 has a negative sign because it is maximized. After the new function evaluations are made, the sets U_A, {F(x) | x ∈ U_A}, P, and X_P are updated and reused in the next phases or iterations of the algorithm. Reuse of search information is shown to be efficient for multi-objective optimization problems [11].
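The selection step above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the helper names (`pareto_mask`, `select_candidates`) are assumptions, and the ideal/nadir points are taken as the observed column-wise minima/maxima of the evaluated objective values.

```python
import numpy as np

def pareto_mask(Y):
    """Boolean mask of the non-dominated rows of Y (minimization)."""
    n = len(Y)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # row i is dominated if some row is <= everywhere and < somewhere
        dominated = np.all(Y <= Y[i], axis=1) & np.any(Y < Y[i], axis=1)
        if dominated.any():
            mask[i] = False
    return mask

def select_candidates(X, Y, q, rng):
    """Generate q uniform candidates in A = [0,1]^d and keep those whose
    selection values (-s1, s2) are non-dominated: s1 is the distance to the
    closest evaluated point, s2 is the distance of that point's (normalized)
    objective vector to the current Pareto front."""
    cand = rng.random((q, X.shape[1]))
    # min-max normalization of objective values; ideal/nadir taken as the
    # observed minima/maxima (an assumption of this sketch)
    ideal, nadir = Y.min(axis=0), Y.max(axis=0)
    Yn = (Y - ideal) / np.where(nadir > ideal, nadir - ideal, 1.0)
    P = Yn[pareto_mask(Yn)]                      # current Pareto front
    dists = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)               # closest evaluated point
    s1 = dists[np.arange(q), nearest]
    s2 = np.linalg.norm(Yn[nearest][:, None, :] - P[None, :, :], axis=2).min(axis=1)
    return cand[pareto_mask(np.column_stack((-s1, s2)))]
```

The objective functions would then be evaluated at the returned candidates, after which the sets of evaluations and Pareto optimal solutions are updated.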
To induce clustering of function evaluations near Pareto optimal solutions, the global search has two random point generation modes: global uniform generation in the entire feasible region A, and local uniform generation near the current Pareto optimal solutions. For every Pareto optimal solution x ∈ X_P, a series of small hypercubes H_1, H_2, ... with center point x is created, and the above-described procedure is applied to the particular subregion of the feasible region A. Hypercube H_j has a double-sized edge compared to the next hypercube H_{j+1}, and the shrinking of hypercubes stops when there are no known points in the hypercube except the center point x.
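The shrinking-hypercube construction can be sketched as follows; the helper name `local_subregion`, the starting edge length, and the minimal-edge cutoff are illustrative assumptions. The sketch returns the bounds of the final (stopping) hypercube, clipped to A = [0,1]^d.

```python
import numpy as np

def local_subregion(x_p, X_known, start_edge=0.5, min_edge=1e-6):
    """Halve the edge of a hypercube centred at the Pareto-optimal point x_p
    until it contains no evaluated point other than x_p; return its bounds
    clipped to the unit hypercube (a sketch of the local generation mode)."""
    others = X_known[~np.all(np.isclose(X_known, x_p), axis=1)]
    edge = start_edge
    while True:
        lo = np.clip(x_p - edge / 2, 0.0, 1.0)
        hi = np.clip(x_p + edge / 2, 0.0, 1.0)
        inside = np.all((others >= lo) & (others <= hi), axis=1)
        if not inside.any() or edge < min_edge:
            return lo, hi
        edge /= 2           # the next hypercube has a half-sized edge
```

Candidate points for the local mode would then be drawn uniformly from such subregions instead of from the whole feasible region.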
2.2 Local refinement
The proposed exploration-exploitation global search strategy lacks effective local refinement, since randomly generated points have no improvement direction. In contrast, the Hooke–Jeeves single-objective optimization algorithm searches for a function improvement direction by taking steps in all decision variables [13]. A multi-objective optimization problem can be reduced to a single-objective optimization problem by scalarization methods such as Chebyshev scalarization [15]. Chebyshev scalarization requires a weight vector to obtain a single optimal trade-off solution; to obtain an approximation of the Pareto optimal front, multiple uniformly generated weight vectors must be used. Although the weight vectors are uniformly generated, the resulting Pareto front approximation may be non-uniformly distributed, and human expert intervention may be needed to generate additional weight vectors to obtain a uniformly distributed Pareto front approximation [2,14]. To avoid the complicated selection of weight vectors, a weight-free approach to multi-objective optimization problems using a modified version of Hooke–Jeeves direct search was presented in [5]. In this paper, a novel approach for the conversion of a multi-objective optimization problem to a single-objective optimization problem without the use of weight vectors is suggested.
The multi-objective function is converted to a single-objective surrogate function fs(x), which carries a current solution: a decision vector x_c and the multi-objective function evaluation vector F(x_c). Initially, the surrogate function has the predefined value fs(x_c) = 0. When the new solution F(x) dominates the current solution F(x_c), the surrogate function value at the new location decreases, fs(x) < fs(x_c), and the current solution is updated, x_c := x. Otherwise, when the new solution F(x) does not dominate the current solution F(x_c), the surrogate function value at the new location remains the same, fs(x) = fs(x_c), and the current solution is not updated.

For every current Pareto optimal solution x_P ∈ X_P, a separate surrogate function with that Pareto optimal solution as the current solution (x_c = x_P, fs(x_P) = 0) is defined. Every defined surrogate function is minimized using the Hooke–Jeeves optimization algorithm taking x_P as the start point: x_min = argmin_{x∈A} fs(x).
Figure 1. Example Pareto front (a) of the minimized bi-objective function F(x) = (f1(x), f2(x))^T. Blue circles are the set of current Pareto optimal function evaluations P. For a Pareto optimal solution x_P, a minimized surrogate function fs(x) is defined (fs(x_P) = 0). The surrogate function value at the minimal point is negative, fs(x_min) < 0, x_min = argmin_{x∈A} fs(x), so the value of the original bi-objective function at the newly found solution, F(x_min) (red asterisk), dominates the bi-objective function's value at the initial point, F(x_P). The function evaluation F(x_min) at the solution found by surrogate function minimization lies in the area bounded by the Pareto front and the rectangle defined by the points F(x_P) and the ideal point of minimal function values F* = (f1*, f2*)^T (black dot). On the right (b), surrogate functions are defined for all current Pareto optimal solutions (blue circles). Possible solutions found by surrogate function minimization cover almost the entire Pareto front except for small parts near the minimal values of the functions f1(x) and f2(x).
In the case of a negative value of the surrogate function at the minimal point, fs(x_min) < 0, x_min = argmin_{x∈A} fs(x), the value of the original multi-objective function at the newly found solution, F(x_min), dominates the multi-objective function's value at the Hooke–Jeeves optimization start point, F(x_P), i.e., the newly found solution's value F(x_min) is located closer to the true Pareto front than the initial function value F(x_P). The function evaluation F(x_min) at the solution found by surrogate function minimization lies in the area bounded by the true Pareto front and the hyperrectangle defined by the points F(x_P) and the ideal point of minimal function values F* = (f1*, ..., fm*)^T. An example of the bi-objective case is shown in Fig. 1.
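The surrogate construction and its local minimization can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the decrement applied on domination (here, the Euclidean distance between objective vectors) is assumed, since the text only requires that the value decrease, and the minimizer is a simplified coordinate-search variant of Hooke–Jeeves (exploratory moves only, no pattern move).

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return np.all(a <= b) and np.any(a < b)

class Surrogate:
    """Single-objective surrogate fs for one Pareto-optimal start point x_p."""
    def __init__(self, F, x_p):
        self.F, self.x_c, self.y_c, self.value = F, x_p, F(x_p), 0.0
    def __call__(self, x):
        y = self.F(x)
        if dominates(y, self.y_c):
            # assumed decrement; the paper only requires fs to decrease
            self.value -= np.linalg.norm(y - self.y_c)
            self.x_c, self.y_c = x, y
        return self.value

def hooke_jeeves(f, x0, step=0.1, step_min=1e-3, shrink=0.5):
    """Simplified exploratory pattern search on the unit hypercube."""
    x, fx = x0.copy(), f(x0)
    while step >= step_min:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] = np.clip(trial[i] + delta, 0.0, 1.0)
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= shrink   # no coordinate improved: shrink the step
    return x, fx
```

A negative returned value signals that the chain of dominating moves produced a point whose objective vector dominates F(x_p), matching the fs(x_min) < 0 condition above.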
2.3 Implementation and pseudo-code
The basic concepts of the proposed optimization algorithm are presented in the previous subsections, but a more detailed explanation of the whole picture is still needed. Therefore, the pseudo-code of the proposed optimization algorithm is given in Algorithm 1. The list of the notations used:
F(·): the optimized multi-objective function;
d: the number of variables;
A: the feasible region, a unit hypercube;
U_A: the set of all evaluated points in the feasible region A = [0,1]^d;
{F(x) | x ∈ U_A}: the set of all function evaluations;
X_P: the points in the feasible region A = [0,1]^d of the current non-dominated Pareto optimal solutions;
P: the function evaluations of the current non-dominated Pareto optimal solutions;
the maximum allowed number of function F(·) evaluations;
the maximum iteration number of global search with local refinement;
N: the number of initial random solutions;
q: the number of randomly generated candidate points in the decision space for new function evaluations;
the part of function evaluations obtained by local generation near the current Pareto optimal solutions, compared to the function evaluations obtained by global generation in the entire feasible region A;
the step size parameters of the Hooke–Jeeves optimization algorithm; the full set of step sizes, from largest to smallest, is calculated from these parameters, the largest step size being 0.8 times the larger parameter value and the smallest being 0.8 times the smaller parameter value;
Update: a Boolean parameter; if it is set to true, the step size parameters are updated at the second and following iterations of global search with local refinement. For every Pareto optimal solution x ∈ X_P, the larger parameter is updated so that the largest step size is approximately equal to the minimal Euclidean distance to another Pareto optimal solution, and the smaller parameter is updated so that the smallest step size remains smaller than the largest.
Some remarks explaining the pseudo-code of the proposed optimization algorithm follow. At the algorithm's first iteration, the local refinement phase optimizes not only the surrogate functions but also every single objective function of the optimized multi-objective function F(·), in order to obtain the limit solutions of the Pareto front approximation. Solutions that have been found by surrogate function or single-criterion function optimization are not used to define and optimize new surrogate functions anymore, because the surrogate function value at such a point is already near-optimal. Instead, the non-dominated Pareto optimal solutions found by the global search phase are used to define and optimize new surrogate functions, in this way inducing a search in the unexplored area to find new Pareto optimal solutions. The step size parameters are updated (the value of the parameter Update must be set to true) so that the Hooke–Jeeves optimization algorithm's largest step size is approximately equal to the minimal Euclidean distance to another Pareto optimal solution. In the case of a small distance to another Pareto optimal solution, only small adjustments are needed, so small step sizes should be used. In the case of a large distance to another Pareto optimal solution, a new, distantly located solution has been found; therefore a more extensive local search is needed during surrogate function optimization, so initially large step sizes should be used. Please note that the global search phase can be reduced to a simple uniform distribution of function evaluation points for an appropriate selection of the parameters.
The global search phase alternates with the local refinement phase multiple times. The algorithm works until the maximum allowed number of function F(·) evaluations or the maximum iteration number of global searches with local refinement is reached. The algorithm returns the set of all function evaluations and the set of non-dominated Pareto optimal solutions.
3 Performance analysis
This section first presents a theoretical convergence analysis of the proposed algorithm; then the performance of the proposed algorithm is evaluated by numerical experiments. Many optimization problems in engineering and science have a very limited function evaluation budget, so the proposed optimization algorithm was tested in the case of an extremely low function evaluation budget, and the results were compared with the results of Bayesian-rooted optimization algorithms [31]. On the other hand, good performance in the case of a high-dimensional feasible region is a desirable feature, so the optimization algorithm was tested with the test suite ZDT, which has many decision variables (up to 30) [33]. Finally, the test suite DTLZ was used in the case of many-objective (up to 15) optimization problems [9]. The results of the last two cases were compared with the results of the popular evolutionary optimization algorithms NSGA2 and NSGA3 [3,7,8].
3.1 Convergence analysis
To prove that the function evaluations of the proposed algorithm are everywhere dense in the decision space, let the part of function evaluations obtained by global generation in the entire feasible region A have a non-zero value, and let the number q of random sample points generated inside the unit hypercube be fixed. Let the maximum allowed number of function evaluations and the maximum iteration number of global search with local refinement approach infinity; then the points of function evaluations U_A are everywhere dense in the decision space A.

Suppose the opposite: there remains some hypersphere S containing no points from the set U_A of function evaluations in the decision space. Let the hypersphere S have radius r > 0 and center at a point c of the feasible region. Define a hypersphere S1 with radius r/2 and center at the same point c; this newly defined hypersphere lies inside the hypersphere S. Define another hypersphere S2 with a radius ρ < r/2 and center at a point x' of U_A, i.e., at a point of function evaluations in the decision space. During every iteration of the proposed algorithm, q random sample points are uniformly generated inside the hypercube A. Consider the event that one of the random sample points, x̃, hits inside the hypersphere S1, so that the Euclidean distance between the points x̃ and c is less than the radius r/2, while the remaining q − 1 generated random points hit inside the hypersphere S2, so that the Euclidean distances between these points and x' are less than the radius ρ.

Such an event has a non-zero probability, which can be bounded in the following way. Denote the volumes of the parts of the hyperspheres S1 and S2 lying inside the feasible region A by V1 and V2. These volumes are non-zero, as is the unit volume of the hypercube A. Then the probability of the defined event is bounded below by the non-zero constant V1 · V2^(q−1) > 0. This means the defined event will almost surely occur as the maximum iteration number of global search with local refinement approaches infinity, since the number of independent trials approaches infinity. Suppose the defined event occurred at some iteration. The selection functions are calculated at the points generated inside S2 and at the point x̃, and the points having minimal trade-off Pareto optimal values of the selection functions are taken as evaluation points of the optimized function F(·). The selection function s1 is the generated point's Euclidean distance to the closest known point (as defined by Eq. (1)). Since no evaluated point lies inside S, every point of U_A is at distance at least r from c, so the following inequalities are valid: s1(x̃) ≥ r/2 > ρ ≥ s1(x̃_j) for every remaining candidate x̃_j inside S2. Hence the selection function s1 has the largest value at the point x̃; consequently, the vector of selection function values at the point x̃ belongs to the minimal trade-off Pareto optimal values of the selection functions, the optimized function F(·) is evaluated at the point x̃, and the new point is added to the set U_A of function evaluations in the decision space. This means a function evaluation point is added inside the hypersphere S1, and therefore inside S, since S1 ⊂ S, which contradicts the assumption that there remains some hypersphere S containing no points from the set of function evaluations in the decision space. This means that the points of U_A are everywhere dense in the decision space A.
To prove the convergence of the proposed algorithm, let the optimized multi-objective function F(·) consist of Lipschitz-continuous functions f_i(·), i = 1, ..., m, and let A be the feasible region, a unit hypercube. Let the part of function evaluations obtained by global generation in the entire feasible region A have a non-zero value, let the number q of random sample points generated inside the unit hypercube be fixed, and let the maximum allowed number of function evaluations and the maximum iteration number of global search with local refinement approach infinity. Then, for any point on the true Pareto front, the proposed algorithm will find a solution with an arbitrarily small error value ε > 0, measured as the Euclidean distance between the found solution and the point on the true Pareto front.

Suppose the opposite: there is a point on the true Pareto front having no solution of the proposed algorithm within the error value ε. This means there is a hypersphere with some radius δ > 0, centered at a decision point x* attaining that Pareto front point, containing no points from the set U_A of function evaluations in the decision space; by the Lipschitz continuity of the objectives, δ can be chosen so that every decision point within the distance δ of x* is mapped within the distance ε of the Pareto front point. But we have already proved that the points of U_A are everywhere dense in the decision space A. This means there is a solution x ∈ U_A inside the hypersphere, and, consequently, the inequality ||F(x) − F(x*)|| ≤ ε is valid, which contradicts the statement that there is a point on the true Pareto front having no solution of the proposed algorithm within the error value ε. This means that for any point on the true Pareto front the proposed algorithm will find a solution with an arbitrarily small error value ε.
The investigation of the rate of convergence is more complex. Instead, the performance of the proposed algorithm is evaluated by the numerical experiments presented in the next subsections.
3.2 Numerical experiments with a low function evaluation budget
In the case of a low function evaluation budget, the results of the proposed algorithm were compared with the results of Bayesian-rooted optimization algorithms: the standard and partition-based implementations of the P-algorithm [31]. Several test problems were solved in [31] to illustrate the performance of Bayesian-rooted optimization algorithms; the same test problems are used here.
Frequently used multi-objective non-convex test problems were proposed in [10]. The objective functions are defined as follows:

where
and the feasible region is
The next bi-objective test problem is composed of two Shekel functions, frequently used to evaluate global optimization algorithms [16]:

where the feasible region is 
For the performance comparison of the proposed algorithm with a low function evaluation budget, the metrics applied are those used in a recent publication on Bayesian-rooted optimization algorithms [31]. One of the simplest and most popular performance metrics is the number of non-dominated solutions found by an optimization algorithm (NN). An estimate of the distance between the found approximation and the true Pareto front is the so-called generational distance (GD) [6]. GD is computed as the maximum of the distances between the found non-dominated solutions and their closest neighbors from the Pareto front. A metric integrating measures of the approximation precision and spread is the so-called epsilon indicator (EI) [34]. EI is computed as the maximum of the distances between the points of the true Pareto front and their closest neighbors among the found non-dominated solutions. The metrics GD and EI can be expressed by the following equations:

GD = max_{y ∈ P} min_{y* ∈ P*} ||y − y*||,   EI = max_{y* ∈ P*} min_{y ∈ P} ||y − y*||,

where P is the set of non-dominated solutions found by the considered algorithm, and P* is a set of solutions well representing the Pareto front, i.e., the solutions are sufficiently densely and uniformly distributed over the Pareto front.
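With both sets given as arrays of objective vectors, the two max-based metrics can be computed in a few lines; the helper name `gd_ei` is an illustrative assumption.

```python
import numpy as np

def gd_ei(found, front):
    """GD: max over found solutions of the distance to the nearest front
    point; EI: max over front points of the distance to the nearest found
    solution (the max-based definitions used in this section)."""
    D = np.linalg.norm(found[:, None, :] - front[None, :, :], axis=2)
    gd = D.min(axis=1).max()   # found -> front
    ei = D.min(axis=0).max()   # front -> found
    return gd, ei
```

Here `found` corresponds to P and `front` to the dense representative set P*.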
The proposed algorithm was tested with a 100 function evaluation budget and run with the following parameters. The maximum allowed number of function evaluations is selected 10% below the function evaluation budget, since the stop condition is checked once every iteration, after the global search and local refinement phases are finished. On average, the function evaluation budget of 100 evaluations is not exceeded, but some of the algorithm's runs make more, and some runs make fewer, function evaluations. The same parameter selection strategy is used throughout this paper. The maximum iteration number is selected to be a number that cannot be reached, so it can be regarded as infinite. Further parameters are the number N of initial random function evaluations and the coefficient determining the number q of randomly generated candidate points in the decision space for new function evaluations. The number of randomly generated candidate points should be high in the case of a low function evaluation budget, so a high coefficient value is selected. Another parameter value is the part of function evaluations obtained by local generation near the current Pareto optimal solutions, compared to the function evaluations obtained by global generation in the entire feasible region A. The step size parameters of the Hooke–Jeeves optimization algorithm are set, and the parameter Update is set to true, so the step size parameters are updated at the second and following iterations of global search with local refinement. The same parameters were used for both test problems except that one parameter's value was set to 4 for problem (4).
Since the proposed algorithm and the P-algorithm are stochastic, the test problems were solved 100 times [16, 28]. The mean values and standard deviations of the considered metrics are presented in two columns of Table 1. In contrast, the hyperrectangle partition-based P-algorithm is deterministic, so its results occupy a single column for each test problem [31]. The proposed algorithm shows decent performance. In the case of problem (3), the proposed algorithm gives better NN and EI values than the P-algorithm. In the case of problem (4), the proposed algorithm gives the best NN value and better GD and EI values than the hyperrectangle partition-based P-algorithm.
For visualization of the proposed algorithm's operation, the solutions of problems (3) and (4) having the best EI metric value are presented in Fig. 2. The Pareto front is well



Figure 2. Pareto front (black line) and Pareto optimal solutions (blue asterisks) found by the proposed algorithm for problem (3) (NN = 14, GD = 0.0401, EI = 0.0782) (a) and problem (4) (NN = 20, GD = 0.0549, EI = 0.0733) (c). The function evaluation points made by the proposed algorithm and the line of Pareto optimal solutions in the decision space (black line) are shown accordingly in (b) and (d), where a red plus marks the N initial random evaluation points, a green circle marks the evaluation points made by candidate point generation near the current Pareto optimal solutions in the square areas marked by a gray dotted line, a magenta triangle marks the evaluation points made by candidate point generation in the entire feasible region, and a blue square marks the evaluation points made during surrogate function optimization using the Hooke–Jeeves algorithm.
approximated by the Pareto optimal solutions found by the proposed algorithm in both test problems (Figs. 2(a) and 2(c)). The function evaluation points made by the global search tend to explore areas with no function evaluations (triangle points) and to cluster new function evaluations near the current Pareto optimal solutions (circle points) (Figs. 2(b) and 2(d)). The function evaluation points made by the local refinement (square points) are well clustered near the line of Pareto optimal solutions in the decision space (Figs. 2(b) and 2(d)).
3.3 Numerical experiments on test problems with many decision variables
In the case of a high-dimensional feasible region, the results of the proposed algorithm were compared with the results of the evolutionary optimization algorithm NSGA2 [3, 8]. The performance of the optimization algorithm was tested with the bi-objective test suite ZDT, which has many decision variables. The bi-objective test problems ZDT1, ZDT2, and ZDT3 have 30 decision variables, and ZDT4 and ZDT6 have 10 decision variables [33].
For performance comparison in the case of high dimensionality of the feasible region, the metrics used in the publications on the evolutionary optimization algorithm NSGA2 [3, 8] were applied. The average generational distance (GDavg) estimates the average distance between the found approximation and the true Pareto front [8]: it is computed as the average of the distances between the found non-dominated solutions and their closest neighbors on the Pareto front. The so-called inverted generational distance (IGDavg) integrates measures of approximation precision and spread [3]: it is computed as the average of the distances between the points of the true Pareto front and their closest neighbors among the found non-dominated solutions. Let A denote the set of non-dominated solutions found by the considered algorithm, and let P denote a set of solutions well representing the Pareto front, i.e., points sufficiently densely and uniformly distributed over the Pareto front; in the experiments, uniformly distributed points were used as P. The metrics GDavg and IGDavg can be expressed by the following equations:

\[
\mathrm{GD}_{\mathrm{avg}} = \frac{1}{|A|}\sum_{a\in A}\min_{p\in P}\|a-p\|, \qquad (4)
\]
\[
\mathrm{IGD}_{\mathrm{avg}} = \frac{1}{|P|}\sum_{p\in P}\min_{a\in A}\|a-p\|. \qquad (5)
\]
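Both metrics reduce to averages of nearest-neighbor distances between two point sets; a minimal NumPy sketch (the function names are ours, not from the paper):

```python
import numpy as np

def gd_avg(A, P):
    """Average generational distance: mean Euclidean distance from each
    found non-dominated objective vector in A to its nearest point of the
    Pareto-front sample P. A is (nA, m), P is (nP, m)."""
    A, P = np.asarray(A, float), np.asarray(P, float)
    d = np.linalg.norm(A[:, None, :] - P[None, :, :], axis=2)  # (nA, nP)
    return d.min(axis=1).mean()

def igd_avg(A, P):
    """Inverted generational distance: mean Euclidean distance from each
    Pareto-front sample point in P to its nearest found solution in A."""
    A, P = np.asarray(A, float), np.asarray(P, float)
    d = np.linalg.norm(P[:, None, :] - A[None, :, :], axis=2)  # (nP, nA)
    return d.min(axis=1).mean()
```

Note the asymmetry: GDavg rewards precision of the found points, while IGDavg also penalizes gaps in coverage of the front, which is why the two metrics can rank algorithms differently.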
The proposed algorithm was tested with a fixed function evaluation budget. The value of the corresponding budget parameter is selected 10% below the function evaluation budget, so that on average the budget is not exceeded. The proposed algorithm was run with the following parameter settings.
The number of iterations is set to a maximum that cannot be reached, so the run is terminated by the function evaluation budget. The number of initial random function evaluations is 100, and the coefficient determining the number of randomly generated candidate points in the decision space for new function evaluations is 1. The number of randomly generated candidate points should not be high in the case of a large function evaluation budget, since it causes a great computational burden for the proposed algorithm; therefore, a low coefficient value is selected. A further parameter specifies the fraction of function evaluations obtained by local generation of candidate points near the current Pareto optimal solutions, as opposed to function evaluations obtained by generation over the entire feasible region. The step size parameters of the Hooke–Jeeves optimization algorithm are updated at the second and following iterations of the global search with local refinement, since the corresponding update flag is set to true. The same parameters were used for all ZDT test problems.
Since both the proposed algorithm and the evolutionary optimization algorithm NSGA2 are stochastic, the test problems were solved multiple times [3, 8]. The mean values and variances or standard deviations of the considered metrics are presented in Tables 2 and 3. Note that in this subsection the normalization of function values with the ideal and nadir points is performed when computing the IGDavg metric; see Eq. (2). Table 2 compares the GDavg metric of the proposed algorithm with the results of NSGA2 over 10 experiments with a function evaluation budget of 25000 [8]. Table 3 compares the IGDavg metric of the proposed algorithm with the results of NSGA2 over 51 experiments [3].
The proposed algorithm shows good performance. According to the GDavg metric (Table 2), it outperforms NSGA2 on all test problems except ZDT2, where the binary-coded NSGA2 performs better. According to the IGDavg metric (Table 3), the proposed algorithm performs better than NSGA2 on the ZDT1, ZDT3 and ZDT6 test problems, and worse on ZDT2 and ZDT4.
3.4 Numerical experiments using many-objective test problems
Many-objective optimization problems having up to 15 objective functions are considered. Representing the Pareto optimal surface of a many-objective problem is usually difficult, since the number of points needed for the representation grows exponentially with each additional objective. One option is to specify multiple predefined reference points and to find a Pareto optimal trade-off solution corresponding to each reference point, as is done in the genetic algorithm NSGA3 [7]. However, for many-objective optimization problems an adequate approximation of the trade-off Pareto optimal surface may not be found using a few hundred reference points [7], and the selection of reference points may be tricky if trade-off solutions with specific properties are needed; the newly developed optimization algorithm therefore does not need any predefined reference points. The algorithm was tested on the many-objective DTLZ test suite having up to 15 objective functions [9]. The number of variables is n = M + k − 1, where k = 5 for DTLZ1, while k = 10 for the DTLZ2, DTLZ3 and DTLZ4 test problems. The results were compared with the results of the popular evolutionary optimization algorithm NSGA3 [7].
The proposed algorithm was tuned to find raw solutions, since with a moderate function evaluation budget an adequate representation of the Pareto optimal surface of a many-objective optimization problem is usually impossible. The function evaluation budget parameter is set to a very large, effectively infinite, value, so the number of iterations is specified as the stopping condition in Table 4. The proposed algorithm was run with the following parameter settings. The number of initial random function evaluations is 100, and the coefficient determining the number of randomly generated candidate points in the decision space for new function evaluations is 1. A further parameter specifies the fraction of function evaluations obtained by local generation of candidate points near the current Pareto optimal solutions, as opposed to function evaluations obtained by generation over the entire feasible region. Since only raw solutions are sought, and updating the step size parameters of the Hooke–Jeeves algorithm could drastically increase the number of function evaluations, the step size parameters are not updated at the second and following iterations: the corresponding update flag is set to false.
After a set of raw trade-off solutions is found, randomly selected solutions are tuned by surrogate function optimization using the Hooke–Jeeves algorithm. Table 5 shows how many raw solutions are taken for tuning. For every selected solution, a separate surrogate function anchored at the current solution is defined, and each surrogate function is optimized by the Hooke–Jeeves algorithm with the selected solution as the starting point, the step size parameters being set to 2 and 12. All test problems (Table 4) were optimized with the same parameters, except that for DTLZ2 and DTLZ4 the step size parameters of the proposed algorithm were set to 2 and 4, and the step size parameters of the raw solution tuning phase were set to 2 and 6, in order to have larger step sizes and thus a function evaluation budget of similar size to that of NSGA3. Also, for the M-objective DTLZ4 test problems with M = 3 and M = 5 objective functions, the number of initial random function evaluations was set to 3000, and the coefficient of the number of randomly generated candidate points was set to 0.03. In this way enough raw solutions were found; the needed number of raw solutions is given in Table 5.
For performance comparison in the case of many-objective optimization problems, the metric used in the publications on the evolutionary optimization algorithm NSGA3 [7] was applied. It is similar to the inverted generational distance IGDavg defined by Eq. (5), but instead of averaging the distances between the whole true Pareto front and the closest found non-dominated solutions, the new metric IGDref averages the distances between reference points on the true Pareto front and their closest neighbors among the found non-dominated solutions:

\[
\mathrm{IGD}_{\mathrm{ref}} = \frac{1}{|R|}\sum_{r\in R}\min_{a\in A_{\mathrm{tuned}}}\|a-r\|,
\]

where A_tuned is the set of non-dominated solutions tuned towards the reference points, and R is the set of reference points on the true Pareto front. In the case of NSGA3, multiple Pareto optimal reference points are predefined, and a trade-off solution corresponding to each reference point is found [7]. The newly developed optimization algorithm needs no predefined reference points: raw solutions obtained after the final iterations of the proposed algorithm are selected at random, and the closest points on the true Pareto front are taken as the set of reference points R. The selected non-dominated raw solutions are then tuned by surrogate function optimization using the Hooke–Jeeves algorithm to obtain the non-dominated solution set A_tuned. Since the raw solutions are selected randomly, this is equivalent to raw solution selection by a human expert seeking solutions with specific properties. The sizes of the sets of raw solutions and reference points are given in Table 5.
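Computing IGDref again reduces to nearest-neighbor distances, now taken from the chosen reference points; a sketch under the construction described above, with hypothetical function names (here reference points are simply the closest points of a dense true-front sample to the randomly chosen raw solutions):

```python
import numpy as np

def igd_ref(A_tuned, R):
    """IGD_ref: mean distance from each reference point in R (on the true
    Pareto front) to its nearest tuned non-dominated solution in A_tuned."""
    A_tuned, R = np.asarray(A_tuned, float), np.asarray(R, float)
    d = np.linalg.norm(R[:, None, :] - A_tuned[None, :, :], axis=2)
    return d.min(axis=1).mean()

def pick_reference_points(raw, front_sample, k, rng=np.random.default_rng(0)):
    """Randomly choose k raw solutions and return, for each, the closest
    point of a dense true-front sample as its reference point."""
    raw = np.asarray(raw, float)
    front_sample = np.asarray(front_sample, float)
    idx = rng.choice(len(raw), size=k, replace=False)
    chosen = raw[idx]
    d = np.linalg.norm(chosen[:, None, :] - front_sample[None, :, :], axis=2)
    return front_sample[d.argmin(axis=1)], idx
```

This mirrors the text: the random choice of raw solutions stands in for a human expert's selection, and the induced reference points make IGDref comparable with the reference-point-based evaluation of NSGA3.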
Since both the proposed algorithm and the evolutionary optimization algorithm NSGA3 are stochastic, the test problems were solved 20 times [7]. The best, median and worst values of the considered metric IGDref are presented in Table 4, where the results of the proposed algorithm are compared with those of NSGA3 [7]. The proposed algorithm performs well compared with NSGA3. The mean function evaluation counts are lower for the proposed algorithm except in the case of the M-objective DTLZ2 (M = 3, 5, 8) and DTLZ4 (M = 3) test problems. The best, median and worst IGDref values are better for the proposed algorithm except in the case of the M-objective DTLZ3 (M = 3) test problem.
4 Conclusions
A hybrid multi-objective optimization algorithm is proposed which combines random global search with local refinement of the found approximation of the Pareto front. The global search mimics a Bayesian algorithm, and the Hooke–Jeeves algorithm is used for the local refinement. In the local optimization phase, the multi-objective optimization problem is converted into a single-objective one by introducing a surrogate function without the use of weight vectors. The developed algorithm was tested in the case of an extremely low function evaluation budget, and it gives decent performance compared with Bayesian-rooted optimization algorithms. The proposed algorithm was also tested on a test suite with many decision variables and on a many-objective test suite, and the numerical experiments showed good performance compared with the popular evolutionary optimization algorithms NSGA2 and NSGA3. In the case of many-objective optimization, the proposed algorithm needs no predefined reference points, so raw solutions can be selected by a human expert and tuned into final solutions with the needed properties. Future plans include the development of a parallel version of the proposed algorithm for optimizing functions with an extreme computational burden. Another research theme of interest is integrating the execution of the proposed algorithm with human expert decisions to obtain solutions with the needed properties.
References
1. F. Archetti, A. Candelieri, Bayesian Optimization and Data Science, Springer, Cham, 2019.
2. R. Baronas, A. Žilinskas, L. Litvinas, Optimal design of amperometric biosensors applying multi-objective optimization and design visualization, Electrochim. Acta, 211:586–594, 2016.
3. J. Blank, K. Deb, A running performance metric and termination criterion for evaluating evolutionary multi- and many-objective optimization algorithms, in 2020 IEEE Congress on Evolutionary Computation (CEC), IEEE, Piscataway, NJ, 2020, pp. 1–8.
4. J. Cui, B. Yang, Survey on Bayesian optimization methodology and applications, J. Softw., 29(10):3068–3090, 2007 (in Chinese), https://doi.org/10.13328/j.cnki.jos.005607.
5. A.L. Custódio, J.F.A. Madeira, MultiGLODS: Global and local multiobjective optimization using direct search, J. Glob. Optim., 72(2):1–23, 2018, https://doi.org/10.1007/s10898-018-0618-1.
6. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, Wiley, New York, 2001.
7. K. Deb, H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints, IEEE Trans. Evol. Comput., 18(4):577–601, 2014, https://doi.org/10.1109/TEVC.2013.2281535.
8. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput., 6(2):182–197, 2002, https://doi.org/10.1109/4235.996017.
9. K. Deb, L. Thiele, M. Laumanns, E. Zitzler, Scalable test problems for evolutionary multiobjective optimization, in A. Abraham, L. Jain, R. Goldberg (Eds.), Evolutionary Multiobjective Optimization: Theoretical Advances and Applications, Springer, London, 2005, pp. 105–145, https://doi.org/10.1007/1-84628-137-7_6.
10. C.M. Fonseca, P.J. Fleming, On the performance assessment and comparison of stochastic multiobjective optimizers, in H.-M. Voigt, W. Ebeling, I. Rechenberg, H.-P. Schwefel (Eds.), Parallel Problem Solving from Nature – PPSN IV, Springer, Berlin, Heidelberg, 1996, pp. 584–593, https://doi.org/10.1007/3-540-61723-X_1022.
11. V. Gergel, E. Kozinov, Efficient multicriterial optimization based on intensive reuse of search information, J. Glob. Optim., 71(1):73–90, 2018, https://doi.org/10.1007/s10898-018-0624-3.
12. D.R. Jones, C.D. Perttunen, B.E. Stuckman, Lipschitzian optimization without the Lipschitz constant, J. Optim. Theory Appl., 79:157–181, 1993, https://doi.org/10.1007/BF00941892.
13. C.T. Kelley, Iterative Methods for Optimization, SIAM, Philadelphia, 1999.
14. L. Litvinas, R. Baronas, A. Žilinskas, Application of two phase multi-objective optimization to design of biosensors utilizing cyclic substrate conversion, in Proceedings of the 31st European Conference on Modelling and Simulation, ECMS 2017, European Council for Modelling and Simulation, 2017, pp. 469–474, https://doi.org/10.7148/2017-0469.
15. K. Miettinen, Nonlinear Multiobjective Optimization, Springer, Boston, MA, 1998, https://doi.org/10.1007/978-1-4615-5563-6.
16. P.M. Pardalos, A. Žilinskas, J. Žilinskas, Non-Convex Multi-Objective Optimization, Springer, Cham, 2017.
17. R. Paulavičius, Y.D. Sergeyev, D.E. Kvasov, J. Žilinskas, Globally-biased DISIMPL algorithm for expensive global optimization, J. Glob. Optim., 59:1–23, 2014, https://doi.org/10.1007/s10898-014-0180-4.
18. R. Paulavičius, Y.D. Sergeyev, D.E. Kvasov, J. Žilinskas, Globally-biased BIRECT algorithm with local accelerators for expensive global optimization, Expert Syst. Appl., 144:113052, 2020, https://doi.org/10.1016/j.eswa.2019.113052.
19. R. Paulavičius, J. Žilinskas, Simplicial Global Optimization, SpringerBriefs in Optimization, Springer, New York, 2014, https://doi.org/10.1007/978-1-4614-9093-7.
20. J. Pintér, Global Optimization in Action, Springer, Boston, MA, 1996, https://doi.org/10.1007/978-1-4757-2502-5.
21. Y.D. Sergeyev, D.E. Kvasov, Global search based on efficient diagonal partitions and a set of Lipschitz constants, SIAM J. Optim., 16(3):910–937, 2006, https://doi.org/10.1137/040621132.
22. Y.D. Sergeyev, M.S. Mukhametzhanov, D.E. Kvasov, D. Lera, Derivative-free local tuning and local improvement techniques embedded in the univariate global optimization, J. Optim. Theory Appl., 171(1):186–208, 2016, https://doi.org/10.1007/s10957-016-0947-5.
23. A. Žilinskas, On the worst-case optimal multi-objective global optimization, Optim. Lett., 7:1921–1928, 2013, https://doi.org/10.1007/s11590-012-0547-8.
24. A. Žilinskas, J. Calvin, Bi-objective decision making in global optimization based on statistical models, J. Glob. Optim., 74:599–609, 2019, https://doi.org/10.1007/s10898-018-0622-5.
25. A. Žilinskas, G. Gimbutienė, A hybrid of Bayesian approach based global search with clustering aided local refinement, Commun. Nonlinear Sci. Numer. Simul., 78:104857, 2019, https://doi.org/10.1016/j.cnsns.2019.104857.
26. C. Zheng, J. Calvin, C. Gotsman, A DIRECT-type global optimization algorithm for image registration, J. Glob. Optim., 79(2):431–445, 2021, https://doi.org/10.1007/s10898-020-00914-y.
27. A. Zhigljavsky, A. Žilinskas, Bayesian and High-Dimensional Stochastic Optimization, Springer, Cham, 2021, https://doi.org/10.1007/978-3-030-64712-4.
28. A. Žilinskas, A statistical model-based algorithm for 'black-box' multi-objective optimisation, Int. J. Syst. Sci., Princ. Appl. Syst. Integr., 45(1):82–93, 2014, https://doi.org/10.1080/00207721.2012.702244.
29. A. Žilinskas, R. Baronas, L. Litvinas, L. Petkevičius, Multi-objective optimization and decision visualization of batch stirred tank reactor based on spherical catalyst particles, Nonlinear Anal. Model. Control, 24(6):1019–1033, 2019, https://doi.org/10.15388/NA.2019.6.10.
30. A. Žilinskas, L. Litvinas, A hybrid of the simplicial partition-based Bayesian global search with the local descent, Soft Comput., 24(23):17601–17608, 2020, https://doi.org/10.1007/s00500-020-05095-0.
31. A. Žilinskas, L. Litvinas, A partition based Bayesian multi-objective optimization algorithm, in Y.D. Sergeyev, D.E. Kvasov (Eds.), Numerical Computations: Theory and Algorithms, Springer, Cham, 2020, pp. 511–518, https://doi.org/10.1007/978-3-030-40616-5_50.
32. A. Žilinskas, J. Žilinskas, Adaptation of a one-step worst-case optimal univariate algorithm of bi-objective Lipschitz optimization to multidimensional problems, Commun. Nonlinear Sci. Numer. Simul., 21(1–3):89–98, 2015, https://doi.org/10.1016/j.cnsns.2014.08.025.
33. E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: Empirical results, Evol. Comput., 8(2):173–195, 2000, https://doi.org/10.1162/106365600568202.
34. E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V.G. da Fonseca, Performance assessment of multiobjective optimizers: An analysis and review, IEEE Trans. Evol. Comput., 7(2):117–132, 2003, https://doi.org/10.1109/TEVC.2003.810758.