

Comparison of descent directions for the Conjugate Gradient Method
Scientia Et Technica, vol. 26, no. 4, pp. 518-524, 2021
Universidad Tecnológica de Pereira

Basic Sciences



Received: 29 November 2019

Approved: 23 July 2021

DOI: https://doi.org/10.22517/23447214.24893

Abstract: In this manuscript we present, as a starting point, a theoretical analysis of the gradient method, known as one of the first descent methods, and from it we identify the strength of the conjugate gradient methods. Taking an objective function, we determine the values that optimize it by means of different methods, indicating the geometric differences between them. Several systems are used as tests, obtaining their solution in each case and measuring the speed at which they converge under the conjugate gradient methods proposed by Hestenes-Stiefel and Fletcher-Reeves.

Keywords: Conjugate direction, descent, gradient, iteration, minimization, optimization, quadratic function, solution.


I. INTRODUCTION

OPTIMIZATION is one of the most important tools in applied mathematics, used to solve real-life problems in disciplines such as engineering and biology. Ensuring that the resources available to solve a problem or perform a specific task are used in the best possible way is, without a doubt, the primary objective of optimization. For this we have unconstrained optimization algorithms, among which is the method of steepest descent, also called the gradient method, which allows us to optimize a quadratic function, which will be the objective function, using different search directions that are descending (geometrically). The gradient method is of great importance since its speed of convergence is quite high, together with the possibility of solving problems whose objective functions have a large number of dimensions associated with them [1].

II. Content

A. Quadratic forms

A quadratic form F can be defined as a scalar map whose domain is a vector space of finite dimension n, represented by (1):

F(x) = \frac{1}{2} x^{T} A x - b^{T} x + c \quad (1)

where c is a constant, b, x \in \mathbb{R}^{n}, and A \in \mathbb{R}^{n \times n} is a symmetric matrix.

To classify the quadratic form F, we use the eigenvalues \lambda_i, i = 1, 2, \ldots, n of its Hessian matrix A, and we identify its optimum as follows:

Positive definite quadratic form if \lambda_i > 0 for all i. It has a global minimum, represented in fig. 1a.

Negative definite quadratic form if \lambda_i < 0 for all i. It has a global maximum, represented in fig. 1b.

Positive semi-definite quadratic form if \lambda_i \geq 0 for all i, with at least one \lambda_i = 0. It has infinitely many minimum points, represented in fig. 1c.

Negative semi-definite quadratic form if \lambda_i \leq 0 for all i, with at least one \lambda_i = 0. It has infinitely many maximum points; its graphical representation corresponds to fig. 1d inverted.

Indefinite quadratic form if there exist i, j such that \lambda_i > 0 and \lambda_j < 0. It has a saddle point, represented in fig. 1e.


Fig. 1.
Graphical definition of a quadratic form [2]
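As a brief numerical check of this classification, a minimal sketch is given below. It assumes NumPy; the example matrix is hypothetical and not one of the test systems used later in the paper.

```python
# Classify the quadratic form F(x) = 1/2 x^T A x - b^T x + c by the eigenvalues
# of its symmetric Hessian matrix A, following the cases listed above.
import numpy as np

def classify_quadratic_form(A, tol=1e-12):
    eigvals = np.linalg.eigvalsh(A)        # eigenvalues of the symmetric matrix A
    if np.all(eigvals > tol):
        return "positive definite: global minimum"
    if np.all(eigvals < -tol):
        return "negative definite: global maximum"
    if np.all(eigvals >= -tol):
        return "positive semi-definite: infinitely many minima"
    if np.all(eigvals <= tol):
        return "negative semi-definite: infinitely many maxima"
    return "indefinite: saddle point"

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                 # hypothetical symmetric matrix
print(classify_quadratic_form(A))          # -> positive definite: global minimum
```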

B. Gradient Method

This is a descent method that consists of choosing an arbitrary initial point and, from it, building through iterations a sequence of points obtained by advancing along the line of maximum descent. This sequence converges to a point very close to the solution.

The problems that we can solve using this method are of the type:

\min_{x \in \mathbb{R}^{n}} F(x)

where F is a quadratic form, continuously differentiable, with an associated positive definite Hessian matrix A.

From the initial point x_0 we generate a sequence of points given by (2):

x_{k+1} = x_k + \alpha_k d_k, \qquad d_k = -\nabla F(x_k) \quad (2)

\alpha_k indicates the length of the step, known as the descent parameter (3), and we can find it by minimizing F along the search direction, which for the quadratic form (1) yields:

\alpha_k = \frac{r_k^{T} r_k}{r_k^{T} A r_k}, \qquad r_k = b - A x_k \quad (3)

The direction of maximum descent at x_k of the quadratic form F is the negative of the gradient, given by equation (4) [3]:

-\nabla F(x_k) = b - A x_k = r_k \quad (4)

We present the algorithm of this method in Table I [4]:

TABLE I
GRADIENT METHOD ALGORITHM
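A minimal sketch of this algorithm for the quadratic form (1) is shown below, following (2)-(4); the function name, tolerance, and iteration limit are our own choices and are not taken from Table I.

```python
# Steepest descent (gradient method) for F(x) = 1/2 x^T A x - b^T x + c,
# with A symmetric positive definite.
import numpy as np

def steepest_descent(A, b, x0, tol=1e-6, max_iter=1000):
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x                      # residual = -gradient, eq. (4)
        if np.linalg.norm(r) <= tol:       # stop when the gradient is small enough
            break
        alpha = (r @ r) / (r @ (A @ r))    # exact line-search step, eq. (3)
        x = x + alpha * r                  # advance along the descent direction, eq. (2)
    return x, k
```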

C. Conjugate Gradient Method

We frequently find problems represented by sparse systems, which arise, for example, when partial differential equations are solved numerically. It is there that we use this descent method, which helps us save memory, since a large proportion of the entries of the coefficient matrix that represents the system are zero and need not be stored [3].

The conjugate gradient method initially requires the construction of an orthogonal basis (Gram-Schmidt method), determining with it the best solution, or simply the most efficient one. The interesting aspect of this method is that the way the basis is built guarantees the orthogonality of each element with respect to the previous one, and automatically all the earlier elements satisfy this condition as well [5].

A great advantage of this method is its speed of convergence, which is higher than that of the steepest descent method, as we can see in fig. 2.


Fig. 2.
Comparison of descent directions, steepest descent method (Orange), conjugate gradient (Blue) [6]

This method has the same approach as the gradient method above, since it is used to solve unconstrained optimization problems:

\min_{x \in \mathbb{R}^{n}} F(x)

and it starts from an initial position x_0, generating the sequence of points given in (5):

x_{k+1} = x_k + \alpha_k d_k \quad (5)

The descent parameter that was previously given in (3) is modified to (6):

\alpha_k = -\frac{\nabla F(x_k)^{T} d_k}{d_k^{T} A d_k} \quad (6)

as is the descent direction, which is now d_k and is calculated by (7):

d_{k+1} = -\nabla F(x_{k+1}) + \beta_k d_k, \qquad d_0 = -\nabla F(x_0) \quad (7)

Here \beta_k is a scalar known as the conjugate gradient parameter, which takes different values depending on the conjugate gradient algorithm chosen for the solution of the problem [7]. Any conjugate gradient algorithm has a very simple general structure, as illustrated below in Table II:

TABLE II
CONJUGATE GRADIENT METHOD ALGORITHM
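A minimal sketch of this general structure is given below, assuming the quadratic form (1) and an exact line search; only the rule for \beta_k changes between variants. The function and variable names are ours, not taken from Table II.

```python
# General conjugate gradient scheme of eqs. (5)-(7): the parameter beta_k is
# supplied as a function, so the same loop covers every variant.
import numpy as np

def conjugate_gradient(A, b, x0, beta_rule, tol=1e-6, max_iter=1000):
    x = x0.astype(float)
    g = A @ x - b                          # gradient of the quadratic form
    d = -g                                 # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = -(g @ d) / (d @ (A @ d))   # exact line search, eq. (6)
        x = x + alpha * d                  # eq. (5)
        g_new = A @ x - b
        beta = beta_rule(g_new, g, d)      # conjugate gradient parameter
        d = -g_new + beta * d              # new conjugate direction, eq. (7)
        g = g_new
    return x, k
```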

D. Conjugate Gradient of Fletcher-Reeves

This method is an improvement on the descent gradient which, by using conjugate vectors, seeks to reach the solution in fewer iterations. To use this method it is necessary to know the first displacement, since from it we can find the direction vectors that will be conjugate to each other [8].

For this method we proceed with an algorithm exactly like the previous one; however, the conjugate gradient parameter is defined by (8), which involves the gradients of the quadratic function at the current and previous iterates, as proposed in the conjugate gradient algorithm [9].

\beta_k = \frac{\nabla F(x_{k+1})^{T} \nabla F(x_{k+1})}{\nabla F(x_k)^{T} \nabla F(x_k)} \quad (8)
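Written so that it plugs into the conjugate_gradient sketch above, the Fletcher-Reeves rule of (8) is simply:

```python
# Fletcher-Reeves parameter, eq. (8): ratio of squared gradient norms.
def beta_fletcher_reeves(g_new, g_old, d_old):
    return (g_new @ g_new) / (g_old @ g_old)
```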

E. Conjugate Gradient of Hestenes-Stiefel

The conjugate gradient method has received a lot of attention and has been widely used in recent years. Although the pioneers of this method were Hestenes and Stiefel (1952) [8], the current interest starts with Reid (1971), who posed it as an iterative method, which is the way it is most often used nowadays [5].

As in the previous case, we reformulate \beta_k according to (9) to obtain its algorithm as follows:

\beta_k = \frac{\nabla F(x_{k+1})^{T} \left( \nabla F(x_{k+1}) - \nabla F(x_k) \right)}{d_k^{T} \left( \nabla F(x_{k+1}) - \nabla F(x_k) \right)} \quad (9)
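With the same signature as the Fletcher-Reeves rule, the Hestenes-Stiefel parameter of (9) can be sketched as:

```python
# Hestenes-Stiefel parameter, eq. (9): uses the gradient difference g_new - g_old.
def beta_hestenes_stiefel(g_new, g_old, d_old):
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)
```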

III. Analysis And Results

Next, different test systems for the Fletcher-Reeves Conjugate Gradient and Hestenes-Stiefel Conjugate Gradient methods will be presented.

To check the effectiveness of the method, four systems of 2 and 3 variables are selected, each of them described by its associated matrix A and vector b according to the quadratic form proposed in (1).

For the stopping criterion a fixed tolerance is used, and the starting point is x_0 = (1, 1, \ldots, 1)^{T}, a column vector with n rows and all its components equal to one.

Proposed system for n = 2

System for n=3
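As an illustration of how such a test can be run with the sketches above, the snippet below builds a hypothetical n = 2 system (a stand-in, not one of the paper's test systems) and solves it with both variants from the all-ones starting point.

```python
# Hypothetical SPD test system; conjugate_gradient and the two beta rules are the
# sketches defined earlier in this section.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                 # symmetric positive definite stand-in
b = np.array([1.0, 1.0])
x0 = np.ones(2)                            # starting point: vector of ones

x_fr, it_fr = conjugate_gradient(A, b, x0, beta_fletcher_reeves)
x_hs, it_hs = conjugate_gradient(A, b, x0, beta_hestenes_stiefel)
print("Fletcher-Reeves :", x_fr, "iterations:", it_fr)
print("Hestenes-Stiefel:", x_hs, "iterations:", it_hs)
```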

A. Fletcher-Reeves Conjugate Gradient Test Systems.

The different test systems for the Fletcher-Reeves conjugate gradient method are presented below in Tables III, IV and V.

TABLE III
RESULTS OBTAINED WITH THE FLETCHER-REEVES CG

TABLE IV
RESULTS OBTAINED FOR n=2

TABLE V
RESULTS OBTAINED FOR n=3

B. Hestenes-Stiefel Conjugate Gradient Test Systems

The different test systems for the Hestenes-Stiefel conjugate gradient method are presented below in Tables VI, VII, VIII and IX.

TABLE VI
RESULTS OBTAINED WITH THE HESTENES-STIEFEL CG

TABLE VII
RESULTS OBTAINED FOR n=2

TABLE VIII
RESULTS OBTAINED FOR n=3

TABLE IX
RESULTS OBTAINED FOR n=4

C. Comparison of methods

The different simulations corresponding to the descent methods will be shown below.

Test system n = 2


Fig. 3.
Directions vs. iterations. Red: Fletcher-Reeves, blue: Hestenes-Stiefel.

Test system n=3


Fig. 4.
Directions vs. iterations. Red: Fletcher-Reeves, blue: Hestenes-Stiefel.

In the previous graphs (fig. 3 and fig. 4) it is possible to observe the behavior of the Fletcher-Reeves and Hestenes-Stiefel methods for spaces of dimensions n = 2 and n = 3. In them we observe that Fletcher-Reeves computes more directions and observation points than Hestenes-Stiefel.


Fig. 5.
Gradient norm vs. iterations. Red: Fletcher-Reeves, blue: Hestenes-Stiefel.


Fig. 6.
Gradient norm vs. iterations. Red: Fletcher-Reeves, blue: Hestenes-Stiefel.

The graphs in fig. 5 and fig. 6 show the gradient norm for R^2 and R^3, from which we conclude that convergence is faster with Hestenes-Stiefel than with the method proposed by Fletcher-Reeves.


Fig. 7.
Difference of gradient norms vs. iterations.


Fig. 8.
Difference of gradient norms vs. iterations.

Finally, fig. 7 and fig. 8 gather the iteration-by-iteration difference between the gradient norms of fig. 5 and fig. 6, respectively. The difference is clearly seen in the iterations after the Hestenes-Stiefel method has already found the optimum of the systems proposed for 2 and 3 dimensions.

IV. Conclusions

When the quadratic form has distorted or highly eccentric contours, more iterations will be required for the Fletcher-Reeves method to converge. This is because rounding errors result in the need for more iterations.

For greater effectiveness of the Fletcher-Reeves method, we must periodically restart it after an appreciable number of steps, with the new search direction corresponding to that of steepest descent.

The results show that the Fletcher-Reeves method is a better optimization method than the various pattern-based search methods, and in particular than the Hestenes-Stiefel method [10].

References

[1]. J. Nocedal and S. J. Wright, “Numerical optimization”, Springer, New York, 2006. doi: https://doi.org/10.1007/978-0-387-40065-5

[2]. M. A. Fontelos, “Fundamentos matemáticos de la Ingeniería”, Librería-Editorial Dykinson, 2007.

[3]. U. Kindelán, “Resolución de sistemas lineales de ecuaciones: Método del gradiente conjugado”, Universidad Politécnica de Madrid, 2007

[4]. A. Katrutsa, M. Botchev, G. Ovchinnikov and I. Oseledets, “How to optimize preconditioners for the conjugate gradient method: a stochastic approach”, Numerical Analysis, 2017.

[5]. A. C. Ledesma, “Resolución de Grandes Sistemas de Ecuaciones Lineales”, Instituto de Geofísica UNAM, 2006.

[6]. B. Calvo and G. Santafé, “scmamp: Statistical comparison of multiple algorithms in multiple problems”, The R Journal, 8(1), 2016.

[7]. N. Andrei, “Scaled conjugate gradient algorithms for unconstrained optimization”, Computational Optimization and Applications, 38(3), 401-416, 2007. doi: 10.1007/s10589-007-9055-7

[8]. R. Fletcher and C. M. Reeves, “Function minimization by conjugate gradients”, The computer journal, 7(2), 149-154, 1964.

[9]. F. Mesa, P. P. Cardenas-Alzate and C. A. Rodriguez Varela, “Comparison of the Conjugate Gradient Methods of Liu-Storey and Dai-Yuan”, Contemporary Engineering Sciences, 10(35), 1719-1726.

[10]. R. Fletcher, “Practical Methods of Optimization”, Wiley, New York, 1980.

Author notes

F. Mesa
Fernando Mesa

University professor in the Department of Mathematics at the Universidad Tecnológica de Pereira, Colombia, and director of the applied mathematics and education research group. Master of Science (MSc) in physical instrumentation (2007). Member of the plasma laboratory of the Universidad Nacional de Colombia, located in Manizales, Colombia. Member of the nonlinear differential equations group "GEDNOL" at the Universidad Tecnológica de Pereira, Colombia. Director of the research group in Applied Mathematics and Education "GIMAE" at the Universidad Tecnológica de Pereira, Colombia.

D. M. Devia-Narvaez
Diana Marcela Devia Narváez

Associate professor of mathematics and physics at the Universidad Tecnológica de Pereira, Colombia. PhD in Engineering (2012). Master of Science (MSc) from the Faculty of Sciences - Physics (2010). Member of the plasma laboratory of the Universidad Nacional de Colombia, located in Manizales, Colombia. Member of the nonlinear differential equations group "GEDNOL" at the Universidad Tecnológica de Pereira, Colombia. Areas of expertise: material processing through plasma-assisted techniques, structural and mechanical characterization of materials, simulation and modeling of materials' physical properties.

G. Correa-Vélez
German Correa Vélez

University lecturer in the Department of Mathematics at the Universidad Tecnológica de Pereira, Colombia. Master of Science (MSc) in mathematics (2008). Member of the nonlinear differential equations group "GEDNOL" at the Universidad Tecnológica de Pereira, Colombia. Member of the research group in Applied Mathematics and Education "GIMAE" at the Universidad Tecnológica de Pereira.


