NAG Toolbox: nag_opt_uncon_conjgrd_comp (e04dg)
Purpose
nag_opt_uncon_conjgrd_comp (e04dg) minimizes an unconstrained nonlinear function of several variables using a pre-conditioned, limited memory quasi-Newton conjugate gradient method. First derivatives (or an ‘acceptable’ finite difference approximation to them) are required. It is intended for use on large scale problems.
Syntax
[iter, objf, objgrd, x, user, lwsav, iwsav, rwsav, ifail] = e04dg(objfun, x, lwsav, iwsav, rwsav, 'n', n, 'user', user)
[iter, objf, objgrd, x, user, lwsav, iwsav, rwsav, ifail] = nag_opt_uncon_conjgrd_comp(objfun, x, lwsav, iwsav, rwsav, 'n', n, 'user', user)
Before calling nag_opt_uncon_conjgrd_comp (e04dg), or the option setting function nag_opt_uncon_conjgrd_option_string (e04dk), the initialization function nag_opt_init (e04wb) must be called.
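For reference, the required calling order is sketched below (illustrative only; objfun and x0 denote a user-supplied objective function and starting point):
% 1. Initialize the communication arrays for e04dg.
[cwsav, lwsav, iwsav, rwsav, ifail] = e04wb('e04dg');
% 2. Optionally set optional parameters via e04dk (one call per option).
[lwsav, iwsav, rwsav, inform] = e04dk('Print Level = 1', lwsav, iwsav, rwsav);
% 3. Solve the problem.
[iter, objf, objgrd, x, user, lwsav, iwsav, rwsav, ifail] = ...
    e04dg(@objfun, x0, lwsav, iwsav, rwsav);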
Description
nag_opt_uncon_conjgrd_comp (e04dg) is designed to solve unconstrained minimization problems of the form
minimize F(x) over x in R^n,
where x is an n-element vector.
You must supply an initial estimate of the solution.
For maximum reliability, it is preferable to provide all first partial derivatives. If all of the derivatives cannot be provided, you are recommended to obtain approximate values (using finite differences) by calling
nag_opt_estimate_deriv (e04xa) from within
objfun.
The method used by
nag_opt_uncon_conjgrd_comp (e04dg) is described in
Algorithmic Details.
References
Gill P E and Murray W (1979) Conjugate-gradient methods for large-scale nonlinear optimization Technical Report SOL 79-15 Department of Operations Research, Stanford University
Gill P E, Murray W and Wright M H (1981) Practical Optimization Academic Press
Parameters
Compulsory Input Parameters
- 1: objfun – function handle or string containing name of m-file
objfun must calculate the objective function F(x) and possibly its gradient as well for a specified n-element vector x.
[mode, objf, objgrd, user] = objfun(mode, n, x, nstate, user)
Input Parameters
- 1: mode – int64int32nag_int scalar
Indicates which values must be assigned during each call of objfun. Only the following values need be assigned:
mode = 0: objf.
mode = 2: objf and objgrd.
- 2: n – int64int32nag_int scalar
n, the number of variables.
- 3: x(n) – double array
x, the vector of variables at which the objective function and its gradient are to be evaluated.
- 4: nstate – int64int32nag_int scalar
Will be 1 on the first call of objfun by nag_opt_uncon_conjgrd_comp (e04dg), and 0 for all subsequent calls. Thus, you may wish to test nstate within objfun in order to perform certain calculations once only. For example, you may read data or initialize global variables when nstate = 1.
- 5: user – Any MATLAB object
objfun is called from nag_opt_uncon_conjgrd_comp (e04dg) with the object supplied to nag_opt_uncon_conjgrd_comp (e04dg).
Output Parameters
- 1: mode – int64int32nag_int scalar
May be set to a negative value if you wish to terminate the solution to the current problem, and in this case nag_opt_uncon_conjgrd_comp (e04dg) will terminate with ifail set to mode.
- 2: objf – double scalar
The value of the objective function at x.
- 3: objgrd(n) – double array
If mode = 2, objgrd must contain the value of ∂F/∂x_j evaluated at x, for j = 1, 2, ..., n.
- 4: user – Any MATLAB object
Note: objfun should be tested separately before being used in conjunction with nag_opt_uncon_conjgrd_comp (e04dg). See also the description of the optional parameter Verify. An illustrative objfun is sketched after this list of compulsory parameters.
- 2: x(n) – double array
An initial estimate of the solution.
- 3: lwsav – logical array
- 4: iwsav – int64int32nag_int array
- 5: rwsav – double array
The arrays lwsav, iwsav and rwsav, as returned by nag_opt_init (e04wb), must not be altered between calls to any of the functions nag_opt_uncon_conjgrd_comp (e04dg), nag_opt_uncon_conjgrd_option_string (e04dk) and nag_opt_init (e04wb).
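The following sketch is purely illustrative (the quadratic objective and all names here are hypothetical, not part of the NAG example); it shows how nstate can be used for one-time initialization and how mode can be set negative to request termination:
% Hypothetical objfun: F(x) = sum(x.^2), illustrating nstate and a negative mode.
function [mode, objf, objgrd, user] = objfun(mode, n, x, nstate, user)
  if (nstate == 1)
    fprintf('objfun called for the first time\n');  % one-time initialization
  end
  objf = sum(x.^2);
  if (~isfinite(objf))
    mode = -1;                  % request termination; e04dg returns with ifail = -1
    objgrd = zeros(n, 1);
    return;
  end
  if (mode == 2)
    objgrd = 2*x(:);            % gradient of the sum of squares
  else
    objgrd = zeros(n, 1);       % gradient not required when mode = 0
  end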
Optional Input Parameters
- 1: n – int64int32nag_int scalar
Default: the dimension of the array x.
n, the number of variables.
Constraint: n > 0.
- 2: user – Any MATLAB object
user is not used by nag_opt_uncon_conjgrd_comp (e04dg), but is passed to objfun. Note that for large objects it may be more efficient to use a global variable which is accessible from the m-files than to use user.
Output Parameters
- 1: iter – int64int32nag_int scalar
The total number of iterations performed.
- 2: objf – double scalar
The value of the objective function at the final iterate.
- 3: objgrd(n) – double array
The gradient of the objective function at the final iterate (or its finite difference approximation).
- 4: x(n) – double array
The final estimate of the solution.
- 5: user – Any MATLAB object
- 6: lwsav – logical array
- 7: iwsav – int64int32nag_int array
- 8: rwsav – double array
- 9: ifail – int64int32nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
nag_opt_uncon_conjgrd_comp (e04dg) returns with ifail = 0 if the following three conditions are satisfied:
(i) F(x_{k-1}) - F(x_k) < τ_F (1 + |F(x_k)|);
(ii) ||x_{k-1} - x_k|| < τ_F^{1/2} (1 + ||x_k||);
(iii) ||g(x_k)|| ≤ τ_F^{1/3} (1 + |F(x_k)|), or ||g(x_k)|| < ε_A,
where τ_F is the value of the optional parameter Optimality Tolerance and ε_A is the absolute error associated with computing the objective function.
For a full discussion on termination criteria see Chapter 8 of
Gill et al. (1981).
Error Indicators and Warnings
Note: nag_opt_uncon_conjgrd_comp (e04dg) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
- W ifail < 0
A negative value of ifail indicates an exit from nag_opt_uncon_conjgrd_comp (e04dg) because you set mode < 0 in objfun. The value of ifail will be the same as your setting of mode.
- ifail = 1
Not used by this function.
- ifail = 2
Not used by this function.
- ifail = 3
The limiting number of iterations (as determined by the optional parameter Iteration Limit) has been reached.
If the algorithm appears to be making satisfactory progress, then optional parameter Iteration Limit may be too small. If so, increase its value and rerun nag_opt_uncon_conjgrd_comp (e04dg). If the algorithm seems to be making little or no progress, then you should check for incorrect gradients as described under ifail = 7.
- ifail = 4
The computed upper bound on the step length taken during the linesearch was too small. A rerun with an increased value of the optional parameter Maximum Step Length (ρ say) may be successful unless ρ = 10^20 (the default value), in which case the current point cannot be improved upon.
- ifail = 5
Not used by this function.
- W ifail = 6
The conditions for an acceptable solution (see argument ifail in Arguments) have not all been met, but a lower point could not be found.
If objfun computes the objective function and its gradient correctly, then this may occur because an overly stringent accuracy has been requested, i.e., the value of the optional parameter Optimality Tolerance is too small. In this case you should apply the three tests described under ifail in Arguments to determine whether or not the final solution is acceptable. For a discussion of attainable accuracy see Gill et al. (1981).
If many iterations have occurred in which essentially no progress has been made or nag_opt_uncon_conjgrd_comp (e04dg) has failed to move from the initial point, objfun may be incorrect. You should refer to the comments below under ifail = 7 and check the gradients using the optional parameter Verify. Unfortunately, there may be small errors in the objective gradients that cannot be detected by the verification process. Finite difference approximations to first derivatives are catastrophically affected by even small inaccuracies.
- W ifail = 7
The user-supplied derivatives of the objective function appear to be incorrect.
Large errors were found in the derivatives of the objective function. This value of ifail will occur if the verification process indicated that at least one gradient element had no correct figures. You should refer to the printed output to determine which elements are suspected to be in error.
As a first step, you should check that the code for the objective values is correct – for example, by computing the function at a point where the correct value is known. However, care should be taken that the chosen point fully tests the evaluation of the function. It is remarkable how often the values 0 or 1 are used to test function evaluation procedures, and how often the special properties of these numbers make the test meaningless. (A simple finite difference check of the gradient is sketched at the end of this section.)
Special care should be used in this test if computation of the objective function involves subsidiary data communicated in global storage. Although the first evaluation of the function may be correct, subsequent calculations may be in error because some of the subsidiary data has accidentally been overwritten.
Errors in programming the function may be quite subtle in that the function value is almost correct. For example, the function may not be accurate to full precision because of the inaccurate calculation of a subsidiary quantity, or the limited accuracy of data upon which the function depends. A common error on machines where numerical calculations are usually performed in double precision is to include even one single precision constant in the calculation of the function; since some compilers do not convert such constants to double precision, half the correct figures may be lost by such a seemingly trivial error.
- ifail = 8
The gradient g(x_0) at the starting point x_0 is ‘too small’. More precisely, the value of g(x_0)^T g(x_0) is less than ε_r |F(x_0)|, where ε_r is the value of the optional parameter Function Precision.
The problem should be rerun from a different starting point.
- ifail = 9
An input argument is invalid.
- ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = -399
Your licence key may have expired or may not have been installed correctly.
- ifail = -999
Dynamic memory allocation failed.
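As suggested under ifail = 7, a simple way to check objfun before calling nag_opt_uncon_conjgrd_comp (e04dg) is to compare its gradient with a central finite difference estimate. The sketch below is illustrative only (the helper name and the unscaled step h are assumptions, and it is independent of the built-in Verify mechanism); objfun and x0 are taken to follow the interface described in Arguments.
% Illustrative gradient check (not part of the NAG interface).
function check_gradient(objfun, x0)
  n    = numel(x0);
  user = [];
  [~, ~, g, ~] = objfun(2, n, x0, 1, user);     % analytic gradient (mode = 2)
  h    = sqrt(eps);                             % crude, unscaled difference step
  gfd  = zeros(n, 1);
  for j = 1:n
    xp = x0;  xp(j) = xp(j) + h;
    xm = x0;  xm(j) = xm(j) - h;
    [~, fp, ~, ~] = objfun(0, n, xp, 0, user);  % objective only (mode = 0)
    [~, fm, ~, ~] = objfun(0, n, xm, 0, user);
    gfd(j) = (fp - fm)/(2*h);
  end
  disp([g(:) gfd]);                             % the two columns should agree closely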
Accuracy
On successful exit (ifail = 0) the accuracy of the solution will be as defined by the optional parameter Optimality Tolerance.
Further Comments
To evaluate an ‘acceptable’ set of finite difference intervals using nag_opt_estimate_deriv (e04xa) requires 2 function evaluations per variable for a well-scaled problem and up to 6 function evaluations per variable for a badly scaled problem.
Description of Printed Output
This section describes the intermediate printout and final printout produced by
nag_opt_uncon_conjgrd_comp (e04dg). You can control the level of printed output (see the description of the optional parameter
Print Level). Note that the intermediate printout and final printout are produced only if Print Level ≥ 10 (the default).
The following line of summary output (< 80 characters) is produced at every iteration. In all cases, the values of the quantities are those in effect on completion of the given iteration.
Itn is the iteration count.
Step is the step α_k taken along the computed search direction. On reasonably well-behaved problems, the unit step (i.e., α_k = 1) will be taken as the solution is approached.
Nfun is the cumulated number of evaluations of the objective function needed for the linesearch. Evaluations needed for the verification of the gradients by finite differences are not included. Nfun is printed as a guide to the amount of work required for the linesearch; nag_opt_uncon_conjgrd_comp (e04dg) will perform at most a fixed number of function evaluations per iteration.
Objective is the value of the objective function at x_k.
Norm G is the Euclidean norm of the gradient of the objective function at x_k.
Norm X is the Euclidean norm of x_k.
Norm (X(k-1)-X(k)) is the Euclidean norm of x_{k-1} - x_k.
The following describes the printout for each variable.
Variable gives the name (Varbl) and index j, for j = 1, 2, ..., n, of the variable.
Value is the value of the variable at the final iteration.
Gradient Value is the value of the gradient of the objective function with respect to the jth variable at the final iteration.
Numerical values are output with a fixed number of digits; they are not guaranteed to be accurate to this precision.
Example
This example finds a minimum of the function
F(x) = exp(x_1) (4 x_1^2 + 2 x_2^2 + 4 x_1 x_2 + 2 x_2 + 1).
The initial point is x_0 = (-1.0, 1.0)^T, at which F(x_0) = 1.8394 (to five figures).
The optimal solution is x* = (0.5, -1.0)^T, at which F(x*) = 0.
Open in the MATLAB editor:
e04dg_example
function e04dg_example
fprintf('e04dg example results\n\n');

% Initial estimate of the solution.
x = [-1 1];

% Initialize the communication arrays required by e04dg.
[cwsav,lwsav,iwsav,rwsav,ifail] = e04wb('e04dg');

% Minimize the objective function defined in objfun below.
[iter, objf, objgrd, x, user, lwsav, iwsav, rwsav, ifail] = ...
  e04dg(@objfun, x, lwsav, iwsav, rwsav);

fprintf('Variable Value Gradient value\n');
for i=1:2
  fprintf('Varbl %3d %12.8f %9.1e\n', i, x(i), objgrd(i));
end
fprintf('\nFinal objective value = %15.7e\n',objf);

function [mode, objf, objgrd, user] = objfun(mode, n, x, nstate, user)
% Objective F(x) = exp(x1)*(4*x1^2 + 2*x2^2 + 4*x1*x2 + 2*x2 + 1) and,
% when mode = 2, its gradient.
expx1 = exp(x(1));
objf = expx1*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1);
if (mode == 2)
  objgrd(1) = 4*expx1*(2*x(1)+x(2)) + objf;
  objgrd(2) = 2*expx1*(2*x(2)+2*x(1)+1);
else
  objgrd = zeros(2,1);
end
e04dg example results
Variable Value Gradient value
Varbl 1 0.50000001 9.1e-07
Varbl 2 -0.99999989 8.3e-07
Final objective value = 5.3083002e-14
Note: the remainder of this document is intended for more advanced users. Algorithmic Details contains a detailed description of the algorithm which may be needed in order to understand Optional Parameters. Optional Parameters describes the optional parameters which may be set by calls to nag_opt_uncon_conjgrd_option_string (e04dk).
Algorithmic Details
This section contains a description of the method used by nag_opt_uncon_conjgrd_comp (e04dg).
nag_opt_uncon_conjgrd_comp (e04dg) uses a pre-conditioned conjugate gradient method and is based upon algorithm PLMA as described in Section 4.8.3 of
Gill and Murray (1979) and
Gill et al. (1981).
The algorithm proceeds as follows:
Let x_0 be a given starting point and let k denote the current iteration, starting with k = 0. The iteration requires g_k, the gradient vector evaluated at x_k, the kth estimate of the minimum. At each iteration a vector p_k (known as the direction of search) is computed and the new estimate x_{k+1} is given by
x_{k+1} = x_k + α_k p_k,
where α_k (the step length) minimizes the function F(x_k + α p_k) with respect to the scalar α. A choice of initial step α_0 is taken as
α_0 = 2 (F_k - F_est) / (g_k^T g_k),
where F_est is a user-supplied estimate of the function value at the solution. If F_est is not specified, the software always chooses the unit step length for α_0. Subsequent step length estimates are computed using cubic interpolation with safeguards.
A quasi-Newton method can be used to compute the search direction p_{k+1} by updating the inverse of the approximate Hessian H_k and computing
p_{k+1} = -H_{k+1} g_{k+1}.   (1)
The updating formula for the approximate inverse is given by
H_{k+1} = (I - (s_k y_k^T)/(y_k^T s_k)) H_k (I - (y_k s_k^T)/(y_k^T s_k)) + (s_k s_k^T)/(y_k^T s_k),   (2)
where y_k = g_{k+1} - g_k and s_k = x_{k+1} - x_k.
The method used to obtain the search direction is based upon computing p_{k+1} as -H_{k+1} g_{k+1}, where H_{k+1} is a matrix obtained by updating the identity matrix with a limited number of quasi-Newton corrections. The storage of an n by n matrix is avoided by storing only the vectors that define the rank two corrections – hence the term ‘limited-memory’ quasi-Newton method. The precise method depends upon the number of updating vectors stored. For example, the direction obtained with the ‘one-step’ limited memory update is given by (1) using (2) with H_k equal to the identity matrix, viz.
p_{k+1} = -g_{k+1} + (1/(y_k^T s_k)) ((y_k^T g_{k+1}) s_k + (s_k^T g_{k+1}) y_k) - (s_k^T g_{k+1})/(y_k^T s_k) (1 + (y_k^T y_k)/(y_k^T s_k)) s_k.
Using a limited-memory quasi-Newton formula, such as the one above, guarantees p_{k+1} to be a descent direction if all the inner products y_k^T s_k are positive for all vectors y_k and s_k used in the updating formula.
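As an illustration only (this is not NAG Library code), the ‘one-step’ direction above can be evaluated with a few lines of MATLAB:
% Sketch of the 'one-step' limited-memory direction: equation (1) using
% update (2) with H_k equal to the identity matrix.
function p = one_step_lm_direction(gnew, s, y)
% gnew: gradient at the new iterate, g_{k+1}
% s:    x_{k+1} - x_k
% y:    g_{k+1} - g_k  (y'*s must be positive for p to be a descent direction)
  rho = 1/(y'*s);
  p = -gnew + rho*((y'*gnew)*s + (s'*gnew)*y) ...
      - rho*(1 + rho*(y'*y))*(s'*gnew)*s;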
Optional Parameters
Several optional parameters in nag_opt_uncon_conjgrd_comp (e04dg) define choices in the problem specification or the algorithm logic. In order to reduce the number of formal arguments of nag_opt_uncon_conjgrd_comp (e04dg) these optional parameters have associated default values that are appropriate for most problems. Therefore, you need only specify those optional parameters whose values are to be different from their default values.
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
The following is a list of the optional parameters available:
Defaults;
Estimated Optimal Function Value;
Function Precision;
Iteration Limit;
Linesearch Tolerance;
List;
Maximum Step Length;
Nolist;
Optimality Tolerance;
Print Level;
Start Objective Check at Variable;
Stop Objective Check at Variable;
Verify Level.
A full description of each optional parameter is provided in Description of the Optional Parameters.
Optional parameters may be specified by calling
nag_opt_uncon_conjgrd_option_string (e04dk) before a call to
nag_opt_uncon_conjgrd_comp (e04dg).
nag_opt_uncon_conjgrd_option_string (e04dk) can be called to supply options directly, one call being necessary for each optional parameter. For example,
[lwsav, iwsav, rwsav, inform] = e04dk('Print Level = 1', lwsav, iwsav, rwsav);
nag_opt_uncon_conjgrd_option_string (e04dk) should be consulted for a full description of this method of supplying optional parameters.
All optional parameters not specified by you are set to their default values. Optional parameters specified by you are unaltered by nag_opt_uncon_conjgrd_comp (e04dg) (unless they define invalid values) and so remain in effect for subsequent calls unless altered by you.
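For example (an illustrative sketch only; the option values shown are arbitrary), several optional parameters could be supplied, one call per option, before the call to nag_opt_uncon_conjgrd_comp (e04dg):
[lwsav, iwsav, rwsav, inform] = e04dk('Verify Level = 1', lwsav, iwsav, rwsav);
[lwsav, iwsav, rwsav, inform] = e04dk('Iteration Limit = 200', lwsav, iwsav, rwsav);
[lwsav, iwsav, rwsav, inform] = e04dk('Optimality Tolerance = 1.0e-6', lwsav, iwsav, rwsav);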
Description of the Optional Parameters
For each option, we give a summary line, a description of the optional parameter and details of constraints.
The summary line contains:
- the keywords, where the minimum abbreviation of each keyword is underlined (if no characters of an optional qualifier are underlined, the qualifier may be omitted);
- a parameter value, where the letters a, i and r denote options that take character, integer and real values respectively;
- the default value, where the symbol ε is a generic notation for machine precision (see nag_machine_precision (x02aj)), and ε_r denotes the relative precision of the objective function (the optional parameter Function Precision).
Keywords and character values are case and white space insensitive.
Defaults
This special keyword may be used to reset all optional parameters to their default values.
Estimated Optimal Function Value
This value of r specifies the user-supplied guess of the optimum objective function value F_est. This value is used to calculate an initial step length α_0 (see Algorithmic Details). If the value of r is not specified (the default), then this has the effect of setting α_0 to unity. It should be noted that for badly scaled functions a unit step along the steepest descent direction will often compute the objective function at very large values of x.
Function Precision r     Default = ε^0.9
The parameter defines ε_r, which is intended to be a measure of the accuracy with which the problem function F(x) can be computed. If r < ε or r ≥ 1, the default value is used.
The value of ε_r should reflect the relative precision of 1 + |F(x)|; i.e., ε_r acts as a relative precision when |F| is large, and as an absolute precision when |F| is small. For example, if F(x) is typically of order 1000 and the first six significant digits are known to be correct, an appropriate value for ε_r would be 1.0e-6. In contrast, if F(x) is typically of order 1.0e-4 and the first six significant digits are known to be correct, an appropriate value for ε_r would be 1.0e-10. The choice of ε_r can be quite complicated for badly scaled problems; see Chapter 8 of Gill et al. (1981) for a discussion of scaling techniques. The default value is appropriate for most simple functions that are computed with full accuracy. However when the accuracy of the computed function values is known to be significantly worse than full precision, the value of ε_r should be large enough so that no attempt will be made to distinguish between function values that differ by less than the error inherent in the calculation.
Iteration Limit i     Default = max(50, 5n)
Iters
Itns
The value of i specifies the maximum number of iterations allowed before termination. If i < 0, the default value is used.
Problems whose Hessian matrices at the solution contain sets of clustered eigenvalues are likely to be minimized in significantly fewer than n iterations. Problems without this property may require anything between n and 5n iterations, with approximately 2n iterations being a common figure for moderately difficult problems.
Linesearch Tolerance r     Default = 0.9
The value r controls the accuracy with which the step taken during each iteration approximates a minimum of the function along the search direction (the smaller the value of r, the more accurate the linesearch). The default value requests an inaccurate search, and is appropriate for most problems. A more accurate search may be appropriate when it is desirable to reduce the number of iterations – for example, if the objective function is cheap to evaluate. If r < 0 or r ≥ 1, the default value is used.
List
Nolist Default for nag_opt_uncon_conjgrd_comp (e04dg)
Normally each optional parameter specification is printed as it is supplied. Optional parameter
Nolist may be used to suppress the printing and optional parameter
List may be used to restore printing.
Maximum Step Length r     Default = 10^20
If r > 0, the maximum allowable step length for the linesearch is taken as min(1/x02am, r/||p_k||), where x02am is the safe range parameter returned by nag_machine_real_safe (x02am). If r ≤ 0, the default value is used.
Optimality Tolerance r     Default = ε_r^0.8
The parameter r specifies the accuracy to which you wish the final iterate to approximate a solution of the problem. Broadly speaking, r indicates the number of correct figures desired in the objective function at the solution. For example, if r is 1.0e-6 and termination occurs with ifail = 0 (see Arguments), then the final point satisfies the termination criteria given under ifail in Arguments, where τ_F represents Optimality Tolerance. If r < ε_r or r ≥ 1, the default value is used.
Print Level i     Default = 10
The value i controls the amount of printout produced by nag_opt_uncon_conjgrd_comp (e04dg), as indicated below. A detailed description of the printout is given in Description of Printed Output (summary output at each iteration and the final solution).
i = 0: No output.
i = 1: The final solution only.
i = 5: One line of summary output (< 80 characters; see Description of Printed Output) for each iteration (no printout of the final solution).
i ≥ 10: The final solution and one line of summary output for each iteration.
Start Objective Check at Variable i1     Default = 1
Stop Objective Check at Variable i2     Default = n
These keywords take effect only if Verify Level ≥ 1. They may be used to control the verification of gradient elements computed by objfun. For example, if the first 30 elements of the objective gradient appeared to be correct in an earlier run, so that only element 31 remains questionable, it is reasonable to specify Start Objective Check at Variable 31. If the first 30 variables appear linearly in the objective, so that the corresponding gradient elements are constant, the above choice would also be appropriate.
If i1 ≤ 0 or i1 > n, the default value is used. If i2 ≤ 0 or i2 > n, the default value is used.
Verify Level i     Default = 0
Verify
Verify Gradients
Verify Objective Gradients
These keywords refer to finite difference checks on the gradient elements computed by objfun. Gradients are verified at the user-supplied initial estimate of the solution. The possible choices for i are as follows:
i = -1: No checks are performed.
i = 0: Only a ‘cheap’ test will be performed, requiring one call to objfun.
i ≥ 1: In addition to the ‘cheap’ test, individual gradient elements will also be checked using a reliable (but more expensive) test.
For example, the objective gradient will be verified if Verify, Verify = Yes, Verify Gradients, Verify Objective Gradients or Verify Level = 1 is specified.
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015