Note:this function usesoptional parametersto define choices in the problem specification and in the details of the algorithm. If you wish to use default settings for all of the optional parameters, you need only read Sections 1 to 10 of this document. If, however, you wish to reset some or all of the settings please refer to Section 11 for a detailed description of the algorithm and to Section 12 for a detailed description of the specification of the optional parameters.
e04ggc is a bound-constrained nonlinear least squares trust region solver (BXNL) from the NAG optimization modelling suite aimed for small to medium-scale problems.
The function may be called by the names: e04ggc or nag_opt_handle_solve_bxnl.
3Description
e04ggc computes a solution to the nonlinear least squares problem
(1)
where , are smooth nonlinear functions called residuals,
are weights (by default they are all defined to , see Section 9.2 on how to change them), and the rightmost element represents the regularization term with parameter and power (by default the regularization term is not used, see Section 11 on how to enable it). The constraint elements and are -dimensional vectors defining the bounds on the variables.
Typically in a calibration or data fitting context, the residuals will be defined as the difference between the observed values at
and the values provided by a
nonlinear model , i.e., . If these residuals (errors) follow a Gaussian distribution, then the values of the optimal parameter vector are the maximum likelihood estimates. For a description of the various algorithms implemented for solving this problem see Section 11. It is also recommended that you read Section 2.2.3 in the E04 Chapter Introduction.
e04ggc serves as a solver for problems stored as a handle. The handle points to an internal data structure which defines the problem and serves as a means of communication for functions in the NAG optimization modelling suite. First, the problem handle is initialized by calling e04rac. A nonlinear least square residual objective can be added by calling e04rmc and, optionally, (simple) box constraints can be defined by calling e04rhc. It should be noted that e04ggc internally works with a dense representation of the residual Jacobian even if a sparse structure was defined in the call to e04rmc. Once the problem is fully described, the handle may be passed to the solver e04ggc. When the handle is not needed anymore, e04rzc should be called to destroy it and deallocate the memory held within. For more information refer to the NAG optimization modelling suite in Section 4.1 in the E04 Chapter Introduction.
The algorithm is based on the trust region framework and its behaviour can be modified by various optional parameters (see Section 12) which can be set by e04zmc and e04zpc anytime between the initialization of the handle by e04rac and a call to the solver. Once the solver has finished, options may be modified for the next solve. The solver may be called repeatedly with various starting points and/or optional parameters. The option getter e04znc can be called to retrieve the current value of any option.
Several options might have significant impact on the performance of the solver.
Even though the defaults were chosen to suit the majority of anticipated problems, it is recommended that you experiment with the option settings to find the most suitable set of options for a particular problem, see Sections 11 and 12 for further details.
4References
Adachi S,
Iwata S,
Nakatsukasa Y, and
Takeda A
(2015)
Solving the trust region subproblem by a generalized eigenvalue problem
Technical report, METR 2015-14.
Mathematical Engineering, The University of Tokyo
https://www.keisu.t.u-tokyo.ac.jp/data/2015/METR15-14.pdf
Conn A R, Gould N I M and Toint Ph L (2000) Trust Region Methods SIAM, Philadephia
Gould N I M,
Orban D, and
Toint Ph L
(2003)
GALAHAD, a library of thread-safe Fortran 90 packages for
large-scale nonlinear optimization
ACM Transactions on Mathematical Software (TOMS)29(4)
353–372
Kanzow C,
Yamashita N, and
Fukushima M
(2004)
Levenberg-Marquardt methods with strong local convergence
properties for solving nonlinear equations with convex
constraints
Journal of Computational and Applied Mathematics174
375–397
Lanczos C
(1956)
Applied Analysis
272–280
Prentice Hall, Englewood Cliffs, NJ, USA
Nielsen H B
(1999)
Damping parameter in Marquadt’s Method
Technical report TR IMM-REP-1999-05.
Department of Mathematical Modelling, Technical University of Denmark
http://www2.imm.dtu.dk/documents/ftp/tr99/tr05_99.pdf
Nocedal J and Wright S J (2006) Numerical Optimization (2nd Edition) Springer Series in Operations Research, Springer, New York
5Arguments
1: – void *Input
On entry: the handle to the problem. It needs to be initialized (e.g., by e04rac) and to hold a problem formulation compatible with e04ggc. It must not be changed between calls to the NAG optimization modelling suite.
2: – function, supplied by the userExternal Function
lsqfun must evaluate the value of the nonlinear residuals, , at a specified point .
On entry: , the current number of decision variables,
, in the model.
2: – const doubleInput
On entry: , the vector of variable values at which the residuals, , are to be evaluated.
3: – IntegerInput
On entry: , the current number of residuals in the model.
4: – doubleOutput
On exit: the value of the residual vector, , evaluated at .
5: – Integer *Input/Output
On entry: a non-negative value.
On exit: may be used to indicate that some residuals could not be computed at the requested point. This can be done by setting inform to a negative value. The solver will attempt a rescue procedure and request an alternative point. If the rescue procedure fails, the solver will exit with NE_USER_NAN.
6: – Nag_Comm *
Pointer to structure of type Nag_Comm; the following members are relevant to lsqfun.
user – double *
iuser – Integer *
p – Pointer
The type Pointer will be void *. Before calling e04ggc you may allocate memory and initialize these pointers with various quantities for use by lsqfun when called from e04ggc (see Section 3.1.1 in the Introduction to the NAG Library CL Interface).
3: – function, supplied by the userExternal Function
lsqgrd evaluates the residual gradients, , at a specified point .
On entry: , the current number of decision variables,
, in the model.
2: – const doubleInput
On entry: , the vector of variable values at which the
residual gradients, , are to be evaluated.
3: – IntegerInput
On entry: , the current number of residuals in the model.
4: – IntegerInput
On entry: the number of nonzeros in the first derivative matrix.
If
in the call to e04rmc (recommended use for e04ggc) then
.
5: – doubleInput/Output
On entry: the elements should only be assigned and not referenced.
On exit: the vector containing the nonzero residual gradients
evaluated at ,
where
The elements must be stored in the same order as the defined sparsity pattern
provided in the call to e04rmc.
6: – Integer *Input/Output
On entry: a non-negative value.
On exit: may be used to indicate that the residual gradients could not be computed at the requested point. This can be done by setting inform to a negative value. The solver will attempt a rescue procedure and request an alternative point. If the rescue procedure fails, the solver will exit with NE_USER_NAN.
7: – Nag_Comm *
Pointer to structure of type Nag_Comm; the following members are relevant to lsqgrd.
user – double *
iuser – Integer *
p – Pointer
The type Pointer will be void *. Before calling e04ggc you may allocate memory and initialize these pointers with various quantities for use by lsqgrd when called from e04ggc (see Section 3.1.1 in the Introduction to the NAG Library CL Interface).
4: – function, supplied by the userExternal Function
lsqhes evaluates the residual Hessians,
,
at a specified point .
By default, the optional parameter and lsqhes
is never called. lsqhes may be
specified as NULLFN.
This function will only be called if the optional parameter and
if the model (see Section 11.2)
requires second order
information. Under these circumstances, if you do not provide a valid
lsqhes the solver will terminate with either
NE_DERIV_ERRORS or NE_FAILED_START.
On entry: , the current number of decision variables,
, in the model.
2: – const doubleInput
On entry: , the vector of decision variables at the current iteration.
3: – IntegerInput
On entry: , the current number of residuals in the model.
4: – const doubleInput
On entry:
,
the vector containing the (weighted) residuals at , .
See (1) and Section 9.2.
5: – doubleInput/Output
On entry: the elements should only be assigned and not referenced.
On exit: a dense square (symmetric) matrix containing the weighted sum of residual Hessians,
where
is also a dense square (symmetric) matrix containing the th residual Hessian evaluated at the point . All matrix elements must be
provided: both upper and lower triangular parts.
6: – Integer *Input/Output
On entry: a non-negative value.
On exit: may be used to indicate that one or more elements of the residual Hessian could not be computed at the requested point. This can be done by setting inform to a negative value. The solver will attempt a rescue procedure and if the rescue procedure fails, the solver will exit with NE_USER_NAN.
7: – Nag_Comm *
Pointer to structure of type Nag_Comm; the following members are relevant to lsqhes.
user – double *
iuser – Integer *
p – Pointer
The type Pointer will be void *. Before calling e04ggc you may allocate memory and initialize these pointers with various quantities for use by lsqhes when called from e04ggc (see Section 3.1.1 in the Introduction to the NAG Library CL Interface).
5: – function, supplied by the userExternal Function
lsqhprd evaluates the residual Hessians,
,
at a specified point, , and performs matrix-vector
products with a given vector, ,
returning the dense matrix
.
If you do not supply this function, it may be
specified as NULLFN.
On entry: , the current number of decision variables,
, in the model.
2: – const doubleInput
On entry: , the vector of decision variables at the current iteration.
3: – const doubleInput
On entry: , the vector used to perform the required matrix-vector products.
4: – IntegerInput
On entry: , the current number of residuals in the model.
5: – doubleInput/Output
Note: the th element of the matrix is stored in .
On entry: the elements should only be assigned and not referenced.
On exit:
a dense matrix of size
containing the following matrix-vector products,
6: – Integer *Input/Output
On entry: The first call to lsqhprd will have a non-zero value and can be used to optimize your code in order to avoid recalculations of common quantities when evaluating the Hessians. For all other instances inform will have a value of zero. This notification parameter may be safely ignored if such optimization is not required.
On exit: may be used to indicate that one or more elements of the residual Hessian could not be computed at the requested point. This can be done by setting inform to a negative value. The solver will attempt a rescue procedure and if the rescue procedure fails, the solver will exit with NE_USER_NAN. The value of inform returned on the first call is ignored.
7: – Nag_Comm *
Pointer to structure of type Nag_Comm; the following members are relevant to lsqhprd.
user – double *
iuser – Integer *
p – Pointer
The type Pointer will be void *. Before calling e04ggc you may allocate memory and initialize these pointers with various quantities for use by lsqhprd when called from e04ggc (see Section 3.1.1 in the Introduction to the NAG Library CL Interface).
6: – function, supplied by the userExternal Function
monit is provided to enable monitoring of the progress of the optimization and, if necessary, to halt the optimization process.
If no monitoring is required, monit may be specified as NULLFN.
monit is called at the end of every th step where is controlled by the optional parameter (the default value is , monit is not called).
On entry: , the current number of decision variables,
, in the model.
2: – const doubleInput
On entry: the current best point.
3: – Integer *Input/Output
On entry: a non-negative value.
On exit: may be used to request the solver to stop immediately
by setting inform to a non-zero value in which case it will terminate
with NE_USER_STOP; otherwise, the solver will proceed normally.
4: – const doubleInput
On entry: best objective value computed and various indicators (the values are as described in the main argument rinfo).
5: – const doubleInput
On entry: solver statistics at monitoring steps or at the end of the current iteration (the values are as described in the main argument stats).
6: – Nag_Comm *
Pointer to structure of type Nag_Comm; the following members are relevant to monit.
user – double *
iuser – Integer *
p – Pointer
The type Pointer will be void *. Before calling e04ggc you may allocate memory and initialize these pointers with various quantities for use by monit when called from e04ggc (see Section 3.1.1 in the Introduction to the NAG Library CL Interface).
7: – IntegerInput
On entry: , the current number of decision variables,
, in the model.
8: – doubleInput/Output
On entry: , the initial estimates of the variables, .
On exit: the final values of the variables, .
9: – IntegerInput
On entry: , the current number of residuals in the model.
10: – doubleOutput
On exit: the values of the residuals at the final point given in x.
11: – doubleOutput
On exit: objective value and various indicators at monitoring steps or at the end of the final iteration. The measures are given in the table below:
Norm of the scaled projected gradient at the current iterate, see (8) in Section 11.5
Norm of the step between the current and previous iterate.
Convergence tests result. A scalar value between
indicates whether a convergence test has passed. Specifically,
indicates small objective test passed,
indicates small (scaled) gradient test passed,
indicates small step test passed. In the case where two or more tests passed, they are
accumulated.
Norm of the current iterate . If regularization is requested, then this value was used in the regularization and it might differ from if has fixed or disabled elements.
Reserved for future use.
12: – doubleOutput
On exit: solver statistics at monitoring steps or at the end of the final iteration as given in the table below:
Number of iterations performed.
Total number of calls to the objective function lsqfun.
Total number of calls to the objective gradient function lsqgrd.
Total number of calls to the objective Hessian function lsqhes.
Total time in seconds spent in the solver. It includes time spent in user-supplied subroutines.
Number of calls to the objective function lsqfun required by linesearch steps.
Number of calls to the objective gradient function lsqgrd required by linesearch steps.
Number of calls to the objective function lsqfun required by projected gradient steps.
Number of calls to the objective gradient function lsqgrd required by projected gradient steps.
Number of inner iterations performed, see option .
Number of linesearch iterations performed.
Number of projected gradient iterations performed.
Total number of calls to the objective auxiliary Hessian function lsqhprd.
Reserved for future use.
13: – Nag_Comm *
The NAG communication argument (see Section 3.1.1 in the Introduction to the NAG Library CL Interface).
14: – NagError *Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
e04ggc returns with NE_NOERROR if the iterates have converged to a point that satisfies the convergence criteria described in Section 11.5.
6Error Indicators and Warnings
NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM
On entry, argument had an illegal value.
NE_DC_MATCH
Data for residual weights not found or is invalid.
Custom residual weights are required, optional parameter , but the weights data is missing, of the wrong expected size or has invalid values. Please refer to Section 9.2.
NE_DERIV_ERRORS
Exact second derivatives needed for tensor model.
Model in the optional parameter requires exact second derivatives but . Provide second derivatives via lsqhes and optionally lsqhprd functions, and set optional parameter .
NE_FAILED_START
The current starting point is unusable.
While trying to evaluate the starting point , either inform was set to a non-zero value within the user-supplied functions, lsqfun, lsqgrd or lsqhes, or an Infinity or NaN was detected in values returned from them.
NE_HANDLE
The supplied handle does not define a valid handle to the data structure for the NAG optimization modelling suite. It has not been properly initialized or it has been corrupted.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_IMPROVEMENT
The solver was terminated because no further progress could be achieved.
This can indicate that the solver is calculating very small step sizes and is making very little progress. It could also indicate that the problem has been solved to the best numerical accuracy possible given the current scaling.
It can also indicate that a recovery procedure was interrupted due to the user-supplied function lsqgrd being incorrect.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_NUM_DIFFICULTIES
Numerical difficulties encountered and solver was terminated.
This error can be caused by an ill-posed or poorly scaled problem.
NE_PHASE
The problem is already being solved.
Unsupported model and method chosen.
The specified model in optional parameter is not supported by the method specified by the optional parameter .
Unsupported option combinations.
The specified combination of values for optional parameters and is not supported.
NE_REF_MATCH
On entry, , expected .
Constraint: nvar must match the current number of variables of the model in the handle.
The information supplied does not match with that previously stored.
On entry, must match that given during the definition of the objective in the handle, i.e., .
There are no decision variables.
nvar must be greater than zero.
NE_SETUP_ERROR
This solver does not support the model defined in the handle.
NE_TIME_LIMIT
The solver terminated after the maximum time allowed was exceeded.
Maximum number of seconds exceeded. Use optional parameter to reset the limit.
NE_TOO_MANY_ITER
Maximum number of iterations reached.
Use optional parameter to reset the limit.
NE_TOO_MANY_MINOR_ITER
Iteration limit reached while solving a subproblem.
Maximum number of iterations reached while trying to solve an auxiliary subproblem.
Line Search failed.
Line Search in the projected gradient direction did not find an acceptable new iterate.
NE_USER_NAN
Invalid number detected in user-supplied function and recovery failed.
Either inform
was set to a non-zero value within one of the user-supplied functions, lsqfun, lsqgrd, lsqhes, or lsqhprd, or an Infinity or NaN was detected in values returned from them and the recovery attempt failed.
NE_USER_STOP
User requested termination during a monitoring step. inform was set to a non-zero value within the user-supplied function monit.
7Accuracy
The accuracy of the solution is determined by
optional parameters
,
,
,
, and
.
If NE_NOERROR on exit, the returned point satisfies
(7),
(8) or
(9) to the defined accuracies.
Please refer to Section 11.5 and the description of the particular options in Section 12.
8Parallelism and Performance
e04ggc is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
e04ggc makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9Further Comments
9.1Description of the Printed Output
The solver can print information to give an overview of the problem and the progress of the computation. The output may be sent to two independent file ID which are set by optional parameters and . Optional parameters , , and determine the exposed level of detail. This allows, for example, a detailed log file to be generated while the condensed information is displayed on the screen.
By default (, ), the following sections are printed to the standard output:
Header
Optional parameters list (if )
Problem statistics
Iteration log
Summary
Solution (if )
Header
The header is a message indicating the start of the solver. It should look like:
----------------------------------------------------------------------
E04GG, Nonlinear least squares method for bound-constrained problems
----------------------------------------------------------------------
Optional parameters list
If , a list of the optional parameters and their values is printed before the problem statistics. The list shows all options of the solver, each displayed on one line. Each line contains the option name, its current value and an indicator for how it was set. The options unchanged from their defaults are noted by ‘d’ and the ones you have set are noted by ‘U’. Note that the output format is compatible with the file format expected by e04zpc. The output looks similar to:
Begin of Options
Bxnl Model = Gauss-newton * U
Bxnl Nlls Method = Galahad * d
Bxnl Glob Method = Reg * U
Bxnl Reg Order = Auto * d
Bxnl Tn Method = Min-1-var * d
Bxnl Basereg Pow = 2.00000E+00 * d
Bxnl Basereg Term = 1.00000E-02 * d
Bxnl Iteration Limit = 1000 * d
Bxnl Use Second Derivatives = Yes * U
End of Options
Problem statistics
If , statistics on the problem are printed, for example:
Problem Statistics
No of variables 4 (+2 disabled, +0 fixed)
free (unconstrained) 3
bounded 1
Objective function LeastSquares
No of residuals 16 (+8 disabled)
Iteration log
If , the solver will print a summary line for each step. An iteration is considered successful when it yields a decrease of the objective, either sufficiently close to the decrease predicted by the model or to a given relative threshold. Each line shows the iteration number (Iter), the value of the objective function (error), the absolute and relative norms for the projected gradient (optim) and (rel optim), this last one is used in the convergence test of equation (8). The output looks as follows:
If , each line additionally shows the current value of the trust region radius (Delta), quality of the model (rho), some flags relating to the iteration (S2IF), inner iteration counter (inn it) for the tensor Newton model, the step length taken (step), trust region loop exit status (loop), performed line search type (LS), as well as the projection factor over the constraints (tau). It might look as follows:
Iteration flags column (S2IF) contains four flags related to the iteration. Flag ‘S’ indicates if the trust region iteration was successful (S) or unsuccessful (U). Flag ‘2’ shows if iteration used second-order information: yes (Y), no (N), tensor (T), or approximate (A). Flag ‘I’ indicates iteration type: regular (R) or inner (I). Exit flag of inner solver ‘F’ has three states: subproblem converged (C), not solved (E), or current iteration is inside subproblem or tensor model not used (-). For details on the interpretation of rho and tau, see Section 11.
If Tensor-Newton model is chosen, then details of each inner iteration can be printed by setting , output is similar to:
Note the iteration type flag ‘I’ change under the S2IF column, the output reports on 2 (R) regular iterations where each required 3 (I) inner iterations.
Additionally, if , each iteration produces more information that expands over several lines. This additional information can contain:
Details on the trust region subproblem;
Iteration log for auxiliary iterative methods;
Line Search iteration logs.
The output might look as follows:
*** Solving the trust region subproblem using More-Sorensen ***
A is symmetric positive definite
iter nd sigma sigma_shift
0 3.6778E-01 0.0000E+00 0.0000E+00
nq = 7.7000E+02
1 1.2571E-02 6.4469E-06 6.4469E-06
We're within the trust region radius
Leaving More-Sorensen
Model evaluated successfully: m_k(d) = 2.3089E-08
*** Subproblem solution found ***
Actual reduction (in cost function) = 1.2065E-09
Predicted reduction (in model) = 4.1866E-09
rho returned = 2.8819E-01
Successful step -- Delta staying at 1.2570E-02
Summary
Once the solver finishes, a summary is produced:
-------------------------------------------------------------------------------
Status: converged, an optimal solution was found
small (scaled) projected gradient norm
-------------------------------------------------------------------------------
Value of the objective 2.17328E-06
Norm of projected gradient 1.51989E-08
Norm of scaled projected gradient 7.29019E-06
Norm of step 4.98107E-04
Iterations 80
Inner iterations 0
LS iterations 0
PG iterations 0
Function evaluations 81
Gradient evaluations 81
Hessian evaluations (objhes) 0
Hessian evaluations (objhprd) 0
LS function calls 0
LS gradient calls 0
PG function calls 0
PG gradient calls 0
Optionally, if , the timings are printed:
Timing
Total time spent 2.43 sec
-------------------------------------------------------------------------------
Solution
If , the values of the primal variables are printed, furthermore if the problem is constrained, the dual variables are also reported, see Lagrangian Multipliers in e04kfc and the dual variables storage format described in Section 3.1 in e04svc. It might look as follows:
Primal variables:
idx Lower bound Value Upper bound
1 0.00000E+00 4.58516E-01 1.00000E+00
2 -inf 3.05448E+00 inf
3 -inf 4.65146E+00 inf
4 Disabled NaN Disabled
Box bounds dual variables:
idx Lower bound Value Upper bound Value
1 0.00000E+00 0.00000E+00 1.00000E+00 9.52218E-10
2 -inf 4.66962E-11 inf 0.00000E+00
3 -inf 6.55098E-11 inf 0.00000E+00
4 Disabled NaN Disabled NaN
9.2Residual Weights
A typical use for weights in the least square fitting context is to account for uncertainity in the observed data, , by setting the weights to
The idea behind this choice is to give less importance (small weight) to measurements which have large variance.
In order to use weights,
1.request to use weights by setting the optional parameter (this will request the solver to query the handle for an array of weights), and
2.store the weights array in the handle. This is done by calling e04rxc with the command and passing the array length and weights array, and , respectively. Weights are required for each residual and all weights must be positive.
These steps must be done after the handle is initialized (via e.g., e04rac) but before calling the solver e04ggc. The stored weights in the handle will only be accessed if , otherwise all weights are assumed to be and the handle is not queried for residual weights.
If the solver is expecting to use weights but they are not provided, or the array length is wrong or have non-positive values, then the solver will terminate with NE_DC_MATCH.
9.3Internal Changes
Internal changes have been made to this function as follows:
At Mark 27.1:
Added support for underdetermined problems. Solver can export covariance matrix at a solution point.
For details of all known issues which have been reported for the NAG Library please refer to the Known Issues.
10Example
In this example we solve the Lanczos-3 Lanczos (1956) problem, a nonlinear least squares regression. The regression problem consists of observations, , to be fitted over the parameter model
and the residuals for the regression are
where
The following bounds are defined on the variables
The initial guess is .
The expected solution is with residual norm close to .
This section contains a short description of the underlying algorithms used in e04ggc, a bound-constrained nonlinear least squares (BXNL) solver that uses a model-based trust region framework adapted to exploit the least squares problem structure. It is based on a collaborative work between NAG and the STFC Rutherford Appleton Laboratory. For further details, see Gould et al. (2017) and references therein.
11.1Trust Region Algorithm
In this section, we are interested in generic nonlinear least squares problems of the form
(2)
where , , are smooth nonlinear functions called redisuals, are the weights (by default they are all defined to , see Section 9.2 on how to change them), and the rightmost element represents the optional regularization term of parameter and power . The constraint elements and are -dimensional vectors defining the bounds on the variables. For the rest of this chapter, and without any loss of generality, it is assumed that weights are all set to the default value of and are excluded from the formulae. e04ggc is an iterative framework for solving (2) which consists of a variety of algorithms that solve the trust region subproblem. The fundamental ideas of the framework follow.
At each point , the algorithm builds a model of the function at the next step, , which we refer to as (see Section 11.2).
Once the model has been formed, the candidate for the next point is found by solving a suitable subproblem (see Section 11.3).
Let
be the Euclidean projection operator over the feasible set, then the quantity
is used to assess the quality of the proposed step. If it is sufficiently large we accept the step and is set to
;
if not, the trust region radius is reduced and the resulting new trust region
subproblem is solved. If the step is very successful ( is close to ), is increased.
Under certain circumstances, it is deemed that the projection of the current point with the trust region step will not produce a successful point and
the new step is calculated using a convenient line search step.
This process continues until a point is found that satisfies the stopping criteria described in Section 11.5. More precisely, it can be described as:
BXNL Algorithm
1.Initialization
Set , make a feasible initial guess and set trust region radius .
2.Iterationk
(i)Stop if is a solution point (see stopping criteria in Section 11.5).
(ii)Solve the trust region subproblem with and provide step .
(iii)Project the new point into the bound constraints.
(iv)Evaluate the objective at the new point.
(v)Update the trust region radius based on the ratio .
(vi)If the objective has decreased sufficiently (successful step) choose . Return to 2(i).
(vii)Assess the severity of the projection for the new point using the ratio .
(viii)If is close to , either return to 2(ii) with or try to perform a line search in the direction .
If successful set and . Return to 2(i).
(ix)Take a projected gradient step by performing a line search in the direction , set and .
Note: the use of the regularization term in (2) is optional and is not used by default. To enable regularization please refer to the optional parameters
,
,
, and
.
11.2Models
A vital component of the algorithm is the choice of model employed. There are four choices available which are controlled by the optional parameter .
Gauss–Newton
This option specifies to the solver that it use the Gauss–Newton model. For this case, is replaced by its first-order Taylor approximation, . The model is, therefore, given by
Quasi–Newton
This option specifies to use a Newton type model. For this case, the model is taken to be the second-order Taylor approximation of the objective function . For this choice of model the gradient and Hessian are and . The model is given by
If the second derivatives of are not available (i.e., the optional parameter ), then the method approximates the matrix . If , the flag ‘2’ in the iteration log will display (A), see Iteration log in Section 9.1.
Hybrid
This option specifies to the solver that it use the hybrid model. In practice the Gauss–Newton model tends to work well far away from the solution, whereas the Newton model performs better once it is near to the minimum (particularly if the residual is large at the solution). This option tells the solver to switch between the previous two models, picking the model that is most appropriate for the step. In particular, it starts by using and switches to when it considers it is close enough to the solution. If, in subsequent iterations, it fails to get a decrease in the function value, then the algorithm interprets this as being not sufficiently close to the solution and switches back to using the Gauss–Newton model.
Tensor–Newton
This option specifies to the solver that it use the tensor model. The model is based on a second-order Taylor approximation to the residual, , where is the th row of . The tensor model used is
(3)
11.3Subproblems
The next point is estimated by finding a step, , that minimizes the model chosen in , subject to a globalization strategy. e04ggc supports the use of two such strategies: trust region or regularization, these can be set using the optional parameter or respectively. If , or , then the model is quadratic and the available methods to solve the subproblem are described in the next two subsections. If the , then the model is not quadratic and the methods available are described in Section 11.3.3.
11.3.1Trust region method
The methods mentioned in this subsection are only used when , or and . The trust region subproblem to solve is
(4)
The next step is taken to be the solution of the previous problem and the method used to solve it is selected using the optional parameter . The methods available are:
Powell's dogleg method
Approximates the solution to (4) by using Powell's dogleg method. This takes, as the step, a linear combination of the Gauss–Newton step and the steepest descent step.
Generalized eigenvalue problem (AINT)
Solves the trust region subproblem using the trust region solver of Adachi, Iwata, Nakatsukasa, and Takeda (AINT). This reformulates and solves the problem (4) as a generalized eigenvalue problem. See Adachi et al. (2015) for more details.
Moré–Sorensen Method
This method solves (4) using a variant of the Moré–Sorensen method. In particular, it implements Algorithm 7.3.6 of Conn et al. (2000).
GALAHAD's DTRS method
Solves (4) by converting the problem into the form
where is a diagonal matrix from the eigendecomposition, , of the derivatives of either or . The vectors and gather the rest of the transformation involving and . This is solved by performing an eigendecomposition of the Hessian in the model and calling the GALAHAD function DTRS. For further details see Gould et al. (2003).
11.3.2Regularization
The methods mentioned in this subsection are only used when , or and . The regularized subproblem to solve is
(5)
The next step to take is the solution to the previous problem. The methods provided to solve (5) are
Solve by linear system
This option estimates the step by solving a shifted linear system. Currently it only supports quadratic regularization () and it can be set using the optional parameter . The default value of the optional parameter is and automatically selects . If the solver terminates with NE_NO_IMPROVEMENT.
This method is used when .
GALAHAD's DRQS method
This method solves the regularized problem by first converting it to the form
where is a diagonal matrix from the eigendecomposition, , of the derivatives of either
or
. The vectors and gather the rest of the transformation involving and
,
and is the regularization order chosen using the optional parameter .
The problem is solved by performing an eigendecomposition of the Hessian in the model and calling the GALAHAD function DRQS. For further details see Gould et al. (2003).
11.3.3Tensor Newton subproblem
This section describes the regularized methods used to solve the non-quadratic tensor model (3) subproblem, i.e., the step subproblem when . The schemes implemented find the next step by solving
(6)
Note that (6) is also a sum-of-squares problem and, as such, can be solved by recursively calling e04ggc. In this context, the iterations performed by the recursive call to the solver are called inner iterations, otherwise they are called regular or outer iterations. When , the iteration type is shown under the flag ‘I’ of the ‘iteration flags’ column while the inner iteration count is shown under the column ‘inn it’ of the Iteration log (see Section 9.1). The method used to solve (6) can be chosen by the optional parameter and the implemented methods are:
Implicit solve
This method will solve recursively problem (6) using a quadratic model. GALAHAD function DRQS is used to estimate the next step by solving the
(implicitly) regularized quadratic subproblem
where is a quasi-Newton model of (3) and the regularization power is . The problem can be viewed as having two regularization terms, is the (fixed) regularization term from the outer iteration and is the regularization term of the inner iteration which is free to be updated as required by the solver.
This method is used when .
Tacit solve with additional terms
This method will solve recursively problem (6) using a hybrid model, tacitly reformulating the problem to incorporate additional residuals (see Section 11.3.4). This is implemented by internally setting
,
and the
. The subproblem to determine the next step has parameters and is of the form
This subproblem is solved using GALAHAD function DRQS and is used when .
Tacit solve with one additional term
This method will solve recursively problem (6) using a hybrid model, tacitly reformulating the problem to incorporate one additional residual (see Section 11.3.4). This is implemented by internally setting
,
and the
. As in the previous method, the subproblem to determine the next step is solved using GALAHAD function DRQS.
This method is used when .
Explicit solve with additional terms
This method will expand the search space with additional parameters and solve recursively a regularized variant of problem (6) using a hybrid model (see Section 11.3.4). This is implemented by internally expanding the search space and setting
,
and the
. Analogous to the previous methods, the subproblem to determine the next step is solved using GALAHAD function DRQS.
This method is used when .
Explicit solve with one additional term
This method will expand the search space with an additional parameter and solve recursively a regularized variant of problem (6) using a hybrid model (see Section 11.3.4). This is implemented by internally adding an additional residual term and setting
,
and the
. Analogous to the previous methods, the subproblem to determine the next step is solved using GALAHAD function DRQS.
This method is used when .
11.3.4Incorporating the regularizer
The method used to incorporate the regularization specified by and in problem (2) is defined using the optional parameter . The implemented choices are:
None
Sets and solves the non-regularized variant of the problem, this is the default.
Reformulation using DoF
Solves a nonlinear least squares problem with additional degrees of freedom.
The new residual objective function, , is defined as
where denotes the th component of the iterate and is provided using optional parameter . This option requires that the (base) regularization power in (2) be , i.e., (the default value).
This method is used when .
Reformulation adding 1 DoF
Solves a nonlinear least squares problem with one additional degree of freedom. The new residual objective function , is defined as
Analogous to the previous case, is defined using the optional parameter . When using this option, it is recommended that the (base) regularization power in (2) be , i.e., .
This method is used when .
11.4Bound Constraints
e04ggc handles the bound constraints by projecting candidate points into the feasible set. The implemented framework is an adaptation of Algorithm 3.12 described in Kanzow et al. (2004), where the Levenberg–Marquardt step is replaced by a trust region (TR) step. The framework consists of three major steps. It first attempts a projected TR step and, if unsuccessful, attempts a Wolfe-type line search step along the projected TR step direction, otherwise, it defaults to a projected gradient step with an Armijo-type line search, specifically,
TR step
The trust region loop needs to be interrupted if the proposed steps, , lead to points outside of the feasible set, i.e., are orthogonal with respect to the active bounds. This is monitored by the ratio , where is the Euclidean projection operator over the feasible set. provides a convenient way to assess how severe the projection is, if then the step, , is indeed orthogonal to the active space and does not provide a suitable search direction so the loop is terminated. On the contrary, if then has components that are not orthogonal to the active set that can be explored.
The TR step is taken when it is deemed that it makes enough progress in decreasing the error.
LS step
This step is attempted when the TR step is unsuccessful but the direction is considered of descent and a viable search direction in the sense that
with the TR step, and . A weak Wolfe-type line search along this direction is performed to find the next point. During the line search the intermediary candidates are projected into the feasible set and kept feasible, for details see Section 11.3 in e04kfc.
PG step
The projected gradient (PG) step is only taken if both the TR step and the LS step were unsuccessful.
It consists of an Armijo-type line search along the projected gradient direction, , for more details on this method refer to
Section 11.2 in e04kfc.
11.5Stopping Criteria
The solver considers that it has found a solution and stops when at least one of the following three conditions is met within the defined absolute or relative tolerances
,
(7)
(8)
(9)
Where is the projected gradient (see PG step in Section 11.4)
and is reported in the column optim of the output while the left-hand side of (8) is reported in the column rel optim, see Iteration log in Section 9.1.
If the problem is unconstrained, then the projected gradient reduces to the gradient and the convergence tests are done over the gradient norm. The stopping tolerances can be changed using the optional parameters
,
,
,
, and
. see Section 12 for details. If these parameters are set too small in relation to the complexity and scaling of the problem, the function can terminate with NE_NO_IMPROVEMENT, NE_TOO_MANY_ITER or NE_TOO_MANY_MINOR_ITER.
11.6A Note About Lagrangian Multipliers
It is often useful to have access to the Lagrangian multipliers (dual variables) associated with the constraints if there are any defined. In the case where only simple bounds are present, the multipliers directly relate to the values of the gradient at the solution. The multipliers of the active bounds are the absolute values of the associated elements of the gradient. The multipliers of the inactive bounds are always zero.
The multipliers based on the final gradient value can be retrieved by calling e04rxc with the command string cmdstrDual Variables. The format is the same as for other functions, see Section 3.1 in e04svc in e04svc.
Note that if the problem has not fully converged, the provided approximation might be quite crude.
12Optional Parameters
Several optional parameters in e04ggc define choices in the problem specification or the algorithm logic. In order to reduce the number of formal arguments of e04ggc these optional parameters have associated default values that are appropriate for most problems. Therefore, you need only specify those optional parameters whose values are to be different from their default values.
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
The optional parameters can be changed by calling e04zmc anytime between the initialization of the handle and the call to the solver. Modification of the optional parameters during intermediate monitoring stops is not allowed. Once the solver finishes, the optional parameters can be altered again for the next solve.
For each option, we give a summary line, a description of the optional parameter and details of constraints.
The summary line contains:
the keywords, where the minimum abbreviation of each keyword is underlined;
a parameter value,
where the letters , and denote options that take character, integer and real values respectively;
the default value, where the symbol is a generic notation for machine precision (see X02AJC).
All options accept the value to return single options to their default states.
Keywords and character values are case and white space insensitive.
Defaults
This special keyword may be used to reset all optional parameters to their default values. Any value given with this keyword will be ignored.
Bxnl Basereg Pow
Default
This parameter defines the regularization power in (1) and for the tensor Newton subproblem (when ). Some values are restricted depending on the type of regularization specified, see
for more details.
Constraint: .
Bxnl Basereg Term
Default
This parameter defines the regularization term in (1) and for the tensor Newton subproblem (when ).
Constraint: .
Bxnl Basereg Type
Default
This parameter specifies the method used to incorporate the regularizer into (1) and optionally into the tensor Newton subproblem (when and ).
The option reformulates the original problem by expanding it with degrees of freedom that is subsequently solved. For the case the residual vector is extended with a new term of the form ; for this method a value of is recommended.
If then the regularization power term must be , that is . For further details see Section 11.3.
Constraint: , or .
Bxnl Save Covariance Matrix
Default
This parameter indicates to the solver to store the covariance matrix into the handle.
If then
the lower triangle part of the covariance matrix is stored in packed column order
(see Section 3.4.2 in the F07 Chapter Introduction) into the handle and can be retrieved via e04rxc using
with
.
In the special case where ,
only the diagonal elements of the covariance matrix are stored in the handle and can be retrieved via e04rxc using
with
.
Similarly, if
then the lower triangle part of the matrix
is stored in packed column order into the handle and can be retrieved via e04rxc using
with
.
Limitations: If the number of enabled residuals is not greater than the number of enabled variables, or
the pseudo-inverse of could not be calculated, then the
covariance matrix (variance vector) is not stored in the handle and will not be available.
For more information on how the covariance matrix is estimated, see e04ycc.
Constraint: , , or .
Bxnl Stop Abs Tol Fun
Default
This parameter specifies the relative tolerance for the error test, specifically, it sets the value of of equation (7) in Section 11.5. Setting to a large value may cause the solver to stop prematurely with a suboptimal solution.
Constraint: .
Bxnl Stop Abs Tol Grd
Default
This parameter specifies the relative tolerance for the gradient test, specifically, it sets the value of of equation (8) in Section 11.5. Setting to a large value may cause the solver to stop prematurely with a suboptimal solution.
Constraint: .
Bxnl Stop Rel Tol Fun
Default
This parameter specifies the relative tolerance for the error test, specifically, it sets the value of of equation (7) in Section 11.5. Setting to a large value may cause the solver to stop prematurely with a suboptimal solution.
Constraint: .
Bxnl Stop Rel Tol Grd
Default
This parameter specifies the relative tolerance for the gradient test, specifically, it sets the value of of equation (8) in Section 11.5. Setting to a large value may cause the solver to stop prematurely with a suboptimal solution.
Constraint: .
Bxnl Stop Step Tol
Default
Specifies the stopping tolerance for the step length test, specifically, it sets the value for of equation (9) in Section 11.5. Setting to a large value may cause the solver to stop prematurely with a suboptimal solution.
Under certain circumstances, e.g., when in doubt of the quality of the first- or second-order derivatives, in the event of the solver exiting with a successful step length test, it is recommended to verify that either the error or the gradient norm is acceptably small.
Constraint: .
Bxnl Reg Order
Default
This parameter specifies the order of the regularization in (5) used when .
Some values for are restricted depending on the method chosen in , see Section 11.3.2 for more details.
Constraint: , or .
Bxnl Glob Method
Default
This parameter specifies the globalization method used to estimate the next step . It also determines the class of subproblem to solve. The trust region subproblem finds the step by minimizing the specified model withing a given radius. On the other hand, when , the problem is reformulated by adding an aditional regularization term and minimized in order to find the next step . See Section 11.3 for more details.
Constraint: or .
Bxnl Nlls Method
Default
This parameter defines the method used to estimate the next step in . It only applies to , or . When the globalization technique chosen is trust region () the methods for available are Powell's dogleg method, a generalized eigenvalue method (AINT) Adachi et al. (2015), a variant of Moré–Sorensen's method, and GALAHAD's DTRS method. Otherwise, when the globalization method chosen is via regularization () the methods available are comprised by a linear system solver and GALAHAD's DRQS method. See Section 11.3 for more details.
Constraint: , , , or .
Bxnl Model
Default
This parameter specifies which model is used to approximate the objective function and estimate the next point that reduces the error. This is one of the most important optional parameters and should be chosen according to the problem characteristics. The models are briefly described in Section 11.2.
Constraint: , , or .
Bxnl Tn Method
Default
This parameter specifies how to solve the subproblem and find the next step for the tensor Newton model, . The subproblems are solved using a range of regularization schemes. See Section 11.3.3.
Constraint: , , , or .
Bxnl Use Second Derivatives
Default
This parameter indicates whether the weighted sum of residual Hessians are available through the call-back lsqhes. If and the specified model in requires user-suppied second derivatives, then the solver will terminate with NE_DERIV_ERRORS.
Constraint: or .
Bxnl Use Weights
Default
This parameter indicates whether to use a weighted nonlinear least square model. If then the weights in (2) must be supplied by you via e04rxc. If weights are to be used, then all elements must be provided, see Section 9.2. If the solver is expecting to use weights but they are not provided or have non-positive values, then the solver will terminate with NE_DC_MATCH.
Constraint: or .
Bxnl Iteration Limit
Default
This parameter specifies the maximum amount of major iterations the solver is alloted. If this limit is reached, then the solver will terminate with NE_TOO_MANY_ITER.
Constraint: .
Bxnl Monitor Frequency
Default
If , the user-supplied function monit will be called at the end of every th step for monitoring purposes.
Constraint: .
Bxnl Print Header
Default
This parameter defines, in number of iterations, the frequency with which to print the iteration log header.
Constraint: .
Infinite Bound Size
Default
This defines the ‘infinite’ bound in the definition of the problem constraints. Any upper bound greater than or equal to will be regarded as (and similarly any lower bound less than or equal to will be regarded as ). Note that a modification of this optional parameter does not influence constraints which have already been defined; only the constraints formulated after the change will be affected.
Constraint: .
Monitoring File
Default
(See Section 3.1.1 in the Introduction to the NAG Library CL Interface for further information on NAG data types.)
If , the
Nag_FileID number (as returned from x04acc) for the secondary (monitoring) output. If , no secondary output is provided. The information output to this file ID is controlled by .
Constraint: .
Monitoring Level
Default
This parameter sets the amount of information detail that will be printed by the solver to the secondary output. The meaning of the levels is the same as for .
Constraint: .
Print File
Default
(See Section 3.1.1 in the Introduction to the NAG Library CL Interface for further information on NAG data types.)
If , the
Nag_FileID number (as returned from x04acc, stdout as the default) for the primary output of the solver. If , the primary output is completely turned off independently of other settings. The information output to this unit is controlled by .
Constraint: .
Print Level
Default
This parameter defines how detailed information should be printed by the solver to the primary and secondary output.
Output
No output from the solver.
The Header and Summary.
, , ,
Additionally, the Iteration log.
Constraint: .
Print Options
Default
If , a listing of optional parameters will be printed to the primary output and is always printed to the secondary output.
Constraint: or .
Print Solution
Default
If , the final values of the primal variables are printed on the primary and secondary outputs.
If or , in addition to the primal variables, the final values of the dual variables are printed on the primary and secondary outputs.
Constraint: , , or .
Stats Time
Default
This parameter turns on timing. This might be helpful for a choice of different solving approaches. It is possible to choose between CPU and wall clock time. Choice is equivalent to .
Constraint: , , or .
Time Limit
Default
A limit to the number of seconds that the solver can use to solve one problem. If at the end of an iteration this limit is exceeded, the solver will terminate with NE_TIME_LIMIT.