f04qaf solves sparse nonsymmetric equations, sparse linear least squares problems and sparse damped linear least squares problems, using a Lanczos algorithm.
The routine may be called by the names f04qaf or nagf_linsys_real_gen_sparse_lsqsol.
3Description
f04qaf can be used to solve a system of linear equations
(1)
where is an sparse nonsymmetric matrix, or can be used to solve linear least squares problems, so that f04qaf minimizes the value given by
(2)
where is an sparse matrix and denotes the Euclidean length of so that . A damping argument, , may be included in the least squares problem in which case f04qaf minimizes the value given by
(3)
is supplied as the argument damp and should of course be zero if the solution to problems (1) or (2) is required. Minimizing in (3) is often called ridge regression.
f04qaf is based upon algorithm LSQR (see Paige and Saunders (1982a) and Paige and Saunders (1982b)) and solves the problems by an algorithm based upon the Lanczos process. The routine does not require explicitly, but is specified via aprod which must perform the operations and for a given -element vector and element vector . A argument to aprod specifies which of the two operations is required on a given entry.
The routine also returns estimates of the standard errors of the sample regression coefficients (, for ) given by the diagonal elements of the estimated variance-covariance matrix . When problem (2) is being solved and is of full rank, then is given by
and when problem (3) is being solved then is given by
Let denote the matrix
(4)
let denote the residual vector
(5)
corresponding to an iterate , so that is the function being minimized, and let denote the Frobenius (Euclidean) norm of . Then the routine accepts as a solution if it is estimated that one of the following two conditions is satisfied:
(6)
(7)
where and are user-supplied tolerances which estimate the relative errors in and respectively. Condition (6) is appropriate for compatible problems where, in theory, we expect the residual to be zero and will be satisfied by an acceptable solution to a compatible problem. Condition (7) is appropriate for incompatible systems where we do not expect the residual to be zero and is based on the observation that, in theory,
when is a solution to the least squares problem, and so (7) will be satisfied by an acceptable solution to a linear least squares problem.
The routine also includes a test to prevent convergence to solutions, , with unacceptably large elements. This can happen if is nearly singular or is nearly rank deficient. If we let the singular values of be
then the condition number of is defined as
where is the smallest nonzero singular value of and hence is the rank of . When , then is rank deficient, the least squares solution is not unique and f04qaf will normally converge to the minimal length solution. In practice will not have exactly zero singular values, but may instead have small singular values that we wish to regard as zero.
The routine provides for this possibility by terminating if
(8)
where is a user-supplied limit on the condition number of . For problem (1) termination with this condition indicates that is nearly singular and for problem (2) indicates that is nearly rank deficient and so has near linear dependencies in its columns. In this case inspection of , and , which are all returned by the routine, will indicate whether or not an acceptable solution has been found. Condition (8), perhaps in conjunction with , can be used to try and ‘regularize’ least squares solutions. A full discussion of the stopping criteria is given in Section 6 of Paige and Saunders (1982a).
Introduction of a nonzero damping argument tends to reduce the size of the computed solution and to make its components less sensitive to changes in the data, and f04qaf is applicable when a value of is known a priori. To have an effect, should normally be at least where is the machine precision. For further discussion see Paige and Saunders (1982b) and the references given there.
Whenever possible the matrix should be scaled so that the relative errors in the elements of are all of comparable size. Such a scaling helps to prevent the least squares problem from being unnecessarily sensitive to data errors and will normally reduce the number of iterations required. At the very least, in the absence of better information, the columns of should be scaled to have roughly equal column length.
4References
Paige C C and Saunders M A (1982a) LSQR: An algorithm for sparse linear equations and sparse least squares ACM Trans. Math. Software8 43–71
Paige C C and Saunders M A (1982b) Algorithm 583 LSQR: Sparse linear equations and least squares problems ACM Trans. Math. Software8 195–209
On exit: the estimates of the standard errors of the components of . Thus contains an estimate of , where is the th diagonal element of the estimated variance-covariance matrix . The estimates returned in se will be the lower bounds on the actual estimated standard errors, but will usually be correct to at least one significant figure.
6: – Subroutine, supplied by the user.External Procedure
aprod must perform the operations and for given vectors and .
On exit: may be used as a flag to indicate a failure in the computation of or . If mode is negative on exit from aprod, f04qaf will exit immediately with ifail set to mode.
aprod must either be a module subprogram USEd by, or declared as EXTERNAL in, the (sub)program from which f04qaf is called. Arguments denoted as Input must not be changed by this procedure.
Note:aprod should not return floating-point NaN (Not a Number) or infinity values, since these are not handled by f04qaf. If your code inadvertently does return any NaNs or infinities, f04qaf is likely to produce unexpected results.
7: – Real (Kind=nag_wp)Input
On entry: the value . If either problem (1) or problem (2) is to be solved, damp must be supplied as zero.
8: – Real (Kind=nag_wp)Input
On entry: the tolerance, , of the convergence criteria (6) and (7); it should be an estimate of the largest relative error in the elements of . For example, if the elements of are correct to about significant figures, then atol should be set to about . If atol is supplied as less than , where is the machine precision, the value is used instead of atol.
9: – Real (Kind=nag_wp)Input
On entry: the tolerance, , of the convergence criterion (6); it should be an estimate of the largest relative error in the elements of . For example, if the elements of are correct to about significant figures, then btol should be set to about . If btol is supplied as less than , the value is used instead of btol.
10: – Real (Kind=nag_wp)Input
On entry: the value of equation (8); it should be an upper limit on the condition number of . conlim should not normally be chosen much larger than . If conlim is supplied as zero, the value is used instead of conlim.
11: – IntegerInput/Output
On entry: an upper limit on the number of iterations. If , the value n is used in place of itnlim, but for ill-conditioned problems a higher value of itnlim is likely to be necessary.
On exit: unchanged unless on entry, in which case it is set to n.
12: – IntegerInput
On entry: the level of printing from f04qaf. If , then no printing occurs, but otherwise messages will be output on the current advisory message unit (see x04abf). A description of the printed output is given in Section 9.1. The level of printing is determined as follows:
No printing.
A brief summary is printed just prior to return from f04qaf.
A summary line is printed periodically to monitor the progress of f04qaf, together with a brief summary just prior to return from f04qaf.
On exit: an estimate of for the residual, , of (5) corresponding to the solution returned in x. Note that is the function being minimized.
17: – Real (Kind=nag_wp)Output
On exit: an estimate of the corresponding to the solution returned in x.
18: – Real (Kind=nag_wp)Output
On exit: an estimate of for the solution returned in x.
19: – Real (Kind=nag_wp) arrayWorkspace
20: – Real (Kind=nag_wp) arrayUser Workspace
ruser is not used by f04qaf, but is passed directly to aprod and may be used to pass information to this routine.
21: – IntegerInput
On entry: the dimension of the array ruser as declared in the (sub)program from which f04qaf is called.
22: – Integer arrayUser Workspace
iuser is not used by f04qaf, but is passed directly to aprod and may be used to pass information to this routine.
23: – IntegerInput
On entry: the dimension of the array iuser as declared in the (sub)program from which f04qaf is called.
24: – IntegerOutput
On exit: the reason for termination of f04qaf.
The exact solution is . No iterations are performed in this case.
The termination criterion of (6) has been satisfied with and as the values supplied in atol and btol respectively.
The termination criterion of (7) has been satisfied with as the value supplied in atol.
The termination criterion of (6) has been satisfied with and/or as the value , where is the machine precision. One or both of the values supplied in atol and btol must have been less than and was too small for this machine.
The termination criterion of (7) has been satisfied with as the value . The value supplied in atol must have been less than and was too small for this machine.
The values , and correspond to failure with , and respectively (see Section 6) and when ifail is negative inform will be set to the same negative value.
25: – IntegerInput/Output
On entry: ifail must be set to , or to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of means that an error message is printed while a value of means that it is not.
If halting is not appropriate, the value or is recommended. If message printing is undesirable, then the value is recommended. Otherwise, the value is recommended. When the value or is used it is essential to test the value of ifail on exit.
On exit: unless the routine detects an error or a warning has been flagged (see Section 6).
6Error Indicators and Warnings
If on entry or , explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
On entry, .
Constraint: .
On entry, .
Constraint: .
Termination criteria not satisfied, but the condition number bound supplied in conlim has been met.
The condition of (8) has been satisfied for the value of supplied in conlim. If this failure is unexpected you should check that aprod is working correctly. Although conditions (6) or (7) have not been satisfied, the values returned in rnorm, arnorm and xnorm may nevertheless indicate that an acceptable solution has been reached.
Termination criteria not satisfied, but the condition number is bounded by /x02ajf.
The condition of (8) has been satisfied for the value , where is the machine precision. The matrix is nearly singular or rank deficient and the problem is too ill-conditioned for this machine. If this failure is unexpected, you should check that aprod is working correctly.
Maximum number of iterations reached.
The limit on the number of iterations has been reached. The number of iterations required by f04qaf and the condition of the matrix can depend strongly on the scaling of the problem. Poor scaling of the rows and columns of should be avoided whenever possible.
Background information to multithreading can be found in the Multithreading documentation.
f04qaf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9Further Comments
The time taken by f04qaf is likely to be principally determined by the time taken in aprod, which is called twice on each iteration, once with and once with . The time taken per iteration by the remaining operations in f04qaf is approximately proportional to .
The Lanczos process will usually converge more quickly if is pre-conditioned by a nonsingular matrix that approximates in some sense and is also chosen so that equations of the form can efficiently be solved for . For a discussion of preconditioning, see the F11 Chapter Introduction. In the context of f04qaf, problem (1) is equivalent to
Note that the normal matrix so that the preconditioning is equivalent to the preconditioning of the normal matrix .
Pre-conditioning can be incorporated into f04qaf simply by coding aprod to compute and in place of and respectively, and then solving the equations for on return from f04qaf. The quantity should be computed by solving for and then computing , and should be computed by solving for and then forming .
9.1Description of the Printed Output
When , then f04qaf will produce output (except in the case where the routine fails with ) on the advisory message channel (see x04abf).
When then a summary line is printed periodically giving the following information:
Output
Meaning
ITN
Iteration number, .
X(1)
The first element of the current iterate .
FUNCTION
The current value of the function, , being minimized.
COMPAT
An estimate of , where is the residual corresponding to . This value should converge to zero (in theory) if and only if the problem is compatible. COMPAT decreases monotonically.
INCOMPAT
An estimate of which should converge to zero if and only if at the solution is nonzero. INCOMPAT is not usually monotonic.
NRM(ABAR)
A monotonically increasing estimate of .
COND(ABAR)
A monotonically increasing estimate of the condition number .
10Example
This example solves the linear least squares problem
where is the matrix and is the element vector given by
with .
Such a problem can arise by considering the Neumann problem on a rectangle
where is the boundary of the rectangle, and discretizing as illustrated below with the square mesh
Figure 1
The symmetric part of represents the difference equations and the final row comes from the normalizing condition. The example program has at all the internal mesh points, but apart from this is written in a general manner so that the number of rows (NROWS) and columns (NCOLS) in the grid can readily be altered.