f11grc is a setup function, the first in a suite of three functions for the iterative solution of a complex Hermitian system of simultaneous linear equations. f11grc must be called before f11gsc, the iterative solver. The third function in the suite, f11gtc, can be used to return additional information about the computation.

These three functions are suitable for the solution of large sparse complex Hermitian systems of equations.

2 Specification

#include <nag.h>

void

f11grc (Nag_SparseSym_Method method, Nag_SparseSym_PrecType precon, Nag_SparseSym_Bisection sigcmp, Nag_NormType norm, Nag_SparseSym_Weight weight, Integer iterm, Integer n, double tol, Integer maxitn, double anorm, double sigmax, double sigtol, Integer maxits, Integer monit, Integer *lwreq, Complex work[], Integer lwork, NagError *fail)

The function may be called by the names: f11grc, nag_sparse_complex_herm_basic_setup or nag_sparse_herm_basic_setup.

3 Description

The suite consisting of the functions:

f11grc,
f11gsc,
f11gtc,

is designed to solve the complex Hermitian system of simultaneous linear equations

A x = b

of order

n

, where

n

is large and the matrix of the coefficients

A

is sparse.

f11grc is a setup function which must be called before the iterative solver f11gsc. f11gtc, the third function in the suite, can be used to return additional information about the computation. Either of two methods can be used:

1.Conjugate Gradient Method (CG)
For this method (see Hestenes and Stiefel (1952), Golub and Van Loan (1996), Barrett et al. (1994) and Dias da Cunha and Hopkins (1994)), the matrix $A$ should ideally be positive definite. The application of the Conjugate Gradient method to indefinite matrices may lead to failure or to lack of convergence.
2.Lanczos Method (SYMMLQ)
This method, based upon the algorithm SYMMLQ (see Paige and Saunders (1975) and Barrett et al. (1994)), is suitable for both positive definite and indefinite matrices. It is more robust than the Conjugate Gradient method but less efficient when $A$ is positive definite.

Both CG and SYMMLQ methods start from the residual

r_{0} = b - A x_{0}

, where

x_{0}

is an initial estimate for the solution (often

x_{0} = 0

), and generate an orthogonal basis for the Krylov subspace

span \{A^{k} r_{0}\}

, for

k = 0, 1, \dots

, by means of three-term recurrence relations (see Golub and Van Loan (1996)). A sequence of real symmetric tridiagonal matrices

\{T_{k}\}

is also generated. Here and in the following, the index

k

denotes the iteration count. The resulting real symmetric tridiagonal systems of equations are usually more easily solved than the original problem. A sequence of solution iterates

\{x_{k}\}

is thus generated such that the sequence of the norms of the residuals

\{‖r_{k}‖\}

converges to a required tolerance. Note that, in general, the convergence is not monotonic.

In exact arithmetic, after

n

iterations, this process is equivalent to an orthogonal reduction of

A

to real symmetric tridiagonal form,

T_{n} = Q^{H} A Q

; the solution

x_{n}

would thus achieve exact convergence. In finite-precision arithmetic, cancellation and round-off errors accumulate causing loss of orthogonality. These methods must therefore be viewed as genuinely iterative methods, able to converge to a solution within a prescribed tolerance.

The orthogonal basis is not formed explicitly in either method. The basic difference between the two methods lies in the method of solution of the resulting real symmetric tridiagonal systems of equations: the conjugate gradient method is equivalent to carrying out an

L D L^{H}

(Cholesky) factorization whereas the Lanczos method (SYMMLQ) uses an

L Q

factorization.

Faster convergence can be achieved using a preconditioner (see Golub and Van Loan (1996) and Barrett et al. (1994)). A preconditioner maps the original system of equations onto a different system, say

\bar{A} \bar{x} = \bar{b},

(1)

with, hopefully, better characteristics with respect to its speed of convergence: for example, the condition number of the matrix of the coefficients can be improved or eigenvalues in its spectrum can be made to coalesce. An orthogonal basis for the Krylov subspace

span \{{\bar{A}}^{k} {\bar{r}}_{0}\}

, for

k = 0, 1, \dots

, is generated and the solution proceeds as outlined above. The algorithms used are such that the solution and residual iterates of the original system are produced, not their preconditioned counterparts. Note that an unsuitable preconditioner or no preconditioning at all may result in a very slow rate, or lack, of convergence. However, preconditioning involves a trade-off between the reduction in the number of iterations required for convergence and the additional computational costs per iteration. Also, setting up a preconditioner may involve non-negligible overheads.

A preconditioner must be Hermitian and positive definite, i.e., representable by

M = E E^{H}

, where

M

is nonsingular, and such that

\bar{A} = E^{- 1} A E^{- H} \sim I_{n}

in (1), where

I_{n}

is the identity matrix of order

n

. Also, we can define

\bar{r} = E^{- 1} r

and

\bar{x} = E^{H} x

. These are formal definitions, used only in the design of the algorithms; in practice, only the means to compute the matrix-vector products

v = A u

and to solve the preconditioning equations

M v = u

are required, that is, explicit information about

M

E

or their inverses is not required at any stage.

The first termination criterion

{‖r_{k}‖}_{p} \leq τ ({‖b‖}_{p} + {‖A‖}_{p} \times {‖x_{k}‖}_{p})

(2)

is available for both conjugate gradient and Lanczos (SYMMLQ) methods. In (2),

p = 1, \infty or 2

and

τ

denotes a user-specified tolerance subject to

\max (10, \sqrt{n}) ε \leq τ < 1

, where

ε

is the machine precision. Facilities are provided for the estimation of the norm of the matrix of the coefficients

{‖A‖}_{1} = {‖A‖}_{\infty}

, when this is not known in advance, used in (2), by applying Higham's method (see Higham (1988)). Note that

{‖A‖}_{2}

cannot be estimated internally. This criterion uses an error bound derived from backward error analysis to ensure that the computed solution is the exact solution of a problem as close to the original as the termination tolerance requires. Termination criteria employing bounds derived from forward error analysis could be used, but any such criteria would require information about the condition number

κ (A)

which is not easily obtainable.

The second termination criterion

{‖{\bar{r}}_{k}‖}_{2} \leq τ \max (1.0, {‖b‖}_{2} / {‖r_{0}‖}_{2}) ({‖{\bar{r}}_{0}‖}_{2} + σ_{1} (\bar{A}) \times {‖Δ {\bar{x}}_{k}‖}_{2})

(3)

is available only for the Lanczos method (SYMMLQ). In (3),

σ_{1} (\bar{A}) = {‖\bar{A}‖}_{2}

is the largest singular value of the (preconditioned) iteration matrix

\bar{A}

. This termination criterion monitors the progress of the solution of the preconditioned system of equations and is less expensive to apply than criterion (2). When

σ_{1} (\bar{A})

is not supplied, facilities are provided for its estimation by

σ_{1} (\bar{A}) \sim \max_{k} σ_{1} (T_{k})

. The interlacing property

σ_{1} (T_{k - 1}) \leq σ_{1} (T_{k})

and Gerschgorin's theorem provide lower and upper bounds from which

σ_{1} (T_{k})

can be easily computed by bisection. Alternatively, the less expensive estimate

σ_{1} (\bar{A}) \sim \max_{k} {‖T_{k}‖}_{1}

can be used, where

σ_{1} (\bar{A}) \leq {‖T_{k}‖}_{1}

by Gerschgorin's theorem. Note that only order of magnitude estimates are required by the termination criterion.

Termination criterion (2) is the recommended choice, despite its (small) additional costs per iteration when using the Lanczos method (SYMMLQ). Also, if the norm of the initial estimate is much larger than the norm of the solution, that is, if

‖x_{0}‖ ≫ ‖x‖

, a dramatic loss of significant digits could result in complete lack of convergence. The use of criterion (2) will enable the detection of such a situation, and the iteration will be restarted at a suitable point. No such restart facilities are provided for criterion (3).

Optionally, a vector

w

of user-specified weights can be used in the computation of the vector norms in termination criterion (2), i.e.,

{‖v‖}_{p}^{(w)} = {‖v^{(w)}‖}_{p}

, where

{(v^{(w)})}_{i} = w_{i} v_{i}

, for

i = 1, 2, \dots, n

. Note that the use of weights increases the computational costs.

The sequence of calls to the functions comprising the suite is enforced: first, the setup function f11grc must be called, followed by the solver f11gsc. f11gtc can be called either when f11gsc is carrying out a monitoring step or after f11gsc has completed its tasks. Incorrect sequencing will raise an error condition.

4 References

Barrett R, Berry M, Chan T F, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C and Van der Vorst H (1994) Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods SIAM, Philadelphia

Dias da Cunha R and Hopkins T (1994) PIM 1.1 — the parallel iterative method package for systems of linear equations user's guide — Fortran 77 version Technical Report Computing Laboratory, University of Kent at Canterbury, Kent, UK

Golub G H and Van Loan C F (1996) Matrix Computations (3rd Edition) Johns Hopkins University Press, Baltimore

Hestenes M and Stiefel E (1952) Methods of conjugate gradients for solving linear systems J. Res. Nat. Bur. Stand. 49 409–436

Higham N J (1988) FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation ACM Trans. Math. Software 14 381–396

Paige C C and Saunders M A (1975) Solution of sparse indefinite systems of linear equations SIAM J. Numer. Anal. 12 617–629

5 Arguments

1: $method$ – Nag_SparseSym_Method Input

On entry: the iterative method to be used.

$method = Nag_SparseSym_CG$: Conjugate gradient method.
$method = Nag_SparseSym_SYMMLQ$: Lanczos method (SYMMLQ).

Constraint:

method = Nag_SparseSym_CG

Nag_SparseSym_SYMMLQ

2: $precon$ – Nag_SparseSym_PrecType Input

On entry: determines whether preconditioning is used.

$precon = Nag_SparseSym_NoPrec$: No preconditioning.
$precon = Nag_SparseSym_Prec$: Preconditioning.

Constraint:

precon = Nag_SparseSym_NoPrec

Nag_SparseSym_Prec

3: $sigcmp$ – Nag_SparseSym_Bisection Input

On entry: determines whether an estimate of

σ_{1} (\bar{A}) = {‖E^{- 1} A E^{- H}‖}_{2}

, the largest singular value of the preconditioned matrix of the coefficients, is to be computed using the bisection method on the sequence of tridiagonal matrices

\{T_{k}\}

generated during the iteration. Note that

\bar{A} = A

when a preconditioner is not used.

sigmax > 0.0

(see below), i.e., when

σ_{1} (\bar{A})

is supplied, the value of sigcmp is ignored.

$sigcmp = Nag_SparseSym_Bisect$: $σ_{1} (\bar{A})$ is to be computed using the bisection method.
$sigcmp = Nag_SparseSym_NoBisect$: The bisection method is not used.
If the termination criterion (3) is used, requiring $σ_{1} (\bar{A})$ , an inexpensive estimate is computed and used (see Section 3).

Suggested value:

sigcmp = Nag_SparseSym_NoBisect

Constraint:

sigcmp = Nag_SparseSym_Bisect

Nag_SparseSym_NoBisect

4: $norm$ – Nag_NormType Input

On entry: defines the matrix and vector norm to be used in the termination criteria.

$norm = Nag_OneNorm$: Use the $l_{1}$ norm.
$norm = Nag_InfNorm$: Use the $l_{\infty}$ norm.
$norm = Nag_TwoNorm$: Use the $l_{2}$ norm.

Suggested values:

if $iterm = 1$ , $norm = Nag_InfNorm$ ;
if $iterm = 2$ , $norm = Nag_TwoNorm$ .

Constraints:

if $iterm = 1$ , $norm = Nag_OneNorm$ , $Nag_InfNorm$ or $Nag_TwoNorm$ ;
if $iterm = 2$ , $norm = Nag_TwoNorm$ .

5: $weight$ – Nag_SparseSym_Weight Input

On entry: specifies whether a vector

w

of user-supplied weights is to be used in the vector norms used in the computation of termination criterion (2) (

iterm = 1

{‖v‖}_{p}^{(w)} = {‖v^{(w)}‖}_{p}

, where

v_{i}^{(w)} = w_{i} v_{i}

, for

i = 1, 2, \dots, n

. The suffix

p = 1, 2, \infty

denotes the vector norm used, as specified by the argument norm. Note that weights cannot be used when

iterm = 2

, i.e., when criterion (3) is used.

$weight = Nag_SparseSym_Weighted$: User-supplied weights are to be used and must be supplied on initial entry to f11gsc.
$weight = Nag_SparseSym_UnWeighted$: All weights are implicitly set equal to one. Weights do not need to be supplied on initial entry to f11gsc.

Suggested value:

weight = Nag_SparseSym_UnWeighted

Constraints:

if $iterm = 1$ , $weight = Nag_SparseSym_Weighted$ or $Nag_SparseSym_UnWeighted$ ;
if $iterm = 2$ , $weight = Nag_SparseSym_UnWeighted$ .

6: $iterm$ – Integer Input

On entry: defines the termination criterion to be used.

$iterm = 1$: Use the termination criterion defined in (2) (both conjugate gradient and Lanczos (SYMMLQ) methods).
$iterm = 2$: Use the termination criterion defined in (3) (Lanczos method (SYMMLQ) only).

Suggested value:

iterm = 1

Constraints:

if $method = Nag_SparseSym_CG$ , $iterm = 1$ ;
if $method = Nag_SparseSym_SYMMLQ$ , $iterm = 1$ or $2$ .

7: $n$ – Integer Input

On entry:

n

, the order of the matrix

A

Constraint:

n > 0

8: $tol$ – double Input

On entry: the tolerance

τ

for the termination criterion.

tol \leq 0.0

τ = \max (\sqrt{ε}, \sqrt{n} ε)

is used, where

ε

is the machine precision.

Otherwise

τ = \max (tol, 10 ε, \sqrt{n} ε)

is used.

Constraint:

tol < 1.0

9: $maxitn$ – Integer Input

On entry: the maximum number of iterations.

Constraint:

maxitn > 0

10: $anorm$ – double Input

On entry: if

anorm > 0.0

, the value of

{‖A‖}_{p}

to be used in the termination criterion (2) (

iterm = 1

anorm \leq 0.0

iterm = 1

and

norm = Nag_OneNorm

Nag_InfNorm

{‖A‖}_{1} = {‖A‖}_{\infty}

is estimated internally by f11gsc.

iterm = 2

, anorm is not referenced.

Constraint: if

iterm = 1

and

norm = Nag_TwoNorm

anorm > 0.0

11: $sigmax$ – double Input

On entry: if

sigmax > 0.0

, the value of

σ_{1} (\bar{A}) = {‖E^{- 1} A E^{- H}‖}_{2}

sigmax \leq 0.0

σ_{1} (\bar{A})

is estimated by f11gsc when either

sigcmp = Nag_SparseSym_Bisect

or termination criterion (3) (

iterm = 2

) is employed, though it will be used only in the latter case.

Otherwise, sigmax is not referenced.

12: $sigtol$ – double Input

On entry: the tolerance used in assessing the convergence of the estimate of

σ_{1} (\bar{A}) = {‖\bar{A}‖}_{2}

when the bisection method is used.

sigtol \leq 0.0

, the default value

sigtol = 0.01

is used. The actual value used is

\max (sigtol, ε)

sigcmp = Nag_SparseSym_NoBisect

sigmax > 0.0

, sigtol is not referenced.

Suggested value:

sigtol = 0.01

should be sufficient in most cases.

Constraint: if

sigcmp = Nag_SparseSym_Bisect

and

sigmax \leq 0.0

sigtol < 1.0

13: $maxits$ – Integer Input

On entry: the maximum iteration number

k = maxits

for which

σ_{1} (T_{k})

is computed by bisection (see also Section 3). If

sigcmp = Nag_SparseSym_NoBisect

sigmax > 0.0

, maxits is not referenced.

Suggested value:

maxits = \min (10, n)

when sigtol is of the order of its default value

(0.01)

Constraint: if

sigcmp = Nag_SparseSym_Bisect

and

sigmax \leq 0.0

1 \leq maxits \leq maxitn

14: $monit$ – Integer Input

On entry: if

monit > 0

, the frequency at which a monitoring step is executed by f11gsc: the current solution and residual iterates will be returned by f11gsc and a call to f11gtc made possible every monit iterations, starting from iteration number monit. Otherwise, no monitoring takes place. There are some additional computational costs involved in monitoring the solution and residual vectors when the Lanczos method (SYMMLQ) is used.

Constraint:

monit \leq maxitn

15: $lwreq$ – Integer * Output

On exit: the minimum amount of workspace required by f11gsc. (See also Section 5 in f11gsc.)

16: $work [lwork]$ – Complex Communication Array

On exit: the array work is initialized by f11grc. It must not be modified before calling the next function in the suite, namely f11gsc.

17: $lwork$ – Integer Input

On entry: the dimension of the array work.

Constraint:

lwork \geq 120

Note: although the minimum value of lwork ensures the correct functioning of f11grc, a larger value is required by the other functions in the suite, namely f11gsc and f11gtc. The required value is as follows:

Method	Requirements
CG	$lwork = 120 + 5 n + p$
SYMMLQ	$lwork = 120 + 6 n + p$

where

$p = 2 \times (maxits + 1)$ , when an estimate of $σ_{1} (A)$ (sigmax) is computed;
$p = 0$ , otherwise.

18: $fail$ – NagError * Input/Output

The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL: Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM: On entry, argument $〈value〉$ had an illegal value.
NE_CONSTRAINT: On entry, $maxits = 〈value〉$ , $maxitn = 〈value〉$ , $sigcmp = 〈value〉$ , and $sigmax = 〈value〉$ .
Constraint: if $sigcmp = Nag_SparseSym_Bisect$ and $sigmax \leq 0.0$ , $1 \leq maxits \leq maxitn$ .

On entry, $norm = 〈value〉$ , $iterm = 〈value〉$ , and $anorm = 〈value〉$ .
Constraint: if $iterm = 1$ and $norm = Nag_TwoNorm$ or $Nag_InfNorm$ , $anorm > 0.0$ .
NE_ENUM_INT: On entry, $iterm = 〈value〉$ and $method = 〈value〉$ .
Constraint: if $method = Nag_SparseSym_CG$ , $iterm = 1$ . If $method = Nag_SparseSym_SYMMLQ$ , $iterm = 1$ or $2$ .

On entry, $iterm = 〈value〉$ and $norm = 〈value〉$ .
Constraint: if $iterm = 2$ , $norm = Nag_TwoNorm$ or $Nag_InfNorm$ .

On entry, $iterm = 〈value〉$ and $weight = 〈value〉$ .
Constraint: if $iterm = 2$ , $weight = Nag_SparseSym_UnWeighted$ .
NE_ENUM_REAL_2: On entry, $sigcmp = 〈value〉$ , $sigtol = 〈value〉$ , and $sigmax = 〈value〉$ .
Constraint: if $sigcmp = Nag_SparseSym_Bisect$ and $sigmax \leq 0.0$ , $sigtol < 1.0$ .
NE_INT: On entry, $lwork = 〈value〉$ .
Constraint: $lwork \geq 120$ .

On entry, $maxitn = 〈value〉$ .
Constraint: $maxitn > 0$ .

On entry, $n = 〈value〉$ .
Constraint: $n > 0$ .
NE_INT_2: On entry, $monit = 〈value〉$ and $maxitn = 〈value〉$ .
Constraint: $monit \leq maxitn$ .
NE_INTERNAL_ERROR: An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_LICENCE: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_OUT_OF_SEQUENCE: f11grc has been called out of sequence: either f11grc has been called twice or f11gsc has not terminated its current task.
NE_REAL: On entry, $tol = 〈value〉$ .
Constraint: $tol < 1.0$ .

7 Accuracy

Not applicable.

8 Parallelism and Performance

f11grc is not threaded in any implementation.

9 Further Comments

When

σ_{1} (\bar{A})

is not supplied (

sigmax \leq 0.0

) but it is required, it is estimated by f11gsc using either of the two methods described in Section 3, as specified by the argument sigcmp. In particular, if

sigcmp = Nag_SparseSym_Bisect

, then the computation of

σ_{1} (\bar{A})

is deemed to have converged when the differences between three successive values of

σ_{1} (T_{k})

differ, in a relative sense, by less than the tolerance sigtol, i.e., when

\max (\frac{|σ_{1}^{(k)} - σ_{1}^{(k - 1)}|}{σ_{1}^{(k)}}, \frac{|σ_{1}^{(k)} - σ_{1}^{(k - 2)}|}{σ_{1}^{(k)}}) \leq sigtol .

The computation of

σ_{1} (\bar{A})

is also terminated when the iteration count exceeds the maximum value allowed, i.e.,

k \geq maxits

Bisection is increasingly expensive with increasing iteration count. A reasonably large value of sigtol, of the order of the suggested value, is recommended and an excessive value of maxits should be avoided. Under these conditions,

σ_{1} (\bar{A})

usually converges within very few iterations.

10 Example

This example solves a complex Hermitian system of simultaneous linear equations using the conjugate gradient method, where the matrix of the coefficients

A

, has a random sparsity pattern. An incomplete Cholesky preconditioner is used (f11jac and f11jbc).

Interfaces: FL CL AD

NAG CL Interface Introduction

F11 (Sparse) Chapter Contents

F11 (Sparse) Chapter Introduction

f11gr: FL CL AD

NAG CL Interfacef11grc (complex_​herm_​basic_​setup)

▸▿ Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

NAG CL Interface
f11grc (complex_herm_basic_setup)