NAG Library Routine Document

f01brf (real_gen_sparse_lu)

1
Purpose

f01brf factorizes a real sparse matrix. The routine either forms the LU factorization of a permutation of the entire matrix, or, optionally, first permutes the matrix to block lower triangular form and then only factorizes the diagonal blocks.

2
Specification

Fortran Interface
Subroutine f01brf ( n, nz, a, licn, irn, lirn, icn, pivot, ikeep, iw, w, lblock, grow, abort, idisp, ifail)
Integer, Intent (In):: n, nz, licn, lirn
Integer, Intent (Inout):: irn(lirn), icn(licn), ifail
Integer, Intent (Out):: ikeep(5*n), iw(8*n), idisp(10)
Real (Kind=nag_wp), Intent (In):: pivot
Real (Kind=nag_wp), Intent (Inout):: a(licn)
Real (Kind=nag_wp), Intent (Out):: w(n)
Logical, Intent (In):: lblock, grow, abort(4)
C Header Interface
#include <nagmk26.h>
void  f01brf_ (const Integer *n, const Integer *nz, double a[], const Integer *licn, Integer irn[], const Integer *lirn, Integer icn[], const double *pivot, Integer ikeep[], Integer iw[], double w[], const logical *lblock, const logical *grow, const logical abort[], Integer idisp[], Integer *ifail)

3
Description

Given a real sparse matrix A, f01brf may be used to obtain the LU factorization of a permutation of A,
PAQ=LU  
where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular. The routine uses a sparse variant of Gaussian elimination, and the pivotal strategy is designed to compromise between maintaining sparsity and controlling loss of accuracy through round-off.
Optionally the routine first permutes the matrix into block lower triangular form and then only factorizes the diagonal blocks. For some matrices this gives a considerable saving in storage and execution time.
Extensive data checks are made; duplicated nonzeros can be accumulated.
The factorization is intended to be used by f04axf to solve sparse systems of linear equations Ax=b or ATx=b. If several matrices of the same sparsity pattern are to be factorized, f01bsf should be used for the second and subsequent matrices.
The method is fully described in Duff (1977).
A more recent algorithm for the same calculation is provided by f11mef.

4
References

Duff I S (1977) MA28 – a set of Fortran subroutines for sparse unsymmetric linear equations AERE Report R8730 HMSO

5
Arguments

1:     n – IntegerInput
On entry: n, the order of the matrix A.
Constraint: n>0.
2:     nz – IntegerInput
On entry: the number of nonzero elements in the matrix A.
Constraint: nz>0.
3:     alicn – Real (Kind=nag_wp) arrayInput/Output
On entry: ai, for i=1,2,,nz, must contain the nonzero elements of the sparse matrix A. They can be in any order since f01brf will reorder them.
On exit: the nonzero elements in the LU factorization. The array must not be changed by you between a call of f01brf and a call of f04axf.
4:     licn – IntegerInput
On entry: the dimension of the arrays a and icn as declared in the (sub)program from which f01brf is called. Since the factorization is returned in a and icn, licn should be large enough to accommodate this and should ordinarily be 2 to 4 times as large as nz.
Constraint: licnnz.
5:     irnlirn – Integer arrayInput/Output
On entry: irni, for i=1,2,,nz, must contain the row index of the nonzero element stored in ai.
On exit: irn is overwritten and is not needed for subsequent calls of f01bsf or f04axf.
6:     lirn – IntegerInput
On entry: the dimension of the array irn as declared in the (sub)program from which f01brf is called. It need not be as large as licn; normally it will not need to be very much greater than nz.
Constraint: lirnnz.
7:     icnlicn – Integer arrayCommunication Array
icni, for i=1,2,,nz, must contain, on entry, the column index of the nonzero element stored in ai. icn contains, on exit, the column indices of the nonzero elements in the factorization. The array must not be changed by you between a call of f01brf and subsequent calls of f01bsf or f04axf.
8:     pivot – Real (Kind=nag_wp)Input
On entry: should have a value in the range 0.0pivot0.9999 and is used to control the choice of pivots. If pivot<0.0, the value 0.0 is assumed, and if pivot>0.9999, the value 0.9999 is assumed. When searching a row for a pivot, any element is excluded which is less than pivot times the largest of those elements in the row available as pivots. Thus decreasing pivot biases the algorithm to maintaining sparsity at the expense of stability.
Suggested value: pivot=0.1 has been found to work well on test examples.
9:     ikeep5×n – Integer arrayCommunication Array
On exit: indexing information about the factorization.
You must not change ikeep between a call of f01brf and subsequent calls to f01bsf or f04axf.
10:   iw8×n – Integer arrayWorkspace
11:   wn – Real (Kind=nag_wp) arrayOutput
On exit: if grow=.TRUE., w1 contains an estimate (an upper bound) of the increase in size of elements encountered during the factorization (see grow); the rest of the array is used as workspace.
If grow=.FALSE., the array is not used.
12:   lblock – LogicalInput
On entry: if lblock=.TRUE., the matrix is preordered into block lower triangular form before the LU factorization is performed; otherwise the entire matrix is factorized.
Suggested value: lblock=.TRUE. unless the matrix is known to be irreducible, or is singular and an upper bound on the rank is required.
13:   grow – LogicalInput
On entry: if grow=.TRUE., then on exit w1 contains an estimate (an upper bound) of the increase in size of elements encountered during the factorization. If the matrix is well-scaled (see Section 9.2), then a high value for w1 indicates that the LU factorization may be inaccurate and you should be wary of the results and perhaps increase the argument pivot for subsequent runs (see Section 7).
Suggested value: grow=.TRUE..
14:   abort4 – Logical arrayInput
On entry: if abort1=.TRUE., f01brf will exit immediately on detecting a structural singularity (one that depends on the pattern of nonzeros) and return ifail=1; otherwise it will complete the factorization (see Section 9.3).
If abort2=.TRUE., f01brf will exit immediately on detecting a numerical singularity (one that depends on the numerical values) and return ifail=2; otherwise it will complete the factorization (see Section 9.3).
If abort3=.TRUE., f01brf will exit immediately (with ifail=5) when the arrays a and icn are filled up by the previously factorized, active and unfactorized parts of the matrix; otherwise it continues so that better guidance on necessary array sizes can be given in idisp6 and idisp7, and will exit with ifail in the range 4 to 6. Note that there is always an immediate error exit if the array irn is too small.
If abort4=.TRUE., f01brf exits immediately (with ifail=13) if it finds duplicate elements in the input matrix.
If abort4=.FALSE., f01brf proceeds using a value equal to the sum of the duplicate elements. In either case details of each duplicate element are output on the current advisory message unit (see x04abf), unless suppressed by the value of ifail on entry.
Suggested values:
  • abort1=.TRUE.;
  • abort2=.TRUE.;
  • abort3=.FALSE.;
  • abort4=.TRUE..
15:   idisp10 – Integer arrayCommunication Array
On exit: contains information about the factorization.
idisp1 and idisp2 indicate the position in arrays a and icn of the first and last elements in the LU factorization of the diagonal blocks. (idisp2 gives the number of nonzeros in the factorization.) idisp1 and idisp2 must not be changed by you between a call of f01brf and subsequent calls to f01bsf or f04axf.
idisp3 and idisp4 monitor the adequacy of ‘elbow room’ in the arrays irn and a (and icn) respectively, by giving the number of times that the data in these arrays has been compressed during the factorization to release more storage. If either idisp3 or idisp4 is quite large (say greater than 10), it will probably pay you to increase the size of the corresponding array(s) for subsequent runs. If either is very low or zero, then you can perhaps save storage by reducing the size of the corresponding array(s).
idisp5, when lblock=.FALSE., gives an upper bound on the rank of the matrix; when lblock=.TRUE., gives an upper bound on the sum of the ranks of the lower triangular blocks.
idisp6 and idisp7 give the minimum size of arrays irn and a (and icn) respectively which would enable a successful run on an identical matrix (but some ‘elbow-room’ should be allowed – see Section 9).
idisp8 to 10 are only used if lblock=.TRUE..
  • idisp8 gives the structural rank of the matrix.
  • idisp9 gives the number of diagonal blocks.
  • idisp10 gives the size of the largest diagonal block.
You must not change idisp between a call of f01brf and subsequent calls to f01bsf or f04axf.
16:   ifail – IntegerInput/Output
For this routine, the normal use of ifail is extended to control the printing of error and warning messages as well as specifying hard or soft failure (see Section 3.4 in How to Use the NAG Library and its Documentation).
On entry: ifail must be set to a value with the decimal expansion cba, where each of the decimal digits c, b and a must have a value of 0 or 1.
a=0 specifies hard failure, otherwise soft failure;
b=0 suppresses error messages, otherwise error messages will be printed (see Section 6);
c=0 suppresses warning messages, otherwise warning messages will be printed (see Section 6).
The recommended value for inexperienced users is 110 (i.e., hard failure with all messages printed).
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6
Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail=1
Matrix is structurally singular – decomposition aborted.
Matrix is structurally singular – decomposition aborted RANK=value.
ifail=2
Matrix is numerically singular – decomposition aborted.
ifail=3
lirn too small. Decomposition aborted at stage value in block value with first row value and last row value.
lirn too small. Decomposition aborted at stage value in block value with first row value and last row value. To continue set lirn to at least value.
ifail=4
licn much too small. Decomposition aborted at stage value in block value with first row value and last row value.
licn much too small. Decomposition aborted at stage value in block value with first row value and last row value. To continue set licn to at least value.
ifail=5
licn too small. Decomposition aborted at stage value in block value with first row value and last row value.
licn too small. Decomposition aborted at stage value in block value with first row value and last row value. To continue set licn to at least value.
licn too small. For successful decomposition set licn to at least value.
ifail=6
licn and lirn too small. Decomposition aborted at stage value in block value with first row value and last row value.
licn and lirn too small. Decomposition aborted at stage value in block value with first row value and last row value. To continue set lirn to at least value.
ifail=7
licn not big enough for permutation – increase by value.
ifail=8
On entry, n=value.
Constraint: n>0.
ifail=9
On entry, nz=value.
Constraint: nz>0.
ifail=10
On entry, licn=value and nz=value.
Constraint: licnnz.
ifail=11
On entry, lirn=value and nz=value.
Constraint: lirnnz.
ifail=12
On entry, irnI or icnI is out of range: I=value, aI=value irnI=value, icnI=value.
ifail=13
On entry, duplicate elements found – see advisory messages.
ifail=-1
Matrix is structurally singular – decomposition completed.
ifail=-2
Matrix is numerically singular – decomposition completed.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.

7
Accuracy

The factorization obtained is exact for a perturbed matrix whose i,jth element differs from aij by less than 3ερmij where ε is the machine precision, ρ is the growth value returned in w1 if grow=.TRUE., and mij the number of Gaussian elimination operations applied to element i,j. The value of mij is not greater than n and is usually much less. Small ρ values therefore guarantee accurate results, but unfortunately large ρ values may give a very pessimistic indication of accuracy.

8
Parallelism and Performance

f01brf is not threaded in any implementation.

9
Further Comments

9.1
Timing

The time required may be estimated very roughly from the number τ of nonzeros in the factorized form (output as idisp2) and for f01brf and its associates is
f01brf: 5τ2/n units
f01bsf: τ2/n units
f04axf: 2τ units
where our unit is the time for the inner loop of a full matrix code (e.g., solving a full set of equations takes about 13n3 units). Note that the faster f01bsf time makes it well worthwhile to use this for a sequence of problems with the same pattern.
It should be appreciated that τ varies widely from problem to problem. For network problems it may be little greater than nz, the number of nonzeros in A; for discretization of two-dimensional and three-dimensional partial differential equations it may be about 3n log2n  and 12n5/3, respectively.
The time taken by f01brf to find the block lower triangular form (lblock=.TRUE.) is typically 515% of the time taken by the routine when it is not found (lblock=.FALSE.). If the matrix is irreducible (idisp9=1 after a call with lblock=.TRUE.) then this time is wasted. Otherwise, particularly if the largest block is small (idisp10n), the consequent savings are likely to be greater.
The time taken to estimate growth (grow=.TRUE.) is typically under 20% of the overall time.
The overall time may be substantially increased if there is inadequate ‘elbow-room’ in the arrays a, irn and icn. When the sizes of the arrays are minimal (idisp6 and idisp7) it can execute as much as three times slower. Values of idisp3 and idisp4 greater than about 10 indicate that it may be worthwhile to increase array sizes.

9.2
Scaling

The use of a relative pivot tolerance pivot essentially presupposes that the matrix is well-scaled, i.e., that the matrix elements are broadly comparable in size. Practical problems are often naturally well-scaled but particular care is needed for problems containing mixed types of variables (for example millimetres and neutron fluxes).

9.3
Singular and Rectangular Systems

It is envisaged that f01brf will almost always be called for square nonsingular matrices and that singularity indicates an error condition. However, even if the matrix is singular it is possible to complete the factorization. It is even possible for f04axf to solve a set of equations whose matrix is singular provided the set is consistent.
Two forms of singularity are possible. If the matrix would be singular for any values of the nonzeros (e.g., if it has a whole row of zeros), then we say it is structurally singular, and continue only if abort1=.FALSE.. If the matrix is nonsingular by virtue of the particular values of the nonzeros, then we say that it is numerically singular and continue only if abort2=.FALSE., in which case an upper bound on the rank of the matrix is returned in idisp5 when lblock=.FALSE..
Rectangular matrices may be treated by setting n to the larger of the number of rows and numbers of columns and setting abort1=.FALSE..
Note:  the soft failure option should be used (last digit of ifail=1) if you wish to factorize singular matrices with abort1 or abort2 set to .FALSE..

9.4
Duplicated Nonzeros

The matrix A may consist of a sum of contributions from different sub-systems (for example finite elements). In such cases you may rely on f01brf to perform assembly, since duplicated elements are summed.

9.5
Determinant

The following code may be used to compute the determinant of A (as the real variable DET) after a call of f01brf:
   det = 1.0
   id = idisp(1)
   Do 10 i = 1, n
      idg = id + ikeep(3*n+i)
      det = det*a(idg)
      If (ikeep(n+i).ne.i)det = -det
      If (ikeep(2*n+i).ne.i)det = -det
      id = id + ikeep(i)
10 Continue

10
Example

This example factorizes the real sparse matrix:
5 0 0 0 0 0 0 2 -1 2 0 0 0 0 3 0 0 0 -2 0 0 1 1 0 -1 0 0 -1 2 -3 -1 -1 0 0 0 6 .  
This example program simply prints out some information about the factorization as returned by f01brf in w1 and idisp. Normally the call of f01brf would be followed by a call of f04axf (see Section 10 in f04axf).

10.1
Program Text

Program Text (f01brfe.f90)

10.2
Program Data

Program Data (f01brfe.d)

10.3
Program Results

Program Results (f01brfe.r)