1 Purpose

d03edf solves seven-diagonal systems of linear equations which arise from the discretization of an elliptic partial differential equation on a rectangular region. This routine uses a multigrid technique.

2 Specification

Fortran Interface

Subroutine d03edf (ngx, ngy, lda, a, rhs, ub, maxit, acc, us, u, iout, numit, ifail)

Integer, Intent (In)	::	ngx, ngy, lda, maxit, iout
Integer, Intent (Inout)	::	ifail
Integer, Intent (Out)	::	numit
Real (Kind=nag_wp), Intent (In)	::	acc
Real (Kind=nag_wp), Intent (Inout)	::	a(lda,7), rhs(lda), ub(ngx*ngy)
Real (Kind=nag_wp), Intent (Out)	::	us(lda), u(lda)

C Header Interface

#include <nag.h>

void	d03edf_ (const Integer *ngx, const Integer *ngy, const Integer *lda, double a[], double rhs[], double ub[], const Integer *maxit, const double *acc, double us[], double u[], const Integer *iout, Integer *numit, Integer *ifail)

The routine may be called by the names d03edf or nagf_pde_dim2_ellip_mgrid.

3 Description

d03edf solves, by multigrid iteration, the seven-point scheme

\begin{array}{l} A_{i, j}^{6} u_{i - 1, j + 1} + A_{i, j}^{7} u_{i, j + 1} \\ + A_{i, j}^{3} u_{i - 1, j} + A_{i, j}^{4} u_{i j} + A_{i, j}^{5} u_{i + 1, j} \\ + A_{i, j}^{1} u_{i, j - 1} + A_{i, j}^{2} u_{i + 1, j - 1} = f_{i j}, \quad i = 1, 2, \dots, n_{x} \text{ and } j = 1, 2, \dots, n_{y}, \end{array}

which arises from the discretization of an elliptic partial differential equation of the form

α (x, y) U_{x x} + β (x, y) U_{x y} + γ (x, y) U_{y y} + δ (x, y) U_{x} + ε (x, y) U_{y} + ϕ (x, y) U = ψ (x, y)

and its boundary conditions, defined on a rectangular region. This we write in matrix form as

A u = f .

The algorithm is described in separate reports by Wesseling (1982a), Wesseling (1982b) and McCarthy (1983).

Systems of linear equations, matching the seven-point stencil defined above, are solved by a multigrid iteration. An initial estimate of the solution must be provided by you. A zero guess may be supplied if no better approximation is available.

A ‘smoother’ based on incomplete Crout decomposition is used to eliminate the high frequency components of the error. A restriction operator is then used to map the system onto a sequence of coarser grids. The errors are then smoothed and prolongated (mapped onto successively finer grids). When the finest grid is reached, the approximation to the solution is corrected. The cycle is repeated for maxit iterations or until the required accuracy, acc, is reached.

d03edf will automatically determine the number $l$ of possible coarse grids, ‘levels’ of the multigrid scheme, for a particular problem. In other words, d03edf determines the maximum integer $l$ so that $n_{x}$ and $n_{y}$ can be expressed in the form

n_{x} = m 2^{l - 1} + 1, \quad n_{y} = n 2^{l - 1} + 1, \quad \text{with } m \geq 2 \text{ and } n \geq 2 .
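The rule above can be sketched in a few lines. The following Python helper is illustrative only and is not part of the NAG Library; the name `multigrid_levels` is hypothetical, and the result is derived purely from the formula above rather than from the routine's internals.

```python
def multigrid_levels(ngx: int, ngy: int) -> int:
    """Largest l with ngx = m*2**(l-1) + 1 and ngy = n*2**(l-1) + 1, m, n >= 2.

    Hypothetical helper for illustration; not a NAG routine.
    """
    if ngx < 3 or ngy < 3:
        raise ValueError("ngx and ngy must both be at least 3")
    levels = 1  # l = 1 always works: m = ngx - 1 >= 2 and n = ngy - 1 >= 2
    while True:
        factor = 2 ** levels  # the power 2**(l-1) required for l = levels + 1
        m, rem_x = divmod(ngx - 1, factor)
        n, rem_y = divmod(ngy - 1, factor)
        if rem_x == 0 and rem_y == 0 and m >= 2 and n >= 2:
            levels += 1
        else:
            return levels


print(multigrid_levels(65, 65))  # 64 = 2 * 2**5, so l = 6
print(multigrid_levels(64, 65))  # 63 is odd, so l = 1
```

Evaluating such a helper for a few candidate grid sizes before calling the routine is one way to see how many levels a given choice of $n_{x}$ and $n_{y}$ would allow.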

It should be noted that the rate of convergence improves significantly with the number of levels used (see McCarthy (1983)), so that $n_{x}$ and $n_{y}$ should be carefully chosen so that $n_{x} - 1$ and $n_{y} - 1$ have factors of the form $2^{l}$, with $l$ as large as possible. For good convergence the integer $l$ should be at least $2$.

d03edf has been found to be robust in application, but being an iterative method the problem of divergence can arise. For a strictly diagonally dominant matrix $A$

| A_{i j}^{4} | > \sum_{k \neq 4} | A_{i j}^{k} |, \quad i = 1, 2, \dots, n_{x} \text{ and } j = 1, 2, \dots, n_{y}

no such problem is foreseen. The diagonal dominance of $A$ is not a necessary condition, but should this condition be strongly violated then divergence may occur. The quickest test is to try the routine.
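The strict diagonal dominance condition can also be checked directly before calling the routine. The sketch below is illustrative Python, not NAG code: the function name is hypothetical, and the coefficients are assumed to be held as one seven-element row per grid point in the order $A^{1}, \dots, A^{7}$.

```python
def is_strictly_diagonally_dominant(stencils) -> bool:
    """Check |A4| > sum of |Ak|, k != 4, for every seven-element stencil row.

    Hypothetical helper; each row is [A1, A2, A3, A4, A5, A6, A7].
    """
    for row in stencils:
        if len(row) != 7:
            raise ValueError("each stencil row must hold exactly 7 coefficients")
        # Index 3 is the central coefficient A4 (Python lists are 0-based).
        off_diagonal = sum(abs(c) for k, c in enumerate(row) if k != 3)
        if not abs(row[3]) > off_diagonal:
            return False
    return True


print(is_strictly_diagonally_dominant([[1, 0, 1, -5, 1, 0, 1]]))  # 5 > 4
print(is_strictly_diagonally_dominant([[1, 0, 1, -4, 1, 0, 1]]))  # 4 > 4 fails
```

A result of `False` does not mean the routine will diverge, since the condition is sufficient rather than necessary.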

4 References

McCarthy G J (1983) Investigation into the multigrid code MGD1 Report AERE-R 10889 Harwell

Wesseling P (1982a) MGD1 – a robust and efficient multigrid method Multigrid Methods. Lecture Notes in Mathematics 960 614–630 Springer–Verlag

Wesseling P (1982b) Theoretical aspects of a multigrid method SIAM J. Sci. Statist. Comput. 3 387–407

5 Arguments

1: $ngx$ – Integer Input

On entry: the number of interior grid points in the $x$-direction, $n_{x}$. $ngx - 1$ should preferably be divisible by as high a power of $2$ as possible.

Constraint: $ngx \geq 3$.

2: $ngy$ – Integer Input

On entry: the number of interior grid points in the $y$-direction, $n_{y}$. $ngy - 1$ should preferably be divisible by as high a power of $2$ as possible.

Constraint: $ngy \geq 3$.

3: $lda$ – Integer Input

On entry: the first dimension of the array a, which must also be a lower bound for the dimension of the arrays rhs, us and u as declared in the (sub)program from which d03edf is called. It is always sufficient to set $lda \geq (4 \times (ngx + 1) \times (ngy + 1)) / 3$, but slightly smaller values may be permitted, depending on the values of ngx and ngy. If on entry, lda is too small, an error message gives the minimum permitted value. (lda must be large enough to allow space for the coarse-grid approximations.)
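As a worked instance of the sufficient bound on lda, the sketch below evaluates it for two grid sizes. It is illustrative Python with a hypothetical function name; rounding a non-integer quotient up to the next integer is an assumption made here for safety, and the minimum value actually permitted is the one reported by the routine.

```python
def sufficient_lda(ngx: int, ngy: int) -> int:
    """Smallest integer not less than 4*(ngx+1)*(ngy+1)/3 (hypothetical helper)."""
    # Adding 2 before the integer division by 3 rounds the quotient up.
    return (4 * (ngx + 1) * (ngy + 1) + 2) // 3


print(sufficient_lda(9, 9))    # 4*10*10/3 = 133.33..., rounded up to 134
print(sufficient_lda(65, 65))  # 4*66*66/3 = 5808 exactly
```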

4: $a (lda, 7)$ – Real (Kind=nag_wp) array Input/Output

On entry: $a (i + (j - 1) \times ngx, k)$ must be set to $A_{i j}^{k}$, for $i = 1, 2, \dots, ngx$, $j = 1, 2, \dots, ngy$ and $k = 1, 2, \dots, 7$.

On exit: a is overwritten.
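The storage convention for grid point $(i, j)$ can be illustrated with a short sketch. This is plain Python with a hypothetical helper name; the indices are kept 1-based to match the Fortran arrays, so 1 would need to be subtracted before indexing a Python list.

```python
def flat_index(i: int, j: int, ngx: int) -> int:
    """1-based position p = i + (j - 1)*ngx of grid point (i, j).

    Hypothetical helper mirroring the indexing of a, rhs, ub and u.
    """
    return i + (j - 1) * ngx


ngx, ngy = 4, 3
positions = [flat_index(i, j, ngx)
             for j in range(1, ngy + 1)
             for i in range(1, ngx + 1)]
print(positions)  # the integers 1 to 12 in order: i varies fastest, then j
```

The points of one grid row (fixed $j$) are therefore contiguous in storage, and successive rows follow one another.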

5: $rhs (lda)$ – Real (Kind=nag_wp) array Input/Output

On entry: $rhs (i + (j - 1) \times ngx)$ must be set to $f_{i j}$, for $i = 1, 2, \dots, ngx$ and $j = 1, 2, \dots, ngy$.

On exit: the first $ngx \times ngy$ elements are unchanged and the rest of the array is used as workspace.

6: $ub (ngx \times ngy)$ – Real (Kind=nag_wp) array Input/Output

On entry: $ub (i + (j - 1) \times ngx)$ must be set to the initial estimate for the solution $u_{i j}$.

On exit: the corresponding component of the residual $r = f - A u$.
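For reference, the residual $r = f - A u$ can be formed directly from the seven-point scheme of Section 3. The following Python sketch is illustrative and is not the routine's implementation: the function name is hypothetical, lists are 0-based, and it is assumed here that stencil entries referring to points outside the grid contribute nothing.

```python
# (di, dj) offsets of the neighbour multiplied by A1..A7 in the seven-point scheme.
OFFSETS = [(0, -1), (1, -1), (-1, 0), (0, 0), (1, 0), (-1, 1), (0, 1)]


def residual(a, f, u, ngx, ngy):
    """Return r = f - A u as a flat list (hypothetical helper).

    a[p] is the row [A1..A7] for point p; f and u are flat lists of length
    ngx*ngy with p = (i - 1) + (j - 1)*ngx, the 0-based form of the
    document's index. Neighbours outside the grid are skipped (an assumption).
    """
    r = []
    for j in range(1, ngy + 1):
        for i in range(1, ngx + 1):
            p = (i - 1) + (j - 1) * ngx
            total = 0.0
            for k, (di, dj) in enumerate(OFFSETS):
                ii, jj = i + di, j + dj
                if 1 <= ii <= ngx and 1 <= jj <= ngy:
                    total += a[p][k] * u[(ii - 1) + (jj - 1) * ngx]
            r.append(f[p] - total)
    return r
```

Comparing such a residual with the values returned in ub is a possible sanity check on how the coefficients were set up, subject to the boundary assumption stated above.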

7: $maxit$ – Integer Input

On entry: the maximum permitted number of multigrid iterations. If $maxit = 0$, no multigrid iterations are performed, but the coarse-grid approximations and incomplete Crout decompositions are computed, and may be output if iout is set accordingly.

Constraint: $maxit \geq 0$.

8: $acc$ – Real (Kind=nag_wp) Input

On entry: the required tolerance for convergence of the residual $2$-norm:

{‖ r ‖}_{2} = \sqrt{\sum_{k = 1}^{ngx \times ngy} {(r_{k})}^{2}}

where $r = f - A u$ and $u$ is the computed solution. Note that the norm is not scaled by the number of equations. The routine will stop after fewer than maxit iterations if the residual $2$-norm is less than the specified tolerance. (If $maxit > 0$, at least one iteration is always performed.)

If on entry $acc = 0.0$, the machine precision is used as a default value for the tolerance; if $acc > 0.0$, but acc is less than the machine precision, the routine will stop when the residual $2$-norm is less than the machine precision and ifail will be set to $4$.

Constraint: $acc \geq 0.0$.
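The unscaled norm and the treatment of small values of acc described above can be sketched as follows. This is illustrative Python with hypothetical function names; the default `eps` is Python's double precision epsilon, used only as a stand-in, and the NAG Library's own definition of machine precision may differ.

```python
import math
import sys


def residual_2norm(r):
    """Unscaled 2-norm: square root of the sum of squares of the residual."""
    return math.sqrt(sum(x * x for x in r))


def effective_tolerance(acc, eps=sys.float_info.epsilon):
    """Tolerance implied by the description of acc (hypothetical helper).

    acc = 0.0 selects eps; a positive acc below eps is raised to eps
    (the case in which the routine would exit with ifail = 4).
    """
    if acc < 0.0:
        raise ValueError("acc must be non-negative")
    return max(acc, eps)


print(residual_2norm([3.0, 4.0]))  # sqrt(9 + 16) = 5.0
```

Because the norm is not divided by the number of equations, a tolerance that suits a coarse grid may be harder to meet on a much finer one.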

9: $us (lda)$ – Real (Kind=nag_wp) array Output

On exit: the residual $2$-norm, stored in element $us (1)$.

10: $u (lda)$ – Real (Kind=nag_wp) array Output

On exit: the computed solution $u_{i j}$ is returned in $u (i + (j - 1) \times ngx)$, for $i = 1, 2, \dots, ngx$ and $j = 1, 2, \dots, ngy$.

11: $iout$ – Integer Input

On entry: controls the output of printed information to the current advisory message unit (see x04abf):

$iout = 0$: No output.
$iout = 1$: The solution $u_{i j}$, for $i = 1, 2, \dots, ngx$ and $j = 1, 2, \dots, ngy$.
$iout = 2$: The residual $2$-norm after each iteration, with the reduction factor over the previous iteration.
$iout = 3$: As for $iout = 1$ and $iout = 2$.
$iout = 4$: As for $iout = 3$, plus the final residual (as returned in ub).
$iout = 5$: As for $iout = 4$, plus the initial elements of a and rhs.
$iout = 6$: As for $iout = 5$, plus the Galerkin coarse grid approximations.
$iout = 7$: As for $iout = 6$, plus the incomplete Crout decompositions.
$iout = 8$: As for $iout = 7$, plus the residual after each iteration.

The elements $a (p, k)$, the Galerkin coarse grid approximations and the incomplete Crout decompositions are output in the format:

Y-index $= j$
X-index $= i$ $a (p, 1)$ $a (p, 2)$ $a (p, 3)$ $a (p, 4)$ $a (p, 5)$ $a (p, 6)$ $a (p, 7)$
where $p = i + (j - 1) \times ngx$, for $i = 1, 2, \dots, ngx$ and $j = 1, 2, \dots, ngy$.

The vectors $u (p)$, $ub (p)$ and $rhs (p)$ are output in matrix form with ngy rows and ngx columns. Where $ngx > 10$, the ngx values for a given $j$ value are produced in rows of $10$. Values of $iout > 4$ may yield considerable amounts of output.

Constraint: $0 \leq iout \leq 8$.

12: $numit$ – Integer Output

On exit: the number of iterations performed.

13: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 1$: On entry, $acc = ⟨ value ⟩$.
Constraint: $acc \geq 0.0$.

On entry, $iout = ⟨ value ⟩$.
Constraint: $0 \leq iout \leq 8$.

On entry, $lda = ⟨ value ⟩$.
Constraint: lda must be at least $⟨ value ⟩$.

On entry, $maxit = ⟨ value ⟩$.
Constraint: $maxit \geq 0$.

On entry, $ngx = ⟨ value ⟩$.
Constraint: $ngx \geq 3$.

On entry, $ngy = ⟨ value ⟩$.
Constraint: $ngy \geq 3$.

$ifail = 2$: After maxit iterations the residual norm is not less than the tolerance. $maxit = ⟨ value ⟩$, residual norm $= ⟨ value ⟩$, tolerance $= ⟨ value ⟩$. The residual norm has decreased at each iteration after the first.

$ifail = 3$: After maxit iterations the residual norm is not less than the tolerance. $maxit = ⟨ value ⟩$, residual norm $= ⟨ value ⟩$, tolerance $= ⟨ value ⟩$. The residual norm increased at one or more iterations after the first.

$ifail = 4$: On entry, acc is less than machine precision. The routine terminated because the residual norm is less than machine precision. Residual norm $= ⟨ value ⟩$, machine precision $= ⟨ value ⟩$ and $acc = ⟨ value ⟩$.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

See acc (Section 5).

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.

d03edf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The rate of convergence of this routine is strongly dependent upon the number of levels, $l$, in the multigrid scheme, and thus the choice of ngx and ngy is very important. You are advised to experiment with different values of ngx and ngy to see the effect they have on the rate of convergence; for example, using a value such as $ngx = 65$ ($= 2^{6} + 1$) followed by $ngx = 64$ (for which $l = 1$).

10 Example

The program solves the elliptic partial differential equation

U_{x x} - α U_{x y} + U_{y y} = −4, α = 1.7

on the unit square $0 \leq x, y \leq 1$, with boundary conditions

U = 0 \text{ on } \left\{ \begin{array}{cl} x = 0, & (0 \leq y \leq 1) \\ y = 0, & (0 \leq x \leq 1) \\ y = 1, & (0 \leq x \leq 1) \end{array} \right. \qquad U = 1 \text{ on } x = 1, \; 0 \leq y \leq 1 .

For the equation to be elliptic, $α$ must be less than $2$.

The equation is discretized on a square grid with mesh spacing $h$ in both directions using the following approximations:

Figure 1

\begin{array}{lcl} U_{x x} & ≃ & \frac{1}{h^{2}} (U_{E} - 2 U_{O} + U_{W}) \\ U_{y y} & ≃ & \frac{1}{h^{2}} (U_{N} - 2 U_{O} + U_{S}) \\ U_{x y} & ≃ & \frac{1}{2 h^{2}} (U_{N} - U_{NW} + U_{E} - 2 U_{O} + U_{W} - U_{SE} + U_{S}) . \end{array}

Thus the following equations are solved:

\begin{array}{l} \frac{1}{2} α u_{i - 1, j + 1} + (1 - \frac{1}{2} α) u_{i, j + 1} \\ + (1 - \frac{1}{2} α) u_{i - 1, j} + (- 4 + α) u_{i j} + (1 - \frac{1}{2} α) u_{i + 1, j} \\ + (1 - \frac{1}{2} α) u_{i, j - 1} + \frac{1}{2} α u_{i + 1, j - 1} = - 4 h^{2} \end{array}
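The coefficients of these equations can be written out in the order $A^{1}, \dots, A^{7}$ used by the array a. The sketch below is illustrative Python with hypothetical function names, not the example program itself; it covers only the unmodified interior stencil and right-hand side, and the adjustments needed at grid points next to the boundary are not shown.

```python
def interior_stencil(alpha: float):
    """Row [A1..A7] of the discretized operator, already multiplied by h**2.

    Hypothetical helper. Order follows the seven-point scheme:
    A1: (i, j-1), A2: (i+1, j-1), A3: (i-1, j), A4: (i, j),
    A5: (i+1, j), A6: (i-1, j+1), A7: (i, j+1).
    """
    side = 1.0 - 0.5 * alpha   # coefficient of the N, S, E and W neighbours
    diagonal = 0.5 * alpha     # coefficient of the NW and SE neighbours
    return [side, diagonal, side, -4.0 + alpha, side, diagonal, side]


def interior_rhs(h: float) -> float:
    """Right-hand side -4*h**2 of each interior equation."""
    return -4.0 * h * h


row = interior_stencil(1.7)
print(row)
print(abs(sum(row)) < 1e-12)  # the coefficients sum to zero up to rounding
```

The zero row sum reflects the fact that the differential operator contains no undifferentiated $U$ term, so a constant function is mapped to zero away from the boundary.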


NAG FL Interface d03edf (dim2_ellip_mgrid)
