g01dd calculates Shapiro and Wilk's $W$ statistic and its significance level for testing Normality.

Syntax

C#
public static void g01dd( double[] x, int n, bool calwts, double[] a, out double w, out double pw, out int ifail )

Visual Basic

Public Shared Sub g01dd ( _
	x As Double(), _
	n As Integer, _
	calwts As Boolean, _
	a As Double(), _
	<OutAttribute> ByRef w As Double, _
	<OutAttribute> ByRef pw As Double, _
	<OutAttribute> ByRef ifail As Integer _
)

Visual C++
public: static void g01dd( array<double>^ x, int n, bool calwts, array<double>^ a, [OutAttribute] double% w, [OutAttribute] double% pw, [OutAttribute] int% ifail )

F#
static member g01dd : x : float[] * n : int * calwts : bool * a : float[] * w : float byref * pw : float byref * ifail : int byref -> unit

Parameters

x: Type: System.Double[]
An array of size [n]
On entry: the ordered sample values, $x_{i}$ , for $i = 1, 2, \dots, n$ .

n: Type: System.Int32
On entry: $n$ , the sample size.

Constraint: $3 \leq n \leq 5000$ .

calwts: Type: System.Boolean
On entry: must be set to true if you wish g01dd to calculate the elements of a.
calwts should be set to false if you have saved the values in a from a previous call to g01dd.

If in doubt, set calwts equal to true.

a: Type: System.Double[]
An array of size [n]
On entry: if calwts has been set to false then before entry a must contain the $n$ weights as calculated in a previous call to g01dd, otherwise a need not be set.
On exit: the $n$ weights required to calculate $w$ .

w: Type: System.Double%
On exit: the value of the statistic, $w$ .

pw: Type: System.Double%
On exit: the significance level of $w$ .

ifail: Type: System.Int32%
On exit: $ifail = 0$ unless the method detects an error or a warning has been flagged (see [Error Indicators and Warnings]).

Description

g01dd calculates Shapiro and Wilk's $W$ statistic and its significance level for any sample size between $3$ and $5000$. It is an adaptation of the Applied Statistics Algorithm AS R94, see Royston (1995). The full description of the theory behind this algorithm is given in Royston (1992).

Given a set of observations $x_{1}, x_{2}, \dots, x_{n}$ sorted into either ascending or descending order (M01CAF, not in this release, may be used to sort the data), this method calculates the value of Shapiro and Wilk's $W$ statistic, defined as:

$$W = \frac{{\left(\sum_{i = 1}^{n} a_{i} x_{i}\right)}^{2}}{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}},$$

where $\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$ is the sample mean and $a_{i}$, for $i = 1, 2, \dots, n$, are a set of 'weights' whose values depend only on the sample size $n$.
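The definition of $W$ can be illustrated with a minimal C# sketch. It only evaluates the ratio for weights supplied by the caller; it does not reproduce how g01dd derives the weights or the significance level. The names `ShapiroWilkSketch` and `WStatistic` are hypothetical and are not part of the library.

```csharp
using System;
using System.Linq;

public static class ShapiroWilkSketch
{
    // Evaluates W = (sum a_i x_i)^2 / sum (x_i - mean)^2 for an ordered
    // sample x and a caller-supplied weight vector a of the same length.
    // Illustrative helper only; not part of the g01dd interface.
    public static double WStatistic(double[] x, double[] a)
    {
        if (x.Length != a.Length)
            throw new ArgumentException("x and a must have the same length.");

        double mean = x.Average();
        double weightedSum = 0.0;   // accumulates sum of a_i * x_i
        double sumSquares = 0.0;    // accumulates sum of (x_i - mean)^2
        for (int i = 0; i < x.Length; i++)
        {
            weightedSum += a[i] * x[i];
            double d = x[i] - mean;
            sumSquares += d * d;
        }
        return weightedSum * weightedSum / sumSquares;
    }
}
```

For instance, with the ordered sample $(1, 2, 3)$ and the weights $(-1/\sqrt{2}, 0, 1/\sqrt{2})$, the numerator and denominator both equal $2$, so the function returns $1$ up to rounding.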

On exit, the values of $a_{i}$, for $i = 1, 2, \dots, n$, are only of interest should you wish to call the method again to calculate $w$ and its significance level for a different sample of the same size.

It is recommended that the method is used in conjunction with a Normal $(Q - Q)$ plot of the data. Methods g01da and g01db can be used to obtain the required Normal scores.

References

Royston J P (1982) Algorithm AS 181: the $W$ test for normality Appl. Statist. 31 176–180

Royston J P (1986) A remark on AS 181: the $W$ test for normality Appl. Statist. 35 232–234

Royston J P (1992) Approximating the Shapiro–Wilk's $W$ test for non-normality Statistics & Computing 2 117–119

Royston J P (1995) A remark on AS R94: A remark on Algorithm AS 181: the $W$ test for normality Appl. Statist. 44(4) 547–551

Error Indicators and Warnings

Errors or warnings detected by the method:

$ifail = 1$
On entry, $n < 3$.

$ifail = 2$
On entry, $n > 5000$.

$ifail = 3$
On entry, the elements in x are not in ascending or descending order or are all equal.

$ifail = -9000$: An error occurred, see message report.
$ifail = -8000$: Negative dimension for array $〈value〉$
$ifail = -6000$: Invalid Parameters $〈value〉$
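The input conditions behind $ifail = 1$, $2$ and $3$ can be checked before calling the method. The following C# sketch mirrors those three documented conditions only; the class and method names are hypothetical, and the library's own checks may differ in detail (for example, in how they treat an array shorter than n, which this sketch assumes does not occur).

```csharp
using System;

public static class G01ddInputCheck
{
    // Returns 0 if the sample satisfies the documented constraints, otherwise
    // the ifail value (1, 2 or 3) associated with the violated constraint.
    // Assumes x holds at least n elements whenever 3 <= n <= 5000.
    public static int ExpectedIfail(double[] x, int n)
    {
        if (n < 3) return 1;
        if (n > 5000) return 2;

        bool nonDecreasing = true;
        bool nonIncreasing = true;
        for (int i = 1; i < n; i++)
        {
            if (x[i] < x[i - 1]) nonDecreasing = false;
            if (x[i] > x[i - 1]) nonIncreasing = false;
        }

        // Both flags stay true only when every element is equal.
        bool allEqual = nonDecreasing && nonIncreasing;
        bool unordered = !nonDecreasing && !nonIncreasing;
        return (allEqual || unordered) ? 3 : 0;
    }
}
```

Tied values within an otherwise ordered sample pass this check; only a fully constant sample or one that is neither ascending nor descending is rejected.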

Accuracy

There may be a loss of significant figures for large $n$.

Parallelism and Performance

None.

Further Comments

The time taken by g01dd depends roughly linearly on the value of $n$.

For very small samples the power of the test may not be very high.

The contents of the array a should not be modified between calls to g01dd for a given sample size, unless calwts is reset to true before each call of g01dd.

Shapiro and Wilk's $W$ test is very sensitive to ties. If the data has been rounded, the test can be improved by using Sheppard's correction to adjust the sum of squares about the mean. This produces an adjusted value of $W$:

$$W_{A} = W \frac{\sum_{i = 1}^{n} {(x_{(i)} - \bar{x})}^{2}}{\sum_{i = 1}^{n} {(x_{(i)} - \bar{x})}^{2} - \frac{n - 1}{12} \omega^{2}},$$

where $\omega$ is the rounding width. $W_{A}$ can be compared with a standard Normal distribution, but a further approximation is given by Royston (1986).
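Sheppard's correction is a simple rescaling of $W$, which the following C# sketch illustrates. It assumes that $W$ has already been obtained (for example from the w argument of g01dd) and that the rounding width is known; `SheppardSketch` and `AdjustedW` are hypothetical names, not library methods.

```csharp
using System;
using System.Linq;

public static class SheppardSketch
{
    // Computes W_A = W * S / (S - (n - 1) / 12 * omega^2),
    // where S is the sum of squares of x about its mean.
    // Illustrative helper only; not part of the g01dd interface.
    public static double AdjustedW(double w, double[] x, double omega)
    {
        int n = x.Length;
        double mean = x.Average();
        double s = x.Sum(v => (v - mean) * (v - mean));
        double corrected = s - (n - 1) / 12.0 * omega * omega;
        if (corrected <= 0.0)
            throw new ArgumentException("Rounding width is too large for this sample.");
        return w * s / corrected;
    }
}
```

With a rounding width of zero the function returns $W$ unchanged; for any positive width that leaves the corrected sum of squares positive, the adjusted value is larger than $W$.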

If $n > 5000$, values for w and pw are returned, but their accuracy may not be acceptable. See [References] for more details.

Example

This example tests the following two samples (each of size $20$) for Normality.

Sample Number	Data
1	$0.11$, $7.87$, $4.61$, $10.14$, $7.95$, $3.14$, $0.46$, $4.43$, $0.21$, $4.75$, $0.71$, $1.52$, $3.24$, $0.93$, $0.42$, $4.97$, $9.53$, $4.55$, $0.47$, $6.66$
2	$1.36$, $1.14$, $2.92$, $2.55$, $1.46$, $1.06$, $5.27$, $-1.11$, $3.48$, $1.10$, $0.88$, $-0.51$, $1.46$, $0.52$, $6.20$, $1.69$, $0.08$, $3.67$, $2.81$, $3.49$

The elements of a are calculated only in the first call of g01dd, and are re-used in the second call.
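The calling pattern described above (weights computed on the first call, reused on the second) might be sketched as follows. This is not the shipped example program g01dde.cs. To keep the sketch self-contained, the routine is passed in as a delegate whose parameter list copies the C# signature under Syntax; `ShapiroWilkRoutine`, `TwoSampleSketch` and `Run` are hypothetical names. The samples are sorted before each call because the listed data are not in order.

```csharp
using System;

// Delegate with the same parameter list as the g01dd signature under Syntax.
public delegate void ShapiroWilkRoutine(
    double[] x, int n, bool calwts, double[] a,
    out double w, out double pw, out int ifail);

public static class TwoSampleSketch
{
    // Tests two samples of equal size, computing the weights once and
    // reusing them. Returns { w, pw } for each sample.
    public static double[][] Run(ShapiroWilkRoutine g01dd, double[] s1, double[] s2)
    {
        if (s1.Length != s2.Length)
            throw new ArgumentException("Weights can only be reused for equal sample sizes.");

        int n = s1.Length;
        double[] a = new double[n];          // weights, filled by the first call
        double[][] samples = { s1, s2 };
        var results = new double[2][];

        for (int k = 0; k < 2; k++)
        {
            double[] x = (double[])samples[k].Clone();
            Array.Sort(x);                   // the method expects ordered data
            bool calwts = (k == 0);          // calculate weights on the first call only
            g01dd(x, n, calwts, a, out double w, out double pw, out int ifail);

            // This sketch treats any nonzero ifail as a failure.
            if (ifail != 0)
                throw new InvalidOperationException($"g01dd returned ifail = {ifail}");
            results[k] = new[] { w, pw };
        }
        return results;
    }
}
```

Assuming the method is exposed as a static member of some class (written here as `G01.g01dd`, which is an assumption about the library layout), a call could look like `TwoSampleSketch.Run(G01.g01dd, sample1, sample2)`.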

Example program (C#): g01dde.cs

Example program data: g01dde.d

Example program results: g01dde.r
