# NAG Toolbox: nag_stat_normal_scores_approx (g01db)

## Purpose

nag_stat_normal_scores_approx (g01db) calculates an approximation to the set of Normal Scores, i.e., the expected values of an ordered set of independent observations from a Normal distribution with mean $0.0$ and standard deviation $1.0$.

## Syntax

[pp, ifail] = g01db(n)
[pp, ifail] = nag_stat_normal_scores_approx(n)

## Description

nag_stat_normal_scores_approx (g01db) is an adaptation of the Applied Statistics Algorithm AS 177.3; see Royston (1982). If the accuracy with which nag_stat_normal_scores_approx (g01db) computes the expected values of the order statistics is a particular concern (see Accuracy), use nag_stat_normal_scores_exact (g01da) instead: it is more accurate, at a cost of increased storage and computing time.
Let ${x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(n\right)}$ be the order statistics from a random sample of size $n$ from the standard Normal distribution. Defining
 $P_{r,n}=\Phi\left(-E\left({x}_{\left(r\right)}\right)\right)$
and
 $Q_{r,n}=\frac{r-\epsilon}{n+\gamma}, \quad r=1,2,\dots,n,$
where $E\left({x}_{\left(r\right)}\right)$ is the expected value of ${x}_{\left(r\right)}$, the current function approximates the Normal upper tail area corresponding to $E\left({x}_{\left(r\right)}\right)$ as
 ${\stackrel{~}{P}}_{r,n}=Q_{r,n}+\frac{{\delta}_{1}}{n}Q_{r,n}^{\lambda}+\frac{{\delta}_{2}}{n}Q_{r,n}^{2\lambda}-C_{r,n}.$
Estimates of $\epsilon$, $\gamma$, ${\delta }_{1}$, ${\delta }_{2}$ and $\lambda$ are obtained for $\mathit{r}=1,2,3$ and for $r\ge 4$. A small correction ${C}_{r,n}$ to ${\stackrel{~}{P}}_{r,n}$ is necessary when $r\le 7$ and $n\le 20$.
The approximation to $E\left({x}_{\left(r\right)}\right)$ is thus given by
 $E\left({x}_{\left(r\right)}\right)=-{\Phi }^{-1}\left({\stackrel{~}{P}}_{r,n}\right), \quad r=1,2,\dots,n.$
Values of the inverse Normal probability integral ${\Phi }^{-1}$ are obtained from nag_stat_inv_cdf_normal (g01fa).
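
The general shape of a rank-based approximation of this kind (a ratio of the form $(r-\epsilon)/(n+\gamma)$ passed through ${\Phi }^{-1}$) can be illustrated with a minimal sketch. This sketch deliberately uses Blom's constants $\epsilon = 3/8$ and $\gamma = 1/4$, which are an assumption made for illustration and are not the estimates used by nag_stat_normal_scores_approx (g01db); it also omits the ${\delta }_{1}$, ${\delta }_{2}$ and ${C}_{r,n}$ terms, so its results are less accurate than those of the function. Ranks are counted from the smallest observation, so the ratio is used here as a lower tail area. The variable names are hypothetical, and the inverse Normal CDF is computed with the base MATLAB function erfcinv rather than nag_stat_inv_cdf_normal (g01fa).

```matlab
% Illustrative sketch only: Blom's approximation, NOT Royston's AS 177.3.
n = 10;
r = (1:n)';                              % ranks, smallest observation first

eps_blom = 3/8;                          % assumed constants (Blom), not those of g01db
gam_blom = 1/4;
q = (r - eps_blom) ./ (n + gam_blom);    % lower tail area assigned to rank r

% Inverse standard Normal CDF: Phi^{-1}(p) = -sqrt(2)*erfcinv(2*p)
scores = -sqrt(2) * erfcinv(2 * q);

% Normal scores for n = 10 as printed in the Example section of this page
ref = [-1.5388 -1.0014 -0.6561 -0.3757 -0.1227 ...
        0.1227  0.3757  0.6561  1.0014  1.5388]';
max_diff = max(abs(scores - ref));

fprintf('%10.4f%10.4f%10.4f%10.4f%10.4f\n', scores);
fprintf('Largest difference from the listed scores: %.4f\n', max_diff);
```

The simplified scores agree with the listed values to roughly two decimal places, with the largest discrepancy at the extreme ranks; the additional terms in ${\stackrel{~}{P}}_{r,n}$ are what reduce the error to the levels quoted under Accuracy.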

## References

Royston J P (1982) Algorithm AS 177: expected normal order statistics (exact and approximate) Appl. Statist. 31 161–165

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{n}$ – int64/int32/nag_int scalar
$n$, the size of the sample.
Constraint: ${\mathbf{n}}\ge 1$.

### Optional Input Parameters

None.

### Output Parameters

1:     $\mathrm{pp}\left({\mathbf{n}}\right)$ – double array
The Normal scores. ${\mathbf{pp}}\left(\mathit{i}\right)$ contains the value $E\left({x}_{\left(\mathit{i}\right)}\right)$, for $\mathit{i}=1,2,\dots ,n$.
2:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{n}}<1$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

For $n\le 2000$, the maximum error is $0.0001$, but nag_stat_normal_scores_approx (g01db) is usually accurate to $5$ or $6$ decimal places. For $n$ up to $5000$, comparison with the exact scores calculated by nag_stat_normal_scores_exact (g01da) shows that the maximum error is $0.001$.

## Further Comments

The time taken by nag_stat_normal_scores_approx (g01db) is proportional to $n$.

## Example

A program to calculate the expected values of the order statistics for a sample of size $10$.
```
function g01db_example

fprintf('g01db example results\n\n');

n = int64(10);
[pp, ifail] = g01db(n);

fprintf('Sample size = %5d\n', n);
fprintf('Normal scores\n');
fprintf('              %10.4f%10.4f%10.4f%10.4f%10.4f\n',pp(1:n));

g01db_plot;

function g01db_plot
% This produces a Q-Q plot for a randomly generated set of data.
% The normal scores have been calculated using g01db and the sample
% quantiles obtained by sorting the observed data using m01ca.

% Initialize the Mersenne Twister generator
seed = [int64(6324213)];
genid = int64(3);
subid = int64(0);

[state, ifail] = g05kf( ...
genid, subid, seed);

% Generate 50 variates from a Student t distribution with 5 df
n  = int64(50);
df = int64(5);
[state, x, ifail] = g05sn( ...
n, df, state);
% Sort x
m1 = int64(1);
[x, ifail] = m01ca(x, m1, 'Ascending');

% Calculate normal scores
[pp, ifail] = g01db(n);

fig1 = figure;
hold on
plot(pp,x,'+r',[-2.25 2.25],[-2.25 2.25],'black');
axis([-2.5 2.5 -3 3]);
xlabel('Normal scores');
ylabel('Sample quantiles');
title({'Q-Q plot for a random set of data','using approximate normal scores'});
hold off;
```
```
g01db example results

Sample size =    10
Normal scores
-1.5388   -1.0014   -0.6561   -0.3757   -0.1227
0.1227    0.3757    0.6561    1.0014    1.5388
```

The figure produced by the example shows a Q-Q plot for a randomly generated set of data. The normal scores have been calculated using nag_stat_normal_scores_approx (g01db) and the sample quantiles obtained by sorting the observed data using nag_sort_realvec_sort (m01ca). A reference line at $y=x$ is also shown.