NAG Library Function Document

nag_rank_regsn (g08rac)

Contents

1 Purpose

2 Specification

3 Description

4 References

5 Arguments

6 Error Indicators and Warnings

7 Accuracy

8 Parallelism and Performance

9 Further Comments

10 Example

10.1 Program Text

10.2 Program Data

10.3 Program Results

1 Purpose

nag_rank_regsn (g08rac) calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations.

2 Specification

#include <nag.h>
#include <nagg08.h>

void nag_rank_regsn (Nag_OrderType order, Integer ns, const Integer nv[],
                     const double y[], Integer p, const double x[], Integer pdx,
                     Integer idist, Integer nmax, double tol, double prvr[],
                     Integer pdparvar, Integer irank[], double zin[], double eta[],
                     double vapvec[], double parest[], NagError *fail)

3 Description

Analysis of data can be made by replacing observations by their ranks. The analysis produces inference for the regression parameters arising from the following model.

For random variables $Y_{1}, Y_{2}, \dots, Y_{n}$ we assume that, after an arbitrary monotone increasing differentiable transformation, $h (.)$, the model

$h (Y_{i}) = x_{i}^{T} β + ε_{i}$ (1)

holds, where $x_{i}$ is a known vector of explanatory variables and $β$ is a vector of $p$ unknown regression coefficients. The $ε_{i}$ are random variables assumed to be independent and identically distributed with a completely known distribution which can be one of the following: Normal, logistic, extreme value or double-exponential.

In Pettitt (1982) an estimate for $β$ is proposed as $\hat{β} = M X^{T} a$ with estimated variance-covariance matrix $M$. The statistics $a$ and $M$ depend on the ranks $r_{i}$ of the observations $Y_{i}$ and the density chosen for $ε_{i}$.
The matrix $X$ is the $n \times p$ matrix of explanatory variables. It is assumed that $X$ is of rank $p$ and that a column or a linear combination of columns of $X$ is not equal to the column vector of $1$ or a multiple of it. This means that a constant term cannot be included in the model (1). The statistics $a$ and $M$ are found as follows. Let $ε_{i}$ have pdf $f (ε)$ and let $g = - f^{'} / f$. Let $W_{1}, W_{2}, \dots, W_{n}$ be order statistics for a random sample of size $n$ with the density $f (.)$. Define $Z_{i} = g (W_{i})$, then $a_{i} = E (Z_{r_{i}})$. To define $M$ we need $M^{- 1} = X^{T} (B - A) X$, where $B$ is an $n \times n$ diagonal matrix with $B_{i i} = E (g^{'} (W_{r_{i}}))$ and $A$ is a symmetric matrix with $A_{i j} = cov (Z_{r_{i}}, Z_{r_{j}})$. In the case of the Normal distribution, the $Z_{1} < \dots < Z_{n}$ are standard Normal order statistics and $E (g^{'} (W_{i})) = 1$, for $i = 1, 2, \dots, n$.
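As an illustration of these definitions, the logistic case has simple closed forms. The sketch below rests on standard results for uniform order statistics and is not a description of how the function computes its scores. For the logistic density, $g (w) = 2 F (w) - 1$ and $g^{'} (w) = 2 F (w) (1 - F (w))$, where $F$ is the logistic distribution function, and $F (W_{i})$ is the $i$th order statistic of a uniform sample. Hence $E (Z_{i}) = 2 i / (n + 1) - 1$ and $E (g^{'} (W_{i})) = 2 i (n + 1 - i) / ((n + 1) (n + 2))$. The helper names are hypothetical.

```c
/* Hypothetical helper: expected score E(Z_r) for rank r (1-based) in a
   sample of size n under the logistic density, using E(U_(r)) = r/(n+1). */
double logistic_score(int r, int n)
{
    return 2.0 * r / (n + 1.0) - 1.0;
}

/* Hypothetical helper: E(g'(W_r)) for the logistic density, using
   E(U(1-U)) = r(n+1-r) / ((n+1)(n+2)) for the r-th uniform order statistic. */
double logistic_info(int r, int n)
{
    return 2.0 * r * (n + 1.0 - r) / ((n + 1.0) * (n + 2.0));
}
```

The scores are symmetric about zero, so the middle rank of an odd-sized sample scores exactly zero.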

The analysis can also deal with ties in the data. Two observations are adjudged to be tied if $|Y_{i} - Y_{j}| < tol$, where tol is a user-supplied tolerance level.

Various statistics can be found from the analysis:

(a)	The score statistic $X^{T} a$. This statistic is used to test the hypothesis $H_{0} : β = 0$, see (e).
(b)	The estimated variance-covariance matrix $X^{T} (B - A) X$ of the score statistic in (a).
(c)	The estimate $\hat{β} = M X^{T} a$.
(d)	The estimated variance-covariance matrix $M = {(X^{T} (B - A) X)}^{- 1}$ of the estimate $\hat{β}$.
(e)	The $χ^{2}$ statistic $Q = {\hat{β}}^{T} M^{- 1} \hat{β} = a^{T} X {(X^{T} (B - A) X)}^{- 1} X^{T} a$ used to test $H_{0} : β = 0$. Under $H_{0}$, $Q$ has an approximate $χ^{2}$-distribution with $p$ degrees of freedom.
(f)	The standard errors $M_{i i}^{1 / 2}$ of the estimates given in (c).
(g)	Approximate $z$-statistics, i.e., $Z_{i} = {\hat{β}}_{i} / se ({\hat{β}}_{i})$ for testing $H_{0} : β_{i} = 0$. For $i = 1, 2, \dots, p$, $Z_{i}$ has an approximate $N (0, 1)$ distribution.

In many situations, more than one sample of observations will be available. In this case we assume the model

$h_{k} (Y_{k}) = X_{k}^{T} β + e_{k}, \quad k = 1, 2, \dots, ns,$

where ns is the number of samples. In an obvious manner, $Y_{k}$ and $X_{k}$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation $h_{k}$ can be assumed different for each sample since observations are ranked within the sample.

The earlier analysis can be extended to give a combined estimate of $β$ as $\hat{β} = D d$, where

$D^{- 1} = \sum_{k = 1}^{ns} X_{k}^{T} (B_{k} - A_{k}) X_{k}$

and

$d = \sum_{k = 1}^{ns} X_{k}^{T} a_{k},$

with $a_{k}$, $B_{k}$ and $A_{k}$ defined as $a$, $B$ and $A$ above but for the $k$th sample.

The remaining statistics are calculated as for the one sample case.
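For a single explanatory variable ($p = 1$) the combined estimate reduces to a ratio of sums, which makes the pooling across samples explicit. The following is a direct specialization of the formulas above, writing $x_{k}$ for the single column of $X_{k}$. Reading $D$ as the estimated variance of $\hat{β}$ is an assumption here, with $D$ taking the role that $M$ plays in the one sample case.

```latex
\hat{\beta}
  = \frac{\sum_{k=1}^{ns} x_{k}^{T} a_{k}}
         {\sum_{k=1}^{ns} x_{k}^{T} (B_{k} - A_{k}) x_{k}},
\qquad
D = \left( \sum_{k=1}^{ns} x_{k}^{T} (B_{k} - A_{k}) x_{k} \right)^{-1}.
```

Each sample contributes its own score $x_{k}^{T} a_{k}$ to the numerator and its own information term to the denominator, so samples with more information carry more weight in the combined estimate.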

4 References

Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks J. Roy. Statist. Soc. Ser. B 44 234–243

5 Arguments

1: $order$ – Nag_OrderType, Input

On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by $order = Nag_RowMajor$. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.

Constraint: $order = Nag_RowMajor$ or $Nag_ColMajor$.

2: $ns$ – Integer, Input

On entry: the number of samples.

Constraint: $ns \geq 1$.

3: $nv [ns]$ – const Integer, Input

On entry: the number of observations in the $i$th sample, for $i = 1, 2, \dots, ns$.

Constraint: $nv [i] \geq 1$, for $i = 0, 1, \dots, ns - 1$.

4: $y [\dim]$ – const double, Input

Note: the dimension, dim, of the array y must be at least $\sum_{i = 1}^{ns} nv [i - 1]$.

On entry: the observations in each sample. Specifically, $y [\sum_{k = 1}^{i - 1} nv [k - 1] + j - 1]$ must contain the $j$th observation in the $i$th sample.
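The indexing rule for y can be written as a small helper, which may make the layout easier to check when samples are concatenated by hand. This is a sketch only: the function name is hypothetical, and `long` stands in for the library's Integer type.

```c
/* Hypothetical helper: zero-based position in y of the j-th observation
   (1-based) of the i-th sample (1-based), given the sample sizes in nv. */
long obs_index(const long *nv, long i, long j)
{
    long offset = 0;
    for (long k = 1; k <= i - 1; k++)
        offset += nv[k - 1];   /* skip all observations of earlier samples */
    return offset + j - 1;
}
```

With sample sizes 3, 2 and 4, the first observation of the second sample sits at position 3 and the last observation of the third sample at position 8.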

5: $p$ – Integer, Input

On entry: the number of parameters to be fitted.

Constraint: $p \geq 1$.

6: $x [\dim]$ – const double, Input

Note: the dimension, dim, of the array x must be at least

$\max (1, pdx \times p)$ when $order = Nag_ColMajor$;
$\max (1, (\sum_{i = 1}^{ns} nv [i - 1]) \times pdx)$ when $order = Nag_RowMajor$.

Where $X (i, j)$ appears in this document, it refers to the array element

$x [(j - 1) \times pdx + i - 1]$ when $order = Nag_ColMajor$;
$x [(i - 1) \times pdx + j - 1]$ when $order = Nag_RowMajor$.

On entry: the design matrices for each sample. Specifically, $X (\sum_{k = 1}^{i - 1} nv [k - 1] + j, l)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.

Constraint: $x$ must not contain a column with all elements equal.
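The mapping from $X (i, j)$ to a position in x can be sketched as follows. The enum is a stand-in for Nag_OrderType so that the fragment compiles without the library headers, `long` stands in for Integer, and the function name is hypothetical.

```c
/* Stand-in for the library's Nag_OrderType; the real type comes from <nag.h>. */
typedef enum { ROW_MAJOR, COL_MAJOR } order_t;

/* Hypothetical helper: zero-based position in x of X(i, j), with 1-based
   row i (observation) and column j (explanatory variable). */
long x_index(order_t order, long pdx, long i, long j)
{
    return (order == COL_MAJOR) ? (j - 1) * pdx + i - 1
                                : (i - 1) * pdx + j - 1;
}
```

For instance, with two explanatory variables stored row-major and pdx = 2, $X (3, 2)$ is element 5 of x. Stored column-major with pdx = 20, the same entry is element 22.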

7: $pdx$ – Integer, Input

On entry: the stride separating row or column elements (depending on the value of order) in the array x.

Constraints:

if $order = Nag_ColMajor$, $pdx \geq \sum_{i = 1}^{ns} nv [i - 1]$;
if $order = Nag_RowMajor$, $pdx \geq p$.

8: $idist$ – Integer, Input

On entry: the error distribution to be used in the analysis.

$idist = 1$: Normal.
$idist = 2$: Logistic.
$idist = 3$: Extreme value.
$idist = 4$: Double-exponential.

Constraint: $1 \leq idist \leq 4$.

9: $nmax$ – Integer, Input

On entry: the value of the largest sample size.

Constraint: $nmax = \max_{1 \leq i \leq ns} (nv [i - 1])$ and $nmax > p$.
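Because nmax must equal the largest entry of nv exactly, it is safer to compute it than to type it in. A minimal sketch, with a hypothetical function name and `long` standing in for Integer:

```c
/* Hypothetical helper: returns the largest sample size in nv[0..ns-1],
   which is the value nmax must take. Returns -1 if that value does not
   exceed p, since the constraint nmax > p would then be violated. */
long largest_sample(const long *nv, long ns, long p)
{
    long nmax = nv[0];
    for (long i = 1; i < ns; i++)
        if (nv[i] > nmax)
            nmax = nv[i];
    return (nmax > p) ? nmax : -1;
}
```

The -1 return value is a convention of this sketch only. The library itself reports these conditions through fail (see NE_SAMPLE and NE_INT_2 in Section 6).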

10: $tol$ – double, Input

On entry: the tolerance for judging whether two observations are tied. Thus, observations $Y_{i}$ and $Y_{j}$ are adjudged to be tied if $|Y_{i} - Y_{j}| < tol$.

Constraint: $tol > 0.0$.
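When choosing tol, it can help to see how many pairs of observations a candidate value would treat as tied. The sketch below applies the stated criterion to every pair within one sample. It is a diagnostic written for this purpose, not the tie-handling procedure used inside the function, and its name is hypothetical.

```c
#include <math.h>

/* Hypothetical helper: number of pairs (i, j), i < j, in a single sample
   of size n for which |y[i] - y[j]| < tol. */
long count_tied_pairs(const double *y, long n, double tol)
{
    long count = 0;
    for (long i = 0; i < n; i++)
        for (long j = i + 1; j < n; j++)
            if (fabs(y[i] - y[j]) < tol)
                count++;
    return count;
}
```

If every pair in the data is counted as tied for the chosen tol, the call should be expected to fail with NE_OBSERVATIONS.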

11: $prvr [\dim]$ – double, Output

Note: the dimension, dim, of the array prvr must be at least

$\max (1, pdparvar \times p)$ when $order = Nag_ColMajor$;
$\max (1, (p + 1) \times pdparvar)$ when $order = Nag_RowMajor$.

Where $PRVR (i, j)$ appears in this document, it refers to the array element

$prvr [(j - 1) \times pdparvar + i - 1]$ when $order = Nag_ColMajor$;
$prvr [(i - 1) \times pdparvar + j - 1]$ when $order = Nag_RowMajor$.

On exit: the variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1 \leq i \leq j \leq p$, $PRVR (i, j)$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1 \leq j \leq i \leq p$, $PRVR (i + 1, j)$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
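The two triangles of prvr can be read back with a pair of accessors. This is a sketch under stated assumptions: prvr is treated as a $(p + 1) \times p$ array, as its dimension and stride constraints suggest, with the parameter covariances shifted down by one row. The enum stands in for Nag_OrderType, `long` for Integer, and all names are hypothetical.

```c
/* Stand-in for the library's Nag_OrderType; the real type comes from <nag.h>. */
typedef enum { ROW_MAJOR, COL_MAJOR } order_t;

/* Zero-based position in prvr of PRVR(i, j), 1-based i and j. */
static long prvr_index(order_t order, long pdparvar, long i, long j)
{
    return (order == COL_MAJOR) ? (j - 1) * pdparvar + i - 1
                                : (i - 1) * pdparvar + j - 1;
}

/* Covariance of the i-th and j-th score statistics: PRVR(i, j), i <= j. */
double score_cov(const double *prvr, order_t order, long pdparvar,
                 long i, long j)
{
    if (i > j) { long t = i; i = j; j = t; }   /* use symmetry so i <= j */
    return prvr[prvr_index(order, pdparvar, i, j)];
}

/* Covariance of the i-th and j-th parameter estimates: PRVR(i + 1, j), j <= i. */
double param_cov(const double *prvr, order_t order, long pdparvar,
                 long i, long j)
{
    if (j > i) { long t = i; i = j; j = t; }   /* use symmetry so j <= i */
    return prvr[prvr_index(order, pdparvar, i + 1, j)];
}
```

Swapping the indices lets callers ask for either triangle as if it were a full symmetric matrix.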

12: $pdparvar$ – Integer, Input

On entry: the stride separating row or column elements (depending on the value of order) in the array prvr.

Constraints:

if $order = Nag_ColMajor$, $pdparvar \geq p + 1$;
if $order = Nag_RowMajor$, $pdparvar \geq p$.

13: $irank [nmax]$ – Integer, Output

On exit: for the one sample case, irank contains the ranks of the observations.

14: $zin [nmax]$ – double, Output

On exit: for the one sample case, zin contains the expected values of the function $g (.)$ of the order statistics.

15: $eta [nmax]$ – double, Output

On exit: for the one sample case, eta contains the expected values of the function $g^{'} (.)$ of the order statistics.

16: $vapvec [nmax \times (nmax + 1) / 2]$ – double, Output

On exit: for the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g (.)$ of the order statistics stored column-wise.
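An upper triangle stored column-wise is conventionally packed as $(1, 1)$, $(1, 2)$, $(2, 2)$, $(1, 3)$, and so on. Assuming vapvec follows that conventional packed layout, which the description implies but does not spell out, an element can be located as sketched below. The function name is hypothetical.

```c
/* Hypothetical helper: zero-based position in a column-wise packed upper
   triangle of element (i, j), 1-based, of a symmetric matrix. */
long packed_upper_index(long i, long j)
{
    if (i > j) { long t = i; i = j; j = t; }   /* use symmetry so i <= j */
    return j * (j - 1) / 2 + i - 1;
}
```

For a $3 \times 3$ matrix this yields positions 0 to 5, matching the stated length $nmax \times (nmax + 1) / 2 = 6$.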

17: $parest [4 \times p + 1]$ – double, Output

On exit: the statistics calculated by the function.

The first p components of parest contain the score statistics.

The next p elements contain the parameter estimates.

$parest [2 \times p]$ contains the value of the $χ^{2}$ statistic.

The next p elements of parest contain the standard errors of the parameter estimates.

Finally, the remaining p elements of parest contain the $z$-statistics.
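The layout of parest described above translates into the following offsets. The accessor names are hypothetical, `long` stands in for Integer, and the index i is zero-based with $0 \leq i < p$.

```c
/* Hypothetical accessors for the parest layout:
   [0, p)        score statistics
   [p, 2p)       parameter estimates
   [2p]          chi-square statistic
   [2p+1, 3p+1)  standard errors
   [3p+1, 4p+1)  z-statistics */
double parest_score(const double *parest, long i)            { return parest[i]; }
double parest_estimate(const double *parest, long p, long i) { return parest[p + i]; }
double parest_chi2(const double *parest, long p)             { return parest[2 * p]; }
double parest_se(const double *parest, long p, long i)       { return parest[2 * p + 1 + i]; }
double parest_z(const double *parest, long p, long i)        { return parest[3 * p + 1 + i]; }
```

The single $χ^{2}$ entry in the middle is why the array length is $4 \times p + 1$ and why the last two blocks start one element later than a multiple of p.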

18: $fail$ – NagError *, Input/Output

The NAG error argument (see Section 3.6 in the Essential Introduction).

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 3.2.1.2 in the Essential Introduction for further information.

NE_BAD_PARAM
On entry, argument $〈value〉$ had an illegal value.

NE_INT
On entry, idist is outside the range $1$ to $4$: $idist = 〈value〉$.

On entry, $ns = 〈value〉$.
Constraint: $ns \geq 1$.

On entry, $p = 〈value〉$.
Constraint: $p \geq 1$.

On entry, $pdparvar = 〈value〉$.
Constraint: $pdparvar > 0$.

On entry, $pdx = 〈value〉$.
Constraint: $pdx > 0$.

NE_INT_2
On entry, $nmax = 〈value〉$ and $p = 〈value〉$.
Constraint: $nmax > p$.

On entry, $pdparvar = 〈value〉$ and $p = 〈value〉$.
Constraint: $pdparvar \geq p$.

On entry, $pdparvar = 〈value〉$ and $p = 〈value〉$.
Constraint: $pdparvar \geq p + 1$.

On entry, $pdx = 〈value〉$ and $p = 〈value〉$.
Constraint: $pdx \geq p$.

On entry, $pdx = 〈value〉$ and sum $nv [i - 1] = 〈value〉$.
Constraint: $pdx \geq$ the sum of $nv [i - 1]$.

NE_INT_ARRAY
On entry, $nv [〈value〉] = 〈value〉$.
Constraint: $nv [i] \geq 1$, for $i = 0, 1, \dots, ns - 1$.

NE_INT_ARRAY_ELEM_CONS
On entry, $M = 〈value〉$.
Constraint: $M$ elements of array $nv > 0$.

NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

An unexpected error has been triggered by this function. Please contact NAG.
See Section 3.6.6 in the Essential Introduction for further information.

NE_MAT_ILL_DEFINED
The matrix $X^{T} (B - A) X$ is either singular or non positive definite.

NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 3.6.5 in the Essential Introduction for further information.

NE_OBSERVATIONS
All the observations were adjudged to be tied.

NE_REAL
On entry, $tol = 〈value〉$.
Constraint: $tol > 0.0$.

NE_REAL_ARRAY_ELEM_CONS
On entry, all elements in column $〈value〉$ of $x$ are equal to $〈value〉$.

NE_SAMPLE
The largest sample size is $〈value〉$ which is not equal to nmax, $nmax = 〈value〉$.

7 Accuracy

The computations are believed to be stable.

8 Parallelism and Performance

nag_rank_regsn (g08rac) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.

nag_rank_regsn (g08rac) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

The time taken by nag_rank_regsn (g08rac) depends on the number of samples, the total number of observations and the number of parameters fitted.

In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See Pettitt (1982) for further details.

10 Example

The example program fits a regression model to a single sample of $20$ observations using two explanatory variables. The error distribution is taken to be logistic.
