# NAG FL Interface g08raf (rank_regsn)

## 1 Purpose

g08raf calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations.

## 2 Specification

Fortran Interface
 Subroutine g08raf ( ns, nv, nsum, y, ip, x, ldx, idist, nmax, tol, prvr, ldprvr, irank, zin, eta, vapvec, parest, work, lwork, iwa, ifail)
 Integer, Intent (In) :: ns, nsum, ip, ldx, idist, nmax, ldprvr, lwork
 Integer, Intent (Inout) :: nv(ns), ifail
 Integer, Intent (Out) :: irank(nmax), iwa(0)
 Real (Kind=nag_wp), Intent (In) :: y(nsum), x(ldx,ip), tol
 Real (Kind=nag_wp), Intent (Inout) :: prvr(ldprvr,ip)
 Real (Kind=nag_wp), Intent (Out) :: zin(nmax), eta(nmax), vapvec(nmax*(nmax+1)/2), parest(4*ip+1), work(0)
#include <nag.h>
 void g08raf_ (const Integer *ns, Integer nv[], const Integer *nsum, const double y[], const Integer *ip, const double x[], const Integer *ldx, const Integer *idist, const Integer *nmax, const double *tol, double prvr[], const Integer *ldprvr, Integer irank[], double zin[], double eta[], double vapvec[], double parest[], double work[], const Integer *lwork, Integer iwa[], Integer *ifail)
The routine may be called by the names g08raf or nagf_nonpar_rank_regsn.

## 3 Description

Analysis of data can be made by replacing observations by their ranks. The analysis produces inference for regression parameters arising from the following model.
For random variables ${Y}_{1},{Y}_{2},\dots ,{Y}_{n}$ we assume that, after an arbitrary monotone increasing differentiable transformation, $h\left(.\right)$, the model
 $h\left({Y}_{i}\right)={x}_{i}^{\mathrm{T}}\beta +{\epsilon }_{i}$ (1)
holds, where ${x}_{i}$ is a known vector of explanatory variables and $\beta$ is a vector of $p$ unknown regression coefficients. The ${\epsilon }_{i}$ are random variables assumed to be independent and identically distributed with a completely known distribution which can be one of the following: Normal, logistic, extreme value or double-exponential. In Pettitt (1982) an estimate for $\beta$ is proposed as $\stackrel{^}{\beta }=M{X}^{\mathrm{T}}a$ with estimated variance-covariance matrix $M$. The statistics $a$ and $M$ depend on the ranks ${r}_{i}$ of the observations ${Y}_{i}$ and the density chosen for ${\epsilon }_{i}$.
The matrix $X$ is the $n×p$ matrix of explanatory variables. It is assumed that $X$ is of rank $p$ and that a column or a linear combination of columns of $X$ is not equal to the column vector of $1$ or a multiple of it. This means that a constant term cannot be included in the model (1). The statistics $a$ and $M$ are found as follows. Let ${\epsilon }_{i}$ have pdf $f\left(\epsilon \right)$ and let $g=-{f}^{\prime }/f$. Let ${W}_{1},{W}_{2},\dots ,{W}_{n}$ be order statistics for a random sample of size $n$ with the density $f\left(.\right)$. Define ${Z}_{i}=g\left({W}_{i}\right)$, then ${a}_{i}=E\left({Z}_{{r}_{i}}\right)$. To define $M$ we need ${M}^{-1}={X}^{\mathrm{T}}\left(B-A\right)X$, where $B$ is an $n×n$ diagonal matrix with ${B}_{ii}=E\left({g}^{\prime }\left({W}_{{r}_{i}}\right)\right)$ and $A$ is a symmetric matrix with ${A}_{ij}=\mathrm{cov}\left({Z}_{{r}_{i}},{Z}_{{r}_{j}}\right)$. In the case of the Normal distribution, the ${Z}_{1}<\cdots <{Z}_{n}$ are standard Normal order statistics and $E\left({g}^{\prime }\left({W}_{i}\right)\right)=1$, for $i=1,2,\dots ,n$.
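As a minimal sketch of how $a$, $B$ and $A$ arise, consider the logistic case with no ties, assuming the standard logistic density. There $g(w)=2F(w)-1$ and $g'(w)=2F(w)(1-F(w))$, where $F$ is the distribution function, so the required expectations reduce to moments of uniform order statistics. The function names are illustrative only, and this is not a description of how g08raf computes these quantities internally.

```python
def logistic_scores(ranks):
    """a_i = E[g(W_(r_i))] = 2 r_i / (n + 1) - 1 (logistic density, no ties)."""
    n = len(ranks)
    return [2.0 * r / (n + 1) - 1.0 for r in ranks]


def logistic_b_diag(ranks):
    """B_ii = E[g'(W_(r_i))] = 2 r_i (n + 1 - r_i) / ((n + 1)(n + 2))."""
    n = len(ranks)
    return [2.0 * r * (n + 1 - r) / ((n + 1) * (n + 2)) for r in ranks]


def logistic_a_matrix(ranks):
    """A_ij = cov(Z_(r_i), Z_(r_j)) = 4 r (n + 1 - s) / ((n + 1)^2 (n + 2)),
    where r = min(r_i, r_j) and s = max(r_i, r_j)."""
    n = len(ranks)
    denom = (n + 1) ** 2 * (n + 2)
    return [[4.0 * min(ri, rj) * (n + 1 - max(ri, rj)) / denom for rj in ranks]
            for ri in ranks]


# Ranks of four observations within one sample (illustrative data).
ranks = [2, 4, 1, 3]
a = logistic_scores(ranks)
b = logistic_b_diag(ranks)
A = logistic_a_matrix(ranks)

# Each row of B - A sums to zero in this sketch.
row_sums = [b[i] - sum(A[i]) for i in range(len(ranks))]
```

In this sketch every row of $B-A$ sums to zero, so a column of ones in $X$ would make ${X}^{\mathrm{T}}\left(B-A\right)X$ singular. This is consistent with the restriction that model (1) cannot contain a constant term.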
The analysis can also deal with ties in the data. Two observations are adjudged to be tied if $|{Y}_{i}-{Y}_{j}|<{\mathbf{tol}}$, where tol is a user-supplied tolerance level.
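The tie rule can be sketched as a simple pairwise test. The helper names below are hypothetical, and the sketch only illustrates the stated criterion; it does not show how g08raf resolves chains of near-equal values into tied groups.

```python
def is_tied(yi, yj, tol):
    """Two observations are adjudged tied when they differ by less than tol."""
    return abs(yi - yj) < tol


def all_tied(y, tol):
    """Every pair is tied exactly when the spread max(y) - min(y) is below tol."""
    return max(y) - min(y) < tol


y = [1.0, 1.0004, 0.9998]
pair_tied = is_tied(y[0], y[1], 1.0e-3)
everything_tied = all_tied(y, 1.0e-3)
```

A tolerance that exceeds the spread of the data makes every observation tied, which is the situation that the error indicator for fully tied data warns about.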
Various statistics can be found from the analysis:
1. (a) The score statistic ${X}^{\mathrm{T}}a$. This statistic is used to test the hypothesis ${H}_{0}:\beta =0$, see (e).
2. (b) The estimated variance-covariance matrix ${X}^{\mathrm{T}}\left(B-A\right)X$ of the score statistic in (a).
3. (c) The estimate $\stackrel{^}{\beta }=M{X}^{\mathrm{T}}a$.
4. (d) The estimated variance-covariance matrix $M={\left({X}^{\mathrm{T}}\left(B-A\right)X\right)}^{-1}$ of the estimate $\stackrel{^}{\beta }$.
5. (e) The ${\chi }^{2}$ statistic $Q={\stackrel{^}{\beta }}^{\mathrm{T}}{M}^{-1}\stackrel{^}{\beta }={a}^{\mathrm{T}}X{\left({X}^{\mathrm{T}}\left(B-A\right)X\right)}^{-1}{X}^{\mathrm{T}}a$ used to test ${H}_{0}:\beta =0$. Under ${H}_{0}$, $Q$ has an approximate ${\chi }^{2}$-distribution with $p$ degrees of freedom.
6. (f) The standard errors ${M}_{ii}^{1/2}$ of the estimates given in (c).
7. (g) Approximate $z$-statistics, i.e., ${Z}_{i}={\stackrel{^}{\beta }}_{i}/se\left({\stackrel{^}{\beta }}_{i}\right)$ for testing ${H}_{0}:{\beta }_{i}=0$. For $i=1,2,\dots ,p$, ${Z}_{i}$ has an approximate $N\left(0,1\right)$ distribution.
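The statistics (a) to (g) can be sketched directly from their definitions. The code below assumes that the scores $a$ and the matrix $B-A$ are already available; the function names and the small data set are illustrative, and the plain Gauss-Jordan solve stands in for whatever factorization the library actually uses.

```python
import math


def solve_columns(mat, rhs):
    """Solve mat @ Z = rhs (rhs given as rows) by Gauss-Jordan elimination."""
    n = len(mat)
    aug = [list(mat[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("information matrix is singular to working precision")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rank_regression_stats(x, a, v):
    """x: n-by-p design matrix, a: n scores, v: n-by-n matrix B - A."""
    n, p = len(x), len(x[0])
    # (a) score statistic X^T a
    score = [sum(x[i][l] * a[i] for i in range(n)) for l in range(p)]
    # (b) X^T (B - A) X, the variance-covariance matrix of the score statistic
    vx = [[sum(v[i][k] * x[k][l] for k in range(n)) for l in range(p)]
          for i in range(n)]
    info = [[sum(x[i][l] * vx[i][m] for i in range(n)) for m in range(p)]
            for l in range(p)]
    # (d) M = (X^T (B - A) X)^{-1}
    identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    m = solve_columns(info, identity)
    # (c) beta_hat = M X^T a
    beta = [sum(m[l][k] * score[k] for k in range(p)) for l in range(p)]
    # (e) Q = a^T X M X^T a, which equals score . beta_hat
    q = sum(score[l] * beta[l] for l in range(p))
    # (f) standard errors and (g) z-statistics
    se = [math.sqrt(m[l][l]) for l in range(p)]
    z = [beta[l] / se[l] for l in range(p)]
    return score, info, beta, m, q, se, z


# Illustrative inputs: three observations, one explanatory variable.
x = [[0.0], [1.0], [3.0]]
a = [-0.5, 0.0, 0.5]
v = [[0.15, -0.10, -0.05], [-0.10, 0.20, -0.10], [-0.05, -0.10, 0.15]]
score, info, beta, m, q, se, z = rank_regression_stats(x, a, v)
```

With a single explanatory variable the square of the $z$-statistic coincides with $Q$, which gives a quick consistency check on the two tests.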
In many situations, more than one sample of observations will be available. In this case we assume the model
 ${h}_{k}\left({Y}_{k}\right)={X}_{k}^{\mathrm{T}}\beta +{e}_{k},\quad k=1,2,\dots ,{\mathbf{ns}},$
where ns is the number of samples. In an obvious manner, ${Y}_{k}$ and ${X}_{k}$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation ${h}_{k}$ can be assumed different for each sample since observations are ranked within the sample.
The earlier analysis can be extended to give a combined estimate of $\beta$ as $\stackrel{^}{\beta }=Dd$, where
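 ${D}^{-1}=\sum _{k=1}^{{\mathbf{ns}}}{X}_{k}^{\mathrm{T}}\left({B}_{k}-{A}_{k}\right){X}_{k}$
and
 $d=\sum _{k=1}^{{\mathbf{ns}}}{X}_{k}^{\mathrm{T}}{a}_{k},$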
with ${a}_{k}$, ${B}_{k}$ and ${A}_{k}$ defined as $a$, $B$ and $A$ above but for the $k$th sample.
The remaining statistics are calculated as for the one sample case.
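The pooling across samples can be sketched as follows, assuming the per-sample matrices ${X}_{k}^{\mathrm{T}}\left({B}_{k}-{A}_{k}\right){X}_{k}$ and vectors ${X}_{k}^{\mathrm{T}}{a}_{k}$ have already been formed. The function names and numerical values are hypothetical, and the elimination routine is a generic stand-in rather than the solver used by the library.

```python
def pool_samples(per_sample):
    """Sum per-sample (info_k, score_k) pairs to obtain D^{-1} and d."""
    p = len(per_sample[0][1])
    d_inv = [[0.0] * p for _ in range(p)]
    d = [0.0] * p
    for info_k, score_k in per_sample:
        for l in range(p):
            d[l] += score_k[l]
            for m in range(p):
                d_inv[l][m] += info_k[l][m]
    return d_inv, d


def solve_linear(mat, rhs):
    """Solve mat @ beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(mat[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    beta = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(aug[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (aug[i][n] - partial) / aug[i][i]
    return beta


# Two samples, two parameters (illustrative values).
samples = [
    ([[2.0, 0.5], [0.5, 1.0]], [1.0, 0.5]),
    ([[1.0, 0.0], [0.0, 2.0]], [0.5, -0.5]),
]
d_inv, d = pool_samples(samples)
beta_hat = solve_linear(d_inv, d)  # beta_hat = D d
```

Solving ${D}^{-1}\stackrel{^}{\beta }=d$ avoids forming $D$ explicitly, although $D$ itself is still needed for the standard errors.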

## 4 References

Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks J. Roy. Statist. Soc. Ser. B 44 234–243

## 5 Arguments

1: $\mathbf{ns}$ Integer Input
On entry: the number of samples.
Constraint: ${\mathbf{ns}}\ge 1$.
2: $\mathbf{nv}\left({\mathbf{ns}}\right)$ Integer array Input
On entry: the number of observations in the $\mathit{i}$th sample, for $\mathit{i}=1,2,\dots ,{\mathbf{ns}}$.
Constraint: ${\mathbf{nv}}\left(\mathit{i}\right)\ge 1$, for $\mathit{i}=1,2,\dots ,{\mathbf{ns}}$.
3: $\mathbf{nsum}$ Integer Input
On entry: the total number of observations.
Constraint: ${\mathbf{nsum}}=\sum _{\mathit{i}=1}^{{\mathbf{ns}}}{\mathbf{nv}}\left(\mathit{i}\right)$.
4: $\mathbf{y}\left({\mathbf{nsum}}\right)$ Real (Kind=nag_wp) array Input
On entry: the observations in each sample. Specifically, ${\mathbf{y}}\left(\sum _{k=1}^{i-1}{\mathbf{nv}}\left(k\right)+j\right)$ must contain the $j$th observation in the $i$th sample.
5: $\mathbf{ip}$ Integer Input
On entry: the number of parameters to be fitted.
Constraint: ${\mathbf{ip}}\ge 1$.
6: $\mathbf{x}\left({\mathbf{ldx}},{\mathbf{ip}}\right)$ Real (Kind=nag_wp) array Input
On entry: the design matrices for each sample. Specifically, ${\mathbf{x}}\left(\sum _{k=1}^{i-1}{\mathbf{nv}}\left(k\right)+j,l\right)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.
Constraint: ${\mathbf{x}}$ must not contain a column with all elements equal.
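The stacking convention for y and the rows of x can be sketched as an index calculation. The helper names are hypothetical; the arithmetic follows the formula $\sum _{k=1}^{i-1}{\mathbf{nv}}\left(k\right)+j$ given above.

```python
def sample_offsets(nv):
    """0-based starting position of each sample within y (and the rows of x)."""
    offsets, start = [], 0
    for n_i in nv:
        offsets.append(start)
        start += n_i
    return offsets


def flat_index(nv, i, j):
    """1-based position in y of the j-th observation of the i-th sample."""
    if not (1 <= i <= len(nv) and 1 <= j <= nv[i - 1]):
        raise IndexError("sample or observation index out of range")
    return sum(nv[:i - 1]) + j


nv = [3, 5, 2]                     # three samples, nsum = 10
offsets = sample_offsets(nv)       # where each sample begins (0-based)
first_of_second = flat_index(nv, 2, 1)
last_of_third = flat_index(nv, 3, 2)
```

The same row index applies to x, so the $l$th explanatory variable for that observation is found in column $l$ of the corresponding row.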
7: $\mathbf{ldx}$ Integer Input
On entry: the first dimension of the array x as declared in the (sub)program from which g08raf is called.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{nsum}}$.
8: $\mathbf{idist}$ Integer Input
On entry: the error distribution to be used in the analysis.
${\mathbf{idist}}=1$
Normal.
${\mathbf{idist}}=2$
Logistic.
${\mathbf{idist}}=3$
Extreme value.
${\mathbf{idist}}=4$
Double-exponential.
Constraint: $1\le {\mathbf{idist}}\le 4$.
9: $\mathbf{nmax}$ Integer Input
On entry: the value of the largest sample size.
Constraint: ${\mathbf{nmax}}=\underset{1\le i\le {\mathbf{ns}}}{\mathrm{max}}\phantom{\rule{0.25em}{0ex}}\left({\mathbf{nv}}\left(i\right)\right)$ and ${\mathbf{nmax}}>{\mathbf{ip}}$.
10: $\mathbf{tol}$ Real (Kind=nag_wp) Input
On entry: the tolerance for judging whether two observations are tied. Thus, observations ${Y}_{i}$ and ${Y}_{j}$ are adjudged to be tied if $|{Y}_{i}-{Y}_{j}|<{\mathbf{tol}}$.
Constraint: ${\mathbf{tol}}>0.0$.
11: $\mathbf{prvr}\left({\mathbf{ldprvr}},{\mathbf{ip}}\right)$ Real (Kind=nag_wp) array Output
On exit: the variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1\le i\le j\le {\mathbf{ip}}$, ${\mathbf{prvr}}\left(i,j\right)$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1\le j\le i\le {\mathbf{ip}}$, ${\mathbf{prvr}}\left(i+1,j\right)$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
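The packed layout of prvr can be sketched as an unpacking step. This assumes that the lower-triangle block occupies rows $2$ to ${\mathbf{ip}}+1$, which is what the constraint ${\mathbf{ldprvr}}\ge {\mathbf{ip}}+1$ suggests, and that prvr is held as a list of rows indexed `[row][column]` from zero. The function name and values are illustrative.

```python
def unpack_prvr(prvr, ip):
    """Split prvr into full symmetric score and parameter covariance matrices."""
    score_cov = [[0.0] * ip for _ in range(ip)]
    param_cov = [[0.0] * ip for _ in range(ip)]
    for i in range(ip):
        # Upper triangle, Fortran prvr(i, j) with i <= j: score covariances.
        for j in range(i, ip):
            score_cov[i][j] = score_cov[j][i] = prvr[i][j]
        # Shifted lower triangle, Fortran prvr(i + 1, j) with j <= i:
        # parameter covariances.
        for j in range(i + 1):
            param_cov[i][j] = param_cov[j][i] = prvr[i + 1][j]
    return score_cov, param_cov


# ip = 2, so prvr has ip + 1 = 3 rows and 2 columns (illustrative values).
prvr = [[4.0, 1.0],
        [0.3, 2.0],
        [-0.1, 0.6]]
score_cov, param_cov = unpack_prvr(prvr, 2)
```

Shifting the lower triangle down by one row is what lets both symmetric matrices share a single array without overlapping on the diagonal.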
12: $\mathbf{ldprvr}$ Integer Input
On entry: the first dimension of the array prvr as declared in the (sub)program from which g08raf is called.
Constraint: ${\mathbf{ldprvr}}\ge {\mathbf{ip}}+1$.
13: $\mathbf{irank}\left({\mathbf{nmax}}\right)$ Integer array Output
On exit: for the one sample case, irank contains the ranks of the observations.
14: $\mathbf{zin}\left({\mathbf{nmax}}\right)$ Real (Kind=nag_wp) array Output
On exit: for the one sample case, zin contains the expected values of the function $g\left(.\right)$ of the order statistics.
15: $\mathbf{eta}\left({\mathbf{nmax}}\right)$ Real (Kind=nag_wp) array Output
On exit: for the one sample case, eta contains the expected values of the function $g\prime \left(.\right)$ of the order statistics.
16: $\mathbf{vapvec}\left({\mathbf{nmax}}×\left({\mathbf{nmax}}+1\right)/2\right)$ Real (Kind=nag_wp) array Output
On exit: for the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g\left(.\right)$ of the order statistics stored column-wise.
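Column-wise packed storage of an upper triangle places element $(i,j)$, $i\le j$, at position $i+j(j-1)/2$ when counting from one. The sketch below uses that standard mapping to rebuild the full symmetric matrix; the function names and sample values are illustrative.

```python
def packed_index(i, j):
    """1-based position of element (i, j) in a column-wise packed upper triangle."""
    if i > j:
        i, j = j, i  # the matrix is symmetric
    return i + j * (j - 1) // 2


def unpack_upper(vapvec, n):
    """Rebuild the full n-by-n symmetric matrix from its packed upper triangle."""
    full = [[0.0] * n for _ in range(n)]
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            value = vapvec[packed_index(i, j) - 1]
            full[i - 1][j - 1] = full[j - 1][i - 1] = value
    return full


# n = 3 gives 3 * 4 / 2 = 6 packed elements (illustrative values).
vapvec = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
full = unpack_upper(vapvec, 3)
```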
17: $\mathbf{parest}\left(4×{\mathbf{ip}}+1\right)$ Real (Kind=nag_wp) array Output
On exit: the statistics calculated by the routine.
The first ip components of parest contain the score statistics.
The next ip elements contain the parameter estimates.
${\mathbf{parest}}\left(2×{\mathbf{ip}}+1\right)$ contains the value of the ${\chi }^{2}$ statistic.
The next ip elements of parest contain the standard errors of the parameter estimates.
Finally, the remaining ip elements of parest contain the $z$-statistics.
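The layout of parest can be sketched as a set of slices. The function name and dictionary keys are hypothetical, and the indices are the zero-based equivalents of the positions described above.

```python
def split_parest(parest, ip):
    """Split the 4*ip + 1 elements of parest into named components."""
    if len(parest) != 4 * ip + 1:
        raise ValueError("parest must have 4*ip + 1 elements")
    return {
        "score": parest[0:ip],                     # parest(1 : ip)
        "estimates": parest[ip:2 * ip],            # parest(ip+1 : 2*ip)
        "chi_sq": parest[2 * ip],                  # parest(2*ip + 1)
        "std_err": parest[2 * ip + 1:3 * ip + 1],  # parest(2*ip+2 : 3*ip+1)
        "z": parest[3 * ip + 1:4 * ip + 1],        # parest(3*ip+2 : 4*ip+1)
    }


# ip = 2 gives nine elements; the values here are placeholders.
parts = split_parest([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 2)
```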
18: $\mathbf{work}\left(0\right)$ Real (Kind=nag_wp) array Output
19: $\mathbf{lwork}$ Integer Input
20: $\mathbf{iwa}\left(0\right)$ Integer array Output
These arguments are no longer required by g08raf but are retained for backwards compatibility.
21: $\mathbf{ifail}$ Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, $⟨\mathit{\text{value}}⟩$ elements of ${\mathbf{nv}}$ are $<1$.
Constraint: ${\mathbf{nv}}\left(i\right)\ge 1$.
On entry, ${\mathbf{ip}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ip}}\ge 1$.
On entry, ${\mathbf{ldprvr}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{ip}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ldprvr}}\ge {\mathbf{ip}}+1$.
On entry, ${\mathbf{ldx}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{nsum}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{nsum}}$.
On entry, ${\mathrm{max}}_{i}{\mathbf{nv}}\left(i\right)=⟨\mathit{\text{value}}⟩$ and ${\mathbf{nmax}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathrm{max}}_{i}{\mathbf{nv}}\left(i\right)={\mathbf{nmax}}$.
On entry, ${\mathbf{nmax}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{ip}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nmax}}>{\mathbf{ip}}$.
On entry, ${\mathbf{ns}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ns}}\ge 1$.
On entry, ${\mathbf{tol}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{tol}}>0.0$.
On entry, ${\sum }_{i}{\mathbf{nv}}\left(i\right)=⟨\mathit{\text{value}}⟩$ and ${\mathbf{nsum}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\sum }_{i}{\mathbf{nv}}\left(i\right)={\mathbf{nsum}}$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{idist}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{idist}}=1$, $2$, $3$ or $4$.
${\mathbf{ifail}}=3$
On entry, all the observations were adjudged to be tied. You are advised to check the value supplied for tol.
${\mathbf{ifail}}=4$
The matrix ${X}^{\mathrm{T}}\left(B-A\right)X$ is either ill-conditioned or not positive definite. This error should only occur with extreme rankings of the data.
${\mathbf{ifail}}=5$
On entry, for $j=⟨\mathit{\text{value}}⟩$, ${\mathbf{x}}\left(i,j\right)=⟨\mathit{\text{value}}⟩$ for all $i$.
Constraint: ${\mathbf{x}}\left(i,j\right)\ne {\mathbf{x}}\left(i+1,j\right)$ for at least one $i$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
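The entry constraints behind ifail = 1, 2 and 5 can be mirrored on the calling side before the routine is invoked. The sketch below is a hypothetical pre-check, not the library's own validation code: the function name is invented, x is assumed to be a list of rows, and the order in which g08raf itself tests the constraints is not stated in this document.

```python
def check_inputs(nv, nsum, ip, x, ldx, idist, nmax, tol, ldprvr):
    """Return the ifail value (0, 1, 2 or 5) that the documented constraints imply."""
    ns = len(nv)
    basic_ok = (
        ns >= 1
        and all(n_i >= 1 for n_i in nv)
        and nsum == sum(nv)
        and ip >= 1
        and ldx >= nsum
        and nmax == max(nv)
        and nmax > ip
        and tol > 0.0
        and ldprvr >= ip + 1
    )
    if not basic_ok:
        return 1
    if idist not in (1, 2, 3, 4):
        return 2
    # A column of x with all elements equal is not permitted.
    for l in range(ip):
        column = [x[i][l] for i in range(nsum)]
        if all(value == column[0] for value in column):
            return 5
    return 0


x_ok = [[1.0, 2.0], [2.0, 2.0], [3.0, 5.0]]
x_constant = [[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
status_ok = check_inputs([3], 3, 2, x_ok, 3, 2, 3, 1.0e-5, 3)
status_bad_dist = check_inputs([3], 3, 2, x_ok, 3, 7, 3, 1.0e-5, 3)
status_bad_tol = check_inputs([3], 3, 2, x_ok, 3, 2, 3, 0.0, 3)
status_constant = check_inputs([3], 3, 2, x_constant, 3, 2, 3, 1.0e-5, 3)
```

The conditions signalled by ifail = 3 and ifail = 4 depend on the ranking and on the computed matrix, so they cannot be detected by a check of this kind.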

## 7 Accuracy

The computations are believed to be stable.

## 8 Parallelism and Performance

g08raf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g08raf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

## 9 Further Comments

The time taken by g08raf depends on the number of samples, the total number of observations and the number of parameters fitted.
In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See Pettitt (1982) for further details.

## 10 Example

A program to fit a regression model to a single sample of $20$ observations using two explanatory variables. The error distribution will be taken to be logistic.

### 10.1 Program Text

Program Text (g08rafe.f90)

### 10.2 Program Data

Program Data (g08rafe.d)

### 10.3 Program Results

Program Results (g08rafe.r)