Integer, Intent (In)	::	n, mx, ldx, isx(mx), ip, my, ldy, iscale, maxfac, maxit, ldxres, ldyres, ldw, ldp, ldt, ldc, ldu, ldycv
Integer, Intent (Inout)	::	ifail
Real (Kind=nag_wp), Intent (In)	::	x(ldx,mx), y(ldy,my), tau
Real (Kind=nag_wp), Intent (Inout)	::	xstd(ip), ystd(my), xres(ldxres,ip), yres(ldyres,my), w(ldw,maxfac), p(ldp,maxfac), t(ldt,maxfac), c(ldc,maxfac), u(ldu,maxfac), ycv(ldycv,my)
Real (Kind=nag_wp), Intent (Out)	::	xbar(ip), ybar(my), xcv(maxfac)

C Header Interface

#include <nag.h>

void g02lbf_ (const Integer *n, const Integer *mx, const double x[], const Integer *ldx, const Integer isx[], const Integer *ip, const Integer *my, const double y[], const Integer *ldy, double xbar[], double ybar[], const Integer *iscale, double xstd[], double ystd[], const Integer *maxfac, const Integer *maxit, const double *tau, double xres[], const Integer *ldxres, double yres[], const Integer *ldyres, double w[], const Integer *ldw, double p[], const Integer *ldp, double t[], const Integer *ldt, double c[], const Integer *ldc, double u[], const Integer *ldu, double xcv[], double ycv[], const Integer *ldycv, Integer *ifail)

The routine may be called by the names g02lbf or nagf_correg_pls_wold.

3 Description

Let $X_{1}$ be the mean-centred $n$ by $m$ data matrix $X$ of $n$ observations on $m$ predictor variables. Let $Y_{1}$ be the mean-centred $n$ by $r$ data matrix $Y$ of $n$ observations on $r$ response variables.

The first of the $k$ factors PLS methods extract from the data predicts both $X_{1}$ and $Y_{1}$ by regressing on a $t_{1}$ column vector of $n$ scores:

$$
\begin{matrix} \hat{X}_{1} = t_{1} p_{1}^{T} \\ \hat{Y}_{1} = t_{1} c_{1}^{T}, \quad \text{with } t_{1}^{T} t_{1} = 1, \end{matrix}
$$

where the column vectors of $m$ $x$-loadings $p_{1}$ and $r$ $y$-loadings $c_{1}$ are calculated in the least squares sense:

$$
\begin{matrix} p_{1}^{T} = t_{1}^{T} X_{1} \\ c_{1}^{T} = t_{1}^{T} Y_{1} . \end{matrix}
$$

The $x$-score vector $t_{1} = X_{1} w_{1}$ is the linear combination of predictor data $X_{1}$ that has maximum covariance with the $y$-scores $u_{1} = Y_{1} c_{1}$, where the $x$-weights vector $w_{1}$ is the normalised first left singular vector of $X_{1}^{T} Y_{1}$.

The method extracts subsequent PLS factors by repeating the above process with the residual matrices:

$$
\begin{matrix} X_{i} = X_{i - 1} - \hat{X}_{i - 1} \\ Y_{i} = Y_{i - 1} - \hat{Y}_{i - 1}, \quad i = 2, 3, \dots, k, \end{matrix}
$$

and with orthogonal scores:

$$
t_{i}^{T} t_{j} = 0, \quad j = 1, 2, \dots, i - 1 .
$$
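A short derivation, using only the definitions above, of why working with the residual matrices gives orthogonal scores. It is offered as a reading aid and is not quoted from the routine's documentation; it assumes the $i$th score vector has the form $t_{i} = X_{i} w_{i}$, by analogy with $t_{1} = X_{1} w_{1}$.

```latex
X_{2}^{T} t_{1}
  = \left( X_{1} - t_{1} p_{1}^{T} \right)^{T} t_{1}
  = X_{1}^{T} t_{1} - p_{1} \left( t_{1}^{T} t_{1} \right)
  = p_{1} - p_{1}
  = 0,
\qquad\text{so}\qquad
t_{1}^{T} t_{2} = t_{1}^{T} X_{2} w_{2} = \left( X_{2}^{T} t_{1} \right)^{T} w_{2} = 0 .
```

The same argument, applied inductively to $X_{i} = X_{i - 1} - t_{i - 1} p_{i - 1}^{T}$, gives $t_{i}^{T} t_{j} = 0$ for every $j < i$.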

Optionally, in addition to being mean-centred, the data matrices $X_{1}$ and $Y_{1}$ may be scaled by standard deviations of the variables. If data are supplied mean-centred, the calculations are not affected within numerical accuracy.
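The extraction of the first factor described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the routine's implementation: the helper names (`centre`, `matvec`, `tmatvec`, `unit`, `first_factor`) are hypothetical, the dominant left singular vector of $X_{1}^{T} Y_{1}$ is found by a simple power iteration started from $X_{1}^{T}$ times the first column of $Y_{1}$, and the scores are rescaled to unit length as one reading of the condition $t_{1}^{T} t_{1} = 1$.

```python
import math

def centre(rows):
    """Subtract each column mean from a list-of-rows matrix."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    return [[r[j] - means[j] for j in range(len(r))] for r in rows]

def matvec(a, v):
    """Return A v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def tmatvec(a, v):
    """Return A^T v."""
    return [sum(a[i][j] * v[i] for i in range(len(a))) for j in range(len(a[0]))]

def unit(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def first_factor(x1, y1, maxit=200, tau=1.0e-4):
    """Sketch of one PLS factor from mean-centred x1 (n by m) and y1 (n by r)."""
    # Assumed starting point: X1^T times the first column of Y1.
    w = unit(tmatvec(x1, [row[0] for row in y1]))
    for _ in range(maxit):
        s = tmatvec(y1, matvec(x1, w))            # Y1^T X1 w
        w_new = unit(tmatvec(x1, matvec(y1, s)))  # X1^T Y1 s, normalised
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_new, w)))
        w = w_new
        if step <= tau:                           # successive estimates agree
            break
    t = unit(matvec(x1, w))   # x-scores, rescaled so that t^T t = 1
    p = tmatvec(x1, t)        # x-loadings: p^T = t^T X1
    c = tmatvec(y1, t)        # y-loadings: c^T = t^T Y1
    u = matvec(y1, c)         # y-scores:   u = Y1 c
    # Residual matrices used to extract the next factor.
    x2 = [[x1[i][j] - t[i] * p[j] for j in range(len(p))] for i in range(len(t))]
    y2 = [[y1[i][j] - t[i] * c[j] for j in range(len(c))] for i in range(len(t))]
    return w, t, p, c, u, x2, y2

# Made-up data: n = 5 observations, m = 3 predictors, r = 2 responses.
x = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0], [3.0, 5.0, 1.0], [4.0, 3.0, 2.0], [5.0, 4.0, 6.0]]
y = [[1.0, 0.0], [2.0, 1.0], [4.0, 3.0], [3.0, 5.0], [6.0, 4.0]]
w, t, p, c, u, x2, y2 = first_factor(centre(x), centre(y))
```

Repeating `first_factor` on `x2` and `y2` would give the second factor; the residuals are orthogonal to `t` by construction.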

4 References

Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Multivariate Analysis (ed P R Krishnaiah), 391–420. Academic Press, NY

5 Arguments

1: $n$ – Integer Input

On entry: $n$, the number of observations.

Constraint: $n > 1$.

2: $mx$ – Integer Input

On entry: the number of predictor variables.

Constraint: $mx > 1$.

3: $x (ldx, mx)$ – Real (Kind=nag_wp) array Input

On entry: $x (i, j)$ must contain the $i$th observation on the $j$th predictor variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, mx$.

4: $ldx$ – Integer Input

On entry: the first dimension of the array x as declared in the (sub)program from which g02lbf is called.

Constraint: $ldx \geq n$.
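When x is passed as a flat array (as in the C header interface above), the leading dimension ldx determines where element $x (i, j)$ lives. The sketch below assumes Fortran column-major storage; the function names `fortran_index` and `pack_column_major` are hypothetical and only illustrate the layout.

```python
def fortran_index(i, j, ld):
    """Zero-based offset of the 1-based element (i, j) in column-major storage."""
    return (j - 1) * ld + (i - 1)

def pack_column_major(rows, ld):
    """Pack a list-of-rows matrix into a flat column-major list with leading dimension ld."""
    n = len(rows)
    ncols = len(rows[0])
    if ld < n:
        raise ValueError("leading dimension must be at least the number of rows")
    flat = [0.0] * (ld * ncols)   # rows n+1..ld of each column are padding
    for j in range(1, ncols + 1):
        for i in range(1, n + 1):
            flat[fortran_index(i, j, ld)] = rows[i - 1][j - 1]
    return flat

rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # n = 3 observations, mx = 2 predictors
x = pack_column_major(rows, ld=4)             # ldx = 4 >= n
```

With ldx larger than n, each column is followed by unused padding elements, which is why the constraint is $ldx \geq n$ rather than equality.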

5: $isx (mx)$ – Integer array Input

On entry: indicates which predictor variables are to be included in the model.

$isx (j) = 1$: The $j$th predictor variable (with variates in the $j$th column of $X$) is included in the model.
$isx (j) = 0$: Otherwise.

Constraint: the sum of elements in isx must equal ip.

6: $ip$ – Integer Input

On entry: $m$, the number of predictor variables in the model.

Constraint: $1 < ip \leq mx$.
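The relationship between isx and ip can be illustrated with a small sketch. The function `select_predictors` is hypothetical; it only mirrors the stated constraints (each flag is 0 or 1, and the flags sum to ip) and picks out the flagged columns.

```python
def select_predictors(x_rows, isx, ip):
    """Return the columns of x_rows flagged by isx, after checking the constraints."""
    if any(flag not in (0, 1) for flag in isx):
        raise ValueError("isx(j) must be 0 or 1")
    if sum(isx) != ip:
        raise ValueError("the sum of elements in isx must equal ip")
    keep = [j for j, flag in enumerate(isx) if flag == 1]
    return [[row[j] for j in keep] for row in x_rows]

# mx = 3 predictor variables supplied, ip = 2 of them included in the model.
x_rows = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0]]
model_x = select_predictors(x_rows, isx=[1, 0, 1], ip=2)
```

Here `model_x` holds the first and third columns only, so arrays dimensioned by ip (such as xbar and xstd) would have two elements.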

7: $my$ – Integer Input

On entry: $r$, the number of response variables.

Constraint: $my \geq 1$.

8: $y (ldy, my)$ – Real (Kind=nag_wp) array Input

On entry: $y (i, j)$ must contain the $i$th observation for the $j$th response variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, my$.

9: $ldy$ – Integer Input

On entry: the first dimension of the array y as declared in the (sub)program from which g02lbf is called.

Constraint: $ldy \geq n$.

10: $xbar (ip)$ – Real (Kind=nag_wp) array Output

On exit: mean values of predictor variables in the model.

11: $ybar (my)$ – Real (Kind=nag_wp) array Output

On exit: the mean value of each response variable.

12: $iscale$ – Integer Input

On entry: indicates how predictor variables are scaled.

$iscale = 1$: Data are scaled by the standard deviation of variables.
$iscale = 2$: Data are scaled by user-supplied scalings.
$iscale = - 1$: No scaling.

Constraint: $iscale = -1$, $1$ or $2$.

13: $xstd (ip)$ – Real (Kind=nag_wp) array Input/Output

On entry: if $iscale = 2$, $xstd (j)$ must contain the user-supplied scaling for the $j$th predictor variable in the model, for $j = 1, 2, \dots, ip$. Otherwise xstd need not be set.

On exit: if $iscale = 1$, standard deviations of predictor variables in the model. Otherwise xstd is not changed.

14: $ystd (my)$ – Real (Kind=nag_wp) array Input/Output

On entry: if $iscale = 2$, $ystd (j)$ must contain the user-supplied scaling for the $j$th response variable in the model, for $j = 1, 2, \dots, my$. Otherwise ystd need not be set.

On exit: if $iscale = 1$, the standard deviation of each response variable. Otherwise ystd is not changed.
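The three iscale options can be sketched as follows. This is an illustration only: the function `centre_and_scale` is hypothetical, and the use of the sample standard deviation with divisor $n - 1$ for $iscale = 1$ is an assumption, since the divisor is not stated here.

```python
import math

def centre_and_scale(rows, iscale, std=None):
    """Mean-centre the columns of rows, then scale them according to iscale."""
    n = len(rows)
    ncols = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(ncols)]
    centred = [[row[j] - means[j] for j in range(ncols)] for row in rows]
    if iscale == 1:
        # Assumption: sample standard deviation with divisor n - 1.
        divisors = [math.sqrt(sum(row[j] ** 2 for row in centred) / (n - 1))
                    for j in range(ncols)]
    elif iscale == 2:
        if std is None:
            raise ValueError("user-supplied scalings are required when iscale = 2")
        divisors = list(std)
    elif iscale == -1:
        divisors = [1.0] * ncols   # no scaling; divisors of 1 are a device of this sketch
    else:
        raise ValueError("iscale must be -1, 1 or 2")
    scaled = [[row[j] / divisors[j] for j in range(ncols)] for row in centred]
    return scaled, means, divisors

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled, means, divisors = centre_and_scale(data, iscale=1)
```

After scaling with $iscale = 1$, the two columns of `scaled` coincide even though the original columns differ by a factor of ten, which is the point of standardising before extracting factors.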

15: $maxfac$ – Integer Input

On entry: $k$, the number of latent variables to calculate.

Constraint: $1 \leq maxfac \leq ip$.

16: $maxit$ – Integer Input

On entry: if $my = 1$, maxit is not referenced; otherwise the maximum number of iterations used to calculate the $x$-weights.

Suggested value: $maxit = 200$.

Constraint: if $my > 1$, $maxit > 1$.

17: $tau$ – Real (Kind=nag_wp) Input

On entry: if $my = 1$, tau is not referenced; otherwise the iterative procedure used to calculate the $x$-weights will halt if the Euclidean distance between two subsequent estimates is less than or equal to tau.

Suggested value: $tau = 1.0E-4$.

Constraint: if $my > 1$, $tau > 0.0$.

18: $xres (ldxres, ip)$ – Real (Kind=nag_wp) array Output

On exit: the predictor variables' residual matrix $X_{k}$.

19: $ldxres$ – Integer Input

On entry: the first dimension of the array xres as declared in the (sub)program from which g02lbf is called.

Constraint: $ldxres \geq n$.

20: $yres (ldyres, my)$ – Real (Kind=nag_wp) array Output

On exit: the residuals for each response variable, $Y_{k}$.

21: $ldyres$ – Integer Input

On entry: the first dimension of the array yres as declared in the (sub)program from which g02lbf is called.

Constraint: $ldyres \geq n$.

22: $w (ldw, maxfac)$ – Real (Kind=nag_wp) array Output

On exit: the $j$th column of $W$ contains the $x$-weights $w_{j}$, for $j = 1, 2, \dots, maxfac$.

23: $ldw$ – Integer Input

On entry: the first dimension of the array w as declared in the (sub)program from which g02lbf is called.

Constraint: $ldw \geq ip$.

24: $p (ldp, maxfac)$ – Real (Kind=nag_wp) array Output

On exit: the $j$th column of $P$ contains the $x$-loadings $p_{j}$, for $j = 1, 2, \dots, maxfac$.

25: $ldp$ – Integer Input

On entry: the first dimension of the array p as declared in the (sub)program from which g02lbf is called.

Constraint: $ldp \geq ip$.

26: $t (ldt, maxfac)$ – Real (Kind=nag_wp) array Output

On exit: the $j$th column of $T$ contains the $x$-scores $t_{j}$, for $j = 1, 2, \dots, maxfac$.

27: $ldt$ – Integer Input

On entry: the first dimension of the array t as declared in the (sub)program from which g02lbf is called.

Constraint: $ldt \geq n$.

28: $c (ldc, maxfac)$ – Real (Kind=nag_wp) array Output

On exit: the $j$th column of $C$ contains the $y$-loadings $c_{j}$, for $j = 1, 2, \dots, maxfac$.

29: $ldc$ – Integer Input

On entry: the first dimension of the array c as declared in the (sub)program from which g02lbf is called.

Constraint: $ldc \geq my$.

30: $u (ldu, maxfac)$ – Real (Kind=nag_wp) array Output

On exit: the $j$th column of $U$ contains the $y$-scores $u_{j}$, for $j = 1, 2, \dots, maxfac$.

31: $ldu$ – Integer Input

On entry: the first dimension of the array u as declared in the (sub)program from which g02lbf is called.

Constraint: $ldu \geq n$.

32: $xcv (maxfac)$ – Real (Kind=nag_wp) array Output

On exit: $xcv (j)$ contains the cumulative percentage of variance in the predictor variables explained by the first $j$ factors, for $j = 1, 2, \dots, maxfac$.
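One plausible way to read "cumulative percentage of variance explained" is as the share of the total sum of squares of $X_{1}$ that is no longer present in the residual matrix after each factor. The sketch below follows that reading; it is an assumption about the definition, not a statement of how the routine computes xcv, and the function names and numbers are made up for illustration.

```python
def sum_of_squares(rows):
    """Sum of squared elements of a list-of-rows matrix."""
    return sum(v * v for row in rows for v in row)

def cumulative_percent_explained(total_ss, residual_ss_after_each_factor):
    """Percentage of total_ss removed from the residual after 1, 2, ... factors."""
    return [100.0 * (total_ss - rss) / total_ss
            for rss in residual_ss_after_each_factor]

# Hypothetical sums of squares: X_1 has 200.0; the residuals after
# one, two and three factors have 50.0, 20.0 and 5.0 respectively.
percentages = cumulative_percent_explained(200.0, [50.0, 20.0, 5.0])
```

Because each factor can only reduce the residual sum of squares, the values are non-decreasing in $j$ and bounded by 100.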

33: $ycv (ldycv, my)$ – Real (Kind=nag_wp) array Output

On exit: $ycv (i, j)$ is the cumulative percentage of variance of the $j$th response variable explained by the first $i$ factors, for $i = 1, 2, \dots, maxfac$ and $j = 1, 2, \dots, my$.

34: $ldycv$ – Integer Input

On entry: the first dimension of the array ycv as declared in the (sub)program from which g02lbf is called.

Constraint: $ldycv \geq maxfac$.

35: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).
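The three entry values of ifail described above amount to a small decision table. The sketch below restates it in code; the function name `on_error_behaviour` is hypothetical and is not part of the library.

```python
def on_error_behaviour(ifail_entry):
    """Return (prints_message, halts) for an ifail value supplied on entry."""
    table = {
        0: (True, True),     # message printed, execution halted
        -1: (True, False),   # message printed, execution continues
        1: (False, False),   # no message, execution continues
    }
    if ifail_entry not in table:
        raise ValueError("ifail must be 0, -1 or 1 on entry")
    return table[ifail_entry]

prints, halts = on_error_behaviour(-1)
```

Whenever `halts` is false, the caller is responsible for testing ifail on exit.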

6 Error Indicators and Warnings

If on entry $ifail = 0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).

Errors or warnings detected by the routine:

$ifail = 1$: On entry, $iscale = 〈value〉$.
Constraint: $iscale = -1$, $1$ or $2$.

On entry, $isx (〈value〉)$ is invalid.
Constraint: $isx (j) = 0$ or $1$, for all $j$.

On entry, $mx = 〈value〉$.
Constraint: $mx > 1$.

On entry, $my = 〈value〉$.
Constraint: $my \geq 1$.

On entry, $n = 〈value〉$.
Constraint: $n > 1$.

$ifail = 2$: On entry, $ip = 〈value〉$ and $mx = 〈value〉$.
Constraint: $1 < ip \leq mx$.

On entry, $ldc = 〈value〉$ and $my = 〈value〉$.
Constraint: $ldc \geq my$.

On entry, $ldp = 〈value〉$ and $ip = 〈value〉$.
Constraint: $ldp \geq ip$.

On entry, $ldt = 〈value〉$ and $n = 〈value〉$.
Constraint: $ldt \geq n$.

On entry, $ldu = 〈value〉$ and $n = 〈value〉$.
Constraint: $ldu \geq n$.

On entry, $ldw = 〈value〉$ and $ip = 〈value〉$.
Constraint: $ldw \geq ip$.

On entry, $ldx = 〈value〉$ and $n = 〈value〉$.
Constraint: $ldx \geq n$.

On entry, $ldxres = 〈value〉$ and $n = 〈value〉$.
Constraint: $ldxres \geq n$.

On entry, $ldy = 〈value〉$ and $n = 〈value〉$.
Constraint: $ldy \geq n$.

On entry, $ldycv = 〈value〉$ and $maxfac = 〈value〉$.
Constraint: $ldycv \geq maxfac$.

On entry, $ldyres = 〈value〉$ and $n = 〈value〉$.
Constraint: $ldyres \geq n$.

On entry, $maxfac = 〈value〉$ and $ip = 〈value〉$.
Constraint: $1 \leq maxfac \leq ip$.

On entry, $my = 〈value〉$ and $maxit = 〈value〉$.
Constraint: if $my > 1$, $maxit > 1$.

On entry, $tau = 〈value〉$.
Constraint: if $my > 1$, $tau > 0.0$.

$ifail = 3$: On entry, $ip = 〈value〉$ and $sum (isx) = 〈value〉$ .
Constraint: the sum of elements in isx must equal ip.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.

$ifail = - 999$: Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
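A caller may wish to check arguments before invoking the routine. The sketch below groups the scalar constraints listed above by the ifail value they are reported under; the function `check_arguments` is hypothetical, the order of the checks is an assumption, the allowed iscale values are taken from Section 5, and the leading-dimension checks are omitted for brevity.

```python
def check_arguments(n, mx, isx, ip, my, iscale, maxfac, maxit, tau):
    """Return 0 if the scalar constraints hold, else the matching ifail value."""
    # Constraints reported as ifail = 1.
    if iscale not in (-1, 1, 2):
        return 1
    if any(flag not in (0, 1) for flag in isx):
        return 1
    if mx <= 1 or my < 1 or n <= 1:
        return 1
    # Constraints reported as ifail = 2 (leading dimensions not checked here).
    if not (1 < ip <= mx):
        return 2
    if not (1 <= maxfac <= ip):
        return 2
    if my > 1 and (maxit <= 1 or tau <= 0.0):
        return 2
    # Constraint reported as ifail = 3.
    if sum(isx) != ip:
        return 3
    return 0

status = check_arguments(n=10, mx=3, isx=[1, 0, 1], ip=2, my=1,
                         iscale=1, maxfac=2, maxit=200, tau=1.0e-4)
```

Note that maxit and tau are only checked when $my > 1$, matching the statement that they are not referenced for a single response variable.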

7 Accuracy

In general, the iterative method used in the calculations is less accurate (but faster) than the singular value decomposition approach adopted by g02laf.

8 Parallelism and Performance

g02lbf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

g02lbf allocates internally ($n + r$) elements of real storage.

10 Example

This example reads in data from an experiment to measure the biological activity in a chemical compound, and a PLS model is estimated.


NAG FL Interface g02lbf (pls_wold)
