naginterfaces.library.rand.kfold_xyw

naginterfaces.library.rand.kfold_xyw(k, fold, x, statecomm, sordx=1, y=None, w=None)

kfold_xyw generates training and validation datasets suitable for use in cross-validation or jack-knifing.

For full information, please refer to the NAG Library document for g05pv:

https://support.nag.com/numeric/nl/nagdoc_31.1/flhtml/g05/g05pvf.html

Parameters

k : int

$K$, the number of folds.

fold : int

The number of the fold to return as the validation dataset.

On the first call to kfold_xyw, $\mathrm{fold}$ should be set to $1$ and then incremented by one at each subsequent call until all $K$ sets of training and validation datasets have been produced.

See Further Comments for more details on how a different calling sequence can be used.

x : float, ndarray, shape $(:, :)$, modified in place

Note: the required extent for this argument in dimension 1 is determined as follows: if $\mathrm{sordx} = 2$: $m$; otherwise: $n$.

Note: the required extent for this argument in dimension 2 is determined as follows: if $\mathrm{sordx} = 1$: $m$; if $\mathrm{sordx} = 2$: $n$; otherwise: $0$.

The way the data is stored in $x$ is defined by $\mathrm{sordx}$.

If $\mathrm{sordx} = 1$, $x[i - 1, j - 1]$ contains the $i$th observation for the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

If $\mathrm{sordx} = 2$, $x[j - 1, i - 1]$ contains the $i$th observation for the $j$th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

On entry: if $\mathrm{fold} = 1$, $x$ must hold $X_{o}$, the values of $X$ for the original dataset; otherwise, $x$ must not be changed since the last call to kfold_xyw.

On exit: values of $X$ for the training and validation datasets, with $X_{t}$ held in observations $1$ to $\mathrm{nt}$ and $X_{v}$ in observations $\mathrm{nt} + 1$ to $n$.

statecomm : dict, RNG communication object, modified in place

RNG communication structure.

This argument must have been initialized by a prior call to init_repeat() or init_nonrepeat().

sordx : int, optional

Determines how variables are stored in $x$.

y : None or float, ndarray, shape $(:)$, optional, modified in place

Note: the required length for this argument is determined as follows: if $y$ is not None: $n$; otherwise: $0$.

If the original dataset does not include $y_{o}$ then $y$ must be set to None.

Optionally, on entry: $y_{o}$, the values of $y$ for the original dataset. If $\mathrm{fold} \neq 1$, $y$ must hold the vector returned in $sy$ by the last call to kfold_xyw.

On exit, if not None on entry: values of $y$ for the training and validation datasets, with $y_{t}$ held in elements $1$ to $\mathrm{nt}$ and $y_{v}$ in elements $\mathrm{nt} + 1$ to $n$.

w : None or float, ndarray, shape $(:)$, optional, modified in place

Note: the required length for this argument is determined as follows: if $w$ is not None: $n$; otherwise: $0$.

Optionally, on entry: if $\mathrm{fold} \neq 1$, $w$ must hold the vector returned in $sw$ by the last call to kfold_xyw.

On exit, if not None on entry: values of $w$ for the training and validation datasets, with $w_{t}$ held in elements $1$ to $\mathrm{nt}$ and $w_{v}$ in elements $\mathrm{nt} + 1$ to $n$.

Returns

nt : int

$n_{t}$, the number of observations in the training dataset.
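The returned value marks the boundary between the training and validation observations in the reordered data. The following is a minimal sketch of how that boundary might be applied, assuming plain Python lists of lists in place of a NumPy array; `split_x` is a hypothetical helper name and is not part of naginterfaces.

```python
def split_x(x, nt, sordx=1):
    """Split reordered data into (training, validation) parts.

    Hypothetical helper, not part of naginterfaces. `x` is a list of lists
    laid out as described for the x argument: observations are rows when
    sordx = 1 and columns when sordx = 2. The first nt observations are
    treated as training data and the remainder as validation data.
    """
    if sordx == 1:
        # Rows are observations: slice along the first dimension.
        return x[:nt], x[nt:]
    if sordx == 2:
        # Columns are observations: slice every row along the second dimension.
        return [row[:nt] for row in x], [row[nt:] for row in x]
    raise ValueError("sordx must be 1 or 2")


# Three observations on two variables, with nt = 2.
by_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # sordx = 1 layout
by_cols = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]     # sordx = 2 layout

train_r, valid_r = split_x(by_rows, 2, sordx=1)
train_c, valid_c = split_x(by_cols, 2, sordx=2)
```

With a NumPy array the same split would usually be written as `x[:nt, :]` and `x[nt:, :]` for sordx = 1, or `x[:, :nt]` and `x[:, nt:]` for sordx = 2.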

Raises

NagValueError

(errno $11$)

On entry, $k = \langle\mathit{value}\rangle$ and $n = \langle\mathit{value}\rangle$.

Constraint: $2 \leq k \leq n$.

(errno $21$)

On entry, $\mathrm{fold} = \langle\mathit{value}\rangle$ and $k = \langle\mathit{value}\rangle$.

Constraint: $1 \leq \mathrm{fold} \leq k$.

(errno $31$)

On entry, $n = \langle\mathit{value}\rangle$.

Constraint: $n \geq 1$.

(errno $41$)

On entry, $m = \langle\mathit{value}\rangle$.

Constraint: $m \geq 1$.

(errno $51$)

On entry, $\mathrm{sordx} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{sordx} = 1$ or $2$.

(errno $71$)

On entry, $\mathrm{ldx} = \langle\mathit{value}\rangle$ and $n = \langle\mathit{value}\rangle$.

Constraint: if $\mathrm{sordx} = 1$, $\mathrm{ldx} \geq n$.

(errno $72$)

On entry, $\mathrm{ldx} = \langle\mathit{value}\rangle$ and $m = \langle\mathit{value}\rangle$.

Constraint: if $\mathrm{sordx} = 2$, $\mathrm{ldx} \geq m$.

(errno $131$)

On entry, $\mathrm{statecomm}$['state'] vector has been corrupted or not initialized.

Warns

NagAlgorithmicWarning

(errno $61$): More than $50\%$ of the data did not move when the data was shuffled. $\langle\mathit{value}\rangle$ of the $\langle\mathit{value}\rangle$ observations stayed put.

Notes

Let $X_{o}$ denote a matrix of $n$ observations on $m$ variables and $y_{o}$ and $w_{o}$ each denote a vector of length $n$. For example, $X_{o}$ might represent a matrix of independent variables, $y_{o}$ the dependent variable and $w_{o}$ the associated weights in a weighted regression.

kfold_xyw generates a series of training datasets, denoted by the matrix, vector, vector triplet $(X_{t}, y_{t}, w_{t})$ of $n_{t}$ observations, and validation datasets, denoted $(X_{v}, y_{v}, w_{v})$ with $n_{v}$ observations. These training and validation datasets are generated as follows.

Each of the original $n$ observations is randomly assigned to one of $K$ equally sized groups or folds. For the $k$th sample the validation dataset consists of those observations in group $k$ and the training dataset consists of all those observations not in group $k$. Therefore, at most $K$ samples can be generated.

If $n$ is not divisible by $K$ then the observations are assigned to groups as evenly as possible, so any group will be at most one observation larger or smaller than any other group.
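The grouping described above can be illustrated with a short pure-Python sketch. It is not the NAG algorithm: it uses Python's `random` module rather than the NAG generators, and the names `assign_folds` and `fold_sizes` are hypothetical. Which groups receive the extra observation when $n$ is not divisible by $K$ is an assumption made here for illustration only.

```python
import random


def assign_folds(n, k, seed=0):
    """Randomly assign observations 0..n-1 to groups 1..k, as evenly as possible.

    Hypothetical helper, not the NAG algorithm. Returns a list `labels`
    where labels[i] is the group of observation i.
    """
    if not 2 <= k <= n:
        raise ValueError("require 2 <= k <= n")
    # Dealing labels round-robin makes the group sizes differ by at most one.
    labels = [(i % k) + 1 for i in range(n)]
    # Shuffling the labels makes the assignment random without changing sizes.
    random.Random(seed).shuffle(labels)
    return labels


def fold_sizes(labels, k):
    """Number of observations in each of the groups 1..k."""
    return [labels.count(g) for g in range(1, k + 1)]


n, k = 10, 3
labels = assign_folds(n, k)
sizes = fold_sizes(labels, k)

# When group g is used for validation, the training set has n - sizes[g - 1]
# observations.
training_counts = [n - s for s in sizes]
```

Because the shuffle only permutes the labels, the group sizes here do not depend on the seed.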

When using $K = n$ the resulting datasets are suitable for leave-one-out cross-validation, or the training dataset on its own for jack-knifing. When using $K \neq n$ the resulting datasets are suitable for $K$-fold cross-validation. Datasets suitable for reversed cross-validation can be obtained by switching the training and validation datasets, i.e., use the $k$th group as the training dataset and the rest of the data as the validation dataset.
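As a rough illustration of these variants, the sketch below selects training and validation indices from a list of group labels. The helper name `split_indices` and its `reverse` flag are hypothetical and are not part of naginterfaces; the labels are written out by hand rather than generated randomly.

```python
def split_indices(labels, group, reverse=False):
    """Return (training, validation) index lists for one group.

    Hypothetical helper. labels[i] is the group of observation i. By
    default the chosen group is the validation set; with reverse=True the
    roles are switched, as in reversed cross-validation.
    """
    in_group = [i for i, g in enumerate(labels) if g == group]
    others = [i for i, g in enumerate(labels) if g != group]
    return (in_group, others) if reverse else (others, in_group)


# K = n: every observation is its own group (leave-one-out).
loo_labels = [1, 2, 3, 4, 5]
loo_train, loo_valid = split_indices(loo_labels, 3)

# Reversed cross-validation: train on the group, validate on the rest.
rev_train, rev_valid = split_indices(loo_labels, 3, reverse=True)
```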

One of the initialization functions init_repeat() (for a repeatable sequence if computed sequentially) or init_nonrepeat() (for a non-repeatable sequence) must be called prior to the first call to kfold_xyw.
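The calling sequence described above (one call per fold, with the fold number running from $1$ to $K$ and the data left untouched between calls) can be sketched as a small driver loop. To keep the sketch runnable without the NAG Library, the fold routine is passed in as a callable and exercised with `toy_kfold`, a deterministic stand-in written for this example. Both `run_all_folds` and `toy_kfold` are hypothetical names; the stand-in does no shuffling, ignores y and w, and is not the NAG algorithm.

```python
def run_all_folds(kfold, k, x, statecomm):
    """Call `kfold` for fold = 1, 2, ..., k and collect each split.

    `kfold` is any callable with the signature documented above. It is
    expected to reorder `x` in place and return nt. `x` is not modified
    by this driver between calls.
    """
    results = []
    for fold in range(1, k + 1):
        nt = kfold(k, fold, x, statecomm, sordx=1)
        # Copy the rows, since the next call reorders x again.
        train = [list(row) for row in x[:nt]]
        valid = [list(row) for row in x[nt:]]
        results.append((nt, train, valid))
    return results


def toy_kfold(k, fold, x, statecomm, sordx=1, y=None, w=None):
    """Deterministic stand-in for illustration only (assumption, not NAG).

    Observation i is placed in group (i % k) + 1 with no shuffling; the
    original rows are cached in `statecomm` on the first call.
    """
    if fold == 1:
        statecomm["x0"] = [list(row) for row in x]
    x0 = statecomm["x0"]
    n = len(x0)
    valid_idx = [i for i in range(n) if i % k == fold - 1]
    train_idx = [i for i in range(n) if i % k != fold - 1]
    # Training observations first, validation observations last.
    x[:] = [x0[i] for i in train_idx + valid_idx]
    return len(train_idx)


x = [[float(i), float(10 * i)] for i in range(7)]
folds = run_all_folds(toy_kfold, 3, x, {})

# Assumed real use (needs the NAG Library and a licence; not run here):
#   from naginterfaces.library import rand
#   statecomm = rand.init_repeat(genid=1, seed=[42321])
#   folds = run_all_folds(rand.kfold_xyw, k, x, statecomm)  # x: NumPy array
```

Passing the routine as an argument is only a convenience for this sketch; in practice the loop body would normally call kfold_xyw directly.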
