NAG Toolbox: nag_rand_subsamp_xyw (g05pw)
Purpose
nag_rand_subsamp_xyw (g05pw) generates a dataset suitable for use with repeated random sub-sampling validation.
Syntax
[state, sx, sy, sw, errbuf, ifail] = g05pw(nt, x, state, 'n', n, 'm', m, 'sordx', sordx, 'y', y, 'w', w, 'sordsx', sordsx)
[state, sx, sy, sw, errbuf, ifail] = nag_rand_subsamp_xyw(nt, x, state, 'n', n, 'm', m, 'sordx', sordx, 'y', y, 'w', w, 'sordsx', sordsx)
Description
Let X_o denote a matrix of n observations on m variables and y_o and w_o each denote a vector of length n. For example, X_o might represent a matrix of independent variables, y_o the dependent variable and w_o the associated weights in a weighted regression.
nag_rand_subsamp_xyw (g05pw) generates a series of training datasets, denoted by the matrix, vector, vector triplet (X_t, y_t, w_t) of n_t observations, and validation datasets, denoted (X_v, y_v, w_v) with n_v observations. These training and validation datasets are generated by randomly assigning each observation to either the training dataset or the validation dataset.
The resulting datasets are suitable for use with repeated random sub-sampling validation.
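The random assignment described above can be illustrated in plain MATLAB. The following is only a sketch of the idea: it uses the built-in randperm rather than the NAG base generators, so it is not the algorithm used by nag_rand_subsamp_xyw (g05pw) and will not reproduce its sequences. All variable names (Xo, perm, Xt, Xv and so on) are chosen for illustration.

```matlab
% Illustrative sketch only: random training/validation split using
% built-in MATLAB functions. This is NOT the algorithm used by g05pw.
n  = 10;                      % number of observations
m  = 3;                       % number of variables
nt = 7;                       % number of training observations
Xo = rand(n, m);              % original data, one observation per row
yo = (1:n).';                 % original dependent variable
wo = ones(n, 1);              % original weights

perm = randperm(n);           % random ordering of the observations
Xs = Xo(perm, :);             % apply the same ordering to X, y and w
ys = yo(perm);
ws = wo(perm);

% The first nt shuffled observations form the training dataset,
% the remaining n - nt form the validation dataset.
Xt = Xs(1:nt, :);    yt = ys(1:nt);    wt = ws(1:nt);
Xv = Xs(nt+1:n, :);  yv = ys(nt+1:n);  wv = ws(nt+1:n);
```

Repeating the shuffle and split several times, refitting the model each time, gives the repeated random sub-sampling scheme.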
One of the initialization functions nag_rand_init_repeat (g05kf) (for a repeatable sequence if computed sequentially) or nag_rand_init_nonrepeat (g05kg) (for a non-repeatable sequence) must be called prior to the first call to nag_rand_subsamp_xyw (g05pw).
References
None.
Parameters
Compulsory Input Parameters
- 1: nt – int64 or int32 (nag_int) scalar
n_t, the number of observations in the training dataset.
Constraint: 1 ≤ nt ≤ n.
- 2: x(ldx, :) – double array
The first dimension, ldx, of the array x must satisfy
- if sordx = 2, ldx ≥ m;
- otherwise ldx ≥ n.
The second dimension of the array x must be at least m if sordx = 1 and at least n if sordx = 2.
The way the data is stored in x is defined by sordx.
If sordx = 1, x(i,j) contains the ith observation for the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
If sordx = 2, x(j,i) contains the ith observation for the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
X_o, the values of X for the original dataset. This may be the array returned in sx by a previous call to nag_rand_subsamp_xyw (g05pw).
- 3: state – int64 or int32 (nag_int) array
Note: the actual argument supplied must be the array state supplied to the initialization routines nag_rand_init_repeat (g05kf) or nag_rand_init_nonrepeat (g05kg).
Contains information on the selected base generator and its current state.
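The two storage orders for x differ only by a transpose. The sketch below uses made-up data and assumes that sordx = 1 stores one observation per row, which is the layout the example program on this page relies on when it calls the function with the default sordx; the names x1, x2 and same are illustrative.

```matlab
% Sketch: the same n = 3 observations on m = 2 variables in both layouts.
n = 3;  m = 2;
x1 = [1.0 10.0;               % sordx = 1: x1(i,j), observations in rows
      2.0 20.0;
      3.0 30.0];
x2 = x1.';                    % sordx = 2: x2(j,i), observations in columns

i = 2;  j = 1;                % 2nd observation, 1st variable
same = (x1(i,j) == x2(j,i));  % both layouts hold the same value

sz1 = size(x1);               % n-by-m
sz2 = size(x2);               % m-by-n

% MATLAB stores arrays column by column, so in the sordx = 2 layout
% the m values of one observation, x2(:,i), are adjacent in memory.
obs_i = x2(:, i);
```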
Optional Input Parameters
- 1: n – int64 or int32 (nag_int) scalar
Default:
- if sordx = 2, the second dimension of x;
- otherwise the first dimension of x.
n, the number of observations.
Constraint: n ≥ 1.
- 2: m – int64 or int32 (nag_int) scalar
Default:
- if sordx = 2, the first dimension of x;
- otherwise the second dimension of x.
m, the number of variables.
Constraint: m ≥ 1.
- 3: sordx – int64 or int32 (nag_int) scalar
Default: 1
Determines how variables are stored in x.
Constraint: sordx = 1 or 2.
- 4: y – double array
Optionally, y_o, the values of y for the original dataset. This may be the vector returned in sy by a previous call to nag_rand_subsamp_xyw (g05pw).
- 5: w – double array
Optionally, w_o, the values of w for the original dataset. This may be the vector returned in sw by a previous call to nag_rand_subsamp_xyw (g05pw).
- 6: sordsx – int64 or int32 (nag_int) scalar
Default: sordx
Determines how variables are stored in sx.
Constraint: sordsx = 1 or 2.
Output Parameters
- 1: state – int64 or int32 (nag_int) array
Contains updated information on the state of the generator.
- 2: sx(ldsx, :) – double array
The first dimension, ldsx, of the array sx will be
- if sordsx = 1, ldsx = n;
- if sordsx = 2, ldsx = m.
The second dimension of the array sx will be m if sordsx = 1 and n otherwise.
The way the data is stored in sx is defined by sordsx.
If sordsx = 1, sx(i,j) contains the ith observation for the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
If sordsx = 2, sx(j,i) contains the ith observation for the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
sx holds the values of X for the training and validation datasets, with X_t held in observations 1 to nt and X_v in observations nt + 1 to n.
- 3: sy – double array
If y is supplied then sy holds the values of y for the training and validation datasets, with y_t held in elements 1 to nt and y_v in elements nt + 1 to n.
- 4: sw – double array
If w is supplied then sw holds the values of w for the training and validation datasets, with w_t held in elements 1 to nt and w_v in elements nt + 1 to n.
- 5: errbuf – string (length ≥ 200)
- 6: ifail – int64 or int32 (nag_int) scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
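Because the training observations come first in sx, sy and sw (observations 1 to nt) and the validation observations follow (nt + 1 to n), the two datasets can be separated by simple indexing; the example program on this page does the same with x(nt+1:end,:). The sketch below uses stand-in arrays in place of real output, and the names x_train, x_valid and so on are illustrative.

```matlab
% Sketch with stand-in data: sx and sy mimic arrays returned with sordsx = 1.
n  = 5;  nt = 3;
sx = [1 2; 3 4; 5 6; 7 8; 9 10];   % stand-in for a returned n-by-m array
sy = (1:n).';                      % stand-in for a returned vector

x_train = sx(1:nt, :);    y_train = sy(1:nt);     % observations 1..nt
x_valid = sx(nt+1:n, :);  y_valid = sy(nt+1:n);   % observations nt+1..n

% With sordsx = 2 the observations are stored as columns, so the
% split is made along the second dimension instead.
sx2 = sx.';                        % stand-in for an m-by-n array
x_train2 = sx2(:, 1:nt);
x_valid2 = sx2(:, nt+1:n);
```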
Error Indicators and Warnings
Errors or warnings detected by the function:
- Constraint: 1 ≤ nt ≤ n.
- Constraint: n ≥ 1.
- Constraint: m ≥ 1.
- Constraint: sordx = 1 or 2.
- Constraint: if sordx = 1, ldx ≥ n.
- Constraint: if sordx = 2, ldx ≥ m.
- On entry, state vector has been corrupted or not initialized.
- Constraint: sordsx = 1 or 2.
- An unexpected error has been triggered by this routine. Please contact NAG.
- Your licence key may have expired or may not have been installed correctly.
- Dynamic memory allocation failed.
Accuracy
Not applicable.
Further Comments
nag_rand_subsamp_xyw (g05pw) will be computationally more efficient if each observation in x is contiguous, that is, sordx = 2 and sordsx = 2.
Example
This example uses nag_rand_subsamp_xyw (g05pw) to facilitate repeated random sub-sampling cross-validation.
A set of simulated data is randomly split into training and validation datasets.
nag_correg_glm_binomial (g02gb) is used to fit a logistic regression model to each training dataset and then nag_correg_glm_predict (g02gp) is used to predict the response for the observations in the validation dataset. This process is repeated 10 times.
The counts of true and false positives and negatives, along with the sensitivity and specificity, are then reported.
function g05pw_example
fprintf('g05pw example results\n\n');
link = 'G';
mean = 'M';
errfn = 'B';
vfobs = false;
x = [ 0.0 -0.1 0.0 1.0; 0.4 -1.1 1.0 1.0; -0.5 0.2 1.0 0.0;
0.6 1.1 1.0 0.0; -0.3 -1.0 1.0 1.0; 2.8 -1.8 0.0 1.0;
0.4 -0.7 0.0 1.0; -0.4 -0.3 1.0 0.0; 0.5 -2.6 0.0 0.0;
-1.6 -0.3 1.0 1.0; 0.4 0.6 1.0 0.0; -1.6 0.0 1.0 1.0;
0.0 0.4 1.0 1.0; -0.1 0.7 1.0 1.0; -0.2 1.8 1.0 1.0;
-0.9 0.7 1.0 1.0; -1.1 -0.5 1.0 1.0; -0.1 -2.2 1.0 1.0;
-1.8 -0.5 1.0 1.0; -0.8 -0.9 0.0 1.0; 1.9 -0.1 1.0 1.0;
0.3 1.4 1.0 1.0; 0.4 -1.2 1.0 0.0; 2.2 1.8 1.0 0.0;
1.4 -0.4 0.0 1.0; 0.4 2.4 1.0 1.0; -0.6 1.1 1.0 1.0;
1.4 -0.6 1.0 1.0; -0.1 -0.1 0.0 0.0; -0.6 -0.4 0.0 0.0;
0.6 -0.2 1.0 1.0; -1.8 -0.3 1.0 1.0; -0.3 1.6 1.0 1.0;
-0.6 0.8 0.0 1.0; 0.3 -0.5 0.0 0.0; 1.6 1.4 1.0 1.0;
-1.1 0.6 1.0 1.0; -0.3 0.6 1.0 1.0; -0.6 0.1 1.0 1.0;
1.0 0.6 1.0 1.0];
y = [0;1;0;0;0;0;1;1;1;0;0;1;1;0;0;0;0;1;1;1;
1;0;1;1;1;0;0;1;0;0;1;1;0;0;1;0;0;0;0;1];
t = ones(size(x,1),1);
isx = int64(ones(size(x,2),1));
ip = int64(sum(isx) + (upper(mean(1:1)) == 'M'));
seed = int64(42321);
genid = int64(6);
subid = int64(0);
[state,ifail] = g05kf(genid,subid,seed);
nsamp = int64(10);
nt = int64(32);
warn_state = nag_issue_warnings();
nag_issue_warnings(true);
tn = 0;
fn = 0;
fp = 0;
tp = 0;
for i = 1:nsamp
  % Randomly split the data; the training observations come first
  [state,x,y,t,~,ifail] = g05pw( ...
                          nt,x,state,'y',y,'w',t);
  if (ifail~=0)
    break
  end
  % Fit a logistic regression model to the training data
  [~,~,b,~,~,cov,~,ifail] = g02gb( ...
                            link,mean,x,isx,ip,y,t,'n',nt);
  if (ifail~=0 && ifail < 6)
    break
  end
  % Predict the response for the validation data
  [~,~,pred,~,ifail] = g02gp( ...
                       errfn,link,mean,x(nt+1:end,:),isx,b,cov, ...
                       vfobs, 't',t(nt+1:end));
  if (ifail~=0)
    break
  end
  % Cross-tabulate predicted against observed values (1 = negative, 2 = positive)
  obs_val = ceil(y(nt+1:end) + 0.5);
  pred_val = (pred >= 0.5) + 1;
  count = zeros(2,2);
  for k = 1:size(pred_val,1)
    count(pred_val(k),obs_val(k)) = count(pred_val(k),obs_val(k)) + 1;
  end
  tn = tn + count(1,1);
  fn = fn + count(1,2);
  fp = fp + count(2,1);
  tp = tp + count(2,2);
end
nag_issue_warnings(warn_state);
np = tp + fn;
nn = fp + tn;
fprintf(' Observed\n');
fprintf(' --------------------------\n');
fprintf(' Predicted | Negative Positive Total\n');
fprintf(' --------------------------------------\n');
fprintf(' Negative | %5d %5d %5d\n', tn, fn, tn + fn);
fprintf(' Positive | %5d %5d %5d\n', fp, tp, fp + tp);
fprintf(' Total | %5d %5d %5d\n', nn, np, nn + np);
fprintf('\n');
if (np~=0)
  fprintf(' True Positive Rate (Sensitivity): %4.2f\n', tp / np);
else
  fprintf(' True Positive Rate (Sensitivity): No positives in data\n');
end
if (nn~=0)
  fprintf(' True Negative Rate (Specificity): %4.2f\n', tn / nn);
else
  fprintf(' True Negative Rate (Specificity): No negatives in data\n');
end
g05pw example results
Observed
--------------------------
Predicted | Negative Positive Total
--------------------------------------
Negative | 38 20 58
Positive | 8 14 22
Total | 46 34 80
True Positive Rate (Sensitivity): 0.41
True Negative Rate (Specificity): 0.83
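As a check on the figures above, the two rates can be recomputed directly from the four counts in the table. This short sketch copies the counts from the printed output; the variable names sensitivity and specificity are illustrative.

```matlab
% Counts copied from the example output above
tn = 38;  fn = 20;  fp = 8;  tp = 14;

np = tp + fn;                 % observed positives: 14 + 20 = 34
nn = tn + fp;                 % observed negatives: 38 + 8 = 46

sensitivity = tp / np;        % 14/34, which rounds to 0.41
specificity = tn / nn;        % 38/46, which rounds to 0.83

fprintf(' Sensitivity: %4.2f\n', sensitivity);
fprintf(' Specificity: %4.2f\n', specificity);
```

The total of 80 classified observations corresponds to 10 repetitions, each with 40 - 32 = 8 validation observations.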
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015