NAG Toolbox: nag_best_subset_given_size_revcomm (h05aa)
Purpose
Given a set of $m$ features and a scoring mechanism for any subset of those features, nag_best_subset_given_size_revcomm (h05aa) selects the best $n$ subsets of size $p$ using a reverse communication branch and bound algorithm.
Syntax
[
irevcm,
drop,
lz,
z,
la,
a,
bscore,
bz,
icomm,
rcomm,
ifail] = h05aa(
irevcm,
mincr,
m,
ip,
drop,
lz,
z,
la,
a,
bscore,
bz,
mincnt,
gamma,
acc,
icomm,
rcomm, 'nbest',
nbest, 'licomm',
licomm, 'lrcomm',
lrcomm)
[
irevcm,
drop,
lz,
z,
la,
a,
bscore,
bz,
icomm,
rcomm,
ifail] = nag_best_subset_given_size_revcomm(
irevcm,
mincr,
m,
ip,
drop,
lz,
z,
la,
a,
bscore,
bz,
mincnt,
gamma,
acc,
icomm,
rcomm, 'nbest',
nbest, 'licomm',
licomm, 'lrcomm',
lrcomm)
Description
Given $\Omega =\left\{{x}_{i}:i\in \mathbb{Z},1\le i\le m\right\}$, a set of $m$ unique features, and a scoring mechanism $f\left(S\right)$ defined for all $S\subseteq \Omega $, nag_best_subset_given_size_revcomm (h05aa) is designed to find ${S}_{o1}\subseteq \Omega $, $\left|{S}_{o1}\right|=p$, an optimal subset of size $p$. Here $\left|{S}_{o1}\right|$ denotes the cardinality of ${S}_{o1}$, the number of elements in the set.
The definition of the optimal subset depends on the properties of the scoring mechanism. If
$f\left({S}_{i}\right)\le f\left({S}_{j}\right) \text{ for all } {S}_{j}\subseteq \Omega \text{ and } {S}_{i}\subseteq {S}_{j}$ (1)
then the optimal subset is defined as one of the solutions to
$\underset{S\subseteq \Omega}{\mathrm{maximize}}\phantom{\rule{0.25em}{0ex}}f\left(S\right) \text{ subject to } \left|S\right|=p$
else if
$f\left({S}_{i}\right)\ge f\left({S}_{j}\right) \text{ for all } {S}_{j}\subseteq \Omega \text{ and } {S}_{i}\subseteq {S}_{j}$ (2)
then the optimal subset is defined as one of the solutions to
$\underset{S\subseteq \Omega}{\mathrm{minimize}}\phantom{\rule{0.25em}{0ex}}f\left(S\right) \text{ subject to } \left|S\right|=p.$
If neither of these properties holds then nag_best_subset_given_size_revcomm (h05aa) cannot be used.
As well as returning the optimal subset,
${S}_{o1}$,
nag_best_subset_given_size_revcomm (h05aa) can return the best
$n$ solutions of size
$p$. If
${S}_{o\mathit{i}}$ denotes the
$\mathit{i}$th best subset, for
$\mathit{i}=1,2,\dots ,n-1$, then the
$\left(i+1\right)$th best subset is defined as the solution to either
$\underset{S\subseteq \Omega}{\mathrm{maximize}}\phantom{\rule{0.25em}{0ex}}f\left(S\right) \text{ subject to } \left|S\right|=p \text{ and } S\notin \left\{{S}_{o1},\dots ,{S}_{oi}\right\}$
or
$\underset{S\subseteq \Omega}{\mathrm{minimize}}\phantom{\rule{0.25em}{0ex}}f\left(S\right) \text{ subject to } \left|S\right|=p \text{ and } S\notin \left\{{S}_{o1},\dots ,{S}_{oi}\right\}$
depending on the properties of
$f$.
The solutions are found using a branch and bound method, where each node of the tree is a subset of
$\Omega $. Assuming that
(1) holds then a particular node, defined by subset
${S}_{i}$, can be trimmed from the tree if
$f\left({S}_{i}\right)<\hat{f}\left({S}_{on}\right)$ where
$\hat{f}\left({S}_{on}\right)$ is the
$n$th highest score we have observed so far for a subset of size
$p$, i.e., our current best guess of the score for the
$n$th best subset. In addition, because of
(1) we can also drop all nodes defined by any subset
${S}_{j}$ where
${S}_{j}\subseteq {S}_{i}$, thus avoiding the need to enumerate the whole tree. Similar shortcuts can be taken if
(2) holds. A full description of this branch and bound algorithm can be found in
Ridout (1988).
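The trimming rule can be illustrated independently of any particular scoring mechanism. The following Python sketch (purely illustrative; the additive score, function names and data are not part of the NAG interface) finds the $n$ best subsets of a given size by recursively dropping features, trimming any node whose score already falls below the current $n$th best — which is safe because, for an increasing score, no subset of a node can score higher than the node itself:

```python
import heapq

def best_subsets(weights, p, nbest):
    """Return the `nbest` best subsets of size p under f(S) = sum of weights.

    For nonnegative weights f is increasing: f(S_i) <= f(S_j) whenever
    S_i is a subset of S_j, so a node S can be trimmed (together with all
    its subsets) as soon as f(S) falls below the nbest-th best score seen
    so far for a size-p subset.
    """
    m = len(weights)
    f = lambda s: sum(weights[i] for i in s)
    best = []  # min-heap of (score, subset): current nbest best guesses

    def search(s, next_drop):
        score = f(s)
        if len(best) == nbest and score < best[0][0]:
            return  # trim this node and, implicitly, the subtree below it
        if len(s) == p:
            item = (score, tuple(sorted(s)))
            if len(best) < nbest:
                heapq.heappush(best, item)
            elif item > best[0]:
                heapq.heapreplace(best, item)
            return
        # Branch: drop features in increasing index order so that every
        # size-p subset is generated at most once.
        for i in range(next_drop, m):
            search(s - {i}, i + 1)

    search(frozenset(range(m)), 0)
    return sorted(best, reverse=True)
```

The real routine differs in many details (reverse communication, score estimation, tolerances), but the pruning logic is the same.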
Rather than calculate the score at a given node of the tree
nag_best_subset_given_size_revcomm (h05aa) utilizes the fast branch and bound algorithm of
Somol et al. (2004), and attempts to estimate the score where possible. For each feature,
${x}_{i}$, two values are stored, a count
${c}_{i}$ and
${\hat{\mu}}_{i}$, an estimate of the contribution of that feature. An initial value of zero is used for both
${c}_{i}$ and
${\hat{\mu}}_{i}$. At any stage of the algorithm where both
$f\left(S\right)$ and
$f\left(S\setminus \left\{{x}_{i}\right\}\right)$ have been calculated (as opposed to estimated), the estimated contribution of the feature
${x}_{i}$ is updated to
${\hat{\mu}}_{i}=\frac{{c}_{i}{\hat{\mu}}_{i}+\left(f\left(S\right)-f\left(S\setminus \left\{{x}_{i}\right\}\right)\right)}{{c}_{i}+1}$
and
${c}_{i}$ is incremented by
$1$, therefore at each stage
${\hat{\mu}}_{i}$ is the mean contribution of
${x}_{i}$ observed so far and
${c}_{i}$ is the number of observations used to calculate that mean.
As long as ${c}_{i}\ge k$, for the user-supplied constant $k$, then rather than calculating $f\left(S\setminus \left\{{x}_{i}\right\}\right)$ this function estimates it using $\hat{f}\left(S\setminus \left\{{x}_{i}\right\}\right)=f\left(S\right)-\gamma {\hat{\mu}}_{i}$, or $\hat{f}\left(S\right)-\gamma {\hat{\mu}}_{i}$ if $f\left(S\right)$ has itself been estimated, where $\gamma $ is a user-supplied scaling factor. An estimated score is never used to trim a node or returned as the optimal score.
Setting
$k=0$ in this function will cause the algorithm to always calculate the scores, returning to the branch and bound algorithm of
Ridout (1988). In most cases it is preferable to use the fast branch and bound algorithm, by setting
$k>0$, unless the score function is iterative in nature, i.e.,
$f\left(S\right)$ must have been calculated before
$f\left(S\setminus \left\{{x}_{i}\right\}\right)$ can be calculated.
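The bookkeeping behind this fast branch and bound variant amounts to a running mean per feature. The following Python sketch (illustrative only; the class and its names are not part of the NAG interface) maintains ${c}_{i}$ and ${\hat{\mu}}_{i}$ and produces the estimate described above:

```python
class FeatureEstimator:
    """Running-mean estimate of each feature's contribution to the score.

    After both f(S) and f(S \ {x_i}) have been *calculated*, the observed
    contribution f(S) - f(S \ {x_i}) updates the mean mu[i]; once feature
    i has been observed at least k (mincnt) times, f(S \ {x_i}) may be
    estimated as f(S) - gamma * mu[i] instead of being calculated.
    """

    def __init__(self, m, k, gamma):
        self.count = [0] * m   # c_i: observations per feature
        self.mu = [0.0] * m    # mu_i: mean observed contribution
        self.k = k
        self.gamma = gamma

    def observe(self, i, f_s, f_s_minus_i):
        # mu_i <- (c_i * mu_i + (f(S) - f(S \ {x_i}))) / (c_i + 1)
        c = self.count[i]
        self.mu[i] = (c * self.mu[i] + (f_s - f_s_minus_i)) / (c + 1)
        self.count[i] = c + 1

    def can_estimate(self, i):
        # k = 0 disables estimation entirely (pure Ridout branch and bound)
        return self.count[i] >= self.k > 0

    def estimate(self, i, f_s):
        return f_s - self.gamma * self.mu[i]
```

As in the routine itself, any such estimate should only guide the search; a trimming decision or a returned optimum must rest on a calculated score.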
References
Narendra P M and Fukunaga K (1977) A branch and bound algorithm for feature subset selection IEEE Transactions on Computers C-26(9) 917–922
Ridout M S (1988) Algorithm AS 233: An improved branch and bound algorithm for feature subset selection Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1) 139–147
Somol P, Pudil P and Kittler J (2004) Fast branch and bound algorithms for optimal feature selection IEEE Transactions on Pattern Analysis and Machine Intelligence 26(7) 900–912
Parameters
Note: this function uses
reverse communication. Its use involves an initial entry, intermediate exits and reentries, and a final exit, as indicated by the argument
irevcm. Between intermediate exits and reentries,
all arguments other than bscore must remain unchanged.
Compulsory Input Parameters
 1:
$\mathrm{irevcm}$ – int64int32nag_int scalar

On initial entry: must be set to $0$.
On intermediate reentry:
irevcm must remain unchanged.
Constraint:
${\mathbf{irevcm}}=0$ or $1$.
 2:
$\mathrm{mincr}$ – int64int32nag_int scalar

Flag indicating whether the scoring function
$f$ is increasing or decreasing.
 ${\mathbf{mincr}}=1$
 $f\left({S}_{i}\right)\le f\left({S}_{j}\right)$, i.e., the subsets with the largest score will be selected.
 ${\mathbf{mincr}}=0$
 $f\left({S}_{i}\right)\ge f\left({S}_{j}\right)$, i.e., the subsets with the smallest score will be selected.
Here the inequalities hold for all
${S}_{j}\subseteq \Omega $ and all
${S}_{i}\subseteq {S}_{j}$.
Constraint:
${\mathbf{mincr}}=0$ or $1$.
 3:
$\mathrm{m}$ – int64int32nag_int scalar

$m$, the number of features in the full feature set.
Constraint:
${\mathbf{m}}\ge 2$.
 4:
$\mathrm{ip}$ – int64int32nag_int scalar

$p$, the number of features in the subset of interest.
Constraint:
$1\le {\mathbf{ip}}\le {\mathbf{m}}$.
 5:
$\mathrm{drop}$ – int64int32nag_int scalar

On initial entry:
drop need not be set.
On intermediate reentry:
drop must remain unchanged.
 6:
$\mathrm{lz}$ – int64int32nag_int scalar

On initial entry:
lz need not be set.
On intermediate reentry:
lz must remain unchanged.
 7:
$\mathrm{z}\left({\mathbf{m}}-{\mathbf{ip}}\right)$ – int64int32nag_int array

On initial entry:
z need not be set.
On intermediate reentry:
z must remain unchanged.
 8:
$\mathrm{la}$ – int64int32nag_int scalar

On initial entry:
la need not be set.
On intermediate reentry:
la must remain unchanged.
 9:
$\mathrm{a}\left(\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nbest}},{\mathbf{m}}\right)\right)$ – int64int32nag_int array

On initial entry:
a need not be set.
On intermediate reentry:
a must remain unchanged.
 10:
$\mathrm{bscore}\left(\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nbest}},{\mathbf{m}}\right)\right)$ – double array

On initial entry:
bscore need not be set.
On intermediate reentry:
${\mathbf{bscore}}\left(j\right)$ must hold the score for the
$j$th subset as described in
irevcm.
 11:
$\mathrm{bz}\left({\mathbf{m}}-{\mathbf{ip}},{\mathbf{nbest}}\right)$ – int64int32nag_int array

On initial entry:
bz need not be set.
On intermediate reentry:
bz must remain unchanged.
 12:
$\mathrm{mincnt}$ – int64int32nag_int scalar

$k$, the minimum number of times the effect of each feature,
${x}_{i}$, must have been observed before
$f\left(S\setminus \left\{{x}_{i}\right\}\right)$ is estimated from
$f\left(S\right)$ as opposed to being calculated directly.
If $k=0$ then $f\left(S\setminus \left\{{x}_{i}\right\}\right)$ is never estimated. If ${\mathbf{mincnt}}<0$ then $k$ is set to $1$.
 13:
$\mathrm{gamma}$ – double scalar

$\gamma $, the scaling factor used when estimating scores. If ${\mathbf{gamma}}<0$ then $\gamma =1$ is used.
 14:
$\mathrm{acc}\left(2\right)$ – double array

A measure of the accuracy of the scoring function,
$f$.
Letting
${a}_{i}={\epsilon}_{1}\left|f\left({S}_{i}\right)\right|+{\epsilon}_{2}$, then when confirming whether the scoring function is strictly increasing or decreasing (as described in
mincr), or when assessing whether a node defined by subset
${S}_{i}$ can be trimmed, then any values in the range
$f\left({S}_{i}\right)\pm {a}_{i}$ are treated as being numerically equivalent.
If $0\le {\mathbf{acc}}\left(1\right)\le 1$ then ${\epsilon}_{1}={\mathbf{acc}}\left(1\right)$, otherwise ${\epsilon}_{1}=0$.
If ${\mathbf{acc}}\left(2\right)\ge 0$ then ${\epsilon}_{2}={\mathbf{acc}}\left(2\right)$, otherwise ${\epsilon}_{2}=0$.
In most situations setting both ${\epsilon}_{1}$ and ${\epsilon}_{2}$ to zero should be sufficient. Using a nonzero value, when one is not required, can significantly increase the number of subsets that need to be evaluated.
 15:
$\mathrm{icomm}\left({\mathbf{licomm}}\right)$ – int64int32nag_int array

On initial entry:
icomm need not be set.
On intermediate reentry:
icomm must remain unchanged.
 16:
$\mathrm{rcomm}\left({\mathbf{lrcomm}}\right)$ – double array

On initial entry:
rcomm need not be set.
On intermediate reentry:
rcomm must remain unchanged.
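The numeric-equivalence test implied by the acc argument above can be written down directly. A small Python sketch (illustrative only; the helper name is not part of the NAG interface):

```python
def numerically_equivalent(f_si, value, eps1=0.0, eps2=0.0):
    """True if `value` lies within f(S_i) +/- a_i,
    where a_i = eps1 * |f(S_i)| + eps2 (the acc tolerances)."""
    a_i = eps1 * abs(f_si) + eps2
    return abs(value - f_si) <= a_i
```

Setting both tolerances to zero reduces the test to exact equality, which, as noted above, is usually sufficient.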
Optional Input Parameters
 1:
$\mathrm{nbest}$ – int64int32nag_int scalar

Default:
$1$
$n$, the maximum number of best subsets required. The actual number of subsets returned is given by
la on final exit. If on final exit
${\mathbf{la}}\ne {\mathbf{nbest}}$ then
${\mathbf{ifail}}={\mathbf{53}}$ is returned.
Constraint:
${\mathbf{nbest}}\ge 1$.
 2:
$\mathrm{licomm}$ – int64int32nag_int scalar

Default:
the dimension of the array
icomm.
The length of the array
icomm. If
licomm is too small and
${\mathbf{licomm}}\ge 2$ then
${\mathbf{ifail}}={\mathbf{172}}$ is returned and the minimum values for
licomm and
lrcomm are given by
${\mathbf{icomm}}\left(1\right)$ and
${\mathbf{icomm}}\left(2\right)$ respectively.
Constraints:
 if ${\mathbf{mincnt}}=0$, ${\mathbf{licomm}}\ge 2\times \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nbest}},{\mathbf{m}}\right)+{\mathbf{m}}\left({\mathbf{m}}+2\right)+\left({\mathbf{m}}+1\right)\times \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{m}}-{\mathbf{ip}},1\right)+27$;
 otherwise ${\mathbf{licomm}}\ge 2\times \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nbest}},{\mathbf{m}}\right)+{\mathbf{m}}\left({\mathbf{m}}+3\right)+\left(2{\mathbf{m}}+1\right)\times \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{m}}-{\mathbf{ip}},1\right)+25$.
 3:
$\mathrm{lrcomm}$ – int64int32nag_int scalar

Default:
the dimension of the array
rcomm.
The length of the array
rcomm. If
lrcomm is too small and
${\mathbf{licomm}}\ge 2$ then
${\mathbf{ifail}}={\mathbf{172}}$ is returned and the minimum values for
licomm and
lrcomm are given by
${\mathbf{icomm}}\left(1\right)$ and
${\mathbf{icomm}}\left(2\right)$ respectively.
Constraints:
 if ${\mathbf{mincnt}}=0$, ${\mathbf{lrcomm}}\ge 9+{\mathbf{nbest}}+{\mathbf{m}}\times \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{m}}-{\mathbf{ip}},1\right)$;
 otherwise ${\mathbf{lrcomm}}\ge 8+{\mathbf{m}}+{\mathbf{nbest}}+{\mathbf{m}}\times \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{m}}-{\mathbf{ip}},1\right)$.
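The two constraint formulas can be evaluated ahead of the first call. A Python sketch (the helper name is illustrative) of the minimum licomm and lrcomm:

```python
def min_workspace(m, ip, nbest, mincnt):
    """Minimum licomm and lrcomm from the documented constraints."""
    mx = max(nbest, m)
    d = max(m - ip, 1)
    if mincnt == 0:
        licomm = 2 * mx + m * (m + 2) + (m + 1) * d + 27
        lrcomm = 9 + nbest + m * d
    else:
        licomm = 2 * mx + m * (m + 3) + (2 * m + 1) * d + 25
        lrcomm = 8 + m + nbest + m * d
    return licomm, lrcomm
```

In practice the routine itself can report these values: the example below first calls h05aa with a short icomm so that icomm(1) and icomm(2) return the required sizes.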
Output Parameters
 1:
$\mathrm{irevcm}$ – int64int32nag_int scalar

On intermediate exit:
${\mathbf{irevcm}}=1$ and before reentry the scores associated with
la subsets must be calculated and returned in
bscore.
The
la subsets are constructed as follows:
 ${\mathbf{drop}}=1$
 The $j$th subset is constructed by dropping the features specified in the first lz elements of z and the single feature given in ${\mathbf{a}}\left(j\right)$ from the full set of features, $\Omega $. The subset will therefore contain ${\mathbf{m}}-{\mathbf{lz}}-1$ features.
 ${\mathbf{drop}}=0$
 The $j$th subset is constructed by adding the features specified in the first lz elements of z and the single feature specified in ${\mathbf{a}}\left(j\right)$ to the empty set, $\varnothing $. The subset will therefore contain ${\mathbf{lz}}+1$ features.
In both cases the individual features are referenced by the integers
$1$ to
m with
$1$ indicating the first feature,
$2$ the second, etc., for some arbitrary ordering of the features. The same ordering must be used in all calls to
nag_best_subset_given_size_revcomm (h05aa).
If
${\mathbf{la}}=0$, the score for a single subset should be returned. This subset is constructed by adding or removing only those features specified in the first
lz elements of
z.
If ${\mathbf{lz}}=0$, this subset will either be $\Omega $ or $\varnothing $.
The score associated with the $j$th subset must be returned in ${\mathbf{bscore}}\left(j\right)$.
On final exit: ${\mathbf{irevcm}}=0$, and the algorithm has terminated.
 2:
$\mathrm{drop}$ – int64int32nag_int scalar

On intermediate exit:
flag indicating whether the intermediate subsets should be constructed by dropping features from the full set (
${\mathbf{drop}}=1$) or adding features to the empty set (
${\mathbf{drop}}=0$). See
irevcm for details.
On final exit:
drop is undefined.
 3:
$\mathrm{lz}$ – int64int32nag_int scalar

On intermediate exit:
the number of features stored in
z.
On final exit:
lz is undefined.
 4:
$\mathrm{z}\left({\mathbf{m}}-{\mathbf{ip}}\right)$ – int64int32nag_int array

On intermediate exit:
${\mathbf{z}}\left(\mathit{i}\right)$, for
$\mathit{i}=1,2,\dots ,{\mathbf{lz}}$, contains the list of features which, along with those specified in
a, define the subsets whose score is required. See
irevcm for additional details.
On final exit:
z is undefined.
 5:
$\mathrm{la}$ – int64int32nag_int scalar

On intermediate exit:
if
${\mathbf{la}}>0$, the number of subsets for which a score must be returned.
If
${\mathbf{la}}=0$, the score for a single subset should be returned. See
irevcm for additional details.
On final exit: the number of best subsets returned.
 6:
$\mathrm{a}\left(\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nbest}},{\mathbf{m}}\right)\right)$ – int64int32nag_int array

On intermediate exit:
${\mathbf{a}}\left(\mathit{j}\right)$, for
$\mathit{j}=1,2,\dots ,{\mathbf{la}}$, contains the list of features which, along with those specified in
z, define the subsets whose score is required. See
irevcm for additional details.
On final exit:
a is undefined.
 7:
$\mathrm{bscore}\left(\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nbest}},{\mathbf{m}}\right)\right)$ – double array

On intermediate exit:
bscore is undefined.
On final exit: holds the score for the
la best subsets returned in
bz.
 8:
$\mathrm{bz}\left({\mathbf{m}}-{\mathbf{ip}},{\mathbf{nbest}}\right)$ – int64int32nag_int array

On intermediate exit:
bz is used for storage between calls to
nag_best_subset_given_size_revcomm (h05aa).
On final exit: the $j$th best subset is constructed by dropping the features specified in
${\mathbf{bz}}\left(\mathit{i},\mathit{j}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{m}}-{\mathbf{ip}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{la}}$, from the set of all features, $\Omega $. The score for the $j$th best subset is given in ${\mathbf{bscore}}\left(j\right)$.
 9:
$\mathrm{icomm}\left({\mathbf{licomm}}\right)$ – int64int32nag_int array

On intermediate exit:
icomm is used for storage between calls to
nag_best_subset_given_size_revcomm (h05aa).
On final exit:
icomm is not defined. The first two elements,
${\mathbf{icomm}}\left(1\right)$ and
${\mathbf{icomm}}\left(2\right)$, contain the minimum required values for
licomm and
lrcomm respectively.
 10:
$\mathrm{rcomm}\left({\mathbf{lrcomm}}\right)$ – double array

On intermediate exit:
rcomm is used for storage between calls to
nag_best_subset_given_size_revcomm (h05aa).
On final exit:
rcomm is not defined.
 11:
$\mathrm{ifail}$ – int64int32nag_int scalar
On final exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
Cases prefixed with W are classified as warnings and
do not generate an error of type NAG:error_n. See nag_issue_warnings.
 ${\mathbf{ifail}}=11$

Constraint: ${\mathbf{irevcm}}=0$ or $1$.
 ${\mathbf{ifail}}=21$

Constraint: ${\mathbf{mincr}}=0$ or $1$.
 ${\mathbf{ifail}}=22$

mincr has changed between calls.
 ${\mathbf{ifail}}=31$

Constraint: ${\mathbf{m}}\ge 2$.
 ${\mathbf{ifail}}=32$

m has changed between calls.
 ${\mathbf{ifail}}=41$

Constraint: $1\le {\mathbf{ip}}\le {\mathbf{m}}$.
 ${\mathbf{ifail}}=42$

ip has changed between calls.
 ${\mathbf{ifail}}=51$

Constraint: ${\mathbf{nbest}}\ge 1$.
 ${\mathbf{ifail}}=52$

nbest has changed between calls.
 W ${\mathbf{ifail}}=53$

On entry, ${\mathbf{nbest}}=\_$.
But only $\_$ best subsets could be calculated.
 ${\mathbf{ifail}}=61$

drop has changed between calls.
 ${\mathbf{ifail}}=71$

lz has changed between calls.
 ${\mathbf{ifail}}=91$

la has changed between calls.
 ${\mathbf{ifail}}=111$

${\mathbf{bscore}}\left(\_\right)=\_$, which is inconsistent with the score for the parent node.
 ${\mathbf{ifail}}=131$

mincnt has changed between calls.
 ${\mathbf{ifail}}=141$

gamma has changed between calls.
 ${\mathbf{ifail}}=151$

${\mathbf{acc}}\left(1\right)$ has changed between calls.
 ${\mathbf{ifail}}=152$

${\mathbf{acc}}\left(2\right)$ has changed between calls.
 ${\mathbf{ifail}}=161$

icomm has been corrupted between calls.
 ${\mathbf{ifail}}=171$

licomm is too small.
icomm is too small to return the required array sizes.
 W ${\mathbf{ifail}}=172$

Constraint:
${\mathbf{licomm}}\ge \_$.
The minimum required values for
licomm and
lrcomm are returned in
${\mathbf{icomm}}\left(1\right)$ and
${\mathbf{icomm}}\left(2\right)$ respectively.
 ${\mathbf{ifail}}=181$

rcomm has been corrupted between calls.
 ${\mathbf{ifail}}=99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
 ${\mathbf{ifail}}=399$
Your licence key may have expired or may not have been installed correctly.
 ${\mathbf{ifail}}=999$
Dynamic memory allocation failed.
Accuracy
The subsets returned by nag_best_subset_given_size_revcomm (h05aa) are guaranteed to be optimal up to the accuracy of your calculated scores.
Further Comments
The maximum number of unique subsets of size
$p$ from a set of
$m$ features is
$N=\frac{m!}{\left(m-p\right)!p!}$. The efficiency of the branch and bound algorithm implemented in
nag_best_subset_given_size_revcomm (h05aa) comes from evaluating subsets at internal nodes of the tree, that is subsets with more than
$p$ features, and where possible trimming branches of the tree based on the scores at these internal nodes as described in
Narendra and Fukunaga (1977). Because of this it is possible, in some circumstances, for more than
$N$ subsets to be evaluated. This will tend to happen when most of the features have a similar effect on the subset score.
If multiple optimal subsets exist with the same score, and
nbest is too small to return them all, then the choice of which of these optimal subsets is returned is arbitrary.
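As a quick check of the scale involved, the bound $N$ can be computed directly; a short Python sketch (the helper name is illustrative):

```python
from math import comb

def n_subsets(m, p):
    """N = m! / ((m-p)! p!), the number of unique size-p subsets of m features."""
    return comb(m, p)
```

For the example below, $m=14$ and $p=5$ give $N=2002$, so the 337 subsets reported as evaluated amount to well under a fifth of an exhaustive search.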
Example
This example finds the three linear regression models, with five variables, that have the smallest residual sums of squares when fitted to a supplied dataset. The data used in this example was simulated.
function h05aa_example
fprintf('h05aa example results\n\n');
n = int64(40);
m = int64(14);
[x,y] = gen_data(n,m);
irevcm = int64(0);
mincr = int64(0);
ip = int64(5);
drop = int64(0);
lz = int64(0);
mip = m - ip;
z = zeros(mip, 1, 'int64');
la = int64(0);
nbest = int64(3);
a = zeros(max(nbest, m), 1, 'int64');
bscore = zeros(max(nbest, m), 1);
bz = zeros(mip, nbest, 'int64');
mincnt = int64(1);
gamma = 1;
acc = [0, 0];
icomm = zeros(2, 1, 'int64');
rcomm = zeros(0, 0);
warning('off', 'NAG:warning');
[irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
h05aa(...
irevcm, mincr, m, ip, drop, lz, z, la, a, ...
bscore, bz, mincnt, gamma, acc, icomm, rcomm);
warning('on', 'NAG:warning');
rcomm = zeros(icomm(2), 1);
icomm = zeros(icomm(1), 1, 'int64');
cnt = 0;
[irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
h05aa(...
irevcm, mincr, m, ip, drop, lz, z, la, a, ...
bscore, bz, mincnt, gamma, acc, icomm, rcomm);
while (irevcm ~= 0)
  cnt = cnt + max(1, la);
  [bscore] = calc_subset_score(m, drop, lz, z, la, a, x, y, bscore);
  [irevcm, drop, lz, z, la, a, bscore, bz, icomm, rcomm, ifail] = ...
    h05aa( ...
      irevcm, mincr, m, ip, drop, lz, z, la, a, ...
      bscore, bz, mincnt, gamma, acc, icomm, rcomm);
end
fprintf('\n Score Feature Subset\n');
fprintf(' ----- --------------\n');
ibz = 1:m;
for i = 1:la
  mask = ones(1, m, 'int64');
  mask(bz(1:mip, i)) = 0;
  fprintf('%12.5e %5d %5d %5d %5d %5d\n', bscore(i), ibz(logical(mask)));
end
fprintf('\n%d subsets evaluated in total\n', cnt);
function [bscore] = calc_subset_score(m, drop, lz, z, la, a, x, y, bscore)
% Calculate the score (residual sums of squares) for each requested subset
if drop == 0
  isx = zeros(m, 1, 'int64');
else
  isx = ones(m, 1, 'int64');
end
inv_drop = not(drop);
isx(z(1:lz)) = inv_drop;
for i = 1:max(la, 1)
  if (la > 0)
    if (i > 1)
      % Reset the feature tried in the previous iteration
      isx(a(i-1)) = drop;
    end
    isx(a(i)) = inv_drop;
  end
  ip = int64(sum(isx));
  rss = g02da('z', x, isx, ip, y);
  bscore(i) = rss;
end
function [x, y] = gen_data(n, m)
% Simulate a dataset with n observations on m features
x = zeros(n, m);
genid = int64(3);
subid = int64(1);
seed(1) = int64(23124124);
% Initialize the base random number generator
[state, ifail] = g05kf(genid, subid, seed);
for i = 1:m
  [state, x(1:n,i), ifail] = g05sk(n, 0, sqrt(3), state);
end
[state, b, ifail] = g05sk(m, 1.5, 3, state);
[state, y, ifail] = g05sk(n, 0, 1, state);
y = x*b + y;
h05aa example results
 Score Feature Subset
 ----- --------------
1.21567e+03 3 4 6 7 14
1.32495e+03 3 5 6 7 14
1.37244e+03 3 5 6 12 14
337 subsets evaluated in total
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015