h05abf (best_subset_given_size) : NAG Library, Mark 29

1: $mincr$ – Integer Input

On entry: flag indicating whether the scoring function $f$ is increasing or decreasing.

$mincr = 1$: $f (S_{i}) \leq f (S_{j})$, i.e., the subsets with the largest score will be selected.
$mincr = 0$: $f (S_{i}) \geq f (S_{j})$, i.e., the subsets with the smallest score will be selected.

In both cases the inequality must hold for all $S_{j} \subseteq Ω$ and $S_{i} \subseteq S_{j}$.

Constraint: $mincr = 0$ or $1$.
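To make the inclusion condition behind mincr concrete, the following C sketch checks it exhaustively for a small feature set. It assumes a hypothetical additive score (the sum of per-feature weights), with features encoded as bits of an unsigned mask. Such a score satisfies the $mincr = 1$ condition whenever all weights are nonnegative. This is an illustration only and is not part of the NAG interface.

```c
/* Hypothetical additive score: sum of the weights of the features in the
   subset.  Bit i of `mask` (0-based) means feature i+1 is included. */
static double additive_score(unsigned mask, const double *w, int m)
{
    double s = 0.0;
    for (int i = 0; i < m; i++)
        if (mask & (1u << i))
            s += w[i];
    return s;
}

/* Returns 1 if f(S_i) <= f(S_j) for every pair with S_i a subset of S_j
   (the mincr = 1 condition), 0 otherwise.  The search is exhaustive, so
   it is only practical for small m (say m < 16). */
static int is_increasing(const double *w, int m)
{
    unsigned full = (1u << m) - 1u;
    for (unsigned sj = 0; sj <= full; sj++) {
        /* Enumerate every subset si of sj with the standard submask trick. */
        for (unsigned si = sj; ; si = (si - 1u) & sj) {
            if (additive_score(si, w, m) > additive_score(sj, w, m))
                return 0;
            if (si == 0u)
                break;
        }
    }
    return 1;
}
```

With all weights nonpositive, the same score would instead satisfy the $mincr = 0$ condition.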

2: $m$ – Integer Input

On entry: $m$, the number of features in the full feature set.

Constraint: $m \geq 2$.

3: $ip$ – Integer Input

On entry: $p$, the number of features in the subset of interest.

Constraint: $1 \leq ip \leq m$.

4: $nbest$ – Integer Input

On entry: $n$, the maximum number of best subsets required. The actual number of subsets returned is given by la on final exit. If on final exit $la \neq nbest$ then $ifail = 42$ is returned.

Constraint: $nbest \geq 1$.

5: $la$ – Integer Output

On exit: the number of best subsets returned.

6: $bscore (nbest)$ – Real (Kind=nag_wp) array Output

On exit: holds the score for the la best subsets returned in bz.

7: $bz (m - ip, nbest)$ – Integer array Output

On exit: the $j$th best subset is constructed by dropping the features specified in $bz (i, j)$, for $i = 1, 2, \dots, m - ip$ and $j = 1, 2, \dots, la$, from the set of all features, $Ω$. The score for the $j$th best subset is given in $bscore (j)$.
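The following C sketch shows one way to turn a column of bz into an explicit membership array for the $j$th best subset. It assumes that the Fortran column-major array $bz (m - ip, nbest)$ is viewed from C as a flat array with leading dimension $m - ip$. Features are numbered from 1 to m as described for f. Plain `int` is used in place of the NAG `Integer` type, and the function name is hypothetical.

```c
/* Mark which features belong to the j-th best subset (j is 1-based).
   bz is the flat, column-major view of bz(m-ip, nbest): entry (i, j)
   lives at bz[(i-1) + (j-1)*(m-ip)].  On return in_subset[k-1] is 1 if
   feature k is kept and 0 if it was dropped.  Returns the number of
   features kept, which should equal ip. */
static int decode_best_subset(int m, int ip, const int *bz, int j,
                              int *in_subset)
{
    int ndrop = m - ip;
    for (int k = 0; k < m; k++)
        in_subset[k] = 1;                      /* start from the full set */
    for (int i = 0; i < ndrop; i++) {
        int feature = bz[i + (j - 1) * ndrop]; /* 1-based feature number */
        in_subset[feature - 1] = 0;            /* drop it */
    }
    int kept = 0;
    for (int k = 0; k < m; k++)
        kept += in_subset[k];
    return kept;
}
```

For example, with $m = 5$, $ip = 3$ and a first column of bz containing 2 and 4, the first best subset is $\{1, 3, 5\}$.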

8: $f$ – Subroutine, supplied by the user. External Procedure

f must evaluate the scoring function $f$.

The specification of f is:

Fortran Interface

Subroutine f (m, drop, lz, z, la, a, score, iuser, ruser, info)

Integer, Intent (In)	::	m, drop, lz, z(lz), la, a(la)
Integer, Intent (Inout)	::	iuser(*), info
Real (Kind=nag_wp), Intent (Inout)	::	ruser(*)
Real (Kind=nag_wp), Intent (Out)	::	score(max(la,1))

C Header Interface

void	f (const Integer m, const Integer drop, const Integer lz, const Integer z[], const Integer la, const Integer a[], double score[], Integer iuser[], double ruser[], Integer *info)

1: $m$ – Integer Input

On entry: $m = | Ω |$, the number of features in the full feature set.

2: $drop$ – Integer Input

On entry: flag indicating whether the intermediate subsets should be constructed by dropping features from the full set ($drop = 1$) or by adding features to the empty set ($drop = 0$). See score for additional details.

3: $lz$ – Integer Input

On entry: the number of features stored in z.

4: $z (lz)$ – Integer array Input

On entry: $z (i)$, for $i = 1, 2, \dots, lz$, contains the list of features which, along with those specified in a, define the subsets whose score is required. See score for additional details.

5: $la$ – Integer Input

On entry: if $la > 0$, the number of subsets for which a score must be returned.

If $la = 0$, the score for a single subset should be returned. See score for additional details.

6: $a (la)$ – Integer array Input

On entry: $a (j)$, for $j = 1, 2, \dots, la$, contains the list of features which, along with those specified in z, define the subsets whose score is required. See score for additional details.

7: $score (\max (la, 1))$ – Real (Kind=nag_wp) array Output

On exit: the value $f (S_{j})$, for $j = 1, 2, \dots, la$, the score associated with the $j$th subset. $S_{j}$ is constructed as follows:

$drop = 1$: $S_{j}$ is constructed by dropping the features specified in the first lz elements of z and the single feature given in $a (j)$ from the full set of features, $Ω$ . The subset will, therefore, contain $m - lz - 1$ features.
$drop = 0$: $S_{j}$ is constructed by adding the features specified in the first lz elements of z and the single feature specified in $a (j)$ to the empty set, $\emptyset$ . The subset will, therefore, contain $lz + 1$ features.

In both cases the individual features are referenced by the integers $1$ to m, with $1$ indicating the first feature, $2$ the second, etc., for some arbitrary ordering of the features chosen by you prior to calling h05abf. For example, $1$ might refer to the first variable in a particular set of data, $2$ the second, etc.

If $la = 0$, the score for a single subset should be returned. This subset is constructed by adding or removing only those features specified in the first lz elements of z. If $lz = 0$, this subset will either be $Ω$ (when $drop = 1$) or $\emptyset$ (when $drop = 0$).
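The subset construction rules for score can be sketched in C as follows. This is a hypothetical helper, not part of the NAG interface. It assumes that all feature numbers in z and $a (j)$ are valid (between 1 and m) and distinct, and it uses plain `int` instead of the NAG `Integer` type. Passing 0 as `extra` reproduces the $la = 0$ case.

```c
/* Hypothetical helper: build the membership array of the subset that f
   must score.  Features are numbered 1..m.  z holds lz features; `extra`
   is a(j) when la > 0, or 0 when la = 0 (no extra feature).
     drop = 1: start from the full set and remove the listed features.
     drop = 0: start from the empty set and add the listed features.
   On return in_subset[k-1] is 1 if feature k is in the subset.
   Returns the subset size. */
static int build_subset(int m, int drop, int lz, const int *z, int extra,
                        int *in_subset)
{
    int start = drop ? 1 : 0;   /* full set or empty set */
    int mark  = drop ? 0 : 1;   /* remove or add listed features */
    for (int k = 0; k < m; k++)
        in_subset[k] = start;
    for (int i = 0; i < lz; i++)
        in_subset[z[i] - 1] = mark;
    if (extra > 0)
        in_subset[extra - 1] = mark;
    int size = 0;
    for (int k = 0; k < m; k++)
        size += in_subset[k];
    return size;
}
```

With $drop = 1$ the returned size is $m - lz - 1$ (or m when $la = 0$ and $lz = 0$). With $drop = 0$ it is $lz + 1$ (or 0 when $la = 0$ and $lz = 0$), matching the counts stated above.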

8: $iuser (*)$ – Integer array User Workspace

9: $ruser (*)$ – Real (Kind=nag_wp) array User Workspace

f is called with the arguments iuser and ruser as supplied to h05abf. You should use the arrays iuser and ruser to supply information to f.

10: $info$ – Integer Input/Output

On entry: $info = 0$.

On exit: set info to a nonzero value if you wish h05abf to terminate with $ifail = 82$.

f must either be a module subprogram USEd by, or declared as EXTERNAL in, the (sub)program from which h05abf is called. Arguments denoted as Input must not be changed by this procedure.

Note: f should not return floating-point NaN (Not a Number) or infinity values, since these are not handled by h05abf. If your code inadvertently does return any NaNs or infinities, h05abf is likely to produce unexpected results.
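Below is a minimal C sketch of a scoring function with the argument list of the C Header Interface for f. It is self-contained and does not call the NAG Library. The function name is hypothetical, as is the convention of passing per-feature weights in ruser[0..m−1]. `Integer` is typedef'd to `int` here as an assumption; in the NAG headers its width depends on the library build. The score is the sum of the weights of the features in $S_{j}$. With nonnegative weights this score is increasing, so it would pair with $mincr = 1$.

```c
/* Assumption: Integer is a plain int in this sketch; in the NAG headers
   its width depends on the library build (32- or 64-bit integers). */
typedef int Integer;

/* Hypothetical additive scoring function.  ruser[k-1] holds the weight
   of feature k (an assumption of this sketch, not a NAG convention). */
void additive_f(const Integer m, const Integer drop, const Integer lz,
                const Integer z[], const Integer la, const Integer a[],
                double score[], Integer iuser[], double ruser[],
                Integer *info)
{
    (void)iuser;                        /* not needed by this sketch */
    double total = 0.0, zsum = 0.0;
    for (Integer k = 0; k < m; k++)
        total += ruser[k];
    for (Integer i = 0; i < lz; i++) {
        if (z[i] < 1 || z[i] > m) {     /* invalid feature number:    */
            *info = 1;                  /* ask h05abf to stop (ifail = 82) */
            return;
        }
        zsum += ruser[z[i] - 1];
    }
    /* Base subset: Omega minus z (drop = 1) or the empty set plus z (drop = 0). */
    double base = drop ? total - zsum : zsum;
    if (la == 0) {                      /* single subset defined by z alone */
        score[0] = base;
        return;
    }
    for (Integer j = 0; j < la; j++) {
        if (a[j] < 1 || a[j] > m) {
            *info = 1;
            return;
        }
        double wa = ruser[a[j] - 1];
        score[j] = drop ? base - wa : base + wa;  /* drop or add a(j) */
    }
}
```

Because the score is additive, each $f (S_{j})$ is obtained from the shared base subset by a single subtraction or addition, rather than by rebuilding every subset.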

9: $mincnt$ – Integer Input

On entry: $k$, the minimum number of times the effect of each feature, $x_{i}$, must have been observed before $f (S - \{x_{i}\})$ is estimated from $f (S)$, as opposed to being calculated directly.

If $k = 0$ then $f (S - \{x_{i}\})$ is never estimated. If $mincnt < 0$ then $k$ is set to $1$.

10: $gamma$ – Real (Kind=nag_wp) Input

On entry: $γ$, the scaling factor used when estimating scores. If $gamma < 0$ then $γ = 1$ is used.
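The default rules for mincnt and gamma can be summarised in a short C sketch. The struct and function names are hypothetical, and the sketch only restates the rules given above. It does not model how the estimates themselves are computed.

```c
/* Effective values of k and gamma derived from mincnt and gamma,
   following the rules above (names are hypothetical). */
typedef struct {
    int k;          /* minimum observation count before estimating  */
    double gamma;   /* scaling factor used when estimating scores    */
    int estimate;   /* 0 if f(S - {x_i}) is never estimated (k == 0) */
} estimation_settings;

static estimation_settings effective_settings(int mincnt, double gamma)
{
    estimation_settings s;
    s.k = (mincnt < 0) ? 1 : mincnt;        /* mincnt < 0  ->  k = 1     */
    s.gamma = (gamma < 0.0) ? 1.0 : gamma;  /* gamma < 0   ->  gamma = 1 */
    s.estimate = (s.k != 0);                /* k = 0: always call f      */
    return s;
}
```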

11: $acc (2)$ – Real (Kind=nag_wp) array Input

On entry: a measure of the accuracy of the scoring function, $f$.

Let $a_{i} = ε_{1} | f (S_{i}) | + ε_{2}$. When confirming whether the scoring function is strictly increasing or decreasing (as described in mincr), or when assessing whether a node defined by subset $S_{i}$ can be trimmed, any values in the range $f (S_{i}) \pm a_{i}$ are treated as being numerically equivalent.

If $0 \leq acc (1) \leq 1$ then $ε_{1} = acc (1)$, otherwise $ε_{1} = 0$.

If $acc (2) \geq 0$ then $ε_{2} = acc (2)$, otherwise $ε_{2} = 0$.

In most situations setting both $ε_{1}$ and $ε_{2}$ to zero should be sufficient. Using a nonzero value when one is not required can significantly increase the number of subsets that need to be evaluated.
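The tolerance $a_{i}$ and the equivalence test can be sketched in C. Here acc[0] and acc[1] are the 0-based C view of $acc (1)$ and $acc (2)$, and the function names are hypothetical. The sketch treats the endpoints of $f (S_{i}) \pm a_{i}$ as inside the range, which is an assumption.

```c
#include <math.h>

/* Tolerance a_i = eps1*|f(S_i)| + eps2, with eps1 and eps2 derived from
   acc(1) and acc(2) by the rules above. */
static double score_tolerance(const double acc[2], double f_si)
{
    double eps1 = (acc[0] >= 0.0 && acc[0] <= 1.0) ? acc[0] : 0.0;
    double eps2 = (acc[1] >= 0.0) ? acc[1] : 0.0;
    return eps1 * fabs(f_si) + eps2;
}

/* Returns 1 if `value` lies in f(S_i) +/- a_i (endpoints included, an
   assumption), i.e. is treated as numerically equivalent to f(S_i). */
static int numerically_equivalent(const double acc[2], double f_si,
                                  double value)
{
    return fabs(value - f_si) <= score_tolerance(acc, f_si);
}
```

With both entries of acc set to zero, only exactly equal scores are treated as equivalent. This corresponds to the recommended default above.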

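12: $iuser (*)$ – Integer array User Workspace

13: $ruser (*)$ – Real (Kind=nag_wp) array User Workspace

iuser and ruser are not used by h05abf, but are passed directly to f and may be used to pass information to this routine.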

14: $ifail$ – Integer Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.

A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.

If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.

On exit: $ifail = 0$ unless the routine detects an error or a warning has been flagged (see Section 6).

NAG FL Interface
h05abf (best_subset_given_size)
