PDF version (NAG web site
, 64-bit version, 64-bit version)
NAG Toolbox: nag_mv_discrim_group (g03dc)
Purpose
nag_mv_discrim_group (g03dc) allocates observations to groups according to selected rules. It is intended for use after
nag_mv_discrim (g03da).
Syntax
[prior, p, iag, ati, ifail] = g03dc(typ, equal, priors, nig, gmn, gc, det, isx, x, prior, atiq, 'nvar', nvar, 'ng', ng, 'nobs', nobs, 'm', m)
[prior, p, iag, ati, ifail] = nag_mv_discrim_group(typ, equal, priors, nig, gmn, gc, det, isx, x, prior, atiq, 'nvar', nvar, 'ng', ng, 'nobs', nobs, 'm', m)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 22: nobs was made optional.
Description
Discriminant analysis is concerned with the allocation of observations to groups using information from other observations whose group membership is known, $X_t$; these are called the training set. Consider $p$ variables observed on $n_g$ populations or groups. Let $\bar{x}_j$ be the sample mean and $S_j$ the within-group variance-covariance matrix for the $j$th group; these are calculated from a training set of $n$ observations with $n_j$ observations in the $j$th group, and let $x_k$ be the $k$th observation from the set of observations to be allocated to the $n_g$ groups. The observation can be allocated to a group according to a selected rule. The allocation rule or discriminant function will be based on the distance of the observation from an estimate of the location of the groups, usually the group means. A measure of the distance of the observation from the $j$th group mean is given by the Mahalanobis distance, $D_{kj}$:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S_j^{-1} (x_k - \bar{x}_j). \qquad (1)$$
If the pooled estimate of the variance-covariance matrix $S$ is used rather than the within-group variance-covariance matrices, then the distance is:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S^{-1} (x_k - \bar{x}_j). \qquad (2)$$
Instead of using the variance-covariance matrices $S$ and $S_j$, nag_mv_discrim_group (g03dc) uses the upper triangular matrices $R$ and $R_j$ supplied by nag_mv_discrim (g03da) such that $S = R^T R$ and $S_j = R_j^T R_j$. $D_{kj}^2$ can then be calculated as $z^T z$, where $z$ solves the triangular system $R_j^T z = (x_k - \bar{x}_j)$ or $R^T z = (x_k - \bar{x}_j)$ as appropriate.
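The triangular-solve form of the distance can be sketched in plain MATLAB. This is a minimal illustration, not the routine's internal code: the mean, observation and covariance matrix below are hypothetical, and the factor is obtained with `chol` on the assumption that it plays the role of an upper triangular $R$ with $S = R^T R$.

```matlab
% Hypothetical group mean, observation and covariance matrix (illustration only).
xbar = [1.0; 2.0];
xk   = [2.0; 0.5];
S    = [4.0 1.2; 1.2 1.0];

% Upper triangular factor with S = R'*R.
R = chol(S);

% Solve the lower triangular system R'*z = (xk - xbar); then D^2 = z'*z.
d  = xk - xbar;
z  = R' \ d;
D2 = z' * z;

% Same quantity from the covariance matrix itself, for comparison only.
D2_direct = d' * (S \ d);
fprintf('D^2 via R: %.6f, via S: %.6f\n', D2, D2_direct);
```

Working with the factor avoids forming or inverting the covariance matrix explicitly, which is the usual motivation for this formulation.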
In addition to the distances, a set of prior probabilities of group membership, $\pi_j$, for $j = 1, 2, \ldots, n_g$, may be used, with $\sum_{j=1}^{n_g} \pi_j = 1$. The prior probabilities reflect your view as to the likelihood of the observations coming from the different groups. Two common cases for prior probabilities are $\pi_1 = \pi_2 = \cdots = \pi_{n_g}$, that is, equal prior probabilities, and $\pi_j = n_j / n$, for $j = 1, 2, \ldots, n_g$, that is, prior probabilities proportional to the number of observations in the groups in the training set.
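As a small illustration of the two common choices of prior probabilities, the following plain MATLAB sketch builds both from a vector of group sizes. The variable names `prior_equal` and `prior_prop` are hypothetical; the group sizes are the ones implied by the group labels in the example on this page.

```matlab
% Group sizes in the training set (6, 10 and 5 observations).
nig = [6; 10; 5];
ng  = numel(nig);

% Equal prior probabilities: each group gets 1/ng.
prior_equal = ones(ng, 1) / ng;

% Prior probabilities proportional to the group sizes: n_j / n.
prior_prop = nig / sum(nig);
```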
nag_mv_discrim_group (g03dc) uses one of four allocation rules. In all four rules the $p$ variables are assumed to follow a multivariate Normal distribution with mean $\mu_j$ and variance-covariance matrix $\Sigma_j$ if the observation comes from the $j$th group. The different rules depend on whether or not the within-group variance-covariance matrices are assumed equal, i.e., $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_{n_g}$, and whether a predictive or estimative approach is used. If $p(x_k \mid \mu_j, \Sigma_j)$ is the probability of observing the observation $x_k$ from group $j$, then the posterior probability of belonging to group $j$ is:
$$p(j \mid x_k, \mu_j, \Sigma_j) \propto p(x_k \mid \mu_j, \Sigma_j)\, \pi_j. \qquad (3)$$
In the estimative approach, the parameters $\mu_j$ and $\Sigma_j$ in (3) are replaced by their estimates calculated from $X_t$. In the predictive approach, a non-informative prior distribution is used for the parameters and a posterior distribution for the parameters, $p(\mu_j, \Sigma_j \mid X_t)$, is found. A predictive distribution is then obtained by integrating $p(x_k \mid \mu_j, \Sigma_j)\, p(\mu_j, \Sigma_j \mid X_t)$ over the parameter space. This predictive distribution then replaces $p(x_k \mid \mu_j, \Sigma_j)$ in (3). See Aitchison and Dunsmore (1975), Aitchison et al. (1977) and Moran and Murphy (1979) for further details.
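The observation is allocated to the group with the highest posterior probability. Denoting the posterior probabilities, $p(j \mid x_k, X_t)$, by $q_j$, the four allocation rules are:
(i) Estimative with equal variance-covariance matrices – Linear Discrimination
$$\log q_j = -\tfrac{1}{2} D_{kj}^2 + \log \pi_j + \text{constant}$$
(ii) Estimative with unequal variance-covariance matrices – Quadratic Discrimination
$$\log q_j = -\tfrac{1}{2} D_{kj}^2 + \log \pi_j - \tfrac{1}{2} \log |S_j| + \text{constant}$$
(iii) Predictive with equal variance-covariance matrices
$$q_j^{-1} \propto \pi_j^{-1} \left( \frac{n_j + 1}{n_j} \right)^{p/2} \left\{ 1 + \frac{n_j}{(n - n_g)(n_j + 1)} D_{kj}^2 \right\}^{(n + 1 - n_g)/2}$$
(iv) Predictive with unequal variance-covariance matrices
$$q_j^{-1} \propto \pi_j^{-1}\, C \left( \frac{n_j^2 - 1}{n_j} \right)^{p/2} |S_j|^{1/2} \left\{ 1 + \frac{n_j}{n_j^2 - 1} D_{kj}^2 \right\}^{n_j/2}$$
where
$$C = \frac{\Gamma\left( \tfrac{1}{2}(n_j - p) \right)}{\Gamma\left( \tfrac{1}{2} n_j \right)}.$$
In the above the appropriate value of $D_{kj}^2$ from (1) or (2) is used. The values of the $q_j$ are standardized so that
$$\sum_{j=1}^{n_g} q_j = 1.$$
Moran and Murphy (1979) show the similarity between the predictive methods and methods based upon likelihood ratio tests.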
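To make the estimative rules concrete, the sketch below evaluates the quadratic rule for a single observation in plain MATLAB, assuming the standard form $\log q_j = -\tfrac{1}{2} D_{kj}^2 + \log \pi_j - \tfrac{1}{2} \log |S_j|$ up to an additive constant. All input values are hypothetical, and this is not a reproduction of what g03dc does internally.

```matlab
% Hypothetical inputs for one observation and three groups.
D2      = [3.2; 0.8; 6.5];   % squared Mahalanobis distances to each group
logdetS = [0.5; 1.1; 0.9];   % log-determinants of the within-group covariances
prior   = [1; 1; 1] / 3;     % equal prior probabilities

% Unnormalized log posterior for the quadratic (unequal covariance) rule.
% Dropping the logdetS term gives the linear (equal covariance) rule.
logq = -0.5 * D2 + log(prior) - 0.5 * logdetS;

% Standardize so the posteriors sum to one; subtracting the maximum first
% guards against underflow in exp.
q = exp(logq - max(logq));
q = q / sum(q);

% Allocate to the group with the highest posterior probability.
[~, group] = max(q);
fprintf('Allocated to group %d\n', group);
```

With these inputs the second group has the largest unnormalized log posterior, so the observation is allocated to group 2.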
In addition to allocating the observation to a group, nag_mv_discrim_group (g03dc) computes an atypicality index, $I_j(x_k)$. The predictive atypicality index is returned, irrespective of the value of the parameter typ. This represents the probability of obtaining an observation more typical of group $j$ than the observed $x_k$ (see Aitchison and Dunsmore (1975) and Aitchison et al. (1977)). The atypicality index is computed for unequal within-group variance-covariance matrices as:
$$I_j(x_k) = P\left( B \le z : \tfrac{1}{2} p, \tfrac{1}{2}(n_j - p) \right)$$
where $P(B \le \beta : a, b)$ is the lower tail probability from a beta distribution and
$$z = \frac{D_{kj}^2}{D_{kj}^2 + (n_j^2 - 1)/n_j},$$
and for equal within-group variance-covariance matrices as:
$$I_j(x_k) = P\left( B \le z : \tfrac{1}{2} p, \tfrac{1}{2}(n - n_g - p + 1) \right)$$
with
$$z = \frac{D_{kj}^2}{D_{kj}^2 + (n - n_g)(n_j + 1)/n_j}.$$
If $I_j(x_k)$ is close to $1$ for all groups it indicates that the observation may come from a grouping not represented in the training set. Moran and Murphy (1979) provide a frequentist interpretation of $I_j(x_k)$.
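The atypicality index for the unequal-covariance case can be sketched with the base MATLAB function `betainc`, which evaluates the regularized incomplete beta function (the lower tail of a beta distribution). The sketch assumes the form $z = D_{kj}^2 / (D_{kj}^2 + (n_j^2 - 1)/n_j)$ with beta parameters $\tfrac{1}{2}p$ and $\tfrac{1}{2}(n_j - p)$; the numerical inputs are hypothetical.

```matlab
% Hypothetical values: number of variables, group size in the training set,
% and squared Mahalanobis distance of the observation from the group mean.
p  = 2;
nj = 6;
D2 = 4.0;

% Transform the distance to the unit interval.
z = D2 / (D2 + (nj^2 - 1) / nj);

% Lower tail probability of a beta(p/2, (nj - p)/2) distribution at z.
atyp = betainc(z, p / 2, (nj - p) / 2);
fprintf('Atypicality index: %.4f\n', atyp);
```

For these particular parameters the beta distribution has shape parameters 1 and 2, so the result can be checked against the closed form $1 - (1 - z)^2$.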
References
Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Aitchison J, Habbema J D F and Kay J W (1977) A critical comparison of two methods of statistical discrimination Appl. Statist. 26 15–25
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Moran M A and Murphy B J (1979) A closer look at two alternative methods of statistical discrimination Appl. Statist. 28 223–232
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill
Parameters
Compulsory Input Parameters
- 1: typ – string (length ≥ 1)
Whether the estimative or predictive approach is used.
- typ = 'E': The estimative approach is used.
- typ = 'P': The predictive approach is used.
Constraint: typ = 'E' or 'P'.
- 2: equal – string (length ≥ 1)
Indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
- equal = 'E': The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p(p+1)/2$ elements of gc is used.
- equal = 'U': The within-group variance-covariance matrices are assumed to be unequal and the matrices $R_j$, for $j = 1, 2, \ldots, n_g$, stored in the remainder of gc are used.
Constraint: equal = 'E' or 'U'.
- 3: priors – string (length ≥ 1)
Indicates the form of the prior probabilities to be used.
- priors = 'E': Equal prior probabilities are used.
- priors = 'P': Prior probabilities proportional to the group sizes in the training set, $n_j$, are used.
- priors = 'I': The prior probabilities are input in prior.
Constraint: priors = 'E', 'I' or 'P'.
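- 4: nig(ng) – int64 array
The number of observations in each group in the training set, $n_j$.
Constraints:
- if equal = 'E', nig(j) > 0 and $\sum_{j=1}^{n_g}$ nig(j) > ng + nvar, for j = 1, 2, …, ng;
- if equal = 'U', nig(j) > nvar, for j = 1, 2, …, ng.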
- 5: gmn(ldgmn, nvar) – double array
ldgmn, the first dimension of the array, must satisfy the constraint ldgmn ≥ ng.
The $j$th row of gmn contains the means of the $p$ variables for the $j$th group, for $j = 1, 2, \ldots, n_g$. These are returned by nag_mv_discrim (g03da).
- 6: gc((ng+1)×nvar×(nvar+1)/2) – double array
The first $p(p+1)/2$ elements of gc should contain the upper triangular matrix $R$ and the next $n_g$ blocks of $p(p+1)/2$ elements should contain the upper triangular matrices $R_j$.
All matrices must be stored packed by column. These matrices are returned by nag_mv_discrim (g03da). If equal = 'E', only the first $p(p+1)/2$ elements are referenced; if equal = 'U', only the elements $p(p+1)/2 + 1$ to $(n_g + 1)p(p+1)/2$ are referenced.
Constraints:
- if equal = 'E', the diagonal elements of $R$ must be ≠ 0.0;
- if equal = 'U', the diagonal elements of the $R_j$ must be ≠ 0.0, for $j = 1, 2, \ldots, n_g$.
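The packed-by-column layout can be illustrated by unpacking one block into a full upper triangular matrix. This plain MATLAB sketch uses a hypothetical packed vector for three variables; it assumes the usual convention that the columns of the upper triangle are stored one after another, so that block number $j$ (with the pooled matrix as block 0) starts at offset $j\,p(p+1)/2$.

```matlab
% Hypothetical packed block for nvar = 3, stored column by column:
% R(1,1), R(1,2), R(2,2), R(1,3), R(2,3), R(3,3).
nvar   = 3;
packed = (1:6)';

% Copy each packed column segment into the upper triangle of R.
R   = zeros(nvar);
pos = 0;
for col = 1:nvar
    R(1:col, col) = packed(pos + (1:col));
    pos = pos + col;
end
disp(R);
```

To extract the matrix for group j from the full array, the same loop can be applied to the slice starting at offset j*nvar*(nvar+1)/2.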
- 7: det(ng) – double array
If equal = 'U', the logarithms of the determinants of the within-group variance-covariance matrices as returned by nag_mv_discrim (g03da). Otherwise det is not referenced.
- 8: isx(m) – int64 array
isx(l) indicates if the $l$th variable in x is to be included in the distance calculations.
If isx(l) > 0, the $l$th variable is included, for l = 1, 2, …, m; otherwise the $l$th variable is not referenced.
Constraint: isx(l) > 0 for nvar values of l.
- 9: x(ldx, m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ nobs.
x(k,l) must contain the $k$th observation for the $l$th variable, for k = 1, 2, …, nobs and l = 1, 2, …, m.
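- 10: prior(ng) – double array
If priors = 'I', the prior probabilities for the $n_g$ groups.
Constraint: if priors = 'I', prior(j) > 0.0 and $\left| 1 - \sum_{j=1}^{n_g} \text{prior}(j) \right| \le 10 \times$ machine precision, for j = 1, 2, …, ng.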
- 11: atiq – logical scalar
atiq must be true if atypicality indices are required. If atiq is false the array ati is not set.
Optional Input Parameters
- 1: nvar – int64 scalar
Default: the second dimension of the array gmn.
$p$, the number of variables in the variance-covariance matrices.
Constraint: nvar ≥ 1.
- 2: ng – int64 scalar
Default: the dimension of the arrays nig, det, prior and the first dimension of the array gmn. (An error is raised if these dimensions are not equal.)
The number of groups, $n_g$.
Constraint: ng ≥ 2.
- 3: nobs – int64 scalar
Default: the first dimension of the arrays gmn, x. (An error is raised if these dimensions are not equal.)
The number of observations in x which are to be allocated.
Constraint: nobs ≥ 1.
- 4: m – int64 scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of variables in the data array x.
Constraint: m ≥ nvar.
Output Parameters
- 1: prior(ng) – double array
If priors = 'P', the computed prior probabilities in proportion to group sizes for the $n_g$ groups.
If priors = 'I', the input prior probabilities will be unchanged.
If priors = 'E', prior is not set.
- 2: p(ldp, ng) – double array, with ldp = nobs
p(k,j) contains the posterior probability $p_{kj}$ for allocating the $k$th observation to the $j$th group, for k = 1, 2, …, nobs and j = 1, 2, …, ng.
- 3: iag(nobs) – int64 array
The groups to which the observations have been allocated.
- 4: ati(nobs, :) – double array
The first dimension of the array ati will be nobs.
The second dimension of the array ati will be ng if atiq is true and 1 otherwise.
If atiq is true, ati(k,j) will contain the predictive atypicality index for the $k$th observation with respect to the $j$th group, for k = 1, 2, …, nobs and j = 1, 2, …, ng.
If atiq is false, ati is not set.
- 5: ifail – int64 scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
- ifail = 1
On entry, nvar < 1,
or ng < 2,
or nobs < 1,
or m < nvar,
or ldgmn < ng,
or ldx < nobs,
or ldp < nobs,
or typ ≠ 'E' or 'P',
or equal ≠ 'E' or 'U',
or priors ≠ 'E', 'I' or 'P'.
- ifail = 2
On entry, the number of variables indicated by isx is not equal to nvar,
or equal = 'E' and nig(j) ≤ 0, for some j,
or equal = 'E' and $\sum_{j=1}^{n_g}$ nig(j) ≤ ng + nvar,
or equal = 'U' and nig(j) ≤ nvar for some j.
- ifail = 3
On entry, priors = 'I' and prior(j) ≤ 0.0 for some j,
or priors = 'I' and $\sum_{j=1}^{n_g}$ prior(j) is not within 10 × machine precision of 1.
- ifail = 4
On entry, equal = 'E' and a diagonal element of $R$ is zero,
or equal = 'U' and a diagonal element of $R_j$ for some $j$ is zero.
- ifail = -99
An unexpected error has been triggered by this routine. Please contact NAG.
- ifail = -399
Your licence key may have expired or may not have been installed correctly.
- ifail = -999
Dynamic memory allocation failed.
Accuracy
The accuracy of the returned posterior probabilities will depend on the accuracy of the input $R$ or $R_j$ matrices. The atypicality index should be accurate to four significant figures.
Further Comments
The distances $D_{kj}^2$ can be computed using nag_mv_discrim_mahal (g03db) if other forms of discrimination are required.
Example
The data, taken from Aitchison and Dunsmore (1975), are concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the group means and $R$ matrices are computed by nag_mv_discrim (g03da). A further six observations of unknown type are input and allocations made using the predictive approach and under the assumption that the within-group covariance matrices are not equal. The posterior probabilities of group membership, $q_j$, and the atypicality index are printed along with the allocated group. The atypicality index shows that observations 5 and 6 do not seem to be typical of the three types present in the initial 21 observations.
function g03dc_example
fprintf('g03dc example results\n\n');
x = [1.1314, 2.4596;
1.0986, 0.2624;
0.6419, -2.3026;
1.3350, -3.2189;
1.4110, 0.0953;
0.6419, -0.9163;
2.1163, 0.0000;
1.3350, -1.6094;
1.3610, -0.5108;
2.0541, 0.1823;
2.2083, -0.5108;
2.7344, 1.2809;
2.0412, 0.4700;
1.8718, -0.9163;
1.7405, -0.9163;
2.6101, 0.4700;
2.3224, 1.8563;
2.2192, 2.0669;
2.2618, 1.1314;
3.9853, 0.9163;
2.7600, 2.0281];
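[n,m] = size(x);
% Include every variable of x in the analysis.
isx = ones(m,1,'int64');
nvar = int64(m);
% Group labels for the training set: observations 1-6, 7-16 and 17-21.
ing = ones(n,1,'int64');
ing(7:16) = int64(2);
ing(17:n) = int64(3);
ng = int64(3);
% Compute the group means and R matrices from the training set.
[nig, gmean, det, gc, stat, df, sig, ifail] = ...
  g03da( ...
    x, isx, nvar, ing, ng);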
% Six further observations of unknown type, to be allocated to the groups.
x = [1.6292, -0.9163;
2.5572, 1.6094;
2.5649, -0.2231;
0.9555, -2.3026;
3.4012, -2.3026;
3.0204, -0.2231];
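% Predictive approach, unequal within-group covariance matrices, equal priors.
typ = 'P';
equal = 'U';
priors = 'Equal priors';
% prior is only read when prior probabilities are input; placeholder here.
prior = zeros(3, 1);
% Request atypicality indices.
atiq = true;
[prior, p, iag, ati, ifail] = ...
  g03dc( ...
    typ, equal, priors, nig, gmean, gc, det, isx, x, prior, atiq);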
fprintf(' Obs Posterior Allocated Atypicality\n');
fprintf(' probabilities to group index\n');
for i=1:6
fprintf('%6d ', i);
fprintf('%6.3f', p(i,:));
fprintf('%6d ', iag(i));
fprintf('%6.3f', ati(i,:));
fprintf('\n');
end
g03dc example results
   Obs   Posterior             Allocated   Atypicality
         probabilities         to group    index
     1   0.094  0.905  0.002       2       0.596  0.254  0.975
     2   0.005  0.168  0.827       3       0.952  0.836  0.018
     3   0.019  0.920  0.062       2       0.954  0.797  0.912
     4   0.697  0.303  0.000       1       0.207  0.860  0.993
     5   0.317  0.013  0.670       3       0.991  1.000  0.984
     6   0.032  0.366  0.601       3       0.981  0.978  0.887
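As a quick consistency check on the printed results, the following plain MATLAB sketch re-enters the posterior probabilities and allocations from the table above and confirms that each observation is allocated to the group with the largest posterior probability. The names `p_out`, `iag_out` and `best` are hypothetical; the numbers are copied from the output, so the row sums equal one only to within rounding.

```matlab
% Posterior probabilities and allocated groups copied from the printed results.
p_out = [0.094 0.905 0.002;
         0.005 0.168 0.827;
         0.019 0.920 0.062;
         0.697 0.303 0.000;
         0.317 0.013 0.670;
         0.032 0.366 0.601];
iag_out = [2; 3; 2; 1; 3; 3];

% Group with the highest posterior probability for each observation.
[~, best] = max(p_out, [], 2);

% Each row should sum to one, up to rounding of the three printed decimals.
rowsum = sum(p_out, 2);

if isequal(best, iag_out)
    fprintf('Allocations match the largest posterior probabilities.\n');
end
```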
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015