naginterfaces.library.mv.discrim_group

naginterfaces.library.mv.discrim_group(typ, equal, priors, nig, gmn, gc, det, isx, x, prior, atiq)

discrim_group allocates observations to groups according to selected rules. It is intended for use after discrim().

For full information, please refer to the NAG Library document for g03dc:

https://support.nag.com/numeric/nl/nagdoc_30.3/flhtml/g03/g03dcf.html

Parameters

typ : str, length 1

Whether the estimative or predictive approach is used.

$typ = 'E'$

The estimative approach is used.

$typ = 'P'$

The predictive approach is used.

equal : str, length 1

Indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.

$equal = 'E'$

The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p (p + 1) / 2$ elements of $gc$ is used.

$equal = 'U'$

The within-group variance-covariance matrices are assumed to be unequal and the matrices $R_{i}$, for $i = 1, 2, \dots, n_{g}$, stored in the remainder of $gc$ are used.

priors : str, length 1

Indicates the form of the prior probabilities to be used.

$priors = 'E'$

Equal prior probabilities are used.

$priors = 'P'$

Prior probabilities proportional to the group sizes in the training set, $n_{j}$, are used.

$priors = 'I'$

The prior probabilities are input in $prior$.

nig : int, array-like, shape $(ng)$

The number of observations in each group in the training set, $n_{j}$.

gmn : float, array-like, shape $(ng, nvar)$

The $j$th row of $gmn$ contains the means of the $p$ variables for the $j$th group, for $j = 1, 2, \dots, n_{g}$. These are returned by discrim().

gc : float, array-like, shape $((ng + 1) \times nvar \times (nvar + 1) / 2)$

The first $p (p + 1) / 2$ elements of $gc$ should contain the upper triangular matrix $R$ and the next $n_{g}$ blocks of $p (p + 1) / 2$ elements should contain the upper triangular matrices $R_{j}$.

All matrices must be stored packed by column.

These matrices are returned by discrim().

If $equal = 'E'$, only the first $p (p + 1) / 2$ elements are referenced; if $equal = 'U'$, only the elements $p (p + 1) / 2 + 1$ to $(n_{g} + 1) p (p + 1) / 2$ are referenced.

det : float, array-like, shape $(ng)$

If $equal = 'U'$, the logarithms of the determinants of the within-group variance-covariance matrices as returned by discrim(). Otherwise $det$ is not referenced.

isx : int, array-like, shape $(m)$

$isx[l - 1]$ indicates if the $l$th variable in $x$ is to be included in the distance calculations.

If $isx[l - 1] > 0$, the $l$th variable is included, for $l = 1, 2, \dots, m$; otherwise the $l$th variable is not referenced.

x : float, array-like, shape $(nobs, m)$

$x[k - 1, l - 1]$ must contain the $k$th observation for the $l$th variable, for $l = 1, 2, \dots, m$, for $k = 1, 2, \dots, nobs$.

prior : float, array-like, shape $(ng)$

If $priors = 'I'$, the prior probabilities for the $n_{g}$ groups.

atiq : bool

$atiq$ must be $True$ if atypicality indices are required. If $atiq$ is $False$, the array $ati$ is not set.

Returns

prior : float, ndarray, shape $(ng)$

If $priors = 'P'$, the computed prior probabilities in proportion to group sizes for the $n_{g}$ groups.

If $priors = 'I'$, the input prior probabilities will be unchanged.

If $priors = 'E'$, $prior$ is not set.

p : float, ndarray, shape $(nobs, ng)$

$p[k - 1, j - 1]$ contains the posterior probability $p_{kj}$ for allocating the $k$th observation to the $j$th group, for $j = 1, 2, \dots, n_{g}$, for $k = 1, 2, \dots, nobs$.

iag : int, ndarray, shape $(nobs)$

The groups to which the observations have been allocated.

ati : float, ndarray, shape $(nobs, :)$

If $atiq$ is $True$, $ati[k - 1, j - 1]$ will contain the predictive atypicality index for the $k$th observation with respect to the $j$th group, for $j = 1, 2, \dots, n_{g}$, for $k = 1, 2, \dots, nobs$.

If $atiq$ is $False$, $ati$ is not set.

Raises

NagValueError

(errno $1$)

On entry, $typ = ⟨value⟩$.

Constraint: $typ = 'E'$ or $'P'$.

(errno $1$)

On entry, $priors = ⟨value⟩$.

Constraint: $priors = 'E'$, $'I'$ or $'P'$.

(errno $1$)

On entry, $equal = ⟨value⟩$.

Constraint: $equal = 'E'$ or $'U'$.

(errno $1$)

On entry, $m = ⟨value⟩$ and $nvar = ⟨value⟩$.

Constraint: $m \geq nvar$.

(errno $1$)

On entry, $nobs = ⟨value⟩$.

Constraint: $nobs \geq 1$.

(errno $1$)

On entry, $ng = ⟨value⟩$.

Constraint: $ng \geq 2$.

(errno $1$)

On entry, $nvar = ⟨value⟩$.

Constraint: $nvar \geq 1$.

(errno $2$)

On entry, $i = ⟨value⟩$ and $nig[i - 1] = ⟨value⟩$.

Constraint: $nig[i - 1] > 0$.

(errno $2$)

On entry, $\sum_{i} (nig[i]) \leq ng + nvar$.

(errno $2$)

On entry, $i = ⟨value⟩$, $nig[i - 1] = ⟨value⟩$ and $nvar = ⟨value⟩$.

Constraint: $nig[i - 1] > nvar$.

(errno $2$)

On entry, $nvar = ⟨value⟩$ and $⟨value⟩$ values of $isx > 0$.

Constraint: exactly $nvar$ elements of $isx > 0$.

(errno $3$)

On entry, $\sum_{j} (prior[j]) \neq 1.0$.

(errno $3$)

On entry, $j = ⟨value⟩$ and $prior[j - 1] = ⟨value⟩$.

Constraint: $prior[j - 1] > 0$.

(errno $4$)

On entry, a diagonal element of $R$ or $R_{j}$ is zero.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Discriminant analysis is concerned with the allocation of observations to groups using information from other observations whose group membership is known, $X_{t}$; these are called the training set. Consider $p$ variables observed on $n_{g}$ populations or groups. Let $\bar{x}_{j}$ be the sample mean and $S_{j}$ the within-group variance-covariance matrix for the $j$th group; these are calculated from a training set of $n$ observations with $n_{j}$ observations in the $j$th group, and let $x_{k}$ be the $k$th observation from the set of observations to be allocated to the $n_{g}$ groups. The observation can be allocated to a group according to a selected rule. The allocation rule or discriminant function will be based on the distance of the observation from an estimate of the location of the groups, usually the group means. A measure of the distance of the observation from the $j$th group mean is given by the Mahalanobis distance, $D_{kj}$:

$$D_{kj}^{2} = (x_{k} - \bar{x}_{j})^{T} S_{j}^{-1} (x_{k} - \bar{x}_{j}). \qquad (1)$$

If the pooled estimate of the variance-covariance matrix $S$ is used rather than the within-group variance-covariance matrices, then the distance is:

$$D_{kj}^{2} = (x_{k} - \bar{x}_{j})^{T} S^{-1} (x_{k} - \bar{x}_{j}). \qquad (2)$$

Instead of using the variance-covariance matrices $S$ and $S_{j}$, discrim_group uses the upper triangular matrices $R$ and $R_{j}$ supplied by discrim() such that $S = R^{T} R$ and $S_{j} = R_{j}^{T} R_{j}$. $D_{kj}^{2}$ can then be calculated as $z^{T} z$, where $R_{j}^{T} z = (x_{k} - \bar{x}_{j})$ or $R^{T} z = (x_{k} - \bar{x}_{j})$ as appropriate.
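The calculation described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper names `unpack_upper` and `mahalanobis_sq` and the numbers are hypothetical. It assumes the upper triangular factor is stored packed by column, as stated for $gc$, and solves the lower triangular system with $R^{T}$ by forward substitution.

```python
def unpack_upper(packed, p):
    """Rebuild a p-by-p upper triangular matrix stored packed by column."""
    r = [[0.0] * p for _ in range(p)]
    pos = 0
    for col in range(p):
        # Column `col` contributes its entries from row 0 down to the diagonal.
        for row in range(col + 1):
            r[row][col] = packed[pos]
            pos += 1
    return r


def mahalanobis_sq(r, x, mean):
    """Return D^2 = z^T z, where R^T z = x - mean, by forward substitution."""
    p = len(x)
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = [0.0] * p
    for i in range(p):
        # Row i of R^T is column i of R, so (R^T)[i][k] == r[k][i].
        s = sum(r[k][i] * z[k] for k in range(i))
        z[i] = (d[i] - s) / r[i][i]
    return sum(zi * zi for zi in z)


# Hypothetical 2-variable example: R = [[2, 1], [0, 3]] packed by column.
packed_r = [2.0, 1.0, 3.0]
r = unpack_upper(packed_r, 2)
d2 = mahalanobis_sq(r, x=[3.0, 5.0], mean=[1.0, 1.0])
```

Working with the triangular factor avoids forming or inverting $S$ explicitly; a zero diagonal element of $R$ would make the division fail, which corresponds to the errno $4$ condition listed above.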

In addition to the distances, a set of prior probabilities of group membership, $π_{j}$ , for $j = 1, 2, \dots, n_{g}$ , may be used, with $\sum π_{j} = 1$ . The prior probabilities reflect your view as to the likelihood of the observations coming from the different groups. Two common cases for prior probabilities are $π_{1} = π_{2} = \dots = π_{n_{g}}$ , that is, equal prior probabilities, and $π_{j} = n_{j} / n$ , for $j = 1, 2, \dots, n_{g}$ , that is, prior probabilities proportional to the number of observations in the groups in the training set.
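As a rough illustration of the three prior options, the following sketch builds the prior vector from the training-set group sizes. The function `group_priors` is a hypothetical helper, not part of naginterfaces, and the tolerance used to check that supplied priors sum to one is an assumption.

```python
def group_priors(nig, kind, supplied=None):
    """Return prior probabilities for the groups.

    kind: 'E' for equal priors, 'P' for priors proportional to the group
    sizes in nig, 'I' for priors supplied by the caller.
    """
    ng = len(nig)
    if kind == 'E':
        return [1.0 / ng] * ng
    if kind == 'P':
        n = sum(nig)
        return [n_j / n for n_j in nig]
    if kind == 'I':
        # Assumed tolerance; supplied priors must be positive and sum to 1.
        if supplied is None or min(supplied) <= 0.0:
            raise ValueError("supplied priors must all be positive")
        if abs(sum(supplied) - 1.0) > 1e-10:
            raise ValueError("supplied priors must sum to 1")
        return list(supplied)
    raise ValueError("kind must be 'E', 'P' or 'I'")


equal_priors = group_priors([10, 30], 'E')
proportional_priors = group_priors([10, 30], 'P')
```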

discrim_group uses one of four allocation rules. In all four rules the $p$ variables are assumed to follow a multivariate Normal distribution with mean $μ_{j}$ and variance-covariance matrix $Σ_{j}$ if the observation comes from the $j$ th group. The different rules depend on whether or not the within-group variance-covariance matrices are assumed equal, i.e., $Σ_{1} = Σ_{2} = \dots = Σ_{n_{g}}$ , and whether a predictive or estimative approach is used. If $p (x_{k} | μ_{j}, Σ_{j})$ is the probability of observing the observation $x_{k}$ from group $j$ , then the posterior probability of belonging to group $j$ is:

$$p(j | x_{k}, μ_{j}, Σ_{j}) \propto p(x_{k} | μ_{j}, Σ_{j}) π_{j}. \qquad (3)$$

In the estimative approach, the parameters $μ_{j}$ and $Σ_{j}$ in (3) are replaced by their estimates calculated from $X_{t}$. In the predictive approach, a non-informative prior distribution is used for the parameters and a posterior distribution for the parameters, $p(μ_{j}, Σ_{j} | X_{t})$, is found. A predictive distribution is then obtained by integrating $p(j | x_{k}, μ_{j}, Σ_{j}) p(μ_{j}, Σ_{j} | X_{t})$ over the parameter space. This predictive distribution then replaces $p(x_{k} | μ_{j}, Σ_{j})$ in (3). See Aitchison and Dunsmore (1975), Aitchison et al. (1977) and Moran and Murphy (1979) for further details.

The observation is allocated to the group with the highest posterior probability. Denoting the posterior probabilities, $p (j | x_{k}, μ_{j}, Σ_{j})$ , by $q_{j}$ , the four allocation rules are:

Estimative with equal variance-covariance matrices – Linear Discrimination

$\log(q_{j}) \propto -\frac{1}{2} D_{kj}^{2} + \log(π_{j})$

Estimative with unequal variance-covariance matrices – Quadratic Discrimination

$\log(q_{j}) \propto -\frac{1}{2} D_{kj}^{2} + \log(π_{j}) - \frac{1}{2} \log(|S_{j}|)$

Predictive with equal variance-covariance matrices

$q_{j}^{-1} \propto ((n_{j} + 1) / n_{j})^{p/2} \{1 + [n_{j} / ((n - n_{g})(n_{j} + 1))] D_{kj}^{2}\}^{(n + 1 - n_{g})/2}$

Predictive with unequal variance-covariance matrices

$q_{j}^{-1} \propto C \{((n_{j}^{2} - 1) / n_{j}) |S_{j}|\}^{p/2} \{1 + (n_{j} / (n_{j}^{2} - 1)) D_{kj}^{2}\}^{n_{j}/2},$

where

$C = \frac{Γ(\frac{1}{2}(n_{j} - p))}{Γ(\frac{1}{2} n_{j})}.$

In the above the appropriate value of $D_{k j}^{2}$ from (1) and (2) is used. The values of the $q_{j}$ are standardized so that,

$$\sum_{j=1}^{n_{g}} q_{j} = 1.$$
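To make the two estimative rules and the standardization concrete, here is a minimal sketch for a single observation. It reads the proportionality for $\log(q_{j})$ as equality up to an additive constant shared by all groups, which the standardization removes. The function name `estimative_posteriors` and the input values are hypothetical, and the max-subtraction step is a common numerical safeguard rather than something stated in this document.

```python
import math


def estimative_posteriors(d2, priors, log_dets=None):
    """Posterior probabilities q_j for one observation (estimative rules).

    d2       -- squared Mahalanobis distances D_kj^2, one per group
    priors   -- prior probabilities pi_j, one per group
    log_dets -- log|S_j| per group for the quadratic rule,
                or None for the linear rule
    """
    scores = []
    for j, (dist, prior) in enumerate(zip(d2, priors)):
        s = -0.5 * dist + math.log(prior)
        if log_dets is not None:
            s -= 0.5 * log_dets[j]
        scores.append(s)
    # Subtract the largest score before exponentiating to avoid underflow.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Standardize so that the q_j sum to 1.
    return [w / total for w in weights]


q = estimative_posteriors(d2=[2.0, 6.0], priors=[0.5, 0.5])
allocated_group = q.index(max(q)) + 1  # 1-based group label
```

With equal priors and no determinant term, the observation goes to the group with the smallest distance, here group 1.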

Moran and Murphy (1979) show the similarity between the predictive methods and methods based upon likelihood ratio tests.

In addition to allocating the observation to a group, discrim_group computes an atypicality index, $I_{j}(x_{k})$. The predictive atypicality index is returned, irrespective of the value of the parameter $typ$. This represents the probability of obtaining an observation more typical of group $j$ than the observed $x_{k}$ (see Aitchison and Dunsmore (1975) and Aitchison et al. (1977)). The atypicality index is computed for unequal within-group variance-covariance matrices as:

$$I_{j}(x_{k}) = P(B \leq z : \frac{1}{2} p, \frac{1}{2}(n_{j} - p))$$

where $P (B \leq β : a, b)$ is the lower tail probability from a beta distribution and

$$z = D_{kj}^{2} / (D_{kj}^{2} + (n_{j}^{2} - 1) / n_{j}),$$

and for equal within-group variance-covariance matrices as:

$$I_{j}(x_{k}) = P(B \leq z : \frac{1}{2} p, \frac{1}{2}(n - n_{g} - p + 1)),$$

with

$$z = D_{kj}^{2} / (D_{kj}^{2} + (n - n_{g})(n_{j} + 1) / n_{j}).$$

If $I_{j} (x_{k})$ is close to $1$ for all groups it indicates that the observation may come from a grouping not represented in the training set. Moran and Murphy (1979) provide a frequentist interpretation of $I_{j} (x_{k})$ .
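The following sketch shows how the beta argument $z$ is formed in both cases. To stay free of external dependencies it evaluates the index itself only for the special case $p = 2$ with unequal matrices, where the first beta parameter is $1$ and the lower tail probability reduces to $1 - (1 - z)^{b}$. The function names and input values are hypothetical; for general $p$ a regularized incomplete beta function would be needed instead.

```python
def atypicality_z(d2, n_j, n=None, n_g=None):
    """Beta argument z for the atypicality index.

    Pass only n_j for unequal within-group matrices; pass n and n_g as
    well for the equal (pooled) case.
    """
    if n is None:
        scale = (n_j ** 2 - 1) / n_j
    else:
        scale = (n - n_g) * (n_j + 1) / n_j
    return d2 / (d2 + scale)


def atypicality_p2_unequal(d2, n_j):
    """Atypicality index for p = 2 with unequal matrices.

    With a = p/2 = 1, P(B <= z : 1, b) = 1 - (1 - z)**b,
    where b = (n_j - p)/2.
    """
    z = atypicality_z(d2, n_j)
    b = 0.5 * (n_j - 2)
    return 1.0 - (1.0 - z) ** b


index_near = atypicality_p2_unequal(d2=2.0, n_j=10)
index_far = atypicality_p2_unequal(d2=50.0, n_j=10)
```

The index grows towards $1$ as the observation moves away from the group mean, so `index_far` exceeds `index_near`.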

References

Aitchison, J and Dunsmore, I R, 1975, Statistical Prediction Analysis, Cambridge

Aitchison, J, Habbema, J D F and Kay, J W, 1977, A critical comparison of two methods of statistical discrimination, Appl. Statist. (26), 15–25

Kendall, M G and Stuart, A, 1976, The Advanced Theory of Statistics (Volume 3), (3rd Edition), Griffin

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press

Moran, M A and Murphy, B J, 1979, A closer look at two alternative methods of statistical discrimination, Appl. Statist. (28), 223–232

Morrison, D F, 1967, Multivariate Statistical Methods, McGraw–Hill
