NAG Library Routine Document
G03CAF
1 Purpose
G03CAF computes the maximum likelihood estimates of the parameters of a factor analysis model. Either the data matrix or a correlation/covariance matrix may be input. Factor loadings, communalities and residual correlations are returned.
2 Specification
SUBROUTINE G03CAF (MATRIX, WEIGHT, N, M, X, LDX, NVAR, ISX, NFAC, WT, E, STAT, &
                   COM, PSI, RES, FL, LDFL, IOP, IWK, WK, LWK, IFAIL)
INTEGER            N, M, LDX, NVAR, ISX(M), NFAC, LDFL, IOP(5), IWK(4*NVAR+2), &
                   LWK, IFAIL
REAL (KIND=nag_wp) X(LDX,M), WT(*), E(NVAR), STAT(4), COM(NVAR), PSI(NVAR), &
                   RES(NVAR*(NVAR-1)/2), FL(LDFL,NFAC), WK(LWK)
CHARACTER(1)       MATRIX, WEIGHT
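The fixed-size arrays in the specification depend only on NVAR and NFAC. As a rough planning aid, the following Python sketch tabulates those lengths. The helper name `g03caf_array_sizes` is hypothetical (it is not part of the NAG Library), and FL is sized on the assumption that LDFL is set equal to NVAR.

```python
def g03caf_array_sizes(nvar, nfac):
    """Array lengths read off the G03CAF specification.

    Hypothetical helper, not a NAG routine. The FL entry assumes
    LDFL = NVAR; WK and WT are omitted because their lengths depend on
    other arguments.
    """
    return {
        "E": nvar,
        "STAT": 4,
        "COM": nvar,
        "PSI": nvar,
        "RES": nvar * (nvar - 1) // 2,   # strict upper triangle, packed
        "FL": (nvar, nfac),
        "IOP": 5,
        "IWK": 4 * nvar + 2,
    }


sizes = g03caf_array_sizes(nvar=9, nfac=3)
print(sizes["RES"], sizes["IWK"])   # 36 38
```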
3 Description
Let $p$ variables, $x_1, x_2, \ldots, x_p$, with variance-covariance matrix $\Sigma$ be observed. The aim of factor analysis is to account for the covariances in these $p$ variables in terms of a smaller number, $k$, of hypothetical variables, or factors, $f_1, f_2, \ldots, f_k$. These are assumed to be independent and to have unit variance. The relationship between the observed variables and the factors is given by the model:
$$x_i = \sum_{j=1}^{k} \lambda_{ij} f_j + e_i, \quad i = 1, 2, \ldots, p,$$
where $\lambda_{ij}$, for $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, k$, are the factor loadings and $e_i$, for $i = 1, 2, \ldots, p$, are independent random variables with variances $\psi_i$, for $i = 1, 2, \ldots, p$. The $\psi_i$ represent the unique component of the variation of each observed variable. The proportion of variation for each variable accounted for by the factors is known as the communality. For this routine it is assumed that both the $k$ factors and the $e_i$'s follow independent Normal distributions.
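The model for the variance-covariance matrix, $\Sigma$, can be written as:
$$\Sigma = \Lambda\Lambda^{T} + \Psi, \tag{1}$$
where $\Lambda$ is the matrix of the factor loadings, $\lambda_{ij}$, and $\Psi$ is a diagonal matrix of unique variances, $\psi_i$, for $i = 1, 2, \ldots, p$.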
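To make the covariance model concrete, the following Python sketch builds $\Sigma = \Lambda\Lambda^{T} + \Psi$ from a small set of loadings and computes the communalities. The loadings are made-up illustrative numbers, the function names are hypothetical, and the variables are assumed to be standardized so that each communality is the row sum of squared loadings and $\psi_i$ is one minus the communality.

```python
def communalities(loadings):
    """Row sums of squared loadings (variables assumed to have unit variance)."""
    return [sum(l * l for l in row) for row in loadings]


def implied_covariance(loadings, psi):
    """Return Sigma = Lambda Lambda^T + Psi as a list of lists."""
    p = len(loadings)
    sigma = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
              for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += psi[i]          # Psi is diagonal
    return sigma


# Illustrative values only: p = 3 variables, k = 2 factors.
loadings = [[0.8, 0.1], [0.7, 0.3], [0.2, 0.9]]
com = communalities(loadings)
psi = [1.0 - h for h in com]           # unique variances
sigma = implied_covariance(loadings, psi)
print(com)
print(sigma)
```

With unit-variance variables the diagonal of `sigma` is 1, so `sigma` is the correlation matrix implied by the model.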
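The estimation of the parameters of the model, $\Lambda$ and $\Psi$, by maximum likelihood is described by Lawley and Maxwell (1971). The log-likelihood is:
$$-\tfrac{1}{2}(n-1)\log\left(|\Sigma|\right) - \tfrac{1}{2}(n-1)\,\mathrm{trace}\left(S\Sigma^{-1}\right) + \text{constant},$$
where $n$ is the number of observations, $S$ is the sample variance-covariance matrix or, if weights are used, $S$ is the weighted sample variance-covariance matrix and $n$ is the effective number of observations, that is, the sum of the weights. The constant is independent of the parameters of the model. A two stage maximization is employed. It makes use of the function $F(\Psi)$, which is, up to a constant, $-2/(n-1)$ times the log-likelihood maximized over $\Lambda$. This is then minimized with respect to $\Psi$ to give the estimates, $\hat{\Psi}$, of $\Psi$. The function $F(\Psi)$ can be written as:
$$F(\Psi) = \sum_{j=k+1}^{p}\left(\theta_j - \log\theta_j\right) - (p-k),$$
where values $\theta_j$, for $j = 1, 2, \ldots, p$, are the eigenvalues of the matrix:
$$S^{*} = \Psi^{-1/2} S \Psi^{-1/2}.$$
The estimates $\hat{\Lambda}$, of $\Lambda$, are then given by scaling the eigenvectors of $S^{*}$, which are denoted by $V$:
$$\hat{\Lambda} = \Psi^{1/2} V \left(\Theta - I\right)^{1/2},$$
where $\Theta$ is the diagonal matrix with elements $\theta_j$, and $I$ is the identity matrix.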
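The two formulas above, $F(\Psi) = \sum_{j=k+1}^{p}(\theta_j - \log\theta_j) - (p-k)$ and $\hat{\Lambda} = \Psi^{1/2}V(\Theta - I)^{1/2}$, can be checked on a one-factor model whose eigen-decomposition is known in closed form. The Python sketch below is illustrative only: the function names are hypothetical, the eigenvalues are assumed to be ordered from largest to smallest, and only the $k$ leading eigenpairs are used for the loadings.

```python
import math


def objective(theta, k):
    """F(Psi) from the eigenvalues of S*; the k largest are excluded."""
    tail = sorted(theta, reverse=True)[k:]
    return sum(t - math.log(t) for t in tail) - len(tail)


def loadings(psi, theta, vecs):
    """Lambda_hat = Psi^(1/2) V (Theta - I)^(1/2) for the leading eigenpairs.

    theta[j] is the j-th largest eigenvalue of S* and vecs[j] its unit
    eigenvector (length p); pass only the k pairs that are retained.
    """
    return [[math.sqrt(psi[i]) * vecs[j][i] * math.sqrt(theta[j] - 1.0)
             for j in range(len(theta))] for i in range(len(psi))]


# Exact one-factor model S = lam lam^T + Psi. Then S* = I + u u^T with
# u = Psi^(-1/2) lam, so the leading eigenpair is (1 + |u|^2, u / |u|)
# and the remaining p - 1 eigenvalues are all equal to 1.
lam = [0.8, 0.6, 0.5]
psi = [1.0 - l * l for l in lam]
u = [l / math.sqrt(s) for l, s in zip(lam, psi)]
norm_u = math.sqrt(sum(x * x for x in u))
theta = [1.0 + norm_u ** 2, 1.0, 1.0]
v1 = [x / norm_u for x in u]

f_value = objective(theta, k=1)
lam_hat = loadings(psi, theta[:1], [v1])
print(f_value)     # 0.0 because the model fits S exactly
print(lam_hat)     # reproduces lam up to rounding
```

Eigenvectors are only defined up to sign, so loadings recovered this way may differ from the generating values by a sign change per factor.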
The minimization of $F(\Psi)$ is performed using E04LBF, which uses a modified Newton algorithm. The computation of the Hessian matrix is described by Clark (1970). However, instead of using the eigenvalue decomposition of the matrix $S^{*}$ as described above, the singular value decomposition of the matrix $R\Psi^{-1/2}$ is used, where $R$ is obtained either from the $QR$ decomposition of the (scaled) mean centred data matrix or from the Cholesky decomposition of the correlation/covariance matrix. The routine E04LBF ensures that the values of $\psi_i$ are greater than a given small positive quantity, $\delta$, so that the communality is always less than one. This avoids the so-called Heywood cases.
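The link between the eigenvalue and singular value formulations can be made explicit. The short derivation below assumes that $R$ is a triangular factor with $R^{T}R = S$, which is what a Cholesky factor of $S$, or the triangular factor of a suitably scaled mean centred data matrix, provides. The symbols $U$, $D$ and $d_j$ are introduced here for illustration only.

```latex
\begin{aligned}
S^{*} &= \Psi^{-1/2} S \Psi^{-1/2}
       = \Psi^{-1/2} R^{T} R \, \Psi^{-1/2}
       = \left(R\Psi^{-1/2}\right)^{T}\left(R\Psi^{-1/2}\right), \\
R\Psi^{-1/2} &= U D V^{T}
  \;\Longrightarrow\;
  S^{*} = V D^{2} V^{T},
  \qquad \theta_j = d_j^{2}, \quad j = 1, 2, \ldots, p.
\end{aligned}
```

Under that assumption the right singular vectors of $R\Psi^{-1/2}$ serve as the eigenvectors $V$ and the squared singular values give the $\theta_j$, so $S^{*}$ never has to be formed explicitly.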
In addition to the values of $\Lambda$, $\Psi$ and the communalities, G03CAF returns the residual correlations, i.e., the off-diagonal elements of $C - \left(\Lambda\Lambda^{T} + \Psi\right)$, where $C$ is the sample correlation matrix. G03CAF also returns the test statistic:
$$\chi^{2} = \left[n - 1 - (2p+5)/6 - 2k/3\right] F(\hat{\Psi}),$$
which can be used to test the goodness-of-fit of the model (1), see Lawley and Maxwell (1971) and Morrison (1967).
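The goodness-of-fit statistic is the minimized function value $F(\hat{\Psi})$ multiplied by the correction factor $n - 1 - (2p+5)/6 - 2k/3$. The Python sketch below evaluates it for illustrative inputs. The degrees of freedom use the standard count $\left((p-k)^2 - (p+k)\right)/2$ for a $k$-factor model, which is an assumption here because this document does not state the formula; the function name and the input values are hypothetical.

```python
def goodness_of_fit(n, p, k, f_min):
    """Return (chi2, dof) for a k-factor model fitted to p variables.

    n     : (effective) number of observations
    f_min : minimized value of F(Psi)
    dof uses ((p - k)^2 - (p + k)) / 2, assumed rather than taken from
    this document.
    """
    chi2 = (n - 1 - (2 * p + 5) / 6 - 2 * k / 3) * f_min
    dof = ((p - k) ** 2 - (p + k)) // 2   # numerator is always even
    return chi2, dof


# Illustrative values only.
chi2, dof = goodness_of_fit(n=200, p=9, k=3, f_min=0.04)
print(chi2, dof)
```

A significance level would then come from the upper tail of a chi-square distribution with `dof` degrees of freedom; that step is left out because it needs a statistics library.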
4 References
Clark M R B (1970) A rapidly convergent method for maximum likelihood factor analysis British J. Math. Statist. Psych.
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25
Lawley D N and Maxwell A E (1971) Factor Analysis as a Statistical Method (2nd Edition) Butterworths
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill
5 Parameters
- 1: MATRIX – CHARACTER(1) Input
On entry: selects the type of matrix on which factor analysis is to be performed.
- MATRIX = 'D': the data matrix will be input in X and factor analysis will be computed for the correlation matrix.
- MATRIX = 'S': the data matrix will be input in X and factor analysis will be computed for the covariance matrix, i.e., the results are scaled as described in Section 8.
- MATRIX = 'C': the correlation/variance-covariance matrix will be input in X and factor analysis computed for this matrix.
Constraint: MATRIX = 'D', 'S' or 'C'.
- 2: WEIGHT – CHARACTER(1) Input
On entry: if MATRIX = 'D' or 'S', WEIGHT indicates if weights are to be used.
- WEIGHT = 'U': no weights are used.
- WEIGHT = 'W': weights are used and must be supplied in WT.
Note: if MATRIX = 'C', WEIGHT is not referenced.
Constraint: if MATRIX = 'D' or 'S', WEIGHT = 'U' or 'W'.
- 3: N – INTEGER Input
On entry: if MATRIX = 'D' or 'S', the number of observations in the data array X.
If MATRIX = 'C', the (effective) number of observations used in computing the (possibly weighted) correlation/variance-covariance matrix input in X.
Constraint: N > NVAR.
- 4: M – INTEGER Input
On entry: the number of variables in the data/correlation/variance-covariance matrix.
Constraint: M ≥ NVAR.
- 5: X(LDX,M) – REAL (KIND=nag_wp) array Input
On entry: the input matrix.
If MATRIX = 'D' or 'S', X must contain the data matrix, i.e., X(i,j) must contain the $i$th observation for the $j$th variable, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots,$ M.
If MATRIX = 'C', X must contain the correlation or variance-covariance matrix. Only the upper triangular part is required.
- 6: LDX – INTEGER Input
On entry: the first dimension of the array X as declared in the (sub)program from which G03CAF is called.
Constraints:
- if MATRIX = 'D' or 'S', LDX ≥ N;
- if MATRIX = 'C', LDX ≥ M.
- 7: NVAR – INTEGER Input
On entry: $p$, the number of variables in the factor analysis.
Constraint: NVAR ≥ 2.
- 8: ISX(M) – INTEGER array Input
On entry: ISX(j) indicates whether or not the $j$th variable is included in the factor analysis. If ISX(j) ≥ 1, the variable represented by the $j$th column of X is included in the analysis; otherwise it is excluded, for $j = 1, 2, \ldots,$ M.
Constraint: ISX(j) ≥ 1 for NVAR values of $j$.
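The ISX constraint can be checked before the call. The Python sketch below assumes that a value of ISX(j) of 1 or more marks column $j$ of X as included, and that exactly NVAR columns must be marked. The helper name is hypothetical, and column numbers are reported 1-based to match the Fortran convention.

```python
def selected_columns(isx, nvar):
    """Return the 1-based column numbers with ISX(j) >= 1.

    Raises ValueError unless exactly nvar columns are selected.
    """
    cols = [j for j, flag in enumerate(isx, start=1) if flag >= 1]
    if len(cols) != nvar:
        raise ValueError(
            f"expected {nvar} selected variables, found {len(cols)}")
    return cols


print(selected_columns([1, 0, 1, 1], nvar=3))   # [1, 3, 4]
```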
- 9: NFAC – INTEGER Input
On entry: $k$, the number of factors.
Constraint: 1 ≤ NFAC ≤ NVAR.
- 10: WT(*) – REAL (KIND=nag_wp) array Input
Note: the dimension of the array WT must be at least N if WEIGHT = 'W' and MATRIX = 'D' or 'S', and at least 1 otherwise.
On entry: if WEIGHT = 'W' and MATRIX = 'D' or 'S', WT must contain the weights to be used in the factor analysis. The effective number of observations in the analysis will then be the sum of weights. If WT(i) = 0.0, the $i$th observation is not included in the analysis.
If WEIGHT = 'U' or MATRIX = 'C', WT is not referenced and the effective number of observations is $n$.
Constraint: if WEIGHT = 'W', $\sum_{i=1}^{n}$ WT(i) > NVAR and WT(i) ≥ 0.0, for $i = 1, 2, \ldots, n$.
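When weights are used, the effective number of observations is the sum of the weights, and zero-weight observations drop out of the analysis. The Python sketch below performs that bookkeeping. It assumes the weights must be non-negative and that their sum must exceed NVAR; the helper name and sample weights are hypothetical.

```python
def effective_observations(wt, nvar):
    """Return (sum of weights, number of observations with weight > 0)."""
    if any(w < 0.0 for w in wt):
        raise ValueError("weights must be non-negative")
    n_eff = sum(wt)
    if n_eff <= nvar:
        raise ValueError("effective number of observations must exceed NVAR")
    n_used = sum(1 for w in wt if w > 0.0)
    return n_eff, n_used


n_eff, n_used = effective_observations([1.0, 2.0, 0.0, 0.5, 1.5], nvar=2)
print(n_eff, n_used)   # 5.0 4
```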
- 11: E(NVAR) – REAL (KIND=nag_wp) array Output
On exit: the eigenvalues $\theta_i$, for $i = 1, 2, \ldots, p$.
- 12: STAT(4) – REAL (KIND=nag_wp) array Output
On exit: the test statistics.
- STAT(1) contains the value $F(\hat{\Psi})$.
- STAT(2) contains the test statistic, $\chi^{2}$.
- STAT(3) contains the degrees of freedom associated with the test statistic.
- STAT(4) contains the significance level.
- 13: COM(NVAR) – REAL (KIND=nag_wp) array Output
On exit: the communalities.
- 14: PSI(NVAR) – REAL (KIND=nag_wp) array Output
On exit: the estimates of $\psi_i$, for $i = 1, 2, \ldots, p$.
- 15: RES(NVAR*(NVAR-1)/2) – REAL (KIND=nag_wp) array Output
On exit: the residual correlations. The residual correlation for the $i$th and $j$th variables is stored in RES((j-1)(j-2)/2+i), $i < j$.
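RES holds the strict upper triangle of the residual correlation matrix packed into a vector of length NVAR*(NVAR-1)/2. The Python sketch below assumes column-by-column packing, with the entry for variables $i < j$ at 1-based position $(j-1)(j-2)/2 + i$, and expands such a vector into a full symmetric matrix. The function names and the sample values are hypothetical.

```python
def res_index(i, j):
    """1-based position in RES of the residual correlation for i < j."""
    if not 1 <= i < j:
        raise ValueError("require 1 <= i < j")
    return (j - 1) * (j - 2) // 2 + i


def unpack_res(res, p):
    """Expand packed RES into a symmetric p x p list of lists (zero diagonal)."""
    if len(res) != p * (p - 1) // 2:
        raise ValueError("RES must have p*(p-1)/2 elements")
    full = [[0.0] * p for _ in range(p)]
    for j in range(2, p + 1):
        for i in range(1, j):
            r = res[res_index(i, j) - 1]   # shift to 0-based indexing
            full[i - 1][j - 1] = r
            full[j - 1][i - 1] = r
    return full


print([res_index(i, j) for j in range(2, 5) for i in range(1, j)])
print(unpack_res([0.1, 0.2, 0.3], p=3))
```

The first line printed enumerates the positions 1 to 6 in order for $p = 4$, which confirms that the mapping fills the packed array without gaps.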
- 16: FL(LDFL,NFAC) – REAL (KIND=nag_wp) array Output
On exit: the factor loadings. FL(i,j) contains $\lambda_{ij}$, for $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, k$.
- 17: LDFL – INTEGER Input
On entry: the first dimension of the array FL as declared in the (sub)program from which G03CAF is called.
Constraint: LDFL ≥ NVAR.
- 18: IOP(5) – INTEGER array Input
On entry: options for the optimization. There are four options to be set:
| iprint | controls iteration monitoring; if iprint ≤ 0, there is no printing of information; if iprint > 0, information is printed at every iprint iterations. The information printed consists of the value of $F(\Psi)$ at that iteration, the number of evaluations of $F(\Psi)$, the current estimates of the communalities and an indication of whether or not they are at the boundary. |
| maxfun | the maximum number of function evaluations. |
| acc | the required accuracy for the estimates of $\psi_i$. |
| eps | a lower bound for the values of $\psi$, see Section 3. |
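Let $\epsilon$ = machine precision. If IOP(1) = 0, then the following default values are used:
- iprint = -1
- maxfun = $100p$
- acc = $10\sqrt{\epsilon}$
- eps = $\epsilon$
If IOP(1) ≠ 0, then
- iprint = IOP(2)
- maxfun = IOP(3)
- acc = $10^{-l}$ where $l$ = IOP(4)
- eps = $10^{-l}$ where $l$ = IOP(5)
Constraint: if IOP(1) ≠ 0, IOP(i) must be such that maxfun ≥ 1, $\epsilon \leq$ acc < 1 and $\epsilon \leq$ eps < 1, for $i = 3, 4, 5$.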
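The Python sketch below shows how non-default IOP settings translate into option values. It assumes that IOP(4) and IOP(5) are negated powers of ten (acc = $10^{-\mathrm{IOP}(4)}$, eps = $10^{-\mathrm{IOP}(5)}$) and that both results must lie between machine precision and 1. The helper name is hypothetical, and Python's `sys.float_info.epsilon` merely stands in for the library's machine precision, which may be defined differently.

```python
import sys


def decode_iop(iop, machine_eps=sys.float_info.epsilon):
    """Translate IOP(1..5) into named options, or None when IOP(1) = 0.

    Hypothetical helper. machine_eps is an assumed stand-in for the
    library's machine precision.
    """
    if len(iop) != 5:
        raise ValueError("IOP must have 5 elements")
    if iop[0] == 0:
        return None                      # the routine's own defaults apply
    iprint, maxfun = iop[1], iop[2]
    acc = 10.0 ** (-iop[3])
    eps = 10.0 ** (-iop[4])
    if maxfun < 1:
        raise ValueError("maxfun must be >= 1")
    for name, value in (("acc", acc), ("eps", eps)):
        if not machine_eps <= value < 1.0:
            raise ValueError(f"{name} must lie in [machine_eps, 1)")
    return {"iprint": iprint, "maxfun": maxfun, "acc": acc, "eps": eps}


opts = decode_iop([1, -1, 500, 3, 5])
print(opts)
```

Returning `None` for IOP(1) = 0 keeps the sketch from guessing at the defaults the routine applies internally.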
- 19: IWK(4*NVAR+2) – INTEGER array Workspace
- 20: WK(LWK) – REAL (KIND=nag_wp) array Workspace
- 21: LWK – INTEGER Input
On entry: the dimension of the array WK as declared in the (sub)program from which G03CAF is called.
Constraints:
- if MATRIX = 'D' or 'S', LWK ≥ max((5×NVAR×NVAR+33×NVAR-4)/2, N×NVAR+7×NVAR+NVAR×(NVAR-1)/2);
- if MATRIX = 'C', LWK ≥ (5×NVAR×NVAR+33×NVAR-4)/2.
- 22: IFAIL – INTEGER Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, because for this routine the values of the output parameters may be useful even if IFAIL ≠ 0 on exit, the recommended value is -1. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL = 0 unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry IFAIL = 0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Note: G03CAF may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the routine:
IFAIL = 1
On entry, LDFL < NVAR,
or NVAR < 2,
or N ≤ NVAR,
or NFAC < 1,
or NVAR < NFAC,
or M < NVAR,
or MATRIX = 'D' or 'S' and LDX < N,
or MATRIX = 'C' and LDX < M,
or MATRIX ≠ 'D', 'S' or 'C',
or MATRIX = 'D' or 'S' and WEIGHT ≠ 'U' or 'W',
or IOP(1) ≠ 0 and IOP(3) is such that maxfun < 1,
or IOP(1) ≠ 0 and IOP(4) is such that acc ≥ 1.0,
or IOP(1) ≠ 0 and IOP(4) is such that acc < machine precision,
or IOP(1) ≠ 0 and IOP(5) is such that eps ≥ 1.0,
or IOP(1) ≠ 0 and IOP(5) is such that eps < machine precision,
or MATRIX = 'C' and LWK < (5×NVAR×NVAR+33×NVAR-4)/2,
or MATRIX = 'D' or 'S' and LWK < max((5×NVAR×NVAR+33×NVAR-4)/2, N×NVAR+7×NVAR+NVAR×(NVAR-1)/2).
IFAIL = 2
On entry, WEIGHT = 'W' and a value of WT < 0.0.
IFAIL = 3
On entry, there are not exactly NVAR elements of ISX ≥ 1, or the effective number of observations ≤ NVAR.
IFAIL = 4
On entry, MATRIX = 'D' or 'S' and the data matrix is not of full column rank, or MATRIX = 'C' and the input correlation/variance-covariance matrix is not positive definite.
This exit may also be caused by two of the eigenvalues of $S^{*}$ being equal; this is rare (see Lawley and Maxwell (1971)), and may be due to the data/correlation matrix being almost singular.
IFAIL = 5
A singular value decomposition has failed to converge. This is a very unlikely error exit.
IFAIL = 6
The estimation procedure has failed to converge in the given number of iterations. Change IOP to either increase the number of iterations, maxfun, or increase the value of acc.
IFAIL = 7
The convergence is not certain but a lower point could not be found. See E04LBF for further details. In this case all results are computed.
7 Accuracy
The accuracy achieved is discussed in E04LBF with the value of the parameter XTOL given by acc as described in parameter IOP.
8 Further Comments
The factor loadings may be orthogonally rotated by using G03BAF and factor score coefficients can be computed using G03CCF. The maximum likelihood estimators are invariant to a change in scale. This means that the results obtained will be the same (up to a scaling factor) if either the correlation matrix or the variance-covariance matrix is used. As the correlation matrix ensures that all values of $\psi_i$ are between 0 and 1 it will lead to a more efficient optimization. In the situation when the data matrix is input the results are always computed for the correlation matrix and then scaled if the results for the covariance matrix are required. When you input the covariance/correlation matrix the input matrix itself is used and you are advised to input the correlation matrix rather than the covariance matrix.
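Following the advice above, a covariance matrix can be converted to a correlation matrix before it is passed in, and correlation-scale loadings can be mapped back afterwards. The Python sketch below assumes the usual relation: dividing each covariance by the product of the two standard deviations gives the correlation, and multiplying row $i$ of the loadings by the $i$th standard deviation returns them to the covariance scale. The function names and numbers are hypothetical.

```python
import math


def cov_to_corr(cov):
    """Return (correlation matrix, standard deviations) for a covariance matrix."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]
    return corr, sd


def rescale_loadings(loadings, sd):
    """Map correlation-scale loadings to the covariance scale, row by row."""
    return [[sd[i] * l for l in row] for i, row in enumerate(loadings)]


# Illustrative values only.
cov = [[4.0, 1.2], [1.2, 9.0]]
corr, sd = cov_to_corr(cov)
print(corr)
print(rescale_loadings([[0.5], [0.4]], sd))
```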
9 Example
This example is taken from
Lawley and Maxwell (1971). The correlation matrix for nine variables is input and the parameters of a factor analysis model with three factors are estimated and printed.
9.1 Program Text
Program Text (g03cafe.f90)
9.2 Program Data
Program Data (g03cafe.d)
9.3 Program Results
Program Results (g03cafe.r)