NAG FL Interface
g11saf (binary)

1 Purpose

g11saf fits a latent variable model (with a single factor) to data consisting of a set of measurements on individuals in the form of binary-valued sequences (generally referred to as score patterns). Various measures of goodness-of-fit are calculated along with the factor (theta) scores.

2 Specification

Fortran Interface
Subroutine g11saf ( ip, n, gprob, ns, x, ldx, irl, a, c, iprint, cgetol, maxit, chisqr, ishow, niter, alpha, pigam, cm, ldcm, g, expp, ldexpp, obs, exf, y, xl, iob, rlogl, chi, idf, siglev, w, lw, ifail)
Integer, Intent (In) :: ip, n, ns, ldx, iprint, maxit, ishow, ldcm, ldexpp, lw
Integer, Intent (Inout) :: irl(ns), ifail
Integer, Intent (Out) :: niter, iob(ns), idf
Real (Kind=nag_wp), Intent (In) :: cgetol
Real (Kind=nag_wp), Intent (Inout) :: a(ip), c(ip), cm(ldcm,2*ip), expp(ldexpp,ip), obs(ldexpp,ip)
Real (Kind=nag_wp), Intent (Out) :: alpha(ip), pigam(ip), g(2*ip), exf(ns), y(ns), xl(ns), rlogl, chi, siglev, w(lw)
Logical, Intent (In) :: gprob, chisqr
Logical, Intent (Inout) :: x(ldx,ip)
C Header Interface
#include <nag.h>
void  g11saf_ (const Integer *ip, const Integer *n, const logical *gprob, const Integer *ns, logical x[], const Integer *ldx, Integer irl[], double a[], double c[], const Integer *iprint, const double *cgetol, const Integer *maxit, const logical *chisqr, const Integer *ishow, Integer *niter, double alpha[], double pigam[], double cm[], const Integer *ldcm, double g[], double expp[], const Integer *ldexpp, double obs[], double exf[], double y[], double xl[], Integer iob[], double *rlogl, double *chi, Integer *idf, double *siglev, double w[], const Integer *lw, Integer *ifail)
The routine may be called by the names g11saf or nagf_contab_binary.

3 Description

Given a set of $p$ dichotomous variables $\tilde{x} = (x_1, x_2, \ldots, x_p)'$, where $'$ denotes vector or matrix transpose, the objective is to investigate whether the association between them can be adequately explained by a latent variable model of the form (see Bartholomew (1980) and Bartholomew (1987))
$$G\{\pi_i(\theta)\} = \alpha_{i0} + \alpha_{i1}\theta. \qquad (1)$$
The $x_i$ are called item responses and take the value 0 or 1. $\theta$ denotes the latent variable, assumed to have a standard Normal distribution over a population of individuals to be tested on $p$ items. Call $\pi_i(\theta) = P(x_i = 1 \mid \theta)$ the item response function: it represents the probability that an individual with latent ability $\theta$ will produce a positive response (1) to item $i$. $\alpha_{i0}$ and $\alpha_{i1}$ are item parameters which can assume any real values. The set of parameters $\alpha_{i1}$, for $i = 1, 2, \ldots, p$, being coefficients of the unobserved variable $\theta$, can be interpreted as ‘factor loadings’.
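$G$ is a function selected by you as either $\Phi^{-1}$ or logit, mapping the interval $(0,1)$ onto the whole real line. Data from a random sample of $n$ individuals takes the form of the matrices $X$ and $R$ defined below:
$$X_{s \times p} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{s1} & x_{s2} & \cdots & x_{sp} \end{pmatrix} = \begin{pmatrix} \tilde{x}_1' \\ \tilde{x}_2' \\ \vdots \\ \tilde{x}_s' \end{pmatrix}, \quad R_{s \times 1} = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_s \end{pmatrix}$$
where $\tilde{x}_l = (x_{l1}, x_{l2}, \ldots, x_{lp})'$ denotes the $l$th score pattern in the sample, $r_l$ the frequency with which $\tilde{x}_l$ occurs and $s$ the number of different score patterns observed. (Thus $\sum_{l=1}^{s} r_l = n$.) It can be shown that the log-likelihood function is proportional to
$$\sum_{l=1}^{s} r_l \log P_l,$$
where
$$P_l = P(\tilde{x} = \tilde{x}_l) = \int_{-\infty}^{\infty} P(\tilde{x} = \tilde{x}_l \mid \theta)\, \phi(\theta)\, d\theta \qquad (2)$$
($\phi(\theta)$ being the probability density function of a standard Normal random variable).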
$P_l$ denotes the unconditional probability of observing score pattern $\tilde{x}_l$. The integral in (2) is approximated using Gauss–Hermite quadrature. If we take $G(z) = \mathrm{logit}(z) = \log\left(\frac{z}{1-z}\right)$ in (1) and reparameterise as follows,
$$\alpha_i = \alpha_{i1}, \quad \pi_i = \mathrm{logit}^{-1}(\alpha_{i0}),$$
then (1) reduces to the logit model (see Bartholomew (1980))
$$\pi_i(\theta) = \frac{\pi_i}{\pi_i + (1 - \pi_i)\exp(-\alpha_i \theta)}.$$
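As a quick numerical check of this reparameterisation, the following Python sketch evaluates the logit item response function in both forms. The function names and the parameter values are illustrative assumptions and are not part of the NAG Library.

```python
import math

def inv_logit(z):
    """Inverse of logit(z) = log(z / (1 - z))."""
    return 1.0 / (1.0 + math.exp(-z))

def response_original(alpha0, alpha1, theta):
    """pi_i(theta) from model (1) with G = logit."""
    return inv_logit(alpha0 + alpha1 * theta)

def response_reparam(pi_i, alpha_i, theta):
    """pi_i(theta) in the reparameterised logit form."""
    return pi_i / (pi_i + (1.0 - pi_i) * math.exp(-alpha_i * theta))

# Hypothetical item parameters, chosen only for illustration.
alpha0, alpha1 = -0.4, 1.2
pi_i, alpha_i = inv_logit(alpha0), alpha1

for theta in (-2.0, 0.0, 1.5):
    a = response_original(alpha0, alpha1, theta)
    b = response_reparam(pi_i, alpha_i, theta)
    print(f"theta={theta:+.1f}  original={a:.6f}  reparameterised={b:.6f}")
```

At theta = 0 both forms return pi_i, so pi_i can be read as the probability of a positive response from an individual at the centre of the latent distribution.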
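If we take $G(z) = \Phi^{-1}(z)$ (where $\Phi$ is the cumulative distribution function of a standard Normal random variable) and reparameterise as follows,
$$\alpha_i = \frac{\alpha_{i1}}{\sqrt{1 + \alpha_{i1}^2}}, \quad \gamma_i = \frac{-\alpha_{i0}}{\sqrt{1 + \alpha_{i1}^2}},$$
then (1) reduces to the probit model (see Bock and Aitkin (1981))
$$\pi_i(\theta) = \Phi\left(\frac{\alpha_i \theta - \gamma_i}{\sqrt{1 - \alpha_i^2}}\right).$$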
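The probit reparameterisation can be checked in the same way. This is a minimal sketch that assumes the response function is the standard Normal distribution function applied to (alpha_i theta - gamma_i)/sqrt(1 - alpha_i^2); the helper names and parameter values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard Normal cumulative distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def reparameterise(alpha0, alpha1):
    """Map (alpha_i0, alpha_i1) to (alpha_i, gamma_i) for the probit model."""
    scale = math.sqrt(1.0 + alpha1 ** 2)
    return alpha1 / scale, -alpha0 / scale

def response_original(alpha0, alpha1, theta):
    """pi_i(theta) from model (1) with G = inverse Normal CDF."""
    return norm_cdf(alpha0 + alpha1 * theta)

def response_reparam(alpha_i, gamma_i, theta):
    """pi_i(theta) in the reparameterised probit form."""
    return norm_cdf((alpha_i * theta - gamma_i) / math.sqrt(1.0 - alpha_i ** 2))

# Hypothetical item parameters, chosen only for illustration.
alpha0, alpha1 = 0.3, 0.8
alpha_i, gamma_i = reparameterise(alpha0, alpha1)

for theta in (-1.0, 0.0, 2.0):
    a = response_original(alpha0, alpha1, theta)
    b = response_reparam(alpha_i, gamma_i, theta)
    print(f"theta={theta:+.1f}  original={a:.6f}  reparameterised={b:.6f}")
```

Note that the reparameterised loading always satisfies |alpha_i| < 1, so the square root in the denominator is well defined.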
An E-M algorithm (see Bock and Aitkin (1981)) is used to maximize the log-likelihood function. The number of quadrature points used is set initially to 10 and once convergence is attained increased to 20.
The theta score of an individual responding in score pattern $\tilde{x}_l$ is computed as the posterior mean, i.e., $E(\theta \mid \tilde{x}_l)$. For the logit model the component score $X_l = \sum_{j=1}^{p} \alpha_j x_{lj}$ is also calculated. (Note that in calculating the theta scores and measures of goodness-of-fit g11saf automatically reverses the coding on item $j$ if $\alpha_j < 0$; it is assumed in the model that a response at the one level is showing a higher measure of latent ability than a response at the zero level.)
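The quadrature approximation of (2), the log-likelihood kernel and the posterior-mean theta score can be sketched in Python as follows. This is not the algorithm used inside g11saf: the Gauss–Hermite nodes are found here by simple bisection, the items are assumed to be independent given theta (the usual latent variable assumption), and all parameter values, frequencies and function names are hypothetical.

```python
import itertools
import math

def hermite_e(n, x):
    """Return (He_n(x), He_{n-1}(x)) using He_{k+1} = x*He_k - k*He_{k-1}."""
    prev, cur = 0.0, 1.0
    for k in range(n):
        prev, cur = cur, x * cur - k * prev
    return cur, prev

def gauss_hermite_normal(n):
    """Nodes t_k, weights w_k with sum(w_k*f(t_k)) ~ E[f(theta)], theta ~ N(0,1)."""
    bound = math.sqrt(4 * n + 2)   # the roots of He_n lie inside (-bound, bound)
    cells = 2001                   # odd, so no grid point falls on a root at 0
    h = 2.0 * bound / cells
    nodes = []
    for k in range(cells):
        a, b = -bound + k * h, -bound + (k + 1) * h
        fa = hermite_e(n, a)[0]
        if fa * hermite_e(n, b)[0] < 0.0:   # sign change: one root in (a, b)
            for _ in range(100):            # plain bisection
                mid = 0.5 * (a + b)
                fm = hermite_e(n, mid)[0]
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            nodes.append(0.5 * (a + b))
    weights = [math.factorial(n) / (n * hermite_e(n, t)[1]) ** 2 for t in nodes]
    return nodes, weights

def pattern_prob(pattern, alpha0, alpha1, theta):
    """P(x = pattern | theta) for the logit form of model (1)."""
    prob = 1.0
    for x, a0, a1 in zip(pattern, alpha0, alpha1):
        p = 1.0 / (1.0 + math.exp(-(a0 + a1 * theta)))
        prob *= p if x else 1.0 - p
    return prob

def marginal_and_score(pattern, alpha0, alpha1, nodes, weights):
    """Return (P_l, E[theta | pattern]) evaluated by quadrature."""
    terms = [w * pattern_prob(pattern, alpha0, alpha1, t)
             for t, w in zip(nodes, weights)]
    p_l = sum(terms)
    score = sum(t * term for t, term in zip(nodes, terms)) / p_l
    return p_l, score

# Hypothetical item parameters for p = 3 items.
alpha0 = [0.0, -0.5, 0.8]
alpha1 = [0.5, 1.0, 1.5]
nodes, weights = gauss_hermite_normal(20)

patterns = list(itertools.product((0, 1), repeat=3))
results = {pat: marginal_and_score(pat, alpha0, alpha1, nodes, weights)
           for pat in patterns}
total = sum(p for p, _ in results.values())
print(f"sum of P_l over all 2^p patterns = {total:.12f}")
for pat in ((0, 0, 0), (1, 1, 1)):
    print(pat, "P_l = %.4f, theta score = %+.4f" % results[pat])

# Log-likelihood kernel sum_l r_l*log(P_l) for hypothetical frequencies r_l.
freqs = dict(zip(patterns, (30, 12, 10, 9, 14, 8, 7, 10)))
kernel = sum(r * math.log(results[pat][0]) for pat, r in freqs.items())
print(f"log-likelihood kernel = {kernel:.4f}")
```

Because the weights sum to one and the conditional pattern probabilities sum to one at every node, the P_l over all possible patterns sum to one, which is a convenient sanity check on any implementation.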
The frequency distribution of score patterns is required as input data. If your data is in the form of individual score patterns (uncounted), then g11sbf may be used to calculate the frequency distribution.
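The tallying step can be illustrated with a short Python sketch. It only shows the idea of collapsing individual responses into distinct score patterns and frequencies; it makes no claim about the interface of g11sbf or the order in which that routine returns patterns. The responses are hypothetical.

```python
from collections import Counter

# Hypothetical raw data: one tuple of 0/1 item responses per individual.
responses = [
    (0, 1, 1), (1, 1, 1), (0, 1, 1), (0, 0, 0),
    (1, 1, 1), (0, 1, 1), (0, 0, 0),
]

counts = Counter(responses)
patterns = sorted(counts)                    # the s distinct score patterns
frequencies = [counts[p] for p in patterns]  # r_l for each pattern

for pattern, freq in zip(patterns, frequencies):
    print(pattern, freq)
print("s =", len(patterns), " n =", sum(frequencies))
```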

4 References

Bartholomew D J (1980) Factor analysis for categorical data (with Discussion) J. Roy. Statist. Soc. Ser. B 42 293–321
Bartholomew D J (1987) Latent Variable Models and Factor Analysis Griffin
Bock R D and Aitkin M (1981) Marginal maximum likelihood estimation of item parameters: Application of an E-M algorithm Psychometrika 46 443–459

5 Arguments

1: ip Integer Input
On entry: p, the number of dichotomous variables.
Constraint: ip ≥ 3.
2: n Integer Input
On entry: n, the number of individuals in the sample.
Constraint: n ≥ 7.
3: gprob Logical Input
On entry: gprob must be set equal to .TRUE. if $G(z) = \Phi^{-1}(z)$ and .FALSE. if $G(z) = \mathrm{logit}(z)$.
4: ns Integer Input
On entry: ns must be set equal to the number of different score patterns in the sample, s.
Constraint: 2×ip < ns ≤ min(2^ip, n).
5: x(ldx,ip) Logical array Input/Output
On entry: the first s rows of x must contain the s different score patterns. The lth row of x must contain the lth score pattern with x(l,j) set equal to .TRUE. if $x_{lj} = 1$ and .FALSE. if $x_{lj} = 0$. All rows of x must be distinct.
On exit: given a valid parameter set then the first s rows of x still contain the s different score patterns. However, the following points should be noted:
  (i) If the estimated factor loading for the jth item is negative then that item is re-coded, i.e., 0s and 1s (or .FALSE. and .TRUE.) in the jth column of x are interchanged.
  (ii) The rows of x will be reordered so that the theta scores corresponding to rows of x are in increasing order of magnitude.
6: ldx Integer Input
On entry: the first dimension of the array x as declared in the (sub)program from which g11saf is called.
Constraint: ldx ≥ ns.
7: irl(ns) Integer array Input/Output
On entry: the ith component of irl must be set equal to the frequency with which the ith row of x occurs.
Constraints:
  • irl(i) ≥ 0, for i = 1, 2, …, s;
  • $\sum_{i=1}^{s} \mathrm{irl}(i) = n$.
On exit: given a valid parameter set then the first s components of irl are reordered as are the rows of x.
8: a(ip) Real (Kind=nag_wp) array Input/Output
On entry: a(j) must be set equal to an initial estimate of $\alpha_{j1}$. In order to avoid divergence problems with the E-M algorithm you are strongly advised to set all the a(j) to 0.5.
On exit: a(j) contains the latest estimate of $\alpha_{j1}$, for j = 1, 2, …, p. (Because of possible recoding all elements of a will be positive.)
9: c(ip) Real (Kind=nag_wp) array Input/Output
On entry: c(j) must be set equal to an initial estimate of $\alpha_{j0}$. In order to avoid divergence problems with the E-M algorithm you are strongly advised to set all the c(j) to 0.0.
On exit: c(j) contains the latest estimate of $\alpha_{j0}$, for j = 1, 2, …, p.
10: iprint Integer Input
On entry: the frequency with which the maximum likelihood search routine is to be monitored.
iprint>0
The search is monitored once every iprint iterations, and when the number of quadrature points is increased, and again at the final solution point.
iprint=0
The search is monitored once at the final point.
iprint<0
The search is not monitored at all.
iprint should normally be set to a small positive number.
Suggested value: iprint=1.
11: cgetol Real (Kind=nag_wp) Input
On entry: the accuracy to which the solution is required.
If cgetol is set to $10^{-l}$ and on exit ifail=0 or 7, then all elements of the gradient vector will be smaller than $10^{-l}$ in absolute value. For most practical purposes the value $10^{-4}$ should suffice. You should be wary of setting cgetol too small since the convergence criterion may then have become too strict for the machine to handle.
If cgetol has been set to a value which is less than the square root of the machine precision, $\epsilon$, then g11saf will use the value $\sqrt{\epsilon}$ instead.
12: maxit Integer Input
On entry: the maximum number of iterations to be made in the maximum likelihood search. There will be an error exit (see Section 6) if the search routine has not converged in maxit iterations.
Suggested value: maxit=1000.
Constraint: maxit ≥ 1.
13: chisqr Logical Input
On entry: if chisqr is set equal to .TRUE., a likelihood ratio statistic will be calculated (see chi).
If chisqr is set equal to .FALSE., no such statistic will be calculated.
14: ishow Integer Input
On entry: indicates which of the following three quantities are to be printed before exit from the routine (given a valid parameter set):
  (a) Table of maximum likelihood estimates and standard errors (as returned in the output arrays a, c, alpha, pigam and cm).
  (b) Table of observed and expected first- and second-order margins (as returned in the output arrays expp and obs).
  (c) Table of observed and expected frequencies of score patterns along with theta scores (as returned in the output arrays irl, exf, y, xl and iob) and the likelihood ratio statistic (if required).
ishow=0
None of the above are printed.
ishow=1
(a) only is printed.
ishow=2
(b) only is printed.
ishow=3
(c) only is printed.
ishow=4
(a) and (b) are printed.
ishow=5
(a) and (c) are printed.
ishow=6
(b) and (c) are printed.
ishow=7
(a), (b) and (c) are printed.
Constraint: 0 ≤ ishow ≤ 7.
15: niter Integer Output
On exit: given a valid parameter set then niter contains the number of iterations performed by the maximum likelihood search routine.
16: alpha(ip) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then alpha(j) contains the latest estimate of $\alpha_j$. (Because of possible recoding all elements of alpha will be positive.)
17: pigam(ip) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then pigam(j) contains the latest estimate of either $\pi_j$ if gprob=.FALSE. (logit model) or $\gamma_j$ if gprob=.TRUE. (probit model).
18: cm(ldcm,2×ip) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then the strict lower triangle of cm contains the correlation matrix of the parameter estimates held in alpha and pigam on exit. The diagonal elements of cm contain the standard errors. Thus:
cm(2×i-1,2×i-1) = standard error (alpha(i))
cm(2×i,2×i) = standard error (pigam(i))
cm(2×i,2×i-1) = correlation (pigam(i),alpha(i)),
for i = 1, 2, …, p;
cm(2×i-1,2×j-1) = correlation (alpha(i),alpha(j))
cm(2×i,2×j) = correlation (pigam(i),pigam(j))
cm(2×i-1,2×j) = correlation (alpha(i),pigam(j))
cm(2×i,2×j-1) = correlation (alpha(j),pigam(i)),
for j = 1, 2, …, i-1.
If the second derivative matrix cannot be computed then all the elements of cm are returned as zero.
19: ldcm Integer Input
On entry: the first dimension of the array cm as declared in the (sub)program from which g11saf is called.
Constraint: ldcm ≥ 2×ip.
20: g(2×ip) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then g contains the estimated gradient vector corresponding to the final point held in the arrays alpha and pigam. g(2×j-1) contains the derivative of the log-likelihood with respect to alpha(j), for j = 1, 2, …, p. g(2×j) contains the derivative of the log-likelihood with respect to pigam(j), for j = 1, 2, …, p.
21: expp(ldexpp,ip) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then expp(i,j) contains the expected percentage of individuals in the sample who respond positively to items i and j (j ≥ i), corresponding to the final point held in the arrays alpha and pigam.
22: ldexpp Integer Input
On entry: the first dimension of the array obs and the first dimension of the array expp as declared in the (sub)program from which g11saf is called.
Constraint: ldexpp ≥ ip.
23: obs(ldexpp,ip) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then obs(i,j) contains the observed percentage of individuals in the sample who responded positively to items i and j (j ≥ i).
24: exf(ns) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then exf(l) contains the expected frequency of the lth score pattern (lth row of x), corresponding to the final point held in the arrays alpha and pigam.
25: y(ns) Real (Kind=nag_wp) array Output
On exit: given a valid parameter set then y(l) contains the estimated theta score corresponding to the lth row of x, for the final point held in the arrays alpha and pigam.
26: xl(ns) Real (Kind=nag_wp) array Output
On exit: if gprob has been set equal to .FALSE. (logit model) then, given a valid parameter set, xl(l) contains the estimated component score corresponding to the lth row of x for the final point held in the arrays alpha and pigam.
If gprob is set equal to .TRUE. (probit model), this array is not used.
27: iob(ns) Integer array Output
On exit: given a valid parameter set then iob(l) contains the number of items in the lth row of x for which the response was positive (.TRUE.).
28: rlogl Real (Kind=nag_wp) Output
On exit: given a valid parameter set then rlogl contains the value of the log-likelihood kernel corresponding to the final point held in the arrays alpha and pigam, namely
$$\sum_{l=1}^{s} \mathrm{irl}(l) \times \log(\mathrm{exf}(l)/n).$$
29: chi Real (Kind=nag_wp) Output
On exit: if chisqr was set equal to .TRUE. on entry, then given a valid parameter set, chi will contain the value of the likelihood ratio statistic corresponding to the final parameter estimates held in the arrays alpha and pigam, namely
$$2 \times \sum_{l=1}^{s} \mathrm{irl}(l) \times \log(\mathrm{exf}(l)/\mathrm{irl}(l)).$$
The summation is over those elements of irl which are positive. If exf(l) is less than 5.0, then adjacent score patterns are pooled (the score patterns in x being first put in order of increasing theta score).
If chisqr has been set equal to .FALSE., then chi is not used.
30: idf Integer Output
On exit: if chisqr was set equal to .TRUE. on entry, then given a valid parameter set, idf will contain the degrees of freedom associated with the likelihood ratio statistic, chi.
idf = $s_0$ - 2×p if $s_0 < 2^p$;
idf = $s_0$ - 2×p - 1 if $s_0 = 2^p$,
where $s_0$ denotes the number of terms summed to calculate chi ($s_0 = s$ only if there is no pooling).
If chisqr has been set equal to .FALSE., idf is not used.
31: siglev Real (Kind=nag_wp) Output
On exit: if chisqr was set equal to .TRUE. on entry, then given a valid parameter set, siglev will contain the significance level of chi based on idf degrees of freedom. If idf is zero or negative then siglev is set to zero.
If chisqr was set equal to .FALSE., siglev is not used.
32: w(lw) Real (Kind=nag_wp) array Workspace
33: lw Integer Input
On entry: the dimension of the array w as declared in the (sub)program from which g11saf is called.
Constraint: lw ≥ 4×ip×(ip+16).
34: ifail Integer Input/Output
On entry: ifail must be set to 0, -1 or 1. If you are unfamiliar with this argument you should refer to Section 4 in the Introduction to the NAG Library FL Interface for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, because for this routine the values of the output arguments may be useful even if ifail ≠ 0 on exit, the recommended value is -1. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).
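Before calling the routine it can be helpful to check the documented input constraints in one place. The following Python sketch collects the constraints stated in this section; the function name, the data and the frequencies are hypothetical, and the check is only an illustration of the constraints, not a substitute for the routine's own validation.

```python
import itertools

def check_inputs(ip, n, patterns, irl, ldx, ldcm, ldexpp, lw, maxit, ishow):
    """Return the names of the documented constraints that are violated."""
    ns = len(patterns)
    checks = [
        ("ip >= 3", ip >= 3),
        ("n >= 7", n >= 7),
        ("2*ip < ns <= min(2**ip, n)", 2 * ip < ns <= min(2 ** ip, n)),
        ("rows of x distinct", len(set(patterns)) == ns),
        ("irl(i) >= 0", all(r >= 0 for r in irl)),
        ("sum(irl) == n", sum(irl) == n),
        ("ldx >= ns", ldx >= ns),
        ("ldcm >= 2*ip", ldcm >= 2 * ip),
        ("ldexpp >= ip", ldexpp >= ip),
        ("lw >= 4*ip*(ip+16)", lw >= 4 * ip * (ip + 16)),
        ("maxit >= 1", maxit >= 1),
        ("0 <= ishow <= 7", 0 <= ishow <= 7),
    ]
    return [name for name, ok in checks if not ok]

# Hypothetical problem with ip = 3 items and all 2**3 = 8 score patterns.
ip = 3
patterns = list(itertools.product((False, True), repeat=ip))
irl = [30, 12, 10, 9, 14, 8, 7, 10]   # hypothetical frequencies
n = sum(irl)

ok = check_inputs(ip, n, patterns, irl, ldx=8, ldcm=6, ldexpp=3,
                  lw=4 * ip * (ip + 16), maxit=1000, ishow=7)
bad = check_inputs(ip, n, patterns, irl, ldx=8, ldcm=6, ldexpp=3,
                   lw=227, maxit=1000, ishow=7)
print("violations with minimum workspace:", ok)
print("violations with lw = 227:", bad)
```

For ip = 3 the minimum workspace length is 4×3×19 = 228, so lw = 227 is rejected.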

6 Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
Note: in some cases g11saf may return useful information.
ifail=1
On entry, i=value and irl(i)=value.
Constraint: irl(i) ≥ 0.
On entry, i=value and j=value.
Constraint: rows i and j of x should not be identical.
On entry, ip=value.
Constraint: ip ≥ 3.
On entry, ishow=value.
Constraint: ishow=0, 1, 2, 3, 4, 5, 6 or 7.
On entry, ldcm=value and ip=value.
Constraint: ldcm ≥ 2×ip.
On entry, ldexpp=value and ip=value.
Constraint: ldexpp ≥ ip.
On entry, ldx=value and ns=value.
Constraint: ldx ≥ ns.
On entry, lw=value and minimum value for lw=value.
Constraint: lw ≥ 4×ip×(ip+16).
On entry, maxit=value.
Constraint: maxit ≥ 1.
On entry, n=value.
Constraint: n ≥ 7.
On entry, ns=value and ip=value.
Constraint: ns>2×ip.
On entry, ns=value and ip=value.
Constraint: ns ≤ 2^ip.
On entry, ns=value and n=value.
Constraint: ns ≤ n.
On entry, $\sum_i \mathrm{irl}(i)$=value and n=value.
Constraint: $\sum_i \mathrm{irl}(i) = n$.
ifail=2
For at least one of the ip items the responses are all at the same level.
ifail=3
maxit iterations have been performed: maxit=value. If steady increases in the log-likelihood kernel were monitored up to the point where this exit occurred, the exit probably occurred simply because maxit was set too small, so the calculations should be restarted from the final point held in a and c. This type of exit may also indicate that there is no maximum to the likelihood surface.
ifail=4
One of the elements of a has exceeded 10 in absolute value. This is the Heywood case as described in Section 9.3. If steady increases in the log-likelihood kernel were monitored up to the point where this exit occurred then this exit may indicate that there is no maximum to the likelihood surface. You are advised to restart the calculations from a different point to see whether the E-M algorithm moves off in the same direction.
ifail=5
Failure to invert Hessian matrix and maxit iterations made: maxit=value. This indicates a failure to invert the second derivative matrix for calculating the variance-covariance matrix of parameter estimates in the specified number of iterations. The elements of cm have been set to zero. Try restarting the calculations with a larger value for maxit.
ifail=6
Failure to invert Hessian matrix plus Heywood case encountered. This indicates a failure to invert the second derivative matrix for calculating the variance-covariance matrix of parameter estimates. In addition, an element of a has exceeded 10 in absolute value. The elements of cm will have then been set to zero on exit. You are advised to restart the calculations from a different point to see whether the E-M algorithm moves off in the same direction.
ifail=7
The $\chi^2$ statistic has less than one degree of freedom. The $\chi^2$ statistic is meaningless and siglev is set to zero. All other returned information should be correct.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

On exit from g11saf if ifail=0 or 7 then the following condition will be satisfied:
$$\max_{1 \le i \le 2 \times p} |\mathrm{g}(i)| < \mathrm{cgetol}.$$
If ifail=3 or 5 on exit (i.e., maxit iterations have been performed but the above condition does not hold), then the elements in a, c, alpha and pigam may still be good approximations to the maximum likelihood estimates. You are advised to inspect the elements of g to see whether this is confirmed.
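Inspecting g amounts to comparing its largest absolute element with the tolerance. A minimal Python sketch follows; it assumes that tolerances below the square root of the machine precision are replaced by that square root, and it uses Python's double precision epsilon, which may differ from the machine precision value used by the NAG Library. The gradient values are hypothetical.

```python
import math
import sys

def effective_tolerance(cgetol, eps=sys.float_info.epsilon):
    """Tolerance assumed to be applied: sqrt(eps) replaces smaller values."""
    return max(cgetol, math.sqrt(eps))

def gradient_converged(g, cgetol):
    """True if every element of g is smaller than the tolerance in absolute value."""
    return max(abs(gi) for gi in g) < effective_tolerance(cgetol)

# Hypothetical gradient vectors of length 2*p for p = 3 items.
g_good = [1.2e-5, -3.0e-6, 4.1e-5, -8.8e-6, 2.0e-5, 9.9e-5]
g_poor = [1.2e-5, -3.0e-6, 2.4e-4, -8.8e-6, 2.0e-5, 9.9e-5]

print(gradient_converged(g_good, 1.0e-4))
print(gradient_converged(g_poor, 1.0e-4))
```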

8 Parallelism and Performance

g11saf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g11saf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

9.1 Timing

The number of iterations required in the maximum likelihood search depends upon the number of observed variables, p, and the distance of the starting point you supplied from the solution. The number of multiplications and divisions performed in an iteration is proportional to p.

9.2 Initial Estimates

You are strongly advised to use the recommended starting values for the elements of a and c. Divergence may result from values you supplied even if they are very close to the solution. Divergence may also occur when an item has nearly all its responses at one level.

9.3 Heywood Cases

As in normal factor analysis, Heywood cases can often occur, particularly when $p$ is small and $n$ not very big. To overcome this difficulty the maximum likelihood search routine is terminated when the absolute value of one of the $\alpha_{j1}$ exceeds 10.0. You have the option of deciding whether to exit from g11saf (by setting ifail=0 on entry) or to permit g11saf to proceed onwards as if it had exited normally from the maximum likelihood search routine (setting ifail=-1 on entry). The elements in a, c, alpha and pigam may still be good approximations to the maximum likelihood estimates. You are advised to inspect the elements of g to see whether this is confirmed.
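The Heywood condition itself is easy to test on the returned loadings. The following Python sketch flags the items whose loading exceeds the limit of 10.0 in absolute value; the function name and the values of a are hypothetical.

```python
HEYWOOD_LIMIT = 10.0

def heywood_items(a, limit=HEYWOOD_LIMIT):
    """Return the 1-based indices j for which |a(j)| exceeds the limit."""
    return [j for j, aj in enumerate(a, start=1) if abs(aj) > limit]

# Hypothetical loadings as they might be held in a on exit.
a = [0.9, 10.4, -12.0, 1.3]
print("items beyond the Heywood limit:", heywood_items(a))
```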

9.4 Goodness of Fit Statistic

When n is not very large compared to s a goodness-of-fit statistic should not be calculated as many of the expected frequencies will then be less than 5.
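Two quick checks relate to this advice: how many expected frequencies fall below 5, and how many degrees of freedom remain. The Python sketch below reads the rule given for idf as comparing the number of summed terms with 2 to the power p; the function names and the expected frequencies are hypothetical, and the pooling performed by g11saf is not reproduced.

```python
def degrees_of_freedom(s0, p):
    """idf for s0 summed terms and p items, following the rule given for idf."""
    if s0 > 2 ** p:
        raise ValueError("s0 cannot exceed the number of possible patterns")
    return s0 - 2 * p - 1 if s0 == 2 ** p else s0 - 2 * p

def small_cells(exf, threshold=5.0):
    """Return the 1-based indices of expected frequencies below the threshold."""
    return [l for l, e in enumerate(exf, start=1) if e < threshold]

# Hypothetical expected frequencies for eight score patterns.
exf = [28.7, 3.9, 11.2, 4.6, 15.0, 9.1, 6.4, 21.1]
print("cells with expected frequency below 5:", small_cells(exf))
print("idf with all 16 patterns of p = 4 items:", degrees_of_freedom(16, 4))
print("idf after pooling down to 12 terms:", degrees_of_freedom(12, 4))
```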

9.5 First and Second Order Margins

The observed and expected percentages of sample members responding to individual and pairs of items held in the arrays obs and expp on exit can be converted to observed and expected numbers by multiplying all elements of these two arrays by n/100.0.
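The conversion is a single scaling, sketched here in Python. The table of percentages is hypothetical and only stands in for the contents of obs or expp.

```python
def percentages_to_counts(table, n):
    """Scale a table of percentages to numbers of individuals out of n."""
    return [[value * n / 100.0 for value in row] for row in table]

# Hypothetical percentages for two items (diagonal: single items).
obs_percent = [[25.9, 19.1],
               [19.1, 57.7]]
counts = percentages_to_counts(obs_percent, n=1000)
for row in counts:
    print(["%.1f" % value for value in row])
```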

10 Example

A program to fit the logit latent variable model to the following data:
Index   Score Pattern   Observed Frequency
  1     0000             154
  2     1000              11
  3     0001              42
  4     0100              49
  5     1001               2
  6     1100              10
  7     0101              27
  8     0010              84
  9     1101              10
 10     1010              25
 11     0011              75
 12     0110             129
 13     1011              30
 14     1110              50
 15     0111             181
 16     1111             121
                        ––––
Total                   1000

10.1 Program Text

Program Text (g11safe.f90)

10.2 Program Data

Program Data (g11safe.d)

10.3 Program Results

Program Results (g11safe.r)