naginterfaces.library.contab.binary

naginterfaces.library.contab.binary(n, gprob, x, irl, a, c, cgetol, chisqr, iprint=1, maxit=1000, ishow=0, io_manager=None)

binary fits a latent variable model (with a single factor) to data consisting of a set of measurements on individuals in the form of binary-valued sequences (generally referred to as score patterns). Various measures of goodness-of-fit are calculated along with the factor (theta) scores.

For full information please refer to the NAG Library document for g11sa

https://support.nag.com/numeric/nl/nagdoc_30/flhtml/g11/g11saf.html

Parameters
n : int

$n$, the number of individuals in the sample.

gprob : bool

Must be set equal to True if $G(z) = \Phi^{-1}(z)$ (probit model) and False if $G(z) = \operatorname{logit} z$ (logit model).

x : bool, array-like, shape (ns, ip)

The first $s$ rows of x must contain the $s$ different score patterns. The $l$th row of x must contain the $l$th score pattern, with x[l-1, j-1] set equal to True if $x_{lj} = 1$ and False if $x_{lj} = 0$. All rows of x must be distinct.

irl : int, array-like, shape (ns)

The $l$th component of irl must be set equal to the frequency with which the $l$th row of x occurs.

a : float, array-like, shape (ip)

a[j-1] must be set equal to an initial estimate of $\alpha_{j1}$. In order to avoid divergence problems with the E-M algorithm you are strongly advised to set all the a[j-1] to 0.5.

c : float, array-like, shape (ip)

c[j-1] must be set equal to an initial estimate of $\alpha_{j0}$. In order to avoid divergence problems with the E-M algorithm you are strongly advised to set all the c[j-1] to 0.0.

cgetol : float

The accuracy to which the solution is required.

If cgetol is set to $10^{-l}$ and on exit the function exits successfully or errno = 7, then all elements of the gradient vector will be smaller than $10^{-l}$ in absolute value.

For most practical purposes the value $10^{-4}$ should suffice.

You should be wary of setting cgetol too small since the convergence criterion may then have become too strict for the machine to handle.

If cgetol has been set to a value which is less than the square root of the machine precision, $\epsilon$, then binary will use the value $\sqrt{\epsilon}$ instead.
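The clamping rule in the last paragraph can be sketched as follows. This only illustrates the documented behaviour and is not the library's code: the helper name effective_cgetol is hypothetical, and sys.float_info.epsilon is used as a stand-in for the machine precision, which may differ from the value NAG uses by a small constant factor.

```python
import math
import sys


def effective_cgetol(cgetol, eps=sys.float_info.epsilon):
    """Return the tolerance that would actually be used (hypothetical helper).

    Values below sqrt(eps) are replaced by sqrt(eps), mirroring the
    documented rule; eps defaults to Python's float epsilon (an assumption).
    """
    return max(cgetol, math.sqrt(eps))


print(effective_cgetol(1e-4))   # unchanged: already above sqrt(eps)
print(effective_cgetol(1e-12))  # raised to sqrt(eps), roughly 1.5e-8
```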

chisqr : bool

If chisqr is set equal to True, a likelihood ratio statistic will be calculated (see chi).

If chisqr is set equal to False, no such statistic will be calculated.

iprint : int, optional

The frequency with which the maximum likelihood search function is to be monitored.

iprint > 0: The search is monitored once every iprint iterations, and when the number of quadrature points is increased, and again at the final solution point.

iprint = 0: The search is monitored once at the final point.

iprint < 0: The search is not monitored at all.

iprint should normally be set to a small positive number.

maxit : int, optional

The maximum number of iterations to be made in the maximum likelihood search. There will be an error exit (see Raises) if the search function has not converged in maxit iterations.

ishow : int, optional

Indicates which of the following three quantities are to be printed before exit from the function (given a valid parameter set):

  (a) Table of maximum likelihood estimates and standard errors (as returned in the output arrays a, c, alpha, pigam and cm).

  (b) Table of observed and expected first- and second-order margins (as returned in the output arrays expp and obs).

  (c) Table of observed and expected frequencies of score patterns along with theta scores (as returned in the output arrays irl, exf, y, xl and iob) and the likelihood ratio statistic (if required).

ishow = 0: None of the above are printed.

ishow = 1: (a) only is printed.

ishow = 2: (b) only is printed.

ishow = 3: (c) only is printed.

ishow = 4: (a) and (b) are printed.

ishow = 5: (a) and (c) are printed.

ishow = 6: (b) and (c) are printed.

ishow = 7: (a), (b) and (c) are printed.
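As a reading aid, the eight cases above can be written as a small lookup table. This is a sketch only: it assumes the ishow codes run from 0 to 7 in the order the cases are listed, and both ISHOW_TABLES and ishow_for are hypothetical names, not part of naginterfaces.

```python
# Hypothetical lookup: ishow code -> which tables (a), (b), (c) are printed.
# The codes 0..7 are assumed to follow the order of the cases listed above.
ISHOW_TABLES = {
    0: (),
    1: ("a",),
    2: ("b",),
    3: ("c",),
    4: ("a", "b"),
    5: ("a", "c"),
    6: ("b", "c"),
    7: ("a", "b", "c"),
}


def ishow_for(tables):
    """Return the ishow code that prints exactly the given set of tables."""
    wanted = tuple(sorted(set(tables)))
    for code, printed in ISHOW_TABLES.items():
        if printed == wanted:
            return code
    raise ValueError(f"unknown table selection: {tables!r}")


print(ishow_for(["c", "a"]))  # 5
```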

io_manager : FileObjManager, optional

Manager for I/O in this routine.

Returns
x : bool, ndarray, shape (ns, ip)

Given a valid parameter set then the first $s$ rows of x still contain the $s$ different score patterns. However, the following points should be noted:

  1. If the estimated factor loading for the $j$th item is negative then that item is re-coded, i.e., $0$s and $1$s (or True and False) in the $j$th column of x are interchanged.

  2. The rows of x will be reordered so that the theta scores corresponding to rows of x are in increasing order of magnitude.
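The two points above can be mimicked in plain Python to see what to expect on exit. This is an illustrative sketch, not the library's implementation: recode_and_reorder is a hypothetical helper, and the loadings and theta scores are made-up inputs that are simply taken as given.

```python
def recode_and_reorder(x, irl, loadings, theta):
    """Illustrative only: mimic the documented re-coding and reordering.

    x        : list of score patterns (lists of bool)
    irl      : frequency of each pattern
    loadings : estimated factor loading per item (hypothetical input)
    theta    : theta score per pattern (hypothetical input)
    """
    # Interchange True/False in every column whose loading is negative.
    flipped = [
        [(not v) if loadings[j] < 0 else v for j, v in enumerate(row)]
        for row in x
    ]
    # Sort the rows, and their frequencies, by increasing theta score.
    order = sorted(range(len(flipped)), key=lambda l: theta[l])
    return [flipped[l] for l in order], [irl[l] for l in order]


x = [[True, False], [False, True], [True, True]]
irl = [4, 3, 5]
x_out, irl_out = recode_and_reorder(
    x, irl, loadings=[0.8, -0.4], theta=[0.3, -1.0, -0.2]
)
print(x_out)    # [[False, False], [True, False], [True, True]]
print(irl_out)  # [3, 5, 4]
```

Because irl is permuted together with x, the frequencies stay attached to their score patterns.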

irl : int, ndarray, shape (ns)

Given a valid parameter set then the first $s$ components of irl are reordered as are the rows of x.

a : float, ndarray, shape (ip)

a[j-1] contains the latest estimate of $\alpha_{j1}$, for $j = 1, 2, \ldots, p$. (Because of possible recoding all elements of a will be positive.)

c : float, ndarray, shape (ip)

c[j-1] contains the latest estimate of $\alpha_{j0}$, for $j = 1, 2, \ldots, p$.

niter : int

Given a valid parameter set then niter contains the number of iterations performed by the maximum likelihood search function.

alpha : float, ndarray, shape (ip)

Given a valid parameter set then alpha[j-1] contains the latest estimate of $\alpha_j$, the reparameterised factor loading described in Notes. (Because of possible recoding all elements of alpha will be positive.)

pigam : float, ndarray, shape (ip)

Given a valid parameter set then pigam[j-1] contains the latest estimate of either $\pi_j$ if gprob = False (logit model) or $\gamma_j$ if gprob = True (probit model).

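cm : float, ndarray, shape (2*ip, 2*ip)

Given a valid parameter set then the strict lower triangle of cm contains the correlation matrix of the parameter estimates held in alpha and pigam on exit. The diagonal elements of cm contain the standard errors. Thus:

  cm[2*i-2, 2*i-2] = standard error (alpha[i-1])
  cm[2*i-1, 2*i-1] = standard error (pigam[i-1])
  cm[2*i-1, 2*i-2] = correlation (pigam[i-1], alpha[i-1]),

for $i = 1, 2, \ldots, p$;

  cm[2*i-2, 2*j-2] = correlation (alpha[i-1], alpha[j-1])
  cm[2*i-1, 2*j-1] = correlation (pigam[i-1], pigam[j-1])
  cm[2*i-2, 2*j-1] = correlation (alpha[i-1], pigam[j-1])
  cm[2*i-1, 2*j-2] = correlation (alpha[j-1], pigam[i-1]),

for $j = 1, 2, \ldots, i-1$.

If the second derivative matrix cannot be computed then all the elements of cm are returned as zero.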

g : float, ndarray, shape (2*ip)

Given a valid parameter set then g contains the estimated gradient vector corresponding to the final point held in the arrays alpha and pigam. g[2*j-2] contains the derivative of the log-likelihood with respect to alpha[j-1], for $j = 1, 2, \ldots, p$. g[2*j-1] contains the derivative of the log-likelihood with respect to pigam[j-1], for $j = 1, 2, \ldots, p$.

expp : float, ndarray, shape (ip, ip)

Given a valid parameter set then expp[i-1, j-1] contains the expected percentage of individuals in the sample who respond positively to items $i$ and $j$ ($j \le i$), corresponding to the final point held in the arrays alpha and pigam.

obs : float, ndarray, shape (ip, ip)

Given a valid parameter set then obs[i-1, j-1] contains the observed percentage of individuals in the sample who responded positively to items $i$ and $j$ ($j \le i$).

exf : float, ndarray, shape (ns)

Given a valid parameter set then exf[l-1] contains the expected frequency of the $l$th score pattern ($l$th row of x), corresponding to the final point held in the arrays alpha and pigam.

y : float, ndarray, shape (ns)

Given a valid parameter set then y[l-1] contains the estimated theta score corresponding to the $l$th row of x, for the final point held in the arrays alpha and pigam.

xl : float, ndarray, shape (ns)

If gprob has been set equal to False (logit model) then, given a valid parameter set, xl[l-1] contains the estimated component score corresponding to the $l$th row of x for the final point held in the arrays alpha and pigam.

If gprob is set equal to True (probit model), this array is not used.

iob : int, ndarray, shape (ns)

Given a valid parameter set then iob[l-1] contains the number of items in the $l$th row of x for which the response was positive (True).

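rlogl : float

Given a valid parameter set then rlogl contains the value of the log-likelihood kernel corresponding to the final point held in the arrays alpha and pigam, namely

$$\sum_{l=1}^{s} \mathrm{irl}[l-1] \times \log\left(\mathrm{exf}[l-1] / n\right).$$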

chi : float

If chisqr was set equal to True on entry, then given a valid parameter set, chi will contain the value of the likelihood ratio statistic corresponding to the final parameter estimates held in the arrays alpha and pigam, namely

$$2 \times \sum_{l=1}^{s} \mathrm{irl}[l-1] \times \log\left(\mathrm{irl}[l-1] / \mathrm{exf}[l-1]\right).$$

The summation is over those elements of irl which are positive. If exf[l-1] is less than 5.0, then adjacent score patterns are pooled (the score patterns in x being first put in order of increasing theta score).

If chisqr has been set equal to False, then chi is not used.
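For orientation, a likelihood ratio statistic of this kind can be computed from observed and expected frequencies as sketched below. The sketch assumes the usual form $2 \sum r \log(r/e)$ taken over patterns with a positive observed count, and it makes no attempt to pool patterns with small expected frequencies, so it will not in general reproduce the value returned in chi. The function name is hypothetical.

```python
import math


def likelihood_ratio_statistic(observed, expected):
    """2 * sum r * log(r / e) over patterns with positive observed count.

    No pooling of small expected frequencies is attempted here.
    """
    return 2.0 * sum(
        r * math.log(r / e) for r, e in zip(observed, expected) if r > 0
    )


# Perfect agreement between observed and expected gives a statistic of zero;
# the pattern with zero observed count is skipped.
print(likelihood_ratio_statistic([10, 0, 5], [10.0, 1.0, 5.0]))  # 0.0
```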

idf : int

If chisqr was set equal to True on entry, then given a valid parameter set, idf will contain the degrees of freedom associated with the likelihood ratio statistic, chi.

idf = $s_0 - 2p$ if $s_0 < 2^p$;

idf = $s_0 - 2p - 1$ if $s_0 = 2^p$,

where $s_0$ denotes the number of terms summed to calculate chi ($s_0 = s$ only if there is no pooling).

If chisqr has been set equal to False, idf is not used.

siglev : float

If chisqr was set equal to True on entry, then given a valid parameter set, siglev will contain the significance level of chi based on idf degrees of freedom. If idf is zero or negative then siglev is set to zero.

If chisqr was set equal to False, siglev is not used.

Raises
NagValueError
(errno )

On entry, and .

Constraint: .

(errno )

On entry, and .

Constraint: .

(errno 1)

On entry, $i$ = ⟨value⟩ and $j$ = ⟨value⟩.

Constraint: rows $i$ and $j$ of x should not be identical.

(errno )

On entry, and .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno 1)

On entry, ishow = ⟨value⟩.

Constraint: ishow = 0, 1, 2, 3, 4, 5, 6 or 7.

(errno )

On entry, and .

Constraint: .

(errno )

On entry, and .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno 2)

For at least one of the items the responses are all at the same level.

(errno 3)

maxit iterations have been performed: maxit = ⟨value⟩.

(errno 4)

One of the elements of a has exceeded 10 in absolute value.

(errno 5)

Failure to invert Hessian matrix and maxit iterations made: maxit = ⟨value⟩.

(errno 6)

Failure to invert Hessian matrix plus Heywood case encountered.

Warns
NagAlgorithmicWarning
(errno 7)

The $\chi^2$ statistic has less than one degree of freedom.

Notes

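Given a set of $p$ dichotomous variables $\underline{x} = (x_1, x_2, \ldots, x_p)^{\prime}$, where ${}^{\prime}$ denotes vector or matrix transpose, the objective is to investigate whether the association between them can be adequately explained by a latent variable model of the form (see Bartholomew (1980) and Bartholomew (1987))

$$G\{\pi_i(\theta)\} = \alpha_{i0} + \alpha_{i1}\theta. \tag{1}$$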

The $x_i$ are called item responses and take the value $0$ or $1$. $\theta$ denotes the latent variable assumed to have a standard Normal distribution over a population of individuals to be tested on $p$ items. Call $\pi_i(\theta) = P(x_i = 1 \mid \theta)$ the item response function: it represents the probability that an individual with latent ability $\theta$ will produce a positive response (1) to item $i$. $\alpha_{i0}$ and $\alpha_{i1}$ are item parameters which can assume any real values. The set of parameters, $\alpha_{i1}$, for $i = 1, 2, \ldots, p$, being coefficients of the unobserved variable $\theta$, can be interpreted as ‘factor loadings’.

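$G$ is a function selected by you as either $\Phi^{-1}$ or logit, mapping the interval $(0, 1)$ onto the whole real line. Data from a random sample of $n$ individuals takes the form of the matrices $X$ and $R$ defined below:

$$X_{s \times p} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{s1} & x_{s2} & \cdots & x_{sp} \end{bmatrix} = \begin{bmatrix} \underline{x}_1 \\ \underline{x}_2 \\ \vdots \\ \underline{x}_s \end{bmatrix}, \qquad R_{s \times 1} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_s \end{bmatrix}$$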

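where $\underline{x}_l = (x_{l1}, x_{l2}, \ldots, x_{lp})$ denotes the $l$th score pattern in the sample, $r_l$ the frequency with which $\underline{x}_l$ occurs and $s$ the number of different score patterns observed. (Thus $\sum_{l=1}^{s} r_l = n$.) It can be shown that the log-likelihood function is proportional to

$$\sum_{l=1}^{s} r_l \log P_l,$$

where

$$P_l = P(\underline{x} = \underline{x}_l) = \int_{-\infty}^{\infty} P(\underline{x} = \underline{x}_l \mid \theta)\, \phi(\theta)\, d\theta \tag{2}$$

($\phi(\theta)$ being the probability density function of a standard Normal random variable).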

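$P_l$ denotes the unconditional probability of observing score pattern $\underline{x}_l$. The integral in (2) is approximated using Gauss–Hermite quadrature. If we take $G(z) = \operatorname{logit} z = \log\left(\frac{z}{1-z}\right)$ in (1) and reparameterise as follows,

$$\alpha_i = \alpha_{i1}, \qquad \pi_i = \operatorname{logit}^{-1} \alpha_{i0},$$

then (1) reduces to the logit model (see Bartholomew (1980))

$$\pi_i(\theta) = \frac{\pi_i}{\pi_i + (1 - \pi_i) \exp(-\alpha_i \theta)}.$$

If we take $G(z) = \Phi^{-1}(z)$ (where $\Phi$ is the cumulative distribution function of a standard Normal random variable) and reparameterise as follows,

$$\alpha_i = \frac{\alpha_{i1}}{\sqrt{1 + \alpha_{i1}^2}}, \qquad \gamma_i = \frac{-\alpha_{i0}}{\sqrt{1 + \alpha_{i1}^2}},$$

then (1) reduces to the probit model (see Bock and Aitkin (1981))

$$\pi_i(\theta) = \Phi\left(\frac{\alpha_i \theta - \gamma_i}{\sqrt{1 - \alpha_i^2}}\right).$$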
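The two link functions can be compared numerically with a few lines of standard-library Python. The sketch assumes the item response function has the linear form $G\{\pi(\theta)\} = \alpha_0 + \alpha_1 \theta$ before any reparameterisation; response_probability and its argument names are hypothetical and are not part of naginterfaces.

```python
import math


def response_probability(theta, alpha0, alpha1, gprob=False):
    """P(positive response | theta) under G(pi) = alpha0 + alpha1 * theta."""
    z = alpha0 + alpha1 * theta
    if gprob:
        # Probit link: pi = Phi(z), the standard Normal CDF.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Logit link: pi = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


print(response_probability(0.0, 0.0, 0.5))              # 0.5
print(response_probability(0.0, 0.0, 0.5, gprob=True))  # 0.5
```

With a positive loading, both links give a probability that increases with theta, which is the coding convention the model assumes.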

An E-M algorithm (see Bock and Aitkin (1981)) is used to maximize the log-likelihood function. The number of quadrature points used is set initially to 10 and once convergence is attained increased to 20.
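To make the role of the quadrature concrete, the sketch below approximates the unconditional probability of a score pattern under a logit model with a three-point Gauss–Hermite rule rescaled to the standard Normal weight. It is a toy illustration, not what the routine does internally: the routine uses more quadrature points, and the names pattern_probability, alpha0 and alpha1, as well as the parameter values, are assumptions made for the example.

```python
import itertools
import math

# Three-point Gauss-Hermite rule rescaled for a standard Normal weight:
# it integrates polynomials up to degree 5 exactly against phi(theta).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def pattern_probability(pattern, alpha0, alpha1):
    """Approximate the integral of P(pattern | theta) * phi(theta)."""
    total = 0.0
    for theta, w in zip(NODES, WEIGHTS):
        cond = 1.0  # P(pattern | theta), items independent given theta
        for x, a0, a1 in zip(pattern, alpha0, alpha1):
            p = logistic(a0 + a1 * theta)
            cond *= p if x else 1.0 - p
        total += w * cond
    return total


alpha0, alpha1 = [0.0, 0.5, -0.3], [0.5, 0.5, 0.5]
probs = [
    pattern_probability(x, alpha0, alpha1)
    for x in itertools.product([False, True], repeat=3)
]
print(sum(probs))  # the 2**3 pattern probabilities sum to 1 (up to rounding)
```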

The theta score of an individual responding in score pattern $\underline{x}_l$ is computed as the posterior mean, i.e., $E(\theta \mid \underline{x}_l)$. For the logit model the component score $X_l = \sum_{j=1}^{p} \alpha_j x_{lj}$ is also calculated. (Note that in calculating the theta scores and measures of goodness-of-fit binary automatically reverses the coding on item $j$ if $\alpha_{j1} < 0$; it is assumed in the model that a response at the one level is showing a higher measure of latent ability than a response at the zero level.)
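A posterior-mean theta score can be sketched with the same kind of quadrature: weight each node by the likelihood of the pattern at that node and take the weighted mean of the nodes. The example below assumes a logit model and a three-point rule purely for brevity; theta_score and component_score are hypothetical helpers, and the component score is taken to be the sum of the loadings of the positively answered items.

```python
import math

# Three-point Gauss-Hermite rule rescaled for a standard Normal weight.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def theta_score(pattern, alpha0, alpha1):
    """Approximate E(theta | pattern) for a logit model."""
    num = den = 0.0
    for theta, w in zip(NODES, WEIGHTS):
        like = 1.0  # P(pattern | theta)
        for x, a0, a1 in zip(pattern, alpha0, alpha1):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * theta)))
            like *= p if x else 1.0 - p
        num += w * theta * like
        den += w * like
    return num / den


def component_score(pattern, alpha):
    """Sum of the loadings of the items answered positively."""
    return sum(a for x, a in zip(pattern, alpha) if x)


alpha0, alpha1 = [0.0, 0.0, 0.0], [0.5, 0.5, 0.5]
high = theta_score([True, True, True], alpha0, alpha1)
low = theta_score([False, False, False], alpha0, alpha1)
print(high > 0.0 > low)  # True: more positive responses, higher score
```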

The frequency distribution of score patterns is required as input data. If your data is in the form of individual score patterns (uncounted), then binary_service() may be used to calculate the frequency distribution.
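If it helps to see what such a frequency distribution looks like, the counting step can be sketched in plain Python with collections.Counter. This is a stand-in for illustration and is not binary_service(); the toy data are far too small to fit a model and only show how distinct patterns and their frequencies relate to x, irl and n.

```python
from collections import Counter

# Hypothetical raw data: one row of 0/1 responses per individual.
raw = [
    (1, 0, 1),
    (0, 0, 1),
    (1, 0, 1),
    (1, 1, 1),
    (0, 0, 1),
    (1, 0, 1),
]

counts = Counter(raw)
patterns = sorted(counts)                          # distinct score patterns
x = [[bool(v) for v in row] for row in patterns]   # bool rows, as x expects
irl = [counts[row] for row in patterns]            # matching frequencies
n = sum(irl)                                       # number of individuals

print(patterns)  # [(0, 0, 1), (1, 0, 1), (1, 1, 1)]
print(irl)       # [2, 3, 1]
print(n)         # 6
```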

References

Bartholomew, D J, 1980, Factor analysis for categorical data (with Discussion), J. Roy. Statist. Soc. Ser. B (42), 293–321

Bartholomew, D J, 1987, Latent Variable Models and Factor Analysis, Griffin

Bock, R D and Aitkin, M, 1981, Marginal maximum likelihood estimation of item parameters: Application of an E-M algorithm, Psychometrika (46), 443–459