The routine may be called by the names g11baf or nagf_contab_tabulate_stat.
3 Description
A dataset may include both classification variables and general variables. The classification variables, known as factors, take a small number of values known as levels. For example, the factor sex would have the levels male and female, which can be coded as integers. Given several factors, a multi-way table can be constructed such that each cell of the table represents one level from each factor. For example, the two factors sex and habitat, with habitat having three levels (inner-city, suburban and rural), define the contingency table:
                      Habitat
Sex       Inner-city   Suburban   Rural
Male
Female
Statistics can be computed for each cell. If a third variable in the dataset were age, the average age could be computed for each cell:
                      Habitat
Sex       Inner-city   Suburban   Rural
Male      25.5         30.3       35.6
Female    23.2         29.1       30.4
That is, the average age for all observations for males living in rural areas is 35.6. Other statistics can also be computed: the number of observations, the total, the variance, the largest value and the smallest value.
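As an illustration only, the tabulation of cell means described above can be sketched in plain Python. This does not call g11baf; the function name `cell_means`, the integer codes used for the levels and the six observations are all hypothetical.

```python
from collections import defaultdict


def cell_means(factor_a, factor_b, values):
    """Return {(level_a, level_b): mean of values} for every non-empty cell."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for a, b, v in zip(factor_a, factor_b, values):
        totals[(a, b)] += v
        counts[(a, b)] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# Hypothetical data: sex coded 1 (male) / 2 (female),
# habitat coded 1 (inner-city) / 2 (suburban) / 3 (rural).
sex = [1, 1, 2, 2, 1, 2]
habitat = [1, 3, 1, 2, 3, 2]
age = [24.0, 30.0, 22.0, 28.0, 40.0, 30.0]

means = cell_means(sex, habitat, age)
for cell in sorted(means):
    print(cell, means[cell])
```

Cells with no observations, such as (male, suburban) in this made-up sample, simply do not appear in the result.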
g11baf computes a table for one of the selected statistics. The factors must be supplied as integer codes for their levels. A zero weight can be used to exclude a value from the calculations, for example when it represents a 'missing value'. An existing table can also be updated with new observations.
4 References
John J A and Quenouille M H (1977) Experiments: Design and Analysis Griffin
Kendall M G and Stuart A (1969) The Advanced Theory of Statistics (Volume 1) (3rd Edition) Griffin
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535
5 Arguments
1: – Character(1) Input
On entry: indicates which statistic is to be computed for the table cells.
The largest value for the variable in y for each cell.
The smallest value for the variable in y for each cell.
Constraint:
, , , , or .
2: – Character(1) Input
On entry: indicates whether an existing table is to be updated with further observations.
The table cells will be initialized to zero before tabulations take place.
The table input in table will be updated. The arguments ncells, table, icount and auxt must remain unchanged from the previous call to g11baf.
Constraint:
or .
3: – Character(1) Input
On entry: indicates whether weights are to be used.
Weights are not used and unit weights are assumed.
or
Weights are used and must be supplied in wt. The only difference between and is if the variance is computed.
The divisor for the variance is the sum of the weights minus one and if , the divisor is the number of observations with nonzero weights minus one. The former is useful if the weights represent the frequency of the observed values.
If or , the weighted total or mean is computed respectively.
If , or , the only effect of weights is to eliminate values with zero weights from the computations.
Constraint:
, or .
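The two variance divisors described for this argument can be compared with a small Python sketch. It is not the routine's implementation: the function `weighted_variance` and its `frequency_weights` flag are hypothetical names standing in for the two weighting options, and the data are made up.

```python
def weighted_variance(y, wt, frequency_weights=True):
    """Weighted variance with one of the two divisors described above.

    frequency_weights=True  -> divisor is (sum of weights) - 1
    frequency_weights=False -> divisor is (number of nonzero weights) - 1
    """
    if len(y) != len(wt):
        raise ValueError("y and wt must have the same length")
    if any(w < 0 for w in wt):
        raise ValueError("weights must be non-negative")
    sum_w = sum(wt)
    if sum_w <= 0:
        raise ValueError("at least one weight must be positive")
    n_nonzero = sum(1 for w in wt if w > 0)
    mean = sum(w * v for v, w in zip(y, wt)) / sum_w
    ss = sum(w * (v - mean) ** 2 for v, w in zip(y, wt))
    divisor = (sum_w if frequency_weights else n_nonzero) - 1
    if divisor <= 0:
        raise ValueError("not enough observations for a variance")
    return ss / divisor


# Hypothetical data; the last value has zero weight and is therefore ignored.
y = [1.0, 2.0, 3.0, 100.0]
wt = [2.0, 1.0, 2.0, 0.0]

var_freq = weighted_variance(y, wt, frequency_weights=True)
var_count = weighted_variance(y, wt, frequency_weights=False)
print(var_freq, var_count)
```

With frequency weights the result equals the ordinary sample variance of the expanded sample 1, 1, 2, 3, 3, which is why that divisor suits weights that count repeated observations.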
4: – Integer Input
On entry: the number of observations.
Constraint:
.
5: – Integer Input
On entry: the number of classifying factors in ifac.
Constraint:
.
6: – Integer array Input
On entry: indicates which factors in ifac are to be used in the tabulation.
If the th factor in ifac is included in the tabulation.
Note that if
, for then the statistic for the whole sample is calculated and returned in a table.
7: – Integer array Input
On entry: the number of levels of the classifying factors in ifac.
Constraint:
if , , for .
8: – Integer array Input
On entry: the nfac coded classification factors for the n observations.
Constraint:
, for and .
9: – Integer Input
On entry: the first dimension of the array ifac as declared in the (sub)program from which g11baf is called.
Constraint:
.
10: – Real (Kind=nag_wp) array Input
On entry: the variable to be tabulated. If , y is not referenced.
11: – Real (Kind=nag_wp) array Input
Note: the dimension of the array wt
must be at least
if or , and at least otherwise.
On entry: if or , wt must contain the n weights. Otherwise wt is not referenced.
Constraint:
if or , , for .
12: – Real (Kind=nag_wp) array Input/Output
On entry: if , table must be unchanged from the previous call to g11baf, otherwise table need not be set.
On exit: the computed table. The ncells cells of the table are stored so that for any two factors the index relating to the factor referred to later in lfac and ifac changes faster. For further details see Section 9.
13: – Integer Input
On entry: the maximum size of the table to be computed.
Constraint:
product of the levels of the factors included in the tabulation.
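The table size referred to in this constraint, the product of the levels of the included factors, can be computed with a short Python sketch. The helper `table_size` and its boolean `include` list are hypothetical and are not part of the routine's interface.

```python
from math import prod


def table_size(levels, include):
    """Number of cells: product of the level counts of the included factors."""
    if len(levels) != len(include):
        raise ValueError("levels and include must have the same length")
    return prod(l for l, use in zip(levels, include) if use)


# Hypothetical example: three factors with 2, 3 and 4 levels,
# of which only the first two are tabulated.
size = table_size([2, 3, 4], [True, True, False])
print(size)
```

When no factor is included the empty product is 1, which is consistent with a single cell holding the statistic for the whole sample.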
14: – Integer Input/Output
On entry: if , ncells must be unchanged from the previous call to g11baf, otherwise ncells need not be set.
On exit: the number of cells in the table.
15: – Integer Output
On exit: the number of factors defining the table.
16: – Integer array Output
On exit: the first ndim elements contain the number of levels for the factors defining the table.
17: – Integer array Input/Output
On entry: if , icount must be unchanged from the previous call to g11baf, otherwise icount need not be set.
On exit: a table containing the number of observations contributing to each cell of the table, stored identically to table. Note if this is the same as is returned in table.
18: – Real (Kind=nag_wp) array Input/Output
Note: the dimension of the array auxt
must be at least
if and at least if .
On entry: if , auxt must be unchanged from the previous call to g11baf, otherwise auxt need not be set.
On exit: if or , the first ncells values hold the table containing the sum of the weights for the observations contributing to each cell, stored identically to table.
If , the second set of ncells values hold the table of cell means. Otherwise auxt is not referenced.
19: – Integer array Workspace
20: – Integer Input/Output
On entry: ifail must be set to , or to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of means that an error message is printed while a value of means that it is not.
If halting is not appropriate, the value or is recommended. If message printing is undesirable, then the value is recommended. Otherwise, the value is recommended. When the value or is used it is essential to test the value of ifail on exit.
On exit: unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry or , explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
On entry, and .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: .
On entry, .
Constraint: , , , , or .
On entry, .
Constraint: or .
On entry, .
Constraint: , or .
On entry, , , and .
Constraint: .
On entry, , and .
Constraint: .
On entry, and .
On entry, .
On entry, and .
Constraint: .
On entry, and minimum value for .
Constraint: of the levels of the factors included in the tabulation.
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7 Accuracy
Only applicable when . In this case a one pass algorithm is used as described by West (1979).
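A one-pass update in the style of West (1979) can be sketched in Python as follows. This is an assumption-laden illustration, not the code used by g11baf: the class `RunningCell` and its attribute names are hypothetical, and the choice of divisor mirrors the two options described for the weights argument.

```python
class RunningCell:
    """Running weighted mean and sum of squares for one table cell."""

    def __init__(self):
        self.sum_w = 0.0   # sum of weights seen so far
        self.mean = 0.0    # current weighted mean
        self.ss = 0.0      # weighted sum of squared deviations about the mean
        self.count = 0     # number of observations with nonzero weight

    def add(self, value, weight=1.0):
        """Fold one observation into the running statistics."""
        if weight < 0:
            raise ValueError("weights must be non-negative")
        if weight == 0:
            return  # zero weight: the value is excluded
        new_sum_w = self.sum_w + weight
        delta = value - self.mean
        step = delta * weight / new_sum_w
        self.mean += step
        self.ss += self.sum_w * delta * step
        self.sum_w = new_sum_w
        self.count += 1

    def variance(self, frequency_weights=True):
        """Variance with divisor (sum of weights - 1) or (count - 1)."""
        divisor = (self.sum_w if frequency_weights else self.count) - 1
        if divisor <= 0:
            raise ValueError("not enough observations for a variance")
        return self.ss / divisor


# Hypothetical data: the value 100.0 carries zero weight and is ignored.
cell = RunningCell()
for value, weight in [(1.0, 2.0), (2.0, 1.0), (3.0, 2.0), (100.0, 0.0)]:
    cell.add(value, weight)
print(cell.mean, cell.variance())
```

Because only the sum of weights, the mean and the sum of squares are carried between observations, further data can be folded in later without revisiting earlier observations, which is the property an update facility relies on.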
8 Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g11baf is not threaded in any implementation.
9 Further Comments
The tables created by g11baf and stored in table, icount and, depending on stat, also in auxt are stored in the following way. Let there be $n$ factors defining the table, with factor $k$ having $l_k$ levels. Then the cell defined by the levels $i_1, i_2, \ldots, i_n$ of the factors is stored in the $m$th cell given by
$$m = 1 + \sum_{k=1}^{n} (i_k - 1)\, c_k,$$
where $c_j = \prod_{k=j+1}^{n} l_k$, for $j = 1, 2, \ldots, n-1$, and $c_n = 1$.
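The storage order stated for table, in which the index of the factor referred to later changes faster, corresponds to the following Python sketch. The function `cell_index` is hypothetical, and it assumes levels are coded from 1 up to the number of levels of each factor.

```python
def cell_index(levels_chosen, level_counts):
    """1-based position of a cell when the last factor varies fastest.

    levels_chosen: the level of each factor for the cell (assumed 1-based).
    level_counts:  the number of levels of each factor, in the same order.
    """
    if len(levels_chosen) != len(level_counts):
        raise ValueError("one level is needed per factor")
    index = 0
    for level, count in zip(levels_chosen, level_counts):
        if not 1 <= level <= count:
            raise ValueError("level out of range for its factor")
        # Horner-style accumulation: earlier factors get larger strides.
        index = index * count + (level - 1)
    return index + 1


# Hypothetical example: sex (2 levels) followed by habitat (3 levels).
male_rural = cell_index([1, 3], [2, 3])
female_inner = cell_index([2, 1], [2, 3])
print(male_rural, female_inner)
```

In this two-factor example the three habitat cells for the first sex occupy positions 1 to 3 and those for the second sex positions 4 to 6.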
10 Example
The data, given by John and Quenouille (1977), is for a factorial experiment in blocks of units. The data is input in the order, blocks, factor with levels, factor with levels, yield. The table of treatment means for yield over blocks is computed and printed.