NAG CL Interface
g11bac (tabulate_stat)
1
Purpose
g11bac computes a table from a set of classification factors using a selected statistic.
2
Specification
void g11bac (Nag_TableStats stat,
             Nag_TableUpdate update,
             Nag_Weightstype weight,
             Integer n,
             Integer nfac,
             const Integer sf[],
             const Integer lfac[],
             const Integer factor[],
             Integer tdf,
             const double y[],
             const double wt[],
             double table[],
             Integer maxt,
             Integer *ncells,
             Integer *ndim,
             Integer idim[],
             Integer count[],
             double comm_ar[],
             NagError *fail)
The function may be called by the names: g11bac, nag_contab_tabulate_stat or nag_tabulate_stats.
3
Description
A dataset may include both classification variables and general variables. The classification variables, known as factors, take a small number of values known as levels. For example, the factor sex would have the levels male and female. These can be coded as 1 and 2 respectively. Given several factors, a multi-way table can be constructed such that each cell of the table represents one level from each factor. For example, the two factors sex and habitat, habitat having three levels: inner-city, suburban and rural, define the 2 by 3 contingency table:
       | Habitat
Sex    | Inner-city | Suburban | Rural
Male   |            |          |
Female |            |          |
For each cell statistics can be computed. If a third variable in the dataset was age, then for each cell the average age could be computed:
       | Habitat
Sex    | Inner-city | Suburban | Rural
Male   | 25.5       | 30.3     | 35.6
Female | 23.2       | 29.1     | 30.4
That is, the average age for all observations for males living in rural areas is 35.6. Other statistics can also be computed: the number of observations, the total, the variance, the largest value and the smallest value.
g11bac computes a table for a selected one of these statistics. The factors have to be coded with levels 1, 2, .... Weights can be used to eliminate values from the calculations, e.g., if they represent ‘missing values’. There is also the facility to update an existing table with the addition of new observations.
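The tabulation described above can be sketched in plain C for the two-factor case. This is an illustrative sketch only, not the g11bac implementation: the function name `cell_means` and its argument list are hypothetical, levels are assumed to be coded from 1, and weights are used here only as inclusion flags (a zero weight drops the observation).

```c
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library.
   Computes the mean of y for each cell of a two-way table.
   f1[i] and f2[i] hold the levels of the two factors for observation i,
   coded 1..l1 and 1..l2. Cells are stored row by row, so the index of
   the second factor changes fastest. If wt is not NULL, observations
   with a zero weight are skipped (treated as missing).
   Returns 0 on success and -1 if a level is out of range. */
int cell_means(size_t n, const int f1[], const int f2[], int l1, int l2,
               const double y[], const double wt[],
               double mean[], int count[])
{
    size_t ncells = (size_t)l1 * (size_t)l2;

    for (size_t c = 0; c < ncells; ++c) {
        mean[c] = 0.0;
        count[c] = 0;
    }
    for (size_t i = 0; i < n; ++i) {
        if (f1[i] < 1 || f1[i] > l1 || f2[i] < 1 || f2[i] > l2)
            return -1;
        if (wt != NULL && wt[i] == 0.0)
            continue;                  /* zero weight: drop the value */
        size_t c = (size_t)(f1[i] - 1) * (size_t)l2 + (size_t)(f2[i] - 1);
        mean[c] += y[i];               /* accumulate the cell total */
        count[c] += 1;
    }
    for (size_t c = 0; c < ncells; ++c) {
        if (count[c] > 0)
            mean[c] /= count[c];       /* convert total to mean */
    }
    return 0;
}
```

With sex as the first factor (2 levels) and habitat as the second (3 levels), the six entries of `mean` correspond to the table above read row by row. Cells with no contributing observations are left at zero in this sketch.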
4
References
John J A and Quenouille M H (1977) Experiments: Design and Analysis Griffin
Kendall M G and Stuart A (1969) The Advanced Theory of Statistics (Volume 1) (3rd Edition) Griffin
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535
5
Arguments
1: stat – Nag_TableStats    Input
On entry: indicates which statistic is to be computed for the table cells. The six options are:
- The number of observations for each cell.
- The total for the variable in y for each cell.
- The average (mean) for the variable in y for each cell.
- The variance for the variable in y for each cell.
- The largest value for the variable in y for each cell.
- The smallest value for the variable in y for each cell.
Constraint: stat must be one of the six values listed above.
2: update – Nag_TableUpdate    Input
On entry: indicates whether an existing table is to be updated by further observations. The two options are:
- The table cells will be initialized to zero before tabulation takes place.
- The table input in table will be updated. The arguments ncells, table, count and comm_ar must remain unchanged from the previous call to g11bac.
Constraint: update must be one of the two values listed above.
3: weight – Nag_Weightstype    Input
On entry: indicates whether weights are to be used. There are three options, one unweighted and two weighted:
- Weights are not used and unit weights are assumed.
- Weights are used and must be supplied in wt. The two weighted options differ only when the variance is computed: under the first, the divisor for the variance is the sum of the weights minus one; under the second, the divisor is the number of observations with nonzero weights minus one. The first is useful if the weights represent the frequency of the observed values.
If stat selects the total or the mean, the weighted total or weighted mean is computed respectively.
If stat selects the number of observations, the largest value or the smallest value, the only effect of weights is to eliminate values with zero weights from the computations.
Constraint: weight must be one of the three values listed above.
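The difference between the two variance divisors can be illustrated with a short C sketch. It is a two-pass calculation written for clarity, not the one-pass method the function itself uses; the name `weighted_variance`, the flag `use_sum_of_weights` and the choice of the weighted sum of squares about the weighted mean as numerator are assumptions made for this illustration. Weights are assumed to be non-negative.

```c
#include <stddef.h>

/* Hypothetical illustration, not NAG code.
   Computes a weighted variance of y with one of two divisors:
     use_sum_of_weights != 0 : (sum of the weights) - 1
     use_sum_of_weights == 0 : (number of nonzero weights) - 1
   Observations with zero weight are ignored in both cases.
   Returns 0 on success, -1 if the divisor is not positive. */
int weighted_variance(size_t n, const double y[], const double wt[],
                      int use_sum_of_weights, double *var)
{
    double sw = 0.0;    /* sum of weights */
    double swy = 0.0;   /* weighted sum of y */
    size_t nonzero = 0; /* number of nonzero weights */

    for (size_t i = 0; i < n; ++i) {
        if (wt[i] == 0.0)
            continue;
        sw += wt[i];
        swy += wt[i] * y[i];
        ++nonzero;
    }
    if (nonzero == 0)
        return -1;

    double mean = swy / sw;
    double ss = 0.0;    /* weighted sum of squares about the mean */
    for (size_t i = 0; i < n; ++i)
        ss += wt[i] * (y[i] - mean) * (y[i] - mean);

    double divisor = use_sum_of_weights ? sw - 1.0 : (double)nonzero - 1.0;
    if (divisor <= 0.0)
        return -1;
    *var = ss / divisor;
    return 0;
}
```

For y = {1, 2, 3} with weights {2, 1, 1}, the first divisor is 3, which matches the ordinary sample variance of the expanded data {1, 1, 2, 3}; the second divisor is 2.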
4: n – Integer    Input
On entry: the number of observations.
Constraint:
.
5: nfac – Integer    Input
On entry: the number of classifying factors in factor.
Constraint:
.
6: sf – const Integer    Input
On entry: indicates which factors in factor are to be used in the tabulation, with one element for each factor; the jth element determines whether the jth factor in factor is included.
Note that if no factor is selected, the statistic for the whole sample is calculated and returned in a 1 by 1 table.
7: lfac – const Integer    Input
On entry: the number of levels of the classifying factors in factor.
Constraint:
if ,
, for .
8: factor – const Integer    Input
On entry: the nfac coded classification factors for the n observations.
Constraint:
, for and .
9: tdf – Integer    Input
On entry: the stride separating matrix column elements in the array factor.
Constraint:
.
10: y – const double    Input
On entry: the variable to be tabulated.
If stat selects the number of observations, y is not referenced.
11: wt – const double    Input
On entry: if weights are used (see weight), wt must contain the n weights. Otherwise wt is not referenced and may be set to the null pointer, (double *)0.
Constraint:
if or ,
, for .
12: table – double    Input/Output
On entry: if an existing table is being updated (see update), table must be unchanged from the previous call to g11bac; otherwise table need not be set.
On exit: the computed table. The ncells cells of the table are stored so that for any two factors the index relating to the factor referred to later in lfac and factor changes faster. For further details see Section 9.
13: maxt – Integer    Input
On entry: the maximum size of the table to be computed.
Constraint: maxt must be at least the product of the levels of the factors included in the tabulation.
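A caller can work out the smallest acceptable table size before the call. The sketch below is illustrative: `required_cells` is a hypothetical helper, and the array `selected` is a plain true/false flag per factor standing in for whatever selection rule sf encodes. When no factor is selected the result is 1, matching the 1 by 1 table described for sf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the NAG Library.
   Returns the product of the numbers of levels of the selected factors,
   which is the number of cells the table needs.
   selected[j] != 0 marks factor j as included (an assumed convention).
   Returns 0 if a selected factor has fewer than one level or if the
   product would overflow size_t. */
size_t required_cells(size_t nfac, const int selected[], const int nlevels[])
{
    size_t cells = 1;   /* no factors selected: a 1 by 1 table */

    for (size_t j = 0; j < nfac; ++j) {
        if (!selected[j])
            continue;
        if (nlevels[j] < 1)
            return 0;
        size_t l = (size_t)nlevels[j];
        if (cells > SIZE_MAX / l)
            return 0;   /* product would overflow */
        cells *= l;
    }
    return cells;
}
```

For three factors with 3, 3 and 6 levels, of which only the last two are tabulated, the helper returns 18.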
14: ncells – Integer *    Input/Output
On entry: if an existing table is being updated (see update), ncells must be unchanged from the previous call to g11bac; otherwise ncells need not be set.
On exit: the number of cells in the table.
15: ndim – Integer *    Output
On exit: the number of factors defining the table.
16: idim – Integer    Output
On exit: the first ndim elements contain the number of levels for the factors defining the table.
17: count – Integer    Input/Output
On entry: if an existing table is being updated (see update), count must be unchanged from the previous call to g11bac; otherwise count need not be set.
On exit: a table containing the number of observations contributing to each cell of the table, stored identically to table. Note that if stat selects the number of observations, this is the same as the table returned in table.
18: comm_ar – double    Input/Output
On entry: if an existing table is being updated (see update), comm_ar must be unchanged from the previous call to g11bac; otherwise comm_ar need not be set.
On exit: if stat selects the mean or the variance, the first ncells values hold the table containing the sum of the weights for the observations contributing to each cell, stored identically to table. If stat selects the variance, the second set of ncells values holds the table of cell means. Otherwise comm_ar is not referenced.
19: fail – NagError *    Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
6
Error Indicators and Warnings
- NE_2_INT_ARG_LT
-
On entry, while . These arguments must satisfy .
- NE_2_INT_ARRAY_CONS
-
On entry, while .
Constraint: if , for .
- NE_2D_1D_INT_ARRAYS_CONS
-
On entry, while .
Constraint: , for and .
- NE_2D_INT_ARRAY_CONS
-
On entry, .
Constraint: , for and .
- NE_ALLOC_FAIL
-
Dynamic memory allocation failed.
- NE_BAD_PARAM
-
On entry, argument stat had an illegal value.
On entry, argument update had an illegal value.
On entry, argument weight had an illegal value.
- NE_G11BA_CHANGED
-
An existing table is being updated (see update) and at least one of ncells, table, comm_ar or count has been changed since the previous call to g11bac.
- NE_INT_ARG_LT
-
On entry, .
Constraint: .
On entry, .
Constraint: .
- NE_INTERNAL_ERROR
-
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_MAXT
-
The maximum size of the table to be computed, maxt, is too small.
- NE_REAL_ARRAY_CONS
-
On entry, .
Constraint: if or , .
- NE_VAR_DIV
-
and the divisor for the variance .
- NE_WT_ARGS
-
The wt array argument must not be NULL when the weight argument indicates weights.
7
Accuracy
Only applicable when the variance is computed. In this case a one-pass algorithm is used, as described by West (1979).
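A one-pass update of the kind described by West (1979) can be sketched as follows. This is an illustration of the general technique, not the library's internal code: the type `RunningStats` and the functions `running_init` and `running_add` are hypothetical names, and non-positive weights are simply ignored here. The same update step also shows how an existing cell can absorb further observations without revisiting earlier data.

```c
/* Hypothetical accumulator, not part of the NAG Library. */
typedef struct {
    double sumw;  /* running sum of weights */
    double mean;  /* running weighted mean */
    double ss;    /* running weighted sum of squares about the mean */
} RunningStats;

/* Resets the accumulator to the empty state. */
void running_init(RunningStats *s)
{
    s->sumw = 0.0;
    s->mean = 0.0;
    s->ss = 0.0;
}

/* Adds one observation x with weight w in a single pass.
   Non-positive weights are skipped in this sketch. */
void running_add(RunningStats *s, double x, double w)
{
    if (w <= 0.0)
        return;
    double q = x - s->mean;          /* deviation from the current mean */
    double new_sumw = s->sumw + w;
    double r = q * w / new_sumw;     /* shift applied to the mean */
    s->mean += r;
    s->ss += r * s->sumw * q;        /* uses the old sum of weights */
    s->sumw = new_sumw;
}
```

After all observations have been added, dividing `ss` by the chosen divisor gives the variance. Because the sum of squares is accumulated about the running mean, the update avoids the cancellation that affects the textbook "sum of squares minus squared sum" formula.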
8
Parallelism and Performance
g11bac is not threaded in any implementation.
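9
Further Comments
The tables created by g11bac and stored in table, count and, depending on stat, also in comm_ar are stored in the following way. Let there be $n$ factors defining the table, with factor $k$ having $l_k$ levels. Then the cell defined by the levels $i_1, i_2, \ldots, i_n$ of the factors is stored in the $m$th cell, given by
$$m = 1 + \sum_{k=1}^{n} (i_k - 1)\, c_k,$$
where $c_j = \prod_{k=j+1}^{n} l_k$, for $j = 1, 2, \ldots, n-1$, and $c_n = 1$. Here $n$ is the number of factors defining the table (returned in ndim), not the number of observations.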
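The storage order for the table, in which the index of the factor listed later changes faster, can be sketched as a small C function. The name `cell_index` and its arguments are hypothetical; the function returns a 1-based cell number, so subtract one before indexing a C array such as table or count.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library.
   level[k]   : level of factor k for the cell, coded 1..nlevels[k]
   nlevels[k] : number of levels of factor k (as returned in idim)
   Returns the 1-based position of the cell when the last factor's
   index changes fastest, or 0 if any level is out of range. */
size_t cell_index(size_t ndim, const int level[], const int nlevels[])
{
    size_t m = 0;

    for (size_t k = 0; k < ndim; ++k) {
        if (level[k] < 1 || level[k] > nlevels[k])
            return 0;
        /* Horner-style accumulation: each earlier factor is scaled by
           the product of the level counts of all later factors. */
        m = m * (size_t)nlevels[k] + (size_t)(level[k] - 1);
    }
    return m + 1;
}
```

For a 2 by 3 table, levels (1, 1) map to cell 1, levels (2, 1) to cell 4 and levels (2, 3) to cell 6. With no factors (ndim of zero) the function returns 1, the single cell of a 1 by 1 table.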
10
Example
The data, given by John and Quenouille (1977), is for a 3 by 6 factorial experiment in 3 blocks of 18 units. The data is input in the order: blocks, factor with 3 levels, factor with 6 levels, yield. The 3 by 6 table of treatment means for yield over blocks is computed and printed.