Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 24: weight was removed from the interface; wt was made optional.
A dataset may include both classification variables and general variables. The classification variables, known as factors, take a small number of values known as levels. For example, the factor sex would have the levels male and female. These can be coded as $1$ and $2$ respectively. Given several factors, a multi-way table can be constructed such that each cell of the table represents one level from each factor. For example, the two factors sex and habitat, habitat having three levels (inner-city, suburban and rural), define the $2 \times 3$ contingency table:
Sex    | Habitat                        |
       | Inner-city | Suburban | Rural |
Male   |            |          |       |
Female |            |          |       |
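As a rough illustration of how the cells of such a table are populated, the sketch below counts observations per cell using core MATLAB only. The seven observations are made up for illustration, and the integer coding (1 = male, 2 = female; 1 = inner-city, 2 = suburban, 3 = rural) is an assumption:

```matlab
% Hypothetical observations: one row per individual.
sex     = [1; 2; 1; 2; 1; 2; 1];   % assumed coding: 1 = male, 2 = female
habitat = [1; 1; 2; 3; 3; 2; 3];   % assumed coding: 1 = inner-city, 2 = suburban, 3 = rural

% Each (sex, habitat) pair addresses one cell of the 2-by-3 table;
% accumarray adds 1 to that cell for every observation.
counts = accumarray([sex habitat], 1, [2 3]);

disp(counts);
```

Rows of counts correspond to the levels of sex and columns to the levels of habitat.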
For each cell statistics can be computed. If a third variable in the dataset was age then for each cell the median age could be computed:
Sex    | Habitat                        |
       | Inner-city | Suburban | Rural |
Male   | 24         | 31       | 37    |
Female | 21.5       | 28.5     | 33    |
That is, the median age for all observations for males living in rural areas is $37$, the median being the 50% quantile. Other quantiles can also be computed: the $p$ percent quantile or percentile, $q_p$, is the estimate of the value such that $p$ percent of observations are less than $q_p$. This is calculated in two different ways depending on whether the tabulated variable is continuous or discrete. Let there be $m$ values in a cell and let $y_{(i)}$, $i = 1,2,\ldots,m$, be the values for that cell sorted into ascending order. Also, associated with each value there is a weight, $w_{(i)}$, $i = 1,2,\ldots,m$, which could represent the observed frequency for that value, with
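$W_j = \sum_{i=1}^{j} w_{(i)}$ and $W'_j = \sum_{i=1}^{j} w_{(i)} - \frac{1}{2} w_{(j)}$. For the $p$ percentile let $p_w = (p/100) W_m$ and $p'_w = (p/100) W'_m$; then the percentiles for the two cases are as given below.
If the variable is discrete, that is, it takes only a limited number of (usually integer) values, then the percentile is defined as
$$ q_p = \begin{cases} y_{(j)} & \text{if } W_{j-1} < p_w < W_j, \\ \left( y_{(j+1)} + y_{(j)} \right)/2 & \text{if } p_w = W_j. \end{cases} $$
If the data is continuous then the quantiles are estimated by linear interpolation:
$$ q_p = (1-f)\, y_{(j-1)} + f\, y_{(j)}, $$
where $W'_{j-1} < p'_w \le W'_j$ and $f = (p'_w - W'_{j-1}) / (W'_j - W'_{j-1})$.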
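The discrete case can be sketched in a few lines of core MATLAB. This is an illustration only, not the routine's implementation: it assumes that $W_j$ is the cumulative sum of the sorted weights with $W_0 = 0$, that $p_w = (p/100) W_m$, and that the largest value is returned when $j = m$ (an edge case chosen here for the sketch). The data values and the variable names ys, ws, W, pw and q are hypothetical:

```matlab
% Hypothetical cell contents: values with their observed frequencies.
y = [30; 21; 25; 40];
w = [ 2;  1;  3;  2];
p = 50;                            % requested percentile

[ys, order] = sort(y);             % y_(1) <= ... <= y_(m)
ws = w(order);                     % weights reordered to match
W  = cumsum(ws);                   % W_j, cumulative weight up to y_(j)
pw = p/100 * W(end);               % p_w = (p/100) * W_m

% Smallest j with p_w <= W_j, so that W_(j-1) < p_w <= W_j.
j = find(W >= pw, 1, 'first');
if W(j) == pw && j < numel(ys)
    q = (ys(j) + ys(j+1)) / 2;     % p_w falls exactly on W_j: average neighbours
else
    q = ys(j);                     % strictly inside the interval (or j = m)
end

fprintf('%gth percentile: %g\n', p, q);
```

With these frequencies the sorted data expands to 21, 25, 25, 25, 30, 30, 40, 40, so the result, 27.5, agrees with the ordinary median of the expanded sample.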
Not applicable.
The tables created by nag_contab_tabulate_percentile (g11bb) and returned in table and icount are stored in the following way. Let there be $n$ factors defining the table with factor $k$ having $l_k$ levels, then the cell defined by the levels $i_1, i_2, \ldots, i_n$ of the factors is stored in the $m$th cell given by:
$$ m = 1 + \sum_{k=1}^{n} (i_k - 1)\, c_k, $$
where $c_j = \prod_{k=j+1}^{n} l_k$, for $j = 1,2,\ldots,n-1$ and $c_n = 1$.
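A storage scheme of this kind is a row-major layout in which the last factor varies fastest. The sketch below is a minimal illustration, assuming the index is $m = 1 + \sum_k (i_k - 1) c_k$ with $c_n = 1$ and each $c_j$ equal to the product of the level counts of the later factors. It uses a table over two factors with 3 and 6 levels; the variable names lev, idx, c and m are hypothetical:

```matlab
lev = [3 6];                       % levels l_k of the tabulated factors
idx = [2 4];                       % cell of interest: (i_1, i_2) = (2, 4)

% Strides c_k: c_n = 1, and c_j is the product of the later level counts.
n = numel(lev);
c = ones(1, n);
for j = n-1:-1:1
    c(j) = c(j+1) * lev(j+1);
end

% Position of the cell in the one-dimensional table array.
m = 1 + sum((idx - 1) .* c);

% Cross-check against MATLAB's column-major indexing with dimensions reversed.
m_check = sub2ind(fliplr(lev), idx(2), idx(1));

fprintf('cell (%d,%d) is stored at position %d\n', idx(1), idx(2), m);
```

Because MATLAB arrays are column-major, a table stored this way has to be reshaped with the dimensions reversed and then transposed before it reads naturally as rows and columns.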
The data, given by John and Quenouille (1977), is for a $3 \times 6$ factorial experiment in $3$ blocks of $18$ units. The data is input in the order: blocks, factor with $3$ levels, factor with $6$ levels, yield. The $3 \times 6$ table of treatment medians for yield over blocks is computed and printed.
function g11bb_example
fprintf('g11bb example results\n\n');
ifac = [int64(1),1,1; 1,2,1; 1,3,1; 1,1,2; 1,2,2; 1,3,2;
1,1,3; 1,2,3; 1,3,3; 1,1,4; 1,2,4; 1,3,4;
1,1,5; 1,2,5; 1,3,5; 1,1,6; 1,2,6; 1,3,6;
2,1,1; 2,2,1; 2,3,1; 2,1,2; 2,2,2; 2,3,2;
2,1,3; 2,2,3; 2,3,3; 2,1,4; 2,2,4; 2,3,4;
2,1,5; 2,2,5; 2,3,5; 2,1,6; 2,2,6; 2,3,6;
3,1,1; 3,2,1; 3,3,1; 3,1,2; 3,2,2; 3,3,2;
3,1,3; 3,2,3; 3,3,3; 3,1,4; 3,2,4; 3,3,4;
3,1,5; 3,2,5; 3,3,5; 3,1,6; 3,2,6; 3,3,6];
y = [ 274; 361; 253; 325; 317; 339;
326; 402; 336; 379; 345; 361;
352; 334; 318; 339; 393; 358;
350; 340; 203; 397; 356; 298;
382; 376; 355; 418; 387; 379;
432; 339; 293; 322; 417; 342;
82; 297; 133; 306; 352; 361;
220; 333; 270; 388; 379; 274;
336; 307; 266; 389; 333; 353];
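lfac = [int64(3); 3; 6];      % number of levels of each factor
isf = [int64(0); 1; 1];       % tabulate over factors 2 and 3; blocks are excluded
maxt = prod(lfac(isf~=0));    % number of cells in the table (3*6 = 18)
maxt = int64(maxt);
typ = 'C';                    % treat yield as a continuous variable
percnt = 50;                  % 50th percentile, i.e. the median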
[table, ncells, ndim, idim, icount, ifail] = ...
g11bb( ...
typ, isf, lfac, ifac, percnt, y, maxt);
fprintf(' Table for %4dth percentile\n\n', percnt);
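% The table is returned as a vector with the last factor varying fastest,
% so reshape and transpose to get one row per level of the first factor.
ncol = idim(ndim);
nrow = ncells/ncol;
table = transpose(reshape(table,[ncol,nrow]));
icount = transpose(reshape(icount,[ncol,nrow]));
% Print each cell as percentile(count).
for i = 1:nrow
  row = [table(i,:); double(icount(i,:))];
  fprintf('%8.2f(%2d)', row);
  fprintf('\n');
end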