PDF version (NAG web site
, 64-bit version, 64-bit version)
NAG Toolbox: nag_stat_plot_stem_leaf (g01ar)
Purpose
nag_stat_plot_stem_leaf (g01ar) produces a stem and leaf display for a single sample of observations.
Syntax
[unit, lines, ifail, plot, sorty] = g01ar(y, nstepx, nstepy, 'range', range, 'prt', prt, 'n', n, 'unit', unit)
[unit, lines, ifail, plot, sorty] = nag_stat_plot_stem_leaf(y, nstepx, nstepy, 'range', range, 'prt', prt, 'n', n, 'unit', unit)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 23: range was made optional (default 'E'); prt was made optional (default 'P'); unit was made optional (default 0); output parameters were reordered.
Description
nag_stat_plot_stem_leaf (g01ar) produces a stem and leaf display for a single sample of observations. The stem and leaf display shows data values separated into the form of a ‘stem’ and a ‘leaf’. For example, a value of could be represented as where the stem is and the leaf is . The data is scaled using a value known as the ‘leaf digit unit’. In the above example the leaf digit unit would be .
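As a rough illustration of the stem/leaf split described above, the following sketch scales a few values by a leaf digit unit and separates each result into a stem and a leaf. The values, the unit and the variable names are illustrative assumptions, and truncation toward zero is assumed for the scaling; it is not a description of how nag_stat_plot_stem_leaf (g01ar) works internally.

```matlab
% Illustrative sketch only: the values and the unit are made up.
values   = [63 187 5];   % hypothetical observations
leafUnit = 1;            % assumed leaf digit unit (a power of ten)

scaled = fix(values ./ leafUnit);   % drop digits below the leaf digit unit
stems  = fix(scaled ./ 10);         % everything above the last retained digit
leaves = abs(rem(scaled, 10));      % the last retained digit

for k = 1:numel(values)
  fprintf('%g -> stem %d, leaf %d\n', values(k), stems(k), leaves(k));
end
```

With a leaf digit unit of 10 instead, the same code would place 187 on stem 1 with leaf 8, which is the sense in which the unit fixes the scale of the display.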
The following example illustrates a stem and leaf display.
For the
observations:
the stem and leaf display is:
1 1 8
3 1 99
5 2 00
5 2 111
2 2
2 2 3
1 2 4
where the leaf digit unit is
so that
represents
(i.e.,
). The leaf digit unit distinguishes between the numbers
,
,
, etc. which may otherwise all be represented by
.
Included in the above display is an initial column specifying the cumulative count of values, up to and including that particular line, from either the top or bottom of the display, whichever is smaller. An exception to this is when the line on which the median lies is reached, in which case the actual count of values on that line is displayed, rather than a cumulative count, and this is highlighted by enclosing the count in parentheses. In this case the median is and thus falls between the two lines at which the cumulative count has reached where is the number of observations.
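The count column can be sketched as follows. The per-line leaf counts are read off the display above; the test used to decide whether a line holds the median is an assumption made for this sketch, not a statement of the routine's internal logic.

```matlab
% Leaves per line, read off the display above (top line first).
counts = [1 2 2 3 0 1 1];
n = sum(counts);

fromTop    = cumsum(counts);                   % cumulative count from the top
fromBottom = fliplr(cumsum(fliplr(counts)));   % cumulative count from the bottom
depth      = min(fromTop, fromBottom);         % whichever is smaller

% Assumed test: a line holds the median when more than half of the values
% lie on or above it and more than half lie on or below it.
onMedianLine = (fromTop > n/2) & (fromBottom > n/2);

for k = 1:numel(counts)
  if onMedianLine(k)
    fprintf('(%2d)\n', counts(k));   % actual count, in parentheses
  else
    fprintf(' %2d\n', depth(k));     % cumulative count
  end
end
```

For these counts no line satisfies the median test, which corresponds to the median falling between two lines, and the printed column is 1, 3, 5, 5, 2, 2, 1 as in the display.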
Some of the other features of the stem and leaf display are illustrated by the following two examples.
the stem and leaf display may be:
1 1. 9
1 1*
1 -0.
3 -0* 13
15 +0* 012233344444
15 +0. 55556667788
5 1* 011
2 1. 3
1 2
1 2.
1 3 1
In the above display all the data are plotted and the leaf digit unit is
. Also in this display different leaves, that is different digits, may be plotted on a particular line. In this case we have 5 possible digits per line, that is 2 lines per stem, and these are represented as follows:
- * indicates that the line may contain the digits 0 to 4;
- . indicates that the line may contain the digits 5 to 9.
Alternatively the stem and leaf display may look like:
LO -19
2 -0* 3
3 +0T 1
5 +0* 01
10 +0T 22333
( 9) +0F 444445555
11 +0* 66677
6 +0T 8
5 1* 011
2 1T 3
HI 31
Again the leaf digit unit is but in this display just the data between the fences, which are the hinges the inter-hinge range, are plotted. Any data points that fall outside the fences are presented separately in the display under the headings LO for those points below the lower fence and HI for those points above the upper fence.
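A minimal sketch of how fences can be derived from the hinges is given below, using the observations from the Example section of this page. Two things are assumptions here: the hinges are computed with Tukey's depth rule, and the fence multiplier k = 1.5 is Tukey's conventional value. Neither is confirmed by the text above as the exact rule used by nag_stat_plot_stem_leaf (g01ar).

```matlab
% Observations from the Example section of this page.
y = [31 1 2 3 4 5 6 7 8 -9 1 2 3 4 5 6 7 8 2 3 4 5 6 7 3 4 5 6 4 5];
s = sort(y);
n = numel(s);

% Tukey's hinge depth (assumed convention).
dMedian = (n + 1) / 2;
dHinge  = (floor(dMedian) + 1) / 2;
lo = floor(dHinge);
hi = ceil(dHinge);

lowerHinge = (s(lo) + s(hi)) / 2;
upperHinge = (s(n + 1 - lo) + s(n + 1 - hi)) / 2;

k = 1.5;                             % ASSUMED fence multiplier
spread = upperHinge - lowerHinge;    % inter-hinge range
lowerFence = lowerHinge - k * spread;
upperFence = upperHinge + k * spread;

LO = s(s < lowerFence);   % values that would be listed under LO
HI = s(s > upperFence);   % values that would be listed under HI
fprintf('fences: [%g, %g]\n', lowerFence, upperFence);
```

Under these assumptions the only values outside the fences are -9 and 31, which agrees with the LO and HI entries in the example results.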
Again in this display different leaves, that is different digits, may be plotted on a particular line. However in this case we have 2 possible digits per line, that is 5 lines per stem, and these are represented as follows:
- * indicates that the line may contain the digits 0 or 1;
- T indicates that the line may contain the digits 2 or 3;
- F indicates that the line may contain the digits 4 or 5;
- S indicates that the line may contain the digits 6 or 7;
- . indicates that the line may contain the digits 8 or 9.
A display may also allow 10 different digits (0 to 9) per line, that is 1 line per stem, or just 1 digit per line, that is 10 lines per stem, as in the first of the three examples above.
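The line symbols can be sketched as a lookup from leaf digit to symbol. This assumes the conventional assignment (* and . for two lines per stem; *, T, F, S and . for five lines per stem); the variable names are illustrative.

```matlab
% Leaf digits 0..9 and the line symbol each one would fall under.
leafDigits = 0:9;
twoLineSymbols  = '*.';      % 5 digits per line, 2 lines per stem
fiveLineSymbols = '*TFS.';   % 2 digits per line, 5 lines per stem

symTwo  = twoLineSymbols(floor(leafDigits / 5) + 1);
symFive = fiveLineSymbols(floor(leafDigits / 2) + 1);

for k = 1:numel(leafDigits)
  fprintf('digit %d: %c (2 lines per stem), %c (5 lines per stem)\n', ...
          leafDigits(k), symTwo(k), symFive(k));
end
```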
Note that the median here is . This falls between two lines in the first display but is highlighted on the second display since it lies on a particular line.
Finally, if there are positive and negative numbers on the display these are highlighted by a + or - sign where the distinction is required, that is near the zero-point.
If there are too many leaves to fit in the plot width allowed, nag_stat_plot_stem_leaf (g01ar) plots as many leaves as possible and places an asterisk to the right to indicate that some leaves are not displayed. If this occurs and you wish to be able to plot all the leaves then the width of the plot may be adjusted.
Options also allow the leaf unit and the height of the display to be specified by you or calculated by nag_stat_plot_stem_leaf (g01ar). These arguments may be used to control the type of display you wish to obtain. Fixing the unit and changing the height of the display may alter the number of lines used per stem, that is the number of different digits per line. nag_stat_plot_stem_leaf (g01ar) will choose a display for the fixed unit that attempts to make as much use of the available height as possible; thus increasing the height may allow for more lines per stem, whereas decreasing the height may force the display to use fewer lines per stem. Similarly you may wish to fix the height and vary the leaf digit unit used on the display. See Further Comments for further details.
The display is returned in a character array with the option of printing the display.
References
Erickson B H and Nosanchuk T A (1985) Understanding Data Open University Press, Milton Keynes
Tukey J W (1977) Exploratory Data Analysis Addison–Wesley
Velleman P F and Hoaglin D C (1981) Applications, Basics, and Computing of Exploratory Data Analysis Duxbury Press, Boston, MA
Parameters
Compulsory Input Parameters
- 1: y(n) – double array
The observations.
- 2: nstepx – int64 / int32 / nag_int scalar
The number of character positions to be plotted horizontally.
Constraint:
.
- 3: nstepy – int64 / int32 / nag_int scalar
The maximum number of character positions to be plotted vertically.
If a suitable value will be used by nag_stat_plot_stem_leaf (g01ar) for the number of character positions to be plotted vertically. This will clearly be less than or equal to the value of ldplot.
Constraint:
or .
Optional Input Parameters
- 1: range – string (length ≥ 1)
Default: 'E'
Indicates whether you wish to scale the plot to the extremes of the data or to the fences.
- range = 'E': The display is a plot to the extremes, that is a plot of all the data.
- range = 'F': The display is a plot of the data between the fences.
Constraint: range = 'E' or 'F'.
- 2: prt – string (length ≥ 1)
Default: 'P'
Indicates whether the stem and leaf display is to be output to an external file.
- The display is not output to an external file.
- The display is output to the current advisory message unit as defined by nag_file_set_unit_advisory (x04ab). Only the first characters of each line are actually printed.
Constraint:
or .
- 3: n – int64 / int32 / nag_int scalar
Default: the dimension of the array y.
n, the number of observations.
Constraint:
.
- 4: unit – double scalar
Default: 0
Indicates the leaf digit unit to be used.
If and is not a power of ten, it will be converted to the nearest power of ten below the input value for unit.
If , the optimum unit will be used. This is based on the range of the data to be plotted and the number of lines available for the display.
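The conversion to "the nearest power of ten below the input value" can be sketched as below. The requested values are hypothetical, and the expression is only an illustration of the rule for positive inputs, not the routine's own code.

```matlab
% Hypothetical positive unit values that are not powers of ten.
requested = [250 0.03 7];

% Nearest power of ten below each requested value.
usedUnit = 10 .^ floor(log10(requested));

for k = 1:numel(requested)
  fprintf('requested %g -> used %g\n', requested(k), usedUnit(k));
end
```

Here 250 becomes 100, 0.03 becomes 0.01 and 7 becomes 1.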
Output Parameters
- 1: unit – double scalar
Default:
Contains the actual unit used in the stem and leaf display.
- 2: lines – int64 / int32 / nag_int scalar
The actual number of lines needed for the display.
- 3: ifail – int64 / int32 / nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
- 4: plot – cell array of strings
The stem and leaf display.
- 5: sorty – double array
The observations sorted into ascending order.
Error Indicators and Warnings
Errors or warnings detected by the function:
-
-
On entry, | , |
or | , |
or | , |
or | , |
or | . |
-
-
On entry, | or , |
or | or . |
-
-
The number of lines needed to produce the display exceeds the maximum number of lines allowed. You may wish to increase nstepy.
-
-
One of the observations is too large and causes a value to exceed the maximum integer allowed.
-
An unexpected error has been triggered by this routine. Please contact NAG.
-
Your licence key may have expired or may not have been installed correctly.
-
Dynamic memory allocation failed.
Accuracy
Accuracy is limited by the number of significant figures that may be represented on the display which will depend on the data, the number of lines available and the unit used.
Further Comments
nag_stat_plot_stem_leaf (g01ar) uses integer representations of the data. If very large data values are being used they should be scaled before using this function. The largest integer can be found by calling nag_machine_integer_max (x02bb).
If an asterisk is plotted at the end of a line to indicate that some leaves are not displayed, you should increase nstepx if you wish to be able to print the rest of the leaves on that line.
Note that if you request
nag_stat_plot_stem_leaf (g01ar) to print the plot only the first
characters of each line are printed. The full plot is stored in the array
plot so you do have the option of printing a plot which has more than
characters on a line.
When the leaf digit unit is set, the number of lines per stem is decided as follows:
Let
be the range of the data to be plotted:
- = largest observation – smallest observation: if all the data to both extremes are to be plotted (that is if ),
- = upper fence – lower fence: if only the data between the fences are to be plotted (that is if ).
Let
be the number of lines available for the plot:
- if ,
- if .
- The lines are subtracted to allow space for the display headings. If only the data between the fences are to be plotted then must be further reduced to allow space to present those values outside the fences. This will involve a minimum of another lines.
Let
,
- then the number of lines per stem is:
The time taken by the function increases with .
Example
A program to produce two stem and leaf displays for a sample of observations. The first illustrates a plot produced automatically by nag_stat_plot_stem_leaf (g01ar) and the second shows how to print the display under your control.
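function g01ar_example

fprintf('g01ar example results\n\n');

% Sample of 30 observations
y = [31; 1; 2; 3; 4; 5; 6; 7; 8; -9;
     1; 2; 3; 4; 5; 6; 7; 8;
     2; 3; 4; 5; 6; 7;
     3; 4; 5; 6;
     4; 5];
nstepx = int64(72);   % character positions plotted horizontally
nstepy = int64(20);   % maximum character positions plotted vertically

% First display: data between the fences, printed by g01ar itself
[unit, lines, ifail, plot, sorty] = ...
  g01ar( ...
         y, nstepx, nstepy, 'range', 'Fences');

% Second display: all the data, with printing by g01ar suppressed
[unit, lines, ifail, plot, sorty] = ...
  g01ar( ...
         y, nstepx, nstepy, 'range', 'Extremes', 'prt', 'Noprint');

% Print the second display from the returned plot array
fprintf('\n');
for i = 1:lines
  fprintf('%s\n', char(plot(i,1:nstepx)));
end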
g01ar example results
Stem-and-leaf display
Leaf digit unit = 1.0
1 2 represents 12.
LO -9
3 0 11
6 0 222
10 0 3333
15 0 44444
15 0 55555
10 0 6666
6 0 777
3 0 88
HI 31
Stem-and-leaf display
Leaf digit unit = 1.0
1 2 represents 12.
1 -0. 9
1 -0*
15 +0* 11222333344444
15 +0. 55555666677788
1 1*
1 1.
1 2*
1 2.
1 3* 1
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015